Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-04 Thread kefu chai
On Tue, Jun 5, 2018 at 6:13 AM, Paul Emmerich  wrote:
> Hi,
>
> 2018-06-04 20:39 GMT+02:00 Sage Weil :
>>
>> We'd love to build for stretch, but until there is a newer gcc for that
>> distro it's not possible.  We could build packages for 'testing', but I'm
>> not sure if those will be usable on stretch.
>
>
> you can install gcc (and only gcc) from testing on Stretch to build Ceph for
> Stretch:
>
> echo 'deb http://ftp.de.debian.org/debian/ testing main' >>
> /etc/apt/sources.list
> echo 'APT::Default-Release "stable";' >
> /etc/apt/apt.conf.d/stable-as-default
> apt update
> apt install g++ -t testing
>
> That's how we are building our Debian packages.

Thanks for sharing this, Paul! Does the built binary require any
runtime dependency offered by the testing repo? If the answer is no, I
think we should offer pre-built packages for Debian stable then.
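
One way to check (a rough sketch; the binary path is just an example) is to
compare the libstdc++ symbol versions the build actually needs against what
stretch ships:

objdump -T /usr/bin/ceph-osd | grep -o 'GLIBCXX_[0-9.]*' | sort -Vu | tail -1
apt-cache policy libstdc++6   # shows which repo the installed runtime would come from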

>
> Still, Debian not providing gcc in backports is clearly something that needs
> to be
> fixed in Debian.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Regards
Kefu Chai
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-04 Thread Charles Alva
I see, thanks for the detailed information, Sage!

Kind regards,

Charles Alva
Sent from Gmail Mobile


On Tue, Jun 5, 2018 at 1:39 AM Sage Weil  wrote:

> [adding ceph-maintainers]
>
> On Mon, 4 Jun 2018, Charles Alva wrote:
> > Hi Guys,
> >
> > When will the Ceph Mimic packages for Debian Stretch be released? I could
> > not find the packages even after changing the sources.list.
>
> The problem is that we're now using c++17, which requires a newer gcc
> than stretch or jessie provide, and Debian does not provide backports of
> the newer gcc packages.  We currently can't build the latest Ceph for
> those releases.
>
> We raised this with the Debian package maintainers about a month ago[1][2]
> when the first release candidate was built and didn't get any response
> (beyond a "yes, there are no gcc package backports").  Both ubuntu and
> fedora/rhel/centos (and I presume sles/opensuse) provide compiler
> backports, so we did not anticipate this being a problem.
>
> We'd love to build for stretch, but until there is a newer gcc for that
> distro it's not possible.  We could build packages for 'testing', but I'm
> not sure if those will be usable on stretch.
>
> sage
>
>
> [1]
> http://lists.ceph.com/private.cgi/ceph-maintainers-ceph.com/2018-April/000603.html
> [2]
> http://lists.ceph.com/private.cgi/ceph-maintainers-ceph.com/2018-April/000611.html
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pg inconsistent, scrub stat mismatch on bytes

2018-06-04 Thread Adrian
Hi Cephers,

We recently upgraded one of our clusters from hammer to jewel and then to
luminous (12.2.5, 5 mons/mgr, 21 storage nodes * 9 osd's). After some
deep-scrubs we have an inconsistent pg with a log message we've not seen
before:

HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 6.20 is active+clean+inconsistent, acting [114,26,44]


Ceph log shows

2018-06-03 06:53:35.467791 osd.114 osd.114 172.26.28.25:6825/40819 395
: cluster [ERR] 6.20 scrub stat mismatch, got 6526/6526 objects, 87/87
clones, 6526/6526 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive,
0/0 whiteouts, 25952454144/25952462336 bytes, 0/0 hit_set_archive
bytes.
2018-06-03 06:53:35.467799 osd.114 osd.114 172.26.28.25:6825/40819 396
: cluster [ERR] 6.20 scrub 1 errors
2018-06-03 06:53:40.701632 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0
41298 : cluster [ERR] Health check failed: 1 scrub errors
(OSD_SCRUB_ERRORS)
2018-06-03 06:53:40.701668 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0
41299 : cluster [ERR] Health check failed: Possible data damage: 1 pg
inconsistent (PG_DAMAGED)
2018-06-03 07:00:00.000137 mon.mon1-ceph1-qh2 mon.0 172.26.28.8:6789/0
41345 : cluster [ERR] overall HEALTH_ERR 1 scrub errors; Possible data
damage: 1 pg inconsistent

There are no EC pools - looks like it may be the same as
https://tracker.ceph.com/issues/22656 although as in #7 this is not a cache
pool.

Wondering if this is ok to issue a pg repair on 6.20 or if there's
something else we should be looking at first ?
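
For reference, what I'd be running is something like the following (assuming
the standard luminous tooling):

rados list-inconsistent-obj 6.20 --format=json-pretty   # inspect the shards first
ceph pg repair 6.20
ceph -w                                                 # watch for the repair to complete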

Thanks in advance,
Adrian.

---
Adrian : aussie...@gmail.com
If violence doesn't solve your problem, you're not using enough of it.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to run MySQL (or other database ) on Ceph using KRBD ?

2018-06-04 Thread 李昊华
Thanks for reading my questions!

I want to run MySQL on Ceph using KRBD because KRBD is faster than librbd. I
know KRBD is a kernel module and we can use it to map and mount an RBD device
on the operating system.

It is easy to use the command line tool to mount the RBD device on the
operating system. Is there any other way to use the krbd module, such as
changing MySQL's IO interface to use the krbd interface directly?

I looked at krbd.h and found only a few function interfaces (listed below) to
use, whereas librbd offers many interfaces, such as creating or cloning an RBD
image.

And I want to verify my hypotheses below:

1. librbd provides richer interfaces than krbd, and some functions cannot be
implemented through krbd.

2. Applications can only use krbd via the command line tool instead of code 
interfaces.
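
For context, the command-line path I mean in (2) is roughly the following
sketch (image name and size are just examples):

rbd create rbd/mysql-data --size 102400   # 100 GiB image
rbd map rbd/mysql-data                    # uses the krbd kernel module; prints e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /var/lib/mysql            # point MySQL's datadir at this mount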

Thank you very much!!!

//interfaces in krbd.h
int krbd_create_from_context(rados_config_t cct, struct krbd_ctx **pctx);
void krbd_destroy(struct krbd_ctx *ctx);

int krbd_map(struct krbd_ctx *ctx, const char *pool, const char *image,
 const char *snap, const char *options, char **pdevnode);
int krbd_is_mapped(struct krbd_ctx *ctx, const char *pool, const char *image,
   const char *snap, char **pdevnode);

int krbd_unmap(struct krbd_ctx *ctx, const char *devnode,
   const char *options);
int krbd_unmap_by_spec(struct krbd_ctx *ctx, const char *pool,
   const char *image, const char *snap,
   const char *options);




--
Jack Lee











___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel/Luminous Filestore/Bluestore for a new cluster

2018-06-04 Thread Subhachandra Chandra
We have been running Luminous 12.2.4 + Bluestore for about 3 months in
production. All the daemons run as docker containers and were installed
using ceph-ansible. 540 spinning drives with journal/wal/db on the same
drive spread across 9 hosts. Using librados object interface directly with
steady 100MB/s writes to it.

We have not observed any major issues. We have had occasional OSD daemon
crashes due to an assert which is a known bug but the cluster recovered
without any intervention each time. All the nodes have been rebooted 2-3
times due to CoreOS updates and no issues with that either.

If you have any specific questions related to the cluster, please post them
on this thread.

Subhachandra

On Wed, May 30, 2018 at 1:06 PM, Simon Ironside 
wrote:

> On 30/05/18 20:35, Jack wrote:
>
>> Why would you deploy a Jewel cluster, which is almost 3 majors versions
>> away ?
>> Bluestore is also the good answer
>> It works well, have many advantages, and is simply the future of Ceph
>>
>
> Indeed, and normally I wouldn't even ask, but as I say there's been some
> comments/threads recently that make me doubt the obvious Luminous +
> Bluestore path. A few that stand out in my memory are:
>
> * "Useless due to http://tracker.ceph.com/issues/22102; [1]
> * OSD crash with segfault Luminous 12.2.4 [2] [3] [4]
>
> There are others but those two stuck out for me. I realise that people
> will generally only report problems rather than "I installed ceph and
> everything went fine!" stories to this list but it was enough to motivate
> me to ask if Luminous/Bluestore was considered a good choice for a fresh
> install or if I should wait a bit.
>
> Thanks,
> Simon.
>
> [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-
> May/026339.html
> [2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-
> March/025373.html
> [3] http://tracker.ceph.com/issues/23431
> [4] http://tracker.ceph.com/issues/23352
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd@ service keeps restarting after removing osd

2018-06-04 Thread Michael Burk
On Thu, May 31, 2018 at 4:40 PM Gregory Farnum  wrote:

> On Thu, May 24, 2018 at 9:15 AM Michael Burk 
> wrote:
>
>> Hello,
>>
>> I'm trying to replace my OSDs with higher capacity drives. I went through
>> the steps to remove the OSD on the OSD node:
>> # ceph osd out osd.2
>> # ceph osd down osd.2
>> # ceph osd rm osd.2
>> Error EBUSY: osd.2 is still up; must be down before removal.
>> # systemctl stop ceph-osd@2
>> # ceph osd rm osd.2
>> removed osd.2
>> # ceph osd crush rm osd.2
>> removed item id 2 name 'osd.2' from crush map
>> # ceph auth del osd.2
>> updated
>>
>> umount /var/lib/ceph/osd/ceph-2
>>
>> It no longer shows in the crush map, and I am ready to remove the drive.
>> However, the ceph-osd@ service keeps restarting and mounting the disk in
>> /var/lib/ceph/osd. I do "systemctl stop ceph-osd@2" and umount the disk,
>> but then the service starts again and mounts the drive.
>>
>> # systemctl stop ceph-osd@2
>> # umount /var/lib/ceph/osd/ceph-2
>>
>> /dev/sdb1 on /var/lib/ceph/osd/ceph-2 type xfs
>> (rw,noatime,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota)
>>
>> ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous
>> (stable)
>>
>> What am I missing?
>>
>
> Obviously this is undesired!
> In general, when using ceph-disk (as you presumably are) the OSD is
> designed to turn on automatically when a formatted disk gets mounted. I'd
> imagine that something (quite possibly included with ceph) is auto-mounting
> the disk after you umount this. We have a ceph-disk@.service which is
> supposed to get fired once, but perhaps there's something else I'm missing
> so that udev fires an event, it gets captured by one of the ceph tools that
> sees there's an available drive tagged for Ceph, and then it auto-mounts?
> I'm not sure why this would be happening for you and not others, though.
>
I'm guessing it's because I'm replacing a batch of disks at once. The time
between stopping ceph-osd@ and seeing it start again is at least several
seconds, so if you just do one disk and remove it right away you probably
wouldn't have this problem. But since I do several at a time and watch
"ceph -w" after each one, it can be several minutes before I get to the
point of removing the volumes from the array controller.

>
> All this changes with ceph-volume, which will be the default in Mimic, by
> the way.
>
> Hmm, just poking at things a little more, I think maybe you wanted to put a
> "ceph-disk deactivate" invocation in there. Try playing around with that?
>
Ahh, good catch. I will test this. Thank you!
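
Presumably something along these lines (a sketch based on the ceph-disk man
page, using the same paths as above):

# systemctl stop ceph-osd@2
# ceph-disk deactivate /dev/sdb1
# systemctl disable ceph-osd@2

That should keep udev/ceph-disk from re-activating and mounting the OSD again
before I pull the drive.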

> -Greg
>
>
>
>>
>> Thanks,
>> Michael
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-04 Thread Paul Emmerich
Hi,

2018-06-04 20:39 GMT+02:00 Sage Weil :

> We'd love to build for stretch, but until there is a newer gcc for that
> distro it's not possible.  We could build packages for 'testing', but I'm
> not sure if those will be usable on stretch.
>

you can install gcc (and only gcc) from testing on Stretch to build Ceph
for Stretch:

echo 'deb http://ftp.de.debian.org/debian/ testing main' >>
/etc/apt/sources.list
echo 'APT::Default-Release "stable";' >
/etc/apt/apt.conf.d/stable-as-default
apt update
apt install g++ -t testing

That's how we are building our Debian packages.

Still, Debian not providing gcc in backports is clearly something that
needs to be
fixed in Debian.


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-04 Thread Joao Eduardo Luis
On 06/04/2018 07:39 PM, Sage Weil wrote:
> [1] 
> http://lists.ceph.com/private.cgi/ceph-maintainers-ceph.com/2018-April/000603.html
> [2] 
> http://lists.ceph.com/private.cgi/ceph-maintainers-ceph.com/2018-April/000611.html

Just a heads up, seems the ceph-maintainers archives are not public.

  -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-04 Thread Jack
My reaction when I read that there will be no Mimic soon on Stretch:
https://pix.milkywan.fr/JDjOJWnx.png

Anyway, thank you for the kind explanation, as well as for getting in
touch with the Debian team about this issue


On 06/04/2018 08:39 PM, Sage Weil wrote:
> [adding ceph-maintainers]
> 
> On Mon, 4 Jun 2018, Charles Alva wrote:
>> Hi Guys,
>>
>> When will the Ceph Mimic packages for Debian Stretch be released? I could not
>> find the packages even after changing the sources.list.
> 
> The problem is that we're now using c++17, which requires a newer gcc 
> than stretch or jessie provide, and Debian does not provide backports of 
> the newer gcc packages.  We currently can't build the latest Ceph for 
> those releases.
> 
> We raised this with the Debian package maintainers about a month ago[1][2] 
> when the first release candidate was built and didn't get any response 
> (beyond a "yes, there are no gcc package backports").  Both ubuntu and 
> fedora/rhel/centos (and I presume sles/opensuse) provide compiler 
> backports, so we did not anticipate this being a problem.
> 
> We'd love to build for stretch, but until there is a newer gcc for that 
> distro it's not possible.  We could build packages for 'testing', but I'm 
> not sure if those will be usable on stretch.
> 
> sage
> 
> 
> [1] 
> http://lists.ceph.com/private.cgi/ceph-maintainers-ceph.com/2018-April/000603.html
> [2] 
> http://lists.ceph.com/private.cgi/ceph-maintainers-ceph.com/2018-April/000611.html
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-04 Thread Sage Weil
[adding ceph-maintainers]

On Mon, 4 Jun 2018, Charles Alva wrote:
> Hi Guys,
> 
> When will the Ceph Mimic packages for Debian Stretch be released? I could not
> find the packages even after changing the sources.list.

The problem is that we're now using c++17, which requires a newer gcc 
than stretch or jessie provide, and Debian does not provide backports of 
the newer gcc packages.  We currently can't build the latest Ceph for 
those releases.

We raised this with the Debian package maintainers about a month ago[1][2] 
when the first release candidate was built and didn't get any response 
(beyond a "yes, we there are not gcc package backports").  Both ubuntu and 
fedora/rhel/centos (and I presume sles/opensuse) provide compiler 
backports we did not anticipate this being a problem.

We'd love to build for stretch, but until there is a newer gcc for that 
distro it's not possible.  We could build packages for 'testing', but I'm 
not sure if those will be usable on stretch.

sage


[1] 
http://lists.ceph.com/private.cgi/ceph-maintainers-ceph.com/2018-April/000603.html
[2] 
http://lists.ceph.com/private.cgi/ceph-maintainers-ceph.com/2018-April/000611.html

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Reed Dier
Appreciate the input.

Wasn’t sure if ceph-volume was the one setting these bits of metadata or 
something else.

Appreciate the help guys.

Thanks,

Reed

> The fix is in core Ceph (the OSD/BlueStore code), not ceph-volume. :) 
> journal_rotational is still a thing in BlueStore; it represents the combined 
> WAL+DB devices.
> -Greg 
> On Jun 4, 2018, at 11:53 AM, Alfredo Deza  wrote:
> 
> ceph-volume doesn't do anything here with the device metadata, and is
> something that bluestore has as an internal mechanism. Unsure if there
> is anything
> one can do to change this on the OSD itself (vs. injecting args)


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-04 Thread Alfredo Deza
There aren't any builds for Debian because the distro does not have
compiler backports required for building Mimic

On Mon, Jun 4, 2018 at 8:55 AM, Ronny Aasen  wrote:
> On 04. juni 2018 06:41, Charles Alva wrote:
>>
>> Hi Guys,
>>
>> When will the Ceph Mimic packages for Debian Stretch be released? I could not
>> find the packages even after changing the sources.list.
>>
>>
>
> I am also eager to test mimic on my ceph
>
> debian-mimic only contains ceph-deploy atm.
>
> kind regards
> Ronny Aasen
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Alfredo Deza
On Mon, Jun 4, 2018 at 12:37 PM, Reed Dier  wrote:
> Hi Caspar,
>
> David is correct, in that the issue I was having with SSD OSD’s having NVMe
> bluefs_db reporting as HDD creating an artificial throttle based on what
> David was mentioning, a prevention to keep spinning rust from thrashing. Not
> sure if the journal_rotational bit should be 1, but either way, it shouldn’t
> affect you being hdd OSDs. Curious how these OSD’s were deployed, per the
> below part of the message.
>
> Copying Alfredo, as I’m not sure if something changed with respect to
> ceph-volume in 12.2.2 (when this originally happened) to 12.2.5 (I’m sure
> plenty did), because I recently had an NVMe drive fail on me unexpectedly
> (curse you Micron), and had to nuke and redo some SSD OSDs, and it was my
> first time deploying with ceph-deploy after the ceph-disk deprecation. The
> new OSD’s appear to report correctly wrt to the rotational status, where the
> others did not. So that appears to be working correctly, just wanted to
> provide some positive feedback there. Not sure if there’s an easy way to
> change those metadata tags on the OSDs, so that I don’t have to inject the
> args every time I need to reweight. Also feels like journal_rotational
> wouldn’t be a thing in bluestore?

ceph-volume doesn't do anything here with the device metadata, and is
something that bluestore has as an internal mechanism. Unsure if there
is anything
one can do to change this on the OSD itself (vs. injecting args)

>
> ceph osd metadata | grep 'id\|model\|type\|rotational'
>
> "id": 63,
>
> "bluefs_db_model": "MTFDHAX1T2MCF-1AN1ZABYY",
>
> "bluefs_db_rotational": "0",
>
> "bluefs_db_type": "nvme",
>
> "bluefs_slow_model": "",
>
> "bluefs_slow_rotational": "0",
>
> "bluefs_slow_type": "ssd",
>
> "bluestore_bdev_model": "",
>
> "bluestore_bdev_rotational": "0",
>
> "bluestore_bdev_type": "ssd",
>
> "journal_rotational": "1",
>
> "rotational": "0"
>
> "id": 64,
>
> "bluefs_db_model": "INTEL SSDPED1D960GAY",
>
> "bluefs_db_rotational": "0",
>
> "bluefs_db_type": "nvme",
>
> "bluefs_slow_model": "",
>
> "bluefs_slow_rotational": "0",
>
> "bluefs_slow_type": "ssd",
>
> "bluestore_bdev_model": "",
>
> "bluestore_bdev_rotational": "0",
>
> "bluestore_bdev_type": "ssd",
>
> "journal_rotational": "0",
>
> "rotational": "0"
>
>
> osd.63 being one deployed using ceph-volume lvm in 12.2.2 and osd.64 being
> redeployed using ceph-deploy in 12.2.5 using ceph-volume  backend.
>
> Reed
>
> On Jun 4, 2018, at 8:16 AM, David Turner  wrote:
>
> I don't believe this really applies to you. The problem here was with an SSD
> osd that was incorrectly labeled as an HDD osd by ceph. The fix was to
> inject a sleep setting of 0 for those osds to speed up recovery. The sleep is
> needed to not kill hdds to avoid thrashing, but the bug was SSDs were being
> incorrectly identified as HDD and SSDs don't have a problem with thrashing.
>
> You can try increasing osd_max_backfills. Watch your disk utilization as you
> do this so that you don't accidentally kill your client io by setting that
> too high, assuming that still needs priority.
>
> On Mon, Jun 4, 2018, 3:55 AM Caspar Smit  wrote:
>>
>> Hi Reed,
>>
>> "Changing/injecting osd_recovery_sleep_hdd into the running SSD OSD’s on
>> bluestore opened the floodgates."
>>
>> What exactly did you change/inject here?
>>
>> We have a cluster with 10TB SATA HDD's which each have a 100GB SSD based
>> block.db
>>
>> Looking at ceph osd metadata for each of those:
>>
>> "bluefs_db_model": "SAMSUNG MZ7KM960",
>> "bluefs_db_rotational": "0",
>> "bluefs_db_type": "ssd",
>> "bluefs_slow_model": "ST1NM0086-2A",
>> "bluefs_slow_rotational": "1",
>> "bluefs_slow_type": "hdd",
>> "bluestore_bdev_rotational": "1",
>> "bluestore_bdev_type": "hdd",
>> "default_device_class": "hdd",
>> "journal_rotational": "1",
>> "osd_objectstore": "bluestore",
>> "rotational": "1"
>>
>> Looks to me if i'm hitting the same issue, isn't it?
>>
>> ps. An upgrade of Ceph is planned in the near future but for now i would
>> like to use the workaround if applicable to me.
>>
>> Thank you in advance.
>>
>> Kind regards,
>> Caspar Smit
>>
>> 2018-02-26 23:22 GMT+01:00 Reed Dier :
>>>
>>> Quick turn around,
>>>
>>> Changing/injecting osd_recovery_sleep_hdd into the running SSD OSD’s on
>>> bluestore opened the floodgates.
>>>
>>> pool objects-ssd id 20
>>>   recovery io 1512 MB/s, 21547 objects/s
>>>
>>> pool fs-metadata-ssd id 16
>>>   recovery io 0 B/s, 6494 keys/s, 271 objects/s
>>>   client io 82325 B/s rd, 68146 B/s wr, 1 op/s rd, 0 op/s wr
>>>
>>>
>>> Graph of performance jump. Extremely marked.
>>> https://imgur.com/a/LZR9R
>>>
>>> So at least we now have the gun 

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Gregory Farnum
On Mon, Jun 4, 2018 at 9:38 AM Reed Dier  wrote:

> Copying Alfredo, as I’m not sure if something changed with respect to
> ceph-volume in 12.2.2 (when this originally happened) to 12.2.5 (I’m sure
> plenty did), because I recently had an NVMe drive fail on me unexpectedly
> (curse you Micron), and had to nuke and redo some SSD OSDs, and it was my
> first time deploying with ceph-deploy after the ceph-disk deprecation. The
> new OSD’s appear to report correctly wrt to the rotational status, where
> the others did not. So that appears to be working correctly, just wanted to
> provide some positive feedback there. Not sure if there’s an easy way to
> change those metadata tags on the OSDs, so that I don’t have to inject the
> args every time I need to reweight. Also feels like journal_rotational
> wouldn’t be a thing in bluestore?
>
>
The fix is in core Ceph (the OSD/BlueStore code), not ceph-volume. :)
journal_rotational is still a thing in BlueStore; it represents the
combined WAL+DB devices.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Reed Dier
Hi Caspar,

David is correct, in that the issue I was having with SSD OSD’s having NVMe 
bluefs_db reporting as HDD creating an artificial throttle based on what David 
was mentioning, a prevention to keep spinning rust from thrashing. Not sure if 
the journal_rotational bit should be 1, but either way, it shouldn’t affect you 
being hdd OSDs. Curious how these OSD’s were deployed, per the below part of 
the message.

Copying Alfredo, as I’m not sure if something changed with respect to 
ceph-volume in 12.2.2 (when this originally happened) to 12.2.5 (I’m sure 
plenty did), because I recently had an NVMe drive fail on me unexpectedly 
(curse you Micron), and had to nuke and redo some SSD OSDs, and it was my first 
time deploying with ceph-deploy after the ceph-disk deprecation. The new OSD’s 
appear to report correctly wrt to the rotational status, where the others did 
not. So that appears to be working correctly, just wanted to provide some 
positive feedback there. Not sure if there’s an easy way to change those 
metadata tags on the OSDs, so that I don’t have to inject the args every time I 
need to reweight. Also feels like journal_rotational wouldn’t be a thing in 
bluestore?

> ceph osd metadata | grep 'id\|model\|type\|rotational'
> "id": 63,
> "bluefs_db_model": "MTFDHAX1T2MCF-1AN1ZABYY",
> "bluefs_db_rotational": "0",
> "bluefs_db_type": "nvme",
> "bluefs_slow_model": "",
> "bluefs_slow_rotational": "0",
> "bluefs_slow_type": "ssd",
> "bluestore_bdev_model": "",
> "bluestore_bdev_rotational": "0",
> "bluestore_bdev_type": "ssd",
> "journal_rotational": "1",
> "rotational": "0"
> "id": 64,
> "bluefs_db_model": "INTEL SSDPED1D960GAY",
> "bluefs_db_rotational": "0",
> "bluefs_db_type": "nvme",
> "bluefs_slow_model": "",
> "bluefs_slow_rotational": "0",
> "bluefs_slow_type": "ssd",
> "bluestore_bdev_model": "",
> "bluestore_bdev_rotational": "0",
> "bluestore_bdev_type": "ssd",
> "journal_rotational": "0",
> "rotational": "0"


osd.63 being one deployed using ceph-volume lvm in 12.2.2 and osd.64 being 
redeployed using ceph-deploy in 12.2.5 using ceph-volume  backend.

Reed

> On Jun 4, 2018, at 8:16 AM, David Turner  wrote:
> 
> I don't believe this really applies to you. The problem here was with an SSD 
> osd that was incorrectly labeled as an HDD osd by ceph. The fix was to inject 
> a sleep setting of 0 for those osds to speed up recovery. The sleep is needed 
> to not kill hdds to avoid thrashing, but the bug was SSDs were being 
> incorrectly identified as HDD and SSDs don't have a problem with thrashing.
> 
> You can try increasing osd_max_backfills. Watch your disk utilization as you 
> do this so that you don't accidentally kill your client io by setting that 
> too high, assuming that still needs priority.
> 
> On Mon, Jun 4, 2018, 3:55 AM Caspar Smit  > wrote:
> Hi Reed,
> 
> "Changing/injecting osd_recovery_sleep_hdd into the running SSD OSD’s on 
> bluestore opened the floodgates."
> 
> What exactly did you change/inject here?
> 
> We have a cluster with 10TB SATA HDD's which each have a 100GB SSD based 
> block.db
> 
> Looking at ceph osd metadata for each of those:
> 
> "bluefs_db_model": "SAMSUNG MZ7KM960",
> "bluefs_db_rotational": "0",
> "bluefs_db_type": "ssd",
> "bluefs_slow_model": "ST1NM0086-2A",
> "bluefs_slow_rotational": "1",
> "bluefs_slow_type": "hdd",
> "bluestore_bdev_rotational": "1",
> "bluestore_bdev_type": "hdd",
> "default_device_class": "hdd",
> "journal_rotational": "1",
> "osd_objectstore": "bluestore",
> "rotational": "1"
> 
> Looks to me if i'm hitting the same issue, isn't it?
> 
> ps. An upgrade of Ceph is planned in the near future but for now i would like 
> to use the workaround if applicable to me.
> 
> Thank you in advance.
> 
> Kind regards,
> Caspar Smit
> 
> 2018-02-26 23:22 GMT+01:00 Reed Dier  >:
> Quick turn around,
> 
> Changing/injecting osd_recovery_sleep_hdd into the running SSD OSD’s on 
> bluestore opened the floodgates.
> 
>> pool objects-ssd id 20
>>   recovery io 1512 MB/s, 21547 objects/s
>> 
>> pool fs-metadata-ssd id 16
>>   recovery io 0 B/s, 6494 keys/s, 271 objects/s
>>   client io 82325 B/s rd, 68146 B/s wr, 1 op/s rd, 0 op/s wr
> 
> Graph of performance jump. Extremely marked.
> https://imgur.com/a/LZR9R 
> 
> So at least we now have the gun to go with the smoke.
> 
> Thanks for the help and appreciate you pointing me in some directions that I 
> was able to use to figure out the issue.
> 
> Adding to ceph.conf for future OSD conversions.
> 
> Thanks,
> 
> Reed
> 
> 
>> On Feb 26, 2018, at 4:12 PM, Reed Dier > 

Re: [ceph-users] Unexpected data

2018-06-04 Thread Paul Emmerich
There's some metadata on Bluestore OSDs (the rocksdb database), it's
usually ~1% of your data.
The DB will start out at a size of around 1GB, so that's expected.
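
If you want to see it per OSD, the bluefs counters show the rocksdb allocation;
a rough example (run on the OSD host, adjust the osd id):

ceph daemon osd.0 perf dump | grep -E 'db_total_bytes|db_used_bytes'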


Paul

2018-06-04 15:55 GMT+02:00 Marc-Antoine Desrochers <
marc-antoine.desroch...@sogetel.com>:

> Hi,
>
>
>
> I'm not sure if it's normal or not, but each time I add a new OSD with
> ceph-deploy osd create --data /dev/sdg ceph-n1,
>
> it adds 1GB to my global data. I just formatted the drive, so it's supposed
> to be at 0, right?
>
> So I have 6 osd in my ceph and it took 6gib.
>
>
>
> [root@ceph-n1 ~]# ceph -s
>
>   cluster:
>
> id: 1d97aa70-2029-463a-b6fa-20e98f3e21fb
>
> health: HEALTH_OK
>
>
>
>   services:
>
> mon: 1 daemons, quorum ceph-n1
>
> mgr: ceph-n1(active)
>
> mds: cephfs-1/1/1 up  {0=ceph-n1=up:active}
>
> osd: 6 osds: 6 up, 6 in
>
>
>
>   data:
>
> pools:   2 pools, 600 pgs
>
> objects: 341 objects, 63109 kB
>
> usage:   6324 MB used, 2782 GB / 2788 GB avail
>
> pgs: 600 active+clean
>
>
>
>
>
> So im kind of confused...
>
> Thanks for your help.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mimic: failed to load OSD map for epoch X, got 0 bytes

2018-06-04 Thread Sergey Malinin
Hello,

Freshly created OSD won't start after upgrading to mimic:


2018-06-04 17:00:23.135 7f48cbecb240  0 osd.3 0 done with init, starting boot 
process
2018-06-04 17:00:23.135 7f48cbecb240  1 osd.3 0 start_boot
2018-06-04 17:00:23.135 7f48cbecb240 10 osd.3 0 start_boot - have maps 0..0
2018-06-04 17:00:23.139 7f48bc625700 10 osd.3 0 OSD::ms_get_authorizer type=mgr
2018-06-04 17:00:23.139 7f48b07fa700 10 osd.3 0 ms_handle_connect con 
0x562b92aa2a00
2018-06-04 17:00:23.139 7f48a6bc0700 10 osd.3 0 _preboot _preboot mon has 
osdmaps 17056..17606
2018-06-04 17:00:23.139 7f48a6bc0700 20 osd.3 0 update_osd_stat osd_stat(1.0 
GiB used, 3.6 TiB avail, 3.6 TiB total, peers [] op hist [])
2018-06-04 17:00:23.139 7f48a6bc0700  5 osd.3 0 heartbeat: osd_stat(1.0 GiB 
used, 3.6 TiB avail, 3.6 TiB total, peers [] op hist [])
2018-06-04 17:00:23.139 7f48a6bc0700 -1 osd.3 0 waiting for initial osdmap
2018-06-04 17:00:23.139 7f48b07fa700 20 osd.3 0 OSD::ms_dispatch: 
osd_map(17056..17056 src has 17056..17606 +gap_removed_snaps) v4
2018-06-04 17:00:23.139 7f48b07fa700 10 osd.3 0 do_waiters -- start
2018-06-04 17:00:23.139 7f48b07fa700 10 osd.3 0 do_waiters -- finish
2018-06-04 17:00:23.139 7f48b07fa700 20 osd.3 0 _dispatch 0x562b9276de40 
osd_map(17056..17056 src has 17056..17606 +gap_removed_snaps) v4
2018-06-04 17:00:23.139 7f48b07fa700  3 osd.3 0 handle_osd_map epochs 
[17056,17056], i have 0, src has [17056,17606]
2018-06-04 17:00:23.139 7f48b07fa700 10 osd.3 0 handle_osd_map message skips 
epochs 1..17055
2018-06-04 17:00:23.139 7f48b07fa700 10 osd.3 0 handle_osd_map  got full map 
for epoch 17056
2018-06-04 17:00:23.139 7f48b07fa700 20 osd.3 0 got_full_map 17056, nothing 
requested
2018-06-04 17:00:23.139 7f48b07fa700 20 osd.3 0 get_map 17055 - loading and 
decoding 0x562b92aea480
2018-06-04 17:00:23.139 7f48b07fa700 -1 osd.3 0 failed to load OSD map for 
epoch 17055, got 0 bytes
2018-06-04 17:00:23.147 7f48b07fa700 -1 /build/ceph-13.2.0/src/osd/OSD.h: In 
function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f48b07fa700 time 
2018-06-04 17:00:23.144480
/build/ceph-13.2.0/src/osd/OSD.h: 828: FAILED assert(ret)

ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) 
[0x7f48c32f35e2]
2: (()+0x26b7a7) [0x7f48c32f37a7]
3: (OSDService::get_map(unsigned int)+0x4a) [0x562b90410e9a]
4: (OSD::handle_osd_map(MOSDMap*)+0xfb1) [0x562b903b7dc1]
5: (OSD::_dispatch(Message*)+0xa1) [0x562b903c0a21]
6: (OSD::ms_dispatch(Message*)+0x56) [0x562b903c0d76]
7: (DispatchQueue::entry()+0xb92) [0x7f48c336c452]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f48c340a6cd]
9: (()+0x76db) [0x7f48c19ee6db]
10: (clone()+0x3f) [0x7f48c09b288f]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unexpected data

2018-06-04 Thread Marc-Antoine Desrochers
Hi,

 

I'm not sure if it's normal or not, but each time I add a new OSD with
ceph-deploy osd create --data /dev/sdg ceph-n1,
it adds 1GB to my global data. I just formatted the drive, so it's supposed to
be at 0, right?

So I have 6 OSDs in my ceph cluster and they took 6 GiB.

 

[root@ceph-n1 ~]# ceph -s

  cluster:

id: 1d97aa70-2029-463a-b6fa-20e98f3e21fb

health: HEALTH_OK

 

  services:

mon: 1 daemons, quorum ceph-n1

mgr: ceph-n1(active)

mds: cephfs-1/1/1 up  {0=ceph-n1=up:active}

osd: 6 osds: 6 up, 6 in

 

  data:

pools:   2 pools, 600 pgs

objects: 341 objects, 63109 kB

usage:   6324 MB used, 2782 GB / 2788 GB avail

pgs: 600 active+clean

 

 

So I'm kind of confused...

Thanks for your help.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread David Turner
I don't believe this really applies to you. The problem here was with an
SSD osd that was incorrectly labeled as an HDD osd by ceph. The fix was to
inject a sleep setting of 0 for those osds to speed up recovery. The sleep
is needed to not kill hdds to avoid thrashing, but the bug was SSDs were
being incorrectly identified as HDD and SSDs don't have a problem with
thrashing.
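
For reference, the injection looked roughly like this (only target the affected
SSD osds; osd.24 is just the example from earlier in the thread):

ceph tell osd.24 injectargs '--osd_recovery_sleep_hdd 0 --osd_recovery_sleep_hybrid 0'
ceph daemon osd.24 config show | grep recovery_sleep   # verify it took effect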

You can try increasing osd_max_backfills. Watch your disk utilization as
you do this so that you don't accidentally kill your client io by setting
that too high, assuming that still needs priority.
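
Something like this, stepping it up gradually while watching the disks (example
values only):

ceph tell osd.* injectargs '--osd_max_backfills 2'
iostat -x 5   # keep an eye on %util of the backfilling OSDs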

On Mon, Jun 4, 2018, 3:55 AM Caspar Smit  wrote:

> Hi Reed,
>
> "Changing/injecting osd_recovery_sleep_hdd into the running SSD OSD’s on
> bluestore opened the floodgates."
>
> What exactly did you change/inject here?
>
> We have a cluster with 10TB SATA HDD's which each have a 100GB SSD based
> block.db
>
> Looking at ceph osd metadata for each of those:
>
> "bluefs_db_model": "SAMSUNG MZ7KM960",
> "bluefs_db_rotational": "0",
> "bluefs_db_type": "ssd",
> "bluefs_slow_model": "ST1NM0086-2A",
> "bluefs_slow_rotational": "1",
> "bluefs_slow_type": "hdd",
> "bluestore_bdev_rotational": "1",
> "bluestore_bdev_type": "hdd",
> "default_device_class": "hdd",
> *"journal_rotational": "1",*
> "osd_objectstore": "bluestore",
> "rotational": "1"
>
> Looks to me if i'm hitting the same issue, isn't it?
>
> ps. An upgrade of Ceph is planned in the near future but for now i would
> like to use the workaround if applicable to me.
>
> Thank you in advance.
>
> Kind regards,
> Caspar Smit
>
> 2018-02-26 23:22 GMT+01:00 Reed Dier :
>
>> Quick turn around,
>>
>> Changing/injecting osd_recovery_sleep_hdd into the running SSD OSD’s on
>> bluestore opened the floodgates.
>>
>> pool objects-ssd id 20
>>   recovery io 1512 MB/s, 21547 objects/s
>>
>> pool fs-metadata-ssd id 16
>>   recovery io 0 B/s, 6494 keys/s, 271 objects/s
>>   client io 82325 B/s rd, 68146 B/s wr, 1 op/s rd, 0 op/s wr
>>
>>
>> Graph of performance jump. Extremely marked.
>> https://imgur.com/a/LZR9R
>>
>> So at least we now have the gun to go with the smoke.
>>
>> Thanks for the help and appreciate you pointing me in some directions
>> that I was able to use to figure out the issue.
>>
>> Adding to ceph.conf for future OSD conversions.
>>
>> Thanks,
>>
>> Reed
>>
>>
>> On Feb 26, 2018, at 4:12 PM, Reed Dier  wrote:
>>
>> For the record, I am not seeing a demonstrative fix by injecting the
>> value of 0 into the OSDs running.
>>
>> osd_recovery_sleep_hybrid = '0.00' (not observed, change may require
>> restart)
>>
>>
>> If it does indeed need to be restarted, I will need to wait for the
>> current backfills to finish their process as restarting an OSD would bring
>> me under min_size.
>>
>> However, doing config show on the osd daemon appears to have taken the
>> value of 0.
>>
>> ceph daemon osd.24 config show | grep recovery_sleep
>> "osd_recovery_sleep": "0.00",
>> "osd_recovery_sleep_hdd": "0.10",
>> "osd_recovery_sleep_hybrid": "0.00",
>> "osd_recovery_sleep_ssd": "0.00",
>>
>>
>> I may take the restart as an opportunity to also move to 12.2.3 at the
>> same time, since it is not expected that that should affect this issue.
>>
>> I could also attempt to change osd_recovery_sleep_hdd as well, since
>> these are ssd osd’s, it shouldn’t make a difference, but its a free move.
>>
>> Thanks,
>>
>> Reed
>>
>> On Feb 26, 2018, at 3:42 PM, Gregory Farnum  wrote:
>>
>> On Mon, Feb 26, 2018 at 12:26 PM Reed Dier  wrote:
>>
>>> I will try to set the hybrid sleeps to 0 on the affected OSDs as an
>>> interim solution to getting the metadata configured correctly.
>>>
>>
>> Yes, that's a good workaround as long as you don't have any actual hybrid
>> OSDs (or aren't worried about them sleeping...I'm not sure if that setting
>> came from experience or not).
>>
>>
>>>
>>> For reference, here is the complete metadata for osd.24, bluestore SATA
>>> SSD with NVMe block.db.
>>>
>>> {
>>> "id": 24,
>>> "arch": "x86_64",
>>> "back_addr": "",
>>> "back_iface": "bond0",
>>> "bluefs": "1",
>>> "bluefs_db_access_mode": "blk",
>>> "bluefs_db_block_size": "4096",
>>> "bluefs_db_dev": "259:0",
>>> "bluefs_db_dev_node": "nvme0n1",
>>> "bluefs_db_driver": "KernelDevice",
>>> "bluefs_db_model": "INTEL SSDPEDMD400G4 ",
>>> "bluefs_db_partition_path": "/dev/nvme0n1p4",
>>> "bluefs_db_rotational": "0",
>>> "bluefs_db_serial": " ",
>>> "bluefs_db_size": "16000221184",
>>> "bluefs_db_type": "nvme",
>>> "bluefs_single_shared_device": "0",
>>> "bluefs_slow_access_mode": "blk",
>>> "bluefs_slow_block_size": "4096",
>>> "bluefs_slow_dev": "253:8",
>>> "bluefs_slow_dev_node": "dm-8",
>>> 

Re: [ceph-users] Ceph Mimic on Debian 9 Stretch

2018-06-04 Thread Ronny Aasen

On 04. juni 2018 06:41, Charles Alva wrote:

Hi Guys,

When will the Ceph Mimic packages for Debian Stretch be released? I could 
not find the packages even after changing the sources.list.





I am also eager to test mimic on my ceph

debian-mimic only contains ceph-deploy atm.

kind regards
Ronny Aasen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Should ceph-volume lvm prepare not be backwards compitable with ceph-disk?

2018-06-04 Thread Alfredo Deza
On Sat, Jun 2, 2018 at 12:31 PM, Oliver Freyermuth
 wrote:
> Am 02.06.2018 um 11:44 schrieb Marc Roos:
>>
>>
>> ceph-disk does not require bootstrap-osd/ceph.keyring and ceph-volume
>> does
>
> I believe that's expected when you use "prepare".
> For ceph-volume, "prepare" already bootstraps the OSD and fetches a fresh OSD 
> id,
> for which it needs the keyring.
> For ceph-disk, this was not part of "prepare", but you only needed a key for 
> "activate" later, I think.

This is exactly create.

Do note that the split into prepare+activate is just to try and
accommodate the use case of not wanting multiple OSDs
up at the same time.

There are some ceph-volume internals that required us to know what OSD
id this was before activating, which is why
the bootstrap is being used here.

It would be best to just use `create` really, and not bother with these steps.
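
i.e. something like this (a sketch, reusing the device from the earlier example):

ceph-volume lvm create --bluestore --data /dev/sdf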

>
> Since we always use "create" here via ceph-deploy, I'm not an expert on the 
> subtle command differences, though -
> but ceph-deploy is doing a good job at making you survive without learning 
> them ;-).

ceph-deploy just bypasses the prepare+activate and calls 'create' on
ceph-volume directly.

>
> Cheers,
> Oliver
>
>>
>>
>>
>> [@~]# ceph-disk prepare --bluestore --zap-disk /dev/sdf
>>
>> ***
>> Found invalid GPT and valid MBR; converting MBR to GPT format.
>> ***
>>
>> GPT data structures destroyed! You may now partition the disk using
>> fdisk or
>> other utilities.
>> Creating new GPT entries.
>> The operation has completed successfully.
>> The operation has completed successfully.
>> The operation has completed successfully.
>> The operation has completed successfully.
>> meta-data=/dev/sdf1  isize=2048   agcount=4, agsize=6400
>> blks
>>  =   sectsz=4096  attr=2, projid32bit=1
>>  =   crc=1finobt=0, sparse=0
>> data =   bsize=4096   blocks=25600, imaxpct=25
>>  =   sunit=0  swidth=0 blks
>> naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
>> log  =internal log   bsize=4096   blocks=1608, version=2
>>  =   sectsz=4096  sunit=1 blks, lazy-count=1
>> realtime =none   extsz=4096   blocks=0, rtextents=0
>> Warning: The kernel is still using the old partition table.
>> The new table will be used at the next reboot.
>> The operation has completed successfully.
>>
>> [@~]# ceph-disk  zap /dev/sdf
>> /dev/sdf1: 4 bytes were erased at offset 0x (xfs): 58 46 53 42
>> 100+0 records in
>> 100+0 records out
>> 104857600 bytes (105 MB) copied, 0.946816 s, 111 MB/s
>> 110+0 records in
>> 110+0 records out
>> 115343360 bytes (115 MB) copied, 0.876412 s, 132 MB/s
>> Caution: invalid backup GPT header, but valid main header; regenerating
>> backup header from main header.
>>
>> Warning! Main and backup partition tables differ! Use the 'c' and 'e'
>> options
>> on the recovery & transformation menu to examine the two tables.
>>
>> Warning! One or more CRCs don't match. You should repair the disk!
>>
>> 
>> 
>> Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but
>> disk
>> verification and recovery are STRONGLY recommended.
>> 
>> 
>> GPT data structures destroyed! You may now partition the disk using
>> fdisk or
>> other utilities.
>> Creating new GPT entries.
>> The operation has completed successfully.
>>
>>
>>
>> [@ ~]# fdisk -l /dev/sdf
>> WARNING: fdisk GPT support is currently new, and therefore in an
>> experimental phase. Use at your own discretion.
>>
>> Disk /dev/sdf: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk label type: gpt
>> Disk identifier: 7DB3B9B6-CD8E-41B5-85BA-3ABB566BAF8E
>>
>>
>> # Start  EndSize  TypeName
>>
>>
>> [@ ~]# ceph-volume lvm prepare --bluestore --data /dev/sdf
>> Running command: /bin/ceph-authtool --gen-print-key
>> Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd
>> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
>> 8a2440c2-55a3-4b09-8906-965c25e36066
>>  stderr: 2018-06-02 17:00:47.309487 7f5a083c1700 -1 auth: unable to find
>> a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file
>> or directory
>>  stderr: 2018-06-02 17:00:47.309502 7f5a083c1700 -1 monclient: ERROR:
>> missing keyring, cannot use cephx for authentication
>>  stderr: 2018-06-02 17:00:47.309505 7f5a083c1700  0 librados:
>> client.bootstrap-osd initialization error (2) No such file or directory
>>  

Re: [ceph-users] Bug? if ceph-volume fails, it does not clean up created osd auth id

2018-06-04 Thread Alfredo Deza
ceph-volume has a 'rollback' functionality that if it was able to
create an OSD id, and the creation of the OSD
fails, it will remove the id. In this case, it failed to create the
id, so the tool can't be sure it has to 'clean up'.


On Sat, Jun 2, 2018 at 5:52 AM, Marc Roos  wrote:
>
>
> [@ bootstrap-osd]# ceph-volume lvm prepare --bluestore --data /dev/sdf
> Running command: /bin/ceph-authtool --gen-print-key
> Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> c32036fe-ca0b-47d1-be3f-e28943ee3a97
>  stderr: Error EEXIST: entity osd.19 exists but key does not match
> -->  RuntimeError: Unable to create a new OSD id
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-06-04 Thread Lenz Grimmer
On 05/08/2018 07:21 AM, Kai Wagner wrote:

> Looks very good. Is it anyhow possible to display the reason why a
> cluster is in an error or warning state? Thinking about the output from
> ceph -s if this could by shown in case there's a failure. I think this
> will not be provided by default but wondering if it's possible to add.

Sorry for the late reply. We actually discussed this aspect during one
of the calls we had when discussing the Grafana dashboard integration
into the Ceph Manager Dashboard. Such kind of state information is
somewhat difficult to track and visualize using Prometheus/Grafana (or
any other TSDB, FWIW), as you can't store the actual reasons for why the
cluster is in HEALTH_WARN or HEALTH_ERR, for example.

We are therefore considering displaying this information in the form of
"native" widgets on the Manager Dashboard, and using the Grafana
dashboards for visualizing the other more suitable performance metrics.

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-04 Thread Wladimir Mutel
On Mon, Jun 04, 2018 at 11:12:58AM +0300, Wladimir Mutel wrote:
>   /disks> create pool=rbd image=win2016-3tb-1 size=2861589M 
> CMD: /disks/ create pool=rbd image=win2016-3tb-1 size=2861589M count=1 
> max_data_area_mb=None
> pool 'rbd' is ok to use
> Creating/mapping disk rbd/win2016-3tb-1
> Issuing disk create request
> Failed : disk create/update failed on p10s. LUN allocation failure

>   Surely I could investigate what is happening by studying gwcli sources,
>   but if anyone already knows how to fix that, I would appreciate your 
> response.

Well, this was fixed by updating kernel to v4.17 from Ubuntu 
kernel/mainline PPA
Going on...
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD-primary crush rule doesn't work as intended

2018-06-04 Thread Horace
I won't run out of write iops when I have ssd journal in place. I know that I 
can use the dual root method from Sebastien's web site, but I thought the 
'storage class' feature is the way to solve this kind of problem. 

https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
 

Regards, 
Horace Ng 


From: "Peter Linder"  
To: "Paul Emmerich" , "horace"  
Cc: "ceph-users"  
Sent: Thursday, May 24, 2018 3:46:59 PM 
Subject: Re: [ceph-users] SSD-primary crush rule doesn't work as intended 



It will also only work reliably if you use a single level tree structure with 
failure domain "host". If you want say, separate data center failure domains, 
you need extra steps to make sure a SSD host and a HDD host do not get selected 
from the same DC. 


I have done such a layout so it is possible (see my older posts) but you need 
to be careful when you construct the additional trees that are needed in order 
to force the correct elections. 

In reality however, even if you force all reads to the SSD using primary 
affinity, you will soon run out of write IOPS on the HDDs. To keep up with the 
SSD's you will need so many HDDs for an average workload that in order to keep 
up performance you will not save any money. 
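
For completeness, the read-steering mentioned above is done with primary
affinity, e.g. (illustrative osd ids taken from the crush tree further down):

ceph osd primary-affinity osd.18 1.0   # ssd, prefer as primary
ceph osd primary-affinity osd.0 0      # hdd, avoid being primary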


Regards, 

Peter 




Den 2018-05-23 kl. 14:37, skrev Paul Emmerich: 



You can't mix HDDs and SSDs in a server if you want to use such a rule. 
The new selection step after "emit" can't know what server was selected 
previously. 

Paul 

2018-05-23 11:02 GMT+02:00 Horace <hor...@hkisl.net>: 

Add to the info, I have a slightly modified rule to take advantage of the new 
storage class. 

rule ssd-hybrid { 
id 2 
type replicated 
min_size 1 
max_size 10 
step take default class ssd 
step chooseleaf firstn 1 type host 
step emit 
step take default class hdd 
step chooseleaf firstn -1 type host 
step emit 
} 

Regards, 
Horace Ng 

- Original Message - 
From: "horace" <hor...@hkisl.net> 
To: "ceph-users" <ceph-users@lists.ceph.com> 
Sent: Wednesday, May 23, 2018 3:56:20 PM 
Subject: [ceph-users] SSD-primary crush rule doesn't work as intended 

I've set up the rule according to the doc, but some of the PGs are still being 
assigned to the same host. 

http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/ 

rule ssd-primary { 
ruleset 5 
type replicated 
min_size 5 
max_size 10 
step take ssd 
step chooseleaf firstn 1 type host 
step emit 
step take platter 
step chooseleaf firstn -1 type host 
step emit 
} 

Crush tree: 

[root@ceph0 ~]# ceph osd crush tree 
ID CLASS WEIGHT TYPE NAME 
-1 58.63989 root default 
-2 19.55095 host ceph0 
0 hdd 2.73000 osd.0 
1 hdd 2.73000 osd.1 
2 hdd 2.73000 osd.2 
3 hdd 2.73000 osd.3 
12 hdd 4.54999 osd.12 
15 hdd 3.71999 osd.15 
18 ssd 0.2 osd.18 
19 ssd 0.16100 osd.19 
-3 19.55095 host ceph1 
4 hdd 2.73000 osd.4 
5 hdd 2.73000 osd.5 
6 hdd 2.73000 osd.6 
7 hdd 2.73000 osd.7 
13 hdd 4.54999 osd.13 
16 hdd 3.71999 osd.16 
20 ssd 0.16100 osd.20 
21 ssd 0.2 osd.21 
-4 19.53799 host ceph2 
8 hdd 2.73000 osd.8 
9 hdd 2.73000 osd.9 
10 hdd 2.73000 osd.10 
11 hdd 2.73000 osd.11 
14 hdd 3.71999 osd.14 
17 hdd 4.54999 osd.17 
22 ssd 0.18700 osd.22 
23 ssd 0.16100 osd.23 

#ceph pg ls-by-pool ssd-hybrid 

27.8 1051 0 0 0 0 4399733760 1581 1581 active+clean 2018-05-23 06:20:56.088216 
27957'185553 27959:368828 [23,1,11] 23 [23,1,11] 23 27953'182582 2018-05-23 
06:20:56.088172 27843'162478 2018-05-20 18:28:20.118632 

With osd.23 and osd.11 being assigned on the same host. 

Regards, 
Horace Ng 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 






-- 
-- 
Paul Emmerich 

Looking for help with your Ceph cluster? Contact us at https://croit.io 

croit GmbH 
Freseniusstr. 31h 
81247 München 
www.croit.io 
Tel: +49 89 1896585 90 


___
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-04 Thread Wladimir Mutel
On Fri, Jun 01, 2018 at 08:20:12PM +0300, Wladimir Mutel wrote:
> 
>   And still, when I do '/disks create ...' in gwcli, it says
>   that it wants 2 existing gateways. Probably this is related
>   to the created 2-TPG structure and I should look for more ways
>   to 'improve' that json config so that rbd-target-gw loads it
>   as I need on single host.

Well, I decided to bond my network interfaces and assign a single IP on 
them (as mchristi@ suggested)
Also I put 'minimum_gateways = 1' into /etc/ceph/iscsi-gateway.cfg and 
got rid of 'At least 2 gateways required' in gwcli
But now I have one more stumble :

gwcli -d
Adding ceph cluster 'ceph' to the UI
Fetching ceph osd information
Querying ceph for state information
Refreshing disk information from the config object
- Scanning will use 8 scan threads
- rbd image scan complete: 0s
Refreshing gateway & client information
- checking iSCSI/API ports on p10s
Querying ceph for state information
Gathering pool stats for cluster 'ceph'

/disks> create pool=rbd image=win2016-3tb-1 size=2861589M 
CMD: /disks/ create pool=rbd image=win2016-3tb-1 size=2861589M count=1 
max_data_area_mb=None
pool 'rbd' is ok to use
Creating/mapping disk rbd/win2016-3tb-1
Issuing disk create request
Failed : disk create/update failed on p10s. LUN allocation failure

Surely I could investigate what is happening by studying gwcli sources,
but if anyone already knows how to fix that, I would appreciate your 
response.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Caspar Smit
Hi Reed,

"Changing/injecting osd_recovery_sleep_hdd into the running SSD OSD’s on
bluestore opened the floodgates."

What exactly did you change/inject here?

We have a cluster with 10TB SATA HDD's which each have a 100GB SSD based
block.db

Looking at ceph osd metadata for each of those:

"bluefs_db_model": "SAMSUNG MZ7KM960",
"bluefs_db_rotational": "0",
"bluefs_db_type": "ssd",
"bluefs_slow_model": "ST1NM0086-2A",
"bluefs_slow_rotational": "1",
"bluefs_slow_type": "hdd",
"bluestore_bdev_rotational": "1",
"bluestore_bdev_type": "hdd",
"default_device_class": "hdd",
*"journal_rotational": "1",*
"osd_objectstore": "bluestore",
"rotational": "1"

Looks to me if i'm hitting the same issue, isn't it?

ps. An upgrade of Ceph is planned in the near future but for now i would
like to use the workaround if applicable to me.

Thank you in advance.

Kind regards,
Caspar Smit

2018-02-26 23:22 GMT+01:00 Reed Dier :

> Quick turn around,
>
> Changing/injecting osd_recovery_sleep_hdd into the running SSD OSD’s on
> bluestore opened the floodgates.
>
> pool objects-ssd id 20
>   recovery io 1512 MB/s, 21547 objects/s
>
> pool fs-metadata-ssd id 16
>   recovery io 0 B/s, 6494 keys/s, 271 objects/s
>   client io 82325 B/s rd, 68146 B/s wr, 1 op/s rd, 0 op/s wr
>
>
> Graph of performance jump. Extremely marked.
> https://imgur.com/a/LZR9R
>
> So at least we now have the gun to go with the smoke.
>
> Thanks for the help and appreciate you pointing me in some directions that
> I was able to use to figure out the issue.
>
> Adding to ceph.conf for future OSD conversions.
>
> Thanks,
>
> Reed
>
>
> On Feb 26, 2018, at 4:12 PM, Reed Dier  wrote:
>
> For the record, I am not seeing a demonstrative fix by injecting the value
> of 0 into the OSDs running.
>
> osd_recovery_sleep_hybrid = '0.00' (not observed, change may require
> restart)
>
>
> If it does indeed need to be restarted, I will need to wait for the
> current backfills to finish their process as restarting an OSD would bring
> me under min_size.
>
> However, doing config show on the osd daemon appears to have taken the
> value of 0.
>
> ceph daemon osd.24 config show | grep recovery_sleep
> "osd_recovery_sleep": "0.00",
> "osd_recovery_sleep_hdd": "0.10",
> "osd_recovery_sleep_hybrid": "0.00",
> "osd_recovery_sleep_ssd": "0.00",
>
>
> I may take the restart as an opportunity to also move to 12.2.3 at the
> same time, since it is not expected that that should affect this issue.
>
> I could also attempt to change osd_recovery_sleep_hdd as well, since these
> are ssd osd’s, it shouldn’t make a difference, but its a free move.
>
> Thanks,
>
> Reed
>
> On Feb 26, 2018, at 3:42 PM, Gregory Farnum  wrote:
>
> On Mon, Feb 26, 2018 at 12:26 PM Reed Dier  wrote:
>
>> I will try to set the hybrid sleeps to 0 on the affected OSDs as an
>> interim solution to getting the metadata configured correctly.
>>
>
> Yes, that's a good workaround as long as you don't have any actual hybrid
> OSDs (or aren't worried about them sleeping...I'm not sure if that setting
> came from experience or not).
>
>
>>
>> For reference, here is the complete metadata for osd.24, bluestore SATA
>> SSD with NVMe block.db.
>>
>> {
>> "id": 24,
>> "arch": "x86_64",
>> "back_addr": "",
>> "back_iface": "bond0",
>> "bluefs": "1",
>> "bluefs_db_access_mode": "blk",
>> "bluefs_db_block_size": "4096",
>> "bluefs_db_dev": "259:0",
>> "bluefs_db_dev_node": "nvme0n1",
>> "bluefs_db_driver": "KernelDevice",
>> "bluefs_db_model": "INTEL SSDPEDMD400G4 ",
>> "bluefs_db_partition_path": "/dev/nvme0n1p4",
>> "bluefs_db_rotational": "0",
>> "bluefs_db_serial": " ",
>> "bluefs_db_size": "16000221184",
>> "bluefs_db_type": "nvme",
>> "bluefs_single_shared_device": "0",
>> "bluefs_slow_access_mode": "blk",
>> "bluefs_slow_block_size": "4096",
>> "bluefs_slow_dev": "253:8",
>> "bluefs_slow_dev_node": "dm-8",
>> "bluefs_slow_driver": "KernelDevice",
>> "bluefs_slow_model": "",
>> "bluefs_slow_partition_path": "/dev/dm-8",
>> "bluefs_slow_rotational": "0",
>> "bluefs_slow_size": "1920378863616",
>> "bluefs_slow_type": "ssd",
>> "bluestore_bdev_access_mode": "blk",
>> "bluestore_bdev_block_size": "4096",
>> "bluestore_bdev_dev": "253:8",
>> "bluestore_bdev_dev_node": "dm-8",
>> "bluestore_bdev_driver": "KernelDevice",
>> "bluestore_bdev_model": "",
>> "bluestore_bdev_partition_path": "/dev/dm-8",
>> "bluestore_bdev_rotational": "0",
>> "bluestore_bdev_size": "1920378863616",
>> "bluestore_bdev_type": "ssd",
>> "ceph_version": "ceph version 12.2.2
>>