Re: [ceph-users] S3 object notifications

2017-11-28 Thread Yehuda Sadeh-Weinraub
On Wed, Nov 29, 2017 at 12:43 AM,   wrote:
> Hi Yehuda.
>
> Are there any examples (doc's, blog posts, ...):
> - how to use that "framework" and especially for the "callbacks"

There's a minimal sync module implementation that does nothing other
than write a debug log for each sync event:

https://github.com/ceph/ceph/blob/master/src/rgw/rgw_sync_module_log.cc

Docs-wise, the elasticsearch luminous blog has a concrete config
example that can be applied to other sync modules:
http://ceph.com/rgw/new-luminous-rgw-metadata-search/
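
For reference, wiring up the elasticsearch sync module comes down to creating a zone with tier type "elasticsearch" in the multisite configuration and pointing its tier-config at the ES endpoint. A rough sketch only - the zonegroup/zone names and endpoints below are made-up placeholders, see the blog post and the multisite docs for the full procedure:

  radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=us-east-es \
      --endpoints=http://rgw-es-host:8002 --tier-type=elasticsearch
  radosgw-admin zone modify --rgw-zone=us-east-es \
      --tier-config=endpoint=http://es-host:9200,num_shards=10,num_replicas=1
  radosgw-admin period update --commit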


> - for the latest "Metasearch" feature / usage with a S3 client/tools like 
> CyberDuck, s3cmd, AWSCLI or at least boto3?
>   - i.e. is an external ELK still needed or is this somehow included in RGW 
> now?
>

You still need an external elasticsearch server; it's not part of
ceph. However, searches can be done by sending requests through rgw's
RESTful API. We have a test that uses boto to generate such requests,
but it might not be exactly what you're looking for:
https://github.com/ceph/ceph/blob/master/src/test/rgw/rgw_multi/zone_es.py

Yehuda

> Thanks & regards
>
>
> Gesendet: Dienstag, 28. November 2017 um 13:52 Uhr
> Von: "Yehuda Sadeh-Weinraub" 
> An: "Sean Purdy" 
> Cc: "ceph-users@lists.ceph.com" 
> Betreff: Re: [ceph-users] S3 object notifications
> rgw has a sync modules framework that allows you to write your own
> sync plugins. The system identifies object changes and triggers
> callbacks that can then act on those changes. For example, the
> metadata search feature that was added recently uses this to send
> object metadata into elasticsearch for indexing.
>
> Yehuda
>
> On Tue, Nov 28, 2017 at 2:22 PM, Sean Purdy  wrote:
>> Hi,
>>
>>
>> http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object 
>> notifications are not supported. I'd like something like object 
>> notifications so that we can backup new objects in realtime, instead of 
>> trawling the whole object list for what's changed.
>>
>> Is there anything similar I can use? I've found Spreadshirt's haproxy fork 
>> which traps requests and updates redis - 
>> https://github.com/spreadshirt/s3gw-haproxy
>>  Anybody used that?
>>
>>
>> Thanks,
>>
>> Sean Purdy
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs Hadoop Plugin and CEPH integration

2017-11-28 Thread Orit Wasserman
On Tue, Nov 28, 2017 at 7:26 PM, Aristeu Gil Alves Jr
 wrote:
> Greg and Donny,
>
> Thanks for the answers. It helped a lot!
>
> I just watched the swifta presentation and it looks quite good!
>

I would highly recommend using s3a rather than swifta, as it is much more
mature and more widely used.
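
As a rough illustration only (the endpoint, bucket and credentials below are placeholders, not tested settings), pointing s3a at an RGW endpoint mostly comes down to a few fs.s3a.* properties, which would normally live in core-site.xml rather than on the command line:

  hadoop fs \
    -D fs.s3a.endpoint=http://rgw.example.com:7480 \
    -D fs.s3a.access.key=MYACCESSKEY \
    -D fs.s3a.secret.key=MYSECRETKEY \
    -D fs.s3a.path.style.access=true \
    -ls s3a://mybucket/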

Cheers,
Orit

> Due to the lack of updates/development, and the fact that we can also
> choose spark, I think maybe swift/swifta with ceph is a good strategy too.
> I need to study it more, though.
>
> Can I get the same results (performance and integrated data-layout APIs)
> with it?
>
> Are there any migration cases/tutorials for a cephfs to swift-with-ceph
> scenario that you could suggest?
>
> Best regards,
> --
> Aristeu
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] force scrubbing

2017-11-28 Thread David Turner
I personally set max_scrubs to 0 on the cluster and then set it to 1 only
on the osds involved in the PG you want to scrub.  Setting the cluster to
max_scrubs of 1 and then upping the involved osds to 2 might help, but is
not a guarantee.
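
A hedged sketch of that sequence (the PG id and OSD ids below are placeholders; the option is osd_max_scrubs, and injectargs changes do not survive a daemon restart):

  # stop new scrubs cluster-wide
  ceph tell osd.* injectargs '--osd_max_scrubs 0'
  # re-enable scrubbing only on the OSDs in the acting set of the target PG
  ceph tell osd.12 injectargs '--osd_max_scrubs 1'
  ceph tell osd.43 injectargs '--osd_max_scrubs 1'
  ceph pg deep-scrub 5.4c7
  # afterwards, restore the default everywhere
  ceph tell osd.* injectargs '--osd_max_scrubs 1'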

On Tue, Nov 28, 2017 at 7:25 PM Gregory Farnum  wrote:

> On Mon, Nov 13, 2017 at 1:01 AM Kenneth Waegeman <
> kenneth.waege...@ugent.be> wrote:
>
>> Hi all,
>>
>>
>> Is there a way to force scrub a pg of an erasure coded pool?
>>
>> I tried  ceph pg deep-scrub 5.4c7, but after a week it still hasn't
>> scrubbed the pg (last scrub timestamp not changed)
>>
>
> Much to my surprise, it appears you can't force a scrub to start
> immediately, or even to be the next one in the queue. :/ I think you can
> hack around it with the correct combination of scrub settings; maybe one of
> the more experienced cluster admins can tell you. (Something about turning
> down the scrubs allowed to zero and the scrub intervals to very large
> numbers? Then telling it to scrub the one you want and increasing the
> number of allowed scrubs to 1?)
>
> Anyway, can you submit a ticket for that feature? It would be pretty
> useful.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] force scrubbing

2017-11-28 Thread Gregory Farnum
On Mon, Nov 13, 2017 at 1:01 AM Kenneth Waegeman 
wrote:

> Hi all,
>
>
> Is there a way to force scrub a pg of an erasure coded pool?
>
> I tried  ceph pg deep-scrub 5.4c7, but after a week it still hasn't
> scrubbed the pg (last scrub timestamp not changed)
>

Much to my surprise, it appears you can't force a scrub to start
immediately, or even to be the next one in the queue. :/ I think you can
hack around it with the correct combination of scrub settings; maybe one of
the more experienced cluster admins can tell you. (Something about turning
down the scrubs allowed to zero and the scrub intervals to very large
numbers? Then telling it to scrub the one you want and increasing the
number of allowed scrubs to 1?)

Anyway, can you submit a ticket for that feature? It would be pretty useful.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken upgrade from Hammer to Luminous

2017-11-28 Thread Gregory Farnum
I thought somebody else was going to contact you about this, but in case it
didn't happen off-list:

This appears to be an embarrassing issue on our end where we alter the disk
state despite not being able to start up all the way, and rely on our users
to read release notes carefully. ;) :/

At this point, you're going to need to manually manipulate the OSDs. It
will involve identifying exactly what the Luminous daemons did; *hopefully*
they have only set new features on disk. If that's so, you can probably use
ceph-dencoder on whatever the feature flags file is and pull out everything
added after hammer.

But I'm not sure if that's the only thing that happened. You may need to
get some consulting from somebody who has experience doing Ceph cluster
recovery.
-Greg

On Thu, Nov 16, 2017 at 7:58 PM Gianfilippo  wrote:

> Hi all,
> I made a pretty big mistake doing our upgrade from hammer to luminous,
> skipping the jewel release.
> When I realized and tried to switch back to jewel, it was too late  -
> the cluster now won't start, complaining about "The disk uses features
> unsupported by the executable.":
>
> 2017-11-17 01:27:26.190971 7fb446ab58c0  0 ceph version 0.94.10
> (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 19638
> 2017-11-17 01:27:26.209600 7fb446ab58c0  0
> filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
> 2017-11-17 01:27:26.277323 7fb446ab58c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is supported and appears to work
> 2017-11-17 01:27:26.277353 7fb446ab58c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2017-11-17 01:27:26.302508 7fb446ab58c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-11-17 01:27:26.302668 7fb446ab58c0  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
> disabled by conf
> 2017-11-17 01:27:26.325121 7fb446ab58c0  0
> filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal
> mode: checkpoint is not enabled
> 2017-11-17 01:27:26.343360 7fb446ab58c0  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 01:27:26.393876 7fb446ab58c0  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2017-11-17 01:27:26.394746 7fb446ab58c0 -1 osd.2 0 The disk uses
> features unsupported by the executable.
> 2017-11-17 01:27:26.394758 7fb446ab58c0 -1 osd.2 0  ondisk features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object
>
> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,12=transaction
> hints,13=pg meta object,14=explicit missing set,15=fastinfo pg
> attr,16=deletes in missing set}
> 2017-11-17 01:27:26.394780 7fb446ab58c0 -1 osd.2 0  daemon features
> compat={},rocompat={},incompat={1=initial feature set(~v.18),2=pginfo
> object,3=object
>
> locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded
> objects,12=transaction hints,13=pg meta object}
> 2017-11-17 01:27:26.394794 7fb446ab58c0 -1 osd.2 0 Cannot write to disk!
> Missing features: compat={},rocompat={},incompat={14=explicit missing
> set,15=fastinfo pg attr,16=deletes in missing set}
> 2017-11-17 01:27:26.419854 7fb446ab58c0  1 journal close
> /var/lib/ceph/osd/ceph-2/journal
> 2017-11-17 01:27:26.422687 7fb446ab58c0 -1 ESC[0;31m ** ERROR: osd init
> failed: (95) Operation not supportedESC[0m
> 2017-11-17 01:27:26.863514 7fcc5f1428c0  0 ceph version 0.94.10
> (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 19731
> 2017-11-17 01:27:26.878617 7fcc5f1428c0  0
> filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
> 2017-11-17 01:27:26.880689 7fcc5f1428c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is supported and appears to work
> 2017-11-17 01:27:26.880703 7fcc5f1428c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2017-11-17 01:27:26.898681 7fcc5f1428c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2017-11-17 01:27:26.898829 7fcc5f1428c0  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: extsize is
> disabled by conf
> 2017-11-17 01:27:26.906300 7fcc5f1428c0  0
> filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal
> mode: checkpoint is not enabled
> 2017-11-17 01:27:26.917013 7fcc5f1428c0  1 journal _open
> /var/lib/ceph/osd/ceph-2/journal fd 20: 5368709120 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 

Re: [ceph-users] CephFS - Mounting a second Ceph file system

2017-11-28 Thread Nigel Williams
On 29 November 2017 at 01:51, Daniel Baumann  wrote:
> On 11/28/17 15:09, Geoffrey Rhodes wrote:
>> I'd like to run more than one Ceph file system in the same cluster.

Are there opinions on how stable multiple filesystems per single Ceph
cluster are in practice? Is anyone using it actively with a stressful
load?

I see the docs still place it under Experimental:

http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 object notifications

2017-11-28 Thread ceph . novice
Hi Yehuda.
 
Are there any examples (doc's, blog posts, ...):
- how to use that "framework" and especially for the "callbacks"
- for the latest "Metasearch" feature / usage with a S3 client/tools like 
CyberDuck, s3cmd, AWSCLI or at least boto3?
  - i.e. is an external ELK still needed or is this somehow included in RGW now?
 
Thanks & regards
 

Gesendet: Dienstag, 28. November 2017 um 13:52 Uhr
Von: "Yehuda Sadeh-Weinraub" 
An: "Sean Purdy" 
Cc: "ceph-users@lists.ceph.com" 
Betreff: Re: [ceph-users] S3 object notifications
rgw has a sync modules framework that allows you to write your own
sync plugins. The system identifies object changes and triggers
callbacks that can then act on those changes. For example, the
metadata search feature that was added recently uses this to send
object metadata into elasticsearch for indexing.

Yehuda

On Tue, Nov 28, 2017 at 2:22 PM, Sean Purdy  wrote:
> Hi,
>
>
> http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object 
> notifications are not supported. I'd like something like object notifications 
> so that we can backup new objects in realtime, instead of trawling the whole 
> object list for what's changed.
>
> Is there anything similar I can use? I've found Spreadshirt's haproxy fork 
> which traps requests and updates redis - 
> https://github.com/spreadshirt/s3gw-haproxy
>  Anybody used that?
>
>
> Thanks,
>
> Sean Purdy
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph all-nvme mysql performance tuning

2017-11-28 Thread German Anders
I don't know if there are any statistics available really, but I'm running some
sysbench tests with mysql before the changes, and the idea is to run those
tests again after the 'tuning' and see if the numbers get better in any way.
I'm also gathering numbers from some collectd and statsd collectors running
on the osd nodes, so I hope to get some info out of that :)
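
For what it's worth, one possible shape for such a before/after comparison (sysbench 1.0 syntax; the host, credentials and table sizes below are made up for illustration):

  # populate the test tables once
  sysbench oltp_read_write --mysql-host=db01 --mysql-user=sbtest --mysql-password=secret \
      --mysql-db=sbtest --tables=16 --table-size=1000000 prepare
  # run the same workload before and after the tuning and compare tps / latency
  sysbench oltp_read_write --mysql-host=db01 --mysql-user=sbtest --mysql-password=secret \
      --mysql-db=sbtest --tables=16 --table-size=1000000 --threads=16 --time=300 run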


*German*

2017-11-28 16:12 GMT-03:00 Marc Roos :

>
> I was wondering if there are any statistics available that show the
> performance increase of doing such things?
>
>
>
>
>
>
> -Original Message-
> From: German Anders [mailto:gand...@despegar.com]
> Sent: dinsdag 28 november 2017 19:34
> To: Luis Periquito
> Cc: ceph-users
> Subject: Re: [ceph-users] ceph all-nvme mysql performance tuning
>
> Thanks a lot Luis, I agree with you regarding the CPUs, but
> unfortunately those were the best CPU model that we can afford :S
>
> For the NUMA part, I manage to pinned the OSDs by changing the
> /usr/lib/systemd/system/ceph-osd@.service file and adding the
> CPUAffinity list to it. But, this is for ALL the OSDs to specific nodes
> or specific CPU list. But I can't find the way to specify a list for
> only a specific number of OSDs.
>
> Also, I notice that the NVMe disks are all on the same node (since I'm
> using half of the shelf - so the other half will be pinned to the other
> node), so the lanes of the NVMe disks are all on the same CPU (in this
> case 0). Also, I find that the IB adapter that is mapped to the OSD
> network (osd replication) is pinned to CPU 1, so this will cross the QPI
> path.
>
> And for the memory, from the other email, we are already using the
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES parameter with a value of
> 134217728
>
> In this case I can pinned all the actual OSDs to CPU 0, but in the near
> future when I add more nvme disks to the OSD nodes, I'll definitely need
> to pinned the other half OSDs to CPU 1, someone already did this?
>
> Thanks a lot,
>
> Best,
>
>
>
> German
>
> 2017-11-28 6:36 GMT-03:00 Luis Periquito :
>
>
> There are a few things I don't like about your machines... If you
> want latency/IOPS (as you seemingly do) you really want the highest
> frequency CPUs, even over number of cores. These are not too bad, but
> not great either.
>
> Also you have 2x CPU meaning NUMA. Have you pinned OSDs to NUMA
> nodes? Ideally OSD is pinned to same NUMA node the NVMe device is
> connected to. Each NVMe device will be running on PCIe lanes generated
> by one of the CPUs...
>
> What versions of TCMalloc (or jemalloc) are you running? Have you
> tuned them to have a bigger cache?
>
> These are from what I've learned using filestore - I've yet to run
> full tests on bluestore - but they should still apply...
>
> On Mon, Nov 27, 2017 at 5:10 PM, German Anders
>  wrote:
>
>
> Hi Nick,
>
> yeah, we are using the same nvme disk with an additional
> partition to use as journal/wal. We double-checked the c-state and it was
> not configured to use c1, so we changed that on all the osd nodes and mon
> nodes and we're going to run some new tests and see how it goes. I'll
> get back as soon as we get those tests running.
>
> Thanks a lot,
>
> Best,
>
>
>
>
>
>
> German
>
> 2017-11-27 12:16 GMT-03:00 Nick Fisk :
>
>
> From: ceph-users
> [mailto:ceph-users-boun...@lists.ceph.com
>  ] On Behalf Of German Anders
> Sent: 27 November 2017 14:44
> To: Maged Mokhtar 
> Cc: ceph-users 
> Subject: Re: [ceph-users] ceph all-nvme mysql
> performance
> tuning
>
>
>
> Hi Maged,
>
>
>
> Thanks a lot for the response. We try with
> different
> number of threads and we're getting almost the same kind of difference
> between the storage types. Going to try with different rbd stripe size,
> object size values and see if we get more competitive numbers. Will get
> back with more tests and param changes to see if we get better :)
>
>
>
>
>
> Just to echo a couple of comments. Ceph will always
> struggle to match the performance of a traditional array for mainly 2
> reasons.
>
>
>
> 1.  You are replacing some sort of dual ported
> SAS or
> internally RDMA connected device with a network for Ceph replication
> traffic. This will instantly have a large impact on write latency
> 2.  Ceph locks at the PG level and a PG will
> most
> likely cover at least one 4MB object, so lots of small accesses to the
> same blocks (on a block device) will wait on each other and go
> effectively at a single threaded rate.

Re: [ceph-users] ceph all-nvme mysql performance tuning

2017-11-28 Thread Marc Roos
 
I was wondering if there are any statistics available that show the 
performance increase of doing such things?






-Original Message-
From: German Anders [mailto:gand...@despegar.com] 
Sent: dinsdag 28 november 2017 19:34
To: Luis Periquito
Cc: ceph-users
Subject: Re: [ceph-users] ceph all-nvme mysql performance tuning

Thanks a lot Luis, I agree with you regarding the CPUs, but 
unfortunately those were the best CPU model that we can afford :S

For the NUMA part, I manage to pinned the OSDs by changing the 
/usr/lib/systemd/system/ceph-osd@.service file and adding the 
CPUAffinity list to it. But, this is for ALL the OSDs to specific nodes 
or specific CPU list. But I can't find the way to specify a list for 
only a specific number of OSDs. 

Also, I notice that the NVMe disks are all on the same node (since I'm 
using half of the shelf - so the other half will be pinned to the other 
node), so the lanes of the NVMe disks are all on the same CPU (in this 
case 0). Also, I find that the IB adapter that is mapped to the OSD 
network (osd replication) is pinned to CPU 1, so this will cross the QPI 
path.

And for the memory, from the other email, we are already using the 
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES parameter with a value of 
134217728

In this case I can pinned all the actual OSDs to CPU 0, but in the near 
future when I add more nvme disks to the OSD nodes, I'll definitely need 
to pinned the other half OSDs to CPU 1, someone already did this?

Thanks a lot,

Best,



German

2017-11-28 6:36 GMT-03:00 Luis Periquito :


There are a few things I don't like about your machines... If you 
want latency/IOPS (as you seemingly do) you really want the highest 
frequency CPUs, even over number of cores. These are not too bad, but 
not great either.

Also you have 2x CPU meaning NUMA. Have you pinned OSDs to NUMA 
nodes? Ideally OSD is pinned to same NUMA node the NVMe device is 
connected to. Each NVMe device will be running on PCIe lanes generated 
by one of the CPUs...

What versions of TCMalloc (or jemalloc) are you running? Have you 
tuned them to have a bigger cache?

These are from what I've learned using filestore - I've yet to run 
full tests on bluestore - but they should still apply...

On Mon, Nov 27, 2017 at 5:10 PM, German Anders 
 wrote:


Hi Nick, 

yeah, we are using the same nvme disk with an additional 
partition to use as journal/wal. We double-checked the c-state and it was 
not configured to use c1, so we changed that on all the osd nodes and mon 
nodes and we're going to run some new tests and see how it goes. I'll 
get back as soon as we get those tests running.

Thanks a lot,

Best,






German

2017-11-27 12:16 GMT-03:00 Nick Fisk :


From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com 
 ] On Behalf Of German Anders
Sent: 27 November 2017 14:44
To: Maged Mokhtar 
Cc: ceph-users 
Subject: Re: [ceph-users] ceph all-nvme mysql 
performance 
tuning

 

Hi Maged,

 

Thanks a lot for the response. We try with different 
number of threads and we're getting almost the same kind of difference 
between the storage types. Going to try with different rbd stripe size, 
object size values and see if we get more competitive numbers. Will get 
back with more tests and param changes to see if we get better :)

 

 

Just to echo a couple of comments. Ceph will always 
struggle to match the performance of a traditional array for mainly 2 
reasons.

 

1.  You are replacing some sort of dual ported SAS 
or 
internally RDMA connected device with a network for Ceph replication 
traffic. This will instantly have a large impact on write latency
2.  Ceph locks at the PG level and a PG will most 
likely cover at least one 4MB object, so lots of small accesses to the 
same blocks (on a block device) will wait on each other and go 
effectively at a single threaded rate.

 

The best thing you can do to mitigate these, is to run 
the fastest journal/WAL devices you can, fastest network connections (ie 
25Gb/s) and run your CPU’s at max C and P states.

 

You stated that you are running the performance profile 
on the CPU’s. Could you also just double check that the C-states are 
being held at 

Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Vasu Kulkarni
On Tue, Nov 28, 2017 at 9:22 AM, David Turner  wrote:
> Isn't marking something as deprecated meaning that there is a better option
> that we want you to use and you should switch to it sooner than later? I
> don't understand how this is ready to be marked as such if ceph-volume can't
> be switched to for all supported use cases. If ZFS, encryption, FreeBSD, etc
> are all going to be supported under ceph-volume, then how can ceph-disk be
> deprecated before ceph-volume can support them? I can imagine many Ceph
> admins wasting time chasing an erroneous deprecated warning because it came
> out before the new solution was mature enough to replace the existing
> solution.

There is no need to worry about this deprecation. It's mostly there so that
admins can prepare
for the changes coming ahead, and it mainly concerns *new* installations,
which can plan on using ceph-volume, which provides
great flexibility compared to ceph-disk.

a) many don't use ceph-disk or ceph-volume directly, so the tool you
have right now, e.g. ceph-deploy or ceph-ansible,
will still support ceph-disk; the previous ceph-deploy release is
still available from pypi:
  https://pypi.python.org/pypi/ceph-deploy

b) the current push also gives anyone who is using ceph-deploy or
ceph-disk in scripts/chef/etc.
   time to think about moving to the newer CLI based on ceph-volume


> On Tue, Nov 28, 2017 at 9:26 AM Willem Jan Withagen  wrote:
>>
>> On 28-11-2017 13:32, Alfredo Deza wrote:
>> >
>> > I understand that this would involve a significant effort to fully
>> > port over and drop ceph-disk entirely, and I don't think that dropping
>> > ceph-disk in Mimic is set in stone (yet).
>>
>> Alfredo,
>>
>> When I expressed my concers about deprecating ceph-disk, I was led to
>> beleive that I had atleast two release cycles to come up with something
>> of a 'ceph-volume zfs '
>>
>> Reading this, there is a possibility that it will get dropped IN mimic?
>> Which means that there is less than 1 release cycle to get it working?
>>
>> Thanx,
>> --WjW
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph all-nvme mysql performance tuning

2017-11-28 Thread German Anders
Thanks a lot Luis, I agree with you regarding the CPUs, but unfortunately
those were the best CPU model that we could afford :S

For the NUMA part, I managed to pin the OSDs by changing the
/usr/lib/systemd/system/ceph-osd@.service file and adding a CPUAffinity
list to it. But that pins ALL the OSDs to specific nodes or a specific CPU
list; I can't find a way to specify a list for only a specific subset
of OSDs.
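
One way to pin only a subset of OSDs (an untested sketch; the OSD id and the core list below are placeholders) is a per-instance systemd drop-in instead of editing the template unit:

  mkdir -p /etc/systemd/system/ceph-osd@12.service.d
  printf '[Service]\nCPUAffinity=0 2 4 6 8 10 12\n' \
      > /etc/systemd/system/ceph-osd@12.service.d/cpuaffinity.conf
  systemctl daemon-reload
  systemctl restart ceph-osd@12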

Also, I noticed that the NVMe disks are all on the same node (since I'm
using half of the shelf - so the other half will be pinned to the other
node), so the lanes of the NVMe disks are all on the same CPU (in this case
0). I also found that the IB adapter that is mapped to the OSD network (osd
replication) is pinned to CPU 1, so this traffic will cross the QPI path.

And for the memory, from the other email, we are already using the
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES parameter with a value of 134217728

For now I can pin all the current OSDs to CPU 0, but in the near
future when I add more nvme disks to the OSD nodes, I'll definitely need to
pin the other half of the OSDs to CPU 1 - has someone already done this?

Thanks a lot,

Best,


*German*

2017-11-28 6:36 GMT-03:00 Luis Periquito :

> There are a few things I don't like about your machines... If you want
> latency/IOPS (as you seemingly do) you really want the highest frequency
> CPUs, even over number of cores. These are not too bad, but not great
> either.
>
> Also you have 2x CPU meaning NUMA. Have you pinned OSDs to NUMA nodes?
> Ideally OSD is pinned to same NUMA node the NVMe device is connected to.
> Each NVMe device will be running on PCIe lanes generated by one of the
> CPUs...
>
> What versions of TCMalloc (or jemalloc) are you running? Have you tuned
> them to have a bigger cache?
>
> These are from what I've learned using filestore - I've yet to run full
> tests on bluestore - but they should still apply...
>
> On Mon, Nov 27, 2017 at 5:10 PM, German Anders 
> wrote:
>
>> Hi Nick,
>>
>> yeah, we are using the same nvme disk with an additional partition to use
>> as journal/wal. We double-checked the c-state and it was not configured to use
>> c1, so we changed that on all the osd nodes and mon nodes and we're going to
>> run some new tests and see how it goes. I'll get back as soon as we get
>> those tests running.
>>
>> Thanks a lot,
>>
>> Best,
>>
>>
>> *German*
>>
>> 2017-11-27 12:16 GMT-03:00 Nick Fisk :
>>
>>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
>>> Behalf Of *German Anders
>>> *Sent:* 27 November 2017 14:44
>>> *To:* Maged Mokhtar 
>>> *Cc:* ceph-users 
>>> *Subject:* Re: [ceph-users] ceph all-nvme mysql performance tuning
>>>
>>>
>>>
>>> Hi Maged,
>>>
>>>
>>>
>>> Thanks a lot for the response. We try with different number of threads
>>> and we're getting almost the same kind of difference between the storage
>>> types. Going to try with different rbd stripe size, object size values and
>>> see if we get more competitive numbers. Will get back with more tests and
>>> param changes to see if we get better :)
>>>
>>>
>>>
>>>
>>>
>>> Just to echo a couple of comments. Ceph will always struggle to match
>>> the performance of a traditional array for mainly 2 reasons.
>>>
>>>
>>>
>>>1. You are replacing some sort of dual ported SAS or internally RDMA
>>>connected device with a network for Ceph replication traffic. This will
>>>instantly have a large impact on write latency
>>>2. Ceph locks at the PG level and a PG will most likely cover at
>>>least one 4MB object, so lots of small accesses to the same blocks (on a
>>>block device) will wait on each other and go effectively at a single
>>>threaded rate.
>>>
>>>
>>>
>>> The best thing you can do to mitigate these, is to run the fastest
>>> journal/WAL devices you can, fastest network connections (ie 25Gb/s) and
>>> run your CPU’s at max C and P states.
>>>
>>>
>>>
>>> You stated that you are running the performance profile on the CPU’s.
>>> Could you also just double check that the C-states are being held at C1(e)?
>>> There are a few utilities that can show this in realtime.
>>>
>>>
>>>
>>> Other than that, although there could be some minor tweaks, you are
>>> probably nearing the limit of what you can hope to achieve.
>>>
>>>
>>>
>>> Nick
>>>
>>>
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Best,
>>>
>>>
>>> *German*
>>>
>>>
>>>
>>> 2017-11-27 11:36 GMT-03:00 Maged Mokhtar :
>>>
>>> On 2017-11-27 15:02, German Anders wrote:
>>>
>>> Hi All,
>>>
>>>
>>>
>>> I've a performance question, we recently install a brand new Ceph
>>> cluster with all-nvme disks, using ceph version 12.2.0 with bluestore
>>> configured. The back-end of the cluster is using a bond IPoIB
>>> (active/passive) , and for the front-end we are using a bonding config with
>>> active/active (20GbE) to communicate with the clients.

[ceph-users] Cache tier or RocksDB

2017-11-28 Thread Jorge Pinilla López
Hey!
I have a BlueStore setup with 8 HDDs and one 400GB NVMe per host.
How would I get more performance: by splitting the NVMe into equal RocksDB
partitions (50GB each), by using it all as a cache-tier OSD in front of the
HDDs, or by doing a mix with less DB space, like 15-25GB partitions, and the
rest as a cache-tier OSD?
Has anyone tested it?
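
(For reference, the RocksDB-partition variant would look roughly like this per OSD - device names and sizes are placeholders, ceph-volume syntax as of Luminous - with one block.db partition carved out of the NVMe for each data HDD:

  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

With 8 HDDs that means 8 such DB partitions; the cache-tier alternative would instead use the NVMe as its own OSD(s) in a cache pool in front of the HDD pool.)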

*Jorge Pinilla López*
jorp...@unizar.es
Estudiante de ingenieria informática
Becario del area de sistemas (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs Hadoop Plugin and CEPH integration

2017-11-28 Thread Aristeu Gil Alves Jr
Greg and Donny,

Thanks for the answers. It helped a lot!

I just watched the swifta presentation and it looks quite good!

Due to the lack of updates/development, and the fact that we can also
choose spark, I think maybe swift/swifta with ceph is a good strategy too.
I need to study it more, though.

Can I get the same results (performance and integrated data-layout APIs)
with it?

Are there any migration cases/tutorials for a cephfs to swift-with-ceph
scenario that you could suggest?

Best regards,
--
Aristeu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread David Turner
Doesn't marking something as deprecated mean that there is a better option
that we want you to use and that you should switch to it sooner rather than
later? I don't understand how this is ready to be marked as such if ceph-volume
can't be switched to for all supported use cases. If ZFS, encryption,
FreeBSD, etc. are all going to be supported under ceph-volume, then how can
ceph-disk be deprecated before ceph-volume can support them? I can imagine
many Ceph admins wasting time chasing an erroneous deprecation warning
because it came out before the new solution was mature enough to replace
the existing solution.
On Tue, Nov 28, 2017 at 9:26 AM Willem Jan Withagen  wrote:

> On 28-11-2017 13:32, Alfredo Deza wrote:
> >
> > I understand that this would involve a significant effort to fully
> > port over and drop ceph-disk entirely, and I don't think that dropping
> > ceph-disk in Mimic is set in stone (yet).
>
> Alfredo,
>
> When I expressed my concers about deprecating ceph-disk, I was led to
> beleive that I had atleast two release cycles to come up with something
> of a 'ceph-volume zfs '
>
> Reading this, there is a possibility that it will get dropped IN mimic?
> Which means that there is less than 1 release cycle to get it working?
>
> Thanx,
> --WjW
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] S3 object notifications

2017-11-28 Thread Sean Purdy
On Tue, 28 Nov 2017, Yehuda Sadeh-Weinraub said:
> rgw has a sync modules framework that allows you to write your own
> sync plugins. The system identifies object changes and triggers

I am not a C++ developer though.

http://ceph.com/rgw/new-luminous-rgw-metadata-search/ says

"Stay tuned in future releases for sync plugins that replicate data to (or even 
from) cloud storage services like S3!"

But then it looks like you wrote that blog post!  I guess I'll stay tuned


Sean


> callbacks that can then act on those changes. For example, the
> metadata search feature that was added recently uses this to send
> object metadata into elasticsearch for indexing.
> 
> Yehuda
> 
> On Tue, Nov 28, 2017 at 2:22 PM, Sean Purdy  wrote:
> > Hi,
> >
> >
> > http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object 
> > notifications are not supported.  I'd like something like object 
> > notifications so that we can backup new objects in realtime, instead of 
> > trawling the whole object list for what's changed.
> >
> > Is there anything similar I can use?  I've found Spreadshirt's haproxy fork 
> > which traps requests and updates redis - 
> > https://github.com/spreadshirt/s3gw-haproxy  Anybody used that?
> >
> >
> > Thanks,
> >
> > Sean Purdy
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor crash issue

2017-11-28 Thread Joao Eduardo Luis

Hi Zhongyan,

On 11/28/2017 02:25 PM, Zhongyan Gu wrote:

Hi There,
We hit a monitor crash bug in our production clusters during adding more 
nodes into one of  clusters.


Thanks for reporting this. Can you please share the log resulting from 
the crash?


I'll be looking into this.

  -Joao

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Transparent huge pages

2017-11-28 Thread Nigel Williams
Given that memory is a key resource for Ceph, this advice about switching
the Transparent Huge Pages kernel setting to madvise would be worth testing, to
see whether THP is helping or hindering.

Article:
https://blog.nelhage.com/post/transparent-hugepages/

Discussion:
https://news.ycombinator.com/item?id=15795337


echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - Mounting a second Ceph file system

2017-11-28 Thread Daniel Baumann
On 11/28/17 15:09, Geoffrey Rhodes wrote:
> I'd like to run more than one Ceph file system in the same cluster.
> Can anybody point me in the right direction to explain how to mount the
> second file system?

if you use the kernel client, you can use the mds_namespace option, i.e.:

  mount -t ceph $monitor_address:/ -o mds_namespace=$fsname \
  /mnt/$your_mountpoint
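
For the fuse client the equivalent option is --client_mds_namespace, e.g. (a hedged example, same placeholders):

  ceph-fuse --client_mds_namespace=$fsname /mnt/$your_mountpoint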

Regards,
Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - Mounting a second Ceph file system

2017-11-28 Thread Geoffrey Rhodes
Thanks John for the assistance.

Geoff

On 28 November 2017 at 16:30, John Spray  wrote:

> On Tue, Nov 28, 2017 at 2:09 PM, Geoffrey Rhodes 
> wrote:
> > Good day,
> >
> > I'd like to run more than one Ceph file system in the same cluster.
> > Can anybody point me in the right direction to explain how to mount the
> > second file system?
>
> With the kernel mount you can use "-o mds_namespace=" to specify which
> filesystem you want, and with the fuse client you have a
> --client_mds_namespace option.
>
> Cheers,
> John
>
> >
> > Thanks
> >
> > OS: Ubuntu 16.04.3 LTS
> > Ceph version: 12.2.1 - Luminous
> >
> >
> > Kind regards
> > Geoffrey Rhodes
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Willem Jan Withagen

On 28-11-2017 13:32, Alfredo Deza wrote:


I understand that this would involve a significant effort to fully
port over and drop ceph-disk entirely, and I don't think that dropping
ceph-disk in Mimic is set in stone (yet).


Alfredo,

When I expressed my concerns about deprecating ceph-disk, I was led to 
believe that I had at least two release cycles to come up with something 
of a 'ceph-volume zfs '


Reading this, there is a possibility that it will get dropped IN mimic?
Which means that there is less than 1 release cycle to get it working?

Thanx,
--WjW


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] monitor crash issue

2017-11-28 Thread Zhongyan Gu
Hi There,
We hit a monitor crash bug in our production clusters while adding more
nodes to one of the clusters.
The stack trace looks like below:
lc 25431444 0> 2017-11-23 15:41:16.688046 7f93883f2700 -1 error_msg
mon/OSDMonitor.cc: In function 'MOSDMap*
OSDMonitor::build_incremental(epoch_t, epoch_t)' thread 7f93883f2700 time
2017-11-23 15:41:16.683525
mon/OSDMonitor.cc: 2123: FAILED assert(0)

ceph version .94.5.9 (e92a4716ae7404566753964959ddd84411b5dd18)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x7b4735]
2: (OSDMonitor::build_incremental(unsigned int, unsigned int)+0x9ab)
[0x5e2e5b]
3: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool)+0xb1)
[0x5e85b1]
4: (OSDMonitor::check_sub(Subscription*)+0x217) [0x5e8c17]
5: (Monitor::handle_subscribe(MMonSubscribe*)+0x440) [0x571810]
6: (Monitor::dispatch(MonSession*, Message*, bool)+0x3eb) [0x592d5b]
7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x593716]
8: (Monitor::ms_dispatch(Message*)+0x23) [0x5b2ac3]
9: (DispatchQueue::entry()+0x62a) [0x8a44aa]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x79c97d]
11: (()+0x7dc5) [0x7f93ad51ddc5]
12: (clone()+0x6d) [0x7f93ac00176d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to
interpret this.

the exact assert failure is:
MOSDMap *OSDMonitor::build_incremental(epoch_t from, epoch_t to)
{
  dout(10) << "build_incremental [" << from << ".." << to << "]" << dendl;
  MOSDMap *m = new MOSDMap(mon->monmap->fsid);
  m->oldest_map = get_first_committed();
  m->newest_map = osdmap.get_epoch();

  for (epoch_t e = to; e >= from && e > 0; e--) {
  bufferlist bl;
  int err = get_version(e, bl);
  if (err == 0) {
  assert(bl.length());
  // if (get_version(e, bl) > 0) {
  dout(20) << "build_incremental inc " << e << " "
<< bl.length() << " bytes" << dendl;
  m->incremental_maps[e] = bl;
  } else {
  assert(err == -ENOENT);
  assert(!bl.length());
  get_version_full(e, bl);
  if (bl.length() > 0) {
  //else if (get_version("full", e, bl) > 0) {
  dout(20) << "build_incremental full " << e << " "
<< bl.length() << " bytes" << dendl;
  m->maps[e] = bl;
  } else {
assert(0); // we should have all maps.   <===assert failed
  }
  }
  }
  return m;
}

We checked the code and found there could be a race condition between
the mondbstore read operation and the osdmap trim operation. The panic scenario
looks like this: the mondbstore is trimming osdmaps and, concurrently, a newly
added osd requests an osdmap, which invokes OSDMonitor::build_incremental(). If
the requested map has been trimmed, get_version_full cannot get the osdmaps from
the mondbstore, and the assert failure is triggered. Though we ran into this
issue with hammer, we checked the latest master branch and believe the race
condition is still there. Can anyone confirm this?

BTW, we think this is a dup of http://tracker.ceph.com/issues/11332 and
updated the comments but no response by now.

zhongyan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH rule seems to work fine not for all PGs in erasure coded pools

2017-11-28 Thread Jakub Jaszewski
Hi David, thanks for quick feedback.

Then why some PGs were remapped and some were not?

# IT LOOKS LIKE 338 PGs IN ERASURE CODED POOLS HAVE BEEN REMAPPED
# I DON'T GET WHY 540 PGs ARE STILL IN THE active+undersized+degraded STATE

root@host01:~# ceph pg dump pgs_brief | grep 'active+remapped'
dumped pgs_brief in format plain
16.6f active+remapped [43,2147483647,2,31,12] 43 [43,33,2,31,12] 43
16.6e active+remapped [10,5,35,44,2147483647] 10 [10,5,35,44,41] 10
root@host01:~# egrep '16.6f|16.6e' PGs_on_HOST_host05
16.6f active+clean [43,33,2,59,12] 43 [43,33,2,59,12] 43
16.6e active+clean [10,5,49,35,41] 10 [10,5,49,35,41] 10
root@host01:~#

Take PG 16.6f: prior to the ceph services stop it was on [43,33,2,59,12],
then it was remapped to [43,33,2,31,12], so OSD@31 and OSD@33 are on the
same HOST.

But, for example, PG 16.ee got to the active+undersized+degraded state;
prior to the services stop it was on

pg_stat state up up_primary acting acting_primary
16.ee active+clean [5,22,33,55,45] 5 [5,22,33,55,45] 5

after the stop of services on the host it was not remapped

16.ee   active+undersized+degraded  [5,22,33,2147483647,45]  5   [5,22,33,2147483647,45]  5
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Joao Eduardo Luis

On 11/28/2017 12:52 PM, Alfredo Deza wrote:

On Tue, Nov 28, 2017 at 7:38 AM, Joao Eduardo Luis  wrote:

On 11/28/2017 11:54 AM, Alfredo Deza wrote:


On Tue, Nov 28, 2017 at 3:12 AM, Wido den Hollander  wrote:




Op 27 november 2017 om 14:36 schreef Alfredo Deza :


For the upcoming Luminous release (12.2.2), ceph-disk will be
officially in 'deprecated' mode (bug fixes only). A large banner with
deprecation information has been added, which will try to raise
awareness.



As much as I like ceph-volume and the work being done, is it really a
good idea to use a minor release to deprecate a tool?

Can't we just introduce ceph-volume and deprecate ceph-disk at the
release of M? Because when you upgrade to 12.2.2 suddenly existing
integrations will have deprecation warnings being thrown at them while they
haven't upgraded to a new major version.



ceph-volume has been present since the very first release of Luminous,
the deprecation warning in ceph-disk is the only "new" thing
introduced for 12.2.2.



I think Wido's question still stands: why can't ceph-disk be deprecated
solely in M, and removed by N?


Like I mentioned, I don't think this is set in stone (yet), but it was
the idea from the beginning (See Oct 9th thread "killing ceph-disk"),
and I don't think it would
be terribly bad to keep ceph-disk in Mimic, but fully frozen, with no
updates or bug fixes. And full removal in N

The deprecation warnings need to stay for Luminous though.


I can live with this, granted Luminous still sees bug fixes despite the 
deprecation warning - but I'm guessing that's what you meant by only 
fully freezing in Mimic :).


Thanks.

  -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS - Mounting a second Ceph file system

2017-11-28 Thread Geoffrey Rhodes
Good day,

I'd like to run more than one Ceph file system in the same cluster.
Can anybody point me in the right direction to explain how to mount the
second file system?

Thanks

OS: Ubuntu 16.04.3 LTS
Ceph version: 12.2.1 - Luminous


Kind regards
Geoffrey Rhodes
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH rule seems to work fine not for all PGs in erasure coded pools

2017-11-28 Thread David Turner
Your EC profile requires 5 servers to be healthy.  When you remove 1 OSD
from the cluster, it recovers by moving the chunks on that OSD to
other OSDs in the same host.  However, when you remove an entire host, it
cannot place all 5 chunks of each object on the 4 remaining servers with your
crush rules.  The EC profile you're using does not work with this type of
testing given your hardware configuration.
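
A quick way to see it (the profile name below is a placeholder): the profile's k+m is the number of chunks that have to be placed, and with a host failure domain each chunk needs its own host.

  ceph osd erasure-code-profile ls
  ceph osd erasure-code-profile get myprofile
  # k=3, m=2 and a failure domain of host (ruleset-failure-domain on Jewel)
  # means 5 chunks on 5 different hosts, so all 5 hosts must be up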

On Tue, Nov 28, 2017 at 8:43 AM Jakub Jaszewski 
wrote:

> Hi, I'm trying to understand erasure coded pools and why CRUSH rules seem
> to work for only part of PGs in EC pools.
>
> Basically what I'm trying to do is to check erasure coded pool recovering
> behaviour after the single OSD or single HOST failure.
> I noticed that in case of HOST failure only part of PGs get recovered to
> active+remapped when other PGs remain in active+undersized+degraded state.
> Why??
> EC pool profile I use is k=3 , m=2.
>
> Also I'm not really sure what is the meaning of all steps of below crush
> rule (perhaps it is the root cause).
> rule ecpool_3_2 {
> ruleset 1
> type erasure
> min_size 3
> max_size 5
> step set_chooseleaf_tries 5 # should I maybe try to increase this number
> of retry ?? Can I apply the changes to existing EC crush rule and pool or
> need to create a new one ?
> step set_choose_tries 100
> step take default
> step chooseleaf indep 0 type host # Does it allow to choose more than one
> OSD from single HOST but first trying to get only one OSD per HOST if there
> are enough HOSTs in the cluster?
> step emit
> }
>
> ceph version 10.2.9 (jewel)
>
> # INITIAL CLUSTER STATE
> root@host01:~# ceph osd tree
> ID  WEIGHTTYPE NAMEUP/DOWN REWEIGHT
> PRIMARY-AFFINITY
>  -1 218.18401 root default
>
>  -6 218.18401 region MyRegion
>
>  -5 218.18401 datacenter MyDC
>
>  -4 218.18401 room MyRoom
>
>  -3  43.63699 rack Rack01
>
>  -2  43.63699 host host01
>
>   0   3.63599 osd.0 up  1.0
>  1.0
>   3   3.63599 osd.3 up  1.0
>  1.0
>   4   3.63599 osd.4 up  1.0
>  1.0
>   6   3.63599 osd.6 up  1.0
>  1.0
>   8   3.63599 osd.8 up  1.0
>  1.0
>  10   3.63599 osd.10up  1.0
>  1.0
>  12   3.63599 osd.12up  1.0
>  1.0
>  14   3.63599 osd.14up  1.0
>  1.0
>  16   3.63599 osd.16up  1.0
>  1.0
>  19   3.63599 osd.19up  1.0
>  1.0
>  22   3.63599 osd.22up  1.0
>  1.0
>  25   3.63599 osd.25up  1.0
>  1.0
>  -8  43.63699 rack Rack02
>
>  -7  43.63699 host host02
>
>   1   3.63599 osd.1 up  1.0
>  1.0
>   2   3.63599 osd.2 up  1.0
>  1.0
>   5   3.63599 osd.5 up  1.0
>  1.0
>   7   3.63599 osd.7 up  1.0
>  1.0
>   9   3.63599 osd.9 up  1.0
>  1.0
>  11   3.63599 osd.11up  1.0
>  1.0
>  13   3.63599 osd.13up  1.0
>  1.0
>  15   3.63599 osd.15up  1.0
>  1.0
>  17   3.63599 osd.17up  1.0
>  1.0
>  20   3.63599 osd.20up  1.0
>  1.0
>  23   3.63599 osd.23up  1.0
>  1.0
>  26   3.63599 osd.26up  1.0
>  1.0
> -10 130.91000 rack Rack03
>
>  -9  43.63699 host host03
>
>  18   3.63599 osd.18up  1.0
>  1.0
>  21   3.63599 osd.21up  1.0
>  1.0
>  24   3.63599 osd.24up  1.0
>  1.0
>  27   3.63599 osd.27up  1.0
>  1.0
>  28   3.63599 osd.28up  1.0
>  1.0
>  29   3.63599 osd.29up  1.0
>  1.0
>  30   3.63599 osd.30up  1.0
>  1.0
>  31   3.63599 osd.31up  1.0
>  1.0
>  32   3.63599 osd.32up  1.0
>  1.0
>  33   3.63599 osd.33up  1.0
>  1.0
>  34   3.63599 osd.34up  1.0
>  1.0
>  35   3.63599 osd.35up  1.0
>  1.0
> -11  43.63699 

Re: [ceph-users] "failed to open ino"

2017-11-28 Thread Jens-U. Mozdzen

Hi David,

Zitat von David C :

On 27 Nov 2017 1:06 p.m., "Jens-U. Mozdzen"  wrote:

Hi David,

Zitat von David C :

Hi Jens


We also see these messages quite frequently, mainly the "replicating
dir...". Only seen "failed to open ino" a few times so didn't do any real
investigation. Our set up is very similar to yours, 12.2.1, active/standby
MDS and exporting cephfs through KNFS (hoping to replace with Ganesha
soon).



been there, done that - using Ganesha more than doubled the run-time of our
jobs, while with knfsd, the run-time is about the same for CephFS-based and
"local disk"-based files. But YMMV, so if you see speeds with Ganesha that
are similar to knfsd, please report back with details...


I'd be interested to know if you tested Ganesha over a cephfs kernel mount
(ie using the VFS fsal) or if you used the Ceph fsal. Also the server and
client versions you tested.


I had tested Ganesha only via the Ceph FSAL. Our Ceph nodes (including  
the one used as a Ganesha server) are running  
ceph-12.2.1+git.1507910930.aea79b8b7a on OpenSUSE 42.3, SUSE's kernel  
4.4.76-1-default (which has a number of back-ports in it), Ganesha is  
at version nfs-ganesha-2.5.2.0+git.1504275777.a9d23b98f.


The NFS clients are a broad mix of current and older systems.


Prior to Luminous, Ganesha writes were terrible due to a bug with fsync
calls in the mds code. The fix went into the mds and client code. If you're
doing Ganesha over the top of the kernel mount you'll need a pretty recent
kernel to see the write improvements.


As we were testing the Ceph FSAL, this should not be the cause.


From my limited Ganesha testing so far, reads are better when exporting the
kernel mount, writes are much better with the Ceph fsal. But that's
expected for me as I'm using the CentOS kernel. I was hoping the
aforementioned fix would make it into the rhel 7.4 kernel but doesn't look
like it has.


When exporting the kernel-mounted CephFS via kernel nfsd, we see  
similar speeds to serving the same set of files from a local bcache'd  
RAID1 array on SAS disks. This is for a mix of reads and writes,  
mostly small files (compile jobs, some packaging).



From what I can see, it would have to be A/A/P, since MDS demands at least
one stand-by.


That's news to me.


From http://docs.ceph.com/docs/master/cephfs/multimds/ :

"Each CephFS filesystem has a max_mds setting, which controls how many  
ranks will be created. The actual number of ranks in the filesystem  
will only be increased if a spare daemon is available to take on the  
new rank. For example, if there is only one MDS daemon running, and  
max_mds is set to two, no second rank will be created."


Might well be I was mis-reading this... I had first read it to mean  
that a spare daemon needs to be available *while running* A/A, but the  
example sounds like the spare is required when *switching to* A/A.
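
For reference, the switch itself on Luminous should just be a one-liner (a hedged example; the filesystem name is a placeholder, a running standby is needed for the extra rank to actually be created, and older filesystems may additionally need the allow_multimds flag set):

  ceph fs set $fsname max_mds 2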



Is it possible you still had standby config in your ceph.conf?


Not sure what you're asking for, is this related to active/active or  
to our Ganesha tests? We have not yet tried to switch to A/A, so our  
config actually contains standby parameters.


Regards,
Jens

--
Jens-U. Mozdzen voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15   mobile  : +49-179-4 98 21 98
D-22423 Hamburg e-mail  : jmozd...@nde.ag

Vorsitzende des Aufsichtsrates: Angelika Torlée-Mozdzen
  Sitz und Registergericht: Hamburg, HRB 90934
  Vorstand: Jens-U. Mozdzen
   USt-IdNr. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CRUSH rule seems to work fine not for all PGs in erasure coded pools

2017-11-28 Thread Jakub Jaszewski
Hi, I'm trying to understand erasure coded pools and why CRUSH rules seem
to work for only part of PGs in EC pools.

Basically what I'm trying to do is check erasure coded pool recovery
behaviour after a single OSD or single HOST failure.
I noticed that in case of a HOST failure only part of the PGs get recovered to
active+remapped, while other PGs remain in the active+undersized+degraded state.
Why??
EC pool profile I use is k=3 , m=2.

Also I'm not really sure what is the meaning of all steps of below crush
rule (perhaps it is the root cause).
rule ecpool_3_2 {
ruleset 1
type erasure
min_size 3
max_size 5
step set_chooseleaf_tries 5 # should I maybe try to increase this number of
retry ?? Can I apply the changes to existing EC crush rule and pool or need
to create a new one ?
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host # Does it allow to choose more than one
OSD from single HOST but first trying to get only one OSD per HOST if there
are enough HOSTs in the cluster?
step emit
}

ceph version 10.2.9 (jewel)

# INITIAL CLUSTER STATE
root@host01:~# ceph osd tree
ID  WEIGHTTYPE NAMEUP/DOWN REWEIGHT
PRIMARY-AFFINITY
 -1 218.18401 root default

 -6 218.18401 region MyRegion

 -5 218.18401 datacenter MyDC

 -4 218.18401 room MyRoom

 -3  43.63699 rack Rack01

 -2  43.63699 host host01

  0   3.63599 osd.0 up  1.0
 1.0
  3   3.63599 osd.3 up  1.0
 1.0
  4   3.63599 osd.4 up  1.0
 1.0
  6   3.63599 osd.6 up  1.0
 1.0
  8   3.63599 osd.8 up  1.0
 1.0
 10   3.63599 osd.10up  1.0
 1.0
 12   3.63599 osd.12up  1.0
 1.0
 14   3.63599 osd.14up  1.0
 1.0
 16   3.63599 osd.16up  1.0
 1.0
 19   3.63599 osd.19up  1.0
 1.0
 22   3.63599 osd.22up  1.0
 1.0
 25   3.63599 osd.25up  1.0
 1.0
 -8  43.63699 rack Rack02

 -7  43.63699 host host02

  1   3.63599 osd.1 up  1.0
 1.0
  2   3.63599 osd.2 up  1.0
 1.0
  5   3.63599 osd.5 up  1.0
 1.0
  7   3.63599 osd.7 up  1.0
 1.0
  9   3.63599 osd.9 up  1.0
 1.0
 11   3.63599 osd.11up  1.0
 1.0
 13   3.63599 osd.13up  1.0
 1.0
 15   3.63599 osd.15up  1.0
 1.0
 17   3.63599 osd.17up  1.0
 1.0
 20   3.63599 osd.20up  1.0
 1.0
 23   3.63599 osd.23up  1.0
 1.0
 26   3.63599 osd.26up  1.0
 1.0
-10 130.91000 rack Rack03

 -9  43.63699 host host03

 18   3.63599 osd.18up  1.0
 1.0
 21   3.63599 osd.21up  1.0
 1.0
 24   3.63599 osd.24up  1.0
 1.0
 27   3.63599 osd.27up  1.0
 1.0
 28   3.63599 osd.28up  1.0
 1.0
 29   3.63599 osd.29up  1.0
 1.0
 30   3.63599 osd.30up  1.0
 1.0
 31   3.63599 osd.31up  1.0
 1.0
 32   3.63599 osd.32up  1.0
 1.0
 33   3.63599 osd.33up  1.0
 1.0
 34   3.63599 osd.34up  1.0
 1.0
 35   3.63599 osd.35up  1.0
 1.0
-11  43.63699 host host04

 36   3.63599 osd.36up  1.0
 1.0
 37   3.63599 osd.37up  1.0
 1.0
 38   3.63599 osd.38up  1.0
 1.0
 39   3.63599 osd.39up  1.0
 1.0
 40   3.63599 osd.40up  1.0
 1.0
 41   3.63599 osd.41up  1.0
 1.0
 42   3.63599 osd.42up  1.0
 1.0
 43   3.63599 osd.43up  1.0
 1.0
 44   3.63599 osd.44up  1.0
 1.0
 45   3.63599 osd.45up  1.0
 

Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Andreas Calminder
Thanks!
I'll start looking into rebuilding my roles once 12.2.2 is out then.

On 28 November 2017 at 13:37, Alfredo Deza  wrote:
> On Tue, Nov 28, 2017 at 7:22 AM, Andreas Calminder
>  wrote:
>>> For the `simple` sub-command there is no prepare/activate, it is just
>>> a way of taking over management of an already deployed OSD. For *new*
>>> OSDs, yes, we are implying that we are going only with Logical Volumes
>>> for data devices. It is a bit more flexible for Journals, block.db,
>>> and block.wal as those
>>> can be either logical volumes or GPT partitions (ceph-volume will not
>>> create these for you).
>>
>> Ok, so if I understand this correctly, for future one-device-per-osd
>> setups I would create a volume group per device before handing it over
>> to ceph-volume, to get the "same" functionality as ceph-disk. I
>> understand the flexibility aspect of this, my setup will have an extra
>> step setting up lvm for my osd devices which is fine.
>
> If you don't require any special configuration for your logical volume
> and don't mind a naive LV handling, then ceph-volume can create
> the logical volume for you from either a partition or a device (for
> data), although it will still require a GPT partition for Journals,
> block.wal, and block.db
>
> For example:
>
> ceph-volume lvm create --data /path/to/device
>
> Would create a new volume group with the device and then produce a
> single LV from it.
>
>> Apologies if I
>> missed the information, but is it possible to get command output as
>> json, something like "ceph-disk list --format json" since it's quite
>> helpful while setting up stuff through ansible
>
> Yes, this is implemented in both "pretty" and JSON formats:
> http://docs.ceph.com/docs/master/ceph-volume/lvm/list/#ceph-volume-lvm-list
>>
>> Thanks,
>> Andreas
>>
>> On 28 November 2017 at 12:47, Alfredo Deza  wrote:
>>> On Tue, Nov 28, 2017 at 1:56 AM, Andreas Calminder
>>>  wrote:
 Hello,
 Thanks for the heads-up. As someone who's currently maintaining a
 Jewel cluster and are in the process of setting up a shiny new
 Luminous cluster and writing Ansible roles in the process to make
 setup reproducible. I immediately proceeded to look into ceph-volume
 and I've some questions/concerns, mainly due to my own setup, which is
 one osd per device, simple.

 Running ceph-volume in Luminous 12.2.1 suggests there's only the lvm
 subcommand available and the man-page only covers lvm. The online
 documentation http://docs.ceph.com/docs/master/ceph-volume/ lists
 simple however it's lacking some of the ceph-disk commands, like
 'prepare' which seems crucial in the 'simple' scenario. Does the
 ceph-disk deprecation imply that lvm is mandatory for using devices
 with ceph or is just the documentation and tool features lagging
 behind, I.E the 'simple' parts will be added well in time for Mimic
 and during the Luminous lifecycle? Or am I missing something?
>>>
>>> In your case, all your existing OSDs will be able to be managed by
>>> `ceph-volume` once scanned and the information persisted. So anything
>>> from Jewel should still work. For 12.2.1 you are right, that command
>>> is not yet available, it will be present in 12.2.2
>>>
>>> For the `simple` sub-command there is no prepare/activate, it is just
>>> a way of taking over management of an already deployed OSD. For *new*
>>> OSDs, yes, we are implying that we are going only with Logical Volumes
>>> for data devices. It is a bit more flexible for Journals, block.db,
>>> and block.wal as those
>>> can be either logical volumes or GPT partitions (ceph-volume will not
>>> create these for you).
>>>

 Best regards,
 Andreas

 On 27 November 2017 at 14:36, Alfredo Deza  wrote:
> For the upcoming Luminous release (12.2.2), ceph-disk will be
> officially in 'deprecated' mode (bug fixes only). A large banner with
> deprecation information has been added, which will try to raise
> awareness.
>
> We are strongly suggesting using ceph-volume for new (and old) OSD
> deployments. The only current exceptions to this are encrypted OSDs
> and FreeBSD systems
>
> Encryption support is planned and will be coming soon to ceph-volume.
>
> A few items to consider:
>
> * ceph-disk is expected to be fully removed by the Mimic release
> * Existing OSDs are supported by ceph-volume. They can be "taken over" [0]
> * ceph-ansible already fully supports ceph-volume and will soon default 
> to it
> * ceph-deploy support is planned and should be fully implemented soon
>
>
> [0] http://docs.ceph.com/docs/master/ceph-volume/simple/
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More 

Re: [ceph-users] S3 object notifications

2017-11-28 Thread Yehuda Sadeh-Weinraub
rgw has a sync modules framework that allows you to write your own
sync plugins. The system identifies object changes and triggers
callbacks that can then act on those changes. For example, the
metadata search feature that was added recently uses this to send
object metadata into elasticsearch for indexing.
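
As a rough sketch (the zone name, endpoints and tier-config values below are
only placeholders), plugging a sync module in boils down to creating a zone in
the zonegroup with the module as its tier type and committing the period, e.g.
for elasticsearch:

radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=es-meta \
    --endpoints=http://rgw-es:80 --tier-type=elasticsearch
radosgw-admin zone modify --rgw-zone=es-meta \
    --tier-config=endpoint=http://elastic-host:9200,num_shards=10
radosgw-admin period update --commit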

Yehuda

On Tue, Nov 28, 2017 at 2:22 PM, Sean Purdy  wrote:
> Hi,
>
>
> http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object 
> notifications are not supported.  I'd like something like object 
> notifications so that we can backup new objects in realtime, instead of 
> trawling the whole object list for what's changed.
>
> Is there anything similar I can use?  I've found Spreadshirt's haproxy fork 
> which traps requests and updates redis - 
> https://github.com/spreadshirt/s3gw-haproxy  Anybody used that?
>
>
> Thanks,
>
> Sean Purdy
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Alfredo Deza
On Tue, Nov 28, 2017 at 7:38 AM, Joao Eduardo Luis  wrote:
> On 11/28/2017 11:54 AM, Alfredo Deza wrote:
>>
>> On Tue, Nov 28, 2017 at 3:12 AM, Wido den Hollander  wrote:
>>>
>>>
 Op 27 november 2017 om 14:36 schreef Alfredo Deza :


 For the upcoming Luminous release (12.2.2), ceph-disk will be
 officially in 'deprecated' mode (bug fixes only). A large banner with
 deprecation information has been added, which will try to raise
 awareness.

>>>
>>> As much as I like ceph-volume and the work being done, is it really a
>>> good idea to use a minor release to deprecate a tool?
>>>
>>> Can't we just introduce ceph-volume and deprecate ceph-disk at the
>>> release of M? Because when you upgrade to 12.2.2 suddenly existing
>>> integrations will have deprecation warnings being thrown at them while they
>>> haven't upgraded to a new major version.
>>
>>
>> ceph-volume has been present since the very first release of Luminous,
>> the deprecation warning in ceph-disk is the only "new" thing
>> introduced for 12.2.2.
>
>
> I think Wido's question still stands: why can't ceph-disk be deprecated
> solely in M, and removed by N?

Like I mentioned, I don't think this is set in stone (yet), but it was
the idea from the beginning (see the Oct 9th thread "killing ceph-disk").
I don't think it would be terribly bad to keep ceph-disk in Mimic, but
fully frozen, with no updates or bug fixes, and then full removal in N.

The deprecation warnings need to stay for Luminous though.

>
> I get that it probably seems nuts to support ceph-disk and ceph-volume; and
> by deprecating and removing in (less than) a full release cycle will force
> people to actually move from one to the other. But we're also doing it when
> roughly 4 months away from Mimic being frozen.
>
> This is the sort of last minute overall, core, changes that are not expected
> from a project that should be as mature as Ceph. This is not some internal
> feature that users won't notice - we're effectively changing the way users
> deploy and orchestrate their clusters.
>
>
>   -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Joao Eduardo Luis

On 11/28/2017 11:54 AM, Alfredo Deza wrote:

On Tue, Nov 28, 2017 at 3:12 AM, Wido den Hollander  wrote:



Op 27 november 2017 om 14:36 schreef Alfredo Deza :


For the upcoming Luminous release (12.2.2), ceph-disk will be
officially in 'deprecated' mode (bug fixes only). A large banner with
deprecation information has been added, which will try to raise
awareness.



As much as I like ceph-volume and the work being done, is it really a good idea 
to use a minor release to deprecate a tool?

Can't we just introduce ceph-volume and deprecate ceph-disk at the release of 
M? Because when you upgrade to 12.2.2 suddenly existing integrations will have 
deprecation warnings being thrown at them while they haven't upgraded to a new 
major version.


ceph-volume has been present since the very first release of Luminous,
the deprecation warning in ceph-disk is the only "new" thing
introduced for 12.2.2.


I think Wido's question still stands: why can't ceph-disk be deprecated 
solely in M, and removed by N?


I get that it probably seems nuts to support both ceph-disk and 
ceph-volume, and that deprecating and removing in (less than) a full 
release cycle will force people to actually move from one to the other. 
But we're also doing it when we're roughly 4 months away from Mimic being frozen.


This is the sort of last-minute, core change that is not expected from a 
project that should be as mature as Ceph. This is not 
some internal feature that users won't notice - we're effectively 
changing the way users deploy and orchestrate their clusters.



  -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Alfredo Deza
On Tue, Nov 28, 2017 at 7:22 AM, Andreas Calminder
 wrote:
>> For the `simple` sub-command there is no prepare/activate, it is just
>> a way of taking over management of an already deployed OSD. For *new*
>> OSDs, yes, we are implying that we are going only with Logical Volumes
>> for data devices. It is a bit more flexible for Journals, block.db,
>> and block.wal as those
>> can be either logical volumes or GPT partitions (ceph-volume will not
>> create these for you).
>
> Ok, so if I understand this correctly, for future one-device-per-osd
> setups I would create a volume group per device before handing it over
> to ceph-volume, to get the "same" functionality as ceph-disk. I
> understand the flexibility aspect of this, my setup will have an extra
> step setting up lvm for my osd devices which is fine.

If you don't require any special configuration for your logical volume
and don't mind naive LV handling, then ceph-volume can create the
logical volume for you from either a partition or a whole device (for
data), although it will still require a GPT partition for Journals,
block.wal, and block.db.

For example:

ceph-volume lvm create --data /path/to/device

Would create a new volume group with the device and then produce a
single LV from it.
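
If you do want control over how the VG/LV are laid out (e.g. one VG per
device), a minimal sketch would be to create them yourself first and then
point ceph-volume at the LV (the device path and names below are just
examples):

vgcreate ceph-block-sdb /dev/sdb
lvcreate -l 100%FREE -n block-sdb ceph-block-sdb
ceph-volume lvm create --data ceph-block-sdb/block-sdb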

> Apologies if I
> missed the information, but is it possible to get command output as
> json, something like "ceph-disk list --format json" since it's quite
> helpful while setting up stuff through ansible

Yes, this is implemented in both "pretty" and JSON formats:
http://docs.ceph.com/docs/master/ceph-volume/lvm/list/#ceph-volume-lvm-list
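
So for scripting through Ansible something like the following should do (the
exact output layout may differ slightly between point releases, and the device
path is only an example):

ceph-volume lvm list --format json
ceph-volume lvm list /dev/sdb --format json   # limit the listing to one device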
>
> Thanks,
> Andreas
>
> On 28 November 2017 at 12:47, Alfredo Deza  wrote:
>> On Tue, Nov 28, 2017 at 1:56 AM, Andreas Calminder
>>  wrote:
>>> Hello,
>>> Thanks for the heads-up. As someone who's currently maintaining a
>>> Jewel cluster and are in the process of setting up a shiny new
>>> Luminous cluster and writing Ansible roles in the process to make
>>> setup reproducible. I immediately proceeded to look into ceph-volume
>>> and I've some questions/concerns, mainly due to my own setup, which is
>>> one osd per device, simple.
>>>
>>> Running ceph-volume in Luminous 12.2.1 suggests there's only the lvm
>>> subcommand available and the man-page only covers lvm. The online
>>> documentation http://docs.ceph.com/docs/master/ceph-volume/ lists
>>> simple however it's lacking some of the ceph-disk commands, like
>>> 'prepare' which seems crucial in the 'simple' scenario. Does the
>>> ceph-disk deprecation imply that lvm is mandatory for using devices
>>> with ceph or is just the documentation and tool features lagging
>>> behind, I.E the 'simple' parts will be added well in time for Mimic
>>> and during the Luminous lifecycle? Or am I missing something?
>>
>> In your case, all your existing OSDs will be able to be managed by
>> `ceph-volume` once scanned and the information persisted. So anything
>> from Jewel should still work. For 12.2.1 you are right, that command
>> is not yet available, it will be present in 12.2.2
>>
>> For the `simple` sub-command there is no prepare/activate, it is just
>> a way of taking over management of an already deployed OSD. For *new*
>> OSDs, yes, we are implying that we are going only with Logical Volumes
>> for data devices. It is a bit more flexible for Journals, block.db,
>> and block.wal as those
>> can be either logical volumes or GPT partitions (ceph-volume will not
>> create these for you).
>>
>>>
>>> Best regards,
>>> Andreas
>>>
>>> On 27 November 2017 at 14:36, Alfredo Deza  wrote:
 For the upcoming Luminous release (12.2.2), ceph-disk will be
 officially in 'deprecated' mode (bug fixes only). A large banner with
 deprecation information has been added, which will try to raise
 awareness.

 We are strongly suggesting using ceph-volume for new (and old) OSD
 deployments. The only current exceptions to this are encrypted OSDs
 and FreeBSD systems

 Encryption support is planned and will be coming soon to ceph-volume.

 A few items to consider:

 * ceph-disk is expected to be fully removed by the Mimic release
 * Existing OSDs are supported by ceph-volume. They can be "taken over" [0]
 * ceph-ansible already fully supports ceph-volume and will soon default to 
 it
 * ceph-deploy support is planned and should be fully implemented soon


 [0] http://docs.ceph.com/docs/master/ceph-volume/simple/
 --
 To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Alfredo Deza
On Tue, Nov 28, 2017 at 3:39 AM, Piotr Dałek  wrote:
> On 17-11-28 09:12 AM, Wido den Hollander wrote:
>>
>>
>>> Op 27 november 2017 om 14:36 schreef Alfredo Deza :
>>>
>>>
>>> For the upcoming Luminous release (12.2.2), ceph-disk will be
>>> officially in 'deprecated' mode (bug fixes only). A large banner with
>>> deprecation information has been added, which will try to raise
>>> awareness.
>>>
>>
>> As much as I like ceph-volume and the work being done, is it really a good
>> idea to use a minor release to deprecate a tool?
>>
>> Can't we just introduce ceph-volume and deprecate ceph-disk at the release
>> of M? Because when you upgrade to 12.2.2 suddenly existing integrations will
>> have deprecation warnings being thrown at them while they haven't upgraded
>> to a new major version.
>>
>> As ceph-deploy doesn't support ceph-disk either I don't think it's a good
>> idea to deprecate it right now.
>>
>> How do others feel about this?
>
>
> Same, although we don't have a *big* problem with this (we haven't upgraded
> to Luminous yet, so we can skip to next point release and move to
> ceph-volume together with Luminous). It's still a problem, though - now we
> have more of our infrastructure to migrate and test, meaning even more
> delays in production upgrades.

I understand that this would involve a significant effort to fully
port over and drop ceph-disk entirely, and I don't think that dropping
ceph-disk in Mimic is set in stone (yet).

We could treat Luminous as a "soft" deprecation where ceph-disk will
still receive bug-fixes, and then in Mimic, it would be frozen - with
no updates whatsoever.

At some point a migration will have to happen for older clusters,
which is why we've added support in ceph-volume for existing OSDs. An
upgrade to Luminous doesn't mean ceph-disk will not work; the only thing
that has been added to ceph-disk is a deprecation warning.


>
> --
> Piotr Dałek
> piotr.da...@corp.ovh.com
> https://www.ovh.com/us/
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Wido den Hollander

> Op 28 november 2017 om 12:54 schreef Alfredo Deza :
> 
> 
> On Tue, Nov 28, 2017 at 3:12 AM, Wido den Hollander  wrote:
> >
> >> Op 27 november 2017 om 14:36 schreef Alfredo Deza :
> >>
> >>
> >> For the upcoming Luminous release (12.2.2), ceph-disk will be
> >> officially in 'deprecated' mode (bug fixes only). A large banner with
> >> deprecation information has been added, which will try to raise
> >> awareness.
> >>
> >
> > As much as I like ceph-volume and the work being done, is it really a good 
> > idea to use a minor release to deprecate a tool?
> >
> > Can't we just introduce ceph-volume and deprecate ceph-disk at the release 
> > of M? Because when you upgrade to 12.2.2 suddenly existing integrations 
> > will have deprecation warnings being thrown at them while they haven't 
> > upgraded to a new major version.
> 
> ceph-volume has been present since the very first release of Luminous,
> the deprecation warning in ceph-disk is the only "new" thing
> introduced for 12.2.2.
> 

Yes, but deprecating a functional tool in a minor release? Yes, I am aware that 
ceph-volume works, but suddenly during a release saying it's now deprecated?

Why can't that be moved to the M release? Leave ceph-disk as-is and deprecate 
it in master.

Again, I really do like ceph-volume! Great work!

Wido

> >
> > As ceph-deploy doesn't support ceph-disk either I don't think it's a good 
> > idea to deprecate it right now.
> 
> ceph-deploy work is being done to support ceph-volume exclusively
> (ceph-disk support is dropped fully), which will mean a change in its
> API in a non-backwards compatible
> way. A major version change in ceph-deploy, documentation, and a bunch
> of documentation is being worked on to allow users to transition to
> it.
> 
> >
> > How do others feel about this?
> >
> > Wido
> >
> >> We are strongly suggesting using ceph-volume for new (and old) OSD
> >> deployments. The only current exceptions to this are encrypted OSDs
> >> and FreeBSD systems
> >>
> >> Encryption support is planned and will be coming soon to ceph-volume.
> >>
> >> A few items to consider:
> >>
> >> * ceph-disk is expected to be fully removed by the Mimic release
> >> * Existing OSDs are supported by ceph-volume. They can be "taken over" [0]
> >> * ceph-ansible already fully supports ceph-volume and will soon default to 
> >> it
> >> * ceph-deploy support is planned and should be fully implemented soon
> >>
> >>
> >> [0] http://docs.ceph.com/docs/master/ceph-volume/simple/
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] S3 object notifications

2017-11-28 Thread Sean Purdy
Hi,


http://docs.ceph.com/docs/master/radosgw/s3/ says that S3 object notifications 
are not supported.  I'd like something like object notifications so that we can 
backup new objects in realtime, instead of trawling the whole object list for 
what's changed.

Is there anything similar I can use?  I've found Spreadshirt's haproxy fork 
which traps requests and updates redis - 
https://github.com/spreadshirt/s3gw-haproxy  Anybody used that?


Thanks,

Sean Purdy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Andreas Calminder
> For the `simple` sub-command there is no prepare/activate, it is just
> a way of taking over management of an already deployed OSD. For *new*
> OSDs, yes, we are implying that we are going only with Logical Volumes
> for data devices. It is a bit more flexible for Journals, block.db,
> and block.wal as those
> can be either logical volumes or GPT partitions (ceph-volume will not
> create these for you).

Ok, so if I understand this correctly, for future one-device-per-osd
setups I would create a volume group per device before handing it over
to ceph-volume, to get the "same" functionality as ceph-disk. I
understand the flexibility aspect of this; my setup will have an extra
step setting up LVM for my OSD devices, which is fine. Apologies if I
missed the information, but is it possible to get command output as
JSON, something like "ceph-disk list --format json"? That's quite
helpful when setting up things through Ansible.

Thanks,
Andreas

On 28 November 2017 at 12:47, Alfredo Deza  wrote:
> On Tue, Nov 28, 2017 at 1:56 AM, Andreas Calminder
>  wrote:
>> Hello,
>> Thanks for the heads-up. As someone who's currently maintaining a
>> Jewel cluster and are in the process of setting up a shiny new
>> Luminous cluster and writing Ansible roles in the process to make
>> setup reproducible. I immediately proceeded to look into ceph-volume
>> and I've some questions/concerns, mainly due to my own setup, which is
>> one osd per device, simple.
>>
>> Running ceph-volume in Luminous 12.2.1 suggests there's only the lvm
>> subcommand available and the man-page only covers lvm. The online
>> documentation http://docs.ceph.com/docs/master/ceph-volume/ lists
>> simple however it's lacking some of the ceph-disk commands, like
>> 'prepare' which seems crucial in the 'simple' scenario. Does the
>> ceph-disk deprecation imply that lvm is mandatory for using devices
>> with ceph or is just the documentation and tool features lagging
>> behind, I.E the 'simple' parts will be added well in time for Mimic
>> and during the Luminous lifecycle? Or am I missing something?
>
> In your case, all your existing OSDs will be able to be managed by
> `ceph-volume` once scanned and the information persisted. So anything
> from Jewel should still work. For 12.2.1 you are right, that command
> is not yet available, it will be present in 12.2.2
>
> For the `simple` sub-command there is no prepare/activate, it is just
> a way of taking over management of an already deployed OSD. For *new*
> OSDs, yes, we are implying that we are going only with Logical Volumes
> for data devices. It is a bit more flexible for Journals, block.db,
> and block.wal as those
> can be either logical volumes or GPT partitions (ceph-volume will not
> create these for you).
>
>>
>> Best regards,
>> Andreas
>>
>> On 27 November 2017 at 14:36, Alfredo Deza  wrote:
>>> For the upcoming Luminous release (12.2.2), ceph-disk will be
>>> officially in 'deprecated' mode (bug fixes only). A large banner with
>>> deprecation information has been added, which will try to raise
>>> awareness.
>>>
>>> We are strongly suggesting using ceph-volume for new (and old) OSD
>>> deployments. The only current exceptions to this are encrypted OSDs
>>> and FreeBSD systems
>>>
>>> Encryption support is planned and will be coming soon to ceph-volume.
>>>
>>> A few items to consider:
>>>
>>> * ceph-disk is expected to be fully removed by the Mimic release
>>> * Existing OSDs are supported by ceph-volume. They can be "taken over" [0]
>>> * ceph-ansible already fully supports ceph-volume and will soon default to 
>>> it
>>> * ceph-deploy support is planned and should be fully implemented soon
>>>
>>>
>>> [0] http://docs.ceph.com/docs/master/ceph-volume/simple/
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Maged Mokhtar
I tend to agree with Wido. Many of us still rely on ceph-disk and hope
to see it live a little longer. 

Maged 

On 2017-11-28 13:54, Alfredo Deza wrote:

> On Tue, Nov 28, 2017 at 3:12 AM, Wido den Hollander  wrote: 
> Op 27 november 2017 om 14:36 schreef Alfredo Deza :
> 
> For the upcoming Luminous release (12.2.2), ceph-disk will be
> officially in 'deprecated' mode (bug fixes only). A large banner with
> deprecation information has been added, which will try to raise
> awareness.
> 
> As much as I like ceph-volume and the work being done, is it really a good 
> idea to use a minor release to deprecate a tool?
> 
> Can't we just introduce ceph-volume and deprecate ceph-disk at the release of 
> M? Because when you upgrade to 12.2.2 suddenly existing integrations will 
> have deprecation warnings being thrown at them while they haven't upgraded to 
> a new major version.

ceph-volume has been present since the very first release of Luminous,
the deprecation warning in ceph-disk is the only "new" thing
introduced for 12.2.2.

> As ceph-deploy doesn't support ceph-disk either I don't think it's a good 
> idea to deprecate it right now.

ceph-deploy work is being done to support ceph-volume exclusively
(ceph-disk support is dropped fully), which will mean a change in its
API in a non-backwards compatible
way. A major version change in ceph-deploy, documentation, and a bunch
of documentation is being worked on to allow users to transition to
it.

> How do others feel about this?
> 
> Wido
> 
>> We are strongly suggesting using ceph-volume for new (and old) OSD
>> deployments. The only current exceptions to this are encrypted OSDs
>> and FreeBSD systems
>> 
>> Encryption support is planned and will be coming soon to ceph-volume.
>> 
>> A few items to consider:
>> 
>> * ceph-disk is expected to be fully removed by the Mimic release
>> * Existing OSDs are supported by ceph-volume. They can be "taken over" [0] [1]
>> * ceph-ansible already fully supports ceph-volume and will soon default to it
>> * ceph-deploy support is planned and should be fully implemented soon
>> 
>> [0] http://docs.ceph.com/docs/master/ceph-volume/simple/
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

 

Links:
--
[1] http://docs.ceph.com/docs/master/ceph-volume/simple/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Alfredo Deza
On Tue, Nov 28, 2017 at 3:12 AM, Wido den Hollander  wrote:
>
>> Op 27 november 2017 om 14:36 schreef Alfredo Deza :
>>
>>
>> For the upcoming Luminous release (12.2.2), ceph-disk will be
>> officially in 'deprecated' mode (bug fixes only). A large banner with
>> deprecation information has been added, which will try to raise
>> awareness.
>>
>
> As much as I like ceph-volume and the work being done, is it really a good 
> idea to use a minor release to deprecate a tool?
>
> Can't we just introduce ceph-volume and deprecate ceph-disk at the release of 
> M? Because when you upgrade to 12.2.2 suddenly existing integrations will 
> have deprecation warnings being thrown at them while they haven't upgraded to 
> a new major version.

ceph-volume has been present since the very first release of Luminous,
the deprecation warning in ceph-disk is the only "new" thing
introduced for 12.2.2.

>
> As ceph-deploy doesn't support ceph-disk either I don't think it's a good 
> idea to deprecate it right now.

ceph-deploy work is being done to support ceph-volume exclusively
(ceph-disk support is dropped fully), which will mean a change in its
API in a non-backwards compatible
way. A major version change in ceph-deploy, along with a bunch of
documentation, is being worked on to allow users to transition to it.

>
> How do others feel about this?
>
> Wido
>
>> We are strongly suggesting using ceph-volume for new (and old) OSD
>> deployments. The only current exceptions to this are encrypted OSDs
>> and FreeBSD systems
>>
>> Encryption support is planned and will be coming soon to ceph-volume.
>>
>> A few items to consider:
>>
>> * ceph-disk is expected to be fully removed by the Mimic release
>> * Existing OSDs are supported by ceph-volume. They can be "taken over" [0]
>> * ceph-ansible already fully supports ceph-volume and will soon default to it
>> * ceph-deploy support is planned and should be fully implemented soon
>>
>>
>> [0] http://docs.ceph.com/docs/master/ceph-volume/simple/
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Alfredo Deza
On Tue, Nov 28, 2017 at 1:56 AM, Andreas Calminder
 wrote:
> Hello,
> Thanks for the heads-up. As someone who's currently maintaining a
> Jewel cluster and are in the process of setting up a shiny new
> Luminous cluster and writing Ansible roles in the process to make
> setup reproducible. I immediately proceeded to look into ceph-volume
> and I've some questions/concerns, mainly due to my own setup, which is
> one osd per device, simple.
>
> Running ceph-volume in Luminous 12.2.1 suggests there's only the lvm
> subcommand available and the man-page only covers lvm. The online
> documentation http://docs.ceph.com/docs/master/ceph-volume/ lists
> simple however it's lacking some of the ceph-disk commands, like
> 'prepare' which seems crucial in the 'simple' scenario. Does the
> ceph-disk deprecation imply that lvm is mandatory for using devices
> with ceph or is just the documentation and tool features lagging
> behind, I.E the 'simple' parts will be added well in time for Mimic
> and during the Luminous lifecycle? Or am I missing something?

In your case, all your existing OSDs will be able to be managed by
`ceph-volume` once scanned and the information persisted, so anything
from Jewel should still work. For 12.2.1 you are right, that command
is not yet available; it will be present in 12.2.2.

For the `simple` sub-command there is no prepare/activate; it is just
a way of taking over management of an already deployed OSD. For *new*
OSDs, yes, we are implying that we are going only with Logical Volumes
for data devices. It is a bit more flexible for Journals, block.db,
and block.wal as those
can be either logical volumes or GPT partitions (ceph-volume will not
create these for you).
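
Once 12.2.2 is out, taking over an existing ceph-disk OSD would look roughly
like this (the osd id, path and fsid below are placeholders for whatever the
scan reports):

ceph-volume simple scan /var/lib/ceph/osd/ceph-0
# the scan writes the OSD metadata to /etc/ceph/osd/<id>-<fsid>.json
ceph-volume simple activate 0 <fsid>
# activate disables the ceph-disk/udev units and takes over startup for that OSD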

>
> Best regards,
> Andreas
>
> On 27 November 2017 at 14:36, Alfredo Deza  wrote:
>> For the upcoming Luminous release (12.2.2), ceph-disk will be
>> officially in 'deprecated' mode (bug fixes only). A large banner with
>> deprecation information has been added, which will try to raise
>> awareness.
>>
>> We are strongly suggesting using ceph-volume for new (and old) OSD
>> deployments. The only current exceptions to this are encrypted OSDs
>> and FreeBSD systems
>>
>> Encryption support is planned and will be coming soon to ceph-volume.
>>
>> A few items to consider:
>>
>> * ceph-disk is expected to be fully removed by the Mimic release
>> * Existing OSDs are supported by ceph-volume. They can be "taken over" [0]
>> * ceph-ansible already fully supports ceph-volume and will soon default to it
>> * ceph-deploy support is planned and should be fully implemented soon
>>
>>
>> [0] http://docs.ceph.com/docs/master/ceph-volume/simple/
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "failed to open ino"

2017-11-28 Thread Jens-U. Mozdzen

Hi,

Zitat von "Yan, Zheng" :

On Sat, Nov 25, 2017 at 2:27 AM, Jens-U. Mozdzen  wrote:

Hi all,
[...]
In the log of the active MDS, we currently see the following two inodes
reported over and over again, about every 30 seconds:

--- cut here ---
2017-11-24 18:24:16.496397 7fa308cf0700  0 mds.0.cache  failed to open ino
0x10001e45e1d err -22/0
2017-11-24 18:24:16.497037 7fa308cf0700  0 mds.0.cache  failed to open ino
0x10001e4d6a1 err -22/-22
[...]
--- cut here ---

There were other reported inodes with other errors, too ("err -5/0", for
instance), the root cause seems to be the same (see below).

[...]
It's likely caused by NFS export.  MDS reveals this error message if
NFS client tries to access a deleted file. The error causes NFS client
to return -ESTALE.


you were right on the spot - a process remained active after the test  
runs and had that directory as its current working directory. Stopping  
the process stops the messages.


Thank you for pointing me there!

Best regards,
Jens

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph all-nvme mysql performance tuning

2017-11-28 Thread Luis Periquito
There are a few things I don't like about your machines... If you want
latency/IOPS (as you seemingly do) you really want the highest frequency
CPUs, even over number of cores. These are not too bad, but not great
either.

Also, you have 2x CPUs, meaning NUMA. Have you pinned OSDs to NUMA nodes?
Ideally each OSD is pinned to the same NUMA node its NVMe device is connected
to, since each NVMe device runs on PCIe lanes provided by one of the CPUs...

What versions of TCMalloc (or jemalloc) are you running? Have you tuned
them to have a bigger cache?

These are from what I've learned using filestore - I've yet to run full
tests on bluestore - but they should still apply...
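
A couple of concrete knobs, purely as examples (the paths, core lists and
sizes below are illustrative, check your own topology first):

# which NUMA node is a given NVMe attached to?
cat /sys/class/nvme/nvme0/device/numa_node

# pin the OSD that owns it to that node, e.g. via a systemd drop-in
# /etc/systemd/system/ceph-osd@0.service.d/numa.conf:
#   [Service]
#   CPUAffinity=0-9,20-29
# (numactl --cpunodebind/--membind in an ExecStart override is the stricter option)

# larger tcmalloc thread cache, e.g. in /etc/default/ceph or /etc/sysconfig/ceph
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728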

On Mon, Nov 27, 2017 at 5:10 PM, German Anders  wrote:

> Hi Nick,
>
> yeah, we are using the same nvme disk with an additional partition to use
> as journal/wal. We double check the c-state and it was not configure to use
> c1, so we change that on all the osd nodes and mon nodes and we're going to
> make some new tests, and see how it goes. I'll get back as soon as get got
> those tests running.
>
> Thanks a lot,
>
> Best,
>
>
> *German*
>
> 2017-11-27 12:16 GMT-03:00 Nick Fisk :
>
>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
>> Of *German Anders
>> *Sent:* 27 November 2017 14:44
>> *To:* Maged Mokhtar 
>> *Cc:* ceph-users 
>> *Subject:* Re: [ceph-users] ceph all-nvme mysql performance tuning
>>
>>
>>
>> Hi Maged,
>>
>>
>>
>> Thanks a lot for the response. We try with different number of threads
>> and we're getting almost the same kind of difference between the storage
>> types. Going to try with different rbd stripe size, object size values and
>> see if we get more competitive numbers. Will get back with more tests and
>> param changes to see if we get better :)
>>
>>
>>
>>
>>
>> Just to echo a couple of comments. Ceph will always struggle to match the
>> performance of a traditional array for mainly 2 reasons.
>>
>>
>>
>>1. You are replacing some sort of dual ported SAS or internally RDMA
>>connected device with a network for Ceph replication traffic. This will
>>instantly have a large impact on write latency
>>2. Ceph locks at the PG level and a PG will most likely cover at
>>least one 4MB object, so lots of small accesses to the same blocks (on a
>>block device) will wait on each other and go effectively at a single
>>threaded rate.
>>
>>
>>
>> The best thing you can do to mitigate these, is to run the fastest
>> journal/WAL devices you can, fastest network connections (ie 25Gb/s) and
>> run your CPU’s at max C and P states.
>>
>>
>>
>> You stated that you are running the performance profile on the CPU’s.
>> Could you also just double check that the C-states are being held at C1(e)?
>> There are a few utilities that can show this in realtime.
>>
>>
>>
>> Other than that, although there could be some minor tweaks, you are
>> probably nearing the limit of what you can hope to achieve.
>>
>>
>>
>> Nick
>>
>>
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Best,
>>
>>
>> *German*
>>
>>
>>
>> 2017-11-27 11:36 GMT-03:00 Maged Mokhtar :
>>
>> On 2017-11-27 15:02, German Anders wrote:
>>
>> Hi All,
>>
>>
>>
>> I've a performance question, we recently install a brand new Ceph cluster
>> with all-nvme disks, using ceph version 12.2.0 with bluestore configured.
>> The back-end of the cluster is using a bond IPoIB (active/passive) , and
>> for the front-end we are using a bonding config with active/active (20GbE)
>> to communicate with the clients.
>>
>>
>>
>> The cluster configuration is the following:
>>
>>
>>
>> *MON Nodes:*
>>
>> OS: Ubuntu 16.04.3 LTS | kernel 4.12.14
>>
>> 3x 1U servers:
>>
>>   2x Intel Xeon E5-2630v4 @2.2Ghz
>>
>>   128G RAM
>>
>>   2x Intel SSD DC S3520 150G (in RAID-1 for OS)
>>
>>   2x 82599ES 10-Gigabit SFI/SFP+ Network Connection
>>
>>
>>
>> *OSD Nodes:*
>>
>> OS: Ubuntu 16.04.3 LTS | kernel 4.12.14
>>
>> 4x 2U servers:
>>
>>   2x Intel Xeon E5-2640v4 @2.4Ghz
>>
>>   128G RAM
>>
>>   2x Intel SSD DC S3520 150G (in RAID-1 for OS)
>>
>>   1x Ethernet Controller 10G X550T
>>
>>   1x 82599ES 10-Gigabit SFI/SFP+ Network Connection
>>
>>   12x Intel SSD DC P3520 1.2T (NVMe) for OSD daemons
>>
>>   1x Mellanox ConnectX-3 InfiniBand FDR 56Gb/s Adapter (dual port)
>>
>>
>>
>>
>>
>> Here's the tree:
>>
>>
>>
>> ID CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
>>
>> -7   48.0 root root
>>
>> -5   24.0 rack rack1
>>
>> -1   12.0 node cpn01
>>
>>  0  nvme  1.0 osd.0  up  1.0 1.0
>>
>>  1  nvme  1.0 osd.1  up  1.0 1.0
>>
>>  2  nvme  1.0 osd.2  up  1.0 1.0
>>
>>  3  nvme  1.0 osd.3  up  1.0 1.0
>>
>>  4  nvme  1.0 osd.4  up  1.0 1.0
>>
>>  5  nvme  1.0 osd.5  up  1.0 1.0

Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Piotr Dałek

On 17-11-28 09:12 AM, Wido den Hollander wrote:



Op 27 november 2017 om 14:36 schreef Alfredo Deza :


For the upcoming Luminous release (12.2.2), ceph-disk will be
officially in 'deprecated' mode (bug fixes only). A large banner with
deprecation information has been added, which will try to raise
awareness.



As much as I like ceph-volume and the work being done, is it really a good idea 
to use a minor release to deprecate a tool?

Can't we just introduce ceph-volume and deprecate ceph-disk at the release of 
M? Because when you upgrade to 12.2.2 suddenly existing integrations will have 
deprecation warnings being thrown at them while they haven't upgraded to a new 
major version.

As ceph-deploy doesn't support ceph-disk either I don't think it's a good idea 
to deprecate it right now.

How do others feel about this?


Same, although we don't have a *big* problem with this (we haven't upgraded 
to Luminous yet, so we can skip to next point release and move to 
ceph-volume together with Luminous). It's still a problem, though - now we 
have more of our infrastructure to migrate and test, meaning even more 
delays in production upgrades.


--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-28 Thread Wido den Hollander

> Op 27 november 2017 om 14:36 schreef Alfredo Deza :
> 
> 
> For the upcoming Luminous release (12.2.2), ceph-disk will be
> officially in 'deprecated' mode (bug fixes only). A large banner with
> deprecation information has been added, which will try to raise
> awareness.
> 

As much as I like ceph-volume and the work being done, is it really a good idea 
to use a minor release to deprecate a tool?

Can't we just introduce ceph-volume and deprecate ceph-disk at the release of 
M? Because when you upgrade to 12.2.2 suddenly existing integrations will have 
deprecation warnings being thrown at them while they haven't upgraded to a new 
major version.

As ceph-deploy doesn't support ceph-disk either I don't think it's a good idea 
to deprecate it right now.

How do others feel about this?

Wido

> We are strongly suggesting using ceph-volume for new (and old) OSD
> deployments. The only current exceptions to this are encrypted OSDs
> and FreeBSD systems
> 
> Encryption support is planned and will be coming soon to ceph-volume.
> 
> A few items to consider:
> 
> * ceph-disk is expected to be fully removed by the Mimic release
> * Existing OSDs are supported by ceph-volume. They can be "taken over" [0]
> * ceph-ansible already fully supports ceph-volume and will soon default to it
> * ceph-deploy support is planned and should be fully implemented soon
> 
> 
> [0] http://docs.ceph.com/docs/master/ceph-volume/simple/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com