Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Up Safe
Hi,

can you be a bit more specific?
I need to understand whether this is doable at all.
Other options would be to use Ganesha, though I understand it's very limited over
NFS, or to start looking at Gluster.

Basically, I need the multi-site option, i.e. active-active read/write.

Thanks

On Wed, May 16, 2018 at 5:50 PM, David Turner  wrote:

> Object storage multi-site is very specific to using object storage.  It
> uses the RGW APIs to sync S3 uploads between each site.  For CephFS you
> might be able to do a sync of the rados pools, but I don't think that's
> actually a thing yet.  RBD mirror is also a layer on top of things to sync
> between sites.  Basically I think you need to do something on top of the
> filesystem, as opposed to within Ceph, to sync it between sites.
>
> On Wed, May 16, 2018 at 9:51 AM Up Safe  wrote:
>
>> But this is not the question here.
>> The question is whether I can configure multi-site for CephFS.
>> Will I be able to do so by following the guide to set up multi-site
>> for object storage?
>>
>> Thanks
>>
>> On Wed, May 16, 2018, 16:45 John Hearns  wrote:
>>
>>> The answer given at the seminar yesterday was that a practical limit was
>>> around 60km.
>>> I don't think 100km is that much longer.  I defer to the experts here.
>>>
>>> On 16 May 2018 at 15:24, Up Safe  wrote:
>>>
 Hi,

 About a 100 km.
 I have a 2-4ms latency between them.

 Leon

 On Wed, May 16, 2018, 16:13 John Hearns  wrote:

> Leon,
> I was at a Lenovo/SuSE seminar yesterday and asked a similar question
> regarding separated sites.
> How far apart are these two geographical locations?   It does matter.
>
> On 16 May 2018 at 15:07, Up Safe  wrote:
>
>> Hi,
>>
>> I'm trying to build a multi site setup.
>> But the only guides I've found on the net were about building it with
>> object storage or rbd.
>> What I need is cephfs.
>>
>> I.e. I need to have 2 synced file storages at 2 geographical
>> locations.
>> Is this possible?
>>
>> Also, if I understand correctly - cephfs is just a component on top
>> of the object storage.
>> Following this logic - it should be possible, right?
>>
>> Or am I totally off here?
>>
>> Thanks,
>> Leon


Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Adrian Saul

We run CephFS in a limited fashion in a stretched cluster of about 40km with 
redundant 10G fibre between sites – link latency is in the order of 1-2ms.  
Performance is reasonable for our usage but is noticeably slower than 
comparable local ceph based RBD shares.

Essentially we just set up the ceph pools behind CephFS to have replicas on each 
site.  To export it we are simply using Linux kernel NFS and it gets exported 
from 4 hosts that act as CephFS clients.  Those 4 hosts are then set up in a 
DNS record that resolves to all 4 IPs, and we then use automount to do 
automatic mounting and host failover on the NFS clients.  Automount takes care 
of finding the quickest available NFS server.

I stress this is a limited setup that we use for some fairly light duty, but we 
are looking to move things like user home directories onto this.  YMMV.
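
For anyone wanting to replicate this, the two relevant pieces look roughly like the
sketch below; the bucket level, pool, host and path names are placeholders rather
than our actual config:

  # CRUSH rule putting two replicas in each of two sites (assumes the CRUSH map
  # has a "datacenter" level representing the sites)
  rule stretch_replicas {
      id 1
      type replicated
      min_size 1
      max_size 10
      step take default
      step choose firstn 2 type datacenter
      step chooseleaf firstn 2 type host
      step emit
  }

  # apply it to the CephFS pools with four copies in total
  ceph osd pool set cephfs_data crush_rule stretch_replicas
  ceph osd pool set cephfs_data size 4

  # autofs replicated map on the NFS clients - automount probes the listed
  # servers and mounts the closest one that responds
  # /etc/auto.master:    /shares  /etc/auto.cephnfs
  # /etc/auto.cephnfs:
  data  -fstype=nfs,vers=3  nfs1,nfs2,nfs3,nfs4:/export/cephfs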




Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Up Safe
ok, thanks.
but it seems to me that having pool replicas spread over sites is a bit too
risky performance-wise.
how about ganesha? will it work with cephfs in a multi-site setup?

I was previously reading about rgw with ganesha and it was full of
limitations.
with cephfs there seems to be only one limitation, and it's one I can live with.

Will it work?



Re: [ceph-users] A question about HEALTH_WARN and monitors holding onto cluster maps

2018-05-21 Thread Thomas Byrne - UKRI STFC
mon_compact_on_start was not changed from default (false). From the logs, it 
looks like the monitor with the excessive resource usage (mon1) was up and 
winning the majority of elections throughout the period of unresponsiveness, 
with other monitors occasionally winning an election without mon1 participating 
(I’m guessing as it failed to respond).

That’s interesting about the false map updates. We had a short networking blip 
(caused by me) on some monitors shortly before the trouble started, which 
caused some monitors to start calling frequent (every few seconds) elections. 
Could this rapid creation of new monmaps have had the same effect as updating pool 
settings, causing the monitor to try to clean everything up in one go and producing 
the observed resource usage and unresponsiveness?

I’ve been bringing in the storage as you described; I’m in the process of 
adding 6PB of new storage to a ~10PB (raw) cluster (with ~8PB raw utilisation), 
so I’m feeling around for the largest backfills we can safely do. I had been 
weighting up storage in steps that take ~5 days to finish, but have been 
starting the next reweight as we get to the tail end of the previous one, so not 
giving the mons time to compact their stores. Although it’s far from ideal 
(in terms of the total time to get the new storage weighted up), I’ll be letting 
the mons compact between every backfill until I have a better idea of what went on 
last week.
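
For reference, the compaction can also be kicked off by hand between rounds (one
mon at a time); the mon name and store path below are just examples:

  # compact one monitor's store on demand
  ceph tell mon.mon1 compact

  # check the size of its backing store before/after
  du -sh /var/lib/ceph/mon/ceph-mon1/store.db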

From: David Turner 
Sent: 17 May 2018 18:57
To: Byrne, Thomas (STFC,RAL,SC) 
Cc: Wido den Hollander ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] A question about HEALTH_WARN and monitors holding 
onto cluster maps

Generally they clean up slowly by deleting 30 maps every time the maps update.  
You can speed that up by creating false map updates with something like 
updating a pool setting to what it already is.  What it sounds like happened to 
you is that your mon crashed and restarted.  If it crashed and has the setting 
to compact the mon store on start, then it would cause it to forcibly go 
through and clean everything up in 1 go.
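
A minimal example of that false-update trick (the pool name is a placeholder; read
the current value first, then write the same value back):

  ceph osd pool get rbd min_size     # suppose it reports: min_size: 2
  ceph osd pool set rbd min_size 2   # same value, but it still bumps the osdmap epoch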

I generally plan my backfilling to not take longer than a week.  Any longer 
than that is pretty rough on the mons.  You can achieve that by bringing in new 
storage with a weight of 0.0 and increasing it gradually, as opposed to just 
adding it with its full weight and having everything move at once.
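
A sketch of that workflow (the OSD id and weights are placeholders; the final
weight is normally the drive size in TiB):

  # have new OSDs join CRUSH with zero weight
  # (set in ceph.conf on the new OSD hosts before deploying them)
  osd crush initial weight = 0

  # then raise the weight in steps, waiting for backfill to settle in between
  ceph osd crush reweight osd.120 1.0
  ceph osd crush reweight osd.120 3.0
  ceph osd crush reweight osd.120 7.3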

On Thu, May 17, 2018 at 12:56 PM Thomas Byrne - UKRI STFC 
mailto:tom.by...@stfc.ac.uk>> wrote:
That seems like a sane way to do it, thanks for the clarification Wido.

As a follow-up, do you have any feeling as to whether the trimming is a 
particularly intensive task? We just had a fun afternoon where the monitors 
became unresponsive (no ceph status etc.) for several hours, seemingly due to 
the leader's monitor process consuming all available RAM+swap (64GB+32GB) on 
that monitor. This was then followed by the actual trimming of the stores 
(26GB->11GB), which took a few minutes and happened simultaneously across the 
monitors.

If this is something to be expected, it'll be a good reason to plan our long 
backfills much more carefully in the future!

> -Original Message-
> From: ceph-users 
> mailto:ceph-users-boun...@lists.ceph.com>> 
> On Behalf Of Wido
> den Hollander
> Sent: 17 May 2018 15:40
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] A question about HEALTH_WARN and monitors
> holding onto cluster maps
>
>
>
> On 05/17/2018 04:37 PM, Thomas Byrne - UKRI STFC wrote:
> > Hi all,
> >
> >
> >
> > As far as I understand, the monitor stores will grow while not
> > HEALTH_OK as they hold onto all cluster maps. Is this true for all
> > HEALTH_WARN reasons? Our cluster recently went into HEALTH_WARN
> due to
> > a few weeks of backfilling onto new hardware pushing the monitors data
> > stores over the default 15GB threshold. Are they now prevented from
> > shrinking till I increase the threshold above their current size?
> >
>
> No, monitors will trim their data store when all PGs are active+clean, not
> when they are HEALTH_OK.
>
> So a 'noout' flag triggers a WARN, but that doesn't prevent the MONs from
> trimming for example.
>
> Wido
>
> >
> >
> > Cheers
> >
> > Tom

[ceph-users] rgw default user quota for OpenStack users

2018-05-21 Thread Massimo Sgaravatto
I set:

 rgw user default quota max size = 2G

in the ceph configuration file and I see that this works for users created
using the "radosgw-admin user create" command [**]

I see, however, that the quota is not set for users created through Keystone.

This [*] is the relevant part of my ceph configuration file.

Any hints?

Thanks, Massimo


[*]

[global]
rgw user default quota max size = 2G

[client.myhostname]
rgw_frontends="civetweb port=7480"
rgw_zone=cloudtest
rgw_zonegroup=cloudtest
rgw_realm=cloudtest
debug rgw = 5
rgw keystone url = https://fqdn:35357
rgw keystone accepted roles = project_manager, _member_, user, admin, Member
rgw keystone api version = 3
rgw keystone admin token = xyz
rgw keystone token cache size = 0
rgw s3 auth use keystone = true
nss_db_path = /var/ceph/nss


[**]

# radosgw-admin user create --uid=xyz --display-name="xyz"
--rgw-realm=cloudtest


# radosgw-admin user info --uid=xyz --display-name="xyz"
--rgw-realm=cloudtest
...
"user_quota": {
"enabled": true,
"check_on_raw": false,
"max_size": 2147483648,
"max_size_kb": 2097152,
"max_objects": -1
},
...


Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Up Safe
guys,
please tell me if I'm on the right track.
If ceph object storage can be set up in a multi-site configuration,
and I add ganesha (which to my understanding is an "adapter"
that serves S3 objects via NFS to clients) -
won't this work as active-active?
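
The piece I have in mind is the RGW FSAL in nfs-ganesha; from the docs the export
would look roughly like this (user id, keys and gateway name are placeholders, and I
haven't tried it yet):

  EXPORT {
      Export_ID = 1;
      Path = "/";
      Pseudo = "/";
      Access_Type = RW;
      NFS_Protocols = 4;
      Transport_Protocols = TCP;
      FSAL {
          Name = RGW;
          User_Id = "nfsuser";              # placeholder RGW user
          Access_Key_Id = "ACCESS_KEY";     # placeholder
          Secret_Access_Key = "SECRET_KEY"; # placeholder
      }
  }

  RGW {
      ceph_conf = "/etc/ceph/ceph.conf";
      name = "client.rgw.gateway1";         # placeholder RGW instance name
      cluster = "ceph";
  }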


Thanks


Re: [ceph-users] multi site with cephfs

2018-05-21 Thread David Turner
Not a lot of people use object storage multi-site.  I doubt anyone is using
it the way you intend to.  In theory it would work, but even if somebody has this
setup running, it's almost impossible to tell whether it would work for your
needs and use case.  You really should try it out for yourself to see if it
meets your needs.  And if you feel so inclined, report back here with
how it worked.

If you're asking for advice, why do you need a networked posix filesystem?
Unless you are using proprietary software with this requirement, it's
generally lazy coding that requires a mounted filesystem like this and you
should aim towards using object storage instead without any sort of NFS
layer.  It's a little more work for the developers, but is drastically
simpler to support and manage.


Re: [ceph-users] rgw default user quota for OpenStack users

2018-05-21 Thread David Turner
Is openstack/keystone maintaining its own version of the ceph config
file?  I know that's the case with software like Proxmox.  That might be a
good place to start.  You could also look at the keystone code to see if
it's manually specifying things based on an application config file.
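
As a workaround, it should also be possible to apply the quota explicitly to the
keystone-created users once they exist; a sketch, with the uid as a placeholder, the
realm taken from your config, and max-size given in bytes (2 GB):

  radosgw-admin quota set --quota-scope=user --uid=<keystone-uid> \
      --max-size=2147483648 --rgw-realm=cloudtest
  radosgw-admin quota enable --quota-scope=user --uid=<keystone-uid> \
      --rgw-realm=cloudtest
  # verify: radosgw-admin user info --uid=<keystone-uid> --rgw-realm=cloudtest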



Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Up Safe
I'll explain.
Right now we have 2 sites (racks) with several dozens of servers at each
accessing a NAS (let's call it a NAS, although it's an IBM v7000 Unified
that serves the files via NFS).

The biggest problem is that it works active-passive, i.e. we always access
one of the storages for read/write
and the other one is replicated once every few hours, so it's more for
backup needs.

In this setup once the power goes down in our main site - we're stuck with
a bit (several hours) outdated files
and we need to remount all of the servers and what not.

The multi-site ceph was supposed to solve this problem for us. This way we
would have only local mounts, i.e.
each server would only access the filesystem that is in the same site. And
if one of the sites goes down - no pain.

The files are rather small, pdfs and xml of 50-300KB mostly.
The total size is about 25 TB right now.

We're a low-budget company, so your advice about developing is not going to
happen, as we have no such skills or resources for this.
Plus, I want to make this transparent for the devs and everyone - just an
infrastructure replacement that will buy me all of the ceph benefits and
allow the company to survive power outages or storage crashes.




Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Paul Emmerich
For active/passive and async replication with a POSIX filesystem:
Maybe two Ceph clusters with RBD mirror and re-exporting the RBD(s) via NFS?
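
A rough sketch of what that could look like with one-way, journal-based mirroring
(pool/image names and the peer cluster name are placeholders; the rbd-mirror daemon
runs at the passive site):

  # on both clusters: enable per-image mirroring on the pool
  rbd mirror pool enable nfs_rbd image

  # images to be mirrored need journaling (exclusive-lock is a prerequisite)
  rbd feature enable nfs_rbd/share1 exclusive-lock journaling
  rbd mirror image enable nfs_rbd/share1

  # on the passive site: add the primary cluster as a peer and run the daemon
  rbd mirror pool peer add nfs_rbd client.mirror@primary
  systemctl start ceph-rbd-mirror@admin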


Paul


Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Up Safe
Active-passive doesn't sound like what I want.
But maybe I misunderstand.

Does rbd mirror replicate both ways?
And how would I do it with NFS?

Thanks


Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Brady Deetz
At this point in the conversation, based on what's already been said, I
have 2 recommendations.

If you haven't already, read a lot of the architecture documentation for
ceph. This will give you a good idea what capabilities exist and don't
exist.

If after reading the architecture documentation, you are still unsure,
don't invest in Ceph. It's a great platform for many people, but it isn't
for every team or problem.

[ceph-users] samba gateway experiences with cephfs ?

2018-05-21 Thread Jake Grimmett
Dear All,

Excited to see snapshots finally becoming a stable feature in cephfs :)

Unfortunately we have a large number (~200) of Windows and Mac clients
which need CIFS/SMB access to cephfs.

Nonetheless, snapshots have prompted us to start testing ceph to see
if we can use it as a scale-out NAS...

cephfs native performance on our test setup appears good, however tests
accessing via samba have been slightly disappointing, especially with
small file I/O. Large file I/O is fair, but could still be improved.

Using Helios LanTest 6.0.0 on Osx.

Create 300 Files
 Cephfs (kernel) > samba. average 5100 ms
 Isilon > CIFS  average 2600 ms
 ZFS > samba average  121 ms

Remove 300 files
 Cephfs (kernel) > samba. average 2100 ms
 Isilon > CIFS  average  900 ms
 ZFS > samba average  421 ms

Write 300MB to file
 Cephfs (kernel) > samba. average 25 MB/s
 Isilon > CIFS  average  17.9 MB/s
 ZFS > samba average  64.4 MB/s

Hardware Used:
CephFS: five node dual Xeon cluster (120 bluestore OSD, 4 x nvme
metadata for Cephfs, bulk data EC 4+1), Scientific Linux 7.5, ceph
12.2.5, kernel client (fuse significantly slower).
Isilon: 6 year old, 8 x NL108
ZFS: SL 6.4 on a Dell R730XD, 24 x 1.8TB drives

Ceph Samba gateway is a separate machine: dual Xeon, 40Gb ethernet,
128GB RAM, also running SL 7.5.

Finally, is the vfs_ceph module for Samba useful? It doesn't seem to be
widely available pre-compiled for RHEL derivatives. Can anyone
comment on their experiences using vfs_ceph, or point me to a CentOS 7.x
repo that has it?
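
For reference, the smb.conf stanza we'd be testing vfs_ceph with looks roughly like
this (share name, cephx user and path are placeholders, untested here):

  [cephfs]
      path = /
      vfs objects = ceph
      ceph:config_file = /etc/ceph/ceph.conf
      ceph:user_id = samba
      kernel share modes = no
      read only = no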

many thanks for all and any advice,

Jake



Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Brady Deetz
What is your expected behavior for when Client A writes to File B in
Datacenter 1 and Client C writes to File B in Datacenter 2 at the exact
same time?

I don't think you can perfectly achieve what you are requesting with Ceph
or many other storage solutions.


Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Up Safe
I don't believe I have this kind of behavior.
AFAIK, files are created or modified by only 1 client at a time.

>> duty, but we are looking to move things like user home directories onto
>> this.  YMMV.
>>
>>
>>
>>
>>
>> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On
>> Behalf Of *Up Safe
>> *Sent:* Monday, 21 May 2018 5:36 PM
>> *To:* David Turner 
>> *Cc:* ceph-users 
>> *Subject:* Re: [ceph-users] multi site with cephfs
>>
>>
>>
>> Hi,
>>
>> can 

Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Janne Johansson
On Mon, 21 May 2018 at 18:28, Up Safe wrote:

> I don't believe I have this kind of behavior.
> AFAIK, files are created or modified by only 1 client at a time.
>

Make sure that this really is the case, then. It's _very_ easy to start out with
something along the lines of "right now I cannot imagine this", and later on
someone makes a change that allows and performs writes to both sites. If that
part for some reason works without weird races, the solution gets entrenched,
and after that someone finds out there sometimes is a race that sometimes gives
bad results, and no one understands how it got this bad.

Storage solutions require a lot of planning and up-front requirements gathering,
especially when you start doing multi-site things. Not because you can't add
disks, network cards or boxes later, but because some of the early choices
might affect what the setup can or cannot do later on.



> On Mon, May 21, 2018, 19:06 Brady Deetz  wrote:
>
>> What is your expected behavior for when Client A writes to File B in
>> Datacenter 1 and Client C writes to File B in Datacenter 2 at the exact
>> same time?
>>
>> I don't think you can perfectly achieve what you are requesting with Ceph
>> or many other storage solutions.
>>
>>

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help/advice with crush rules

2018-05-21 Thread Andras Pataki

Hi Greg,

Thanks for the detailed explanation - the examples make a lot of sense.

One followup question regarding a two level crush rule like:

step take default
step choose 3 type=rack
step chooseleaf 3 type=host
step emit

If the erasure code has 9 chunks, this lines up exactly without any 
problems.  What if the erasure code isn't a nice product of the racks 
and hosts/rack, for example 6+2 with the above example?  Will it just 
take 3 chunks in the first two racks and 2 from the last without any 
issues?  The other direction I presume can't work, i.e. on the above 
example I can't put any erasure code with more than 9 chunks.


Andras


On 05/18/2018 06:30 PM, Gregory Farnum wrote:
On Thu, May 17, 2018 at 9:05 AM Andras Pataki <apat...@flatironinstitute.org> wrote:


I've been trying to wrap my head around crush rules, and I need some
help/advice.  I'm thinking of using erasure coding instead of
replication, and trying to understand the possibilities for
planning for
failure cases.

For a simplified example, consider a 2 level topology, OSDs live on
hosts, and hosts live in racks.  I'd like to set up a rule for a 6+3
erasure code that would put at most 1 of the 9 chunks on a host,
and no
more than 3 chunks in a rack (so in case the rack is lost, we
still have
a way to recover).  Some racks may not have 3 hosts in them, so they
could potentially accept only 1 or 2 chunks then.  How can something
like this be implemented as a crush rule?  Or, if not exactly this,
something in this spirit?  I don't want to say that all chunks
need to
live in a separate rack because that is too restrictive (some
racks may
be much bigger than others, or there might not even be 9 racks).


Unfortunately what you describe here is a little too detailed in ways 
CRUSH can't easily specify. You should think of a CRUSH rule as a 
sequence of steps that start out at a root (the "take" step), and 
incrementally specify more detail about which piece of the CRUSH 
hierarchy they run on, but run the *same* rule on every piece they select.


So the simplest thing that comes close to what you suggest is:
(forgive me if my syntax is slightly off, I'm doing this from memory)
step take default
step chooseleaf n type=rack
step emit

That would start at the default root, select "n" racks (9, in your 
case) and then for each rack find an OSD within it. (chooseleaf is 
special and more flexibly than most of the CRUSH language; it's nice 
because if it can't find an OSD in one of the selected racks, it will 
pick another rack).

But a rule that's more illustrative of how things work is:
step take default
step choose 3 type=rack
step chooseleaf 3 type=host
step emit

That one selects three racks, then selects three OSDs within different 
hosts *in each rack*. (You'll note that it doesn't necessarily work 
out so well if you don't want 9 OSDs!) If one of the racks it selected 
doesn't have 3 separate hosts...well, tough, it tried to do what you 
told it. :/


If you were dedicated, you could split up your racks into 
equivalently-sized units — let's say rows. Then you could do

step take default
step choose 3 type=row
step chooseleaf 3 type=host
step emit

Assuming you have 3+ rows of good size, that'll get you 9 OSDs which 
are all on different hosts.

-Greg


Thanks,

Andras

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bucket reporting content inconsistently

2018-05-21 Thread Pavan Rallabhandi
This can possibly be due to these issues: http://tracker.ceph.com/issues/20380,
http://tracker.ceph.com/issues/22555

Thanks,

From: ceph-users  on behalf of Tom W 

Date: Saturday, May 12, 2018 at 10:57 AM
To: ceph-users 
Subject: EXT: Re: [ceph-users] Bucket reporting content inconsistently

Thanks for posting this for me, Sean. Just to update: it seems that despite the
bucket checks completing and reporting no issues, the objects continued to show
up in any tool used to list the contents of the bucket.

I put together a simple loop to upload a new file over each existing object and
then trigger a delete request through the API; this seems to be working in
lieu of a cleaner solution.
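Roughly, the loop looks like this (a hedged sketch with placeholder file and
bucket names, assuming s3cmd is already configured against the RGW endpoint):

s3cmd ls -r s3://bucketnamehere | awk '{print $4}' | while read obj; do
    s3cmd put placeholder.pdf "$obj"   # re-upload so the index entry points at a live object
    s3cmd del "$obj"                   # then delete it through the API
done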

We will be upgrading to Luminous in the coming week, I’ll report back if we see 
any significant change in this issue when we do.

Kind Regards,

Tom

From: ceph-users  On Behalf Of Sean Redmond
Sent: 11 May 2018 17:15
To: ceph-users 
Subject: [ceph-users] Bucket reporting content inconsistently


HI all,



We have recently upgraded to 10.2.10 in preparation for our upcoming upgrade to
Luminous, and I have been attempting to remove a bucket. When using tools such
as s3cmd I can see the files listed, verified by checking with bi list too,
as shown below:



root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bi list 
--bucket='bucketnamehere' | grep -i "\"idx\":" | wc -l

3278



However, on attempting to delete the bucket and purge the objects, it appears
not to be recognised:



root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bucket rm --bucket= 
bucketnamehere --purge-objects

2018-05-10 14:11:05.393851 7f0ab07b6a00 -1 ERROR: unable to remove bucket(2) No 
such file or directory



Checking the bucket stats, it does appear that the bucket is reporting no
content, and repeating the above content test shows no change to the 3278
figure:



root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bucket stats 
--bucket="bucketnamehere"

{
    "bucket": "bucketnamehere",
    "pool": ".rgw.buckets",
    "index_pool": ".rgw.buckets.index",
    "id": "default.28142894.1",
    "marker": "default.28142894.1",
    "owner": "16355",
    "ver": "0#5463545,1#5483686,2#5483484,3#5474696,4#5479052,5#5480339,6#5469460,7#5463976",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0",
    "mtime": "2015-12-08 12:42:26.286153",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#",
    "usage": {
        "rgw.main": {
            "size_kb": 0,
            "size_kb_actual": 0,
            "num_objects": 0
        },
        "rgw.multimeta": {
            "size_kb": 0,
            "size_kb_actual": 0,
            "num_objects": 0
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}



I have attempted a bucket index check and fix on this; however, it does not
appear to have made a difference, and no fixes or errors were reported by it.
Does anyone have any advice on how to proceed with removing this content? At
this stage I am not too concerned if the method needed to remove it generates
orphans, as we will shortly be running a large orphan scan after our upgrade to
Luminous. Cluster health otherwise reports normal.



Thanks

Sean Redmond




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help/advice with crush rules

2018-05-21 Thread Gregory Farnum
On Mon, May 21, 2018 at 11:19 AM Andras Pataki <
apat...@flatironinstitute.org> wrote:

> Hi Greg,
>
> Thanks for the detailed explanation - the examples make a lot of sense.
>
> One followup question regarding a two level crush rule like:
>
>
> step take default
> step choose 3 type=rack
> step chooseleaf 3 type=host
> step emit
>
> If the erasure code has 9 chunks, this lines up exactly without any
> problems.  What if the erasure code isn't a nice product of the racks and
> hosts/rack, for example 6+2 with the above example?  Will it just take 3
> chunks in the first two racks and 2 from the last without any issues?
>

Yes, assuming your ceph install is new enough. (At one point it crashed if
you did that :o)



The other direction I presume can't work, i.e. on the above example I can't
> put any erasure code with more than 9 chunks.
>

Right
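
If you want to check this yourself, a hedged sketch of that two-level rule in
decompiled crush map syntax looks like the below (rule name and id are
placeholders), and crushtool can dry-run the placements; --num-rep 8 models a
6+2 profile:

rule ec_racks_hosts {
        id 2
        type erasure
        min_size 3
        max_size 9
        step set_chooseleaf_tries 5
        step take default
        step choose indep 3 type rack
        step chooseleaf indep 3 type host
        step emit
}

# compile the edited map and test chunk placement against the rule
crushtool -c crushmap.txt -o crushmap.bin
crushtool -i crushmap.bin --test --rule 2 --num-rep 8 --show-mappings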


>
> Andras
>
>
>
> On 05/18/2018 06:30 PM, Gregory Farnum wrote:
>
> On Thu, May 17, 2018 at 9:05 AM Andras Pataki <
> apat...@flatironinstitute.org> wrote:
>
>> I've been trying to wrap my head around crush rules, and I need some
>> help/advice.  I'm thinking of using erasure coding instead of
>> replication, and trying to understand the possibilities for planning for
>> failure cases.
>>
>> For a simplified example, consider a 2 level topology, OSDs live on
>> hosts, and hosts live in racks.  I'd like to set up a rule for a 6+3
>> erasure code that would put at most 1 of the 9 chunks on a host, and no
>> more than 3 chunks in a rack (so in case the rack is lost, we still have
>> a way to recover).  Some racks may not have 3 hosts in them, so they
>> could potentially accept only 1 or 2 chunks then.  How can something
>> like this be implemented as a crush rule?  Or, if not exactly this,
>> something in this spirit?  I don't want to say that all chunks need to
>> live in a separate rack because that is too restrictive (some racks may
>> be much bigger than others, or there might not even be 9 racks).
>>
>
> Unfortunately what you describe here is a little too detailed in ways
> CRUSH can't easily specify. You should think of a CRUSH rule as a sequence
> of steps that start out at a root (the "take" step), and incrementally
> specify more detail about which piece of the CRUSH hierarchy they run on,
> but run the *same* rule on every piece they select.
>
> So the simplest thing that comes close to what you suggest is:
> (forgive me if my syntax is slightly off, I'm doing this from memory)
> step take default
> step chooseleaf n type=rack
> step emit
>
> That would start at the default root, select "n" racks (9, in your case)
> and then for each rack find an OSD within it. (chooseleaf is special and
> more flexibly than most of the CRUSH language; it's nice because if it
> can't find an OSD in one of the selected racks, it will pick another rack).
> But a rule that's more illustrative of how things work is:
> step take default
> step choose 3 type=rack
> step chooseleaf 3 type=host
> step emit
>
> That one selects three racks, then selects three OSDs within different
> hosts *in each rack*. (You'll note that it doesn't necessarily work out so
> well if you don't want 9 OSDs!) If one of the racks it selected doesn't
> have 3 separate hosts...well, tough, it tried to do what you told it. :/
>
> If you were dedicated, you could split up your racks into
> equivalently-sized units — let's say rows. Then you could do
> step take default
> step choose 3 type=row
> step chooseleaf 3 type=host
> step emit
>
> Assuming you have 3+ rows of good size, that'll get you 9 OSDs which are
> all on different hosts.
> -Greg
>
>
>>
>> Thanks,
>>
>> Andras
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Too many objects per pg than average: deadlock situation

2018-05-21 Thread Mike A
Hello,

> On 21 May 2018, at 2:05, Sage Weil wrote:
> 
> On Sun, 20 May 2018, Mike A wrote:
>> Hello!
>> 
>> In our cluster, we see a deadlock situation.
>> This is a standard cluster for an OpenStack without a RadosGW, we have a 
>> standard block access pools and one for metrics from a gnocchi.
>> The amount of data in the gnocchi pool is small, but objects are just a lot.
>> 
>> When planning a distribution of PG between pools, the PG are distributed 
>> depending on the estimated data size of each pool. Correspondingly, as 
>> suggested by pgcalc for the gnocchi pool, it is necessary to allocate a 
>> little PG quantity.
>> 
>> As a result, the cluster is constantly hanging with the error "1 pools have 
>> many more objects per pg than average" and this is understandable: the 
>> gnocchi produces a lot of small objects and in comparison with the rest of 
>> pools it is tens times larger.
>> 
>> And here we are at a deadlock:
>> 1. We can not increase the amount of PG on the gnocchi pool, since it is 
>> very small in data size
>> 2. Even if we increase the number of PG - we can cross the recommended 200 
>> PGs limit for each OSD in cluster
>> 3. Constantly holding the cluster in the HEALTH_WARN mode is a bad idea
>> 4. We can set the parameter "mon pg warn max object skew", but we do not 
>> know how the Ceph will work when there is one pool with a huge object / pool 
>> ratio
>> 
>> There is no obvious solution.
>> 
>> How to solve this problem correctly?
> 
> As a workaround, I'd just increase the skew option to make the warning go 
> away.
> 
> It seems to me like the underlying problem is that we're looking at object 
> count vs pg count, but ignoring the object sizes.  Unfortunately it's a 
> bit awkward to fix because we don't have a way to quantify the size of 
> omap objects via the stats (currently).  So for now, just adjust the skew 
> value enough to make the warning go away!
> 
> sage

Ok.
It seems that increasing this config option is the only acceptable approach.
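Something along these lines, presumably (a hedged sketch; the value 50 is just
illustrative, the default is 10, and if I understand correctly the check is
evaluated from the PG stats on the mon/mgr side, so an injectargs or restart
may be needed for it to take effect):

[global]
mon pg warn max object skew = 50

# or at runtime:
ceph tell mon.* injectargs '--mon_pg_warn_max_object_skew 50'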

Thanks!

— 
Mike, runs!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] samba gateway experiences with cephfs ?

2018-05-21 Thread Daniel Baumann
Hi

On 05/21/2018 05:38 PM, Jake Grimmett wrote:
> Unfortunately we have a large number (~200) of Windows and Macs clients
> which need CIFS/SMB  access to cephfs.

we too, which is why we're (partially) exporting cephfs over samba too,
1.5y in production now.

for us, cephfs-over-samba is significantly slower than cephfs directly
too, but it's not really an issue here (basically, if people use a
windows client here, they're already on the slow track anyway).

we had to do two things to get it working reliably though:

a) disable all locking in samba (otherwise "opportunistic locking" from
windows clients killed all MDS daemons within hours (kraken at that time))

b) only allow writes to a specific area of cephfs, reserved for samba
(with luminous; otherwise we'd have problems with data consistency on
cephfs with people writing the same files from linux->cephfs and
samba->cephfs concurrently). my hunch is that samba caches writes and
doesn't give them back appropriately.
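
for reference, the share options this boils down to look roughly like this
(a hedged sketch, not our exact smb.conf; share name and path are placeholders):

[cephfs-share]
    path = /cephfs/samba
    read only = no
    oplocks = no
    level2 oplocks = no
    kernel oplocks = no
    locking = no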

> Finally, is the vfs_ceph module for Samba useful? It doesn't seem to be
> widely available pre-complied for for RHEL derivatives. Can anyone
> comment on their experiences using vfs_ceph, or point me to a Centos 7.x
> repo that has it?

we use debian, with a backported kernel and backported samba, which has
vfs_ceph pre-compiled. however, we couldn't make vfs_ceph work at all -
the snapshot patterns just don't seem to match/align (and nothing we
tried seemed to work).

Regards,
Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW won't start after upgrade to 12.2.5

2018-05-21 Thread Marc Spencer
Hi,

  I have a test cluster of 4 servers running Luminous. We were running 12.2.2 
under Fedora 17 and have just completed upgrading to 12.2.5 under Fedora 18.

  All seems well: all MONs are up, OSDs are up, I can see objects stored as 
expected with rados -p default.rgw.buckets.data ls. 

  But when I start RGW, my load goes through the roof as radosgw core dumps
continuously in rapid succession.

Log Excerpt:


… 

   -16> 2018-05-21 15:52:48.244579 7fc70eeda700  5 -- 10.19.33.13:0/3446208184 
>> 10.19.33.14:6800/1417 conn(0x55e78a610800 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14567 cs=1 l=1). rx osd.6 seq 
7 0x55e78a67b500 osd_op_reply(47 notify.6 [watch watch cookie 94452947886080] 
v1092'43446 uv43445 ondisk = 0) v8
   -15> 2018-05-21 15:52:48.244619 7fc70eeda700  1 -- 10.19.33.13:0/3446208184 
<== osd.6 10.19.33.14:6800/1417 7  osd_op_reply(47 notify.6 [watch watch 
cookie 94452947886080] v1092'43446 uv43445 ondisk = 0) v8  152+0+0 
(1199963694 0 0) 0x55e78a67b500 con 0x55e78a610800
   -14> 2018-05-21 15:52:48.244777 7fc723656000  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.15:6800/1433 -- osd_op(unknown.0.0:48 16.1 
16:93e5b521:::notify.7:head [create] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a67bc00 con 0
   -13> 2018-05-21 15:52:48.275650 7fc70eeda700  5 -- 10.19.33.13:0/3446208184 
>> 10.19.33.15:6800/1433 conn(0x55e78a65e000 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14572 cs=1 l=1). rx osd.2 seq 
7 0x55e78a678380 osd_op_reply(48 notify.7 [create] v1092'43453 uv43453 ondisk = 
0) v8
   -12> 2018-05-21 15:52:48.275675 7fc70eeda700  1 -- 10.19.33.13:0/3446208184 
<== osd.2 10.19.33.15:6800/1433 7  osd_op_reply(48 notify.7 [create] 
v1092'43453 uv43453 ondisk = 0) v8  152+0+0 (2720997170 0 0) 0x55e78a678380 
con 0x55e78a65e000
   -11> 2018-05-21 15:52:48.275849 7fc723656000  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.15:6800/1433 -- osd_op(unknown.0.0:49 16.1 
16:93e5b521:::notify.7:head [watch watch cookie 94452947887232] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a688000 con 0
   -10> 2018-05-21 15:52:48.296799 7fc70eeda700  5 -- 10.19.33.13:0/3446208184 
>> 10.19.33.15:6800/1433 conn(0x55e78a65e000 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14572 cs=1 l=1). rx osd.2 seq 
8 0x55e78a688000 osd_op_reply(49 notify.7 [watch watch cookie 94452947887232] 
v1092'43454 uv43453 ondisk = 0) v8
-9> 2018-05-21 15:52:48.296824 7fc70eeda700  1 -- 10.19.33.13:0/3446208184 
<== osd.2 10.19.33.15:6800/1433 8  osd_op_reply(49 notify.7 [watch watch 
cookie 94452947887232] v1092'43454 uv43453 ondisk = 0) v8  152+0+0 
(3812136207 0 0) 0x55e78a688000 con 0x55e78a65e000
-8> 2018-05-21 15:52:48.296924 7fc723656000  2 all 8 watchers are set, 
enabling cache
-7> 2018-05-21 15:52:48.297135 7fc57cbb6700  2 garbage collection: start
-6> 2018-05-21 15:52:48.297185 7fc57c3b5700  2 object expiration: start
-5> 2018-05-21 15:52:48.297321 7fc57cbb6700  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.16:6804/1596 -- osd_op(unknown.0.0:50 18.3 
18:d242335b:gc::gc.2:head [call lock.lock] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a692000 con 0
-4> 2018-05-21 15:52:48.297395 7fc57c3b5700  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.16:6804/1596 -- osd_op(unknown.0.0:51 18.0 
18:1a734c59:::obj_delete_at_hint.00:head [call lock.lock] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a692380 con 0
-3> 2018-05-21 15:52:48.299463 7fc568b8e700  5 schedule life cycle next 
start time: Tue May 22 04:00:00 2018
-2> 2018-05-21 15:52:48.299528 7fc567b8c700  5 ERROR: sync_all_users() 
returned ret=-2
-1> 2018-05-21 15:52:48.299698 7fc56738b700  1 -- 10.19.33.13:0/3446208184 
--> 10.19.33.14:6800/1417 -- osd_op(unknown.0.0:52 18.7 
18:e9187ab8:reshard::reshard.00:head [call lock.lock] snapc 0=[] 
ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a54fc00 con 0
 0> 2018-05-21 15:52:48.301978 7fc723656000 -1 *** Caught signal (Aborted) 
**
 in thread 7fc723656000 thread_name:radosgw

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous 
(stable)
 1: (()+0x22d82c) [0x55e7861a882c]
 2: (()+0x11fb0) [0x7fc719270fb0]
 3: (gsignal()+0x10b) [0x7fc716603f4b]
 4: (abort()+0x12b) [0x7fc7165ee591]
 5: (parse_rgw_ldap_bindpw[abi:cxx11](CephContext*)+0x68b) [0x55e78647409b]
 6: (rgw::auth::s3::LDAPEngine::init(CephContext*)+0xb9) [0x55e7863a38f9]
 7: (rgw::auth::s3::ExternalAuthStrategy::ExternalAuthStrategy(CephContext*, 
RGWRados*, rgw::auth::s3::AWSEngine::VersionAbstractor*)+0x74) [0x55e786154bc4]
 8: (std::__shared_ptr::__shared_ptr,
 CephContext* const&, RGWRados* const&>(std::_Sp_make_shared_tag, 
std::allocator const&, CephContext* const&, 
RGWRados* const&)+0xf8) [0x55e786158f78]
 9: (main()+0x196b) [0x55e78614463b]
 10: (__libc_start_main()+0xeb) [0x7fc7165f01bb]
 11: (_start()+0x2a) [0x55e78614c3da]
 NOTE: a copy of the executable, or `o

[ceph-users] Build the ceph daemon image

2018-05-21 Thread Ashutosh Narkar
Hello,

I am new to Ceph and trying to deploy Ceph on Kube by following
https://github.com/ceph/ceph-container/tree/master/examples/kubernetes.

I was getting the below error from the *mon* pod

monmaptool: error writing to '/etc/ceph/monmap-ceph': (30) Read-only file system


Hence I changed
MONMAP=/etc/ceph/monmap-${CLUSTER} to MONMAP=/var/lib/ceph/monmap-${CLUSTER}.


Now how do I rebuild the ceph/daemon image?
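
(From what I can tell, the image is rebuilt from a checkout of the
ceph-container repo with its Makefile -- a hedged guess on my part, and the
flavor string below is an assumption, not something I have verified:)

git clone https://github.com/ceph/ceph-container.git
cd ceph-container
# apply the entrypoint change, then build a local ceph/daemon image
make FLAVORS="luminous,centos,7" build
docker images | grep daemon   # confirm the locally built tag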

I appreciate your help.

Thanks
Ash
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Crush Map Changed After Reboot

2018-05-21 Thread Martin, Jeremy
Hello,

I had a ceph cluster up and running for a few months now and all has been well
and good, except for today when I updated two OSD nodes; those two are still fine.
These two nodes are designated within a rack and the rack is the failure domain,
so they are essentially mirrors of each other.  The issue came when I updated and
rebooted the third node, which has internal disks plus external disks in a shelf;
there the failure domain is at the actual OSD level, as these are normal off-the-shelf
disks for low-priority storage that is not mission critical.  The issue is that
before the reboot the crush map looked and behaved correctly, but after the reboot
the crush map was changed and had to be rebuilt to get the storage back online.
All was well after the reassignment, but I need to track down why it lost its
configuration.  The main difference here is that the first four disks (34-37) are
supposed to be assigned to the chassis ceph-osd3-internal (like the before) and
21-31 assigned to the chassis ceph-osd3-shelf1 (again like the before).  After
the reboot everything (34-37 and 21-31) was reassigned to the host ceph-osd3.
The update was from 12.2.4 to 12.2.5.  Any thoughts?

Jeremy

Before

ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
-58  0 root osd3-internal
-54  0 chassis ceph-osd3-internal
 34   hdd  0.42899 osd.34 up  1.0 1.0
 35   hdd  0.42899 osd.35 up  1.0 1.0
 36   hdd  0.42899 osd.36 up  1.0 1.0
 37   hdd  0.42899 osd.37 up  1.0 1.0
-50  0 root osd3-shelf1
-56  0 chassis ceph-osd3-shelf1
 21   hdd  1.81898 osd.21 up  1.0 1.0
 22   hdd  1.81898 osd.22 up  1.0 1.0
 23   hdd  1.81898 osd.23 up  1.0 1.0
 24   hdd  1.81898 osd.24 up  1.0 1.0
 25   hdd  1.81898 osd.25 up  1.0 1.0
 26   hdd  1.81898 osd.26 up  1.0 1.0
 27   hdd  1.81898 osd.27 up  1.0 1.0
 28   hdd  1.81898 osd.28 up  1.0 1.0
 29   hdd  1.81898 osd.29 up  1.0 1.0
 30   hdd  1.81898 osd.30 up  1.0 1.0
 31   hdd  1.81898 osd.31 up  1.0 1.0
 -7  0 host ceph-osd3
 -1   47.21199 root default
-40   23.59000 rack mainehall
 -3   23.59000 host ceph-osd1
  0   hdd  1.81898 osd.0  up  1.0 1.0
  Additional osd's left off for brevity
-5   23.62199 host ceph-osd2
 11   hdd  1.81898 osd.11 up  1.0 1.0
  Additional osd's left off for brevity

After

ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
-58  0 root osd3-internal
-54  0 chassis ceph-osd3-internal
-50  0 root osd3-shelf1
-56  0 chassis ceph-osd3-shelf1
-7   0 host ceph-osd3
 21   hdd  1.81898 osd.21 up  1.0 1.0
 22   hdd  1.81898 osd.22 up  1.0 1.0
 23   hdd  1.81898 osd.23 up  1.0 1.0
 24   hdd  1.81898 osd.24 up  1.0 1.0
 25   hdd  1.81898 osd.25 up  1.0 1.0
 26   hdd  1.81898 osd.26 up  1.0 1.0
 27   hdd  1.81898 osd.27 up  1.0 1.0
 28   hdd  1.81898 osd.28 up  1.0 1.0
 29   hdd  1.81898 osd.29 up  1.0 1.0
 30   hdd  1.81898 osd.30 up  1.0 1.0
 31   hdd  1.81898 osd.31 up  1.0 1.0
 34   hdd  0.42899 osd.34 up  1.0 1.0
 35   hdd  0.42899 osd.35 up  1.0 1.0
 36   hdd  0.42899 osd.36 up  1.0 1.0
 37   hdd  0.42899 osd.37 up  1.0 1.0
 -1   47.21199 root default
-40   23.59000 rack mainehall
 -3   23.59000 host ceph-osd1
  0   hdd  1.81898 osd.0  up  1.0 1.0
  Additional osd's left off for brevity
-42   23.62199 rack rangleyhall
 -5   23.62199 host ceph-osd2
 11   hdd  1.81898 osd.11 up  1.0 1.0
  Additional osd's left off for brevity
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] multi site with cephfs

2018-05-21 Thread Adrian Saul

You have the same performance problem then, regardless of what platform you
choose to present it on.  If you want cross-site consistency with a single
consistent view, you need to replicate writes synchronously between sites,
which will induce a performance hit for writes.  Any other snapshot/async
setup, while improving write performance, leaves you with that time-window gap
should you lose a site.

If you are not particularly latency sensitive on writes (i.e. these are just
small documents being written and left behind) then the write latency penalty
is probably not that big an issue for the easier access a stretched CephFS
filesystem would give you.  If your clients can access cephfs natively that
might be cleaner than using NFS over the top, although it means giving clients
full access to the ceph public network – otherwise my previously mentioned
NFS export with automount would probably work for you.
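
For reference, the export/automount wiring I described earlier looks roughly
like this (a hedged sketch with placeholder hosts, networks and paths, not our
exact config):

# on each NFS gateway host (CephFS kernel-mounted at /cephfs), /etc/exports:
/cephfs/shared  10.0.0.0/24(rw,sync,no_subtree_check)

# one DNS A record (e.g. nfsgw.example.com) resolving to all four gateway IPs

# on the clients, /etc/auto.master:
/-   /etc/auto.nfs
# and /etc/auto.nfs:
/mnt/shared  -fstype=nfs,vers=3  nfsgw.example.com:/cephfs/shared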


From: Up Safe [mailto:upands...@gmail.com]
Sent: Tuesday, 22 May 2018 12:33 AM
To: David Turner 
Cc: Adrian Saul ; ceph-users 

Subject: Re: [ceph-users] multi site with cephfs

I'll explain.
Right now we have 2 sites (racks) with several dozens of servers at each
accessing a NAS (let's call it a NAS, although it's an IBM v7000 Unified that 
serves the files via NFS).

The biggest problem is that it works active-passive, i.e. we always access one 
of the storages for read/write
and the other one is replicated once every few hours, so it's more for backup 
needs.
In this setup once the power goes down in our main site - we're stuck with a 
bit (several hours) outdated files
and we need to remount all of the servers and what not.
The multi site ceph was supposed to solve this problem for us. This way we 
would have only local mounts, i.e.
each server would only access the filesystem that is in the same site. And if 
one of the sited go down - no pain.
The files are rather small, pdfs and xml of 50-300KB mostly.
The total size is about 25 TB right now.

We're a low budget company, so your advise about developing is not going to 
happen as we have no such skills or resources for this.
Plus, I want to make this transparent for the devs and everyone - just an 
infrastructure replacement that will buy me all of the ceph benefits and
allow the company to survive the power outages or storage crashes.


On Mon, May 21, 2018 at 5:12 PM, David Turner <drakonst...@gmail.com> wrote:
Not a lot of people use object storage multi-site.  I doubt anyone is using 
this like you are.  In theory it would work, but even if somebody has this 
setup running, it's almost impossible to tell if it would work for your needs 
and use case.  You really should try it out for yourself to see if it works to 
your needs.  And if you feel so inclined, report back here with how it worked.

If you're asking for advice, why do you need a networked posix filesystem?  
Unless you are using proprietary software with this requirement, it's generally 
lazy coding that requires a mounted filesystem like this and you should aim 
towards using object storage instead without any sort of NFS layer.  It's a 
little more work for the developers, but is drastically simpler to support and 
manage.

On Mon, May 21, 2018 at 10:06 AM Up Safe <upands...@gmail.com> wrote:
guys,
please tell me if I'm in the right direction.
If ceph object storage can be set up in multi site configuration,
and I add ganesha (which to my understanding is an "adapter"
that serves s3 objects via nfs to clients) -
won't this work as active-active?


Thanks

On Mon, May 21, 2018 at 11:48 AM, Up Safe <upands...@gmail.com> wrote:
ok, thanks.
but it seems to me that having pool replicas spread over sites is a bit too 
risky performance wise.
how about ganesha? will it work with cephfs and multi site setup?
I was previously reading about rgw with ganesha and it was full of limitations.
with cephfs - there is only one and one I can live with.
Will it work?

On Mon, May 21, 2018 at 10:57 AM, Adrian Saul <adrian.s...@tpgtelecom.com.au> wrote:

We run CephFS in a limited fashion in a stretched cluster of about 40km with 
redundant 10G fibre between sites – link latency is in the order of 1-2ms.  
Performance is reasonable for our usage but is noticeably slower than 
comparable local ceph based RBD shares.

Essentially we just setup the ceph pools behind cephFS to have replicas on each 
site.  To export it we are simply using Linux kernel NFS and it gets exported 
from 4 hosts that act as CephFS clients.  Those 4 hosts are then setup in an 
DNS record that resolves to all 4 IPs, and we then use automount to do 
automatic mounting and host failover on the NFS clients.  Automount takes care 
of finding the quickest and available NFS server.

I stress this is a limited setup that we use for some fairly light duty, but we 
are looking to move things like user home directories onto this.  YMMV.


From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.c

Re: [ceph-users] Crush Map Changed After Reboot

2018-05-21 Thread David Turner
Your problem sounds like osd_crush_update_on_start.  While it is set to the
default of true, an OSD tells the mons which server it is on when it starts,
and the mons update the crush map to reflect that.  Your OSDs are running on
the host but placed under a custom host/chassis in the crush map, so when they
start they get moved back to the host they are actually running on.  You
probably want to disable that in the config so that your custom crush
placement is not altered.
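
For example, something like this in ceph.conf on that OSD host should stop the
OSDs from re-homing themselves on start (a minimal sketch; a custom
"osd crush location hook" script is the other common way to handle it):

[osd]
osd crush update on start = false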

On Mon, May 21, 2018, 6:29 PM Martin, Jeremy  wrote:

> Hello,
>
> I had a ceph cluster up and running for a few months now and all has been
> well and good except for today where I updated two osd nodes and well still
> well, these two nodes are designated within a rack and the rack is the
> failure domain so they are essentially mirrors of each.  The issue came
> when I updated and rebooted the third node which has internal and external
> disks in a shelf and the failure domain is at the actual osd level as these
> are normal off the shelf disks for low priority storage that is not mission
> critical.  The issue is that before the reboot the crush map look and
> behaved correctly but after the reboot the crush map was changed and had to
> be rebuilt to get the storage back online, all was well after the
> reassignment by I need to track down why it lots it configuration.  The
> main differences here is that the first four disks (34-37) are supposed to
> be assigned to the chassis ceph-osd3-internal (like the before) and 21-31
> assigned to chassis chassis-ceph-osd3-shelf1
>  (again like the before).  After the reboot everything (34-37 and 21-31)
> was reassigned to the host ceph-osd3.  Update was from 12.2.4 to 12.2.5.
> Any thoughts?
>
> Jeremy
>
> Before
>
> ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
> -58  0 root osd3-internal
> -54  0 chassis ceph-osd3-internal
>  34   hdd  0.42899 osd.34 up  1.0 1.0
>  35   hdd  0.42899 osd.35 up  1.0 1.0
>  36   hdd  0.42899 osd.36 up  1.0 1.0
>  37   hdd  0.42899 osd.37 up  1.0 1.0
> -50  0 root osd3-shelf1
> -56  0 chassis ceph-osd3-shelf1
>  21   hdd  1.81898 osd.21 up  1.0 1.0
>  22   hdd  1.81898 osd.22 up  1.0 1.0
>  23   hdd  1.81898 osd.23 up  1.0 1.0
>  24   hdd  1.81898 osd.24 up  1.0 1.0
>  25   hdd  1.81898 osd.25 up  1.0 1.0
>  26   hdd  1.81898 osd.26 up  1.0 1.0
>  27   hdd  1.81898 osd.27 up  1.0 1.0
>  28   hdd  1.81898 osd.28 up  1.0 1.0
>  29   hdd  1.81898 osd.29 up  1.0 1.0
>  30   hdd  1.81898 osd.30 up  1.0 1.0
>  31   hdd  1.81898 osd.31 up  1.0 1.0
>  -7  0 host ceph-osd3
>  -1   47.21199 root default
> -40   23.59000 rack mainehall
>  -3   23.59000 host ceph-osd1
>   0   hdd  1.81898 osd.0  up  1.0 1.0
>   Additional osd's left off for brevity
> -5   23.62199 host ceph-osd2
>  11   hdd  1.81898 osd.11 up  1.0 1.0
>   Additional osd's left off for brevity
>
> After
>
> ID  CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
> -58  0 root osd3-internal
> -54  0 chassis ceph-osd3-internal
> -50  0 root osd3-shelf1
> -56  0 chassis ceph-osd3-shelf1
> -7   0 host ceph-osd3
>  21   hdd  1.81898 osd.21 up  1.0 1.0
>  22   hdd  1.81898 osd.22 up  1.0 1.0
>  23   hdd  1.81898 osd.23 up  1.0 1.0
>  24   hdd  1.81898 osd.24 up  1.0 1.0
>  25   hdd  1.81898 osd.25 up  1.0 1.0
>  26   hdd  1.81898 osd.26 up  1.0 1.0
>  27   hdd  1.81898 osd.27 up  1.0 1.0
>  28   hdd  1.81898 osd.28 up  1.0 1.0
>  29   hdd  1.81898 osd.29 up  1.0 1.0
>  30   hdd  1.81898 osd.30 up  1.0 1.0
>  31   hdd  1.81898 osd.31 up  1.0 1.0
>  34   hdd  0.42899 osd.34 up  1.0 1.0
>  35   hdd  0.42899 osd.35 up  1.0 1.0
>  36   hdd  0.42899 osd.36 up  1.0 1.0
>  37   hdd  0.42899 osd.37 

Re: [ceph-users] RGW won't start after upgrade to 12.2.5

2018-05-21 Thread Marc Spencer
I found the issue, for the curious.

The default configuration for rgw_ldap_secret seems to be set to
/etc/openldap/secret, which on my system does not exist:

# ceph-conf -D | grep ldap
rgw_ldap_binddn = uid=admin,cn=users,dc=example,dc=com
rgw_ldap_dnattr = uid
rgw_ldap_searchdn = cn=users,cn=accounts,dc=example,dc=com
rgw_ldap_searchfilter = 
rgw_ldap_secret = /etc/openldap/secret
rgw_ldap_uri = ldaps://
rgw_s3_auth_use_ldap = false

# cat /etc/openldap/secret
cat: /etc/openldap/secret: No such file or directory

But the code assumes that if it is set, the named file has content. Since it 
doesn’t, safe_read_file() asserts.

I set it to nothing (rgw_ldap_secret = ) in my configuration, and everything 
seems happy.
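
For the record, the workaround amounts to this in ceph.conf (a minimal sketch;
the client section name below is a placeholder for whatever your rgw instance uses):

[client.rgw.gateway1]
rgw_s3_auth_use_ldap = false
rgw_ldap_secret =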

std::string parse_rgw_ldap_bindpw(CephContext* ctx)
{
  string ldap_bindpw;
  string ldap_secret = ctx->_conf->rgw_ldap_secret;

  if (ldap_secret.empty()) {
    ldout(ctx, 10)
      << __func__ << " LDAP auth no rgw_ldap_secret file found in conf"
      << dendl;
  } else {
    char bindpw[1024];
    memset(bindpw, 0, 1024);
    int pwlen = safe_read_file("" /* base */, ldap_secret.c_str(),
                               bindpw, 1023);
    if (pwlen) {
      ldap_bindpw = bindpw;
      boost::algorithm::trim(ldap_bindpw);
      if (ldap_bindpw.back() == '\n')
        ldap_bindpw.pop_back();
    }
  }

  return ldap_bindpw;
}


> On May 21, 2018, at 5:27 PM, Marc Spencer  > wrote:
> 
> Hi,
> 
>   I have a test cluster of 4 servers running Luminous. We were running 12.2.2 
> under Fedora 17 and have just completed upgrading to 12.2.5 under Fedora 18.
> 
>   All seems well: all MONs are up, OSDs are up, I can see objects stored as 
> expected with rados -p default.rgw.buckets.data ls. 
> 
>   But when i start RGW, my load goes through the roof as radosgw continuously 
> rapid-fire core dumps. 
> 
> Log Excerpt:
> 
> 
> … 
> 
>-16> 2018-05-21 15:52:48.244579 7fc70eeda700  5 -- 
> 10.19.33.13:0/3446208184 >> 10.19.33.14:6800/1417 conn(0x55e78a610800 :-1 
> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14567 cs=1 l=1). rx osd.6 
> seq 7 0x55e78a67b500 osd_op_reply(47 notify.6 [watch watch cookie 
> 94452947886080] v1092'43446 uv43445 ondisk = 0) v8
>-15> 2018-05-21 15:52:48.244619 7fc70eeda700  1 -- 
> 10.19.33.13:0/3446208184 <== osd.6 10.19.33.14:6800/1417 7  
> osd_op_reply(47 notify.6 [watch watch cookie 94452947886080] v1092'43446 
> uv43445 ondisk = 0) v8  152+0+0 (1199963694 0 0) 0x55e78a67b500 con 
> 0x55e78a610800
>-14> 2018-05-21 15:52:48.244777 7fc723656000  1 -- 
> 10.19.33.13:0/3446208184 --> 10.19.33.15:6800/1433 -- osd_op(unknown.0.0:48 
> 16.1 16:93e5b521:::notify.7:head [create] snapc 0=[] 
> ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a67bc00 con 0
>-13> 2018-05-21 15:52:48.275650 7fc70eeda700  5 -- 
> 10.19.33.13:0/3446208184 >> 10.19.33.15:6800/1433 conn(0x55e78a65e000 :-1 
> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14572 cs=1 l=1). rx osd.2 
> seq 7 0x55e78a678380 osd_op_reply(48 notify.7 [create] v1092'43453 uv43453 
> ondisk = 0) v8
>-12> 2018-05-21 15:52:48.275675 7fc70eeda700  1 -- 
> 10.19.33.13:0/3446208184 <== osd.2 10.19.33.15:6800/1433 7  
> osd_op_reply(48 notify.7 [create] v1092'43453 uv43453 ondisk = 0) v8  
> 152+0+0 (2720997170 0 0) 0x55e78a678380 con 0x55e78a65e000
>-11> 2018-05-21 15:52:48.275849 7fc723656000  1 -- 
> 10.19.33.13:0/3446208184 --> 10.19.33.15:6800/1433 -- osd_op(unknown.0.0:49 
> 16.1 16:93e5b521:::notify.7:head [watch watch cookie 94452947887232] snapc 
> 0=[] ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a688000 con 0
>-10> 2018-05-21 15:52:48.296799 7fc70eeda700  5 -- 
> 10.19.33.13:0/3446208184 >> 10.19.33.15:6800/1433 conn(0x55e78a65e000 :-1 
> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14572 cs=1 l=1). rx osd.2 
> seq 8 0x55e78a688000 osd_op_reply(49 notify.7 [watch watch cookie 
> 94452947887232] v1092'43454 uv43453 ondisk = 0) v8
> -9> 2018-05-21 15:52:48.296824 7fc70eeda700  1 -- 
> 10.19.33.13:0/3446208184 <== osd.2 10.19.33.15:6800/1433 8  
> osd_op_reply(49 notify.7 [watch watch cookie 94452947887232] v1092'43454 
> uv43453 ondisk = 0) v8  152+0+0 (3812136207 0 0) 0x55e78a688000 con 
> 0x55e78a65e000
> -8> 2018-05-21 15:52:48.296924 7fc723656000  2 all 8 watchers are set, 
> enabling cache
> -7> 2018-05-21 15:52:48.297135 7fc57cbb6700  2 garbage collection: start
> -6> 2018-05-21 15:52:48.297185 7fc57c3b5700  2 object expiration: start
> -5> 2018-05-21 15:52:48.297321 7fc57cbb6700  1 -- 
> 10.19.33.13:0/3446208184 --> 10.19.33.16:6804/1596 -- osd_op(unknown.0.0:50 
> 18.3 18:d242335b:gc::gc.2:head [call lock.lock] snapc 0=[] 
> ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a692000 con 0
> -4> 2018-05-21 15:52:48.297395 7fc57c3b5700  1 -- 
> 10.19.33.13:0/3446208184 --> 10.19.33.16:6804/1596 -- osd_op(unknown.0.0:51 
> 18.0 18:1a734c59:::obj_delete_at_hint.00:head [call l

[ceph-users] how to export a directory to a specific rank manually

2018-05-21 Thread Wuxiaochen Wu
Hi  

I have deployed a cluster with two active MDSes. My question is how to export a 
directory to a specific rank manually, like from 0 to 1.

I’ve tried to run the command “ceph daemon mds.xxx export dir /som_dir 1”.
However, I found the subtree still exists in rank 0 when running the command
“ceph daemon mds.xxx get subtrees”.

In addition, I tried to use the xattr “dir pin” and it works. Another question
is why the “export dir” command did not work.
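
For reference, the pin I mention is just an extended attribute on the directory
(a hedged example, assuming the filesystem is mounted at /mnt/cephfs):

setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some_dir    # pin this subtree to rank 1
getfattr -n ceph.dir.pin /mnt/cephfs/some_dir         # verify the pin value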

Cheers,

Wu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW won't start after upgrade to 12.2.5

2018-05-21 Thread Konstantin Shalygin

The default configuration for rgw_ldap_secret seems to be set to 
/etc/openldap/secret, which on my system is empty:



Please create an issue on the tracker.

Thanks.



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-21 Thread nokia ceph
Hi Ceph users,

We have a cluster with 5 nodes (67 disks) and an EC 4+1 configuration with
min_size set to 4.
Ceph version : 12.2.5
While executing one of our resilience use cases (taking the private interface
down on one of the nodes), up to Kraken we saw only a short rados outage (~60s).

Now with Luminous, we see a rados read/write outage of more than 200s. In the
logs we can see that peer OSDs report that the node's OSDs are down, but the
OSDs defend themselves as being wrongly marked down and do not move to the
down state for a long time.

2018-05-22 05:37:17.871049 7f6ac71e6700  0 log_channel(cluster) log [WRN] :
Monitor daemon marked osd.1 down, but it is still running
2018-05-22 05:37:17.871072 7f6ac71e6700  0 log_channel(cluster) log [DBG] :
map e35690 wrongly marked me down at e35689
2018-05-22 05:37:17.878347 7f6ac71e6700  0 osd.1 35690 crush map has
features 1009107927421960192, adjusting msgr requires for osds
2018-05-22 05:37:18.296643 7f6ac71e6700  0 osd.1 35691 crush map has
features 1009107927421960192, adjusting msgr requires for osds


Only when all 67 OSDs have moved to the down state does the read/write traffic
resume.
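
(For context, these are the failure-reporting options involved as far as I
understand them; a hedged illustration of the stock defaults, not values we
have tuned:)

[mon]
mon osd min down reporters = 2
mon osd reporter subtree level = host
mon osd adjust heartbeat grace = true
[osd]
osd heartbeat grace = 20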

Could you please help us resolve this issue? If it is a bug, we will create a
corresponding ticket.

Thanks,
Muthu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com