[ceph-users] Memory usage of OSD

2020-05-12 Thread Rafał Wądołowski
Hi,
I noticed a strange situation in one of our clusters. The OSD daemons are taking
too much RAM.
We are running 12.2.12 and have the default osd_memory_target
(4 GiB).
Heap dump shows:

osd.2969 dumping heap profile now.

MALLOC:     6381526944 ( 6085.9 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +    173373288 (  165.3 MiB) Bytes in central cache freelist
MALLOC: +     17163520 (   16.4 MiB) Bytes in transfer cache freelist
MALLOC: +     95339512 (   90.9 MiB) Bytes in thread cache freelists
MALLOC: +     28995744 (   27.7 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   6696399008 ( 6386.2 MiB) Actual memory used (physical + swap)
MALLOC: +    218267648 (  208.2 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   6914666656 ( 6594.3 MiB) Virtual address space used
MALLOC:
MALLOC:         408276              Spans in use
MALLOC:             75              Thread heaps in use
MALLOC:           8192              Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
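
For what it's worth, this release can be triggered from the ceph CLI, e.g.:

ceph tell osd.2969 heap release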

IMO "Bytes in use by application" should be less than osd_memory_target. Am I 
correct?
I checked the heap dump with google-pprof and got the following results.
Total: 149.4 MB
60.5  40.5%  40.5% 60.5  40.5% 
rocksdb::UncompressBlockContentsForCompressionType
34.2  22.9%  63.4% 34.2  22.9% ceph::buffer::create_aligned_in_mempool
11.9   7.9%  71.3% 12.1   8.1% std::_Rb_tree::_M_emplace_hint_unique
10.7   7.1%  78.5% 71.2  47.7% rocksdb::ReadBlockContents

Does it mean that most of the RAM is used by rocksdb?

How can I take a deeper look into memory usage?
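
So far I have been looking at it with commands along these lines (osd.2969 as
in the dump above; the heap profiler requires tcmalloc):

ceph tell osd.2969 heap start_profiler
ceph tell osd.2969 heap dump            # writes the profile that google-pprof reads
ceph daemon osd.2969 dump_mempools      # per-subsystem view, worth comparing with the heap profile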


Regards,

Rafał Wądołowski



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs taking too much memory, for pglog

2020-05-12 Thread Harald Staub

Hi Mark

Thank you for your feedback!

The maximum number of PGs per OSD is only 123. But we have PGs with a
lot of objects: for RGW, there is an 8+3 EC pool with 1024 PGs and 900M
objects, so maybe this is the problematic part. The OSDs are 510 HDDs and 32 SSDs.


Not sure: do you suggest using something like
ceph-objectstore-tool --op trim-pg-log ?
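
i.e. per PG something like this, with the OSD stopped first (OSD id and pgid
are just placeholders):

systemctl stop ceph-osd@NN
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN --pgid 1.2a --op trim-pg-log
systemctl start ceph-osd@NN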

When done correctly, would the risk be a lot of backfilling? Or also 
data loss?


Also, getting the cluster up is one thing; keeping it running seems to be
a real challenge right now (OOM killer) ...


Cheers
 Harry

On 13.05.20 07:10, Mark Nelson wrote:

Hi Harald,


Changing the bluestore cache settings will have no effect at all on 
pglog memory consumption.  You can try either reducing the number of PGs 
(you might want to check and see how many PGs you have and specifically 
how many PGs on that OSD), or decrease the number of pglog entries per 
PG.  Keep in mind that fewer PG log entries may impact recovery.  FWIW, 
8.5GB of memory usage for pglog implies that you have a lot of PGs per 
OSD, so that's probably the first place to look.



Good luck!

Mark


On 5/12/20 5:10 PM, Harald Staub wrote:
Several OSDs of one of our clusters are down currently because RAM 
usage has increased during the last days. Now it is more than we can 
handle on some systems. Frequently OSDs get killed by the OOM killer. 
Looking at "ceph daemon osd.$OSD_ID dump_mempools", it shows that 
nearly all (about 8.5 GB) is taken by osd_pglog, e.g.


    "osd_pglog": {
    "items": 461859,
    "bytes": 8445595868
    },

We tried to reduce it, with "osd memory target" and even with 
"bluestore cache autotune = false" (together with "bluestore cache 
size hdd"), but there was no effect at all.


I remember the pglog_hardlimit parameter, but that is already set by 
default with Nautilus I read. I.e. this is on Nautilus, 14.2.8.


Is there a way to limit this pglog memory?

Cheers
 Harry
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs taking too much memory, for pglog

2020-05-12 Thread Mark Nelson

Hi Harald,


Changing the bluestore cache settings will have no effect at all on 
pglog memory consumption.  You can try either reducing the number of PGs 
(you might want to check and see how many PGs you have and specifically 
how many PGs on that OSD), or decreasing the number of pglog entries per 
PG.  Keep in mind that fewer PG log entries may impact recovery.  FWIW, 
8.5GB of memory usage for pglog implies that you have a lot of PGs per 
OSD, so that's probably the first place to look.
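
A rough sketch of how to check and tune that (the values are examples, not
recommendations; fewer pg log entries can force backfill instead of log-based
recovery):

ceph osd df tree            # the PGS column shows how many PGs each OSD carries
ceph config set osd osd_min_pg_log_entries 500
ceph config set osd osd_max_pg_log_entries 2000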



Good luck!

Mark


On 5/12/20 5:10 PM, Harald Staub wrote:
Several OSDs of one of our clusters are down currently because RAM 
usage has increased during the last days. Now it is more than we can 
handle on some systems. Frequently OSDs get killed by the OOM killer. 
Looking at "ceph daemon osd.$OSD_ID dump_mempools", it shows that 
nearly all (about 8.5 GB) is taken by osd_pglog, e.g.


    "osd_pglog": {
    "items": 461859,
    "bytes": 8445595868
    },

We tried to reduce it, with "osd memory target" and even with 
"bluestore cache autotune = false" (together with "bluestore cache 
size hdd"), but there was no effect at all.


I remember the pglog_hardlimit parameter, but that is already set by 
default with Nautilus I read. I.e. this is on Nautilus, 14.2.8.


Is there a way to limit this pglog memory?

Cheers
 Harry
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Read speed low in cephfs volume exposed as samba share using vfs_ceph

2020-05-12 Thread Amudhan P
Hi,

I am running a small 3 node Ceph Nautilus 14.2.8 cluster on Ubuntu 18.04.

I am testing the cluster to expose a CephFS volume as a Samba v4 share for users
to access from Windows later.
Samba version is 4.7.6-Ubuntu and mount.cifs version is 6.8.

From a Ceph kernel mount, a dd write test gives 600 MB/s and reading the file
back with md5sum gives 300-400 MB/s.

I have exposed the same volume in Samba using "vfs_ceph" and mounted it
through CIFS on another Ubuntu 18.04 client.
Now, when I perform a dd write I still get 600 MB/s, but the md5sum read speed
of the file is only 65 MB/s.

I get a different result when I read the same file using
smbclient: about 101 MB/s.

Why this difference? What could be the issue?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW STS Support in Nautilus ?

2020-05-12 Thread Pritha Srivastava
Matching other fields in the token as part of the Condition statement is
work in progress, but isn't available in Nautilus.

Thanks,
Pritha

On Tue, May 12, 2020 at 10:21 PM Wyllys Ingersoll <
wyllys.ingers...@keepertech.com> wrote:

> Does STS support using other fields from the token as part of the
> Condition statement?  For example looking for specific "sub" identities or
> matching on custom token fields like lists of roles?
>
>
>
> On Tue, May 12, 2020 at 11:50 AM Matt Benjamin 
> wrote:
>
>> yay!  thanks Wyllys, Pritha
>>
>> Matt
>>
>> On Tue, May 12, 2020 at 11:38 AM Wyllys Ingersoll
>>  wrote:
>> >
>> >
>> > Thanks for the hint, I fixed my keycloak configuration for that
>> application client so the token only includes a single audience value and
>> now it works fine.
>> >
>> > thanks!!
>> >
>> >
>> > On Tue, May 12, 2020 at 11:11 AM Wyllys Ingersoll <
>> wyllys.ingers...@keepertech.com> wrote:
>> >>
>> >> The "aud" field in the introspection result is a list, not a single
>> string.
>> >>
>> >> On Tue, May 12, 2020 at 11:02 AM Pritha Srivastava <
>> prsri...@redhat.com> wrote:
>> >>>
>> >>> app_id must match with the 'aud' field in the token introspection
>> result (In the example the value of 'aud' is 'customer-portal')
>> >>>
>> >>> Thanks,
>> >>> Pritha
>> >>>
>> >>> On Tue, May 12, 2020 at 8:16 PM Wyllys Ingersoll <
>> wyllys.ingers...@keepertech.com> wrote:
>> 
>> 
>>  Running Nautilus 14.2.9 and trying to follow the STS example given
>> here: https://docs.ceph.com/docs/master/radosgw/STS/ to setup a policy
>> for AssumeRoleWithWebIdentity using KeyCloak (8.0.1) as the OIDC provider.
>> I am able to see in the rgw debug logs that the token being passed from the
>> client is passing the introspection check, but it always ends up failing
>> the final authorization to access the requested bucket resource and is
>> rejected with a 403 status "AccessDenied".
>> 
>>  I configured my policy as described in the 2nd example on the STS
>> page above. I suspect the problem is with the "StringEquals" condition
>> statement in the AssumeRolePolicy document (I could be wrong though).
>> 
>>  The example shows using the keycloak URI followed by ":app_id"
>> matching with the name of the keycloak client application
>> ("customer-portal" in the example).  My keycloak setup does not have any
>> such field in the introspection result and I can't seem to figure out how
>> to make this all work.
>> 
>>  I cranked up the logging to 20/20 and still did not see any hints as
>> to what part of the policy is causing the access to be denied.
>> 
>>  Any suggestions?
>> 
>>  -Wyllys Ingersoll
>> 
>>  ___
>>  Dev mailing list -- d...@ceph.io
>>  To unsubscribe send an email to dev-le...@ceph.io
>> >
>> > ___
>> > Dev mailing list -- d...@ceph.io
>> > To unsubscribe send an email to dev-le...@ceph.io
>>
>>
>>
>> --
>>
>> Matt Benjamin
>> Red Hat, Inc.
>> 315 West Huron Street, Suite 140A
>> Ann Arbor, Michigan 48103
>>
>> http://www.redhat.com/en/technologies/storage
>>
>> tel.  734-821-5101
>> fax.  734-769-8938
>> cel.  734-216-5309
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Kári Bertilsson
Hello David

I have physical devices I can use to mirror the OSDs, no problem. But I
don't think those disks are actually failing, since there are no bad sectors
on them and they are brand new with no issues reading from them. They do have a
corrupt OSD superblock, which I believe happened because of a bad SAS
controller or an unclean shutdown, and I can't find any way to get the data off
them or repair the OSD superblock.

On Tue, May 12, 2020 at 11:47 PM David Turner  wrote:

> Do you have access to another Ceph cluster with enough available space to
> create rbds that you dd these failing disks into? That's what I'm doing
> right now with some failing disks. I've recovered 2 out of 6 osds that
> failed in this way. I would recommend against using the same cluster for
> this, but a stage cluster or something would be great.
>
> On Tue, May 12, 2020, 7:36 PM Kári Bertilsson 
> wrote:
>
>> Hi Paul
>>
>> I was able to mount both OSD's i need data from successfully using
>> "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op fuse
>> --mountpoint /osd92/"
>>
>> I see the PG slices that are missing in the mounted folder
>> "41.b3s3_head" "41.ccs5_head" etc. And i can copy any data from inside the
>> mounted folder and that works fine.
>>
>> But when i try to export it fails. I get the same error when trying to
>> list.
>>
>> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op list
>> --debug
>> Output @ https://pastebin.com/nXScEL6L
>>
>> Any ideas ?
>>
>> On Tue, May 12, 2020 at 12:17 PM Paul Emmerich 
>> wrote:
>>
>> > First thing I'd try is to use objectstore-tool to scrape the
>> > inactive/broken PGs from the dead OSDs using it's PG export feature.
>> > Then import these PGs into any other OSD which will automatically
>> recover
>> > it.
>> >
>> > Paul
>> >
>> > --
>> > Paul Emmerich
>> >
>> > Looking for help with your Ceph cluster? Contact us at https://croit.io
>> >
>> > croit GmbH
>> > Freseniusstr. 31h
>> > 81247 München
>> > www.croit.io
>> > Tel: +49 89 1896585 90
>> >
>> >
>> > On Tue, May 12, 2020 at 2:07 PM Kári Bertilsson 
>> > wrote:
>> >
>> >> Yes
>> >> ceph osd df tree and ceph -s is at https://pastebin.com/By6b1ps1
>> >>
>> >> On Tue, May 12, 2020 at 10:39 AM Eugen Block  wrote:
>> >>
>> >> > Can you share your osd tree and the current ceph status?
>> >> >
>> >> >
>> >> > Zitat von Kári Bertilsson :
>> >> >
>> >> > > Hello
>> >> > >
>> >> > > I had an incidence where 3 OSD's crashed at once completely and
>> won't
>> >> > power
>> >> > > up. And during recovery 3 OSD's in another host have somehow become
>> >> > > corrupted. I am running erasure coding with 8+2 setup using crush
>> map
>> >> > which
>> >> > > takes 2 OSDs per host, and after losing the other 2 OSD i have few
>> >> PG's
>> >> > > down. Unfortunately these PG's seem to overlap almost all data on
>> the
>> >> > pool,
>> >> > > so i believe the entire pool is mostly lost after only these 2% of
>> >> PG's
>> >> > > down.
>> >> > >
>> >> > > I am running ceph 14.2.9.
>> >> > >
>> >> > > OSD 92 log https://pastebin.com/5aq8SyCW
>> >> > > OSD 97 log https://pastebin.com/uJELZxwr
>> >> > >
>> >> > > ceph-bluestore-tool repair without --deep showed "success" but
>> OSD's
>> >> > still
>> >> > > fail with the log above.
>> >> > >
>> >> > > Log from trying ceph-bluestore-tool repair --deep which is still
>> >> running,
>> >> > > not sure if it will actually fix anything and log looks pretty bad.
>> >> > > https://pastebin.com/gkqTZpY3
>> >> > >
>> >> > > Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97
>> >> --op
>> >> > > list" gave me input/output error. But everything in SMART looks OK,
>> >> and i
>> >> > > see no indication of hardware read error in any logs. Same for both
>> >> OSD.
>> >> > >
>> >> > > The OSD's with corruption have absolutely no bad sectors and likely
>> >> have
>> >> > > only a minor corruption but at important locations.
>> >> > >
>> >> > > Any ideas on how to recover this kind of scenario ? Any tips would
>> be
>> >> > > highly appreciated.
>> >> > >
>> >> > > Best regards,
>> >> > > Kári Bertilsson
>> >> > > ___
>> >> > > ceph-users mailing list -- ceph-users@ceph.io
>> >> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >> >
>> >> >
>> >> > ___
>> >> > ceph-users mailing list -- ceph-users@ceph.io
>> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >> >
>> >> ___
>> >> ceph-users mailing list -- ceph-users@ceph.io
>> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread David Turner
Do you have access to another Ceph cluster with enough available space to
create RBDs that you can dd these failing disks into? That's what I'm doing
right now with some failing disks. I've recovered 2 out of 6 OSDs that
failed in this way. I would recommend against using the same cluster for
this, but a staging cluster or something would be great.

On Tue, May 12, 2020, 7:36 PM Kári Bertilsson  wrote:

> Hi Paul
>
> I was able to mount both OSD's i need data from successfully using
> "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op fuse
> --mountpoint /osd92/"
>
> I see the PG slices that are missing in the mounted folder
> "41.b3s3_head" "41.ccs5_head" etc. And i can copy any data from inside the
> mounted folder and that works fine.
>
> But when i try to export it fails. I get the same error when trying to
> list.
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op list
> --debug
> Output @ https://pastebin.com/nXScEL6L
>
> Any ideas ?
>
> On Tue, May 12, 2020 at 12:17 PM Paul Emmerich 
> wrote:
>
> > First thing I'd try is to use objectstore-tool to scrape the
> > inactive/broken PGs from the dead OSDs using it's PG export feature.
> > Then import these PGs into any other OSD which will automatically recover
> > it.
> >
> > Paul
> >
> > --
> > Paul Emmerich
> >
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> >
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io
> > Tel: +49 89 1896585 90
> >
> >
> > On Tue, May 12, 2020 at 2:07 PM Kári Bertilsson 
> > wrote:
> >
> >> Yes
> >> ceph osd df tree and ceph -s is at https://pastebin.com/By6b1ps1
> >>
> >> On Tue, May 12, 2020 at 10:39 AM Eugen Block  wrote:
> >>
> >> > Can you share your osd tree and the current ceph status?
> >> >
> >> >
> >> > Zitat von Kári Bertilsson :
> >> >
> >> > > Hello
> >> > >
> >> > > I had an incidence where 3 OSD's crashed at once completely and
> won't
> >> > power
> >> > > up. And during recovery 3 OSD's in another host have somehow become
> >> > > corrupted. I am running erasure coding with 8+2 setup using crush
> map
> >> > which
> >> > > takes 2 OSDs per host, and after losing the other 2 OSD i have few
> >> PG's
> >> > > down. Unfortunately these PG's seem to overlap almost all data on
> the
> >> > pool,
> >> > > so i believe the entire pool is mostly lost after only these 2% of
> >> PG's
> >> > > down.
> >> > >
> >> > > I am running ceph 14.2.9.
> >> > >
> >> > > OSD 92 log https://pastebin.com/5aq8SyCW
> >> > > OSD 97 log https://pastebin.com/uJELZxwr
> >> > >
> >> > > ceph-bluestore-tool repair without --deep showed "success" but OSD's
> >> > still
> >> > > fail with the log above.
> >> > >
> >> > > Log from trying ceph-bluestore-tool repair --deep which is still
> >> running,
> >> > > not sure if it will actually fix anything and log looks pretty bad.
> >> > > https://pastebin.com/gkqTZpY3
> >> > >
> >> > > Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97
> >> --op
> >> > > list" gave me input/output error. But everything in SMART looks OK,
> >> and i
> >> > > see no indication of hardware read error in any logs. Same for both
> >> OSD.
> >> > >
> >> > > The OSD's with corruption have absolutely no bad sectors and likely
> >> have
> >> > > only a minor corruption but at important locations.
> >> > >
> >> > > Any ideas on how to recover this kind of scenario ? Any tips would
> be
> >> > > highly appreciated.
> >> > >
> >> > > Best regards,
> >> > > Kári Bertilsson
> >> > > ___
> >> > > ceph-users mailing list -- ceph-users@ceph.io
> >> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> >
> >> >
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> >
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Kári Bertilsson
Hi Paul

I was able to successfully mount both OSDs I need data from using
"ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op fuse
--mountpoint /osd92/".

I can see the missing PG shards in the mounted folder
("41.b3s3_head", "41.ccs5_head", etc.), and I can copy any data from inside the
mounted folder; that works fine.

But when I try to export, it fails. I get the same error when trying to list.

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 --op list
--debug
Output @ https://pastebin.com/nXScEL6L

Any ideas ?

On Tue, May 12, 2020 at 12:17 PM Paul Emmerich 
wrote:

> First thing I'd try is to use objectstore-tool to scrape the
> inactive/broken PGs from the dead OSDs using it's PG export feature.
> Then import these PGs into any other OSD which will automatically recover
> it.
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Tue, May 12, 2020 at 2:07 PM Kári Bertilsson 
> wrote:
>
>> Yes
>> ceph osd df tree and ceph -s is at https://pastebin.com/By6b1ps1
>>
>> On Tue, May 12, 2020 at 10:39 AM Eugen Block  wrote:
>>
>> > Can you share your osd tree and the current ceph status?
>> >
>> >
>> > Zitat von Kári Bertilsson :
>> >
>> > > Hello
>> > >
>> > > I had an incidence where 3 OSD's crashed at once completely and won't
>> > power
>> > > up. And during recovery 3 OSD's in another host have somehow become
>> > > corrupted. I am running erasure coding with 8+2 setup using crush map
>> > which
>> > > takes 2 OSDs per host, and after losing the other 2 OSD i have few
>> PG's
>> > > down. Unfortunately these PG's seem to overlap almost all data on the
>> > pool,
>> > > so i believe the entire pool is mostly lost after only these 2% of
>> PG's
>> > > down.
>> > >
>> > > I am running ceph 14.2.9.
>> > >
>> > > OSD 92 log https://pastebin.com/5aq8SyCW
>> > > OSD 97 log https://pastebin.com/uJELZxwr
>> > >
>> > > ceph-bluestore-tool repair without --deep showed "success" but OSD's
>> > still
>> > > fail with the log above.
>> > >
>> > > Log from trying ceph-bluestore-tool repair --deep which is still
>> running,
>> > > not sure if it will actually fix anything and log looks pretty bad.
>> > > https://pastebin.com/gkqTZpY3
>> > >
>> > > Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97
>> --op
>> > > list" gave me input/output error. But everything in SMART looks OK,
>> and i
>> > > see no indication of hardware read error in any logs. Same for both
>> OSD.
>> > >
>> > > The OSD's with corruption have absolutely no bad sectors and likely
>> have
>> > > only a minor corruption but at important locations.
>> > >
>> > > Any ideas on how to recover this kind of scenario ? Any tips would be
>> > > highly appreciated.
>> > >
>> > > Best regards,
>> > > Kári Bertilsson
>> > > ___
>> > > ceph-users mailing list -- ceph-users@ceph.io
>> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> >
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Difficulty creating a topic for bucket notifications

2020-05-12 Thread Alexis Anand
Hi,



I am trying to create a topic so that I can use it to listen for object 
creation notifications on a bucket.



If I make my API call without supplying AWS authorization headers, the topic 
creation succeeds, and it can be seen by using a ListTopics call.



However, in order to attach a topic to a bucket, the topic and bucket must have 
the same owner. So I tried creating a topic using AWS auth.



The credential header I tried was the same as what I use for get/put items to a 
bucket:

Credential=/20200512/us-east-1/s3/aws4_request



However in this case rather than succeeding I get a NotImplemented error.



If I try changing the service in the credential to something other than s3, 
like "Credential=/20200512/us-east-1/s3/aws4_request", I instead 
get a SignatureDoesNotMatch error. What is the right way to authenticate a 
CreateTopic request?



Thanks,

Alexis
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSDs taking too much memory, for pglog

2020-05-12 Thread Harald Staub
Several OSDs of one of our clusters are currently down because RAM usage
has increased over the last few days. Now it is more than we can handle on
some systems, and OSDs frequently get killed by the OOM killer. Looking at
"ceph daemon osd.$OSD_ID dump_mempools" shows that nearly all of it (about
8.5 GB) is taken by osd_pglog, e.g.


"osd_pglog": {
"items": 461859,
"bytes": 8445595868
},

We tried to reduce it, with "osd memory target" and even with "bluestore 
cache autotune = false" (together with "bluestore cache size hdd"), but 
there was no effect at all.


I remember the pglog_hardlimit parameter, but I read that it is already set by
default in Nautilus. (This is Nautilus 14.2.8.)


Is there a way to limit this pglog memory?

Cheers
 Harry
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to reshard bucket

2020-05-12 Thread Timothy Geier
Thank you. I looked through both logs and noticed this in the cancel one:

osd_op(unknown.0.0:4164 41.2 41:55b0279d:reshard::reshard.09:head [call 
rgw.reshard_remove] snapc 0=[] ondisk+write+known_if_redirected e24984) v8 -- 
0x7fe9b3625710 con 0
osd_op_reply(4164 reshard.09 [call] v24984'105796943 uv105796922 ondisk 
= -2 ((2) No such file or directory)) v8  162+0+0 (203651653 0 0) 
0x7fe9880044a0 con 0x7fe9b3625b70
ERROR: failed to remove entry from reshard log, oid=reshard.09 tenant= 
bucket=foo

Is there anything else that I should look for?  It looks like the cancel 
process thinks that reshard.09 is present (and probably blocking my 
attempts at resharding) but it's not actually there and thus can't be removed.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to reshard bucket

2020-05-12 Thread Eric Ivancich
Perhaps the next step is to examine the generated logs from:

radosgw-admin reshard status --bucket=foo --debug-rgw=20 --debug-ms=1
radosgw-admin reshard cancel --bucket foo --debug-rgw=20 --debug-ms=1

Eric

--
J. Eric Ivancich
he / him / his
Red Hat Storage
Ann Arbor, Michigan, USA

> On May 11, 2020, at 12:25 PM, Timothy Geier  wrote:
> 
> Hello all,
> 
> I'm having an issue with a bucket that refuses to be resharded..for the 
> record, the cluster was recently upgraded from 13.2.4 to 13.2.10.
> 
> # radosgw-admin reshard add --bucket foo --num-shards 3300
> ERROR: the bucket is currently undergoing resharding and cannot be added to 
> the reshard list at this time
> 
> # radosgw-admin reshard list
> []
> 
> # radosgw-admin reshard status --bucket=foo
> [
> {
> "reshard_status": "not-resharding",
> "new_bucket_instance_id": "",
> "num_shards": -1
> },
> 
> 
> # radosgw-admin reshard cancel --bucket foo
> ERROR: failed to remove entry from reshard log, oid=reshard.09 
> tenant= bucket=foo
> 
> # radosgw-admin reshard stale-instances list
> []
> 
> Is there anything else I should check to troubleshoot this?  I was able to 
> reshard another bucket since the upgrade, so I suspect there's something 
> lingering that's blocking this.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-12 Thread Anthony D'Atri

>  I think, however, that a disappearing back network has no real consequences 
> as the heartbeats always go over both.

FWIW this has not been my experience, at least through Luminous.

What I’ve seen is that when the cluster/replication net is configured but 
unavailable, OSD heartbeats fail and peers report them to the mons as down.  
The mons send out a map accordingly, and the affected OSDs report “I’m not dead 
yet!”.  Flap flap flap.

YMMV
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW STS Support in Nautilus ?

2020-05-12 Thread Wyllys Ingersoll
Does STS support using other fields from the token as part of the Condition
statement?  For example looking for specific "sub" identities or matching
on custom token fields like lists of roles?



On Tue, May 12, 2020 at 11:50 AM Matt Benjamin  wrote:

> yay!  thanks Wyllys, Pritha
>
> Matt
>
> On Tue, May 12, 2020 at 11:38 AM Wyllys Ingersoll
>  wrote:
> >
> >
> > Thanks for the hint, I fixed my keycloak configuration for that
> application client so the token only includes a single audience value and
> now it works fine.
> >
> > thanks!!
> >
> >
> > On Tue, May 12, 2020 at 11:11 AM Wyllys Ingersoll <
> wyllys.ingers...@keepertech.com> wrote:
> >>
> >> The "aud" field in the introspection result is a list, not a single
> string.
> >>
> >> On Tue, May 12, 2020 at 11:02 AM Pritha Srivastava 
> wrote:
> >>>
> >>> app_id must match with the 'aud' field in the token introspection
> result (In the example the value of 'aud' is 'customer-portal')
> >>>
> >>> Thanks,
> >>> Pritha
> >>>
> >>> On Tue, May 12, 2020 at 8:16 PM Wyllys Ingersoll <
> wyllys.ingers...@keepertech.com> wrote:
> 
> 
>  Running Nautilus 14.2.9 and trying to follow the STS example given
> here: https://docs.ceph.com/docs/master/radosgw/STS/ to setup a policy
> for AssumeRoleWithWebIdentity using KeyCloak (8.0.1) as the OIDC provider.
> I am able to see in the rgw debug logs that the token being passed from the
> client is passing the introspection check, but it always ends up failing
> the final authorization to access the requested bucket resource and is
> rejected with a 403 status "AccessDenied".
> 
>  I configured my policy as described in the 2nd example on the STS
> page above. I suspect the problem is with the "StringEquals" condition
> statement in the AssumeRolePolicy document (I could be wrong though).
> 
>  The example shows using the keycloak URI followed by ":app_id"
> matching with the name of the keycloak client application
> ("customer-portal" in the example).  My keycloak setup does not have any
> such field in the introspection result and I can't seem to figure out how
> to make this all work.
> 
>  I cranked up the logging to 20/20 and still did not see any hints as
> to what part of the policy is causing the access to be denied.
> 
>  Any suggestions?
> 
>  -Wyllys Ingersoll
> 
>  ___
>  Dev mailing list -- d...@ceph.io
>  To unsubscribe send an email to dev-le...@ceph.io
> >
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-12 Thread Frank Schilder
Hi MJ,

this should work. Note that when using cloned devices, all traffic will go 
through the same VLAN. In that case, I believe you can simply remove the cluster 
network definition and use just one IP; there is no point in having the second IP 
on the same VLAN. You will probably have to set "noout,nodown" for the 
flip-over, which probably requires a restart of each OSD. I think, however, 
that a disappearing back network has no real consequences, as the heartbeats 
always go over both. There might be stuck replication traffic for a while, but 
even this can be avoided with "osd pause".
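
A minimal sketch of that flag dance, assuming you restart the OSDs host by host:

ceph osd set noout
ceph osd set nodown
# change the network config on the host, then restart its OSDs
systemctl restart ceph-osd.target
ceph osd unset nodown
ceph osd unset noout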

Our configuration with 2 VLANS is this:

public network: ceph0.81: flags=4163  mtu 9000

cluster network: ceph0.82: flags=4163  mtu 9000

ceph0: flags=5187  mtu 9000

em1: flags=6211  mtu 9000
em2: flags=6211  mtu 9000
p1p1: flags=6211  mtu 9000
p1p2: flags=6211  mtu 9000
p2p1: flags=6211  mtu 9000
p2p2: flags=6211  mtu 9000

If you already have 2 VLANs with different IDs, then this flip-over is trivial. 
I did it without a service outage.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: mj 
Sent: 12 May 2020 13:12:47
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster network and public network

Hi,

On 11/05/2020 08:50, Wido den Hollander wrote:
> Great to hear! I'm still behind this idea and all the clusters I design
> have a single (or LACP) network going to the host.
>
> One IP address per node where all traffic goes over. That's Ceph, SSH,
> (SNMP) Monitoring, etc.
>
> Wido

We have an 'old-style' cluster with a separated LAN/cluster network. We would
like to move over to the 'new-style'.

Is it as easy as: define the NICs in a 2x10G LACP bond0, add both
NICs to the bond0 config, and configure it like:

> auto bond0
> iface bond0 inet static
> address 192.168.0.5
> netmask 255.255.255.0

and add our cluster IP as a second IP, like

> auto bond0:1
> iface bond0:1 inet static
> address 192.168.10.160
> netmask 255.255.255.0

On all nodes, reboot, and everything will work?

Or are there ceph specifics to consider?

Thanks,
MJ
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Apply/Commit vs Read/Write Op Latency

2020-05-12 Thread John Petrini
Hello,

Bumping this in hopes that someone can shed some light on this. I've tried
to find details on these metrics but I've come up empty handed.

Thank you,

John
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] DocuBetter Meeting -- EMEA 13 May 2020

2020-05-12 Thread John Zachary Dover
There is a general documentation meeting called the "DocuBetter Meeting",
and it is held every two weeks. The next DocuBetter Meeting will be on 13
May 2020 at 0830 PST, and will run for thirty minutes. Everyone with a
documentation-related request or complaint is invited. The meeting will be
held here: https://bluejeans.com/908675367

Send documentation-related requests and complaints to me by replying to
this email and CCing me at zac.do...@gmail.com.

The next DocuBetter meeting is scheduled for:

13 May 2020  0830 PST
13 May 2020  1630 UTC
14 May 2020  0230 AEST

Etherpad: https://pad.ceph.com/p/Ceph_Documentation
Meeting: https://bluejeans.com/908675367


Thanks, everyone.

Zac Dover
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Zeroing out rbd image or volume

2020-05-12 Thread huxia...@horebdata.cn
Thanks a lot, all. It looks like dd'ing zeros does not help much with improving 
security, but OSD encryption would be sufficient.

best regards,

Samuel



huxia...@horebdata.cn
 
From: Wido den Hollander
Date: 2020-05-12 14:03
To: Paul Emmerich; Dillaman, Jason
CC: Marc Roos; ceph-users
Subject: [ceph-users] Re: Zeroing out rbd image or volume
 
 
On 5/12/20 1:54 PM, Paul Emmerich wrote:
> And many hypervisors will turn writing zeroes into an unmap/trim (qemu
> detect-zeroes=unmap), so running trim on the entire empty disk is often the
> same as writing zeroes.
> So +1 for encryption being the proper way here
> 
 
+1
 
And to add to this: No, a newly created RBD image will never have 'left
over' bits and bytes from a previous RBD image.
 
I had to explain this multiple times to people which were used to old
(i)SCSI setups where partitions could have leftover data from a
previously created LUN.
 
With RBD this won't happen.
 
Wido
 
> 
> Paul
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW STS Support in Nautilus ?

2020-05-12 Thread Matt Benjamin
yay!  thanks Wyllys, Pritha

Matt

On Tue, May 12, 2020 at 11:38 AM Wyllys Ingersoll
 wrote:
>
>
> Thanks for the hint, I fixed my keycloak configuration for that application 
> client so the token only includes a single audience value and now it works 
> fine.
>
> thanks!!
>
>
> On Tue, May 12, 2020 at 11:11 AM Wyllys Ingersoll 
>  wrote:
>>
>> The "aud" field in the introspection result is a list, not a single string.
>>
>> On Tue, May 12, 2020 at 11:02 AM Pritha Srivastava  
>> wrote:
>>>
>>> app_id must match with the 'aud' field in the token introspection result 
>>> (In the example the value of 'aud' is 'customer-portal')
>>>
>>> Thanks,
>>> Pritha
>>>
>>> On Tue, May 12, 2020 at 8:16 PM Wyllys Ingersoll 
>>>  wrote:


 Running Nautilus 14.2.9 and trying to follow the STS example given here: 
 https://docs.ceph.com/docs/master/radosgw/STS/ to setup a policy for 
 AssumeRoleWithWebIdentity using KeyCloak (8.0.1) as the OIDC provider. I 
 am able to see in the rgw debug logs that the token being passed from the 
 client is passing the introspection check, but it always ends up failing 
 the final authorization to access the requested bucket resource and is 
 rejected with a 403 status "AccessDenied".

 I configured my policy as described in the 2nd example on the STS page 
 above. I suspect the problem is with the "StringEquals" condition 
 statement in the AssumeRolePolicy document (I could be wrong though).

 The example shows using the keycloak URI followed by ":app_id" matching 
 with the name of the keycloak client application ("customer-portal" in the 
 example).  My keycloak setup does not have any such field in the 
 introspection result and I can't seem to figure out how to make this all 
 work.

 I cranked up the logging to 20/20 and still did not see any hints as to 
 what part of the policy is causing the access to be denied.

 Any suggestions?

 -Wyllys Ingersoll

 ___
 Dev mailing list -- d...@ceph.io
 To unsubscribe send an email to dev-le...@ceph.io
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW STS Support in Nautilus ?

2020-05-12 Thread Wyllys Ingersoll
Thanks for the hint, I fixed my keycloak configuration for that application
client so the token only includes a single audience value and now it works
fine.

thanks!!


On Tue, May 12, 2020 at 11:11 AM Wyllys Ingersoll <
wyllys.ingers...@keepertech.com> wrote:

> The "aud" field in the introspection result is a list, not a single string.
>
> On Tue, May 12, 2020 at 11:02 AM Pritha Srivastava 
> wrote:
>
>> app_id must match with the 'aud' field in the token introspection result
>> (In the example the value of 'aud' is 'customer-portal')
>>
>> Thanks,
>> Pritha
>>
>> On Tue, May 12, 2020 at 8:16 PM Wyllys Ingersoll <
>> wyllys.ingers...@keepertech.com> wrote:
>>
>>>
>>> Running Nautilus 14.2.9 and trying to follow the STS example given here:
>>> https://docs.ceph.com/docs/master/radosgw/STS/ to setup a policy
>>> for AssumeRoleWithWebIdentity using KeyCloak (8.0.1) as the OIDC provider.
>>> I am able to see in the rgw debug logs that the token being passed from the
>>> client is passing the introspection check, but it always ends up failing
>>> the final authorization to access the requested bucket resource and is
>>> rejected with a 403 status "AccessDenied".
>>>
>>> I configured my policy as described in the 2nd example on the STS page
>>> above. I suspect the problem is with the "StringEquals" condition statement
>>> in the AssumeRolePolicy document (I could be wrong though).
>>>
>>> The example shows using the keycloak URI followed by ":app_id" matching
>>> with the name of the keycloak client application ("customer-portal" in the
>>> example).  My keycloak setup does not have any such field in the
>>> introspection result and I can't seem to figure out how to make this all
>>> work.
>>>
>>> I cranked up the logging to 20/20 and still did not see any hints as to
>>> what part of the policy is causing the access to be denied.
>>>
>>> Any suggestions?
>>>
>>> -Wyllys Ingersoll
>>>
>>> ___
>>> Dev mailing list -- d...@ceph.io
>>> To unsubscribe send an email to dev-le...@ceph.io
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW STS Support in Nautilus ?

2020-05-12 Thread Wyllys Ingersoll
The "aud" field in the introspection result is a list, not a single string.

On Tue, May 12, 2020 at 11:02 AM Pritha Srivastava 
wrote:

> app_id must match with the 'aud' field in the token introspection result
> (In the example the value of 'aud' is 'customer-portal')
>
> Thanks,
> Pritha
>
> On Tue, May 12, 2020 at 8:16 PM Wyllys Ingersoll <
> wyllys.ingers...@keepertech.com> wrote:
>
>>
>> Running Nautilus 14.2.9 and trying to follow the STS example given here:
>> https://docs.ceph.com/docs/master/radosgw/STS/ to setup a policy
>> for AssumeRoleWithWebIdentity using KeyCloak (8.0.1) as the OIDC provider.
>> I am able to see in the rgw debug logs that the token being passed from the
>> client is passing the introspection check, but it always ends up failing
>> the final authorization to access the requested bucket resource and is
>> rejected with a 403 status "AccessDenied".
>>
>> I configured my policy as described in the 2nd example on the STS page
>> above. I suspect the problem is with the "StringEquals" condition statement
>> in the AssumeRolePolicy document (I could be wrong though).
>>
>> The example shows using the keycloak URI followed by ":app_id" matching
>> with the name of the keycloak client application ("customer-portal" in the
>> example).  My keycloak setup does not have any such field in the
>> introspection result and I can't seem to figure out how to make this all
>> work.
>>
>> I cranked up the logging to 20/20 and still did not see any hints as to
>> what part of the policy is causing the access to be denied.
>>
>> Any suggestions?
>>
>> -Wyllys Ingersoll
>>
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW STS Support in Nautilus ?

2020-05-12 Thread Pritha Srivastava
app_id must match the 'aud' field in the token introspection result
(in the example, the value of 'aud' is 'customer-portal').
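
As a sketch (the realm URL keycloak.example.com/... is a placeholder), the
role's assume-role policy document would carry a condition like:

radosgw-admin role create --role-name=S3Access \
  --assume-role-policy-doc='{"Version":"2012-10-17","Statement":[{"Effect":"Allow",
    "Principal":{"Federated":["arn:aws:iam:::oidc-provider/keycloak.example.com/auth/realms/demo"]},
    "Action":["sts:AssumeRoleWithWebIdentity"],
    "Condition":{"StringEquals":{"keycloak.example.com/auth/realms/demo:app_id":"customer-portal"}}}]}'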

Thanks,
Pritha

On Tue, May 12, 2020 at 8:16 PM Wyllys Ingersoll <
wyllys.ingers...@keepertech.com> wrote:

>
> Running Nautilus 14.2.9 and trying to follow the STS example given here:
> https://docs.ceph.com/docs/master/radosgw/STS/ to setup a policy
> for AssumeRoleWithWebIdentity using KeyCloak (8.0.1) as the OIDC provider.
> I am able to see in the rgw debug logs that the token being passed from the
> client is passing the introspection check, but it always ends up failing
> the final authorization to access the requested bucket resource and is
> rejected with a 403 status "AccessDenied".
>
> I configured my policy as described in the 2nd example on the STS page
> above. I suspect the problem is with the "StringEquals" condition statement
> in the AssumeRolePolicy document (I could be wrong though).
>
> The example shows using the keycloak URI followed by ":app_id" matching
> with the name of the keycloak client application ("customer-portal" in the
> example).  My keycloak setup does not have any such field in the
> introspection result and I can't seem to figure out how to make this all
> work.
>
> I cranked up the logging to 20/20 and still did not see any hints as to
> what part of the policy is causing the access to be denied.
>
> Any suggestions?
>
> -Wyllys Ingersoll
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Add lvm in cephadm

2020-05-12 Thread Simon Sutter
Hello,


Thank you very much Joshua, it worked.

I have set up three nodes with the cephadm tool, which was very easy.

But I asked myself, what if node 1 goes down?

Before cephadm, I could just manage everything from the other nodes with the 
ceph commands.

Now I'm a bit stuck, because this cephadm container is only running on one node.

I've installed it on the second one, but I'm getting a "[errno 13] RADOS 
permission denied (error connecting to the cluster)".

Do I need some special "cephadm" keyring from the first node? Which one? And 
where do I put it?

Cephadm might be an easy-to-handle solution, but for me as a beginner, the 
added layer is very complicated to get into.

We are trying to build a new Ceph cluster (we never got in touch with Ceph 
before), but I might not go with Octopus and instead use Nautilus with ceph-deploy.

That's a bit easier to understand, and the documentation out there is way 
better.


Thanks in advance,

Simon


Von: Joshua Schmid 
Gesendet: Dienstag, 5. Mai 2020 16:39:29
An: Simon Sutter
Cc: ceph-users@ceph.io
Betreff: Re: [ceph-users] Re: Add lvm in cephadm

On 20/05/05 08:46, Simon Sutter wrote:
> Sorry I missclicked, here the second part:
>
>
> ceph-volume --cluster ceph lvm prepare --data /dev/centos_node1/ceph
> But that gives me just:
>
> Running command: /usr/bin/ceph-authtool --gen-print-key
> Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
> f3b442b1-68f7-456a-9991-92254e7c9c30
>  stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
> -->  RuntimeError: Unable to create a new OSD id

Hey Simon,

This still works but is now encapsulated in a cephadm
command.

ceph orch daemon add osd :

so in your case:

ceph orch daemon add osd $host:centos_node1/ceph


hth

--
Joshua Schmid
Software Engineer
SUSE Enterprise Storage
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw user access questions

2020-05-12 Thread Casey Bodley
RGW users are a higher-level feature, and they don't have a direct
relationship to RADOS pools. Their permissions are controlled at the
bucket/object level by the S3/Swift APIs. I would start by reading
about S3's ACLs and bucket policies.
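
A minimal sketch of an S3 bucket policy that grants a second RGW user read-only
access to one bucket (the bucket and user names are made up):

cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/readonly-user"]},
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
  }]
}
EOF
s3cmd setpolicy policy.json s3://mybucket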

On Mon, May 11, 2020 at 1:42 AM Vishwas Bm  wrote:
>
> Hi,
>
> I am a newbie to ceph. I have gone through the ceph docs, we are planning
> to use rgw for object storage.
>
> From the docs, what I have understood is that there are two types of users:
> 1) ceph storage user
> 2) radosgw user
>
> I am able to create user of both the types. But I am not able to understand
> how to restrict the rgw user access to a pool.
>
> My questions are below:
> 1) How to restrict the access of a rgw user to a particular pool ? Can this
> be done using placement groups ?
>
> 2) Is it possible to restrict rgw user access to a particular namespace in
> a pool ?
>
> 3) I can understand the flow till he is able to write to a bucket using the
> .index pool object. But I am not able to understand the flow how the rgw
> user can write  objects in pool. Where can I check the permissions ?
>
> *Thanks & Regards,*
>
> *Vishwas *
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Paul Emmerich
First thing I'd try is to use objectstore-tool to scrape the
inactive/broken PGs from the dead OSDs using its PG export feature.
Then import these PGs into any other OSD which will automatically recover
it.
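
Roughly like this (paths, OSD ids and the pgid are placeholders; both OSDs have
to be stopped while the tool runs):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-92 \
    --pgid 41.b3s3 --op export --file /root/41.b3s3.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
    --op import --file /root/41.b3s3.export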

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Tue, May 12, 2020 at 2:07 PM Kári Bertilsson 
wrote:

> Yes
> ceph osd df tree and ceph -s is at https://pastebin.com/By6b1ps1
>
> On Tue, May 12, 2020 at 10:39 AM Eugen Block  wrote:
>
> > Can you share your osd tree and the current ceph status?
> >
> >
> > Zitat von Kári Bertilsson :
> >
> > > Hello
> > >
> > > I had an incidence where 3 OSD's crashed at once completely and won't
> > power
> > > up. And during recovery 3 OSD's in another host have somehow become
> > > corrupted. I am running erasure coding with 8+2 setup using crush map
> > which
> > > takes 2 OSDs per host, and after losing the other 2 OSD i have few PG's
> > > down. Unfortunately these PG's seem to overlap almost all data on the
> > pool,
> > > so i believe the entire pool is mostly lost after only these 2% of PG's
> > > down.
> > >
> > > I am running ceph 14.2.9.
> > >
> > > OSD 92 log https://pastebin.com/5aq8SyCW
> > > OSD 97 log https://pastebin.com/uJELZxwr
> > >
> > > ceph-bluestore-tool repair without --deep showed "success" but OSD's
> > still
> > > fail with the log above.
> > >
> > > Log from trying ceph-bluestore-tool repair --deep which is still
> running,
> > > not sure if it will actually fix anything and log looks pretty bad.
> > > https://pastebin.com/gkqTZpY3
> > >
> > > Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97
> --op
> > > list" gave me input/output error. But everything in SMART looks OK,
> and i
> > > see no indication of hardware read error in any logs. Same for both
> OSD.
> > >
> > > The OSD's with corruption have absolutely no bad sectors and likely
> have
> > > only a minor corruption but at important locations.
> > >
> > > Any ideas on how to recover this kind of scenario ? Any tips would be
> > > highly appreciated.
> > >
> > > Best regards,
> > > Kári Bertilsson
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Kári Bertilsson
Yes
ceph osd df tree and ceph -s is at https://pastebin.com/By6b1ps1

On Tue, May 12, 2020 at 10:39 AM Eugen Block  wrote:

> Can you share your osd tree and the current ceph status?
>
>
> Zitat von Kári Bertilsson :
>
> > Hello
> >
> > I had an incidence where 3 OSD's crashed at once completely and won't
> power
> > up. And during recovery 3 OSD's in another host have somehow become
> > corrupted. I am running erasure coding with 8+2 setup using crush map
> which
> > takes 2 OSDs per host, and after losing the other 2 OSD i have few PG's
> > down. Unfortunately these PG's seem to overlap almost all data on the
> pool,
> > so i believe the entire pool is mostly lost after only these 2% of PG's
> > down.
> >
> > I am running ceph 14.2.9.
> >
> > OSD 92 log https://pastebin.com/5aq8SyCW
> > OSD 97 log https://pastebin.com/uJELZxwr
> >
> > ceph-bluestore-tool repair without --deep showed "success" but OSD's
> still
> > fail with the log above.
> >
> > Log from trying ceph-bluestore-tool repair --deep which is still running,
> > not sure if it will actually fix anything and log looks pretty bad.
> > https://pastebin.com/gkqTZpY3
> >
> > Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 --op
> > list" gave me input/output error. But everything in SMART looks OK, and i
> > see no indication of hardware read error in any logs. Same for both OSD.
> >
> > The OSD's with corruption have absolutely no bad sectors and likely have
> > only a minor corruption but at important locations.
> >
> > Any ideas on how to recover this kind of scenario ? Any tips would be
> > highly appreciated.
> >
> > Best regards,
> > Kári Bertilsson
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Zeroing out rbd image or volume

2020-05-12 Thread Wido den Hollander



On 5/12/20 1:54 PM, Paul Emmerich wrote:
> And many hypervisors will turn writing zeroes into an unmap/trim (qemu
> detect-zeroes=unmap), so running trim on the entire empty disk is often the
> same as writing zeroes.
> So +1 for encryption being the proper way here
> 

+1

And to add to this: No, a newly created RBD image will never have 'left
over' bits and bytes from a previous RBD image.

I had to explain this multiple times to people who were used to old
(i)SCSI setups where partitions could have leftover data from a
previously created LUN.

With RBD this won't happen.

Wido

> 
> Paul
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Zeroing out rbd image or volume

2020-05-12 Thread Paul Emmerich
And many hypervisors will turn writing zeroes into an unmap/trim (qemu
detect-zeroes=unmap), so running trim on the entire empty disk is often the
same as writing zeroes.
So +1 for encryption being the proper way here
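
E.g. for a mapped image (names are placeholders):

rbd map rbd/scratch
blkdiscard /dev/rbd0      # discard every block of the image
rbd unmap rbd/scratch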


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Tue, May 12, 2020 at 1:52 PM Jason Dillaman  wrote:

> I would also like to add that the OSDs can (and will) use redirect on write
> techniques (not to mention the physical device hardware as well).
> Therefore, your zeroing of the device might just cause the OSDs to allocate
> new extents of zeros while the old extents remain intact (albeit
> unreferenced and available for future writes). The correct solution would
> be to layer LUKS/dm-crypt on top of the RBD device if you need a strong
> security guarantee about a specific image, or use encrypted OSDs if the
> concern is about the loss of the OSD physical device.
>
> On Tue, May 12, 2020 at 6:58 AM Marc Roos 
> wrote:
>
> >
> > dd if=/dev/zero of=rbd  :) but if you have encrypted osd's, what
> > would be the use of this?
> >
> >
> >
> > -Original Message-
> > From: huxia...@horebdata.cn [mailto:huxia...@horebdata.cn]
> > Sent: 12 May 2020 12:55
> > To: ceph-users
> > Subject: [ceph-users] Zeroing out rbd image or volume
> >
> > Hi, Ceph folks,
> >
> > Is there a rbd command, or any other way, to zero out rbd images or
> > volume? I would like to write all zero data to an rbd image/volume
> > before remove it.
> >
> > Any comments would be appreciated.
> >
> > best regards,
> >
> > samuel
> > Horebdata AG
> > Switzerland
> >
> >
> >
> >
> > huxia...@horebdata.cn
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
> Jason
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Zeroing out rbd image or volume

2020-05-12 Thread Jason Dillaman
I would also like to add that the OSDs can (and will) use redirect-on-write
techniques (not to mention the physical device hardware as well).
Therefore, your zeroing of the device might just cause the OSDs to allocate
new extents of zeros while the old extents remain intact (albeit
unreferenced and available for future writes). The correct solution would
be to layer LUKS/dm-crypt on top of the RBD device if you need a strong
security guarantee about a specific image, or use encrypted OSDs if the
concern is about the loss of the OSD physical device.
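
A rough sketch of the LUKS-on-RBD layering (pool/image/mapper names are
placeholders):

rbd map rbd/secure-image
cryptsetup luksFormat /dev/rbd0
cryptsetup open /dev/rbd0 secure-image     # exposes /dev/mapper/secure-image
mkfs.xfs /dev/mapper/secure-image
# ... use it, then tear down:
cryptsetup close secure-image
rbd unmap rbd/secure-image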

On Tue, May 12, 2020 at 6:58 AM Marc Roos  wrote:

>
> dd if=/dev/zero of=rbd  :) but if you have encrypted osd's, what
> would be the use of this?
>
>
>
> -Original Message-
> From: huxia...@horebdata.cn [mailto:huxia...@horebdata.cn]
> Sent: 12 May 2020 12:55
> To: ceph-users
> Subject: [ceph-users] Zeroing out rbd image or volume
>
> Hi, Ceph folks,
>
> Is there a rbd command, or any other way, to zero out rbd images or
> volume? I would like to write all zero data to an rbd image/volume
> before remove it.
>
> Any comments would be appreciated.
>
> best regards,
>
> samuel
> Horebdata AG
> Switzerland
>
>
>
>
> huxia...@horebdata.cn
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Write Caching to hot tier not working as expected

2020-05-12 Thread Steve Hughes
Thanks Eric.

Using your command for SET reported that the OSD may need a restart (which would
set it back to the default anyway), but the following seems to work:
ceph tell osd.24 config set objecter_inflight_op_bytes 1073741824
ceph tell osd.24 config set objecter_inflight_ops 10240

reading back the settings looks right:
[root@ceph00 ~]# ceph daemon osd.24 config show | grep objecter
"debug_objecter": "0/1",
"objecter_completion_locks_per_session": "32",
"objecter_debug_inject_relock_delay": "false",
"objecter_inflight_op_bytes": "1073741824",
"objecter_inflight_ops": "10240",
"objecter_inject_no_watch_ping": "false",
"objecter_retry_writes_after_first_reply": "false",
"objecter_tick_interval": "5.00",
"objecter_timeout": "10.00",
"osd_objecter_finishers": "1",


I've done that for the three OSDs that are in the cache tier, but the
performance is unchanged - the writes still spill over to the HDD pool.

Still, your idea sounds close - it does feel like something in the cache tier 
is hitting a limit.
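
For what it's worth, a few read-only checks that may help confirm whether a cache-tier threshold is being hit while the copy runs ("hot-pool" is a placeholder for the cache pool name):

# Flush/evict thresholds configured on the cache pool
ceph osd pool get hot-pool target_max_bytes
ceph osd pool get hot-pool target_max_objects
ceph osd pool get hot-pool cache_target_dirty_ratio
ceph osd pool get hot-pool cache_target_dirty_high_ratio
ceph osd pool get hot-pool cache_target_full_ratio

# How full the cache pool actually gets during the test
ceph df detail
ceph osd pool stats hot-pool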

Regards,
Steve

-Original Message-
From: Eric Smith  
Sent: Monday, 11 May 2020 9:11 PM
To: Steve Hughes ; ceph-users@ceph.io
Subject: RE: [ceph-users] Re: Write Caching to hot tier not working as expected

Reading and setting them should be pretty easy:

READ (Run from the host where OSD <id> is hosted):
ceph daemon osd.<id> config show | grep objecter

SET (Assuming these can be set in memory):
ceph tell osd.<id> injectargs "--objecter-inflight-op-bytes=1073741824" (raise
the in-flight byte throttle to 1 GiB)

To persist these you should add them to the ceph.conf (I'm not sure what 
section though - you might have to test this).
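
If persisting them turns out to be necessary, the ceph.conf form would presumably look like the snippet below; note that putting it in [osd] is an assumption here, per the caveat above:

# ceph.conf -- section placement untested
[osd]
objecter_inflight_op_bytes = 1073741824    # 1 GiB of in-flight data
objecter_inflight_ops = 10240
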
And yes - the information is sketchy I agree - I don't really have any input 
here.

That's the best I can do for now 
Eric

-Original Message-
From: Steve Hughes  
Sent: Monday, May 11, 2020 6:44 AM
To: Eric Smith ; ceph-users@ceph.io
Subject: RE: [ceph-users] Re: Write Caching to hot tier not working as expected

Thank you Eric.  That 'sounds like' exactly my issue.  Though I'm surprised to 
bump into something like that on such a small system and at such low bandwidth.

But the information I can find on those parameters is sketchy to say the least.

Can you point me at some doco that explains what they do,  how to read the 
current values and how to set them?

Cheers,
Steve

-Original Message-
From: Eric Smith  
Sent: Monday, 11 May 2020 8:00 PM
To: Steve Hughes ; ceph-users@ceph.io
Subject: RE: [ceph-users] Re: Write Caching to hot tier not working as expected

It sounds like you might be bumping up against the default 
objecter_inflight_ops (1024)  and/or objecter_inflight_op_bytes (100MB). 

-Original Message-
From: ste...@scalar.com.au 
Sent: Monday, May 11, 2020 5:48 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Write Caching to hot tier not working as expected

Interestingly,  I have found that if I limit the rate at which data is written 
the tiering behaves as expected.

I'm using a robocopy job from a Windows VM to copy large files from my existing 
storage array to a test Ceph volume.  By using the /IPG parameter I can roughly 
control the rate at which data is written.

I've found that if I limit the write rate to around 30MBytes/sec the data all 
goes to the hot tier, zero data goes to the HDD tier, and the observed write 
latency is about 5msec.   If I go any higher than this I see data being written 
to the HDDs and the observed write latency goes way up.
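
For reference, the throttled copy is roughly of this form (paths are made up; /IPG adds an inter-packet gap in milliseconds, so the value has to be tuned to hit a given rate):

robocopy D:\data \\testvm\cephshare /E /IPG:50
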
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


--

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster network and public network

2020-05-12 Thread mj

Hi,

On 11/05/2020 08:50, Wido den Hollander wrote:

Great to hear! I'm still behind this idea and all the clusters I design
have a single (or LACP) network going to the host.

One IP address per node where all traffic goes over. That's Ceph, SSH,
(SNMP) Monitoring, etc.

Wido


We have an 'old-style' cluster with a separate LAN/cluster network. We would
like to move over to the 'new-style'.


Is it as easy as: define the NICs in a 2x10G LACP bond0, add both
NICs to the bond0 config, and configure it like:



auto bond0
iface bond0 inet static
address 192.168.0.5
netmask 255.255.255.0


and add our cluster IP as a second IP, like


auto bond0:1
iface bond0:1 inet static
address 192.168.10.160
netmask 255.255.255.0


On all nodes, reboot, and everything will work?

Or are there ceph specifics to consider?
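
For reference, a minimal sketch of the single-network variant; interface names, bonding options and addresses are examples only, and the main Ceph-specific point is that the mon addresses and public_network must keep matching what the daemons bind to (dropping cluster_network simply makes replication use the public network):

# /etc/network/interfaces (example)
auto bond0
iface bond0 inet static
    bond-slaves enp3s0f0 enp3s0f1
    bond-mode 802.3ad
    bond-miimon 100
    address 192.168.0.5
    netmask 255.255.255.0

# /etc/ceph/ceph.conf (example)
[global]
public_network = 192.168.0.0/24
# cluster_network left unset so replication traffic also uses bond0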

Thanks,
MJ
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Zeroing out rbd image or volume

2020-05-12 Thread Marc Roos

dd if=/dev/zero of=rbd  :) but if you have encrypted osd's, what 
would be the use of this?
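
Spelled out, that would look roughly like the following (pool/image/device names are placeholders, and as noted elsewhere in the thread it does not guarantee the old extents are actually overwritten):

rbd map mypool/myimage                     # e.g. /dev/rbd0
dd if=/dev/zero of=/dev/rbd0 bs=4M oflag=direct status=progress
rbd unmap /dev/rbd0
rbd rm mypool/myimage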



-Original Message-
From: huxia...@horebdata.cn [mailto:huxia...@horebdata.cn] 
Sent: 12 May 2020 12:55
To: ceph-users
Subject: [ceph-users] Zeroing out rbd image or volume

Hi, Ceph folks,

Is there an rbd command, or any other way, to zero out rbd images or
volumes? I would like to write all-zero data to an rbd image/volume
before removing it.

Any comments would be appreciated.

best regards,

samuel
Horebdata AG
Switzerland




huxia...@horebdata.cn
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Zeroing out rbd image or volume

2020-05-12 Thread huxia...@horebdata.cn
Hi, Ceph folks,

Is there an rbd command, or any other way, to zero out rbd images or volumes? I
would like to write all-zero data to an rbd image/volume before removing it.

Any comments would be appreciated.

best regards,

samuel
Horebdata AG
Switzerland




huxia...@horebdata.cn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD corruption and down PGs

2020-05-12 Thread Eugen Block

Can you share your osd tree and the current ceph status?


Zitat von Kári Bertilsson :


Hello

I had an incident where 3 OSD's crashed at once completely and won't power
up. And during recovery 3 OSD's in another host have somehow become
corrupted. I am running erasure coding with an 8+2 setup using a crush map which
takes 2 OSDs per host, and after losing the other 2 OSDs I have a few PG's
down. Unfortunately these PG's seem to overlap almost all data on the pool,
so I believe the entire pool is mostly lost after only these 2% of PG's
down.

I am running ceph 14.2.9.

OSD 92 log https://pastebin.com/5aq8SyCW
OSD 97 log https://pastebin.com/uJELZxwr

ceph-bluestore-tool repair without --deep showed "success" but OSD's still
fail with the log above.

Log from trying ceph-bluestore-tool repair --deep which is still running,
not sure if it will actually fix anything and log looks pretty bad.
https://pastebin.com/gkqTZpY3

Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 --op
list" gave me input/output error. But everything in SMART looks OK, and i
see no indication of hardware read error in any logs. Same for both OSD.

The OSD's with corruption have absolutely no bad sectors and likely have
only a minor corruption but at important locations.

Any ideas on how to recover this kind of scenario ? Any tips would be
highly appreciated.

Best regards,
Kári Bertilsson
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD corruption and down PGs

2020-05-12 Thread Kári Bertilsson
Hello

I had an incident where 3 OSD's crashed at once completely and won't power
up. And during recovery 3 OSD's in another host have somehow become
corrupted. I am running erasure coding with an 8+2 setup using a crush map which
takes 2 OSDs per host, and after losing the other 2 OSDs I have a few PG's
down. Unfortunately these PG's seem to overlap almost all data on the pool,
so I believe the entire pool is mostly lost after only these 2% of PG's
down.

I am running ceph 14.2.9.

OSD 92 log https://pastebin.com/5aq8SyCW
OSD 97 log https://pastebin.com/uJELZxwr

ceph-bluestore-tool repair without --deep showed "success" but OSD's still
fail with the log above.

Log from trying ceph-bluestore-tool repair --deep which is still running,
not sure if it will actually fix anything and log looks pretty bad.
https://pastebin.com/gkqTZpY3

Trying "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 --op
list" gave me input/output error. But everything in SMART looks OK, and i
see no indication of hardware read error in any logs. Same for both OSD.

The OSD's with corruption have absolutely no bad sectors and likely have
only a minor corruption but at important locations.

Any ideas on how to recover this kind of scenario ? Any tips would be
highly appreciated.
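
One avenue sometimes tried in this kind of situation, sketched only in outline with placeholder IDs and paths, is to export the down PGs from the partially readable OSDs with ceph-objectstore-tool and import them into a healthy (stopped) OSD; whether it works here depends on whether the export itself survives the I/O errors:

# osd.97 stopped; pgid and paths are placeholders
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-97 \
    --op export --pgid 2.1f --file /root/pg2.1f.export

# Import into another stopped OSD, then start it again
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
    --op import --file /root/pg2.1f.export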

Best regards,
Kári Bertilsson
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw user access questions

2020-05-12 Thread Vishwas Bm
Hi,

Any input on this ?

Thanks & Regards,

Vishwas

On Mon, May 11, 2020 at 11:11 AM Vishwas Bm  wrote:

> Hi,
>
> I am a newbie to ceph. I have gone through the ceph docs, we are planning
> to use rgw for object storage.
>
> From the docs, what I have understood is that there are two types of users:
> 1) ceph storage user
> 2) radosgw user
>
> I am able to create user of both the types. But I am not able to
> understand how to restrict the rgw user access to a pool.
>
> My questions are below:
> 1) How to restrict the access of an rgw user to a particular pool? Can
> this be done using placement groups?
>
> 2) Is it possible to restrict rgw user access to a particular namespace in
> a pool ?
>
> 3) I can understand the flow up to the point where the user is able to write to
> a bucket using the .index pool object. But I am not able to understand how the
> rgw user can write objects in the pool. Where can I check the permissions?
>
> Thanks & Regards,
>
> Vishwas
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: nfs migrate to rgw

2020-05-12 Thread Wido den Hollander



On 5/12/20 4:22 AM, Zhenshi Zhou wrote:
> Hi all,
> 
> We have several nfs servers providing file storage. There is an nginx in
> front of the nfs servers in order to serve the clients. The files are mostly
> small files, about 30TB in total.
> 

What is small? How many objects/files are you talking about?

> I'm gonna use ceph rgw as the storage. I wanna know if it's appropriate to
> do so.
> The data migrating from nfs to rgw is a huge job. Besides I'm not sure
> whether
> ceph rgw is suitable in this scenario or not.
> 

Yes, it is. But make sure you don't put millions of objects into a
single bucket. Make sure that you spread them out so that you have, let's
say, 1M objects per bucket at most.
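
A quick way to sanity-check object counts during such a migration, assuming admin access to radosgw-admin (bucket name is a placeholder):

# Per-bucket object count and size
radosgw-admin bucket stats --bucket=my-bucket | grep -E 'num_objects|size_kb'

# Or check all buckets against the limits RGW knows about
radosgw-admin bucket limit check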

Wido

> Thanks
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cifs slow read speed

2020-05-12 Thread Amudhan P
Hi,

I am running a small Ceph Nautilus cluster on Ubuntu 18.04.

I am testing the cluster to expose a cephfs volume as a samba v4 share for
users to access from Windows.

When I test on the ceph kernel mount, a dd write runs at 600 MB/s and reading
the file back with md5sum runs at 700-800 MB/s.

I have exposed the same volume in samba using "vfs_ceph" and mounted it through
cifs on another Ubuntu 18.04 client.
Now, when I perform a dd write I get a speed of 600 MB/s, but the md5sum read
speed of the file is only 65 MB/s.

What could be the problem? Has anyone faced a similar issue?
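
For comparison, the vfs_ceph share definition typically looks something like the sketch below (share name and user id are placeholders); it can also be worth pointing a plain share at the existing kernel mount to see whether the slow reads are specific to vfs_ceph:

# smb.conf (example)
[cephshare]
    path = /
    vfs objects = ceph
    ceph:config_file = /etc/ceph/ceph.conf
    ceph:user_id = samba
    kernel share modes = no
    read only = no
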
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io