Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-10-10 Thread Paul Emmerich
I've also encountered this issue on a cluster yesterday; one CPU core got
stuck in an infinite loop in get_obj_data::flush and the gateway stopped
serving requests. I've updated the tracker issue accordingly.


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Aug 21, 2019 at 3:55 PM Vladimir Brik wrote:
>
> Hello
>
> I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
> the radosgw process on those machines starts consuming 100% of 5 CPU cores
> for days at a time, even though the machine is not being used for data
> transfers (nothing in radosgw logs, a couple of KB/s of network).
>
> This situation can affect any number of our rados gateways, lasts from a
> few hours to a few days, and stops if the radosgw process is restarted or
> on its own.
>
> Does anybody have an idea what might be going on or how to debug it? I
> don't see anything obvious in the logs. Perf top is saying that CPU is
> consumed by the radosgw shared object in symbol get_obj_data::flush,
> which, if I interpret things correctly, is called from a symbol with a
> long name that contains the substring "boost9intrusive9list_impl".
>
> This is our configuration:
> rgw_frontends = civetweb num_threads=5000 port=443s
> ssl_certificate=/etc/ceph/rgw.crt
> error_log_file=/var/log/ceph/civetweb.error.log
>
> (error log file doesn't exist)
>
>
> Thanks,
>
> Vlad


Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-26 Thread Vladimir Brik

I created a ticket: https://tracker.ceph.com/issues/41511

Note that I think I was mistaken when I said that sometimes the problem 
goes away on its own. I've looked back through our monitoring and it 
looks like when the problem did go away, it was because either the 
machine was rebooted or the radosgw service was restarted.



Vlad



On 8/23/19 10:17 AM, Eric Ivancich wrote:

Good morning, Vladimir,

Please create a tracker for this 
(https://tracker.ceph.com/projects/rgw/issues/new) and include the link 
to it in an email reply. And if you can include any more potentially 
relevant details, please do so. I’ll add my initial analysis to it.


But the threads do seem to be stuck, at least for a while, in 
get_obj_data::flush despite a lack of traffic. And sometimes it 
self-resolves, so it’s not a true “infinite loop”.


Thank you,

Eric

On Aug 22, 2019, at 9:12 PM, Eric Ivancich <ivanc...@redhat.com> wrote:


Thank you for providing the profiling data, Vladimir. There are 5078 
threads and most of them are waiting. Here is a list of the deepest 
call of each thread with duplicates removed.


            + 100.00% epoll_wait
                          + 100.00% get_obj_data::flush(rgw::OwningList&&)

            + 100.00% poll
        + 100.00% poll
      + 100.00% poll
        + 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
      + 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
        + 100.00% pthread_cond_wait@@GLIBC_2.3.2
      + 100.00% pthread_cond_wait@@GLIBC_2.3.2
      + 100.00% read
                            + 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_


The only interesting ones are the second and last:

* get_obj_data::flush(rgw::OwningList&&)
* _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_


They are essentially part of the same call stack that results from
processing a GetObj request, and five threads are in this call stack
(the only difference is whether or not they include the call into the
boost intrusive list). Here’s the full call stack of those threads:


+ 100.00% clone
  + 100.00% start_thread
    + 100.00% worker_thread
      + 100.00% process_new_connection
        + 100.00% handle_request
          + 100.00% RGWCivetWebFrontend::process(mg_connection*)
            + 100.00% process_request(RGWRados*, RGWREST*, RGWRequest*, std::string const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)
              + 100.00% rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)
                + 100.00% RGWGetObj::execute()
                  + 100.00% RGWRados::Object::Read::iterate(long, long, RGWGetDataCB*)
                    + 100.00% RGWRados::iterate_obj(RGWObjectCtx&, RGWBucketInfo const&, rgw_obj const&, long, long, unsigned long, int (*)(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*), void*)
                      + 100.00% _get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
                        + 100.00% RGWRados::get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
                          + 100.00% get_obj_data::flush(rgw::OwningList&&)
                            + 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_


So this isn’t background processing but request processing. I’m not 
clear why these requests are consuming so much CPU for so long.


From your initial message:
I am running a Ceph 14.2.1 cluster with 3 rados gateways.
Periodically, the radosgw process on those machines starts consuming
100% of 5 CPU cores for days at a time, even though the machine is not
being used for data transfers (nothing in radosgw logs, a couple of
KB/s of network).


This situation can affect any number of our rados gateways, lasts
from a few hours to a few days, and stops if the radosgw process is
restarted or on its own.


I’m going to check with others who’re more familiar with this code path.


Begin forwarded message:

From: Vladimir Brik <vladimir.b...@icecube.wisc.edu>
Subject: Re: [ceph-users] radosgw pegging down 5 CPU cores when no
data is being transferred
Date: August 21, 2019 at 4:47:01 PM EDT
To: "J. Eric Ivancich" <ivanc...@redhat.com>, Mark Nelson
<mnel...@redhat.com>, ceph-users@lists.ceph.com

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-23 Thread Eric Ivancich
Good morning, Vladimir,

Please create a tracker for this
(https://tracker.ceph.com/projects/rgw/issues/new) and include the link to it
in an email reply. And if you can include any more potentially relevant
details, please do so. I’ll add my initial analysis to it.

But the threads do seem to be stuck, at least for a while, in 
get_obj_data::flush despite a lack of traffic. And sometimes it self-resolves, 
so it’s not a true “infinite loop”.

Thank you,

Eric

> On Aug 22, 2019, at 9:12 PM, Eric Ivancich  wrote:
> 
> Thank you for providing the profiling data, Vladimir. There are 5078 threads 
> and most of them are waiting. Here is a list of the deepest call of each 
> thread with duplicates removed.
> 
> + 100.00% epoll_wait
>   + 100.00% get_obj_data::flush(rgw::OwningList&&)
> + 100.00% poll
> + 100.00% poll
>   + 100.00% poll
> + 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
>   + 100.00% pthread_cond_timedwait@@GLIBC_2.3.2
> + 100.00% pthread_cond_wait@@GLIBC_2.3.2
>   + 100.00% pthread_cond_wait@@GLIBC_2.3.2
>   + 100.00% read
> + 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_
> 
> The only interesting ones are the second and last:
> 
> * get_obj_data::flush(rgw::OwningList&&)
> * _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_
> 
> They are essentially part of the same call stack that results from processing
> a GetObj request, and five threads are in this call stack (the only
> difference is whether or not they include the call into the boost intrusive
> list). Here’s the full call stack of those threads:
> 
> + 100.00% clone
>   + 100.00% start_thread
>     + 100.00% worker_thread
>       + 100.00% process_new_connection
>         + 100.00% handle_request
>           + 100.00% RGWCivetWebFrontend::process(mg_connection*)
>             + 100.00% process_request(RGWRados*, RGWREST*, RGWRequest*, std::string const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, int*)
>               + 100.00% rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)
>                 + 100.00% RGWGetObj::execute()
>                   + 100.00% RGWRados::Object::Read::iterate(long, long, RGWGetDataCB*)
>                     + 100.00% RGWRados::iterate_obj(RGWObjectCtx&, RGWBucketInfo const&, rgw_obj const&, long, long, unsigned long, int (*)(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*), void*)
>                       + 100.00% _get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
>                         + 100.00% RGWRados::get_obj_iterate_cb(rgw_raw_obj const&, long, long, long, bool, RGWObjState*, void*)
>                           + 100.00% get_obj_data::flush(rgw::OwningList&&)
>                             + 100.00% _ZN5boost9intrusive9list_implINS0_8bhtraitsIN3rgw14AioResultEntryENS0_16list_node_traitsIPvEELNS0_14link_mode_typeE1ENS0_7dft_tagELj1EEEmLb1EvE4sortIZN12get_obj_data5flushEONS3_10OwningListIS4_JUlRKT_RKT0_E_EEvSH_
> 
> So this isn’t background processing but request processing. I’m not clear why 
> these requests are consuming so much CPU for so long.
> 
> From your initial message:
>> I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
>> the radosgw process on those machines starts consuming 100% of 5 CPU cores
>> for days at a time, even though the machine is not being used for data
>> transfers (nothing in radosgw logs, a couple of KB/s of network).
>> 
>> This situation can affect any number of our rados gateways, lasts from a
>> few hours to a few days, and stops if the radosgw process is restarted or
>> on its own.
> 
> 
> I’m going to check with others who’re more familiar with this code path.
> 
>> Begin forwarded message:
>> 
>> From: Vladimir Brik <vladimir.b...@icecube.wisc.edu>
>> Subject: Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is
>> being transferred
>> Date: August 21, 2019 at 4:47:01 PM EDT
>> To: "J. Eric Ivancich" <ivanc...@redhat.com>, Mark Nelson <mnel...@redhat.com>,
>> ceph-users@lists.ceph.com

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik

> Are you running multisite?
No

> Do you have dynamic bucket resharding turned on?
Yes. "radosgw-admin reshard list" prints "[]"

> Are you using lifecycle?
I am not sure. How can I check? "radosgw-admin lc list" says "[]"

> And just to be clear -- sometimes all 3 of your rados gateways are
> simultaneously in this state?
Multiple, but I have not seen all 3 being in this state simultaneously. 
Currently one gateway has 1 thread using 100% of CPU, and another has 5 
threads each using 100% CPU.


Here are the fruits of my attempts to capture the call graph using perf 
and gdbpmp:

https://icecube.wisc.edu/~vbrik/perf.data
https://icecube.wisc.edu/~vbrik/gdbpmp.data

These are the commands that I ran and their outputs (note I couldn't get 
perf not to generate the warning):

rgw-3 gdbpmp # ./gdbpmp.py -n 100 -p 73688 -o gdbpmp.data
Attaching to process 73688...Done.
Gathering Samples...

Profiling complete with 100 samples.

rgw-3 ~ # perf record --call-graph fp -p 73688 -- sleep 10
[ perf record: Woken up 54 times to write data ]
Warning:
Processed 574207 events and lost 4 chunks!
Check IO/CPU overload!
[ perf record: Captured and wrote 58.866 MB perf.data (233750 samples) ]
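To browse such a recording afterwards, something like the following should
work (a sketch, not a command from the session above; -g expands the
recorded call graphs):

rgw-3 ~ # perf report -g --stdio -i perf.data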





Vlad



On 8/21/19 11:16 AM, J. Eric Ivancich wrote:

On 8/21/19 10:22 AM, Mark Nelson wrote:

Hi Vladimir,


On 8/21/19 8:54 AM, Vladimir Brik wrote:

Hello



[much elided]


You might want to try grabbing a call graph from perf instead of just
running perf top, or using my wallclock profiler, to see if you can drill
down and find out where in that method it's spending the most time.


I agree with Mark -- a call graph would be very helpful in tracking down
what's happening.

There are background tasks that run. Are you running multisite? Do you
have dynamic bucket resharding turned on? Are you using lifecycle? And
garbage collection is another background task.

And just to be clear -- sometimes all 3 of your rados gateways are
simultaneously in this state?

But the call graph would be incredibly helpful.

Thank you,

Eric




Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread J. Eric Ivancich
On 8/21/19 10:22 AM, Mark Nelson wrote:
> Hi Vladimir,
> 
> 
> On 8/21/19 8:54 AM, Vladimir Brik wrote:
>> Hello
>>

[much elided]

> You might want to try grabbing a call graph from perf instead of just
> running perf top, or using my wallclock profiler, to see if you can drill
> down and find out where in that method it's spending the most time.

I agree with Mark -- a call graph would be very helpful in tracking down
what's happening.

There are background tasks that run. Are you running multisite? Do you
have dynamic bucket resharding turned on? Are you using lifecycle? And
garbage collection is another background task.

And just to be clear -- sometimes all 3 of your rados gateways are
simultaneously in this state?

But the call graph would be incredibly helpful.

Thank you,

Eric

-- 
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA


Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
Correction: the number of threads stuck using 100% of a CPU core varies 
from 1 to 5 (it's not always 5)


Vlad

On 8/21/19 8:54 AM, Vladimir Brik wrote:

Hello

I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
the radosgw process on those machines starts consuming 100% of 5 CPU cores
for days at a time, even though the machine is not being used for data
transfers (nothing in radosgw logs, a couple of KB/s of network).


This situation can affect any number of our rados gateways, lasts from a
few hours to a few days, and stops if the radosgw process is restarted or
on its own.


Does anybody have an idea what might be going on or how to debug it? I
don't see anything obvious in the logs. Perf top is saying that CPU is
consumed by the radosgw shared object in symbol get_obj_data::flush,
which, if I interpret things correctly, is called from a symbol with a
long name that contains the substring "boost9intrusive9list_impl".


This is our configuration:
rgw_frontends = civetweb num_threads=5000 port=443s 
ssl_certificate=/etc/ceph/rgw.crt 
error_log_file=/var/log/ceph/civetweb.error.log


(error log file doesn't exist)


Thanks,

Vlad


Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Paul Emmerich
On Wed, Aug 21, 2019 at 3:55 PM Vladimir Brik wrote:
>
> Hello
>
> I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
> the radosgw process on those machines starts consuming 100% of 5 CPU cores
> for days at a time, even though the machine is not being used for data
> transfers (nothing in radosgw logs, a couple of KB/s of network).
>
> This situation can affect any number of our rados gateways, lasts from a
> few hours to a few days, and stops if the radosgw process is restarted or
> on its own.
>
> Does anybody have an idea what might be going on or how to debug it? I
> don't see anything obvious in the logs. Perf top is saying that CPU is
> consumed by the radosgw shared object in symbol get_obj_data::flush,
> which, if I interpret things correctly, is called from a symbol with a
> long name that contains the substring "boost9intrusive9list_impl".
>
> This is our configuration:
> rgw_frontends = civetweb num_threads=5000 port=443s
> ssl_certificate=/etc/ceph/rgw.crt
> error_log_file=/var/log/ceph/civetweb.error.log

Probably unrelated to your problem, but running with lots of threads
is usually an indicator that the async beast frontend would be a
better fit for your setup.
(But the code you see in perf should not be related to the frontend)
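For reference, a beast equivalent of the civetweb line above would look
roughly like this (a sketch based on the Nautilus frontend options; the
certificate path is reused from the config above):

rgw_frontends = beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.crt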


Paul

>
> (error log file doesn't exist)
>
>
> Thanks,
>
> Vlad


Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Mark Nelson

Hi Vladimir,


On 8/21/19 8:54 AM, Vladimir Brik wrote:

Hello

I am running a Ceph 14.2.1 cluster with 3 rados gateways.
Periodically, the radosgw process on those machines starts consuming 100%
of 5 CPU cores for days at a time, even though the machine is not
being used for data transfers (nothing in radosgw logs, a couple of KB/s
of network).


This situation can affect any number of our rados gateways, lasts from
a few hours to a few days, and stops if the radosgw process is restarted
or on its own.


Does anybody have an idea what might be going on or how to debug it? I
don't see anything obvious in the logs. Perf top is saying that CPU is
consumed by the radosgw shared object in symbol get_obj_data::flush,
which, if I interpret things correctly, is called from a symbol with a
long name that contains the substring "boost9intrusive9list_impl".



I don't normally look at the RGW code so maybe Matt/Casey/Eric can chime 
in.  That code is in src/rgw/rgw_rados.cc in the get_obj_data struct.  
The flush method does some sorting/merging and then walks through a 
list of completed IOs and appears to copy a bufferlist out of each 
one, then deletes it from the list and passes the BL off to 
client_cb->handle_data.  Looks like it could be pretty CPU intensive, but 
if you are seeing that much CPU for that long it sounds like something 
is rather off.



You might want to try grabbing a call graph from perf instead of just
running perf top, or using my wallclock profiler, to see if you can drill
down and find out where in that method it's spending the most time.



My wallclock profiler is here:


https://github.com/markhpc/gdbpmp
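Typical usage looks like this (a sketch based on the README there; the PID
and sample count are illustrative):

./gdbpmp.py -n 100 -p <radosgw-pid> -o gdbpmp.data   # collect samples
./gdbpmp.py -i gdbpmp.data                           # print the call tree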


Mark




This is our configuration:
rgw_frontends = civetweb num_threads=5000 port=443s 
ssl_certificate=/etc/ceph/rgw.crt 
error_log_file=/var/log/ceph/civetweb.error.log


(error log file doesn't exist)


Thanks,

Vlad


[ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik

Hello

I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically,
the radosgw process on those machines starts consuming 100% of 5 CPU cores
for days at a time, even though the machine is not being used for data
transfers (nothing in radosgw logs, a couple of KB/s of network).


This situation can affect any number of our rados gateways, lasts from
a few hours to a few days, and stops if the radosgw process is restarted
or on its own.


Does anybody have an idea what might be going on or how to debug it? I
don't see anything obvious in the logs. Perf top is saying that CPU is
consumed by the radosgw shared object in symbol get_obj_data::flush,
which, if I interpret things correctly, is called from a symbol with a
long name that contains the substring "boost9intrusive9list_impl".


This is our configuration:
rgw_frontends = civetweb num_threads=5000 port=443s 
ssl_certificate=/etc/ceph/rgw.crt 
error_log_file=/var/log/ceph/civetweb.error.log


(error log file doesn't exist)


Thanks,

Vlad


Re: [ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.

2019-08-07 Thread Félix Barbeira
Hi Manuel,

Yes, I already tried that option but the result is extremely noisy and
not usable due to the lack of some fields; besides that, forget about
parsing those logs to print some stats. Also, I'm not sure if this is
good for rgw performance.

I think I'm going to stick with nginx and run some tests.

Thanks anyway! :)

On Tue, Aug 6, 2019 at 18:06, EDH - Manuel Rios Fernandez
(<mrios...@easydatahost.com>) wrote:

> Hi Felix,
>
> You can increase the debug level with "debug rgw" on your rgw nodes.
>
> We got it to 10.
>
> But at least in our case we switched back to civetweb because beast
> doesn't provide a clear log without a lot of verbosity.
>
> Regards
>
> Manuel
>
>
>
>
>
> From: ceph-users  On behalf of Félix
> Barbeira
> Sent: Tuesday, August 6, 2019 17:43
> To: Ceph Users 
> Subject: [ceph-users] radosgw (beast): how to enable verbose log?
> request, user-agent, etc.
>
>
>
> Hi,
>
>
>
> I'm testing radosgw with the beast backend and I did not find a way to view
> more information in the logfile. This is an example:
>
>
>
> 2019-08-06 16:59:14.488 7fc808234700  1 == starting new request
> req=0x5608245646f0 =
> 2019-08-06 16:59:14.496 7fc808234700  1 == req done req=0x5608245646f0
> op status=0 http_status=204 latency=0.00800043s ==
>
>
>
> I would be interested in the typical fields that a regular webserver has:
> origin, request, user-agent, etc. I checked the official docs but I don't
> find anything related:
>
>
>
> https://docs.ceph.com/docs/nautilus/radosgw/frontends/
>
>
>
> The only way I found is to put an nginx server running as a proxy, or an
> haproxy, in front, but I really don't like that solution because it would
> be an overhead component used only to log requests. Anyone in the same
> situation?
>
>
>
> Thanks in advance.
>
> --
>
> Félix Barbeira.
>


-- 
Félix Barbeira.


Re: [ceph-users] RadosGW (Ceph Object Gateway) Pools

2019-08-06 Thread EDH - Manuel Rios Fernandez
Hi,

I think default.rgw.buckets.index; for us it reaches 2k-6k IOPS for an
index size of 23GB.
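Pinning that pool to SSDs can be done with a device-class CRUSH rule,
roughly like this (a sketch; the rule name is illustrative):

ceph osd crush rule create-replicated rgw-index-ssd default host ssd
ceph osd pool set default.rgw.buckets.index crush_rule rgw-index-ssd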

Regards
Manuel



-Original Message-
From: ceph-users  On behalf of
dhils...@performair.com
Sent: Wednesday, August 7, 2019 1:41
To: ceph-users@lists.ceph.com
Subject: [ceph-users] RadosGW (Ceph Object Gateway) Pools

All;

Based on the PG Calculator, on the Ceph website, I have this list of pools
to pre-create for my Object Gateway:
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
default.rgw.intent-log
default.rgw.meta
default.rgw.usage
default.rgw.users.keys
default.rgw.users.email
default.rgw.users.uid
default.rgw.buckets.extra
default.rgw.buckets.index
default.rgw.buckets.data

I have a limited number of SSDs, and I plan to create rules which limit
pools to either HDD or SSD.  My HDDs have their block.db on NVMe devices.

I intend to use the SSDs primarily to back RBD for iSCSI, to support
virtualization, but I'm not opposed to using some of the space to speed up
RGW.

Which pool(s) would have the most impact on the performance of RGW to have
on SSDs?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com





[ceph-users] RadosGW (Ceph Object Gateway) Pools

2019-08-06 Thread DHilsbos
All;

Based on the PG Calculator, on the Ceph website, I have this list of pools to 
pre-create for my Object Gateway:
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
default.rgw.intent-log
default.rgw.meta
default.rgw.usage
default.rgw.users.keys
default.rgw.users.email
default.rgw.users.uid
default.rgw.buckets.extra
default.rgw.buckets.index
default.rgw.buckets.data

I have a limited number of SSDs, and I plan to create rules which limit pools 
to either HDD or SSD.  My HDDs have their block.db on NVMe devices.

I intend to use the SSDs primarily to back RBD for iSCSI, to support 
virtualization, but I'm not opposed to using some of the space to speed up RGW.

Which pool(s) would have the most impact on the performance of RGW to have on 
SSDs?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com





Re: [ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.

2019-08-06 Thread EDH - Manuel Rios Fernandez
Hi Felix,

 

You can increase the debug level with "debug rgw" on your rgw nodes.

 

We got it to 10.
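In ceph.conf that looks something like this (a sketch; the client section
name depends on how the rgw instance is named):

[client.rgw.gateway1]
debug rgw = 10/10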

 

But at least in our case we switched back to civetweb because beast doesn't
provide a clear log without a lot of verbosity.

 

Regards

 

Manuel

 

 

From: ceph-users  On behalf of Félix Barbeira
Sent: Tuesday, August 6, 2019 17:43
To: Ceph Users 
Subject: [ceph-users] radosgw (beast): how to enable verbose log? request,
user-agent, etc.

 

Hi,

 

I'm testing radosgw with the beast backend and I did not find a way to view more
information in the logfile. This is an example:

 

2019-08-06 16:59:14.488 7fc808234700  1 == starting new request 
req=0x5608245646f0 =
2019-08-06 16:59:14.496 7fc808234700  1 == req done req=0x5608245646f0 op 
status=0 http_status=204 latency=0.00800043s ==


 

I would be interested in the typical fields that a regular webserver has: origin,
request, user-agent, etc. I checked the official docs but I don't find anything
related:

 

https://docs.ceph.com/docs/nautilus/radosgw/frontends/

 

The only way I found is to put an nginx server running as a proxy, or an
haproxy, in front, but I really don't like that solution because it would be an
overhead component used only to log requests. Anyone in the same situation?

 

Thanks in advance.

-- 

Félix Barbeira.



[ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.

2019-08-06 Thread Félix Barbeira
Hi,

I'm testing radosgw with the beast backend and I did not find a way to view
more information in the logfile. This is an example:

2019-08-06 16:59:14.488 7fc808234700  1 == starting new request
req=0x5608245646f0 =
2019-08-06 16:59:14.496 7fc808234700  1 == req done req=0x5608245646f0
op status=0 http_status=204 latency=0.00800043s ==

I would be interested in the typical fields that a regular webserver has:
origin, request, user-agent, etc. I checked the official docs but I don't
find anything related:

https://docs.ceph.com/docs/nautilus/radosgw/frontends/


The only way I found is to put an nginx server running as a proxy, or an
haproxy, in front, but I really don't like that solution because it would
be an overhead component used only to log requests. Anyone in the same
situation?

Thanks in advance.
-- 
Félix Barbeira.


[ceph-users] radosgw user audit trail

2019-07-08 Thread shubjero
Good day,

We have a sizeable ceph deployment and use object-storage heavily. We
also integrate our object-storage with OpenStack but sometimes we are
required to create S3 keys for some of our users (aws-cli, java apps
that speak s3, etc). I was wondering if it is possible to see an audit
trail of a specific access key. I have noticed that only some
applications disclose their access key in the radosgw logs, whereas
others (like aws-cli) do not. I have also been able to view the audit
logs for a specific user (which is an OpenStack project), but not
specifically a key within that user/OpenStack project.

Any help would be appreciated!

Thank you,

Jared Baker
Ontario Institute for Cancer Research


Re: [ceph-users] RADOSGW S3 - Continuation Token Ignored?

2019-06-28 Thread Matt Benjamin
FYI, this PR just merged.  I would expect to see backports at least as
far as N, and others would be possible.

regards,

Matt

On Fri, Jun 28, 2019 at 3:43 PM  wrote:
>
> Matt;
>
> Yep, that would certainly explain it.
>
> My apologies, I almost searched for that information before sending the email.
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Matt Benjamin [mailto:mbenj...@redhat.com]
> Sent: Friday, June 28, 2019 9:48 AM
> To: Dominic Hilsbos
> Cc: ceph-users
> Subject: Re: [ceph-users] RADOSGW S3 - Continuation Token Ignored?
>
> Hi Dominic,
>
> The reason is likely that RGW doesn't yet support ListObjectsV2.
>
> Support is nearly here though:  https://github.com/ceph/ceph/pull/28102
>
> Matt
>
>
> On Fri, Jun 28, 2019 at 12:43 PM  wrote:
> >
> > All;
> >
> > I've got a RADOSGW instance setup, backed by my demonstration Ceph cluster. 
> >  I'm using Amazon's S3 SDK, and I've run into an annoying little snag.
> >
> > My code looks like this:
> > amazonS3 = builder.build();
> >
> > ListObjectsV2Request req = new 
> > ListObjectsV2Request().withBucketName("WorkOrder").withMaxKeys(MAX_KEYS);
> > ListObjectsV2Result result;
> >
> > do
> > {
> > result = amazonS3.listObjectsV2(req);
> >
> > for (S3ObjectSummary objectSummary : result.getObjectSummaries())
> > {
> > summaries.add(objectSummary);
> > }
> >
> > String token = result.getNextContinuationToken();
> > req.setContinuationToken(token);
> > }
> > while (result.isTruncated());
> >
> > The problem is, the ContinuationToken seems to be ignored, i.e. every call 
> > to amazonS3.listObjectsV2(req) returns the same set, and the loop never 
> > ends (until the summaries LinkedList overflows).
> >
> > Thoughts?
> >
> > Thank you,
> >
> > Dominic L. Hilsbos, MBA
> > Director - Information Technology
> > Perform Air International Inc.
> > dhils...@performair.com
> > www.PerformAir.com
> >
> >
> >
> >
> >
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


Re: [ceph-users] RADOSGW S3 - Continuation Token Ignored?

2019-06-28 Thread DHilsbos
Matt;

Yep, that would certainly explain it.

My apologies, I almost searched for that information before sending the email.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Matt Benjamin [mailto:mbenj...@redhat.com] 
Sent: Friday, June 28, 2019 9:48 AM
To: Dominic Hilsbos
Cc: ceph-users
Subject: Re: [ceph-users] RADOSGW S3 - Continuation Token Ignored?

Hi Dominic,

The reason is likely that RGW doesn't yet support ListObjectsV2.

Support is nearly here though:  https://github.com/ceph/ceph/pull/28102

Matt


On Fri, Jun 28, 2019 at 12:43 PM  wrote:
>
> All;
>
> I've got a RADOSGW instance setup, backed by my demonstration Ceph cluster.  
> I'm using Amazon's S3 SDK, and I've run into an annoying little snag.
>
> My code looks like this:
> amazonS3 = builder.build();
>
> ListObjectsV2Request req = new 
> ListObjectsV2Request().withBucketName("WorkOrder").withMaxKeys(MAX_KEYS);
> ListObjectsV2Result result;
>
> do
> {
> result = amazonS3.listObjectsV2(req);
>
> for (S3ObjectSummary objectSummary : result.getObjectSummaries())
> {
> summaries.add(objectSummary);
> }
>
> String token = result.getNextContinuationToken();
> req.setContinuationToken(token);
> }
> while (result.isTruncated());
>
> The problem is, the ContinuationToken seems to be ignored, i.e. every call to 
> amazonS3.listObjectsV2(req) returns the same set, and the loop never ends 
> (until the summaries LinkedList overflows).
>
> Thoughts?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
>
>


--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


Re: [ceph-users] RADOSGW S3 - Continuation Token Ignored?

2019-06-28 Thread Matt Benjamin
Hi Dominic,

The reason is likely that RGW doesn't yet support ListObjectsV2.

Support is nearly here though:  https://github.com/ceph/ceph/pull/28102
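In the meantime, the V1 listing call paginates correctly against RGW. A
minimal sketch, reusing the client, bucket, and collection names from your
snippet (ListObjectsRequest and ObjectListing come from
com.amazonaws.services.s3.model):

ObjectListing listing = amazonS3.listObjects(new ListObjectsRequest()
        .withBucketName("WorkOrder").withMaxKeys(MAX_KEYS));
while (true)
{
    summaries.addAll(listing.getObjectSummaries());
    if (!listing.isTruncated())
        break;
    // fetch the next page; the SDK carries the marker over for us
    listing = amazonS3.listNextBatchOfObjects(listing);
}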

Matt


On Fri, Jun 28, 2019 at 12:43 PM  wrote:
>
> All;
>
> I've got a RADOSGW instance setup, backed by my demonstration Ceph cluster.  
> I'm using Amazon's S3 SDK, and I've run into an annoying little snag.
>
> My code looks like this:
> amazonS3 = builder.build();
>
> ListObjectsV2Request req = new 
> ListObjectsV2Request().withBucketName("WorkOrder").withMaxKeys(MAX_KEYS);
> ListObjectsV2Result result;
>
> do
> {
> result = amazonS3.listObjectsV2(req);
>
> for (S3ObjectSummary objectSummary : result.getObjectSummaries())
> {
> summaries.add(objectSummary);
> }
>
> String token = result.getNextContinuationToken();
> req.setContinuationToken(token);
> }
> while (result.isTruncated());
>
> The problem is, the ContinuationToken seems to be ignored, i.e. every call to 
> amazonS3.listObjectsV2(req) returns the same set, and the loop never ends 
> (until the summaries LinkedList overflows).
>
> Thoughts?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
>
>


--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


[ceph-users] RADOSGW S3 - Continuation Token Ignored?

2019-06-28 Thread DHilsbos
All;

I've got a RADOSGW instance setup, backed by my demonstration Ceph cluster.  
I'm using Amazon's S3 SDK, and I've run into an annoying little snag.

My code looks like this:
amazonS3 = builder.build();

ListObjectsV2Request req = new
    ListObjectsV2Request().withBucketName("WorkOrder").withMaxKeys(MAX_KEYS);
ListObjectsV2Result result;

do
{
    result = amazonS3.listObjectsV2(req);

    for (S3ObjectSummary objectSummary : result.getObjectSummaries())
    {
        summaries.add(objectSummary);
    }

    String token = result.getNextContinuationToken();
    req.setContinuationToken(token);
}
while (result.isTruncated());

The problem is, the ContinuationToken seems to be ignored, i.e. every call to 
amazonS3.listObjectsV2(req) returns the same set, and the loop never ends 
(until the summaries LinkedList overflows).

Thoughts?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com





Re: [ceph-users] radosgw-admin list bucket based on "last modified"

2019-06-25 Thread M Ranga Swami Reddy
Thank you..
Looking into the URL...


On Tue, 25 Jun, 2019, 12:18 PM Torben Hørup,  wrote:

> Hi
>
> You could look into the radosgw elasticsearch sync module, and use that
> to find the objects by last-modified date.
>
> http://docs.ceph.com/docs/master/radosgw/elastic-sync-module/
>
> /Torben
>
> On 25.06.2019 08:19, M Ranga Swami Reddy wrote:
>
> > Thanks for the reply.
> > Btw, one my customer wants to get the objects based on last modified
> > date filed. How do we can achive this?
> >
> > On Thu, Jun 13, 2019 at 7:09 PM Paul Emmerich 
> > wrote:
> >
> > There's no (useful) internal ordering of these entries, so there isn't
> > a more efficient way than getting everything and sorting it :(
> >
> > Paul
> >
> > --
> > Paul Emmerich
> >
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> >
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io
> > Tel: +49 89 1896585 90
> >
> > On Thu, Jun 13, 2019 at 3:33 PM M Ranga Swami Reddy
> >  wrote:
> > hello - Can we list the objects in rgw, via last modified date?
> >
> > For example - I wanted to list all the objects which were modified 01
> > Jun 2019.
> >
> > Thanks
> > Swami
>
>


Re: [ceph-users] Radosgw federation replication

2019-06-25 Thread Marcelo Mariano Miziara
Hi... this page is for an old version (Jewel). What was called "federated" is
called multi-site nowadays. Read this one instead:
http://docs.ceph.com/docs/master/radosgw/multisite/ . There are some
instructions at the end about migrating a single site.

Marcelo M. Miziara 
Serviço Federal de Processamento de Dados - SERPRO 
marcelo.mizi...@serpro.gov.br 


De: "Behnam Loghmani"  
Para: "ceph-users"  
Enviadas: Terça-feira, 25 de junho de 2019 4:07:08 
Assunto: [ceph-users] Radosgw federation replication 

Hi there, 

I have a Ceph cluster with radosgw and I have been using it in my production
environment for a while. Now I decided to set up another cluster in another
geo place to have a disaster recovery plan. I read some docs like
http://docs.ceph.com/docs/jewel/radosgw/federated-config/, but all of them
are about setting up fresh clusters, not an existing one with data, and these
docs aren't available in new versions!

What do you suggest to make this work in my environment? 
here is my pools: 


.rgw.root 
default.rgw.control 
default.rgw.meta 
default.rgw.log 
default.rgw.buckets.index 
default.rgw.buckets.data 
default.rgw.buckets.non-ec 



Cluster Ceph version 13.2.5 mimic 




[ceph-users] Radosgw federation replication

2019-06-25 Thread Behnam Loghmani
Hi there,

I have a Ceph cluster with radosgw and I have been using it in my production
environment for a while. Now I decided to set up another cluster in another
geo place to have a disaster recovery plan. I read some docs like
http://docs.ceph.com/docs/jewel/radosgw/federated-config/, but all of them
are about setting up fresh clusters, not an existing one with data, and
these docs aren't available in new versions!

What do you suggest to make this work in my environment?
here is my pools:

> .rgw.root
> default.rgw.control
> default.rgw.meta
> default.rgw.log
> default.rgw.buckets.index
> default.rgw.buckets.data
> default.rgw.buckets.non-ec
>

Cluster Ceph version 13.2.5 mimic


Re: [ceph-users] radosgw-admin list bucket based on "last modified"

2019-06-25 Thread Torben Hørup

Hi

You could look into the radosgw elasticsearch sync module, and use that 
to find the objects by last-modified date.


http://docs.ceph.com/docs/master/radosgw/elastic-sync-module/
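The setup is roughly the following (a sketch; zone and endpoint names are
illustrative, see the page above for the full procedure):

radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=es-metadata \
    --endpoints=http://rgw-host:80 --tier-type=elasticsearch
radosgw-admin zone modify --rgw-zone=es-metadata \
    --tier-config=endpoint=http://es-host:9200,num_shards=10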

/Torben

On 25.06.2019 08:19, M Ranga Swami Reddy wrote:


Thanks for the reply.
Btw, one of my customers wants to get the objects based on the last modified 
date field. How can we achieve this?


On Thu, Jun 13, 2019 at 7:09 PM Paul Emmerich  
wrote:


There's no (useful) internal ordering of these entries, so there isn't 
a more efficient way than getting everything and sorting it :(


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Jun 13, 2019 at 3:33 PM M Ranga Swami Reddy 
 wrote:

hello - Can we list the objects in rgw, via last modified date?

For example - I wanted to list all the objects which were modified 01 
Jun 2019.


Thanks
Swami




Re: [ceph-users] radosgw-admin list bucket based on "last modified"

2019-06-25 Thread M Ranga Swami Reddy
Thanks for the reply.
Btw, one of my customers wants to get the objects based on the last modified
date field. How can we achieve this?


On Thu, Jun 13, 2019 at 7:09 PM Paul Emmerich 
wrote:

> There's no (useful) internal ordering of these entries, so there isn't a
> more efficient way than getting everything and sorting it :(
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Thu, Jun 13, 2019 at 3:33 PM M Ranga Swami Reddy 
> wrote:
>
>> hello - Can we list the objects in rgw, via last modified date?
>>
>> For example - I wanted to list all the objects which were modified 01 Jun
>> 2019.
>>
>> Thanks
>> Swami
>>
>


[ceph-users] radosgw multisite replication segfaults on init in 13.2.6

2019-06-14 Thread Płaza Tomasz
Hi,

We have a standalone ceph cluster v13.2.6 and wanted to replicate it to another
DC. After going through "Migrating a Single Site System to Multi-Site" and
"Configure a Secondary Zone" from
http://docs.ceph.com/docs/master/radosgw/multisite/, we set all buckets to
"disable replication" and started replication. To our surprise, a few minutes
after the start, new pools named default.rgw.buckets.{index,data} appeared and
started getting data.

There was a data split in the indexes pool, like below:
dc2_zone.rgw.control        35  0 B      0  118 TiB  8
dc2_zone.rgw.meta           36  714 KiB  0  118 TiB  2895
dc2_zone.rgw.log            37  14 KiB   0  118 TiB  734
dc2_zone.rgw.buckets.index  38  0 B      0  565 GiB  7203
default.rgw.buckets.index   39  0 B      0  565 GiB  4204
dc2_zone.rgw.buckets.data   40  933 MiB  0  118 TiB  2605
Indexes in the secondary zone were inconsistent.

In the logs from the radosgw set as an endpoint for the secondary zone we
found these lines:

-10001> 2019-06-14 11:41:45.701 7f46f0959700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f46f0959700 thread_name:data-sync
ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
1: (()+0xf5d0) [0x7f4739e1c5d0]
2: (RGWCoroutine::set_sleeping(bool)+0xc) [0x5561c28ffe0c]
3: (RGWOmapAppend::flush_pending()+0x2d) [0x5561c2904e1d]
4: (RGWOmapAppend::finish()+0x10) [0x5561c2904f00]
5: (RGWDataSyncShardCR::stop_spawned_services()+0x30) [0x5561c2b44320]
6: (RGWDataSyncShardCR::incremental_sync()+0x4c6) [0x5561c2b5d736]
7: (RGWDataSyncShardCR::operate()+0x75) [0x5561c2b5f0e5]
8: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x46) [0x5561c28fd566]
9: (RGWCoroutinesManager::run(std::list >&)+0x293) [0x5561c2900233]
10: (RGWCoroutinesManager::run(RGWCoroutine*)+0x78) [0x5561c2901108]
11: (RGWRemoteDataLog::run_sync(int)+0x1e7) [0x5561c2b36d37]
12: (RGWDataSyncProcessorThread::process()+0x46) [0x5561c29bacb6]
13: (RGWRadosThread::Worker::entry()+0x22b) [0x5561c295c4cb]
14: (()+0x7dd5) [0x7f4739e14dd5]
15: (clone()+0x6d) [0x7f472e306ead]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

Right now we work around this by setting the pool names in the secondary zone
to default.*, and everything looks fine, so we are gradually enabling
replication for other buckets and observing the situation.

Has anyone seen a similar behaviour?

Best Regards,
Tomasz Płaza


Re: [ceph-users] radosgw-admin list bucket based on "last modified"

2019-06-13 Thread Paul Emmerich
There's no (useful) internal ordering of these entries, so there isn't a
more efficient way than getting everything and sorting it :(
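If a one-off query is enough, the filter can also be applied client-side
while listing. A sketch with the AWS Java SDK (client and bucket names are
illustrative; every key still has to be listed):

ObjectListing listing = s3.listObjects("mybucket");
while (true)
{
    for (S3ObjectSummary o : listing.getObjectSummaries())
    {
        // keep only objects whose Last-Modified falls on 2019-06-01 (UTC)
        java.time.LocalDate day = o.getLastModified().toInstant()
                .atZone(java.time.ZoneOffset.UTC).toLocalDate();
        if (day.equals(java.time.LocalDate.of(2019, 6, 1)))
            System.out.println(o.getKey());
    }
    if (!listing.isTruncated())
        break;
    listing = s3.listNextBatchOfObjects(listing);
}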


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Thu, Jun 13, 2019 at 3:33 PM M Ranga Swami Reddy 
wrote:

> hello - Can we list the objects in rgw, via last modified date?
>
> For example - I wanted to list all the objects which were modified 01 Jun
> 2019.
>
> Thanks
> Swami
>


[ceph-users] radosgw-admin list bucket based on "last modified"

2019-06-13 Thread M Ranga Swami Reddy
hello - Can we list the objects in rgw, via last modified date?

For example - I wanted to list all the objects which were modified 01 Jun
2019.

Thanks
Swami


Re: [ceph-users] radosgw dying

2019-06-09 Thread DHilsbos
All;

Thank you to all who assisted, this was the problem!

My default PG count per pool was too high for my total OSD count, so radosgw
was unable to create all of these pools.

I removed the other pools I had created and reduced the default PGs per pool,
and radosgw was able to create all of its default pools; it is now running
properly.
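For reference, the defaults involved live in ceph.conf (values here are
illustrative for a small 6-OSD cluster):

osd_pool_default_pg_num = 32
osd_pool_default_pgp_num = 32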

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com

From: Torben Hørup [tor...@t-hoerup.dk]
Sent: Sunday, June 09, 2019 11:12 AM
To: Paul Emmerich
Cc: Dominic Hilsbos; Ceph Users
Subject: Re: [ceph-users] radosgw dying

For just core rgw services it will need these 4:
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log

When creating buckets and uploading data RGW will need additional 3:

default.rgw.buckets.index
default.rgw.buckets.non-ec
default.rgw.buckets.data

/Torben


On 09.06.2019 19:34, Paul Emmerich wrote:

> rgw uses more than one pool. (5 or 6 IIRC)
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Sun, Jun 9, 2019 at 7:00 PM  wrote:
>
> Huan;
>
> I get that, but the pool already exists, why is radosgw trying to
> create one?
>
> Dominic Hilsbos
>
>
> On Sat, Jun 8, 2019 at 2:55 AM -0700, "huang jun" 
> wrote:
>
> From the error message, I'm inclined to think that 'mon_max_pg_per_osd' was
> exceeded. You can check its value; the default is 250, so you can have at
> most 1500 PGs (250 * 6 OSDs), and for replicated pools with size=3 that
> means 500 PGs for all pools. You already have 448 PGs, so the next pool can
> create at most 500-448=52 PGs.
>
> On Sat, Jun 8, 2019 at 2:41 PM,  wrote:
>>
>> All;
>>
>> I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x
>> OSD per host), and I'm trying to add a 4th host for gateway purposes.
>>
>> The radosgw process keeps dying with:
>> 2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1
>> (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process
>> radosgw, pid 17588
>> 2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR:
>> librados::Rados::pool_create returned (34) Numerical result out of
>> range (this can be due to a pool or placement group misconfiguration,
>> e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)
>> 2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider
>> (RADOS)
>>
>> The .rgw.root pool already exists.
>>
>> ceph status returns:
>> cluster:
>> id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
>> health: HEALTH_OK
>>
>> services:
>> mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
>> mgr: S700028(active, since 47h), standbys: S700030, S700029
>> osd: 6 osds: 6 up (since 2d), 6 in (since 3d)
>>
>> data:
>> pools:   5 pools, 448 pgs
>> objects: 12 objects, 1.2 KiB
>> usage:   722 GiB used, 65 TiB / 66 TiB avail
>> pgs: 448 active+clean
>>
>> and ceph osd tree returns:
>> ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
>> -1   66.17697 root default
>> -5   22.05899 host S700029
>> 2   hdd 11.02950 osd.2up  1.0 1.0
>> 3   hdd 11.02950 osd.3up  1.0 1.0
>> -7   22.05899 host S700030
>> 4   hdd 11.02950 osd.4up  1.0 1.0
>> 5   hdd 11.02950 osd.5up  1.0 1.0
>> -3   22.05899 host s700028
>> 0   hdd 11.02950 osd.0up  1.0 1.0
>> 1   hdd 11.02950 osd.1up  1.0 1.0
>>
>> Any thoughts on what I'm missing?
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>>
>
> --
> Thank you!
> HuangJun
>



Re: [ceph-users] radosgw dying

2019-06-09 Thread DHilsbos
Certainly.

Output of ceph osd df:
ID CLASS WEIGHT   REWEIGHT SIZE   RAW USE DATA    OMAP META  AVAIL  %USE VAR  PGS STATUS
 2   hdd 11.02950  1.0     11 TiB 120 GiB 51 MiB  0 B  1 GiB 11 TiB 1.07 1.00 227 up
 3   hdd 11.02950  1.0     11 TiB 120 GiB 51 MiB  0 B  1 GiB 11 TiB 1.07 1.00 221 up
 4   hdd 11.02950  1.0     11 TiB 120 GiB 51 MiB  0 B  1 GiB 11 TiB 1.07 1.00 226 up
 5   hdd 11.02950  1.0     11 TiB 120 GiB 51 MiB  0 B  1 GiB 11 TiB 1.07 1.00 222 up
 0   hdd 11.02950  1.0     11 TiB 120 GiB 51 MiB  0 B  1 GiB 11 TiB 1.07 1.00 217 up
 1   hdd 11.02950  1.0     11 TiB 120 GiB 51 MiB  0 B  1 GiB 11 TiB 1.07 1.00 231 up
               TOTAL      66 TiB 722 GiB 306 MiB  0 B  6 GiB 65 TiB 1.07

MIN/MAX VAR: 1.00/1.00  STDDEV: 0

Thank you,

Dominic Hilsbos
Perform Air International Inc.

From: 
Sent: Saturday, June 08, 2019 3:35 AM
To: Dominic Hilsbos
Subject: Re: [ceph-users] radosgw dying

Can you post this?

ceph osd df

On Fri, Jun 7, 2019 at 7:31 PM <dhils...@performair.com> wrote:
All;

I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x OSD per 
host), and I'm trying to add a 4th host for gateway purposes.

The radosgw process keeps dying with:
2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
(d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process radosgw, 
pid 17588
2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
librados::Rados::pool_create returned (34) Numerical result out of range (this 
can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num 
or mon_max_pg_per_osd exceeded)
2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider (RADOS)

The .rgw.root pool already exists.

ceph status returns:
  cluster:
id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
health: HEALTH_OK

  services:
mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
mgr: S700028(active, since 47h), standbys: S700030, S700029
osd: 6 osds: 6 up (since 2d), 6 in (since 3d)

  data:
pools:   5 pools, 448 pgs
objects: 12 objects, 1.2 KiB
usage:   722 GiB used, 65 TiB / 66 TiB avail
pgs: 448 active+clean

and ceph osd tree returns:
ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
-1   66.17697 root default
-5   22.05899 host S700029
 2   hdd 11.02950 osd.2up  1.0 1.0
 3   hdd 11.02950 osd.3up  1.0 1.0
-7   22.05899 host S700030
 4   hdd 11.02950 osd.4up  1.0 1.0
 5   hdd 11.02950 osd.5up  1.0 1.0
-3   22.05899 host s700028
 0   hdd 11.02950 osd.0up  1.0 1.0
 1   hdd 11.02950 osd.1up  1.0 1.0

Any thoughts on what I'm missing?

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com





--
Shawn Iverson, CETL
Director of Technology
Rush County Schools
765-932-3901 option 7
ivers...@rushville.k12.in.us



Re: [ceph-users] radosgw dying

2019-06-09 Thread Torben Hørup

For just core rgw services it will need these 4:
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log

When creating buckets and uploading data RGW will need additional 3:

default.rgw.buckets.index
default.rgw.buckets.non-ec
default.rgw.buckets.data

/Torben


On 09.06.2019 19:34, Paul Emmerich wrote:


rgw uses more than one pool. (5 or 6 IIRC)

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Sun, Jun 9, 2019 at 7:00 PM  wrote:

Huan;

I get that, but the pool already exists, why is radosgw trying to 
create one?


Dominic Hilsbos

Get Outlook for Android

On Sat, Jun 8, 2019 at 2:55 AM -0700, "huang jun"  
wrote:


From the error message, I'm inclined to think that 'mon_max_pg_per_osd' was
exceeded.

You can check its value; the default is 250, so you
can have at most 1500 PGs (250 * 6 OSDs),
and for replicated pools with size=3, that allows 500 PGs across all pools.
You already have 448 PGs, so the next pool can create at most
500-448=52 PGs.


On Sat, Jun 8, 2019 at 2:41 PM,  wrote:


All;

I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x 
OSD per host), and I'm trying to add a 4th host for gateway purposes.


The radosgw process keeps dying with:
2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
(d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process 
radosgw, pid 17588
2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
librados::Rados::pool_create returned (34) Numerical result out of 
range (this can be due to a pool or placement group misconfiguration, 
e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)
2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider 
(RADOS)


The .rgw.root pool already exists.

ceph status returns:
cluster:
id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
health: HEALTH_OK

services:
mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
mgr: S700028(active, since 47h), standbys: S700030, S700029
osd: 6 osds: 6 up (since 2d), 6 in (since 3d)

data:
pools:   5 pools, 448 pgs
objects: 12 objects, 1.2 KiB
usage:   722 GiB used, 65 TiB / 66 TiB avail
pgs: 448 active+clean

and ceph osd tree returns:
ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
-1   66.17697 root default
-5   22.05899 host S700029
2   hdd 11.02950 osd.2up  1.0 1.0
3   hdd 11.02950 osd.3up  1.0 1.0
-7   22.05899 host S700030
4   hdd 11.02950 osd.4up  1.0 1.0
5   hdd 11.02950 osd.5up  1.0 1.0
-3   22.05899 host s700028
0   hdd 11.02950 osd.0up  1.0 1.0
1   hdd 11.02950 osd.1up  1.0 1.0

Any thoughts on what I'm missing?

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Thank you!
HuangJun

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw dying

2019-06-09 Thread Paul Emmerich
rgw uses more than one pool. (5 or 6 IIRC)

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Sun, Jun 9, 2019 at 7:00 PM  wrote:

> Huan;
>
> I get that, but the pool already exists, why is radosgw trying to create
> one?
>
> Dominic Hilsbos
>
> Get Outlook for Android 
>
>
>
>
> On Sat, Jun 8, 2019 at 2:55 AM -0700, "huang jun" 
> wrote:
>
> From the error message, I'm inclined to think that 'mon_max_pg_per_osd' was exceeded.
>> You can check its value; the default is 250, so you
>> can have at most 1500 PGs (250 * 6 OSDs),
>> and for replicated pools with size=3, that allows 500 PGs across all pools.
>> You already have 448 PGs, so the next pool can create at most 500-448=52 PGs.
>> On Sat, Jun 8, 2019 at 2:41 PM,  wrote:
>> >
>> > All;
>> >
>> > I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x OSD 
>> > per host), and I'm trying to add a 4th host for gateway purposes.
>> >
>> > The radosgw process keeps dying with:
>> > 2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
>> > (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process 
>> > radosgw, pid 17588
>> > 2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
>> > librados::Rados::pool_create returned (34) Numerical result out of range 
>> > (this can be due to a pool or placement group misconfiguration, e.g. 
>> > pg_num < pgp_num or mon_max_pg_per_osd exceeded)
>> > 2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider 
>> > (RADOS)
>> >
>> > The .rgw.root pool already exists.
>> >
>> > ceph status returns:
>> >   cluster:
>> > id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
>> > health: HEALTH_OK
>> >
>> >   services:
>> > mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
>> > mgr: S700028(active, since 47h), standbys: S700030, S700029
>> > osd: 6 osds: 6 up (since 2d), 6 in (since 3d)
>> >
>> >   data:
>> > pools:   5 pools, 448 pgs
>> > objects: 12 objects, 1.2 KiB
>> > usage:   722 GiB used, 65 TiB / 66 TiB avail
>> > pgs: 448 active+clean
>> >
>> > and ceph osd tree returns:
>> > ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
>> > -1   66.17697 root default
>> > -5   22.05899 host S700029
>> >  2   hdd 11.02950 osd.2up  1.0 1.0
>> >  3   hdd 11.02950 osd.3up  1.0 1.0
>> > -7   22.05899 host S700030
>> >  4   hdd 11.02950 osd.4up  1.0 1.0
>> >  5   hdd 11.02950 osd.5up  1.0 1.0
>> > -3   22.05899 host s700028
>> >  0   hdd 11.02950 osd.0up  1.0 1.0
>> >  1   hdd 11.02950 osd.1up  1.0 1.0
>> >
>> > Any thoughts on what I'm missing?
>> >
>> > Thank you,
>> >
>> > Dominic L. Hilsbos, MBA
>> > Director - Information Technology
>> > Perform Air International Inc.
>> > dhils...@performair.com
>> > www.PerformAir.com
>> >
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>> Thank you!
>> HuangJun
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw dying

2019-06-09 Thread Brett Chancellor
radosgw will try to create all of the default pools if they are missing.
The number of pools changes depending on the version, but it's somewhere
around 5.

On Sun, Jun 9, 2019, 1:00 PM  wrote:

> Huan;
>
> I get that, but the pool already exists, why is radosgw trying to create
> one?
>
> Dominic Hilsbos
>
> Get Outlook for Android 
>
>
>
>
> On Sat, Jun 8, 2019 at 2:55 AM -0700, "huang jun" 
> wrote:
>
> From the error message, I'm inclined to think that 'mon_max_pg_per_osd' was exceeded.
>> You can check its value; the default is 250, so you
>> can have at most 1500 PGs (250 * 6 OSDs),
>> and for replicated pools with size=3, that allows 500 PGs across all pools.
>> You already have 448 PGs, so the next pool can create at most 500-448=52 PGs.
>> On Sat, Jun 8, 2019 at 2:41 PM,  wrote:
>> >
>> > All;
>> >
>> > I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x OSD 
>> > per host), and I'm trying to add a 4th host for gateway purposes.
>> >
>> > The radosgw process keeps dying with:
>> > 2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
>> > (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process 
>> > radosgw, pid 17588
>> > 2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
>> > librados::Rados::pool_create returned (34) Numerical result out of range 
>> > (this can be due to a pool or placement group misconfiguration, e.g. 
>> > pg_num < pgp_num or mon_max_pg_per_osd exceeded)
>> > 2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider 
>> > (RADOS)
>> >
>> > The .rgw.root pool already exists.
>> >
>> > ceph status returns:
>> >   cluster:
>> > id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
>> > health: HEALTH_OK
>> >
>> >   services:
>> > mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
>> > mgr: S700028(active, since 47h), standbys: S700030, S700029
>> > osd: 6 osds: 6 up (since 2d), 6 in (since 3d)
>> >
>> >   data:
>> > pools:   5 pools, 448 pgs
>> > objects: 12 objects, 1.2 KiB
>> > usage:   722 GiB used, 65 TiB / 66 TiB avail
>> > pgs: 448 active+clean
>> >
>> > and ceph osd tree returns:
>> > ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
>> > -1   66.17697 root default
>> > -5   22.05899 host S700029
>> >  2   hdd 11.02950 osd.2up  1.0 1.0
>> >  3   hdd 11.02950 osd.3up  1.0 1.0
>> > -7   22.05899 host S700030
>> >  4   hdd 11.02950 osd.4up  1.0 1.0
>> >  5   hdd 11.02950 osd.5up  1.0 1.0
>> > -3   22.05899 host s700028
>> >  0   hdd 11.02950 osd.0up  1.0 1.0
>> >  1   hdd 11.02950 osd.1up  1.0 1.0
>> >
>> > Any thoughts on what I'm missing?
>> >
>> > Thank you,
>> >
>> > Dominic L. Hilsbos, MBA
>> > Director - Information Technology
>> > Perform Air International Inc.
>> > dhils...@performair.com
>> > www.PerformAir.com
>> >
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>> Thank you!
>> HuangJun
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw dying

2019-06-09 Thread DHilsbos
Huan;

I get that, but the pool already exists, why is radosgw trying to create one?

Dominic Hilsbos

Get Outlook for Android




On Sat, Jun 8, 2019 at 2:55 AM -0700, "huang jun" <hjwsm1...@gmail.com> wrote:


From the error message, I'm inclined to think that 'mon_max_pg_per_osd' was exceeded.
You can check its value; the default is 250, so you
can have at most 1500 PGs (250 * 6 OSDs),
and for replicated pools with size=3, that allows 500 PGs across all pools.
You already have 448 PGs, so the next pool can create at most 500-448=52 PGs.

On Sat, Jun 8, 2019 at 2:41 PM,  wrote:
>
> All;
>
> I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x OSD 
> per host), and I'm trying to add a 4th host for gateway purposes.
>
> The radosgw process keeps dying with:
> 2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
> (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process 
> radosgw, pid 17588
> 2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
> librados::Rados::pool_create returned (34) Numerical result out of range 
> (this can be due to a pool or placement group misconfiguration, e.g. pg_num < 
> pgp_num or mon_max_pg_per_osd exceeded)
> 2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider (RADOS)
>
> The .rgw.root pool already exists.
>
> ceph status returns:
>   cluster:
> id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
> mgr: S700028(active, since 47h), standbys: S700030, S700029
> osd: 6 osds: 6 up (since 2d), 6 in (since 3d)
>
>   data:
> pools:   5 pools, 448 pgs
> objects: 12 objects, 1.2 KiB
> usage:   722 GiB used, 65 TiB / 66 TiB avail
> pgs: 448 active+clean
>
> and ceph osd tree returns:
> ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
> -1   66.17697 root default
> -5   22.05899 host S700029
>  2   hdd 11.02950 osd.2up  1.0 1.0
>  3   hdd 11.02950 osd.3up  1.0 1.0
> -7   22.05899 host S700030
>  4   hdd 11.02950 osd.4up  1.0 1.0
>  5   hdd 11.02950 osd.5up  1.0 1.0
> -3   22.05899 host s700028
>  0   hdd 11.02950 osd.0up  1.0 1.0
>  1   hdd 11.02950 osd.1up  1.0 1.0
>
> Any thoughts on what I'm missing?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Thank you!
HuangJun

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw dying

2019-06-08 Thread huang jun
From the error message, I'm inclined to think that 'mon_max_pg_per_osd' was exceeded.
You can check its value; the default is 250, so you
can have at most 1500 PGs (250 * 6 OSDs),
and for replicated pools with size=3, that allows 500 PGs across all pools.
You already have 448 PGs, so the next pool can create at most 500-448=52 PGs.
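
(A quick way to double-check, as a sketch: the mon name is taken from the
'ceph status' output in this thread, and the first command must run on that
mon host:)

ceph daemon mon.S700028 config get mon_max_pg_per_osd
ceph osd pool ls detail | grep -o 'pg_num [0-9]*'

Summing pg_num * size over all pools and dividing by the 6 OSDs shows how
close you are to the 250 limit; crossing it is what makes pool_create
return ERANGE as in the log above.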

On Sat, Jun 8, 2019 at 2:41 PM,  wrote:
>
> All;
>
> I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x OSD 
> per host), and I'm trying to add a 4th host for gateway purposes.
>
> The radosgw process keeps dying with:
> 2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
> (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process 
> radosgw, pid 17588
> 2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
> librados::Rados::pool_create returned (34) Numerical result out of range 
> (this can be due to a pool or placement group misconfiguration, e.g. pg_num < 
> pgp_num or mon_max_pg_per_osd exceeded)
> 2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider (RADOS)
>
> The .rgw.root pool already exists.
>
> ceph status returns:
>   cluster:
> id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
> mgr: S700028(active, since 47h), standbys: S700030, S700029
> osd: 6 osds: 6 up (since 2d), 6 in (since 3d)
>
>   data:
> pools:   5 pools, 448 pgs
> objects: 12 objects, 1.2 KiB
> usage:   722 GiB used, 65 TiB / 66 TiB avail
> pgs: 448 active+clean
>
> and ceph osd tree returns:
> ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
> -1   66.17697 root default
> -5   22.05899 host S700029
>  2   hdd 11.02950 osd.2up  1.0 1.0
>  3   hdd 11.02950 osd.3up  1.0 1.0
> -7   22.05899 host S700030
>  4   hdd 11.02950 osd.4up  1.0 1.0
>  5   hdd 11.02950 osd.5up  1.0 1.0
> -3   22.05899 host s700028
>  0   hdd 11.02950 osd.0up  1.0 1.0
>  1   hdd 11.02950 osd.1up  1.0 1.0
>
> Any thoughts on what I'm missing?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw dying

2019-06-07 Thread DHilsbos
All;

I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x OSD per 
host), and I'm trying to add a 4th host for gateway purposes.

The radosgw process keeps dying with:
2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
(d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process radosgw, 
pid 17588
2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
librados::Rados::pool_create returned (34) Numerical result out of range (this 
can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num 
or mon_max_pg_per_osd exceeded)
2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider (RADOS)

The .rgw.root pool already exists.

ceph status returns:
  cluster:
id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
health: HEALTH_OK

  services:
mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
mgr: S700028(active, since 47h), standbys: S700030, S700029
osd: 6 osds: 6 up (since 2d), 6 in (since 3d)

  data:
pools:   5 pools, 448 pgs
objects: 12 objects, 1.2 KiB
usage:   722 GiB used, 65 TiB / 66 TiB avail
pgs: 448 active+clean

and ceph osd tree returns:
ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
-1   66.17697 root default
-5   22.05899 host S700029
 2   hdd 11.02950 osd.2up  1.0 1.0
 3   hdd 11.02950 osd.3up  1.0 1.0
-7   22.05899 host S700030
 4   hdd 11.02950 osd.4up  1.0 1.0
 5   hdd 11.02950 osd.5up  1.0 1.0
-3   22.05899 host s700028
 0   hdd 11.02950 osd.0up  1.0 1.0
 1   hdd 11.02950 osd.1up  1.0 1.0

Any thoughts on what I'm missing?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw in container

2019-06-05 Thread Brett Chancellor
It works okay. You need a ceph.conf and a generic radosgw cephx key. That's
it.
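
A minimal sketch of that setup; the key name, image, and paths here are
assumptions, not a tested recipe:

ceph auth get-or-create client.rgw.gw1 mon 'allow rw' osd 'allow rwx' \
    -o /etc/ceph/ceph.client.rgw.gw1.keyring
docker run -d --net=host -v /etc/ceph:/etc/ceph:ro \
    <your-radosgw-image> radosgw -f -n client.rgw.gw1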

On Wed, Jun 5, 2019, 5:37 AM Marc Roos  wrote:

>
>
> Has anyone put the radosgw in a container? What files do I need to put
> in the sandbox directory? Are there other things I should consider?
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw in container

2019-06-05 Thread Marc Roos



Has anyone put the radosgw in a container? What files do I need to put 
in the sandbox directory? Are there other things I should consider?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw index all keys in all buckets [EXT]

2019-05-13 Thread Matthew Vernon
Hi,

On 02/05/2019 22:00, Aaron Bassett wrote:

> With these caps I'm able to use a python radosgw-admin lib to list
> buckets and acls and users, but not keys. This user is also unable to
> read buckets and/or keys through the normal s3 api. Is there a way to
> create an s3 user that has read access to all buckets and keys
> without explicitly being granted acls?
I think you might want the --system argument to radosgw-admin user modify?
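Something like this (a sketch; the uid is made up):

radosgw-admin user modify --uid=indexer --system

A system user should then be able to read other users' buckets and keys
without per-bucket ACLs.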

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw object size limit?

2019-05-10 Thread Jan Kasprzak
Hello,

thanks for your help.

Casey Bodley wrote:
: It looks like the default.rgw.buckets.non-ec pool is missing, which
: is where we track in-progress multipart uploads. So I'm guessing
: that your perl client is not doing a multipart upload, where s3cmd
: does by default.
: 
: I'd recommend debugging this by trying to create the pool manually -
: the only requirement for this pool is that it not be erasure coded.
: See the docs for your ceph release for more information:
: 
: http://docs.ceph.com/docs/luminous/rados/operations/pools/#create-a-pool
: 
: http://docs.ceph.com/docs/luminous/rados/operations/placement-groups/

I use Mimic, FWIW. I created the pool in question manually:

# ceph osd pool create default.rgw.buckets.non-ec 32
pool 'default.rgw.buckets.non-ec' created
# 

and it finished without any error. Now I can do multipart uploads
using s3cmd.

What could be the problem? Maybe the radosgw cephx user does not have
sufficient rights to create a pool? "ceph auth ls" shows the following
keys:

client.bootstrap-rgw
key: ...
caps: [mgr] allow r
caps: [mon] allow profile bootstrap-rgw
client.rgw.myrgwhost
key: ...
caps: [mon] allow rw
caps: [osd] allow rwx

Is this correct?

Thank you very much!

-Yenya


-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
sir_clive> I hope you don't mind if I steal some of your ideas?
 laryross> As far as stealing... we call it sharing here.   --from rcgroups
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw object size limit?

2019-05-10 Thread Casey Bodley



On 5/10/19 10:20 AM, Jan Kasprzak wrote:

Hello Casey (and the ceph-users list),

I am returning to my older problem to which you replied:

Casey Bodley wrote:
: There is a rgw_max_put_size which defaults to 5G, which limits the
: size of a single PUT request. But in that case, the http response
: would be 400 EntityTooLarge. For multipart uploads, there's also a
: rgw_multipart_part_upload_limit that defaults to 10000 parts, which
: would cause a 416 InvalidRange error. By default though, s3cmd does
: multipart uploads with 15MB parts, so your 11G object should only
: have ~750 parts.
:
: Are you able to upload smaller objects successfully? These
: InvalidRange errors can also result from failures to create any
: rados pools that didn't exist already. If that's what you're
: hitting, you'd get the same InvalidRange errors for smaller object
: uploads, and you'd also see messages like this in your radosgw log:
:
: > rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34)
: Numerical result out of range (this can be due to a pool or
: placement group misconfiguration, e.g. pg_num < pgp_num or
: mon_max_pg_per_osd exceeded)

You are right. Now how do I know which pool it is and what is the
reason?

Anyway, If I try to upload a CentOS 7 ISO image using
Perl module Net::Amazon::S3, it works. I do something like this there:

my $bucket = $s3->add_bucket({
    bucket    => 'testbucket',
    acl_short => 'private',
});
$bucket->add_key_filename("testdir/$dst", $file, {
    content_type => 'application/octet-stream',
}) or die $s3->err . ': ' . $s3->errstr;

and I see the following in /var/log/ceph/ceph-client.rgwlog:

2019-05-10 15:55:28.394 7f4b859b8700  1 civetweb: 0x558108506000: 127.0.0.1 - - 
[10/May/2019:15:53:50 +0200] "PUT 
/testbucket/testdir/CentOS-7-x86_64-Everything-1810.iso HTTP/1.1" 200 234 - 
libwww-perl/6.38

I can see the uploaded object using "s3cmd ls", and I can download it back
using "s3cmd get", with matching sha1sum. When I do the same using
"s3cmd put" instead of Perl module, I indeed get the pool create failure:

2019-05-10 15:53:14.914 7f4b859b8700  1 == starting new request 
req=0x7f4b859af850 =
2019-05-10 15:53:15.492 7f4b859b8700  0 rgw_init_ioctx ERROR: 
librados::Rados::pool_create returned (34) Numerical result out of range (this can 
be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num or 
mon_max_pg_per_osd exceeded)
2019-05-10 15:53:15.492 7f4b859b8700  1 == req done req=0x7f4b859af850 op 
status=-34 http_status=416 ==
2019-05-10 15:53:15.492 7f4b859b8700  1 civetweb: 0x558108506000: 127.0.0.1 - - 
[10/May/2019:15:53:14 +0200] "POST /testbucket/testdir/c7.iso?uploads HTTP/1.0" 
416 469 - -

So maybe the Perl module is configured differently? But which pool or
other parameter is the problem? I have the following pools:

# ceph osd pool ls
one
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.data


It looks like the default.rgw.buckets.non-ec pool is missing, which is 
where we track in-progress multipart uploads. So I'm guessing that your 
perl client is not doing a multipart upload, where s3cmd does by default.


I'd recommend debugging this by trying to create the pool manually - the 
only requirement for this pool is that it not be erasure coded. See the 
docs for your ceph release for more information:


http://docs.ceph.com/docs/luminous/rados/operations/pools/#create-a-pool

http://docs.ceph.com/docs/luminous/rados/operations/placement-groups/


(the "one" pool is unrelated to RadosGW, it contains OpenNebula RBD images).

Thanks,

-Yenya

: On 3/7/19 12:21 PM, Jan Kasprzak wrote:
: >  Hello, Ceph users,
: >
: >does radosgw have an upper limit of object size? I tried to upload
: >an 11GB file using s3cmd, but it failed with an InvalidRange error:
: >
: >$ s3cmd put --verbose 
centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso s3://mybucket/
: >INFO: No cache file found, creating it.
: >INFO: Compiling list of local files...
: >INFO: Running stat() and reading/calculating MD5 values on 1 files, this may 
take some time...
: >INFO: Summary: 1 local files to upload
: >WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner username not known. 
Storing UID=108 instead.
: >WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner groupname not known. 
Storing GID=108 instead.
: >ERROR: S3 error: 416 (InvalidRange)
: >
: >$ ls -lh centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso
: >-rw-r--r--. 1 108 108 11G Nov 26 15:28 
centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso
: >
: >Thanks for any hint how to increase the limit.
: >
: >-Yenya
: >
: ___
: ceph-users mailing list
: ceph-users@lists.ceph.com
: http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list

Re: [ceph-users] Radosgw object size limit?

2019-05-10 Thread Jan Kasprzak
Hello Casey (and the ceph-users list),

I am returning to my older problem to which you replied:

Casey Bodley wrote:
: There is a rgw_max_put_size which defaults to 5G, which limits the
: size of a single PUT request. But in that case, the http response
: would be 400 EntityTooLarge. For multipart uploads, there's also a
: rgw_multipart_part_upload_limit that defaults to 10000 parts, which
: would cause a 416 InvalidRange error. By default though, s3cmd does
: multipart uploads with 15MB parts, so your 11G object should only
: have ~750 parts.
: 
: Are you able to upload smaller objects successfully? These
: InvalidRange errors can also result from failures to create any
: rados pools that didn't exist already. If that's what you're
: hitting, you'd get the same InvalidRange errors for smaller object
: uploads, and you'd also see messages like this in your radosgw log:
: 
: > rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34)
: Numerical result out of range (this can be due to a pool or
: placement group misconfiguration, e.g. pg_num < pgp_num or
: mon_max_pg_per_osd exceeded)

You are right. Now how do I know which pool it is and what is the
reason?

Anyway, If I try to upload a CentOS 7 ISO image using
Perl module Net::Amazon::S3, it works. I do something like this there:

my $bucket = $s3->add_bucket({
    bucket    => 'testbucket',
    acl_short => 'private',
});
$bucket->add_key_filename("testdir/$dst", $file, {
    content_type => 'application/octet-stream',
}) or die $s3->err . ': ' . $s3->errstr;

and I see the following in /var/log/ceph/ceph-client.rgwlog:

2019-05-10 15:55:28.394 7f4b859b8700  1 civetweb: 0x558108506000: 127.0.0.1 - - 
[10/May/2019:15:53:50 +0200] "PUT 
/testbucket/testdir/CentOS-7-x86_64-Everything-1810.iso HTTP/1.1" 200 234 - 
libwww-perl/6.38

I can see the uploaded object using "s3cmd ls", and I can download it back
using "s3cmd get", with matching sha1sum. When I do the same using
"s3cmd put" instead of Perl module, I indeed get the pool create failure:

2019-05-10 15:53:14.914 7f4b859b8700  1 == starting new request 
req=0x7f4b859af850 =
2019-05-10 15:53:15.492 7f4b859b8700  0 rgw_init_ioctx ERROR: 
librados::Rados::pool_create returned (34) Numerical result out of range (this 
can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num 
or mon_max_pg_per_osd exceeded)
2019-05-10 15:53:15.492 7f4b859b8700  1 == req done req=0x7f4b859af850 op 
status=-34 http_status=416 ==
2019-05-10 15:53:15.492 7f4b859b8700  1 civetweb: 0x558108506000: 127.0.0.1 - - 
[10/May/2019:15:53:14 +0200] "POST /testbucket/testdir/c7.iso?uploads HTTP/1.0" 
416 469 - -

So maybe the Perl module is configured differently? But which pool or
other parameter is the problem? I have the following pools:

# ceph osd pool ls
one
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.data

(the "one" pool is unrelated to RadosGW, it contains OpenNebula RBD images).

Thanks,

-Yenya

: On 3/7/19 12:21 PM, Jan Kasprzak wrote:
: > Hello, Ceph users,
: >
: >does radosgw have an upper limit of object size? I tried to upload
: >an 11GB file using s3cmd, but it failed with an InvalidRange error:
: >
: >$ s3cmd put --verbose 
centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso s3://mybucket/
: >INFO: No cache file found, creating it.
: >INFO: Compiling list of local files...
: >INFO: Running stat() and reading/calculating MD5 values on 1 files, this may 
take some time...
: >INFO: Summary: 1 local files to upload
: >WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner username not known. 
Storing UID=108 instead.
: >WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner groupname not known. 
Storing GID=108 instead.
: >ERROR: S3 error: 416 (InvalidRange)
: >
: >$ ls -lh centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso
: >-rw-r--r--. 1 108 108 11G Nov 26 15:28 
centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso
: >
: >Thanks for any hint how to increase the limit.
: >
: >-Yenya
: >
: ___
: ceph-users mailing list
: ceph-users@lists.ceph.com
: http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
sir_clive> I hope you don't mind if I steal some of your ideas?
 laryross> As far as stealing... we call it sharing here.   --from rcgroups
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw daemons constantly reading default.rgw.log pool

2019-05-03 Thread Vladimir Brik

Hello

I have set up rados gateway using "ceph-deploy rgw create" (default 
pools, 3 machines acting as gateways) on Ceph 13.2.5.


For over 2 weeks now, the three rados gateways have been generating a
constant ~30 MB/s and ~4K ops/s of read I/O on default.rgw.log even though
nothing is using the rados gateways.


Nothing in the logs except occasional
7fbce9329700  0 RGWReshardLock::lock failed to acquire lock on 
reshard.00 ret=-16


Anybody know what might be going on?


Thanks,

Vlad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw index all keys in all buckets

2019-05-02 Thread Aaron Bassett
Hello,
I'm trying to write a tool to index all keys in all buckets stored in radosgw. 
I've created a user with the following caps:

"caps": [
{
"type": "buckets",
"perm": "read"
},
{
"type": "metadata",
"perm": "read"
},
{
"type": "usage",
"perm": "read"
},
{
"type": "users",
"perm": "read"
}
],


With these caps I'm able to use a python radosgw-admin lib to list buckets and 
acls and users, but not keys. This user is also unable to read buckets and/or 
keys through the normal s3 api. Is there a way to create an s3 user that has 
read access to all buckets and keys without explicitly being granted acls?
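
(For reference, caps like the ones above can be granted to an existing user
with something like this; the uid is hypothetical:)

radosgw-admin caps add --uid=indexer \
    --caps="buckets=read; metadata=read; usage=read; users=read"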

Thanks,
Aaron
CONFIDENTIALITY NOTICE
This e-mail message and any attachments are only for the use of the intended 
recipient and may contain information that is privileged, confidential or 
exempt from disclosure under applicable law. If you are not the intended 
recipient, any disclosure, distribution or other use of this e-mail message or 
attachments is prohibited. If you have received this e-mail message in error, 
please delete and notify the sender immediately. Thank you.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW ops log lag?

2019-04-17 Thread Matt Benjamin
It should not be best effort.  As written, exactly
rgw_usage_log_flush_threshold outstanding log entries will be
buffered.  The default value for this parameter is 1024, which is
probably not high for a sustained workload, but you could experiment
with reducing it.
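
If you want to experiment, a sketch of the change; the section name and
value are only examples, and radosgw needs a restart afterwards:

[client.rgw.your-gateway]
rgw usage log flush threshold = 256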

Matt

On Fri, Apr 12, 2019 at 11:21 AM Aaron Bassett
 wrote:
>
> Ok thanks. Is the expectation that events will be available on that socket as 
> soon as they occur or is it more of a best effort situation? I'm just trying 
> to nail down which side of the socket might be lagging. It's pretty difficult 
> to recreate this as I have to hit the cluster very hard to get it to start 
> lagging.
>
> Thanks, Aaron
>
> > On Apr 12, 2019, at 11:16 AM, Matt Benjamin  wrote:
> >
> > Hi Aaron,
> >
> > I don't think that exists currently.
> >
> > Matt
> >
> > On Fri, Apr 12, 2019 at 11:12 AM Aaron Bassett
> >  wrote:
> >>
> >> I have a radosgw log centralizer that we use for an audit trail for 
> >> data access in our ceph clusters. We've enabled the ops log socket and 
> >> added logging of the http_authorization header to it:
> >>
> >> rgw log http headers = "http_authorization"
> >> rgw ops log socket path = /var/run/ceph/rgw-ops.sock
> >> rgw enable ops log = true
> >>
> >> We have a daemon that listens on the ops socket, extracts/manipulates some 
> >> information from the ops log, and sends it off to our log aggregator.
> >>
> >> This setup works pretty well for the most part, except when the cluster 
> >> comes under heavy load, it can get _very_ laggy - sometimes up to several 
> >> hours behind. I'm having a hard time nailing down what's causing this lag. 
> >> The daemon is rather naive, basically just some nc with jq in between, but 
> >> the log aggregator has plenty of spare capacity, so I don't think it's 
> >> slowing down how fast the daemon is consuming from the socket.
> >>
> >> I was revisiting the documentation about this ops log and noticed the 
> >> following which I hadn't seen previously:
> >>
> >> When specifying a UNIX domain socket, it is also possible to specify the 
> >> maximum amount of memory that will be used to keep the data backlog:
> >> rgw ops log data backlog = 
> >> Any backlogged data in excess to the specified size will be lost, so the 
> >> socket needs to be read constantly.
> >>
> >> I'm wondering if there's a way I can query radosgw for the current size of 
> >> that backlog to help me narrow down where the bottleneck may be occurring.
> >>
> >> Thanks,
> >> Aaron
> >>
> >>
> >>
> >> CONFIDENTIALITY NOTICE
> >> This e-mail message and any attachments are only for the use of the 
> >> intended recipient and may contain information that is privileged, 
> >> confidential or exempt from disclosure under applicable law. If you are 
> >> not the intended recipient, any disclosure, distribution or other use of 
> >> this e-mail message or attachments is prohibited. If you have received 
> >> this e-mail message in error, please delete and notify the sender 
> >> immediately. Thank you.
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >
> >
> > --
> >
> > Matt Benjamin
> > Red Hat, Inc.
> > 315 West Huron Street, Suite 140A
> > Ann Arbor, Michigan 48103
> >
> > http://www.redhat.com/en/technologies/storage
> >
> > tel.  734-821-5101
> > fax.  734-769-8938
> > cel.  734-216-5309
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw in Nautilus: message "client_io->complete_request() returned Broken pipe"

2019-04-17 Thread Francois Lafont

Hi @ll,

I have a Nautilus Ceph cluster UP with radosgw
in a zonegroup. I'm using the web frontend Beast
(the default in Nautilus). All seems to work fine
but in the log of radosgw I have this message:

Apr 17 14:02:56 rgw-m-1 ceph-m-rgw.rgw-m-1.rgw0[888]: 2019-04-17 14:02:56.410 
7fe659803700  0 ERROR: client_io->complete_request() returned Broken pipe

approximately every ~2-3 minutes (it's an average,
it's random, it's not every 2 minutes exactly).
I think the code which generates this message is
here:

https://github.com/ceph/ceph/blob/master/src/rgw/rgw_process.cc#L283-L287

but I'm completely unqualified to understand the code.
What is the meaning of this error message? Should I worry
about it?

François (flaf)


PS: just in case, here my conf:


~$ cat /etc/ceph/ceph-m.conf
[client.rgw.rgw-m-1.rgw0]
host = rgw-m-1
keyring = /var/lib/ceph/radosgw/ceph-m-rgw.rgw-m-1.rgw0/keyring
log file = /var/log/ceph/ceph-m-rgw-rgw-m-1.rgw0.log
rgw frontends = beast endpoint=192.168.222.1:80
rgw thread pool size = 512

[client.rgw.rgw-m-2.rgw0]
host = rgw-m-2
keyring = /var/lib/ceph/radosgw/ceph-m-rgw.rgw-m-2.rgw0/keyring
log file = /var/log/ceph/ceph-m-rgw-rgw-m-2.rgw0.log
rgw frontends = beast endpoint=192.168.222.2:80
rgw thread pool size = 512

# Please do not change this file directly since it is managed by Ansible and 
will be overwritten
[global]
cluster network = 10.90.90.0/24
debug_rgw = 0/5
fsid = bb27079f-f116-4440-8a64-9ed430dc17be
log file = /dev/null
mon cluster log file = /dev/null
mon host = 
[v2:192.168.221.31:3300,v1:192.168.221.31:6789],[v2:192.168.221.32:3300,v1:192.168.221.32:6789],[v2:192.168.221.33:3300,v1:192.168.221.33:6789]
mon_osd_down_out_subtree_limit = host
mon_osd_min_down_reporters = 4
osd_crush_chooseleaf_type = 1
osd_crush_update_on_start = true
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 8
osd_pool_default_pgp_num = 8
osd_pool_default_size = 3
public network = 192.168.221.0/25
rgw_enable_ops_log = true
rgw_log_http_headers = http_x_forwarded_for
rgw_ops_log_socket_path = /var/run/ceph/rgw-opslog.asok
rgw_realm = denmark
rgw_zone = zone-m
rgw_zonegroup = copenhagen


Installation via ceph-ansible with a docker deployment version stable 4.0.
ceph_docker_image: v4.0.0-stable-4.0-nautilus-centos-7-x86_64
ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW ops log lag?

2019-04-12 Thread Aaron Bassett
Ok thanks. Is the expectation that events will be available on that socket as 
soon as they occur or is it more of a best effort situation? I'm just trying to 
nail down which side of the socket might be lagging. It's pretty difficult to 
recreate this as I have to hit the cluster very hard to get it to start lagging.

Thanks, Aaron 

> On Apr 12, 2019, at 11:16 AM, Matt Benjamin  wrote:
> 
> Hi Aaron,
> 
> I don't think that exists currently.
> 
> Matt
> 
> On Fri, Apr 12, 2019 at 11:12 AM Aaron Bassett
>  wrote:
>> 
>> I have a radosgw log centralizer that we use for an audit trail for data 
>> access in our ceph clusters. We've enabled the ops log socket and added 
>> logging of the http_authorization header to it:
>> 
>> rgw log http headers = "http_authorization"
>> rgw ops log socket path = /var/run/ceph/rgw-ops.sock
>> rgw enable ops log = true
>> 
>> We have a daemon that listens on the ops socket, extracts/manipulates some 
>> information from the ops log, and sends it off to our log aggregator.
>> 
>> This setup works pretty well for the most part, except when the cluster 
>> comes under heavy load, it can get _very_ laggy - sometimes up to several 
>> hours behind. I'm having a hard time nailing down what's causing this lag. 
>> The daemon is rather naive, basically just some nc with jq in between, but 
>> the log aggregator has plenty of spare capacity, so I don't think it's 
>> slowing down how fast the daemon is consuming from the socket.
>> 
>> I was revisiting the documentation about this ops log and noticed the 
>> following which I hadn't seen previously:
>> 
>> When specifying a UNIX domain socket, it is also possible to specify the 
>> maximum amount of memory that will be used to keep the data backlog:
>> rgw ops log data backlog = 
>> Any backlogged data in excess to the specified size will be lost, so the 
>> socket needs to be read constantly.
>> 
>> I'm wondering if there's a way I can query radosgw for the current size of 
>> that backlog to help me narrow down where the bottleneck may be occurring.
>> 
>> Thanks,
>> Aaron
>> 
>> 
>> 
>> CONFIDENTIALITY NOTICE
>> This e-mail message and any attachments are only for the use of the intended 
>> recipient and may contain information that is privileged, confidential or 
>> exempt from disclosure under applicable law. If you are not the intended 
>> recipient, any disclosure, distribution or other use of this e-mail message 
>> or attachments is prohibited. If you have received this e-mail message in 
>> error, please delete and notify the sender immediately. Thank you.
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> 
> 
> 
> -- 
> 
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW ops log lag?

2019-04-12 Thread Matt Benjamin
Hi Aaron,

I don't think that exists currently.

Matt

On Fri, Apr 12, 2019 at 11:12 AM Aaron Bassett
 wrote:
>
> I have a radosgw log centralizer that we use for an audit trail for data 
> access in our ceph clusters. We've enabled the ops log socket and added 
> logging of the http_authorization header to it:
>
> rgw log http headers = "http_authorization"
> rgw ops log socket path = /var/run/ceph/rgw-ops.sock
> rgw enable ops log = true
>
> We have a daemon that listens on the ops socket, extracts/manipulates some 
> information from the ops log, and sends it off to our log aggregator.
>
> This setup works pretty well for the most part, except when the cluster comes 
> under heavy load, it can get _very_ laggy - sometimes up to several hours 
> behind. I'm having a hard time nailing down what's causing this lag. The 
> daemon is rather naive, basically just some nc with jq in between, but the 
> log aggregator has plenty of spare capacity, so I don't think it's slowing 
> down how fast the daemon is consuming from the socket.
>
> I was revisiting the documentation about this ops log and noticed the 
> following which I hadn't seen previously:
>
> When specifying a UNIX domain socket, it is also possible to specify the 
> maximum amount of memory that will be used to keep the data backlog:
> rgw ops log data backlog = 
> Any backlogged data in excess to the specified size will be lost, so the 
> socket needs to be read constantly.
>
> I'm wondering if there's a way I can query radosgw for the current size of 
> that backlog to help me narrow down where the bottleneck may be occurring.
>
> Thanks,
> Aaron
>
>
>
> CONFIDENTIALITY NOTICE
> This e-mail message and any attachments are only for the use of the intended 
> recipient and may contain information that is privileged, confidential or 
> exempt from disclosure under applicable law. If you are not the intended 
> recipient, any disclosure, distribution or other use of this e-mail message 
> or attachments is prohibited. If you have received this e-mail message in 
> error, please delete and notify the sender immediately. Thank you.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW ops log lag?

2019-04-12 Thread Aaron Bassett
I have a radosgw log centralizer that we use for an audit trail for data 
access in our ceph clusters. We've enabled the ops log socket and added logging 
of the http_authorization header to it:

rgw log http headers = "http_authorization"
rgw ops log socket path = /var/run/ceph/rgw-ops.sock
rgw enable ops log = true

We have a daemon that listens on the ops socket, extracts/manipulates some 
information from the ops log, and sends it off to our log aggregator.
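
(Conceptually the consumer is something like the sketch below; the socket
path is from our config above and the jq invocation is illustrative:)

nc -U /var/run/ceph/rgw-ops.sock | jq --unbuffered -c '.' | logger -t rgw-ops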

This setup works pretty well for the most part, except when the cluster comes 
under heavy load, it can get _very_ laggy - sometimes up to several hours 
behind. I'm having a hard time nailing down what's causing this lag. The daemon 
is rather naive, basically just some nc with jq in between, but the log 
aggregator has plenty of spare capacity, so I don't think it's slowing down how 
fast the daemon is consuming from the socket.

I was revisiting the documentation about this ops log and noticed the following 
which I hadn't seen previously:

When specifying a UNIX domain socket, it is also possible to specify the 
maximum amount of memory that will be used to keep the data backlog:
rgw ops log data backlog = 
Any backlogged data in excess to the specified size will be lost, so the socket 
needs to be read constantly.

I'm wondering if there's a way I can query radosgw for the current size of that 
backlog to help me narrow down where the bottleneck may be occurring.

Thanks,
Aaron



CONFIDENTIALITY NOTICE
This e-mail message and any attachments are only for the use of the intended 
recipient and may contain information that is privileged, confidential or 
exempt from disclosure under applicable law. If you are not the intended 
recipient, any disclosure, distribution or other use of this e-mail message or 
attachments is prohibited. If you have received this e-mail message in error, 
please delete and notify the sender immediately. Thank you.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw cloud sync aws s3 auth failed

2019-04-08 Thread Robin H. Johnson
On Mon, Apr 08, 2019 at 06:38:59PM +0800, 黄明友 wrote:
> 
> hi,all
> 
>    I have tested the cloud sync module in radosgw. The ceph version is
>    13.2.5, git commit id
>    cbff874f9007f1869bfd3821b7e33b2a6ffd4988;
Reading src/rgw/rgw_rest_client.cc
shows that it only generates v2 signatures for the sync module :-(

AWS China regions are some of the v4-only regions.

I don't know of any current work to tackle this, but there is v4
signature generation code already in the codebase, would just need to be
wired up in src/rgw/rgw_rest_client.cc.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw cloud sync aws s3 auth failed

2019-04-08 Thread 黄明友

hi,all

   I have tested the cloud sync module in radosgw. The ceph version is 13.2.5,
git commit id cbff874f9007f1869bfd3821b7e33b2a6ffd4988.

When syncing to an AWS S3 endpoint I get an HTTP 400 error, so I used the
http:// protocol and the tcpick tool to dump some of the traffic, like this:

PUT /wuxi01 HTTP/1.1
Host: s3.cn-north-1.amazonaws.com.cn
Accept: */*
Authorization: AWS AKIAUQ2G7NKZFVDQ76FZ:7ThaXKa3axR7Egf1tkwZc/YNRm4=
Date: Mon, 08 Apr 2019 10:04:37 +0000
Content-Length: 0

HTTP/1.1 400 Bad Request
x-amz-request-id: 65803EFC370CF11A
x-amz-id-2: 
py6N1QJw+pd91mvL0XpQhiwIVOiWIUprAX8PwAuSVOx3vrqat/Ka+xIVW3D1zC0+tJSLQyr4qC4=
x-amz-region: cn-north-1
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Mon, 08 Apr 2019 10:04:37 GMT
Connection: close
Server: AmazonS3
144

<Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have
provided is not supported. Please use
AWS4-HMAC-SHA256.</Message><RequestId>65803EFC370CF11A</RequestId><HostId>py6N1QJw+pd91mvL0XpQhiwIVOiWIUprAX8PwAuSVOx3vrqat/Ka+xIVW3D1zC0+tJSLQyr4qC4=</HostId></Error>
0



It looks like the client uses the old (v2) auth method rather than
AWS4-HMAC-SHA256. How can I enable the AWS4-HMAC-SHA256 auth method?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw sync falling behind regularly

2019-03-11 Thread Trey Palmer
Hi Casey,

We're still trying to figure this sync problem out. If you could possibly
tell us anything further we would be deeply grateful!

Our errors are coming from 'data sync'.   In `sync status` we pretty
constantly show one shard behind, but a different one each time we run it.

Here's a paste -- these commands were run in rapid succession.

root@sv3-ceph-rgw1:~# radosgw-admin sync status
  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
   zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
root@sv3-ceph-rgw1:~# radosgw-admin sync status
  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
   zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is behind on 1 shards
behind shards: [30]
oldest incremental change not applied: 2019-01-19
22:53:23.0.16109s
source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
root@sv3-ceph-rgw1:~#


Below I'm pasting a small section of log.  Thanks so much for looking!

Trey Palmer


root@sv3-ceph-rgw1:/var/log/ceph# tail -f ceph-rgw-sv3-ceph-rgw1.log | grep
-i error
2019-03-08 11:43:07.208572 7fa080cc7700  0 data sync: ERROR: failed to read
remote data log info: ret=-2
2019-03-08 11:43:07.211348 7fa080cc7700  0 meta sync: ERROR:
RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.267117 7fa080cc7700  0 data sync: ERROR: failed to read
remote data log info: ret=-2
2019-03-08 11:43:07.269631 7fa080cc7700  0 meta sync: ERROR:
RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.895192 7fa080cc7700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.046685 7fa080cc7700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.171277 7fa0870eb700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.171748 7fa0850e7700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.175867 7fa08a0f1700  0 meta sync: ERROR: can't remove
key:
bucket.instance:phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
ret=-2
2019-03-08 11:43:08.176755 7fa0820e1700  0 data sync: ERROR: init sync on
whoiswho/whoiswho:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.293 failed,
retcode=-2
2019-03-08 11:43:08.176872 7fa0820e1700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.176885 7fa093103700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.176925 7fa0820e1700  0 data sync: ERROR: failed to
retrieve bucket info for
bucket=phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.177916 7fa0910ff700  0 meta sync: ERROR: can't remove
key:
bucket.instance:gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
ret=-2
2019-03-08 11:43:08.178815 7fa08b0f3700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.178847 7fa0820e1700  0 data sync: ERROR: failed to
retrieve bucket info for
bucket=gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.179492 7fa0820e1700  0 data sync: 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-08 Thread Casey Bodley

(cc ceph-users)

Can you tell whether these sync errors are coming from metadata sync or 
data sync? Are they blocking sync from making progress according to your 
'sync status'?


On 3/8/19 10:23 AM, Trey Palmer wrote:

Casey,

Having done the 'reshard stale-instances delete' earlier on the advice 
of another list member, we have tons of sync errors on deleted 
buckets, as you mention.


After 'data sync init' we're still seeing all of these errors on 
deleted buckets.


Since buckets are metadata, it occurred to me this morning that a plain
'data sync init' wouldn't refresh that info. But a 'metadata sync init'
might get rid of the stale bucket sync info and stop the sync errors.
Would that be the way to go?


Thanks,

Trey



On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley <cbod...@redhat.com> wrote:


Hi Trey,

I think it's more likely that these stale metadata entries are from
deleted buckets, rather than accidental bucket reshards. When a bucket
is deleted in a multisite configuration, we don't delete its bucket
instance because other zones may still need to sync the object deletes -
and they can't make progress on sync if the bucket metadata disappears.
These leftover bucket instances look the same to the 'reshard
stale-instances' commands, but I'd be cautious about using that to
remove them in multisite, as it may cause more sync errors and
potentially leak storage if they still contain objects.

Regarding 'datalog trim', that alone isn't safe because it could trim
entries that hadn't been applied on other zones yet, causing them to
miss some updates. What you can do is run 'data sync init' on each zone,
and restart gateways. This will restart with a data full sync (which
will scan all buckets for changes), and skip past any datalog entries
from before the full sync. I was concerned that the bug in error
handling (ie "ERROR: init sync on...") would also affect full sync, but
that doesn't appear to be the case - so I do think that's worth trying.
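
(A sketch of that sequence on a secondary zone, using zone names from this
thread; exact flags can vary by release:)

radosgw-admin data sync init --source-zone dc11-prod
systemctl restart ceph-radosgw.target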

On 3/5/19 6:24 PM, Trey Palmer wrote:
> Casey,
>
> Thanks very much for the reply!
>
> We definitely have lots of errors on sync-disabled buckets and the
> workaround for that is obvious (most of them are empty anyway).
>
> Our second form of error is stale buckets.  We had dynamic resharding
> enabled but have now disabled it (having discovered it was on by
> default, and not supported in multisite).
>
> We removed several hundred stale buckets via 'radosgw-admin reshard
> stale-instances rm', but they are still giving us sync errors.
>
> I have found that these buckets do have entries in 'radosgw-admin
> datalog list', and my guess is this could be fixed by doing a
> 'radosgw-admin datalog trim' for each entry on the master zone.
>
> Does that sound right?  :-)
>
> Thanks again for the detailed explanation,
>
> Trey Palmer
>
> On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley <cbod...@redhat.com> wrote:
>
>     Hi Christian,
>
>     I think you've correctly intuited that the issues are related to
>     the use
>     of 'bucket sync disable'. There was a bug fix for that
feature in
> http://tracker.ceph.com/issues/26895, and I recently found that a
>     block
>     of code was missing from its luminous backport. That missing
code is
>     what handled those "ERROR: init sync on 
failed,
>     retcode=-2" errors.
>
>     I included a fix for that in a later backport
>     (https://github.com/ceph/ceph/pull/26549), which I'm still
>     working to
>     get through qa. I'm afraid I can't really recommend a workaround
>     for the
>     issue in the meantime.
>
>     Looking forward though, we do plan to support something like
>     s3's
>     cross
>     region replication so you can enable replication on a
>     specific bucket
>     without having to enable it globally.
>
>     Casey
>
>
>     On 3/5/19 2:32 PM, Christian Rice wrote:
>     >
>     > Much appreciated.  We’ll continue to poke around and
>     > certainly will
>     > disable the dynamic resharding.
>     >
>     > We started with 12.2.8 in production.  We definitely did not
>     have it
>     > enabled in ceph.conf
>     >
>     > *From: *Matthew H
>     > *Date: *Tuesday, March 5, 2019 at 11:22 AM
>     > *To: *Christian Rice, ceph-users


Re: [ceph-users] Radosgw object size limit?

2019-03-07 Thread Casey Bodley
There is a rgw_max_put_size which defaults to 5G, which limits the size 
of a single PUT request. But in that case, the http response would be 
400 EntityTooLarge. For multipart uploads, there's also a 
rgw_multipart_part_upload_limit that defaults to 10000 parts, which 
would cause a 416 InvalidRange error. By default though, s3cmd does 
multipart uploads with 15MB parts, so your 11G object should only have 
~750 parts.
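
If you do need a larger single-PUT cap, a minimal ceph.conf sketch
would look something like this (the section name is just an example
for one rgw instance; restart the gateway afterwards):

[client.rgw.gateway1]
# raise the single-PUT cap from the 5G default to 20G
rgw_max_put_size = 21474836480
# the default, shown for reference
rgw_multipart_part_upload_limit = 10000

And if part count ever becomes the limit, s3cmd's part size can be
raised with its --multipart-chunk-size-mb option.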


Are you able to upload smaller objects successfully? These InvalidRange 
errors can also result from failures to create any rados pools that 
didn't exist already. If that's what you're hitting, you'd get the same 
InvalidRange errors for smaller object uploads, and you'd also see 
messages like this in your radosgw log:


> rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34) 
Numerical result out of range (this can be due to a pool or placement 
group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd 
exceeded)
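
A quick way to check for that case (run from a mon or admin node; the
grep pattern is illustrative):

ceph osd pool ls | grep rgw
ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd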


On 3/7/19 12:21 PM, Jan Kasprzak wrote:

Hello, Ceph users,

does radosgw have an upper limit of object size? I tried to upload
a 11GB file using s3cmd, but it failed with InvalidRange error:

$ s3cmd put --verbose centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso 
s3://mybucket/
INFO: No cache file found, creating it.
INFO: Compiling list of local files...
INFO: Running stat() and reading/calculating MD5 values on 1 files, this may 
take some time...
INFO: Summary: 1 local files to upload
WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner username not known. Storing 
UID=108 instead.
WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner groupname not known. 
Storing GID=108 instead.
ERROR: S3 error: 416 (InvalidRange)

$ ls -lh centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso
-rw-r--r--. 1 108 108 11G Nov 26 15:28 
centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso

Thanks for any hint how to increase the limit.

-Yenya


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw object size limit?

2019-03-07 Thread Jan Kasprzak
Hello, Ceph users,

does radosgw have an upper limit of object size? I tried to upload
a 11GB file using s3cmd, but it failed with InvalidRange error:

$ s3cmd put --verbose centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso 
s3://mybucket/
INFO: No cache file found, creating it.
INFO: Compiling list of local files...
INFO: Running stat() and reading/calculating MD5 values on 1 files, this may 
take some time...
INFO: Summary: 1 local files to upload
WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner username not known. Storing 
UID=108 instead.
WARNING: CentOS-7-x86_64-Everything-1810.iso: Owner groupname not known. 
Storing GID=108 instead.
ERROR: S3 error: 416 (InvalidRange)

$ ls -lh centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso
-rw-r--r--. 1 108 108 11G Nov 26 15:28 
centos/7/isos/x86_64/CentOS-7-x86_64-Everything-1810.iso

Thanks for any hint how to increase the limit.

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Trey Palmer
It appears we eventually got 'data sync init' working.

At least, it's worked on 5 of the 6 sync directions in our 3-node cluster.
The sixth has not yet completed without returning an error, although
'sync status' does say "preparing for full sync".

Thanks,

Trey

On Wed, Mar 6, 2019 at 1:22 PM Trey Palmer  wrote:

> Casey,
>
> This was the result of trying 'data sync init':
>
> root@c2-rgw1:~# radosgw-admin data sync init
> ERROR: source zone not specified
> root@c2-rgw1:~# radosgw-admin data sync init --source-zone=<zone uuid>
> WARNING: cannot find source zone id for name=<zone uuid>
> ERROR: sync.init_sync_status() returned ret=-2
> root@c2-rgw1:~# radosgw-admin data sync init --source-zone=c1-zone
> ERROR: sync.init() returned ret=-5
> 2019-03-06 10:14:59.815735 7fecb214fe40  0 data sync: ERROR: failed to
> fetch datalog info
> root@c2-rgw1:~#
>
> Do you have any further advice or info?
>
> Thanks again,
>
> Trey
>
>
> On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley  wrote:
>
>> Hi Trey,
>>
>> I think it's more likely that these stale metadata entries are from
>> deleted buckets, rather than accidental bucket reshards. When a bucket
>> is deleted in a multisite configuration, we don't delete its bucket
>> instance because other zones may still need to sync the object deletes -
>> and they can't make progress on sync if the bucket metadata disappears.
>> These leftover bucket instances look the same to the 'reshard
>> stale-instances' commands, but I'd be cautious about using that to
>> remove them in multisite, as it may cause more sync errors and
>> potentially leak storage if they still contain objects.
>>
>> Regarding 'datalog trim', that alone isn't safe because it could trim
>> entries that hadn't been applied on other zones yet, causing them to
>> miss some updates. What you can do is run 'data sync init' on each zone,
>> and restart gateways. This will restart with a data full sync (which
>> will scan all buckets for changes), and skip past any datalog entries
>> from before the full sync. I was concerned that the bug in error
>> handling (ie "ERROR: init sync on...") would also affect full sync, but
>> that doesn't appear to be the case - so I do think that's worth trying.
>>
>> On 3/5/19 6:24 PM, Trey Palmer wrote:
>> > Casey,
>> >
>> > Thanks very much for the reply!
>> >
>> > We definitely have lots of errors on sync-disabled buckets and the
>> > workaround for that is obvious (most of them are empty anyway).
>> >
>> > Our second form of error is stale buckets.  We had dynamic resharding
>> > enabled but have now disabled it (having discovered it was on by
>> > default, and not supported in multisite).
>> >
>> > We removed several hundred stale buckets via 'radosgw-admin sharding
>> > stale-instances rm', but they are still giving us sync errors.
>> >
>> > I have found that these buckets do have entries in 'radosgw-admin
>> > datalog list', and my guess is this could be fixed by doing a
>> > 'radosgw-admin datalog trim' for each entry on the master zone.
>> >
>> > Does that sound right?  :-)
>> >
>> > Thanks again for the detailed explanation,
>> >
>> > Trey Palmer
>> >
>> > On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley wrote:
>> >
>> > Hi Christian,
>> >
>> > I think you've correctly intuited that the issues are related to
>> > the use
>> > of 'bucket sync disable'. There was a bug fix for that feature in
>> > http://tracker.ceph.com/issues/26895, and I recently found that a
>> > block
>> > of code was missing from its luminous backport. That missing code is
>> > what handled those "ERROR: init sync on <bucket instance> failed,
>> > retcode=-2" errors.
>> >
>> > I included a fix for that in a later backport
>> > (https://github.com/ceph/ceph/pull/26549), which I'm still working
>> to
>> > get through qa. I'm afraid I can't really recommend a workaround
>> > for the
>> > issue in the meantime.
>> >
>> > Looking forward though, we do plan to support something like s3's
>> > cross
>> > region replication so you can enable replication on a specific
>> bucket
>> > without having to enable it globally.
>> >
>> > Casey
>> >
>> >
>> > On 3/5/19 2:32 PM, Christian Rice wrote:
>> > >
>> > > Much appreciated.  We’ll continue to poke around and certainly
>> will
>> > > disable the dynamic resharding.
>> > >
>> > > We started with 12.2.8 in production.  We definitely did not
>> > have it
>> > > enabled in ceph.conf
>> > >
>> > > *From: *Matthew H
>> > > *Date: *Tuesday, March 5, 2019 at 11:22 AM
>> > > *To: *Christian Rice, ceph-users
>> > > *Cc: *Trey Palmer
>> > > *Subject: *Re: radosgw sync falling behind regularly
>> > >
>> > > Hi Christian,
>> > >
>> > > To be on the safe side and future proof 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Trey Palmer
Casey,

This was the result of trying 'data sync init':

root@c2-rgw1:~# radosgw-admin data sync init
ERROR: source zone not specified
root@c2-rgw1:~# radosgw-admin data sync init --source-zone=<zone uuid>
WARNING: cannot find source zone id for name=<zone uuid>
ERROR: sync.init_sync_status() returned ret=-2
root@c2-rgw1:~# radosgw-admin data sync init --source-zone=c1-zone
ERROR: sync.init() returned ret=-5
2019-03-06 10:14:59.815735 7fecb214fe40  0 data sync: ERROR: failed to
fetch datalog info
root@c2-rgw1:~#

Do you have any further advice or info?

Thanks again,

Trey


On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley  wrote:

> Hi Trey,
>
> I think it's more likely that these stale metadata entries are from
> deleted buckets, rather than accidental bucket reshards. When a bucket
> is deleted in a multisite configuration, we don't delete its bucket
> instance because other zones may still need to sync the object deletes -
> and they can't make progress on sync if the bucket metadata disappears.
> These leftover bucket instances look the same to the 'reshard
> stale-instances' commands, but I'd be cautious about using that to
> remove them in multisite, as it may cause more sync errors and
> potentially leak storage if they still contain objects.
>
> Regarding 'datalog trim', that alone isn't safe because it could trim
> entries that hadn't been applied on other zones yet, causing them to
> miss some updates. What you can do is run 'data sync init' on each zone,
> and restart gateways. This will restart with a data full sync (which
> will scan all buckets for changes), and skip past any datalog entries
> from before the full sync. I was concerned that the bug in error
> handling (ie "ERROR: init sync on...") would also affect full sync, but
> that doesn't appear to be the case - so I do think that's worth trying.
>
> On 3/5/19 6:24 PM, Trey Palmer wrote:
> > Casey,
> >
> > Thanks very much for the reply!
> >
> > We definitely have lots of errors on sync-disabled buckets and the
> > workaround for that is obvious (most of them are empty anyway).
> >
> > Our second form of error is stale buckets.  We had dynamic resharding
> > enabled but have now disabled it (having discovered it was on by
> > default, and not supported in multisite).
> >
> > We removed several hundred stale buckets via 'radosgw-admin sharding
> > stale-instances rm', but they are still giving us sync errors.
> >
> > I have found that these buckets do have entries in 'radosgw-admin
> > datalog list', and my guess is this could be fixed by doing a
> > 'radosgw-admin datalog trim' for each entry on the master zone.
> >
> > Does that sound right?  :-)
> >
> > Thanks again for the detailed explanation,
> >
> > Trey Palmer
> >
> > On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley wrote:
> >
> > Hi Christian,
> >
> > I think you've correctly intuited that the issues are related to
> > the use
> > of 'bucket sync disable'. There was a bug fix for that feature in
> > http://tracker.ceph.com/issues/26895, and I recently found that a
> > block
> > of code was missing from its luminous backport. That missing code is
> > what handled those "ERROR: init sync on <bucket instance> failed,
> > retcode=-2" errors.
> >
> > I included a fix for that in a later backport
> > (https://github.com/ceph/ceph/pull/26549), which I'm still working
> to
> > get through qa. I'm afraid I can't really recommend a workaround
> > for the
> > issue in the meantime.
> >
> > Looking forward though, we do plan to support something like s3's
> > cross
> > region replication so you can enable replication on a specific bucket
> > without having to enable it globally.
> >
> > Casey
> >
> >
> > On 3/5/19 2:32 PM, Christian Rice wrote:
> > >
> > > Much appreciated.  We’ll continue to poke around and certainly will
> > > disable the dynamic resharding.
> > >
> > > We started with 12.2.8 in production.  We definitely did not
> > have it
> > > enabled in ceph.conf
> > >
> > > *From: *Matthew H
> > > *Date: *Tuesday, March 5, 2019 at 11:22 AM
> > > *To: *Christian Rice, ceph-users
> > > *Cc: *Trey Palmer
> > > *Subject: *Re: radosgw sync falling behind regularly
> > >
> > > Hi Christian,
> > >
> > > To be on the safe side and future proof yourself, you will want to go
> > ahead
> > > and set the following in your ceph.conf file, and then issue a
> > restart
> > > to your RGW instances.
> > >
> > > rgw_dynamic_resharding = false
> > >
> > > There are a number of issues with dynamic resharding, multisite rgw
> > > problems being just one of them. However I thought it was disabled
> > > automatically when multisite rgw is used (but I will have to double
> > > 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Trey Palmer
Casey,

You are spot on that almost all of these are deleted buckets.   At some
point in the last few months we deleted and replaced buckets with
underscores in their names,  and those are responsible for most of these
errors.

Thanks very much for the reply and explanation.  We’ll give ‘data sync
init’ a try.

— Trey


On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley  wrote:

> Hi Trey,
>
> I think it's more likely that these stale metadata entries are from
> deleted buckets, rather than accidental bucket reshards. When a bucket
> is deleted in a multisite configuration, we don't delete its bucket
> instance because other zones may still need to sync the object deletes -
> and they can't make progress on sync if the bucket metadata disappears.
> These leftover bucket instances look the same to the 'reshard
> stale-instances' commands, but I'd be cautious about using that to
> remove them in multisite, as it may cause more sync errors and
> potentially leak storage if they still contain objects.
>
> Regarding 'datalog trim', that alone isn't safe because it could trim
> entries that hadn't been applied on other zones yet, causing them to
> miss some updates. What you can do is run 'data sync init' on each zone,
> and restart gateways. This will restart with a data full sync (which
> will scan all buckets for changes), and skip past any datalog entries
> from before the full sync. I was concerned that the bug in error
> handling (ie "ERROR: init sync on...") would also affect full sync, but
> that doesn't appear to be the case - so I do think that's worth trying.
>
> On 3/5/19 6:24 PM, Trey Palmer wrote:
> > Casey,
> >
> > Thanks very much for the reply!
> >
> > We definitely have lots of errors on sync-disabled buckets and the
> > workaround for that is obvious (most of them are empty anyway).
> >
> > Our second form of error is stale buckets.  We had dynamic resharding
> > enabled but have now disabled it (having discovered it was on by
> > default, and not supported in multisite).
> >
> > We removed several hundred stale buckets via 'radosgw-admin sharding
> > stale-instances rm', but they are still giving us sync errors.
> >
> > I have found that these buckets do have entries in 'radosgw-admin
> > datalog list', and my guess is this could be fixed by doing a
> > 'radosgw-admin datalog trim' for each entry on the master zone.
> >
> > Does that sound right?  :-)
> >
> > Thanks again for the detailed explanation,
> >
> > Trey Palmer
> >
> > On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley wrote:
> >
> > Hi Christian,
> >
> > I think you've correctly intuited that the issues are related to
> > the use
> > of 'bucket sync disable'. There was a bug fix for that feature in
> > http://tracker.ceph.com/issues/26895, and I recently found that a
> > block
> > of code was missing from its luminous backport. That missing code is
> > what handled those "ERROR: init sync on <bucket instance> failed,
> > retcode=-2" errors.
> >
> > I included a fix for that in a later backport
> > (https://github.com/ceph/ceph/pull/26549), which I'm still working
> to
> > get through qa. I'm afraid I can't really recommend a workaround
> > for the
> > issue in the meantime.
> >
> > Looking forward though, we do plan to support something like s3's
> > cross
> > region replication so you can enable replication on a specific bucket
> > without having to enable it globally.
> >
> > Casey
> >
> >
> > On 3/5/19 2:32 PM, Christian Rice wrote:
> > >
> > > Much appreciated.  We’ll continue to poke around and certainly will
> > > disable the dynamic resharding.
> > >
> > > We started with 12.2.8 in production.  We definitely did not
> > have it
> > > enabled in ceph.conf
> > >
> > > *From: *Matthew H
> > > *Date: *Tuesday, March 5, 2019 at 11:22 AM
> > > *To: *Christian Rice, ceph-users
> > > *Cc: *Trey Palmer
> > > *Subject: *Re: radosgw sync falling behind regularly
> > >
> > > Hi Christian,
> > >
> > > To be on the safe side and future proof yourself, you will want to go
> > ahead
> > > and set the following in your ceph.conf file, and then issue a
> > restart
> > > to your RGW instances.
> > >
> > > rgw_dynamic_resharding = false
> > >
> > > There are a number of issues with dynamic resharding, multisite rgw
> > > problems being just one of them. However I thought it was disabled
> > > automatically when multisite rgw is used (but I will have to double
> > > check the code on that). What version of Ceph did you initially
> > > install the cluster with? Prior to v12.2.2 this feature was
> > enabled by
> > > default for all rgw use cases.
> > >
> > > Thanks,
> > >
> 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Casey Bodley

Hi Trey,

I think it's more likely that these stale metadata entries are from 
deleted buckets, rather than accidental bucket reshards. When a bucket 
is deleted in a multisite configuration, we don't delete its bucket 
instance because other zones may still need to sync the object deletes - 
and they can't make progress on sync if the bucket metadata disappears. 
These leftover bucket instances look the same to the 'reshard 
stale-instances' commands, but I'd be cautious about using that to 
remove them in multisite, as it may cause more sync errors and 
potentially leak storage if they still contain objects.


Regarding 'datalog trim', that alone isn't safe because it could trim 
entries that hadn't been applied on other zones yet, causing them to 
miss some updates. What you can do is run 'data sync init' on each zone, 
and restart gateways. This will restart with a data full sync (which 
will scan all buckets for changes), and skip past any datalog entries 
from before the full sync. I was concerned that the bug in error 
handling (ie "ERROR: init sync on...") would also affect full sync, but 
that doesn't appear to be the case - so I do think that's worth trying.


On 3/5/19 6:24 PM, Trey Palmer wrote:

Casey,

Thanks very much for the reply!

We definitely have lots of errors on sync-disabled buckets and the 
workaround for that is obvious (most of them are empty anyway).


Our second form of error is stale buckets.  We had dynamic resharding 
enabled but have now disabled it (having discovered it was on by 
default, and not supported in multisite).


We removed several hundred stale buckets via 'radosgw-admin sharding 
stale-instances rm', but they are still giving us sync errors.


I have found that these buckets do have entries in 'radosgw-admin 
datalog list', and my guess is this could be fixed by doing a 
'radosgw-admin datalog trim' for each entry on the master zone.


Does that sound right?  :-)

Thanks again for the detailed explanation,

Trey Palmer

On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley wrote:


Hi Christian,

I think you've correctly intuited that the issues are related to
the use
of 'bucket sync disable'. There was a bug fix for that feature in
http://tracker.ceph.com/issues/26895, and I recently found that a
block
of code was missing from its luminous backport. That missing code is
what handled those "ERROR: init sync on <bucket instance> failed,
retcode=-2" errors.

I included a fix for that in a later backport
(https://github.com/ceph/ceph/pull/26549), which I'm still working to
get through qa. I'm afraid I can't really recommend a workaround
for the
issue in the meantime.

Looking forward though, we do plan to support something like s3's
cross
region replication so you can enable replication on a specific bucket
without having to enable it globally.

Casey


On 3/5/19 2:32 PM, Christian Rice wrote:
>
> Much appreciated.  We’ll continue to poke around and certainly will
> disable the dynamic resharding.
>
> We started with 12.2.8 in production.  We definitely did not
have it
> enabled in ceph.conf
>
> *From: *Matthew H
> *Date: *Tuesday, March 5, 2019 at 11:22 AM
> *To: *Christian Rice, ceph-users
> *Cc: *Trey Palmer
> *Subject: *Re: radosgw sync falling behind regularly
>
> Hi Christian,
>
> To be on the safe side and future proof yourself, you will want to go
ahead
> and set the following in your ceph.conf file, and then issue a
restart
> to your RGW instances.
>
> rgw_dynamic_resharding = false
>
> There are a number of issues with dynamic resharding, multisite rgw
> problems being just one of them. However I thought it was disabled
> automatically when multisite rgw is used (but I will have to double
> check the code on that). What version of Ceph did you initially
> install the cluster with? Prior to v12.2.2 this feature was
enabled by
> default for all rgw use cases.
>
> Thanks,
>
>

>
> *From:*Christian Rice
> *Sent:* Tuesday, March 5, 2019 2:07 PM
> *To:* Matthew H; ceph-users
> *Subject:* Re: radosgw sync falling behind regularly
>
> Matthew, first of all, let me say we very much appreciate your help!
>
> So I don’t think we turned dynamic resharding on, nor did we
manually
> reshard buckets. Seems like it defaults to on for luminous but the
> mimic docs say it’s not supported in multisite.  So do we need to
> disable it manually via tell and ceph.conf?
>
> Also, after running the command you suggested, all the stale
instances
> are gone…these 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Trey Palmer
Casey,

Thanks very much for the reply!

We definitely have lots of errors on sync-disabled buckets and the
workaround for that is obvious (most of them are empty anyway).

Our second form of error is stale buckets.  We had dynamic resharding
enabled but have now disabled it (having discovered it was on by default,
and not supported in multisite).

We removed several hundred stale buckets via 'radosgw-admin sharding
stale-instances rm', but they are still giving us sync errors.

I have found that these buckets do have entries in 'radosgw-admin datalog
list', and my guess is this could be fixed by doing a 'radosgw-admin
datalog trim' for each entry on the master zone.

Does that sound right?  :-)

Thanks again for the detailed explanation,

Trey Palmer

On Tue, Mar 5, 2019 at 5:55 PM Casey Bodley  wrote:

> Hi Christian,
>
> I think you've correctly intuited that the issues are related to the use
> of 'bucket sync disable'. There was a bug fix for that feature in
> http://tracker.ceph.com/issues/26895, and I recently found that a block
> of code was missing from its luminous backport. That missing code is
> what handled those "ERROR: init sync on <bucket instance> failed,
> retcode=-2" errors.
>
> I included a fix for that in a later backport
> (https://github.com/ceph/ceph/pull/26549), which I'm still working to
> get through qa. I'm afraid I can't really recommend a workaround for the
> issue in the meantime.
>
> Looking forward though, we do plan to support something like s3's cross
> region replication so you can enable replication on a specific bucket
> without having to enable it globally.
>
> Casey
>
>
> On 3/5/19 2:32 PM, Christian Rice wrote:
> >
> > Much appreciated.  We’ll continue to poke around and certainly will
> > disable the dynamic resharding.
> >
> > We started with 12.2.8 in production.  We definitely did not have it
> > enabled in ceph.conf
> >
> > *From: *Matthew H 
> > *Date: *Tuesday, March 5, 2019 at 11:22 AM
> > *To: *Christian Rice , ceph-users
> > 
> > *Cc: *Trey Palmer 
> > *Subject: *Re: radosgw sync falling behind regularly
> >
> > Hi Christian,
> >
> > To be on the safe side and future proof yourself, you will want to go ahead
> > and set the following in your ceph.conf file, and then issue a restart
> > to your RGW instances.
> >
> > rgw_dynamic_resharding = false
> >
> > There are a number of issues with dynamic resharding, multisite rgw
> > problems being just one of them. However I thought it was disabled
> > automatically when multisite rgw is used (but I will have to double
> > check the code on that). What version of Ceph did you initially
> > install the cluster with? Prior to v12.2.2 this feature was enabled by
> > default for all rgw use cases.
> >
> > Thanks,
> >
> > 
> >
> > *From:*Christian Rice 
> > *Sent:* Tuesday, March 5, 2019 2:07 PM
> > *To:* Matthew H; ceph-users
> > *Subject:* Re: radosgw sync falling behind regularly
> >
> > Matthew, first of all, let me say we very much appreciate your help!
> >
> > So I don’t think we turned dynamic resharding on, nor did we manually
> > reshard buckets. Seems like it defaults to on for luminous but the
> > mimic docs say it’s not supported in multisite.  So do we need to
> > disable it manually via tell and ceph.conf?
> >
> > Also, after running the command you suggested, all the stale instances
> > are gone…these from my examples were in output:
> >
> > "bucket_instance":
> > "sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.303",
> >
> > "bucket_instance":
> > "sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299",
> >
> > "bucket_instance":
> > "sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.301",
> >
> > Though we still get lots of log messages like so in rgw:
> >
> > 2019-03-05 11:01:09.526120 7f64120ae700  0 ERROR: failed to get bucket
> > instance info for
> >
> .bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
> >
> > 2019-03-05 11:01:09.528664 7f63e5016700  1 civetweb: 0x55976f1c2000:
> > 172.17.136.17 - - [05/Mar/2019:10:54:06 -0800] "GET
> >
> /admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299=de6af748-1a2f-44a1-9d44-30799cf1313e
>
> > HTTP/1.1" 404 0 - -
> >
> > 2019-03-05 11:01:09.529648 7f64130b0700  0 meta sync: ERROR: can't
> > remove key:
> >
> bucket.instance:sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
>
> > ret=-2
> >
> > 2019-03-05 11:01:09.530324 7f64138b1700  0 ERROR: failed to get bucket
> > instance info for
> >
> .bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
> >
> > 2019-03-05 11:01:09.530345 7f6405094700  0 data sync: ERROR: failed to
> > retrieve bucket info for
> >
> bucket=sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
> >
> > 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Casey Bodley

Hi Christian,

I think you've correctly intuited that the issues are related to the use 
of 'bucket sync disable'. There was a bug fix for that feature in 
http://tracker.ceph.com/issues/26895, and I recently found that a block 
of code was missing from its luminous backport. That missing code is 
what handled those "ERROR: init sync on <bucket instance> failed, 
retcode=-2" errors.


I included a fix for that in a later backport 
(https://github.com/ceph/ceph/pull/26549), which I'm still working to 
get through qa. I'm afraid I can't really recommend a workaround for the 
issue in the meantime.


Looking forward though, we do plan to support something like s3's cross 
region replication so you can enable replication on a specific bucket 
without having to enable it globally.


Casey


On 3/5/19 2:32 PM, Christian Rice wrote:


Much appreciated.  We’ll continue to poke around and certainly will 
disable the dynamic resharding.


We started with 12.2.8 in production.  We definitely did not have it 
enabled in ceph.conf


*From: *Matthew H 
*Date: *Tuesday, March 5, 2019 at 11:22 AM
*To: *Christian Rice , ceph-users 


*Cc: *Trey Palmer 
*Subject: *Re: radosgw sync falling behind regularly

Hi Christian,

To be on the safe side and future proof yourself, you will want to go ahead 
and set the following in your ceph.conf file, and then issue a restart 
to your RGW instances.


rgw_dynamic_resharding = false

There are a number of issues with dynamic resharding, multisite rgw 
problems being just one of them. However I thought it was disabled 
automatically when multisite rgw is used (but I will have to double 
check the code on that). What version of Ceph did you initially 
install the cluster with? Prior to v12.2.2 this feature was enabled by 
default for all rgw use cases.


Thanks,



*From:*Christian Rice 
*Sent:* Tuesday, March 5, 2019 2:07 PM
*To:* Matthew H; ceph-users
*Subject:* Re: radosgw sync falling behind regularly

Matthew, first of all, let me say we very much appreciate your help!

So I don’t think we turned dynamic resharding on, nor did we manually 
reshard buckets. Seems like it defaults to on for luminous but the 
mimic docs say it’s not supported in multisite.  So do we need to 
disable it manually via tell and ceph.conf?


Also, after running the command you suggested, all the stale instances 
are gone…these from my examples were in output:


    "bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.303",


    "bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299",


    "bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.301",


Though we still get lots of log messages like so in rgw:

2019-03-05 11:01:09.526120 7f64120ae700  0 ERROR: failed to get bucket 
instance info for 
.bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299


2019-03-05 11:01:09.528664 7f63e5016700  1 civetweb: 0x55976f1c2000: 
172.17.136.17 - - [05/Mar/2019:10:54:06 -0800] "GET 
/admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299=de6af748-1a2f-44a1-9d44-30799cf1313e 
HTTP/1.1" 404 0 - -


2019-03-05 11:01:09.529648 7f64130b0700  0 meta sync: ERROR: can't 
remove key: 
bucket.instance:sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299 
ret=-2


2019-03-05 11:01:09.530324 7f64138b1700  0 ERROR: failed to get bucket 
instance info for 
.bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299


2019-03-05 11:01:09.530345 7f6405094700  0 data sync: ERROR: failed to 
retrieve bucket info for 
bucket=sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299


2019-03-05 11:01:09.531774 7f6405094700  0 data sync: WARNING: 
skipping data log entry for missing bucket 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299


2019-03-05 11:01:09.571680 7f6405094700  0 data sync: ERROR: init sync 
on 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302 
failed, retcode=-2


2019-03-05 11:01:09.573179 7f6405094700  0 data sync: WARNING: 
skipping data log entry for missing bucket 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302


2019-03-05 11:01:13.504308 7f63f903e700  1 civetweb: 0x55976f0f2000: 
10.105.18.20 - - [05/Mar/2019:11:00:57 -0800] "GET 
/admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299=de6af748-1a2f-44a1-9d44-30799cf1313e 
HTTP/1.1" 404 0 - -


*From: *Matthew H 
*Date: *Tuesday, March 5, 2019 at 10:03 AM
*To: *Christian Rice , ceph-users 


*Subject: *Re: radosgw sync falling behind regularly

Hi Christian,

You have stale bucket instances that need to be cleaned up, which 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Matthew H
Hi Christian,

To be on the safe side and future proof yourself, you will want to go ahead 
and set the following in your ceph.conf file, and then issue a restart to 
your RGW instances.

rgw_dynamic_resharding = false

There are a number of issues with dynamic resharding, multisite rgw problems 
being just one of them. However I thought it was disabled automatically when 
multisite rgw is used (but I will have to double check the code on that). What 
version of Ceph did you initially install the cluster with? Prior to v12.2.2 
this feature was enabled by default for all rgw use cases.
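
A minimal sketch of that change (the section name is just an example
for one rgw instance):

[client.rgw.gateway1]
rgw_dynamic_resharding = false

sudo systemctl restart ceph-radosgw@rgw.$(hostname -s).service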

Thanks,


From: Christian Rice 
Sent: Tuesday, March 5, 2019 2:07 PM
To: Matthew H; ceph-users
Subject: Re: radosgw sync falling behind regularly


Matthew, first of all, let me say we very much appreciate your help!



So I don’t think we turned dynamic resharding on, nor did we manually reshard 
buckets.  Seems like it defaults to on for luminous but the mimic docs say it’s 
not supported in multisite.  So do we need to disable it manually via tell and 
ceph.conf?



Also, after running the command you suggested, all the stale instances are 
gone…these from my examples were in output:

"bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.303",

"bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299",

"bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.301",



Though we still get lots of log messages like so in rgw:



2019-03-05 11:01:09.526120 7f64120ae700  0 ERROR: failed to get bucket instance 
info for 
.bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299

2019-03-05 11:01:09.528664 7f63e5016700  1 civetweb: 0x55976f1c2000: 
172.17.136.17 - - [05/Mar/2019:10:54:06 -0800] "GET 
/admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299=de6af748-1a2f-44a1-9d44-30799cf1313e
 HTTP/1.1" 404 0 - -

2019-03-05 11:01:09.529648 7f64130b0700  0 meta sync: ERROR: can't remove key: 
bucket.instance:sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
 ret=-2

2019-03-05 11:01:09.530324 7f64138b1700  0 ERROR: failed to get bucket instance 
info for 
.bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299

2019-03-05 11:01:09.530345 7f6405094700  0 data sync: ERROR: failed to retrieve 
bucket info for 
bucket=sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299

2019-03-05 11:01:09.531774 7f6405094700  0 data sync: WARNING: skipping data 
log entry for missing bucket 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299

2019-03-05 11:01:09.571680 7f6405094700  0 data sync: ERROR: init sync on 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302 failed, 
retcode=-2

2019-03-05 11:01:09.573179 7f6405094700  0 data sync: WARNING: skipping data 
log entry for missing bucket 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302

2019-03-05 11:01:13.504308 7f63f903e700  1 civetweb: 0x55976f0f2000: 
10.105.18.20 - - [05/Mar/2019:11:00:57 -0800] "GET 
/admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299=de6af748-1a2f-44a1-9d44-30799cf1313e
 HTTP/1.1" 404 0 - -



From: Matthew H 
Date: Tuesday, March 5, 2019 at 10:03 AM
To: Christian Rice , ceph-users 
Subject: Re: radosgw sync falling behind regularly



Hi Christian,



You have stale bucket instances that need to be cleaned up, which is what 
'radosgw-admin reshard stale-instances list' is showing you. Have you or were 
you manually resharding your buckets? The errors you are seeing in the logs are 
related to these stale instances being kept around.



In v12.2.11 this command along with 'radosgw-admin reshard stale-instances rm' 
was introduced [1].
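
Roughly, and per the caveats elsewhere in this thread, be careful with
the rm step in multisite setups:

radosgw-admin reshard stale-instances list
radosgw-admin reshard stale-instances rm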



Hopefully this helps.



[1]

https://ceph.com/releases/v12-2-11-luminous-released/



"There have been fixes to RGW dynamic and manual resharding, which no longer
leaves behind stale bucket instances to be removed manually. For finding and
cleaning up older instances from a reshard a radosgw-admin command reshard
stale-instances list and reshard stale-instances rm should do the necessary
cleanup."





From: Christian Rice 
Sent: Tuesday, March 5, 2019 11:34 AM
To: Matthew H; ceph-users
Subject: Re: radosgw sync falling behind regularly



The output of “radosgw-admin reshard stale-instances 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Christian Rice
Matthew, first of all, let me say we very much appreciate your help!

So I don’t think we turned dynamic resharding on, nor did we manually reshard 
buckets.  Seems like it defaults to on for luminous but the mimic docs say it’s 
not supported in multisite.  So do we need to disable it manually via tell and 
ceph.conf?

Also, after running the command you suggested, all the stale instances are 
gone…these from my examples were in output:
"bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.303",
"bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299",
"bucket_instance": 
"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.301",

Though we still get lots of log messages like so in rgw:

2019-03-05 11:01:09.526120 7f64120ae700  0 ERROR: failed to get bucket instance 
info for 
.bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
2019-03-05 11:01:09.528664 7f63e5016700  1 civetweb: 0x55976f1c2000: 
172.17.136.17 - - [05/Mar/2019:10:54:06 -0800] "GET 
/admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299=de6af748-1a2f-44a1-9d44-30799cf1313e
 HTTP/1.1" 404 0 - -
2019-03-05 11:01:09.529648 7f64130b0700  0 meta sync: ERROR: can't remove key: 
bucket.instance:sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
 ret=-2
2019-03-05 11:01:09.530324 7f64138b1700  0 ERROR: failed to get bucket instance 
info for 
.bucket.meta.sysad_task:sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
2019-03-05 11:01:09.530345 7f6405094700  0 data sync: ERROR: failed to retrieve 
bucket info for 
bucket=sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
2019-03-05 11:01:09.531774 7f6405094700  0 data sync: WARNING: skipping data 
log entry for missing bucket 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299
2019-03-05 11:01:09.571680 7f6405094700  0 data sync: ERROR: init sync on 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302 failed, 
retcode=-2
2019-03-05 11:01:09.573179 7f6405094700  0 data sync: WARNING: skipping data 
log entry for missing bucket 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302
2019-03-05 11:01:13.504308 7f63f903e700  1 civetweb: 0x55976f0f2000: 
10.105.18.20 - - [05/Mar/2019:11:00:57 -0800] "GET 
/admin/metadata/bucket.instance/sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299?key=sysad_task%2Fsysad-task%3A1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299=de6af748-1a2f-44a1-9d44-30799cf1313e
 HTTP/1.1" 404 0 - -

From: Matthew H 
Date: Tuesday, March 5, 2019 at 10:03 AM
To: Christian Rice , ceph-users 
Subject: Re: radosgw sync falling behind regularly

Hi Christian,

You have stale bucket instances that need to be cleaned up, which is what 
'radosgw-admin reshard stale-instances list' is showing you. Have you or were 
you manually resharding your buckets? The errors you are seeing in the logs are 
related to these stale instances being kept around.

In v12.2.11 this command along with 'radosgw-admin reshard stale-instances rm' 
was introduced [1].

Hopefully this helps.

[1]
https://ceph.com/releases/v12-2-11-luminous-released/

"There have been fixes to RGW dynamic and manual resharding, which no longer
leaves behind stale bucket instances to be removed manually. For finding and
cleaning up older instances from a reshard a radosgw-admin command reshard
stale-instances list and reshard stale-instances rm should do the necessary
cleanup."


From: Christian Rice 
Sent: Tuesday, March 5, 2019 11:34 AM
To: Matthew H; ceph-users
Subject: Re: radosgw sync falling behind regularly


The output of “radosgw-admin reshard stale-instances list” shows 242 entries, 
which might embed too much proprietary info for me to list, but here’s a tiny 
sample:

"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.303",

"sysad_task/sysad_task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.281",

"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.299",

"sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.301",



Some of appear repeatedly in the radosgw error logs like so:

2019-03-05 08:13:08.929206 7f6405094700  0 data sync: ERROR: init sync on 
sysad_task/sysad-task:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.302 failed, 
retcode=-2

2019-03-05 08:13:08.930581 7f6405094700  0 data sync: WARNING: skipping data 
log entry for missing bucket 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-05 Thread Trey Palmer
 "control_pool": "sv3-prod.rgw.control",
>
> "gc_pool": "sv3-prod.rgw.log:gc",
>
> "lc_pool": "sv3-prod.rgw.log:lc",
>
> "log_pool": "sv3-prod.rgw.log",
>
> "intent_log_pool": "sv3-prod.rgw.log:intent",
>
> "usage_log_pool": "sv3-prod.rgw.log:usage",
>
> "reshard_pool": "sv3-prod.rgw.log:reshard",
>
> "user_keys_pool": "sv3-prod.rgw.meta:users.keys",
>
> "user_email_pool": "sv3-prod.rgw.meta:users.email",
>
> "user_swift_pool": "sv3-prod.rgw.meta:users.swift",
>
> "user_uid_pool": "sv3-prod.rgw.meta:users.uid",
>
> "system_key": {
>
> "access_key": "access_key_redacted",
>
> "secret_key": "secret_key_redacted"
>
> },
>
> "placement_pools": [
>
> {
>
> "key": "default-placement",
>
> "val": {
>
> "index_pool": "sv3-prod.rgw.buckets.index",
>
> "data_pool": "sv3-prod.rgw.buckets.data",
>
> "data_extra_pool": "sv3-prod.rgw.buckets.non-ec",
>
> "index_type": 0,
>
> "compression": ""
>
> }
>
> }
>
> ],
>
> "metadata_heap": "",
>
> "tier_config": [],
>
> "realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
>
> }
>
> dc11-ceph-rgw1
>
> zonegroup get
>
> {
>
> "id": "de6af748-1a2f-44a1-9d44-30799cf1313e",
>
> "name": "us",
>
> "api_name": "us",
>
> "is_master": "true",
>
> "endpoints": [
>
> "http://sv5-ceph-rgw1.savagebeast.com:8080;
>
> ],
>
> "hostnames": [],
>
> "hostnames_s3website": [],
>
> "master_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
>
> "zones": [
>
> {
>
> "id": "107d29a0-b732-4bf1-a26e-1f64f820e839",
>
> "name": "dc11-prod",
>
> "endpoints": [
>
> "http://dc11-ceph-rgw1:8080;
>
> ],
>
> "log_meta": "false",
>
> "log_data": "true",
>
> "bucket_index_max_shards": 0,
>
> "read_only": "false",
>
> "tier_type": "",
>
> "sync_from_all": "true",
>
> "sync_from": []
>
> },
>
> {
>
> "id": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
>
> "name": "sv5-corp",
>
> "endpoints": [
>
> "http://sv5-ceph-rgw1.savagebeast.com:8080;
>
> ],
>
> "log_meta": "false",
>
> "log_data": "true",
>
> "bucket_index_max_shards": 0,
>
> "read_only": "false",
>
> "tier_type": "",
>
> "sync_from_all": "true",
>
> "sync_from": []
>
> },
>
> {
>
> "id": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
>
> "name": "sv3-prod",
>
> "endpoints": [
>
> "http://sv3-ceph-rgw1:8080;
>
> ],
>
>     "log_meta": "false",
>
> "log_data": "true",
>
> "bucket_index_max_shards": 0,
>
> "read_only": "false",
>
> "tier_type": "",
>
> "sync_from_all": "true",
>
> "sync_from": []
>
> }
>
> ],
>
> "placement_targets": [
>
> {
>
> "name": "default-placement",
>
> "tags": []
>
> }
>

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-04 Thread Christian Rice
4",
"name": "sv5-corp",
"endpoints": [
"http://sv5-ceph-rgw1.savagebeast.com:8080;
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": []
},
{
"id": "331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8",
"name": "sv3-prod",
"endpoints": [
"http://sv3-ceph-rgw1:8080;
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": []
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
}

zone get
{
"id": "107d29a0-b732-4bf1-a26e-1f64f820e839",
"name": "dc11-prod",
"domain_root": "dc11-prod.rgw.meta:root",
"control_pool": "dc11-prod.rgw.control",
"gc_pool": "dc11-prod.rgw.log:gc",
"lc_pool": "dc11-prod.rgw.log:lc",
"log_pool": "dc11-prod.rgw.log",
"intent_log_pool": "dc11-prod.rgw.log:intent",
"usage_log_pool": "dc11-prod.rgw.log:usage",
"reshard_pool": "dc11-prod.rgw.log:reshard",
"user_keys_pool": "dc11-prod.rgw.meta:users.keys",
"user_email_pool": "dc11-prod.rgw.meta:users.email",
"user_swift_pool": "dc11-prod.rgw.meta:users.swift",
"user_uid_pool": "dc11-prod.rgw.meta:users.uid",
"system_key": {
"access_key": "access_key_redacted",
"secret_key": "secret_key_redacted"
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": "dc11-prod.rgw.buckets.index",
"data_pool": "dc11-prod.rgw.buckets.data",
"data_extra_pool": "dc11-prod.rgw.buckets.non-ec",
    "index_type": 0,
"compression": ""
}
}
],
"metadata_heap": "",
"tier_config": [],
"realm_id": "b3e2afe7-2254-494a-9a34-ce50358779fd"
}

From: Matthew H 
Date: Monday, March 4, 2019 at 7:44 PM
To: Christian Rice , ceph-users 
Subject: Re: radosgw sync falling behind regularly

Christian,

Can you provide your zonegroup and zones configurations for all 3 rgw sites? 
(run the commands for each site please)

Thanks,


From: Christian Rice 
Sent: Monday, March 4, 2019 5:34 PM
To: Matthew H; ceph-users
Subject: Re: radosgw sync falling behind regularly


So we upgraded everything from 12.2.8 to 12.2.11, and things have gone to hell. 
 Lots of sync errors, like so:



sudo radosgw-admin sync error list

[

{

"shard_id": 0,

"entries": [

{

"id": "1_1549348245.870945_5163821.1",

"section": "data",

"name": 
"dora/catalogmaker-redis:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.470/56fbc9685d609b4c8cdbd11dd60bf03bedcb613b438c663c9899d930b25f0405",

"timestamp": "2019-02-05 06:30:45.870945Z",

"info": {

"source_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",

"error_code": 5,

"message": "failed to sync object(5) Input/output error"

}

},

…



radosgw logs are full of:

2019-03-04 14:32:58.039467 7f90e81eb700  0 data sync: ERROR: failed to read 
remote data log info: ret=-2

2019-03-04 14:32:58.041296 7f90e81eb700  0 data sync: ERROR: init sync on 
escarpment/escarpment:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.146 failed, 
retcode=-2

2019-03-04 14:32:58.041662 7f90e81eb700  0 meta sync: 

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-04 Thread Matthew H
Christian,

Can you provide your zonegroup and zones configurations for all 3 rgw sites? 
(run the commands for each site please)

Thanks,


From: Christian Rice 
Sent: Monday, March 4, 2019 5:34 PM
To: Matthew H; ceph-users
Subject: Re: radosgw sync falling behind regularly


So we upgraded everything from 12.2.8 to 12.2.11, and things have gone to hell. 
 Lots of sync errors, like so:



sudo radosgw-admin sync error list

[

{

"shard_id": 0,

"entries": [

{

"id": "1_1549348245.870945_5163821.1",

"section": "data",

"name": 
"dora/catalogmaker-redis:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.470/56fbc9685d609b4c8cdbd11dd60bf03bedcb613b438c663c9899d930b25f0405",

"timestamp": "2019-02-05 06:30:45.870945Z",

"info": {

"source_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",

"error_code": 5,

"message": "failed to sync object(5) Input/output error"

}

},

…



radosgw logs are full of:

2019-03-04 14:32:58.039467 7f90e81eb700  0 data sync: ERROR: failed to read 
remote data log info: ret=-2

2019-03-04 14:32:58.041296 7f90e81eb700  0 data sync: ERROR: init sync on 
escarpment/escarpment:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.146 failed, 
retcode=-2

2019-03-04 14:32:58.041662 7f90e81eb700  0 meta sync: ERROR: 
RGWBackoffControlCR called coroutine returned -2

2019-03-04 14:32:58.042949 7f90e81eb700  0 data sync: WARNING: skipping data 
log entry for missing bucket 
escarpment/escarpment:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.146

2019-03-04 14:32:58.823501 7f90e81eb700  0 data sync: ERROR: failed to read 
remote data log info: ret=-2

2019-03-04 14:32:58.825243 7f90e81eb700  0 meta sync: ERROR: 
RGWBackoffControlCR called coroutine returned -2



dc11-ceph-rgw2:~$ sudo radosgw-admin sync status

  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)

  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)

   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)

2019-03-04 14:26:21.351372 7ff7ae042e40  0 meta sync: ERROR: failed to fetch 
mdlog info

  metadata sync syncing

full sync: 0/64 shards

failed to fetch local sync status: (5) Input/output error

^C



Any advice?  All three clusters on 12.2.11, Debian stretch.



From: Christian Rice 
Date: Thursday, February 28, 2019 at 9:06 AM
To: Matthew H , ceph-users 

Subject: Re: radosgw sync falling behind regularly



Yeah my bad on the typo, not running 12.8.8 ☺  It’s 12.2.8.  We can upgrade and 
will attempt to do so asap.  Thanks for that, I need to read my release notes 
more carefully, I guess!



From: Matthew H 
Date: Wednesday, February 27, 2019 at 8:33 PM
To: Christian Rice , ceph-users 
Subject: Re: radosgw sync falling behind regularly



Hey Christian,



I'm making a wild guess, but assuming this is 12.2.8. If so, is it possible 
that you can upgrade to 12.2.11? There have been rgw multisite bug fixes for 
metadata syncing and data syncing (both separate issues) that you could be 
hitting.



Thanks,

________

From: ceph-users  on behalf of Christian 
Rice 
Sent: Wednesday, February 27, 2019 7:05 PM
To: ceph-users
Subject: [ceph-users] radosgw sync falling behind regularly



Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters 
in one zonegroup.



Often we find either metadata or data sync behind, and it doesn’t look to ever 
recover until…we restart the endpoint radosgw target service.



eg at 15:45:40:



dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status

  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)

  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)

   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)

  metadata sync syncing

full sync: 0/64 shards

incremental sync: 64/64 shards

metadata is behind on 2 shards

behind shards: [19,41]

oldest incremental change not applied: 2019-02-27 
14:42:24.0.408263s

  data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source

source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source





so at 15:46:07:



dc11-ceph-rgw1:/var/log/ceph# sudo systemctl resta

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-04 Thread Christian Rice
So we upgraded everything from 12.2.8 to 12.2.11, and things have gone to hell. 
 Lots of sync errors, like so:

sudo radosgw-admin sync error list
[
{
"shard_id": 0,
"entries": [
{
"id": "1_1549348245.870945_5163821.1",
"section": "data",
"name": 
"dora/catalogmaker-redis:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.470/56fbc9685d609b4c8cdbd11dd60bf03bedcb613b438c663c9899d930b25f0405",
"timestamp": "2019-02-05 06:30:45.870945Z",
"info": {
"source_zone": "1e27bf9c-3a2f-4845-85b6-33a24bbe1c04",
"error_code": 5,
"message": "failed to sync object(5) Input/output error"
}
},
…

radosgw logs are full of:
2019-03-04 14:32:58.039467 7f90e81eb700  0 data sync: ERROR: failed to read 
remote data log info: ret=-2
2019-03-04 14:32:58.041296 7f90e81eb700  0 data sync: ERROR: init sync on 
escarpment/escarpment:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.146 failed, 
retcode=-2
2019-03-04 14:32:58.041662 7f90e81eb700  0 meta sync: ERROR: 
RGWBackoffControlCR called coroutine returned -2
2019-03-04 14:32:58.042949 7f90e81eb700  0 data sync: WARNING: skipping data 
log entry for missing bucket 
escarpment/escarpment:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.146
2019-03-04 14:32:58.823501 7f90e81eb700  0 data sync: ERROR: failed to read 
remote data log info: ret=-2
2019-03-04 14:32:58.825243 7f90e81eb700  0 meta sync: ERROR: 
RGWBackoffControlCR called coroutine returned -2

dc11-ceph-rgw2:~$ sudo radosgw-admin sync status
  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
2019-03-04 14:26:21.351372 7ff7ae042e40  0 meta sync: ERROR: failed to fetch 
mdlog info
  metadata sync syncing
full sync: 0/64 shards
failed to fetch local sync status: (5) Input/output error
^C

Any advice?  All three clusters on 12.2.11, Debian stretch.

From: Christian Rice 
Date: Thursday, February 28, 2019 at 9:06 AM
To: Matthew H , ceph-users 

Subject: Re: radosgw sync falling behind regularly

Yeah my bad on the typo, not running 12.8.8 ☺  It’s 12.2.8.  We can upgrade and 
will attempt to do so asap.  Thanks for that, I need to read my release notes 
more carefully, I guess!

From: Matthew H 
Date: Wednesday, February 27, 2019 at 8:33 PM
To: Christian Rice , ceph-users 
Subject: Re: radosgw sync falling behind regularly

Hey Christian,

I'm making a wild guess, but assuming this is 12.2.8. If so, is it possible 
that you can upgrade to 12.2.11? There have been rgw multisite bug fixes for 
metadata syncing and data syncing (both separate issues) that you could be 
hitting.

Thanks,
____
From: ceph-users  on behalf of Christian 
Rice 
Sent: Wednesday, February 27, 2019 7:05 PM
To: ceph-users
Subject: [ceph-users] radosgw sync falling behind regularly


Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters 
in one zonegroup.



Often we find either metadata or data sync behind, and it doesn’t look to ever 
recover until…we restart the endpoint radosgw target service.



eg at 15:45:40:



dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status

  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)

  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)

   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)

  metadata sync syncing

full sync: 0/64 shards

incremental sync: 64/64 shards

metadata is behind on 2 shards

behind shards: [19,41]

oldest incremental change not applied: 2019-02-27 
14:42:24.0.408263s

  data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source

source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source





so at 15:46:07:



dc11-ceph-rgw1:/var/log/ceph# sudo systemctl restart 
ceph-radosgw@rgw.dc11-ceph-rgw1.service



and by the time I checked at 15:48:08:



dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status

  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)

  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)

   zone 107d29a0-b732-4bf1-a2

Re: [ceph-users] radosgw sync falling behind regularly

2019-02-28 Thread Christian Rice
Yeah my bad on the typo, not running 12.8.8 ☺  It’s 12.2.8.  We can upgrade and 
will attempt to do so asap.  Thanks for that, I need to read my release notes 
more carefully, I guess!

From: Matthew H 
Date: Wednesday, February 27, 2019 at 8:33 PM
To: Christian Rice , ceph-users 
Subject: Re: radosgw sync falling behind regularly

Hey Christian,

I'm making a wild guess, but assuming this is 12.2.8: is it possible 
for you to upgrade to 12.2.11? There have been rgw multisite bug fixes for 
metadata syncing and data syncing (both separate issues) that you could be 
hitting.

Thanks,

From: ceph-users  on behalf of Christian 
Rice 
Sent: Wednesday, February 27, 2019 7:05 PM
To: ceph-users
Subject: [ceph-users] radosgw sync falling behind regularly


Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters 
in one zonegroup.



Often we find either metadata or data sync behind, and it doesn’t look to ever 
recover until…we restart the endpoint radosgw target service.



eg at 15:45:40:



dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status

  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)

  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)

   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)

  metadata sync syncing

full sync: 0/64 shards

incremental sync: 64/64 shards

metadata is behind on 2 shards

behind shards: [19,41]

oldest incremental change not applied: 2019-02-27 
14:42:24.0.408263s

  data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source

source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source





so at 15:46:07:



dc11-ceph-rgw1:/var/log/ceph# sudo systemctl restart 
ceph-radosgw@rgw.dc11-ceph-rgw1.service



and by the time I checked at 15:48:08:



dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status

  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)

  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)

   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)

  metadata sync syncing

full sync: 0/64 shards

incremental sync: 64/64 shards

metadata is caught up with master

  data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source

source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source





There’s no way this is “lag.”  It’s stuck, and happens frequently, though 
perhaps not daily.  Any suggestions?  Our cluster isn’t heavily used yet, but 
it’s production.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw sync falling behind regularly

2019-02-27 Thread Matthew H
Hey Christian,

I'm making a wild guess, but assuming this is 12.2.8: is it possible 
for you to upgrade to 12.2.11? There have been rgw multisite bug fixes for 
metadata syncing and data syncing (both separate issues) that you could be 
hitting.

Thanks,

From: ceph-users  on behalf of Christian 
Rice 
Sent: Wednesday, February 27, 2019 7:05 PM
To: ceph-users
Subject: [ceph-users] radosgw sync falling behind regularly


Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters 
in one zonegroup.



Often we find either metadata or data sync behind, and it doesn’t look to ever 
recover until…we restart the endpoint radosgw target service.



eg at 15:45:40:



dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status

  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)

  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)

   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)

  metadata sync syncing

full sync: 0/64 shards

incremental sync: 64/64 shards

metadata is behind on 2 shards

behind shards: [19,41]

oldest incremental change not applied: 2019-02-27 
14:42:24.0.408263s

  data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source

source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source





so at 15:46:07:



dc11-ceph-rgw1:/var/log/ceph# sudo systemctl restart 
ceph-radosgw@rgw.dc11-ceph-rgw1.service



and by the time I checked at 15:48:08:



dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status

  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)

  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)

   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)

  metadata sync syncing

full sync: 0/64 shards

incremental sync: 64/64 shards

metadata is caught up with master

  data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source

source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)

syncing

full sync: 0/128 shards

incremental sync: 128/128 shards

data is caught up with source





There’s no way this is “lag.”  It’s stuck, and happens frequently, though 
perhaps not daily.  Any suggestions?  Our cluster isn’t heavily used yet, but 
it’s production.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw sync falling behind regularly

2019-02-27 Thread Christian Rice
Debian 9; ceph 12.8.8-bpo90+1; no rbd or cephfs, just radosgw; three clusters 
in one zonegroup.

Often we find either metadata or data sync behind, and it doesn’t look to ever 
recover until…we restart the endpoint radosgw target service.

eg at 15:45:40:

dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status
  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is behind on 2 shards
behind shards: [19,41]
oldest incremental change not applied: 2019-02-27 
14:42:24.0.408263s
  data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source


so at 15:46:07:

dc11-ceph-rgw1:/var/log/ceph# sudo systemctl restart 
ceph-radosgw@rgw.dc11-ceph-rgw1.service

and by the time I checked at 15:48:08:

dc11-ceph-rgw1:/var/log/ceph# radosgw-admin sync status
  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
   zone 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
source: 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source


There’s no way this is “lag.”  It’s stuck, and happens frequently, though 
perhaps not daily.  Any suggestions?  Our cluster isn’t heavily used yet, but 
it’s production.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-admin reshard stale-instances rm experience

2019-02-26 Thread Wido den Hollander



On 2/21/19 9:19 PM, Paul Emmerich wrote:
> On Thu, Feb 21, 2019 at 4:05 PM Wido den Hollander  wrote:
>> This isn't available in 13.2.4, but should be in 13.2.5, so on Mimic you
>> will need to wait. But this might bite you at some point.
> 
> Unfortunately it hasn't been backported to Mimic:
> http://tracker.ceph.com/issues/37447
> 

I see. We really need this in Mimic as well. I have another cluster,
which is running Mimic, but it's a suspect as well.

547 buckets, but 290k objects in the index pool. That ratio is not correct.

> This is the Luminous backport:
> https://github.com/ceph/ceph/pull/25326/files which looks a little bit
> messy because it fixes 3 related issues in one backport.
> 
> CC'ing devel: best way to get this in Mimic?
> 

I'd love to know as well.

Wido

> Paul
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-admin reshard stale-instances rm experience

2019-02-21 Thread Konstantin Shalygin

My advice: upgrade to 12.2.11 and run the stale-instances list asap and
see if you need to rm data.

This isn't available in 13.2.4, but should be in 13.2.5, so on Mimic you
will need to wait. But this might bite you at some point.

I hope I can prevent some admins from having sleepless nights about a
Ceph cluster flapping.


Thanks for sharing your experience!



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-admin reshard stale-instances rm experience

2019-02-21 Thread Paul Emmerich
On Thu, Feb 21, 2019 at 4:05 PM Wido den Hollander  wrote:
> This isn't available in 13.2.4, but should be in 13.2.5, so on Mimic you
> will need to wait. But this might bite you at some point.

Unfortunately it hasn't been backported to Mimic:
http://tracker.ceph.com/issues/37447

This is the Luminous backport:
https://github.com/ceph/ceph/pull/25326/files which looks a little bit
messy because it fixes 3 related issues in one backport.

CC'ing devel: best way to get this in Mimic?

Paul
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw-admin reshard stale-instances rm experience

2019-02-21 Thread Wido den Hollander
Hi,

For the last few months I've been getting questions about people seeing
warnings about large OMAP objects after scrubs.

I've been digging for a few months (You'll also find multiple threads
about this) and it all seemed to trace back to RGW indexes.

Resharding didn't clean up old indexes properly, which caused the RGW
indexes to keep growing and growing in number of objects.

Last week I got a case where an RGW-only cluster running on HDD became
unusably slow. OSDs flapping, slow requests, the whole package. (yay!)

I traced it down to OSDs sometimes scanning RocksDB (debug bluefs) and
the HDD would become 100% busy for a few minutes.

Compacting these OSDs could take more than 30 minutes, and it helped for
a while.

This cluster was running 12.2.8 and we upgraded to 12.2.11 to run:

$ radosgw-admin reshard stale-instances list > instances.json
$ cat instances.json|jq -r '.[]'|wc -l

It showed that there were 88k stale instances.

The rgw.buckets.index pool showed 222k objects according to 'ceph df'.

So we started to clean up the stale instances, as they are stored
mainly in RocksDB.

$ radosgw-admin reshard stale-instances rm

While this was running OSDs would sometimes start to flap. We had to
cancel, compact and restart the rm.

After 6 days (!) of rm'ing, all the indexes were gone.

The index pool went from 222k objects to just 43k objects.

We compacted all the OSDs, which now took just 3 minutes, and things are
running again properly.
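
For reference, a sketch of triggering the same RocksDB compaction on a
single running OSD (the tell/daemon invocations below are the luminous-era
ones; verify against your version):

$ ceph tell osd.0 compact

or, via the admin socket on the OSD's host:

$ ceph daemon osd.0 compact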

As a precaution NVMe devices have been added, and using device classes we
moved the index pool to NVMe-backed OSDs only; nevertheless, this
would also not have worked on NVMe.

For some reason RocksDB couldn't handle the tens of millions of OMAP
entries stored in the OSDs and would start to scan the whole DB.

It could be that the 4GB of memory per OSD just was not sufficient to
store all the indexes for RocksDB, but I wasn't able to confirm that.

This cluster has ~1200 buckets in RGW and had 222k objects prior to the
cleanup.

I got another call yesterday about a cluster with identical symptoms and
that has just 250 buckets, but it has ~700k (!!) objects in the RGW
index pool.

My advice: upgrade to 12.2.11 and run the stale-instances list asap and
see if you need to rm data.

This isn't available in 13.2.4, but should be in 13.2.5, so on Mimic you
will need to wait. But this might bite you at some point.

I hope I can prevent some admins from having sleepless nights about a
Ceph cluster flapping.

Wido
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw s3 subuser permissions

2019-01-27 Thread Marc Roos


I tried with these, but didn't get any results

"arn:aws:iam::Company:user/testuser:testsubuser"
"arn:aws:iam::Company:subuser/testuser:testsubuser"

-Original Message-
From: Adam C. Emerson [mailto:aemer...@redhat.com] 
Sent: vrijdag 25 januari 2019 16:40
To: The Exoteric Order of the Squid Cybernetic
Subject: Re: [ceph-users] Radosgw s3 subuser permissions

On 24/01/2019, Marc Roos wrote:
>
>
> This should do it sort of.
>
> {
>   "Id": "Policy1548367105316",
>   "Version": "2012-10-17",
>   "Statement": [
> {
>   "Sid": "Stmt1548367099807",
>   "Effect": "Allow",
>   "Action": "s3:ListBucket",
>   "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
>   "Resource": "arn:aws:s3:::archive"
> },
> {
>   "Sid": "Stmt1548369229354",
>   "Effect": "Allow",
>   "Action": [
> "s3:GetObject",
> "s3:PutObject",
> "s3:ListBucket"
>   ],
>   "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
>   "Resource": "arn:aws:s3:::archive/folder2/*"
> }
>   ]
> }


Does this work well for sub-users? I hadn't worked on them as we were 
focusing on the tenant/user case, but if someone's been using policy 
with sub-users, I'd like to hear their experience and any problems they 
run into.

-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@OFTC, Actinic@Freenode
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw s3 subuser permissions

2019-01-25 Thread Adam C. Emerson
On 24/01/2019, Marc Roos wrote:
>
>
> This should do it sort of.
>
> {
>   "Id": "Policy1548367105316",
>   "Version": "2012-10-17",
>   "Statement": [
> {
>   "Sid": "Stmt1548367099807",
>   "Effect": "Allow",
>   "Action": "s3:ListBucket",
>   "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
>   "Resource": "arn:aws:s3:::archive"
> },
> {
>   "Sid": "Stmt1548369229354",
>   "Effect": "Allow",
>   "Action": [
> "s3:GetObject",
> "s3:PutObject",
> "s3:ListBucket"
>   ],
>   "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
>   "Resource": "arn:aws:s3:::archive/folder2/*"
> }
>   ]
> }


Does this work well for sub-users? I hadn't worked on them as we were
focusing on the tenant/user case, but if someone's been using policy
with sub-users, I'd like to hear their experience and any problems
they run into.

-- 
Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
IRC: Aemerson@OFTC, Actinic@Freenode
0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw s3 subuser permissions

2019-01-24 Thread Marc Roos


This should do it sort of.

{
  "Id": "Policy1548367105316",
  "Version": "2012-10-17",
  "Statement": [
{
  "Sid": "Stmt1548367099807",
  "Effect": "Allow",
  "Action": "s3:ListBucket",
  "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
  "Resource": "arn:aws:s3:::archive"
},
{
  "Sid": "Stmt1548369229354",
  "Effect": "Allow",
  "Action": [
"s3:GetObject",
"s3:PutObject",
"s3:ListBucket"
  ],
  "Principal": { "AWS": "arn:aws:iam::Company:user/testuser" },
  "Resource": "arn:aws:s3:::archive/folder2/*"
}
  ]
} 
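
For anyone trying the same, a sketch of applying and verifying such a
policy with s3cmd (assuming the JSON above is saved as policy.json and
s3cmd is configured for the bucket owner):

$ s3cmd setpolicy policy.json s3://archive
$ s3cmd info s3://archive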





-Original Message-
From: Matt Benjamin [mailto:mbenj...@redhat.com] 
Sent: 24 January 2019 21:36
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Radosgw s3 subuser permissions

Hi Marc,

I'm not actually certain whether the traditional ACLs permit any 
solution for that, but I believe with bucket policy, you can achieve 
precise control within and across tenants, for any set of desired 
resources (buckets).

Matt

On Thu, Jan 24, 2019 at 3:18 PM Marc Roos  
wrote:
>
>
> Is it correct that it is NOT possible for s3 subusers to have 
> different permissions on folders created by the parent account?
> Thus the --access=[ read | write | readwrite | full ] is for 
> everything the parent has created, and it is not possible to change 
> that for specific folders/buckets?
>
> radosgw-admin subuser create --uid='Company$archive' 
> --subuser=testuser
> --key-type=s3
>
> Thus if archive created this bucket/folder structure.
> └── bucket
> ├── folder1
> ├── folder2
> └── folder3
> └── folder4
>
> It is not possible to allow testuser to only write in folder2?
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw s3 subuser permissions

2019-01-24 Thread Matt Benjamin
Hi Marc,

I'm not actually certain whether the traditional ACLs permit any
solution for that, but I believe with bucket policy, you can achieve
precise control within and across tenants, for any set of desired
resources (buckets).

Matt

On Thu, Jan 24, 2019 at 3:18 PM Marc Roos  wrote:
>
>
> Is it correct that it is NOT possible for s3 subusers to have different
> permissions on folders created by the parent account?
> Thus the --access=[ read | write | readwrite | full ] is for everything
> the parent has created, and it is not possible to change that for
> specific folders/buckets?
>
> radosgw-admin subuser create --uid='Company$archive' --subuser=testuser
> --key-type=s3
>
> Thus if archive created this bucket/folder structure.
> └── bucket
> ├── folder1
> ├── folder2
> └── folder3
> └── folder4
>
> It is not possible to allow testuser to only write in folder2?
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw s3 subuser permissions

2019-01-24 Thread Marc Roos

Is it correct that it is NOT possible for s3 subusers to have different 
permissions on folders created by the parent account?
Thus the --access=[ read | write | readwrite | full ] is for everything 
the parent has created, and it is not possible to change that for 
specific folders/buckets?

radosgw-admin subuser create --uid='Company$archive' --subuser=testuser 
--key-type=s3

Thus if archive created this bucket/folder structure. 
└── bucket
├── folder1
├── folder2
└── folder3
└── folder4

It is not possible to allow testuser to only write in folder2?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW replication and failover issues

2019-01-22 Thread Rom Freiman
Hi,
We are running the following radosgw (luminous 12.2.8) replication
scenario.
1) We have 2 clusters, each running a radosgw; Cluster1 is defined as master
and Cluster2 as slave.
2) We create a number of buckets with objects via master and slave.
3) We shut down Cluster1.
4) We execute failover on Cluster2:
   radosgw-admin zone modify --master --default
   radosgw-admin period update --commit
5) We create some new buckets and delete some existing buckets that were
created in step 2.
6) We restart Cluster1 and execute:
   radosgw-admin realm pull
   radosgw-admin period pull
7) We see that resync has finished successfully, Cluster1 is defined as
slave, and Cluster2 as master.

The issue is that we now see in Cluster1 the buckets that were deleted in
step 5 (while this cluster was down). We waited a while to see if maybe
there were some objects left that should be deleted by GC, but even after a
few hours those buckets are still visible in Cluster1 and not visible in
Cluster2.

We also tried:
6) We restart Cluster1 and execute: radosgw-admin period pull
But then we see that sync is stuck, both clusters are defined as
masters, and Cluster1's current period is the one before the last period of
Cluster2.

How can we fix this issue? Is there some config command that should be
called during failover?
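
For reference, realm pull and period pull accept an explicit endpoint and
the system user's keys, which is presumably how the pull in step 6 needs to
be pointed at the current master (the URL and keys below are placeholders):

radosgw-admin realm pull --url=http://cluster2-rgw:8080 \
    --access-key=<system-access-key> --secret=<system-secret>
radosgw-admin period pull --url=http://cluster2-rgw:8080 \
    --access-key=<system-access-key> --secret=<system-secret>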


Thanks,

Rom
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW replication and failover issues

2019-01-21 Thread Ronnie Lazar
Hi,
We are running the following radosgw (luminous 12.2.8) replication
scenario.
1) We have 2 clusters, each running a radosgw; Cluster1 is defined as master
and Cluster2 as slave.
2) We create a number of buckets with objects via master and slave.
3) We shut down Cluster1.
4) We execute failover on Cluster2:
   radosgw-admin zone modify --master --default
   radosgw-admin period update --commit
5) We create some new buckets and delete some existing buckets that were
created in step 2.
6) We restart Cluster1 and execute:
   radosgw-admin realm pull
   radosgw-admin period pull
7) We see that resync has finished successfully, Cluster1 is defined as
slave, and Cluster2 as master.

The issue is that we now see in Cluster1 the buckets that were deleted in
step 5 (while this cluster was down). We waited a while to see if maybe
there were some objects left that should be deleted by GC, but even after a
few hours those buckets are still visible in Cluster1 and not visible in
Cluster2.

We also tried:
6) We restart Cluster1 and execute: radosgw-admin period pull
But then we see that sync is stuck, both clusters are defined as
masters, and Cluster1's current period is the one before the last period of
Cluster2.

How can we fix this issue? Is there some config command that should be
called during failover?

Thanks,
Ronnie Lazar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw cannot create pool

2019-01-17 Thread Jan Kasprzak
Hello, Ceph users,

TL;DR: radosgw fails on me with the following message:

2019-01-17 09:34:45.247721 7f52722b3dc0  0 rgw_init_ioctx ERROR: 
librados::Rados::pool_create returned (34) Numerical result out of range (this 
can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num 
or mon_max_pg_per_osd exceeded)

Detailed description:

I have a Ceph cluster installed a long time ago as firefly on CentOS 7,
and now running luminous. So far I have used it for RBD pools, but now
I want to try using radosgw as well.

I tried to deploy radosgw using

# ceph-deploy rgw create myhost

Which went well until it tried to start it up:

[myhost][INFO  ] Running command: service ceph-radosgw start
[myhost][WARNIN] Redirecting to /bin/systemctl start ceph-radosgw.service
[myhost][WARNIN] Failed to start ceph-radosgw.service: Unit not found.
[myhost][ERROR ] RuntimeError: command returned non-zero exit status: 5
[ceph_deploy.rgw][ERROR ] Failed to execute command: service ceph-radosgw start
[ceph_deploy][ERROR ] GenericError: Failed to create 1 RGWs

Comparing it to my testing deployment of mimic, where radosgw works,
the problem was with the unit name, the correct way to start it up
apparently was

# systemctl start ceph-radosgw@rgw.myhost.service

Now it is apparently running:

/usr/bin/radosgw -f --cluster ceph --name client.rgw.myhost --setuser ceph 
--setgroup ceph

However, when I want to add the first user, radosgw-admin fails and
radosgw itself exits with a similar message:

# radosgw-admin user create --uid=kas --display-name="Jan Kasprzak"
2019-01-17 09:52:29.805828 7fea6cfd2dc0  0 rgw_init_ioctx ERROR: 
librados::Rados::pool_create returned (34) Numerical result out of range (this 
can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num 
or mon_max_pg_per_osd exceeded)
2019-01-17 09:52:29.805957 7fea6cfd2dc0 -1 ERROR: failed to initialize watch: 
(34) Numerical result out of range
couldn't init storage provider

So I guess it is trying to create a pool for data, but it fails somehow.
Can I determine which pool it is and what parameters it tries to use?

I have looked at my testing mimic cluster, and radosgw there created the
following pools:

.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.data

So I created these pools manually on my luminous cluster as well:

# ceph osd pool create .rgw.root 128
(repeat for all the above pool names)

Which helped, and I am able to create the user with radosgw-admin.
Now where should I look for the exact parameters radosgw is trying
to use when creating its pools?
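
For a first check, one can compare the pool-creation defaults against the
per-OSD PG limit the mons enforce; these are standard options, though their
exact relevance here is my assumption:

# ceph daemon mon.myhost config get mon_max_pg_per_osd
# ceph daemon mon.myhost config get osd_pool_default_pg_num

With few OSDs, six new pools at the default pg_num can push past
mon_max_pg_per_osd, which would match the ERANGE above.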

Thanks,

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-admin unable to store user information

2019-01-02 Thread Casey Bodley


On 12/26/18 4:58 PM, Dilip Renkila wrote:

Hi all,

Some useful information

>> What do the following return?
>>
>> $ radosgw-admin zone get

root@ctrl1:~# radosgw-admin zone get
{
    "id": "8bfdf8a3-c165-44e9-9ed6-deff8a5d852f",
    "name": "default",
    "domain_root": "default.rgw.meta:root",
    "control_pool": "default.rgw.control",
    "gc_pool": "default.rgw.log:gc",
    "lc_pool": "default.rgw.log:lc",
    "log_pool": "default.rgw.log",
    "intent_log_pool": "default.rgw.log:intent",
    "usage_log_pool": "default.rgw.log:usage",
    "reshard_pool": "default.rgw.log:reshard",
    "user_keys_pool": "default.rgw.meta:users.keys",
    "user_email_pool": "default.rgw.meta:users.email",
    "user_swift_pool": "default.rgw.meta:users.swift",
    "user_uid_pool": "default.rgw.meta:users.uid",
    "otp_pool": "default.rgw.otp",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "default.rgw.buckets.index",
                "data_pool": "default.rgw.buckets.data",
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_type": 0,
                "compression": ""
            }
        }
    ],
    "metadata_heap": "",
    "realm_id": ""
}

>> radosgw-admin user info
>> --uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
>> --debug-ms=1 --debug-rgw=20 --debug-objecter=20 --log-to-stderr

https://etherpad.openstack.org/p/loPctEQWFU

>> $ rados lspools

root@ctrl1:~# rados lspools
cinder-volumes-sas
ephemeral-volumes
.rgw.root
rgw1
defaults.rgw.buckets.data
default.rgw.control
default.rgw.meta
defaults.rgw.buckets.index
default.rgw.log
cinder-volumes-nvme
default.rgw.buckets.index
images
default.rgw.buckets.data
Best Regards / Kind Regards

Dilip Renkila


On Wed, Dec 26, 2018 at 22:29, Dilip Renkila <dilip.renk...@linserv.se> wrote:


Hi all,

I have a ceph radosgw deployment as openstack swift backend with
multitenancy enabled in rgw.

I can create containers and store data through swift api.

I am trying to retrieve user data with the radosgw-admin cli tool for
a user. I can only get the admin user's info, no one else's.
$  radosgw-admin user info
--uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
could not fetch user info: no user info saved

$  radosgw-admin user list
[
"0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f",
"32a7cd9b37bb40168200bae69015311a$32a7cd9b37bb40168200bae69015311a",
"2eea218eea984dd68f1378ea21c64b83$2eea218eea984dd68f1378ea21c64b83",
    "admin",
"032f07e376404586b53bb8c3bfd6d1d7$032f07e376404586b53bb8c3bfd6d1d7",
"afcf7fc3fd5844ea920c2028ebfa5832$afcf7fc3fd5844ea920c2028ebfa5832",
"5793054cd0fe4a018e959eb9081442a8$5793054cd0fe4a018e959eb9081442a8",
"d4f6c1bd190d40feb8379625bcf2bc39$d4f6c1bd190d40feb8379625bcf2bc39",
"8f411343b44143d2b116563c177ed93d$8f411343b44143d2b116563c177ed93d",
"0a49f61d66644fb2a10d664d5b79b1af$0a49f61d66644fb2a10d664d5b79b1af",
"a1dd449c9ce64345af2a7fb05c4aa21f$a1dd449c9ce64345af2a7fb05c4aa21f",
"a5442064c50a4b9bbf854d15748f99d4$a5442064c50a4b9bbf854d15748f99d4"
]



The general format of these object names is 'tenant$uid', so you may 
need to specify them separately, i.e. radosgw-admin user info --tenant= --uid=
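
Concretely, for the first entry in the user list above, that would be
something like:

$ radosgw-admin user info --tenant=0611e8fdb62b4b2892b62c7e7bf3767f \
      --uid=0611e8fdb62b4b2892b62c7e7bf3767f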





Debug output
$ radosgw-admin user info
--uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
--debug_rgw=20 --log-to-stderr
2018-12-26 22:25:10.722 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bfe20 obj=.rgw.root:default.realm
state=0x5571718d9000 s->prefetch_data=0
2018-12-26 22:25:10.722 7fbc24ff9700  2
RGWDataChangesLog::ChangesRenewThread: start
2018-12-26 22:25:10.726 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bf3d0 obj=.rgw.root:converted state=0x5571718d9000
s->prefetch_data=0
2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bee50 obj=.rgw.root:default.realm
state=0x5571718e35a0 s->prefetch_data=0
2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bef40 obj=.rgw.root:zonegroups_names.default
state=0x5571718e35a0 s->prefetch_data=0
2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
s->obj_tag was set empty
2018-12-26 22:25:10.730 7fbc4999e740 20 rados->read ofs=0 len=524288
2018-12-26 22:25:10.730 7fbc4999e740 20 rados->read r=0 bl.length=46
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate:
got zonegroup_info.b7493bbe-a638-4950-a4d5-716919e5d150
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate:
got zonegroup_info.23e74943-f594-44cb-a3bb-3a2150804dd3
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate:
got zone_info.9be46480-91cb-437b-87e1-eb6eff862767
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate:
got zone_info.8bfdf8a3-c165-44e9-9ed6-deff8a5d852f
2018-12-26 22:25:10.742 

Re: [ceph-users] radosgw-admin unable to store user information

2018-12-26 Thread Dilip Renkila
Hi all,

Some useful information

>> What do the following return?
>>
>> $ radosgw-admin zone get

root@ctrl1:~# radosgw-admin zone get
{
"id": "8bfdf8a3-c165-44e9-9ed6-deff8a5d852f",
"name": "default",
"domain_root": "default.rgw.meta:root",
"control_pool": "default.rgw.control",
"gc_pool": "default.rgw.log:gc",
"lc_pool": "default.rgw.log:lc",
"log_pool": "default.rgw.log",
"intent_log_pool": "default.rgw.log:intent",
"usage_log_pool": "default.rgw.log:usage",
"reshard_pool": "default.rgw.log:reshard",
"user_keys_pool": "default.rgw.meta:users.keys",
"user_email_pool": "default.rgw.meta:users.email",
"user_swift_pool": "default.rgw.meta:users.swift",
"user_uid_pool": "default.rgw.meta:users.uid",
"otp_pool": "default.rgw.otp",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": "default.rgw.buckets.index",
"data_pool": "default.rgw.buckets.data",
"data_extra_pool": "default.rgw.buckets.non-ec",
"index_type": 0,
"compression": ""
}
}
],
"metadata_heap": "",
"realm_id": ""
}

>> radosgw-admin user info
>> --uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
>> --debug-ms=1 --debug-rgw=20 --debug-objecter=20 --log-to-stderr

https://etherpad.openstack.org/p/loPctEQWFU

>> $ rados lspools

root@ctrl1:~# rados lspools
cinder-volumes-sas
ephemeral-volumes
.rgw.root
rgw1
defaults.rgw.buckets.data
default.rgw.control
default.rgw.meta
defaults.rgw.buckets.index
default.rgw.log
cinder-volumes-nvme
default.rgw.buckets.index
images
default.rgw.buckets.data



Best Regards / Kind Regards

Dilip Renkila


On Wed, Dec 26, 2018 at 22:29, Dilip Renkila wrote:

> Hi all,
>
> I have a ceph radosgw deployment as openstack swift backend with
> multitenancy enabled in rgw.
>
> I can create containers and store data through swift api.
>
> I am trying to retrieve user data with the radosgw-admin cli tool for a user.
> I can only get the admin user's info, no one else's.
>
> $  radosgw-admin user info
> --uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
> could not fetch user info: no user info saved
>
> $  radosgw-admin user list
> [
> "0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f",
> "32a7cd9b37bb40168200bae69015311a$32a7cd9b37bb40168200bae69015311a",
> "2eea218eea984dd68f1378ea21c64b83$2eea218eea984dd68f1378ea21c64b83",
> "admin",
> "032f07e376404586b53bb8c3bfd6d1d7$032f07e376404586b53bb8c3bfd6d1d7",
> "afcf7fc3fd5844ea920c2028ebfa5832$afcf7fc3fd5844ea920c2028ebfa5832",
> "5793054cd0fe4a018e959eb9081442a8$5793054cd0fe4a018e959eb9081442a8",
> "d4f6c1bd190d40feb8379625bcf2bc39$d4f6c1bd190d40feb8379625bcf2bc39",
> "8f411343b44143d2b116563c177ed93d$8f411343b44143d2b116563c177ed93d",
> "0a49f61d66644fb2a10d664d5b79b1af$0a49f61d66644fb2a10d664d5b79b1af",
> "a1dd449c9ce64345af2a7fb05c4aa21f$a1dd449c9ce64345af2a7fb05c4aa21f",
> "a5442064c50a4b9bbf854d15748f99d4$a5442064c50a4b9bbf854d15748f99d4"
> ]
>
>
> Debug output
> $ radosgw-admin user info
> --uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
> --debug_rgw=20 --log-to-stderr
> 2018-12-26 22:25:10.722 7fbc4999e740 20 get_system_obj_state:
> rctx=0x7ffcd45bfe20 obj=.rgw.root:default.realm state=0x5571718d9000
> s->prefetch_data=0
> 2018-12-26 22:25:10.722 7fbc24ff9700  2
> RGWDataChangesLog::ChangesRenewThread: start
> 2018-12-26 22:25:10.726 7fbc4999e740 20 get_system_obj_state:
> rctx=0x7ffcd45bf3d0 obj=.rgw.root:converted state=0x5571718d9000
> s->prefetch_data=0
> 2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
> rctx=0x7ffcd45bee50 obj=.rgw.root:default.realm state=0x5571718e35a0
> s->prefetch_data=0
> 2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
> rctx=0x7ffcd45bef40 obj=.rgw.root:zonegroups_names.default
> state=0x5571718e35a0 s->prefetch_data=0
> 2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state: s->obj_tag
> was set empty
> 2018-12-26 22:25:10.730 7fbc4999e740 20 rados->read ofs=0 len=524288
> 2018-12-26 22:25:10.730 7fbc4999e740 20 rados->read r=0 bl.length=46
> 2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
> zonegroup_info.b7493bbe-a638-4950-a4d5-716919e5d150
> 2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
> zonegroup_info.23e74943-f594-44cb-a3bb-3a2150804dd3
> 2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
> zone_info.9be46480-91cb-437b-87e1-eb6eff862767
> 2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
> zone_info.8bfdf8a3-c165-44e9-9ed6-deff8a5d852f
> 2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
> zone_names.default
> 

[ceph-users] radosgw-admin unable to store user information

2018-12-26 Thread Dilip Renkila
Hi all,

I have a ceph radosgw deployment as openstack swift backend with
multitenancy enabled in rgw.

I can create containers and store data through swift api.

I am trying to retrieve user data with the radosgw-admin cli tool for a user.
I can only get the admin user's info, no one else's.

$  radosgw-admin user info
--uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
could not fetch user info: no user info saved

$  radosgw-admin user list
[
"0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f",
"32a7cd9b37bb40168200bae69015311a$32a7cd9b37bb40168200bae69015311a",
"2eea218eea984dd68f1378ea21c64b83$2eea218eea984dd68f1378ea21c64b83",
"admin",
"032f07e376404586b53bb8c3bfd6d1d7$032f07e376404586b53bb8c3bfd6d1d7",
"afcf7fc3fd5844ea920c2028ebfa5832$afcf7fc3fd5844ea920c2028ebfa5832",
"5793054cd0fe4a018e959eb9081442a8$5793054cd0fe4a018e959eb9081442a8",
"d4f6c1bd190d40feb8379625bcf2bc39$d4f6c1bd190d40feb8379625bcf2bc39",
"8f411343b44143d2b116563c177ed93d$8f411343b44143d2b116563c177ed93d",
"0a49f61d66644fb2a10d664d5b79b1af$0a49f61d66644fb2a10d664d5b79b1af",
"a1dd449c9ce64345af2a7fb05c4aa21f$a1dd449c9ce64345af2a7fb05c4aa21f",
"a5442064c50a4b9bbf854d15748f99d4$a5442064c50a4b9bbf854d15748f99d4"
]


Debug output
$ radosgw-admin user info
--uid="0611e8fdb62b4b2892b62c7e7bf3767f$0611e8fdb62b4b2892b62c7e7bf3767f"
--debug_rgw=20 --log-to-stderr
2018-12-26 22:25:10.722 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bfe20 obj=.rgw.root:default.realm state=0x5571718d9000
s->prefetch_data=0
2018-12-26 22:25:10.722 7fbc24ff9700  2
RGWDataChangesLog::ChangesRenewThread: start
2018-12-26 22:25:10.726 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bf3d0 obj=.rgw.root:converted state=0x5571718d9000
s->prefetch_data=0
2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bee50 obj=.rgw.root:default.realm state=0x5571718e35a0
s->prefetch_data=0
2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bef40 obj=.rgw.root:zonegroups_names.default
state=0x5571718e35a0 s->prefetch_data=0
2018-12-26 22:25:10.730 7fbc4999e740 20 get_system_obj_state: s->obj_tag
was set empty
2018-12-26 22:25:10.730 7fbc4999e740 20 rados->read ofs=0 len=524288
2018-12-26 22:25:10.730 7fbc4999e740 20 rados->read r=0 bl.length=46
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
zonegroup_info.b7493bbe-a638-4950-a4d5-716919e5d150
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
zonegroup_info.23e74943-f594-44cb-a3bb-3a2150804dd3
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
zone_info.9be46480-91cb-437b-87e1-eb6eff862767
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
zone_info.8bfdf8a3-c165-44e9-9ed6-deff8a5d852f
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
zone_names.default
2018-12-26 22:25:10.742 7fbc4999e740 20 RGWRados::pool_iterate: got
zonegroups_names.default
2018-12-26 22:25:10.742 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45befa0 obj=.rgw.root:zone_names.default state=0x5571718e35a0
s->prefetch_data=0
2018-12-26 22:25:10.742 7fbc4999e740 20 get_system_obj_state: s->obj_tag
was set empty
2018-12-26 22:25:10.742 7fbc4999e740 20 rados->read ofs=0 len=524288
2018-12-26 22:25:10.742 7fbc4999e740 20 rados->read r=0 bl.length=46
2018-12-26 22:25:10.742 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45befa0
obj=.rgw.root:zone_info.8bfdf8a3-c165-44e9-9ed6-deff8a5d852f
state=0x5571718e35a0 s->prefetch_data=0
2018-12-26 22:25:10.742 7fbc4999e740 20 get_system_obj_state: s->obj_tag
was set empty
2018-12-26 22:25:10.742 7fbc4999e740 20 rados->read ofs=0 len=524288
2018-12-26 22:25:10.742 7fbc4999e740 20 rados->read r=0 bl.length=736
2018-12-26 22:25:10.742 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45befa0 obj=.rgw.root:zonegroups_names.default
state=0x5571718e35a0 s->prefetch_data=0
2018-12-26 22:25:10.742 7fbc4999e740 20 get_system_obj_state: s->obj_tag
was set empty
2018-12-26 22:25:10.742 7fbc4999e740 20 rados->read ofs=0 len=524288
2018-12-26 22:25:10.742 7fbc4999e740 20 rados->read r=0 bl.length=46
2018-12-26 22:25:10.742 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45befa0
obj=.rgw.root:zonegroup_info.23e74943-f594-44cb-a3bb-3a2150804dd3
state=0x5571718e35a0 s->prefetch_data=0
2018-12-26 22:25:10.746 7fbc4999e740 20 get_system_obj_state: s->obj_tag
was set empty
2018-12-26 22:25:10.746 7fbc4999e740 20 rados->read ofs=0 len=524288
2018-12-26 22:25:10.746 7fbc4999e740 20 rados->read r=0 bl.length=337
2018-12-26 22:25:10.746 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bff10 obj=.rgw.root:region_map state=0x5571718d9000
s->prefetch_data=0
2018-12-26 22:25:10.746 7fbc4999e740 10  cannot find current period
zonegroup using local zonegroup
2018-12-26 22:25:10.746 7fbc4999e740 20 get_system_obj_state:
rctx=0x7ffcd45bfc60 obj=.rgw.root:default.realm state=0x5571718d9000
s->prefetch_data=0
2018-12-26 

Re: [ceph-users] radosgw, Keystone integration, and the S3 API

2018-11-22 Thread Florian Haas
On 19/11/2018 16:23, Florian Haas wrote:
> Hi everyone,
> 
> I've recently started a documentation patch to better explain Swift
> compatibility and OpenStack integration for radosgw; a WIP PR is at
> https://github.com/ceph/ceph/pull/25056/. I have, however, run into an
> issue that I would really *like* to document, except I don't know
> whether what I'm seeing is how things are supposed to work. :)
> 
> This is about multi-tenancy in radosgw, in combination with S3
> authentication via Keystone (and EC2-compatible credentials generated
> from OpenStack, as explained in my doc patch). Now, when I enable
> rgw_s3_use_keystone_auth and rgw_keystone_implicit_tenants, then, if I
> create an S3 bucket in radosgw for the first time, naming that bucket
> "foo", the following things happen:
> 
> * I see a user that has been created, and that I can query with
>   "radosgw-admin user info", that is named
>   ff569d377ecb4f77875fa1b3f89eb16f$ff569d377ecb4f77875fa1b3f89eb16f
>   (that is, the Keystone tenant/project UUID twice[1], separated by a $
>   character). Its display_name is the name of my tenant.
> 
> * With "radosgw-admin bucket list
> --uid='ff569d377ecb4f77875fa1b3f89eb16f$ff569d377ecb4f77875fa1b3f89eb16f'",
>   I see a bucket that has been created, and that has been named "foo".
> 
> So far, all is well. If I do this, then I can see a bucket named
> "foo" if I use an S3 client, and I can see a container named "foo",
> with identical content, if I use the Swift API.
> 
> Now, if I enable rgw_swift_account_in_url, and update my Keystone
> object store endpoint to include AUTH_%(tenant_id)s, then using the
> Swift API I can also use public ACLs and temp URLs.
> 
> However, I am stumped trying to understand how exactly this is meant
> to work with the S3 API.
> 
> So I have two questions:
> 
> (1) What do I have to do to get publicly-readable buckets to work in
> the Keystone-authenticated scenario? Moreover, what is the correct
> path to use, for a non-S3 client like curl or a browser, to access
> an object? It seems that using
> http://host:port/ff569d377ecb4f77875fa1b3f89eb16f:foo/bar works
> for S3 objects with a public ACL set, but if I try to use the same
> approach with a signed object, I get a 403 with
> SignatureDoesNotMatch. It seems like what I have to use for a
> signed object is, instead,
> 
> http://host:port/foo/bar?AWSAccessKeyId=something&Expires=something&Signature=something.
> However, if I do *ask* for a signed object that includes the
> tenant name, as in "s3cmd signurl
> s3://5ed51981f4a8468292bf2c578806ebf:foo/bar +120", then I *can*
> use the same URL format as for public ACL objects. Is this the
> intended behavior? If so, does that mean that an application
> using the S3 API, and access/secret keys from OpenStack-backed
> EC2, should configure always itself to use the ":"
> prefix to precede the bucket name?
> 
> (2) Do I understand the documentation
> (http://docs.ceph.com/docs/mimic/radosgw/multitenancy/#s3)
> correctly in that whenever one uses multitenancy of any kind in
> radosgw, S3 bucket hostnames can't ever be used? Thus, is it correct
> to say that if a radosgw instance is meant to *only* ever
> authenticate its users against Keystone, where there is always a
> radosgw tenant that is being created, then it's pointless to set
> rgw_dns_name?
> 
> 
> If anyone could shed a light on the above, I can write up the answer and
> amend the doc patch. Thanks!

OK I *think* I've got this fairly well figured out and I've dropped the
WIP prefix from my doc patch:

https://github.com/ceph/ceph/pull/25056

As this is a documentation patch, you really don't need to be a radosgw
developer to review it — if there's anything you find unclear or plain
wrong by your experience, please do let me know; I'd much appreciate that.

> [1] This would be an additional question: why is the project UUID in
> there *twice*? Surely there's a good cause for that, but it presently
> escapes me. http://docs.ceph.com/docs/master/radosgw/multitenancy/ says
> "TBD – don’t forget to explain the function of rgw keystone implicit
> tenants = true" here, which isn't very helpful. :)

Although I've covered that TBD in my patch, the question of why the
tenant name is duplicated in the radosgw user name is something I still
haven't been able to suss out. So if anyone can enlighten me there,
that'd be excellent too. :)

Cheers,
Florian




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw, Keystone integration, and the S3 API

2018-11-19 Thread Florian Haas
Hi everyone,

I've recently started a documentation patch to better explain Swift
compatibility and OpenStack integration for radosgw; a WIP PR is at
https://github.com/ceph/ceph/pull/25056/. I have, however, run into an
issue that I would really *like* to document, except I don't know
whether what I'm seeing is how things are supposed to work. :)

This is about multi-tenancy in radosgw, in combination with S3
authentication via Keystone (and EC2-compatible credentials generated
from OpenStack, as explained in my doc patch). Now, when I enable
rgw_s3_use_keystone_auth and rgw_keystone_implicit_tenants, then, if I
create an S3 bucket in radosgw for the first time, naming that bucket
"foo", the following things happen:

* I see a user that has been created, and that I can query with
  "radosgw-admin user info", that is named
  ff569d377ecb4f77875fa1b3f89eb16f$ff569d377ecb4f77875fa1b3f89eb16f
  (that is, the Keystone tenant/project UUID twice[1], separated by a $
  character). Its display_name is the name of my tenant.

* With "radosgw-admin bucket list
--uid='ff569d377ecb4f77875fa1b3f89eb16f$ff569d377ecb4f77875fa1b3f89eb16f'",
  I see a bucket that has been created, and that has been named "foo".

So far, all is well. If I do this, then I can see a bucket named
"foo" if I use an S3 client, and I can see a container named "foo",
with identical content, if I use the Swift API.

Now, if I enable rgw_swift_account_in_url, and update my Keystone
object store endpoint to include AUTH_%(tenant_id)s, then using the
Swift API I can also use public ACLs and temp URLs.
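
For reference, the setup described so far amounts to roughly this ceph.conf
fragment (the section name is a placeholder; the options are the ones named
in this mail):

[client.rgw.gateway]
rgw_s3_use_keystone_auth = true
rgw_keystone_implicit_tenants = true
rgw_swift_account_in_url = true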

However, I am stumped trying to understand how exactly this is meant
to work with the S3 API.

So I have two questions:

(1) What do I have to do to get publicly-readable buckets to work in
the Keystone-authenticated scenario? Moreover, what is the correct
path to use, for a non-S3 client like curl or a browser, to access
an object? It seems that using
http://host:port/ff569d377ecb4f77875fa1b3f89eb16f:foo/bar works
for S3 objects with a public ACL set, but if I try to use the same
approach with a signed object, I get a 403 with
SignatureDoesNotMatch. It seems like what I have to use for a
signed object is, instead,

http://host:port/foo/bar?AWSAccessKeyId=something&Expires=something&Signature=something.
However, if I do *ask* for a signed object that includes the
tenant name, as in "s3cmd signurl
s3://5ed51981f4a8468292bf2c578806ebf:foo/bar +120", then I *can*
use the same URL format as for public ACL objects. Is this the
intended behavior? If so, does that mean that an application
using the S3 API, and access/secret keys from OpenStack-backed
EC2, should always configure itself to use the "tenant:"
prefix to precede the bucket name?

(2) Do I understand the documentation
(http://docs.ceph.com/docs/mimic/radosgw/multitenancy/#s3)
correctly in that whenever one uses multitenancy of any kind in
radosgw, S3 bucket hostnames can't ever be used? Thus, is it correct
to say that if a radosgw instance is meant to *only* ever
authenticate its users against Keystone, where there is always a
radosgw tenant that is being created, then it's pointless to set
rgw_dns_name?


If anyone could shed a light on the above, I can write up the answer and
amend the doc patch. Thanks!

Cheers,
Florian


[1] This would be an additional question: why is the project UUID in
there *twice*? Surely there's a good cause for that, but it presently
escapes me. http://docs.ceph.com/docs/master/radosgw/multitenancy/ says
"TBD – don’t forget to explain the function of rgw keystone implicit
tenants = true" here, which isn't very helpful. :)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw s3 bucket acls

2018-10-19 Thread Niels Denissen
Hi,

I’m currently running into a similar problem. My goal is to ensure all S3 users 
are able to list any buckets/objects that are available within ceph.
I haven't found a way around that yet. I did find that linking buckets 
to users allows them to list everything, but only for the user the bucket is 
linked to.

Have you perhaps found a solution to your problem?
Any help or pointers would be appreciated!
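
For context, the linking mentioned above is done like this (a sketch with
placeholder names; note that bucket link re-assigns the bucket to the given
user, which is why it only helps that one user):

$ radosgw-admin bucket link --bucket=mybucket --uid=someuser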

Niels 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw index has been inconsistent with reality

2018-10-18 Thread Yang Yang
Hmm, it's useful to rebuild the index by rewriting an object.
But first, I need to know all the object keys. And to know all
keys, I need list_objects ...
Maybe I can make a union set of the instances, then copy all of them into
themselves.

Anyway, I want to find out more about why it happens and how to avoid it.

Yehuda Sadeh-Weinraub wrote on Friday, October 19, 2018 at 2:25 AM:

> On Wed, Oct 17, 2018 at 1:14 AM Yang Yang  wrote:
> >
> > Hi,
> > A few weeks ago I found radosgw index has been inconsistent with
> reality. Some object I can not list, but I can get them by key. Please see
> the details below:
> >
> > BACKGROUND:
> > Ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b)
> luminous (stable)
> > Index pool is on ssd.
> > There is a very big bucket with more than 10 million object and
> 500TB data.
> > Ceph health is OK.
> > I use s3 api on radosgw.
> >
> > DESCRIBE:
> > When use s3 list_object() to list, some uploaded object can not be
> listed and some uploaded object have an old lastModified time.
> > But at the same time, we can get this object by an exact key. And if
> I put a new object into this bucket, it can be listed.
> > It seems that some indexes during a period of time have been lost.
> >
> > I try to run "radosgw-admin bucket check --bucket  --fix
> --check-objects" and I get nothing at all.
> >
> > SOME ELSE:
> > I found that one bucket will have many indexes, and we can use
> "radosgw-admin metadata list bucket.instance | grep "{bucket name}" to show
> them. But I can not found a doc to describe this feature. And we can use
> "radosgw-admin bucket stats --bucket {bucket_name}" to get id as the active
> instance id.
> > I use "rados listomapkeys" at active(or latest) index to get all
> object in a index, it is really lost. But when I use "rados listomapkeys"
> at another index which is not active as mentioned above, I found the lost
> object index.
> >
> > Resharding is within my consideration. Listomapkeys means do this
> action on all shards(more than 300).
> > In my understanding, a big bucket has one latest index and many old
> indexes. Every index has many shards. So listomapkeys on a index means
> listomapkeys on many shards.
> >
> > QUESTION:
> > Why my index lost?
> > How to recover?
>
> I don't really know what happened, haven't seen this exact issue
> before. You can try copying objects into themselves. That should
> recreate their bucket index entry.
>
> > Why radosgw has many index instances, how do radosgw use them and
> how to change active index?
>
> Could be related to an existing bug. You can unlink the bucket and
> then link a specific bucket instance version (to the user), however,
> I'm not sure I recommend going this path if it isn't necessary.
>
> Regards,
> Yehuda
> >
> >
> > Thanks,
> >
> > Inksink
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw index has been inconsistent with reality

2018-10-18 Thread Yehuda Sadeh-Weinraub
On Wed, Oct 17, 2018 at 1:14 AM Yang Yang  wrote:
>
> Hi,
> A few weeks ago I found radosgw index has been inconsistent with reality. 
> Some object I can not list, but I can get them by key. Please see the details 
> below:
>
> BACKGROUND:
> Ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous 
> (stable)
> Index pool is on ssd.
> There is a very big bucket with more than 10 million object and 500TB 
> data.
> Ceph health is OK.
> I use s3 api on radosgw.
>
> DESCRIBE:
> When use s3 list_object() to list, some uploaded object can not be listed 
> and some uploaded object have an old lastModified time.
> But at the same time, we can get this object by an exact key. And if I 
> put a new object into this bucket, it can be listed.
> It seems that some indexes during a period of time have been lost.
>
> I try to run "radosgw-admin bucket check --bucket  --fix 
> --check-objects" and I get nothing at all.
>
> SOME ELSE:
> I found that one bucket will have many indexes, and we can use 
> "radosgw-admin metadata list bucket.instance | grep "{bucket name}" to show 
> them. But I can not found a doc to describe this feature. And we can use 
> "radosgw-admin bucket stats --bucket {bucket_name}" to get id as the active 
> instance id.
> I use "rados listomapkeys" at active(or latest) index to get all object 
> in a index, it is really lost. But when I use "rados listomapkeys" at another 
> index which is not active as mentioned above, I found the lost object index.
>
> Resharding is within my consideration. Listomapkeys means do this action 
> on all shards(more than 300).
> In my understanding, a big bucket has one latest index and many old 
> indexes. Every index has many shards. So listomapkeys on a index means 
> listomapkeys on many shards.
>
> QUESTION:
> Why my index lost?
> How to recover?

I don't really know what happened, haven't seen this exact issue
before. You can try copying objects into themselves. That should
recreate their bucket index entry.
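
A sketch of that with the AWS CLI (S3 rejects a byte-identical in-place
copy, so the metadata directive is needed; endpoint and names are
placeholders):

$ aws s3api copy-object --endpoint-url http://rgw.example.com \
      --bucket mybucket --key mykey \
      --copy-source mybucket/mykey --metadata-directive REPLACE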

> Why radosgw has many index instances, how do radosgw use them and how to 
> change active index?

Could be related to an existing bug. You can unlink the bucket and
then link a specific bucket instance version (to the user), however,
I'm not sure I recommend going this path if it isn't necessary.

Regards,
Yehuda
>
>
> Thanks,
>
> Inksink
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW multipart completion is already in progress

2018-10-18 Thread Yang Yang
Hi,
I copy some big files to radosgw with awscli. But I found some copies
fail, like:
    $ aws s3 --endpoint=XXX cp ./bigfile s3://mybucket/bigfile
upload failed: ./bigfile to s3://mybucket/bigfile An error occurred
(InternalError) when calling the CompleteMultipartUpload operation (reached
max retries: 4): This multipart completion is already in progress

BACKGROUND:
Ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous
(stable)

I found a similar issue: https://tracker.ceph.com/issues/22368 , but it
has been fixed.

Not every cp failed.  I copied 2000 files; about 90 of them failed.

Is this a bug?

Thanks,
Inksink
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw index has been inconsistent with reality

2018-10-17 Thread Yang Yang
Hi,
A few weeks ago I found that the radosgw index has become inconsistent with
reality. Some objects I cannot list, but I can still get them by key. Please
see the details below:

*BACKGROUND:*
Ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous
(stable)
The index pool is on SSD.
There is a very big bucket with more than 10 million objects and 500 TB of
data.
Ceph health is OK.
I use the S3 API on radosgw.

*DESCRIPTION:*
When using s3 list_objects() to list the bucket, some uploaded objects
cannot be listed, and some show an old LastModified time.
But at the same time, we can get these objects by an exact key. And if I
put a new object into this bucket, it can be listed.
It seems that the index entries from a certain period of time have been lost.

I tried to run "*radosgw-admin bucket check --bucket  --fix
--check-objects*" and I got nothing at all.

*SOMETHING ELSE:*
I found that one bucket can have many index instances, and we can use
"*radosgw-admin metadata list bucket.instance | grep {bucket name}*" to show
them. But I cannot find a doc describing this feature. And we can use
"*radosgw-admin bucket stats --bucket {bucket_name}*" to get the id of the
active instance.
I used "*rados listomapkeys*" on the active (or latest) index to get all
objects in the index, and the entries are really missing. But when I used
"*rados listomapkeys*" on another index instance which is not active, as
mentioned above, I found the lost index entries.


*Resharding is within my consideration: listomapkeys here means doing this
action on all shards (more than 300), e.g. via the loop sketched below. In my
understanding, a big bucket has one latest index and many old indexes, and
every index has many shards, so listomapkeys on an index means listomapkeys
on many shards.*
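
For reference, this is roughly how I enumerate one index instance shard by
shard (sketch; the pool name, instance id and shard count are placeholders
for the values from my cluster):

    # one index object per shard, named .dir.{bucket_instance_id}.{shard}
    for shard in $(seq 0 299); do
        rados -p default.rgw.buckets.index \
            listomapkeys .dir.{bucket_instance_id}.$shard
    done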

*QUESTIONS:*
Why were my index entries lost?
How can I recover them?
Why does radosgw have many index instances, how does radosgw use them, and
how do I change the active index?


Thanks,

Inksink
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw lifecycle not removing delete markers

2018-10-15 Thread Sean Purdy
Hi,


Versions 12.2.7 and 12.2.8.  I've set up a bucket with versioning enabled and 
uploaded a lifecycle configuration.  I upload some files and delete them, 
inserting delete markers.  The configured lifecycle DOES remove the deleted 
binaries (non-current versions).  The lifecycle DOES NOT remove the delete 
markers, even with ExpiredObjectDeleteMarker set.

Is this a known issue?  I have an empty bucket full of delete markers.

Does this lifecycle do what I expect, i.e. remove the non-current versions 
after a day and remove the orphaned delete markers:

{
"Rules": [
{
"Status": "Enabled", 
"Prefix": "", 
"NoncurrentVersionExpiration": {
"NoncurrentDays": 1
}, 
"Expiration": {
"ExpiredObjectDeleteMarker": true
}, 
"ID": "Test expiry"
}
]
}
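
For reference, this is roughly how I apply the configuration and count the 
leftover delete markers (sketch; the endpoint and bucket are placeholders):

    aws s3api put-bucket-lifecycle-configuration --endpoint-url=XXX \
        --bucket mybucket --lifecycle-configuration file://lifecycle.json
    aws s3api list-object-versions --endpoint-url=XXX --bucket mybucket \
        --query 'DeleteMarkers[].Key'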


I can't be the only one who wants to use this feature.

Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw bucket stats vs s3cmd du

2018-10-09 Thread David Turner
Have you looked at your garbage collection?  I would guess that your GC is
behind and that radosgw-admin is accounting for that space, knowing that it
hasn't been freed up yet, while s3cmd doesn't see it since it no longer
shows in the listing.
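
Something like the following should show whether GC has a backlog and let
you kick it manually (sketch; it needs to run on a node with an admin
keyring):

    # show pending GC entries, including those not yet due for processing
    radosgw-admin gc list --include-all | head -n 50
    # trigger a GC pass now instead of waiting for the next scheduled cycle
    radosgw-admin gc process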

On Tue, Sep 18, 2018 at 4:45 AM Luis Periquito  wrote:

> Hi all,
>
> I have a couple of very big s3 buckets that store temporary data. We
> keep writing files to the buckets, which are then read and
> deleted. They serve as temporary storage.
>
> We're writing (and deleting) circa 1TB of data daily in each of those
> buckets, and their size has been mostly stable over time.
>
> The issue has arisen that radosgw-admin bucket stats says one bucket
> is 10T and the other is 4T; but s3cmd du (and I did a sync which
> agrees) says 3.5T and 2.3T respectively.
>
> The bigger bucket suffered from the orphaned objects bug
> (http://tracker.ceph.com/issues/18331). The smaller one was created on
> 10.2.3, so it may also have suffered from the same bug.
>
> Any ideas what could be at play here? How can we reduce actual usage?
>
> trimming part of the radosgw-admin bucket stats output
> "usage": {
> "rgw.none": {
> "size": 0,
> "size_actual": 0,
> "size_utilized": 0,
> "size_kb": 0,
> "size_kb_actual": 0,
> "size_kb_utilized": 0,
> "num_objects": 18446744073709551572
> },
> "rgw.main": {
> "size": 10870197197183,
> "size_actual": 10873866362880,
> "size_utilized": 18446743601253967400,
> "size_kb": 10615426951,
> "size_kb_actual": 10619010120,
> "size_kb_utilized": 18014398048099578,
> "num_objects": 1702444
> },
> "rgw.multimeta": {
> "size": 0,
> "size_actual": 0,
> "size_utilized": 0,
> "size_kb": 0,
> "size_kb_actual": 0,
> "size_kb_utilized": 0,
> "num_objects": 406462
> }
> },
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw rest API to retrieve rgw log entries

2018-09-23 Thread Robin H. Johnson
On Fri, Sep 21, 2018 at 04:17:35PM -0400, Jin Mao wrote:
> I am looking for an API equivalent of 'radosgw-admin log list' and
> 'radosgw-admin log show'. The existing /usage API only reports bucket-level
> numbers, like 'radosgw-admin usage show' does. Does anyone know if this is
> possible from the REST API?
/admin/log is the endpoint you want.
params:
REQUIRED: type=(metadata|bucket-index|data)

The API is a little inconsistent:
metadata & data default to a global info operation, and need an 'id'
argument for listing (also, if both 'info' & 'id' are passed, you get
ShardInfo);
bucket-index defaults to listing, but responds to the 'info' argument
with an info response.

All types support the status argument as well.
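
An example request, for anyone who hasn't used the admin API before (sketch;
it assumes the third-party 'awscurl' signing helper and a radosgw user whose
key carries the relevant admin caps):

    awscurl --service s3 \
        --access_key ACCESS_KEY --secret_key SECRET_KEY \
        'http://rgw.example.com/admin/log?type=metadata&id=0'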

The complete list of /admin/ resources as of Luminous:
/admin/usage
/admin/user
/admin/bucket
/admin/metadata
/admin/log
/admin/opstat
/admin/replica_log
/admin/config
/admin/realm

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw rest API to retrieve rgw log entries

2018-09-21 Thread Jin Mao
I am looking for an API equivalent of 'radosgw-admin log list' and
'radosgw-admin log show'. The existing /usage API only reports bucket-level
numbers, like 'radosgw-admin usage show' does. Does anyone know if this is
possible from the REST API?

Thanks.

Jin.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw bucket stats vs s3cmd du

2018-09-18 Thread Luis Periquito
Hi all,

I have a couple of very big s3 buckets that store temporary data. We
keep writing files to the buckets, which are then read and
deleted. They serve as temporary storage.

We're writing (and deleting) circa 1TB of data daily in each of those
buckets, and their size has been mostly stable over time.

The issue has arisen that radosgw-admin bucket stats says one bucket
is 10T and the other is 4T; but s3cmd du (and I did a sync which
agrees) says 3.5T and 2.3T respectively.

The bigger bucket suffered from the orphaned objects bug
(http://tracker.ceph.com/issues/18331). The smaller one was created on
10.2.3, so it may also have suffered from the same bug.

Any ideas what could be at play here? How can we reduce actual usage?

trimming part of the radosgw-admin bucket stats output
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 18446744073709551572
},
"rgw.main": {
"size": 10870197197183,
"size_actual": 10873866362880,
"size_utilized": 18446743601253967400,
"size_kb": 10615426951,
"size_kb_actual": 10619010120,
"size_kb_utilized": 18014398048099578,
"num_objects": 1702444
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 406462
}
},
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?

2018-08-25 Thread Konstantin Shalygin

Thank you very much! If anyone would like to help update these docs, I
would be happy to help with guidance/review.



I made an attempt half a year ago - http://tracker.ceph.com/issues/23081




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?

2018-08-24 Thread Casey Bodley



On 08/24/2018 06:44 AM, Konstantin Shalygin wrote:


Answering my own question.

radosgw-admin realm create --rgw-realm=default --default
radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default
radosgw-admin period update --commit
radosgw-admin zonegroup placement add --rgw-zonegroup="default" \
  --placement-id="indexless-placement"
radosgw-admin zonegroup placement default --placement-id="default-placement"

radosgw-admin period update --commit
radosgw-admin zone placement add --rgw-zone="default" \
  --placement-id="indexless-placement" \
  --data-pool="default.rgw.buckets.data" \
  --index-pool="default.rgw.buckets.index" \
  --data_extra_pool="default.rgw.buckets.non-ec" \
  --placement-index-type="indexless"


Restart the rgw instances, and now it is possible to create indexless buckets:

s3cmd mb s3://blindbucket --region=:indexless-placement


The documentation of the Object Storage Gateway is worse than that for rbd 
or cephfs, and it contains outdated strings (removed a year ago).


http://tracker.ceph.com/issues/18082

http://tracker.ceph.com/issues/24508

http://tracker.ceph.com/issues/8073

I hope this post will help somebody in the future.



k



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Thank you very much! If anyone would like to help update these docs, I 
would be happy to help with guidance/review.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?

2018-08-24 Thread Konstantin Shalygin

Answering my own question.

radosgw-admin realm create --rgw-realm=default --default
radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default
radosgw-admin period update --commit
radosgw-admin zonegroup placement add --rgw-zonegroup="default" \
  --placement-id="indexless-placement"
radosgw-admin zonegroup placement default --placement-id="default-placement"
radosgw-admin period update --commit
radosgw-admin zone placement add --rgw-zone="default" \
  --placement-id="indexless-placement" \
  --data-pool="default.rgw.buckets.data" \
  --index-pool="default.rgw.buckets.index" \
  --data_extra_pool="default.rgw.buckets.non-ec" \
  --placement-index-type="indexless"


Restart the rgw instances, and now it is possible to create indexless buckets:

s3cmd mb s3://blindbucket --region=:indexless-placement
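
To verify that the new bucket actually landed on the indexless placement
(sketch):

    radosgw-admin bucket stats --bucket=blindbucket | grep placement_rule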


The documentation of the Object Storage Gateway is worse than that for rbd or 
cephfs, and it contains outdated strings (removed a year ago).


http://tracker.ceph.com/issues/18082

http://tracker.ceph.com/issues/24508

http://tracker.ceph.com/issues/8073

I hope this post will help somebody in the future.



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?

2018-08-23 Thread Konstantin Shalygin
I need a bucket without an index for 5000 objects. How do I properly create 
an indexless bucket next to indexed buckets? This is a "default radosgw" 
Luminous instance.


I took a look at the CLI; as far as I understand, I will need to create a 
placement rule via "zone placement add" and add this key via "zonegroup 
placement add", but how do I create a "special" bucket with this placement?


Yehuda, please paste cli commands for this case.




Thanks,

k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


  1   2   3   4   5   6   7   8   9   10   >