[ceph-users] Re: RGW/Lua script does not show logs

2024-04-11 Thread Thomas Bennett
Hi Lee,

RGWDebugLog logs at the debug level. Do you have the correct logging levels
on your rados gateways? Should be 20.
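
For example, to raise it (a minimal sketch; the admin socket path is
illustrative and will differ per gateway):

# persist the setting for all rados gateway daemons
ceph config set client.rgw debug_rgw 20
# or change a single running daemon via its admin socket
ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_rgw 20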

Cheers,
Tom

On Mon, 8 Apr 2024 at 23:31,  wrote:

> Hello, I wrote a Lua script in order to retrieve RGW logs such as bucket
> name, bucket owner, etc.
> However, when I apply the Lua script I wrote using the command below, I do
> not see any logs starting with Lua INFO:
>
> radosgw-admin script put --infile=/usr/tmp/testPreRequest.lua
> --context=postrequest
>
> 
> function print_bucket_log(bucket)
>   RGWDebugLog("  Name: " .. bucket.Name)
> end
>
> if Request.Bucket then
>   RGWDebugLog("bucket operation logs: ")
>   print_bucket_log(Request.Bucket)
> end
>
>
> According to the official documentation on Lua scripting, "The RGWDebugLog()
> function accepts a string and prints it to the debug log with priority 20.
> Each log message is prefixed Lua INFO:. This function has no return value."
> Even though I set debug_rgw = 20, I do not see any logs.
>
> However, when I apply the below lua script, which uses bucket.Id, I get a
> Lua ERROR like below:
> Lua ERROR: [string "function print_bucket_log(bucket)..."]:3: attempt to
> concatenate a nil value (field 'Id')
>
> 
> function print_bucket_log(bucket)
>   RGWDebugLog("  Name: " .. bucket.Name)
>   RGWDebugLog("  Id: " .. bucket.Id)
> end
>
> if Request.Bucket then
>   RGWDebugLog("bucket operation logs: ")
>   print_bucket_log(Request.Bucket)
> end
>
>
> Any help would be very appreciated!
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Setting S3 bucket policies with multi-tenants

2023-11-01 Thread Thomas Bennett
To answer my own question, it would seem that Principal should be
defined like this:

   - "Principal": {"AWS": ["arn:aws:iam::Tenant1:user/readwrite"]}

And Resource should be:
"Resource": [ "arn:aws:s3:::backups"]

Is it worth having the docs updated -
https://docs.ceph.com/en/quincy/radosgw/bucketpolicy/
to indicate that usfolks in the example is the tenant name?


On Wed, 1 Nov 2023 at 18:27, Thomas Bennett  wrote:

> Hi,
>
> I'm running Ceph Quincy (17.2.6) with a rados-gateway. I have multiple
> tenants, for example:
>
>- Tenant1$manager
>- Tenant1$readwrite
>
> I would like to set a policy on a bucket (backups for example) owned by
> *Tenant1$manager* to allow *Tenant1$readwrite* access to that bucket. I
> can't find any documentation that discusses this scenario.
>
> Does anyone know how to specify the Principal and Resource sections of a
> policy.json file? Or any other configuration that I might be missing?
>
> I've tried some variations on Principal and Resource, including and
> excluding tenant information, but no luck yet.
>
>
> For example:
> {
>   "Version": "2012-10-17",
>   "Statement": [{
> "Effect": "Allow",
> "Principal": {"AWS": ["arn:aws:iam:::user/*Tenant1$readwrite*"]},
> "Action": ["s3:ListBucket","s3:GetObject", ,"s3:PutObject"],
> "Resource": [
>   "arn:aws:s3:::*Tenant1/backups*"
> ]
>   }]
> }
>
> I'm using s3cmd for testing, so:
> s3cmd --config s3cfg.manager setpolicy policy.json s3://backups/
> Returns:
> s3://backups/: Policy updated
>
> And then testing:
> s3cmd --config s3cfg.readwrite ls s3://backups/
> ERROR: Access to bucket 'backups' was denied
> ERROR: S3 error: 403 (AccessDenied)
>
> Thanks,
> Tom
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Setting S3 bucket policies with multi-tenants

2023-11-01 Thread Thomas Bennett
Hi,

I'm running Ceph Quincy (17.2.6) with a rados-gateway. I have multiple
tenants, for example:

   - Tenant1$manager
   - Tenant1$readwrite

I would like to set a policy on a bucket (backups for example) owned by
*Tenant1$manager* to allow *Tenant1$readwrite* access to that bucket. I
can't find any documentation that discusses this scenario.

Does anyone know how to specify the Principal and Resource sections of a
policy.json file? Or any other configuration that I might be missing?

I've tried some variations on Principal and Resource, including and
excluding tenant information, but no luck yet.


For example:
{
  "Version": "2012-10-17",
  "Statement": [{
"Effect": "Allow",
"Principal": {"AWS": ["arn:aws:iam:::user/*Tenant1$readwrite*"]},
"Action": ["s3:ListBucket","s3:GetObject", ,"s3:PutObject"],
"Resource": [
  "arn:aws:s3:::*Tenant1/backups*"
]
  }]
}

I'm using s3cmd for testing, so:
s3cmd --config s3cfg.manager setpolicy policy.json s3://backups/
Returns:
s3://backups/: Policy updated

And then testing:
s3cmd --config s3cfg.readwrite ls s3://backups/
ERROR: Access to bucket 'backups' was denied
ERROR: S3 error: 403 (AccessDenied)

Thanks,
Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Thomas Bennett
Thanks for all the responses, much appreciated.

Upping the chunk size fixes my problem in the short term until I upgrade to
17.2.6 :)

Kind regards,
Tom

On Tue, 3 Oct 2023 at 15:28, Matt Benjamin  wrote:

> Hi Thomas,
>
> If I'm not mistaken, the RGW will paginate ListBuckets essentially like
> ListObjectsv1 if the S3 client provides the appropriate "marker" parameter
> values.  COS does this too, I noticed.  I'm not sure which S3 clients can
> be relied on to do this, though.
>
> Matt
>
> On Tue, Oct 3, 2023 at 9:06 AM Thomas Bennett  wrote:
>
>> Hi Jonas,
>>
>> Thanks :) that solved my issue.
>>
>> It would seem to me that this is heading towards something that the S3
>> clients should paginate, but I couldn't find any documentation on how to
>> paginate bucket listings. All the information points to paginating object
>> listings - which makes sense.
>>
>> Just for completion of this thread:
>>
>> The rgw parameters are found at: Quincy radosgw config ref
>> <https://docs.ceph.com/en/quincy/radosgw/config-ref/>
>>
>> I ran the following command to update the parameter for all running rgw
>> daemons:
>> ceph config set client.rgw rgw_list_buckets_max_chunk 1
>>
>> And then confirmed the running daemons were configured:
>> ceph daemon /var/run/ceph/ceph-client.rgw.xxx.xxx.asok config show | grep
>> rgw_list_buckets_max_chunk
>> "rgw_list_buckets_max_chunk": "1",
>>
>> Kind regards,
>> Tom
>>
>> On Tue, 3 Oct 2023 at 13:30, Jonas Nemeiksis 
>> wrote:
>>
>> > Hi,
>> >
>> > You should increase these default settings:
>> >
>> > rgw_list_buckets_max_chunk // for buckets
>> > rgw_max_listing_results // for objects
>> >
>> > On Tue, Oct 3, 2023 at 12:59 PM Thomas Bennett  wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more
>> than
>> >> 1000 buckets.
>> >>
>> >> When the client tries to list all their buckets using s3cmd, rclone and
>> >> python boto3, they all three only ever return the first 1000 bucket
>> names.
>> >> I can confirm the buckets are all there (and more than 1000) by
>> checking
>> >> with the radosgw-admin command.
>> >>
>> >> Have I missed a pagination limit for listing user buckets in the rados
>> >> gateway?
>> >>
>> >> Thanks,
>> >> Tom
>> >> ___
>> >> ceph-users mailing list -- ceph-users@ceph.io
>> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>
>> >
>> >
>> > --
>> > Jonas
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Thomas Bennett
Hi Jonas,

Thanks :) that solved my issue.

It would seem to me that this is heading towards something that the S3
clients should paginate, but I couldn't find any documentation on how to
paginate bucket listings. All the information points to paginating object
listings - which makes sense.

Just for completion of this thread:

The rgw parameters are found at: Quincy radosgw config ref
<https://docs.ceph.com/en/quincy/radosgw/config-ref/>

I ran the following command to update the parameter for all running rgw
daemons:
ceph config set client.rgw rgw_list_buckets_max_chunk 1

And then confirmed the running daemons were configured:
ceph daemon /var/run/ceph/ceph-client.rgw.xxx.xxx.asok config show | grep
rgw_list_buckets_max_chunk
"rgw_list_buckets_max_chunk": "1",

Kind regards,
Tom

On Tue, 3 Oct 2023 at 13:30, Jonas Nemeiksis  wrote:

> Hi,
>
> You should increase these default settings:
>
> rgw_list_buckets_max_chunk // for buckets
> rgw_max_listing_results // for objects
>
> On Tue, Oct 3, 2023 at 12:59 PM Thomas Bennett  wrote:
>
>> Hi,
>>
>> I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than
>> 1000 buckets.
>>
>> When the client tries to list all their buckets using s3cmd, rclone and
>> python boto3, they all three only ever return the first 1000 bucket names.
>> I can confirm the buckets are all there (and more than 1000) by checking
>> with the radosgw-admin command.
>>
>> Have I missed a pagination limit for listing user buckets in the rados
>> gateway?
>>
>> Thanks,
>> Tom
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
> Jonas
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] S3 user with more than 1000 buckets

2023-10-03 Thread Thomas Bennett
Hi,

I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than
1000 buckets.

When the client tries to list all their buckets using s3cmd, rclone and
python boto3, they all three only ever return the first 1000 bucket names.
I can confirm the buckets are all there (and more than 1000) by checking
with the radosgw-admin command.

Have I missed a pagination limit for listing user buckets in the rados
gateway?

Thanks,
Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Dashboard daemon logging not working

2023-09-27 Thread Thomas Bennett
Hey,

Has anyone else had issues with exploring Loki after deploying ceph
monitoring services?
I'm running 17.2.6.

When clicking on the Ceph dashboard daemon logs (i.e. Cluster -> Logs ->
Daemon Logs), it took me through to an embedded Grafana dashboard for
"Dashboard1", so it's not working for me.

I found a workaround by enabling viewer role edit permissions. So I added

viewers_can_edit = true

to my grafana.ini.  After I fixed this, the 'Explore' button appeared in my
Grafana dashboard and I could explore the log files.

If you've hit the same problem and have a better solution, please let me
know.

For anyone who has the same problem and wants more details of how I fixed
it, here is what I did:

From my cephadm shell:
ceph config-key get mgr/cephadm/services/grafana/grafana.ini >
/tmp/grafana.ini

Edit /tmp/grafana.ini and add the viewers_can_edit line shown below.
# {{ cephadm_managed }}
# Source
/usr/share/ceph/mgr/cephadm/templates/services/grafana/grafana.ini.j2
[users]
  default_theme = light
  viewers_can_edit = true
  ...

Then updated the config:
ceph config-key set mgr/cephadm/services/grafana/grafana.ini -i
/tmp/grafana.ini

Then a reconfig and restart grafana:
ceph orch reconfig grafana
ceph orch restart grafana

Cheers,
Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best practice for expanding Ceph cluster

2023-05-17 Thread Thomas Bennett
Hi Samual,

Not sure if you know, but if you don't use the default CRUSH map, you can
also use custom location hooks. These can be used to bring your OSDs into
the correct place in the CRUSH map the first time they start.

https://docs.ceph.com/en/quincy/rados/operations/crush-map/#custom-location-hooks
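
A minimal sketch of what such a hook can look like (the script path and rack
name here are hypothetical; see the linked docs for the exact contract):

#!/bin/sh
# /usr/local/bin/ceph-crush-location on each OSD host (must be executable)
# print this daemon's CRUSH location; ceph applies it when the OSD starts
echo "host=$(hostname -s) rack=rack1 root=default"

# then point the OSDs at the hook, e.g. in ceph.conf:
# [osd]
# crush location hook = /usr/local/bin/ceph-crush-location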

Cheers,
Tom

On Wed, 17 May 2023 at 14:40, Thomas Bennett  wrote:

> Hey,
>
> A question slightly related to this:
>
> > I would suggest that you add all new hosts and make the OSDs start
>> > with a super-low initial weight (0.0001 or so), which means they will
>> > be in and up, but not receive any PGs.
>
>
> Is it possible to have the correct weight set and use ceph osd set noin .
>
> I'll probably test this at some point, but would this maybe have the same
> end result without needing to reweight the OSDs?
>
> From the ceph docs
> <https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-osd/#flapping-osds>:
>
> Two other flags are supported, noin and noout, which prevent booting OSDs
> from being marked in (allocated data) or protect OSDs from eventually
> being marked out (regardless of what the current value for mon osd down
> out interval is).
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best practice for expanding Ceph cluster

2023-05-17 Thread Thomas Bennett
Hey,

A question slightly related to this:

> I would suggest that you add all new hosts and make the OSDs start
> > with a super-low initial weight (0.0001 or so), which means they will
> > be in and up, but not receive any PGs.


Is it possible to have the correct weight set and use ceph osd set noin?

I'll probably test this at some point, but would this maybe have the same
end result without needing to reweight the OSDs?

From the ceph docs
<https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-osd/#flapping-osds>:

Two other flags are supported, noin and noout, which prevent booting OSDs
from being marked in (allocated data) or protect OSDs from eventually being
marked out (regardless of what the current value for mon osd down out
interval is).
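
For comparison, the two approaches look roughly like this (a hedged sketch;
osd_crush_initial_weight behaviour should be verified on your release):

# option 1: let new OSDs boot but keep them from being marked in
ceph osd set noin
# ... add the new hosts/OSDs, check placement, then ...
ceph osd unset noin

# option 2: have new OSDs join CRUSH with a near-zero weight
ceph config set osd osd_crush_initial_weight 0.0001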
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lua scripting in the rados gateway

2023-05-09 Thread Thomas Bennett
Hi Yuval,

Just a follow up on this.

An issue I’ve just resolved is getting scripts into the cephadm shell. As
it turns out - I didn’t know this - the host file system is mounted into
the cephadm shell at /rootfs/.

So I've been editing a /tmp/preRequest.lua on my host and then running:

cephadm shell radosgw-admin script put --infile=/rootfs/tmp/preRequest.lua
--context=preRequest

This injects the lua script into the pre request context.
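
To confirm it landed, the stored script can be read back from the same
context (sketch):

cephadm shell radosgw-admin script get --context=preRequest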

Cheers,
Tom

On Fri, 28 Apr 2023 at 15:19, Thomas Bennett  wrote:

> Hey Yuval,
>
> No problem. It was interesting to me to figure out how it all fits
> together and works.  Thanks for opening an issue on the tracker.
>
> Cheers,
> Tom
>
> On Thu, 27 Apr 2023 at 15:03, Yuval Lifshitz  wrote:
>
>> Hi Thomas,
>> Thanks for the detailed info!
>> RGW lua scripting was never tested in a cephadm deployment :-(
>> Opened a tracker: https://tracker.ceph.com/issues/59574 to make sure
>> this would work out of the box.
>>
>> Yuval
>>
>>
>> On Tue, Apr 25, 2023 at 10:25 PM Thomas Bennett  wrote:
>>
>>> Hi ceph users,
>>>
>>> I've been trying out the lua scripting for the rados gateway (thanks
>>> Yuval).
>>>
>>> As in my previous email I mentioned that there is an error when trying to
>>> load the luasocket module. However, I thought it was a good time to
>>> report
>>> on my progress.
>>>
>>> My 'hello world' example below is called *test.lua* below includes the
>>> following checks:
>>>
>>>1. Can I write to the debug log?
>>>2. Can I use the lua socket package to do something stupid but
>>>interesting, like connect to a webservice?
>>>
>>> Before you continue reading this, you might need to know that I run all
>>> ceph processes in a *CentOS Stream release 8 *container deployed using
>>> ceph
>>> orchestrator running *Ceph v17.2.5*, so please view the information below
>>> in that context.
>>>
>>> For anyone looking for a reference, I suggest going to the ceph lua rados
>>> gateway documentation at radosgw/lua-scripting
>>> <https://docs.ceph.com/en/quincy/radosgw/lua-scripting/>.
>>>
>>> There are two new switches you need to know about in the radosgw-admin:
>>>
>>>- *script* -> loading your lua script
>>>- *script-package* -> loading supporting packages for your script -
>>> i.e.
>>>luasocket in this case.
>>>
>>> For a basic setup, you'll need to have a few dependencies in your
>>> containers:
>>>
>>>- cephadm container: requires luarocks (I've checked the code - it
>>> runs
>>>a luarocks search command)
>>>- radosgw container: requires luarocks, gcc, make,  m4, wget (wget
>>> just
>>>in case).
>>>
>>> To achieve the above, I updated the container image for our running
>>> system.
>>> I needed to do this because I needed to redeploy the rados gateway
>>> container to inject the lua script packages into the radosgw runtime
>>> process. This will start with a fresh container based on the global
>>> config
>>> *container_image* setting on your running system.
>>>
>>> For us this is currently captured in *quay.io/tsolo/ceph:v17.2.5-3
>>> <http://quay.io/tsolo/ceph:v17.2.5-3>* and included the following extra
>>> steps (including installing the lua dev from an rpm because there is no
>>> centos package in yum):
>>> yum install luarocks gcc make wget m4
>>> rpm -i
>>>
>>> https://rpmfind.net/linux/centos/8-stream/PowerTools/x86_64/os/Packages/lua-devel-5.3.4-12.el8.x86_64.rpm
>>>
>>> You will notice that I've included a compiler and compiler support into
>>> the
>>> image. This is because luarocks on the radosgw needs to compile luasocket (the
>>> package I want to install). This will happen at start time when the
>>> radosgw
>>> is restarted from ceph orch.
>>>
>>> In the cephadm container I still need to update our cephadm shell so I
>>> need
>>> to install luarocks by hand:
>>> yum install luarocks
>>>
>>> Then set the updated image to use:
>>> ceph config set global container_image quay.io/tsolo/ceph:v17.2.5-3
>>>
>>> I now create a file called: *test.lua* in the cephadm container. This
>>> contains the following lines to write to the log and then do a get
>>> request
>>> to google. This is not practical in pr

[ceph-users] osd pause

2023-05-05 Thread Thomas Bennett
Hi,

FYI - This might be pedantic, but there does not seem to be any difference
between using these two sets of commands:

   - ceph osd pause / ceph osd unpause
   - ceph osd set pause / ceph osd unset pause

I can see that they both set/unset the pauserd,pausewr flags, but since
they don't report anything else, I assume they do exactly the same thing.
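
For example, the effect is visible in the cluster flags either way (the grep
is just a quick check):

ceph osd set pause
ceph osd dump | grep flags     # pauserd,pausewr now appear among the flags
ceph osd unset pause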

I also assumed it only stopped reads/writes to the OSDs, but I found this
openattic post which had this comment:
Pausing the cluster means that you can't see when OSDs come back up again
and no map update will happen.

I didn't know that but it seems pretty useful knowing.

Pausing is mentioned in posts talking about shutting down a Ceph cluster
for maintenance, but often it's listed as optional.

Does anyone know what is the original intended purpose of pausing and
when/why would you use it?

Also - can I assume that pause will complete any current write/read ops on
OSDs before pausing?

Thanks,
Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lua scripting in the rados gateway

2023-04-28 Thread Thomas Bennett
Hey Yuval,

No problem. It was interesting to me to figure out how it all fits together
and works.  Thanks for opening an issue on the tracker.

Cheers,
Tom

On Thu, 27 Apr 2023 at 15:03, Yuval Lifshitz  wrote:

> Hi Thomas,
> Thanks for the detailed info!
> RGW lua scripting was never tested in a cephadm deployment :-(
> Opened a tracker: https://tracker.ceph.com/issues/59574 to make sure this
> would work out of the box.
>
> Yuval
>
>
> On Tue, Apr 25, 2023 at 10:25 PM Thomas Bennett  wrote:
>
>> Hi ceph users,
>>
>> I've been trying out the lua scripting for the rados gateway (thanks
>> Yuval).
>>
>> As in my previous email I mentioned that there is an error when trying to
>> load the luasocket module. However, I thought it was a good time to report
>> on my progress.
>>
>> My 'hello world' example below is called *test.lua* below includes the
>> following checks:
>>
>>1. Can I write to the debug log?
>>2. Can I use the lua socket package to do something stupid but
>>interesting, like connect to a webservice?
>>
>> Before you continue reading this, you might need to know that I run all
>> ceph processes in a *CentOS Stream release 8 *container deployed using
>> ceph
>> orchestrator running *Ceph v17.2.5*, so please view the information below
>> in that context.
>>
>> For anyone looking for a reference, I suggest going to the ceph lua rados
>> gateway documentation at radosgw/lua-scripting
>> <https://docs.ceph.com/en/quincy/radosgw/lua-scripting/>.
>>
>> There are two new switches you need to know about in the radosgw-admin:
>>
>>- *script* -> loading your lua script
>>- *script-package* -> loading supporting packages for your script -
>> i.e.
>>luasocket in this case.
>>
>> For a basic setup, you'll need to have a few dependencies in your
>> containers:
>>
>>- cephadm container: requires luarocks (I've checked the code - it runs
>>a luarocks search command)
>>- radosgw container: requires luarocks, gcc, make,  m4, wget (wget just
>>in case).
>>
>> To achieve the above, I updated the container image for our running
>> system.
>> I needed to do this because I needed to redeploy the rados gateway
>> container to inject the lua script packages into the radosgw runtime
>> process. This will start with a fresh container based on the global config
>> *container_image* setting on your running system.
>>
>> For us this is currently captured in *quay.io/tsolo/ceph:v17.2.5-3
>> <http://quay.io/tsolo/ceph:v17.2.5-3>* and included the following extra
>> steps (including installing the lua dev from an rpm because there is no
>> centos package in yum):
>> yum install luarocks gcc make wget m4
>> rpm -i
>>
>> https://rpmfind.net/linux/centos/8-stream/PowerTools/x86_64/os/Packages/lua-devel-5.3.4-12.el8.x86_64.rpm
>>
>> You will notice that I've included a compiler and compiler support into
>> the
>> image. This is because luarocks on the radosgw needs to compile luasocket (the
>> package I want to install). This will happen at start time when the
>> radosgw
>> is restarted from ceph orch.
>>
>> In the cephadm container I still need to update our cephadm shell so I
>> need
>> to install luarocks by hand:
>> yum install luarocks
>>
>> Then set the updated image to use:
>> ceph config set global container_image quay.io/tsolo/ceph:v17.2.5-3
>>
>> I now create a file called: *test.lua* in the cephadm container. This
>> contains the following lines to write to the log and then do a get request
>> to google. This is not practical in production, but it serves the purpose
>> of testing the infrastructure:
>>
>> RGWDebugLog("Tsolo start lua script")
>> local LuaSocket = require("socket")
>> client = LuaSocket.connect("google.com", 80)
>> client:send("GET / HTTP/1.0\r\nHost: google.com\r\n\r\n")
>> while true do
>>   s, status, partial = client:receive('*a')
>>   RGWDebugLog(s or partial)
>>   if status == "closed" then
>> break
>>   end
>> end
>> client:close()
>> RGWDebugLog("Tsolo stop lua")
>>
>> Next I run:
>> radosgw-admin script-package add --package=luasocket --allow-compilation
>>
>> And then list the added package to make sure it is there:
>> radosgw-admin script-package list
>>
>> Note - at this point the radosgw has not been modified, it must first be
>> restarted.
>>
>

[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-28 Thread Thomas Bennett
A pleasure. Hope it helps :)

Happy to share if you need any more information Zac.

Cheers,
Tom

On Wed, 26 Apr 2023 at 18:14, Dan van der Ster 
wrote:

> Thanks Tom, this is a very useful post!
> I've added our docs guy Zac in cc: IMHO this would be useful in a
> "Tips & Tricks" section of the docs.
>
> -- dan
>
> __
> Clyso GmbH | https://www.clyso.com
>
>
>
>
> On Wed, Apr 26, 2023 at 7:46 AM Thomas Bennett  wrote:
> >
> > I would second Joachim's suggestion - this is exactly what we're in the
> > process of doing for a client, i.e migrating from Luminous to Quincy.
> > However below would also work if you're moving to Nautilus.
> >
> > The only catch with this plan would be if you plan to reuse any hardware
> -
> > i.e the hosts running rados gateways and mons, etc. If you have enough
> > hardware to spare this is a good plan.
> >
> > My process:
> >
> >1. Stand a new Quincy cluster and tune the cluster.
> >2. Migrate user information, secrets and access keys (using
> >radosgw-admin in a script).
> >3. Using a combination of rclone and parallel to push data across from
> >the old cluster to the new cluster.
> >
> >
> > Below is a bash script I used to capture all the user information on the
> > old cluster and I ran it on the new cluster to create users and keep
> their
> > secrets and keys the same.
> >
> > #
> > for i in $(radosgw-admin user list | jq -r .[]); do
> > USER_INFO=$(radosgw-admin user info --uid=$i)
> > USER_ID=$(echo $USER_INFO | jq -r '.user_id')
> > DISPLAY_NAME=$(echo $USER_INFO | jq '.display_name')
> > EMAIL=$(echo $USER_INFO | jq '.email')
> > MAX_BUCKETS=$(echo $USER_INFO | jq -r '(.max_buckets|tostring)')
> > ACCESS=$(echo $USER_INFO | jq -r '.keys[].access_key')
> > SECRET=$(echo $USER_INFO | jq -r '.keys[].secret_key')
> > echo "radosgw-admin user create --uid=$USER_ID
> > --display-name=$DISPLAY_NAME --email=$EMAIL --max-buckets=$MAX_BUCKETS
> > --access-key=$ACCESS --secret-key=$SECRET" | tee -a
> > generated.radosgw-admin-user-create.sh
> > done
> > #
> >
> > Rclone is a really powerful tool! I lazily set up a backend for each
> > user, by appending the below to the for loop in the above script. The
> > script below is not pretty but it does the job:
> > #
> > echo "" >> generated.rclone.conf
> > echo [old-cluster-$USER_ID] >> generated.rclone.conf
> > echo type = s3 >> generated.rclone.conf
> > echo provider = Ceph >> generated.rclone.conf
> > echo env_auth = false >> generated.rclone.conf
> > echo access_key_id = $ACCESS >> generated.rclone.conf
> > echo secret_access_key = $SECRET >> generated.rclone.conf
> > echo endpoint = http://xx.xx.xx.xx: >> generated.rclone.conf
> > echo acl = public-read >> generated.rclone.conf
> > echo "" >> generated.rclone.conf
> > echo [new-cluster-$USER_ID] >> generated.rclone.conf
> > echo type = s3 >> generated.rclone.conf
> > echo provider = Ceph >> generated.rclone.conf
> > echo env_auth = false >> generated.rclone.conf
> > echo access_key_id = $ACCESS >> generated.rclone.conf
> > echo secret_access_key = $SECRET >> generated.rclone.conf
> > echo endpoint = http://yy.yy.yy.yy: >> generated.rclone.conf
> > echo acl = public-read >> generated.rclone.conf
> > echo "" >> generated.rclone.conf
> > #
> >
> > Copy the generated.rclone.conf to the node that is going to act as the
> > transfer node (I just used the new rados gateway node) into
> > ~/.config/rclone/rclone.conf
> >
> > Now if you run rclone lsd old-cluster-{user}: (it even tab completes!)
> > you'll get a list of all the buckets for that user.
> >
> > You could even simply rclone sync old-cluster-{user}: new-cluster-{user}:
> > and it should sync all buckets for a user.
> >
> > Catches:
> >
> >- Use the scripts carefully - our buckets for this one user are set
> >public-read - you might want to check each line of the script if you
> use it.
> >- Quincy bucket naming convention is stricter than Luminous. I've had
> to
> >catch some '_' and upper cases and fix them in the command line I
> generate
> >for copying each bucket.
> >- Using rclone will take a long time. Feeding a script into parallel sped
> >things up for me:

[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-26 Thread Thomas Bennett
I would second Joachim's suggestion - this is exactly what we're in the
process of doing for a client, i.e. migrating from Luminous to Quincy.
However, the below would also work if you're moving to Nautilus.

The only catch with this plan would be if you plan to reuse any hardware -
i.e. the hosts running rados gateways and mons, etc. If you have enough
hardware to spare, this is a good plan.

My process:

   1. Stand a new Quincy cluster and tune the cluster.
   2. Migrate user information, secrets and access keys (using
   radosgw-admin in a script).
   3. Using a combination of rclone and parallel to push data across from
   the old cluster to the new cluster.


Below is a bash script I used to capture all the user information on the
old cluster and I ran it on the new cluster to create users and keep their
secrets and keys the same.

#
for i in $(radosgw-admin user list | jq -r .[]); do
USER_INFO=$(radosgw-admin user info --uid=$i)
USER_ID=$(echo $USER_INFO | jq -r '.user_id')
DISPLAY_NAME=$(echo $USER_INFO | jq '.display_name')
EMAIL=$(echo $USER_INFO | jq '.email')
MAX_BUCKETS=$(echo $USER_INFO | jq -r '(.max_buckets|tostring)')
ACCESS=$(echo $USER_INFO | jq -r '.keys[].access_key')
SECRET=$(echo $USER_INFO | jq -r '.keys[].secret_key')
echo "radosgw-admin user create --uid=$USER_ID
--display-name=$DISPLAY_NAME --email=$EMAIL --max-buckets=$MAX_BUCKETS
--access-key=$ACCESS --secret-key=$SECRET" | tee -a
generated.radosgw-admin-user-create.sh
done
#

Rclone is a really powerful tool! I lazily set up a backend for each user,
by appending the below to the for loop in the above script. The script below
is not pretty but it does the job:
#
echo "" >> generated.rclone.conf
echo [old-cluster-$USER_ID] >> generated.rclone.conf
echo type = s3 >> generated.rclone.conf
echo provider = Ceph >> generated.rclone.conf
echo env_auth = false >> generated.rclone.conf
echo access_key_id = $ACCESS >> generated.rclone.conf
echo secret_access_key = $SECRET >> generated.rclone.conf
echo endpoint = http://xx.xx.xx.xx: >> generated.rclone.conf
echo acl = public-read >> generated.rclone.conf
echo "" >> generated.rclone.conf
echo [new-cluster-$USER_ID] >> generated.rclone.conf
echo type = s3 >> generated.rclone.conf
echo provider = Ceph >> generated.rclone.conf
echo env_auth = false >> generated.rclone.conf
echo access_key_id = $ACCESS >> generated.rclone.conf
echo secret_access_key = $SECRET >> generated.rclone.conf
echo endpoint = http://yy.yy.yy.yy: >> generated.rclone.conf
echo acl = public-read >> generated.rclone.conf
echo "" >> generated.rclone.conf
#

Copy the generated.rclone.conf to the node that is going to act as the
transfer node (I just used the new rados gateway node) into
~/.config/rclone/rclone.conf

Now if you run rclone lsd old-cluster-{user}: (it even tab completes!)
you'll get a list of all the buckets for that user.

You could even simply rclone sync old-cluster-{user}: new-cluster-{user}: and
it should sync all buckets for a user.
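
To speed that up, one option is to generate a per-bucket sync script and feed
it to parallel, as mentioned under the catches below (a hedged sketch; pulling
bucket names out of rclone lsd output with awk is an assumption about its
formatting, and the user id is hypothetical):

USER_ID=readwrite
for b in $(rclone lsd old-cluster-${USER_ID}: | awk '{print $NF}'); do
  echo "rclone sync old-cluster-${USER_ID}:${b} new-cluster-${USER_ID}:${b}"
done > sync-script
parallel -j 10 < sync-script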

Catches:

   - Use the scripts carefully - our buckets for this one user are set
   public-read - you might want to check each line of the script if you use it.
   - Quincy bucket naming convention is stricter than Luminous. I've had to
   catch some '_' and upper cases and fix them in the command line I generate
   for copying each bucket.
   - Using rclone will take a long time. Feeding a script into parallel sped
   things up for me:
   - # parallel -j 10 < sync-script
   - Watch out for lifecycling! Not sure how to handle this to make sure
   it's captured correctly.

Cheers,
Tom

On Tue, 25 Apr 2023 at 22:36, Marc  wrote:

>
> Maybe he is limited by the supported OS
>
>
> >
> > I would create a new cluster with Quincy and would migrate the data from
> > the old to the new cluster bucket by bucket. Nautilus is out of support
> > and
> > I would recommend at least to use a ceph version that is receiving
> > Backports.
> >
> > huxia...@horebdata.cn  wrote on Tue., 25 Apr.
> > 2023, 18:30:
> >
> > > Dear Ceph folks,
> > >
> > > I would like to listen to your advice on the following topic: We have
> > a
> > > 6-node Ceph cluster (for RGW usage only ) running on Luminous 12.2.12,
> > and
> > > now will add 10 new nodes. Our plan is to phase out the old 6 nodes,
> > and
> > > run RGW Ceph cluster with the new 10 nodes on Nautilus version。
> > >
> > > I can think of two ways to achieve the above goal. The first method
> > would
> > > be:   1) Upgrade the current 6-node cluster from Luminous 12.2.12 to
> > > Nautilus 14.2.22;  2) Expand the cluster with the 10 new nodes, and
> > then
> > > re-balance;  3) After rebalance completes, remove the 6 old nodes from
> > the
> > > cluster
> > >
> > > The second method would get rid of the procedure to upgrade the old 6-
> > node
> > > from Luminous to Nautilus, because those 6 nodes will be phased out
> > anyway,
> > > but then we have to 

[ceph-users] Lua scripting in the rados gateway

2023-04-25 Thread Thomas Bennett
Hi ceph users,

I've been trying out the lua scripting for the rados gateway (thanks Yuval).

As in my previous email I mentioned that there is an error when trying to
load the luasocket module. However, I thought it was a good time to report
on my progress.

My 'hello world' example below is called *test.lua* below includes the
following checks:

   1. Can I write to the debug log?
   2. Can I use the lua socket package to do something stupid but
   interesting, like connect to a webservice?

Before you continue reading this, you might need to know that I run all
ceph processes in a *CentOS Stream release 8 *container deployed using ceph
orchestrator running *Ceph v17.2.5*, so please view the information below
in that context.

For anyone looking for a reference, I suggest going to the ceph lua rados
gateway documentation at radosgw/lua-scripting
<https://docs.ceph.com/en/quincy/radosgw/lua-scripting/>.

There are two new switches you need to know about in the radosgw-admin:

   - *script* -> loading your lua script
   - *script-package* -> loading supporting packages for your script - i.e.
   luasocket in this case.

For a basic setup, you'll need to have a few dependencies in your
containers:

   - cephadm container: requires luarocks (I've checked the code - it runs
   a luarocks search command)
   - radosgw container: requires luarocks, gcc, make,  m4, wget (wget just
   in case).

To achieve the above, I updated the container image for our running system.
I needed to do this because I needed to redeploy the rados gateway
container to inject the lua script packages into the radosgw runtime
process. This will start with a fresh container based on the global config
*container_image* setting on your running system.

For us this is currently captured in *quay.io/tsolo/ceph:v17.2.5-3
* and included the following extra
steps (including installing the lua dev from an rpm because there is no
centos package in yum):
yum install luarocks gcc make wget m4
rpm -i
https://rpmfind.net/linux/centos/8-stream/PowerTools/x86_64/os/Packages/lua-devel-5.3.4-12.el8.x86_64.rpm

You will notice that I've included a compiler and compiler support into the
image. This is because luarocks on the radosgw needs to compile luasocket (the
package I want to install). This will happen at start time when the radosgw
is restarted from ceph orch.

In the cephadm container I still need to update our cephadm shell so I need
to install luarocks by hand:
yum install luarocks

Then set the updated image to use:
ceph config set global container_image quay.io/tsolo/ceph:v17.2.5-3

I now create a file called: *test.lua* in the cephadm container. This
contains the following lines to write to the log and then do a get request
to google. This is not practical in production, but it serves the purpose
of testing the infrastructure:

RGWDebugLog("Tsolo start lua script")
local LuaSocket = require("socket")
client = LuaSocket.connect("google.com", 80)
client:send("GET / HTTP/1.0\r\nHost: google.com\r\n\r\n")
while true do
  s, status, partial = client:receive('*a')
  RGWDebugLog(s or partial)
  if status == "closed" then
break
  end
end
client:close()
RGWDebugLog("Tsolo stop lua")

Next I run:
radosgw-admin script-package add --package=luasocket --allow-compilation

And then list the added package to make sure it is there:
radosgw-admin script-package list

Note - at this point the radosgw has not been modified, it must first be
restarted.

Then I put the *test.lua* script into the pre request context:
radosgw-admin script put --infile=test.lua --context=preRequest

You also need to raise the debug log level on the running rados gateway:
ceph daemon
/var/run/ceph/ceph-client.rgw.xxx.xxx-cms1.x.x.xx.asok
config set debug_rgw 20

Inside the radosgw container I apply my fix (as per previous email):
cp -ru /tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib64/*
/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib/

Outside on the host running the radosgw-admin container I follow the
journalctl for the radosgw container (to get the logs):
journalctl -fu ceph-----@rgw.
xxx.xxx-cms1.x.x.xx.service

Then I run an s3cmd to put data in via the rados gateway and check the
journalctl logs and see:
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO: Tsolo start lua
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO: HTTP/1.0 301 Moved
Permanently
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO:
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO: Tsolo stop lua
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO: Tsolo start lua
Apr 25 20:54:48 brp-ceph-cms1 radosgw[60901]: Lua INFO: HTTP/1.0 301 Moved
Permanently
Apr 25 20:54:48 brp-ceph-cms1 radosgw[60901]: Lua INFO:
Apr 25 20:54:48 brp-ceph-cms1 radosgw[60901]: Lua INFO: Tsolo stop lua

So the script worked :)

If you want to see where the luarocks libraries have been installed,  look

[ceph-users] Rados gateway lua script-package error lib64

2023-04-25 Thread Thomas Bennett
Hi,

I've noticed that when my lua script runs I get the following error on my
radosgw container. It looks like the lib64 directory is not included in the
path when looking for shared libraries.

Copying the content of lib64 into the lib directory solves the issue on the
running container.

Here are more details:
Apr 25 20:26:59 xxx-ceph- radosgw[60901]: req 2268223694354647302
0.0s Lua ERROR:
/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/*share*/lua/5.3/socket.lua:12:
module 'socket.core' not found:
 no field package.preload['socket.core']
 no file '/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/*share*
/lua/5.3/socket/core.lua'
 no file '/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/*lib*
/lua/5.3/socket/core.so'
 no file '/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/*lib*
/lua/5.3/socket.so'

As mentioned the following command on the running radosgw container solves
the issue for the running container:
cp -ru /tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib64/*
/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib/

Cheers,
Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Object Gateway and lua scripts

2023-04-11 Thread Thomas Bennett
Thanks Yuval. From your email I've confirmed that it's not the logging that
is broken - it's the CopyFrom that is causing an issue :)

I've got some other example Lua scripts working now.

Kind regards,
Thomas



On Sun, 9 Apr 2023 at 11:41, Yuval Lifshitz  wrote:

> Hi Thomas,
> I think you found a crash when using the lua "CopyFrom" field.
> Opened a tracker: https://tracker.ceph.com/issues/59381
> <https://tracker.ceph.com/issues/59381>
>
> Will fix ASAP and keep you updated.
>
> Yuval
>
> On Wed, Apr 5, 2023 at 6:58 PM Thomas Bennett  wrote:
>
>> Hi,
>>
>> We're currently testing out lua scripting in the Ceph Object Gateway
>> (Radosgw).
>>
>> Ceph version: 17.2.5
>>
>> We've tried a simple experiment with the simple lua script which is based
>> on the documentation (see fixed width text below).
>>
>> However, the issue we're having is that we can't find the log messages
>> anywhere. We've searched the entire journalctl database as well as raised
>> the debug level on the radosgw by setting debug_rgw to 20 on the running
>> daemon.
>>
>> Any help welcome :)
>>
>> function print_object(msg, object)
>>   RGWDebugLog("  Title: " .. msg)
>>   RGWDebugLog("  Name: " .. object.Name)
>>   RGWDebugLog("  Instance: " .. object.Instance)
>>   RGWDebugLog("  Id: " .. object.Id)
>>   RGWDebugLog("  Size: " .. object.Size)
>>   RGWDebugLog("  MTime: " .. object.MTime)
>> end

[ceph-users] Ceph Object Gateway and lua scripts

2023-04-05 Thread Thomas Bennett
Hi,

We're currently testing out lua scripting in the Ceph Object Gateway
(Radosgw).

Ceph version: 17.2.5

We've tried a simple experiment with the simple lua script which is based
on the documentation (see fixed width text below).

However, the issue we're having is that we can't find the log messages
anywhere. We've searched the entire journalctl database as well as raised
the debug level on the radosgw by setting debug_rgw to 20 on the running
daemon.

Any help welcome :)

function print_object(msg, object)
  RGWDebugLog("  Title: " .. msg)
  RGWDebugLog("  Name: " .. object.Name)
  RGWDebugLog("  Instance: " .. object.Instance)
  RGWDebugLog("  Id: " .. object.Id)
  RGWDebugLog("  Size: " .. object.Size)
  RGWDebugLog("  MTime: " .. object.MTime)
end

RGWDebugLog("This is a log message!")

Request.Log()
if Request.CopyFrom then
  print_object("copy from", Request.CopyFrom.Object)
if Request.CopyFrom.Object then
  print_object("copy from-object" ,Request.CopyFrom.Object)
end
end

if Request.Object then
  print_object("Object" ,Request.Object)
end

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: speed up individual backfills

2020-06-04 Thread Thomas Bennett
Hi,

It turns out I was mapping to a problematic OSD. In this case OSD 3313.

After disabling the OSD with systemctl on the host, recovery has picked up
again and mapped the pgs to new osds.

For posterity, I ran smartctl on osd.3313's device and then I noticed:
  5 Reallocated_Sector_Ct   0x0033   092   092   010    Pre-fail  Always       -       31688

Lots of reallocated sectors, so the drive was "working" but not usable.

In the end it had nothing to do with Ceph at all.

Regards,


On Thu, Jun 4, 2020 at 1:59 PM Thomas Bennett  wrote:

> Hi,
>
> I have 15628 misplaced objects that are currently backfilling as follows:
>
>1. pgid:14.3ce1  from:osd.1321 to:osd.3313
>2. pgid:14.4dd9 from:osd.1693 to:osd.2980
>3. pgid:14.680b from:osd.362 to:osd.3313
>
> These are remnant backfills from a pg-upmap/rebalance campaign after we've
> added 2 new racks worth of osds to our cluster.
>
> Our mon db is bloated so I'm wanting to trim the mon db before continuing
> the next pg-upmap/rebalance campaign.
>
> So, my question is:
> Is there any way I can speed up the backfill process on these individual
> osds?
> Or hints to trace out why these are so slow?
>
> Regards
>


-- 
Thomas Bennett

Storage Engineer at SARAO
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] speed up individual backfills

2020-06-04 Thread Thomas Bennett
Hi,

I have 15628 misplaced objects that are currently backfilling as follows:

   1. pgid:14.3ce1  from:osd.1321 to:osd.3313
   2. pgid:14.4dd9 from:osd.1693 to:osd.2980
   3. pgid:14.680b from:osd.362 to:osd.3313

These are remnant backfills from a pg-upmap/rebalance campaign after we've
added 2 new racks worth of osds to our cluster.

Our mon db is bloated so I'm wanting to trim the mon db before continuing
the next pg-upmap/rebalance campaign.

So, my question is:
Is there any way I can speed up the backfill process on these individual
osds?
Or hints to trace out why these are so slow?

Regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] pg-upmap-items

2020-06-03 Thread Thomas Bennett
Hi,

I've been using pg-upmap items both in the ceph balancer and by hand
running osdmaptool for a while now (on Ceph 12.2.13).

But I've noticed a side effect of pg-upmap-items which can sometimes lead to
some unnecessary data movement.

My understanding is that the ceph osdmap keeps track of upmap-items that I
undo (in my case using the CERN script upmap-remapped.py).

These can be seen in the osdmap (or osd dump) json output. It looks, for
example, like this:
  "pg_upmap_items": [
{
  "pgid": "9.10",
  "mappings": [
{
  "from": 1761,
  "to": 6
}
  ]
},
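
A quick way to pull these out on a live cluster (sketch, assumes jq):

ceph osd dump --format json | jq '.pg_upmap_items[] | select(.pgid == "9.10")'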

When upmapping pg 9.10 I first need to clear this pg_upmap_item by
executing an rm-pg-upmap-items command:
ceph osd rm-pg-upmap-items 9.10

All this does is unmap the from/to osds (here from osd.6 to osd.1761) which
is sometimes not useful.

I would prefer to "forget" this upmap, i.e. remove it permanently
from pg_upmap_items. Is there any way to do this?

Cheers,
Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Aging in S3 or Moving old data to slow OSDs

2020-05-20 Thread Thomas Bennett
Hi Khodayar,

Yes, you are correct. I would have to move objects manually between (more
> than one) buckets if I use "Pool placements and Storage classes"
>
> So you have successfully used this method and it was OK?
>

After we set up the new placement rule in the zone and zonegroups, we
modified the users' configuration to set which placement to use. It's worked
well so far :). I'm happy to share how we set up the zones and zonegroups if you
need that information. But I followed what I could find in the ceph docs.

> I may be forced to use this method because clients needs more features
> than mere cache tiering.
>
> Have you ever used S3 Browser "bucket lifecycle rules" and those "storage
> classes"?
>

We use S3 lifecycle for data expiration but not for data transitions. It
looks like transitions are supported in Nautilus:
https://docs.ceph.com/docs/master/radosgw/placement/#storage-classes.
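
For reference, adding a second storage class to the default placement target
looks roughly like this (a sketch based on the linked docs; the class name and
data pool are hypothetical):

radosgw-admin zonegroup placement add --rgw-zonegroup default \
  --placement-id default-placement --storage-class COLD
radosgw-admin zone placement add --rgw-zone default \
  --placement-id default-placement --storage-class COLD \
  --data-pool default.rgw.cold.data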

Cheers,
Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large omap

2020-05-20 Thread Thomas Bennett
Hi,

Have you looked at the omap keys to see what's listed there?
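
For example, to see which objects carry the keys and what they look like (a
hedged sketch; object names will differ per cluster):

rados -p default.rgw.log ls | head
rados -p default.rgw.log listomapkeys <object-name> | head
rados -p default.rgw.log listomapkeys <object-name> | wc -l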

In our configuration, the radosgw garbage collector uses the
*default.rgw.logs* pool for garbage collection (radosgw-admin zone get
default | jq .gc_pool).

I've seen large omaps in my *default.rgw.logs* pool before when I've
deleted large amounts of s3 data and there are many shadow objects that
still need to be deleted by the garbage collector.

Cheers,
Tom

On Wed, May 20, 2020 at 9:47 AM Janne Johansson  wrote:

> Den ons 20 maj 2020 kl 05:23 skrev Szabo, Istvan (Agoda) <
> istvan.sz...@agoda.com>:
>
> > LARGE_OMAP_OBJECTS 1 large omap objects
> > 1 large objects found in pool 'default.rgw.log'
> > When I look for this large omap object, this is the one:
> > for i in `ceph pg ls-by-pool default.rgw.log | tail -n +2 | awk '{print
> > $1}'`; do echo -n "$i: "; ceph pg $i query |grep num_large_omap_objects |
> > head -1 | awk '{print $2}'; done | grep ": 1"
> > 4.d: 1
> > I found only this way to reduce the size:
> > radosgw-admin usage trim --end-date=2019-05-01 --yes-i-really-mean-it
> >
> > However when this is running the RGW became completely unreachable, the
> > loadbalancer started to flapping and users started to complain because
> they
> > can't do anything.
> > Is there any other way to fix it, or any suggestion why this issue
> happens?
> >
>
> If you are not using the usage logs for anything, there are options in rgw
> to not produce them, which is a blunt but working solution to not have to
> clean them out with outages during trimming.
>
> If you do use them, perhaps set "rgw usage max user shards" to something
> larger than the default 1.
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Thomas Bennett

Storage Engineer at SARAO
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Aging in S3 or Moving old data to slow OSDs

2020-05-20 Thread Thomas Bennett
Hi Khodayar,

Setting placement policies is probably not what you're looking for.

I've used placement policies successfully to separate an HDD pool from an
SSD pool. However, this policy only applies to new data if it is set. You
would have to read it out and write it back in at the s3 level using your
new policy.

Also, I think (but not sure) that policies are applied at the bucket level,
so you would need a second bucket.

Cheers,
Tom

On Tue, May 19, 2020 at 11:54 PM Khodayar Doustar 
wrote:

> Hi,
>
> I'm using Nautilus and I'm using the whole cluster mainly for a single
> bucket in RadosGW.
> There is a lot of data in this bucket (Petabyte scale) and I don't want to
> waste all of SSD on it.
> Is there anyway to automatically set some aging threshold for this data and
> e.g. move any data older than a month to HDD OSDs?
> Does anyone has experience with this:
> Pool Placement and Storage Classes:
> https://docs.ceph.com/docs/master/radosgw/placement/
>
> But something automatic would be much better for me in this case.
>
> Any help would be appreciated.
>
> Thanks a lot,
> Khodayar
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Thomas Bennett

Storage Engineer at SARAO
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adding new non-containerised hosts to current contanerised environment and moving away from containers forward

2019-11-14 Thread Thomas Bennett
Hey Jeremi,

I'm not sure how ceph-ansible will handle a hybrid system. You'll need to
make sure that you have the same info in the ceph-ansible "fetch" directory
or it will create a separate cluster.

 I'm not sure if you can somehow force this without causing some issues.
I'm also not sure what other issues might arise.

Perhaps a way forward would be to start the osds by hand?

Something like:
sudo apt update
sudo apt install ceph

Get a copy of /etc/ceph/ceph.conf and
/var/lib/ceph/bootstrap-osd/ceph.keyring from another node.
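
For example (a hedged sketch; the source hostname is hypothetical):

scp existing-node:/etc/ceph/ceph.conf /etc/ceph/ceph.conf
scp existing-node:/var/lib/ceph/bootstrap-osd/ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring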

sudo ceph-volume lvm zap /dev/sd?
sudo ceph-volume lvm create --data /dev/sd?



On Mon, Nov 11, 2019 at 1:29 PM Jeremi Avenant  wrote:

> Good day
>
> We currently have 12 nodes in 4 Racks (3x4) and getting another 3 nodes to
> complete the 5th rack on Version 12.2.12, using ceph-ansible & docker
> containers.
>
> With the 3 new nodes (1 rack bucket) we would like to make use of a
> non-containerised setup since our long term plan is to completely move away
> from OSD containers.
>
> How would one go forward running a hybrid environment in the interim until
> we've rebuilt all our existing osd nodes?
>
> I can assume that i would require a separate /ceph-ansible directory with
> site.yml to my existing site-docker.yml
>
> Would I require separate inventory files, one for the existing 12 nodes &
> one for the 3 new nodes? How would I connect the new nodes to the current
> controller (mds,mgr, mon) container nodes?
>
> Regards
>
>
> --
>
>
>
>
> *Jeremi-Ernst Avenant, Mr.*Cloud Infrastructure Specialist
> Inter-University Institute for Data Intensive Astronomy
> 5th Floor, Department of Physics and Astronomy,
> University of Cape Town
>
> Tel: 021 959 4137 <0219592327>
> Web: www.idia.ac.za <http://www.uwc.ac.za/>
> E-mail (IDIA): jer...@idia.ac.za 
> Rondebosch, Cape Town, 7600, South Africa
> ___________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Thomas Bennett

Storage Engineer at SARAO
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lower mem radosgw config?

2019-10-30 Thread Thomas Bennett
Hey Dan,

We've got three rgws with the following configuration:

   - We're running 12.2.12 with civetweb.
   - 3 RGW's with haproxy round robin
   - 32 GiB RAM (handles = 4, thread pool = 512)
   - We run mon+mgr+rgw on the same hardware.
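
In ceph.conf terms the settings above correspond roughly to the fragment below
(a sketch for that Luminous-era setup; the section name is illustrative, and
"handles" is assumed to mean rgw_num_rados_handles):

[client.rgw.gateway1]
rgw frontends = civetweb port=7480 num_threads=512
rgw thread pool size = 512
rgw num rados handles = 4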

Looking at our grafana dashboards, I don't see us running out of free
memory; they hover around 5 GB free (but sometimes lower).

From my quick tests (a fair while ago) I found that making more handles
eventually led to OOMs.

Currently, we've been getting some "connection reset by peer" (errno
104)...  from client reads, which I suspect is due to the low handles
setting.

We plan to move the rgws onto 125 GiB beefier machines.

Cheers,
Tom

On Mon, Oct 28, 2019 at 6:14 PM Dan van der Ster  wrote:

> Hi all,
>
> Does anyone have a good config for lower memory radosgw machines?
> We have 16GB VMs and our radosgw's go OOM when we have lots of
> parallel clients (e.g. I see around 500 objecter_ops via the rgw
> asok).
>
> Maybe lowering rgw_thread_pool_size from 512 would help?
>
> (This is running latest luminous).
>
> Thanks, Dan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Thomas Bennett

Storage Engineer at SARAO
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io