[ceph-users] mimic: much more raw used than reported

2020-07-26 Thread Frank Schilder
Dear fellow cephers,

I observe a weird problem on our mimic-13.2.8 cluster. We have an EC RBD pool 
backed by HDDs; these disks are not in any other pool. I noticed that the total 
capacity (= USED + MAX AVAIL) reported by "ceph df detail" has recently shrunk 
from 300 TiB to 200 TiB. Part, but by no means all, of this can be explained by 
imbalance of the data distribution.

When I compare the output of "ceph df detail" and "ceph osd df tree", I find 
69 TiB of raw capacity that is used but not accounted for; see the calculations 
below. These 69 TiB raw are equivalent to 20% of usable capacity, and I really 
need them back. Together with the imbalance, we lose about 30% of capacity.

What is using these extra 69TiB and how can I get it back?


Some findings:

These are the 5 largest images in the pool, accounting for a total of 97TiB out 
of 119TiB usage:

# rbd du :
NAME     PROVISIONED  USED 
one-133  25 TiB 14 TiB 
NAME         PROVISIONED  USED 
one-153@222  40 TiB  14 TiB 
one-153@228  40 TiB 357 GiB 
one-153@235  40 TiB 797 GiB 
one-153@241  40 TiB 509 GiB 
one-153@242  40 TiB  43 GiB 
one-153@243  40 TiB  16 MiB 
one-153@244  40 TiB  16 MiB 
one-153@245  40 TiB 324 MiB 
one-153@246  40 TiB 276 MiB 
one-153@247  40 TiB  96 MiB 
one-153@248  40 TiB 138 GiB 
one-153@249  40 TiB 1.8 GiB 
one-153@250  40 TiB 0 B 
one-153  40 TiB 204 MiB 
  40 TiB  16 TiB 
NAME        PROVISIONED  USED 
one-391@3   40 TiB 432 MiB 
one-391@9   40 TiB  26 GiB 
one-391@15  40 TiB  90 GiB 
one-391@16  40 TiB 0 B 
one-391@17  40 TiB 0 B 
one-391@18  40 TiB 0 B 
one-391@19  40 TiB 0 B 
one-391@20  40 TiB 3.5 TiB 
one-391@21  40 TiB 5.4 TiB 
one-391@22  40 TiB 5.8 TiB 
one-391@23  40 TiB 8.4 TiB 
one-391@24  40 TiB 1.4 TiB 
one-391 40 TiB 2.2 TiB 
 40 TiB  27 TiB 
NAME        PROVISIONED  USED 
one-394@3   70 TiB 1.4 TiB 
one-394@9   70 TiB 2.5 TiB 
one-394@15  70 TiB  20 GiB 
one-394@16  70 TiB 0 B 
one-394@17  70 TiB 0 B 
one-394@18  70 TiB 0 B 
one-394@19  70 TiB 383 GiB 
one-394@20  70 TiB 3.3 TiB 
one-394@21  70 TiB 5.0 TiB 
one-394@22  70 TiB 5.0 TiB 
one-394@23  70 TiB 9.0 TiB 
one-394@24  70 TiB 1.6 TiB 
one-394 70 TiB 2.5 TiB 
 70 TiB  31 TiB 
NAME     PROVISIONED  USED 
one-434  25 TiB 9.1 TiB 

The large images one-391 (40 TiB) and one-394 (70 TiB) are currently being 
copied to at roughly 5 TiB per day.

Output of "ceph df detail" with some columns removed:

NAME                 ID  USED     %USED  MAX AVAIL  OBJECTS   RAW USED 
sr-rbd-data-one-hdd  11  119 TiB  58.45  84 TiB     31286554  158 TiB 

Pool is EC 6+2.
USED is correct: 31286554 * 4 MiB ≈ 119 TiB.
RAW USED is correct: 119 TiB * 8/6 ≈ 158 TiB (quick check below).
Most of this data is freshly copied onto large RBD images.
Compression is enabled on this pool (aggressive,snappy).
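
As a rough sanity check of the USED and RAW USED figures (4 MiB objects; EC 6+2 
inflates user data by a factor of 8/6):

echo '31286554 * 4 / 1024^2' | bc -l           # ~119.3 -> TiB of user data (USED)
echo '31286554 * 4 / 1024^2 * 8 / 6' | bc -l   # ~159   -> TiB raw, close to the reported 158 TiB (RAW USED)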

However, when looking at "ceph osd df tree", I find:

The combined raw capacity of OSDs backing this pool is 406.8TiB (sum over SIZE).
Summing up column USE over all OSDs gives 227.5TiB.

This gives a difference of 69TiB (=227-158) that is not accounted for.
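
For reference, a minimal sketch of how such sums can be pulled from the 
plain-text output, assuming the mimic column layout shown below and that all 
SIZE and USE values are printed in TiB (restrict the input to the OSDs backing 
the pool as needed):

ceph osd df tree | awk '$NF ~ /^osd\./ { size += $5; use += $7 } \
    END { printf "SIZE %.1f TiB  USE %.1f TiB\n", size, use }'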

Here is the output of "ceph osd df tree", limited to the drives backing the pool:

ID   CLASS  WEIGHT   REWEIGHT  SIZE     USE      DATA     OMAP     META    AVAIL    %USE   VAR   PGS  TYPE NAME
  84  hdd   8.90999  1.0       8.9 TiB  5.0 TiB  5.0 TiB  180 MiB  16 GiB  3.9 TiB  56.43  1.72  103  osd.84
 145  hdd   8.90999  1.0       8.9 TiB  4.6 TiB  4.6 TiB  144 MiB  14 GiB  4.3 TiB  51.37  1.57   87  osd.145
 156  hdd   8.90999  1.0       8.9 TiB  5.2 TiB  5.1 TiB  173 MiB  16 GiB  3.8 TiB  57.91  1.77  100  osd.156
 168  hdd   8.90999  1.0       8.9 TiB  5.0 TiB  5.0 TiB  164 MiB  16 GiB  3.9 TiB  56.31  1.72   98  osd.168
 181  hdd   8.90999  1.0       8.9 TiB  5.5 TiB  5.4 TiB  121 MiB  17 GiB  3.5 TiB  61.26  1.87  105  osd.181
  74  hdd   8.90999  1.0       8.9 TiB  4.2 TiB  4.2 TiB  148 MiB  13 GiB  4.7 TiB  46.79  1.43   85  osd.74
 144  hdd   8.90999  1.0       8.9 TiB  4.7 TiB  4.7 TiB  106 MiB  15 GiB  4.2 TiB  53.17  1.62   94  osd.144
 157  hdd   8.90999  1.0       8.9 TiB  5.8 TiB  5.8 TiB  192 MiB  18 GiB  3.1 TiB  65.02  1.99  111  osd.157
 169  hdd   8.90999  1.0       8.9 TiB  5.1 TiB  5.1 TiB  172 MiB  16 GiB  3.8 TiB  56.99  1.74  102  osd.169
 180  hdd   8.90999  1.0       8.9 TiB  5.8 TiB  5.8 TiB  131 MiB  18 GiB  3.1 TiB  65.04  1.99  111  osd.180
  60  hdd   8.90999  1.0       8.9 TiB  4.5 TiB  4.5 TiB  155 MiB  14 GiB  4.4 TiB  50.40  1.54   93  osd.60
 146  hdd   8.90999  1.0       8.9 TiB  4.8 TiB  4.8 TiB  139 MiB  15 GiB  4.1 TiB  53.70  1.64   92  osd.146
 158  hdd   8.90999  1.0       8.9 TiB  5.6 TiB  5.5 TiB 

[ceph-users] Re: Ceph-deploy on rhel.

2020-07-26 Thread Zhenshi Zhou
The user provided to the dashboard must be created with '--system' in
radosgw-admin, or it will not work.
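
A minimal sketch of what that looks like on Nautilus (the user id and display
name below are only placeholders; the printed keys are then handed to the
dashboard):

radosgw-admin user create --uid=dashboard --display-name="Ceph Dashboard" --system
ceph dashboard set-rgw-api-access-key <access_key_of_that_user>
ceph dashboard set-rgw-api-secret-key <secret_key_of_that_user>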


sathvik vutukuri <7vik.sath...@gmail.com> wrote on Sunday, July 26, 2020 at 9:54 AM:

> I have enabled it using the same doc, but some how it's not working.
>
> On Sun, 26 Jul 2020, 06:55 Oliver Freyermuth, <
> freyerm...@physik.uni-bonn.de>
> wrote:
>
> > Hey Sathvik,
> >
> > Am 26.07.20 um 03:18 schrieb sathvik vutukuri:
> > > Hey Oliver,
> > >
> > > I have installed the nautilus version on Centos. It was installed
> > properly  and I have created s3 buckets.
> > > But when accessing from S3 SDk code or object gateway dashboard I am
> > facing this issue in ceph dashboard.
> > >
> > > *"RGW REST API failed request with status code 403
> > '{"Code":"InvalidAccessKeyId","RequestId":"*
> > > *
> > > *
> > > _Am I missing something regarding  object gateway enablement in the
> Ceph
> > dashboard?_
> >
> > good to hear, so the ceph-deploy issue (whatever it was) seems solved
> :-).
> >
> > It seems like you might have missed this step:
> >
> >
> https://docs.ceph.com/docs/nautilus/mgr/dashboard/#enabling-the-object-gateway-management-frontend
> > which is necessary to let the dashboard manage the Object Gateways.
> >
> > Cheers,
> > Oliver
> >
> > > *
> > > *
> > >
> > > *
> > > *
> > >
> > > On Sun, Jul 26, 2020 at 6:08 AM Oliver Freyermuth <
> > freyerm...@physik.uni-bonn.de >
> > wrote:
> > >
> > > Hey Sathvik,
> > >
> > > Am 26.07.20 um 02:22 schrieb sathvik vutukuri:
> > > > Hey oliver,
> > > >
> > > > When I tried to do in RHEL ceph-deploy is trying to get rpm's
> from
> > /nautilus/rhel7/noarch which is not available. Available for jewel
> version.
> > >
> > > I sadly still don't understand the exact issue you have.
> > > From your last mail, I thought your problem was not finding the
> > ceph-deploy RPMs fro EL 7.
> > >
> > > From this mail, it seems you have an issue installing ceph-radosgw?
> > >
> > > Last time I did that with ceph-deploy on CentOS 7, I used nautilus,
> > and it worked perfectly fine. As you can see, it installs the package
> > "ceph-radosgw":
> > >
> >
> https://github.com/ceph/ceph-deploy/blob/42f2b376542fde2d412505271fadbd22d73e5ea4/ceph_deploy/install.py#L56
> > > which for nautilus comes from here:
> > >  https://download.ceph.com/rpm-nautilus/el7/x86_64/
> > > and even with jewel, ceph-radosgw never was in noarch, but found
> > here:
> > >  https://download.ceph.com/rpm-jewel/el7/x86_64/
> > >
> > > Can you maybe share which command you tried and which error you
> got?
> > >
> > > Using my crystal ball, my best guess would be you've manually set
> up
> > the noarch repository only,
> > > instead of using ceph-deploy to set up both noarch and x86_64 repos
> > (or doing that manually).
> > >
> > > Cheers,
> > > Oliver
> > >
> > > >
> > > >
> > > > On Sat, 25 Jul 2020, 21:36 Oliver Freyermuth, <
> > freyerm...@physik.uni-bonn.de 
> >  freyerm...@physik.uni-bonn.de>>> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Am 22.07.20 um 12:13 schrieb sathvik vutukuri:
> > > > > Hi,
> > > > >
> > > > > Did any one installed ceph-deploy on rhel7 with rados gate
> > way.
> > > > >
> > > > > I see there are no rpms available for rhel7 in ceph-deploy
> in
> > > > > download.ceph.com  <
> > http://download.ceph.com> for nautilis , luminous, octopus versions.
> > > >
> > > > where exactly did you look?
> > > >
> > > > I find the ceph-deploy RPMs just fine here (for example):
> > > >  https://download.ceph.com/rpm-nautilus/el7/noarch/
> > > > They are also still there in other noarch directories and yum
> > finds them well.
> > > >
> > > > It's missing for rhel 8, though.
> > > >
> > > > Cheers,
> > > > Oliver
> > > >
> > > > >
> > > > > Is ceph-deploy go to for rhel7???
> > > > > ___
> > > > > ceph-users mailing list -- ceph-users@ceph.io  > ceph-users@ceph.io>  > >>
> > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >   > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Vutukuri Sathvik,
> > > 8197748291.
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] large omap objects

2020-07-26 Thread Zhenshi Zhou
Hi all,

I have a cluster providing object storage.

The cluster worked well until someone started saving Flink checkpoints in
the 'flink' bucket. I checked its behaviour and found that Flink frequently
writes the current checkpoint data and deletes the former checkpoints. I
suppose this makes the bucket index grow a large omap object. I have been
getting the '1 large omap objects' warning these days, and after checking
the cluster status and logs, all of these large omap warnings point to
exactly the same index object. The warning message:

*cluster [WRN] Large omap object found. Object:
17:2f908b17:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.166289.1:head PG:
17.e8d109f4 (17.74) Key count: 568681 Size (bytes): 149443581*
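
For reference, the reported key count can be cross-checked directly on that
index object; a sketch, assuming the index object lives in the usual RGW index
pool (the pool name below is a guess):

rados -p default.rgw.buckets.index listomapkeys .dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.166289.1 | wc -l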

I did a 'bilog trim' and a 'pg deep-scrub' and the cluster became healthy
again. However, I cannot keep doing this all the time. Is there a way to solve
this issue permanently?
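
For reference, roughly what the two steps above look like (bucket name and PG
id taken from this thread; exact flags may vary by release):

radosgw-admin bilog trim --bucket=flink    # trim the index log of the 'flink' bucket
ceph pg deep-scrub 17.74                   # re-scrub the PG so the omap key count is re-evaluated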

Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph-deploy on rhel.

2020-07-26 Thread sathvik vutukuri
I have done the same, but this is the issue in the dashboard:


Information
key system is not in dict {u'attrs': [], u'display_name': u'User for
Connector', u'default_storage_class': u'', u'keys': [{u'access_key':
u'Q2RQU16YETCDGQ0C483Q', u'secret_key':
u'KzNOfUMDyQCodXbYM0mwQVBpOsgRxbgGT8HpugL6', u'user': u'connector'},
{u'access_key': u'S1IXI38UQJK435INX6W7', u'secret_key':
u'4E1UBv5B2BM7pawtqHcNHI0wNTLPbdqCLyyfoQPF', u'user': u'connector'}],
u'default_placement': u'', u'mfa_ids': [], u'temp_url_keys': [], u'caps':
[], u'max_buckets': 1000, u'swift_keys': [], u'user_quota':
{u'max_objects': -1, u'enabled': False, u'max_size_kb': 0, u'max_size': -1,
u'check_on_raw': False}, u'placement_tags': [], u'suspended': 0,
u'op_mask': u'read, write, delete', u'user_id': u'connector', u'type':
u'rgw', u'email': u'', u'subusers': [], u'bucket_quota': {u'max_objects':
-1, u'enabled': False, u'max_size_kb': 0, u'max_size': -1, u'check_on_raw':
False}}

Did I miss something?






On Mon, Jul 27, 2020 at 7:38 AM Zhenshi Zhou  wrote:

> The user provided to the dashboard must be created with '--system' with
> radosgw-admin, or it's not working.
>
>
> sathvik vutukuri <7vik.sath...@gmail.com> wrote on Sunday, July 26, 2020 at 9:54 AM:
>
>> I have enabled it using the same doc, but some how it's not working.
>>
>> On Sun, 26 Jul 2020, 06:55 Oliver Freyermuth, <
>> freyerm...@physik.uni-bonn.de>
>> wrote:
>>
>> > Hey Sathvik,
>> >
>> > Am 26.07.20 um 03:18 schrieb sathvik vutukuri:
>> > > Hey Oliver,
>> > >
>> > > I have installed the nautilus version on Centos. It was installed
>> > properly  and I have created s3 buckets.
>> > > But when accessing from S3 SDk code or object gateway dashboard I am
>> > facing this issue in ceph dashboard.
>> > >
>> > > *"RGW REST API failed request with status code 403
>> > '{"Code":"InvalidAccessKeyId","RequestId":"*
>> > > *
>> > > *
>> > > _Am I missing something regarding  object gateway enablement in the
>> Ceph
>> > dashboard?_
>> >
>> > good to hear, so the ceph-deploy issue (whatever it was) seems solved
>> :-).
>> >
>> > It seems like you might have missed this step:
>> >
>> >
>> https://docs.ceph.com/docs/nautilus/mgr/dashboard/#enabling-the-object-gateway-management-frontend
>> > which is necessary to let the dashboard manage the Object Gateways.
>> >
>> > Cheers,
>> > Oliver
>> >
>> > > *
>> > > *
>> > >
>> > > *
>> > > *
>> > >
>> > > On Sun, Jul 26, 2020 at 6:08 AM Oliver Freyermuth <
>> > freyerm...@physik.uni-bonn.de >
>> > wrote:
>> > >
>> > > Hey Sathvik,
>> > >
>> > > Am 26.07.20 um 02:22 schrieb sathvik vutukuri:
>> > > > Hey oliver,
>> > > >
>> > > > When I tried to do in RHEL ceph-deploy is trying to get rpm's
>> from
>> > /nautilus/rhel7/noarch which is not available. Available for jewel
>> version.
>> > >
>> > > I sadly still don't understand the exact issue you have.
>> > > From your last mail, I thought your problem was not finding the
>> > ceph-deploy RPMs fro EL 7.
>> > >
>> > > From this mail, it seems you have an issue installing
>> ceph-radosgw?
>> > >
>> > > Last time I did that with ceph-deploy on CentOS 7, I used
>> nautilus,
>> > and it worked perfectly fine. As you can see, it installs the package
>> > "ceph-radosgw":
>> > >
>> >
>> https://github.com/ceph/ceph-deploy/blob/42f2b376542fde2d412505271fadbd22d73e5ea4/ceph_deploy/install.py#L56
>> > > which for nautilus comes from here:
>> > >  https://download.ceph.com/rpm-nautilus/el7/x86_64/
>> > > and even with jewel, ceph-radosgw never was in noarch, but found
>> > here:
>> > >  https://download.ceph.com/rpm-jewel/el7/x86_64/
>> > >
>> > > Can you maybe share which command you tried and which error you
>> got?
>> > >
>> > > Using my crystal ball, my best guess would be you've manually set
>> up
>> > the noarch repository only,
>> > > instead of using ceph-deploy to set up both noarch and x86_64
>> repos
>> > (or doing that manually).
>> > >
>> > > Cheers,
>> > > Oliver
>> > >
>> > > >
>> > > >
>> > > > On Sat, 25 Jul 2020, 21:36 Oliver Freyermuth, <
>> > freyerm...@physik.uni-bonn.de 
>> >  > freyerm...@physik.uni-bonn.de>>> wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > Am 22.07.20 um 12:13 schrieb sathvik vutukuri:
>> > > > > Hi,
>> > > > >
>> > > > > Did any one installed ceph-deploy on rhel7 with rados gate
>> > way.
>> > > > >
>> > > > > I see there are no rpms available for rhel7 in
>> ceph-deploy in
>> > > > > download.ceph.com  <
>> > http://download.ceph.com> for nautilis , luminous, octopus versions.
>> > > >
>> > > > where exactly did you look?
>> > > >
>> > > > I find the ceph-deploy RPMs just fine here (for example):
>> > > >  https://download.ceph.com/rpm-nautilus/el7/

[ceph-users] Re: Ceph-deploy on rhel.

2020-07-26 Thread Zhenshi Zhou
Well, it didn't work at first, and I found that I had created the user without
'--system'. After I modified the user with '--system', the dashboard connected
to the RGW. I'm not sure whether I did anything else beyond what is in the
docs.
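
A sketch of that change, using the 'connector' user id from the earlier
message as an example:

radosgw-admin user modify --uid=connector --system   # flag the existing user as a system user

The access/secret keys configured in the dashboard then have to belong to that
(now system) user.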

sathvik vutukuri <7vik.sath...@gmail.com> wrote on Monday, July 27, 2020 at 11:36 AM:

> I have done the same, but this is the  issue in the dashboard..
>
>
> Information
> key system is not in dict {u'attrs': [], u'display_name': u'User for
> Connector', u'default_storage_class': u'', u'keys': [{u'access_key':
> u'Q2RQU16YETCDGQ0C483Q', u'secret_key':
> u'KzNOfUMDyQCodXbYM0mwQVBpOsgRxbgGT8HpugL6', u'user': u'connector'},
> {u'access_key': u'S1IXI38UQJK435INX6W7', u'secret_key':
> u'4E1UBv5B2BM7pawtqHcNHI0wNTLPbdqCLyyfoQPF', u'user': u'connector'}],
> u'default_placement': u'', u'mfa_ids': [], u'temp_url_keys': [], u'caps':
> [], u'max_buckets': 1000, u'swift_keys': [], u'user_quota':
> {u'max_objects': -1, u'enabled': False, u'max_size_kb': 0, u'max_size': -1,
> u'check_on_raw': False}, u'placement_tags': [], u'suspended': 0,
> u'op_mask': u'read, write, delete', u'user_id': u'connector', u'type':
> u'rgw', u'email': u'', u'subusers': [], u'bucket_quota': {u'max_objects':
> -1, u'enabled': False, u'max_size_kb': 0, u'max_size': -1, u'check_on_raw':
> False}}
>
> Did I miss something?
>
>
>
>
>
>
> On Mon, Jul 27, 2020 at 7:38 AM Zhenshi Zhou  wrote:
>
>> The user provided to the dashboard must be created with '--system' with
>> radosgw-admin, or it's not working.
>>
>>
>> sathvik vutukuri <7vik.sath...@gmail.com> wrote on Sunday, July 26, 2020 at 9:54 AM:
>>
>>> I have enabled it using the same doc, but some how it's not working.
>>>
>>> On Sun, 26 Jul 2020, 06:55 Oliver Freyermuth, <
>>> freyerm...@physik.uni-bonn.de>
>>> wrote:
>>>
>>> > Hey Sathvik,
>>> >
>>> > Am 26.07.20 um 03:18 schrieb sathvik vutukuri:
>>> > > Hey Oliver,
>>> > >
>>> > > I have installed the nautilus version on Centos. It was installed
>>> > properly  and I have created s3 buckets.
>>> > > But when accessing from S3 SDk code or object gateway dashboard I am
>>> > facing this issue in ceph dashboard.
>>> > >
>>> > > *"RGW REST API failed request with status code 403
>>> > '{"Code":"InvalidAccessKeyId","RequestId":"*
>>> > > *
>>> > > *
>>> > > _Am I missing something regarding  object gateway enablement in the
>>> Ceph
>>> > dashboard?_
>>> >
>>> > good to hear, so the ceph-deploy issue (whatever it was) seems solved
>>> :-).
>>> >
>>> > It seems like you might have missed this step:
>>> >
>>> >
>>> https://docs.ceph.com/docs/nautilus/mgr/dashboard/#enabling-the-object-gateway-management-frontend
>>> > which is necessary to let the dashboard manage the Object Gateways.
>>> >
>>> > Cheers,
>>> > Oliver
>>> >
>>> > > *
>>> > > *
>>> > >
>>> > > *
>>> > > *
>>> > >
>>> > > On Sun, Jul 26, 2020 at 6:08 AM Oliver Freyermuth <
>>> > freyerm...@physik.uni-bonn.de >
>>> > wrote:
>>> > >
>>> > > Hey Sathvik,
>>> > >
>>> > > Am 26.07.20 um 02:22 schrieb sathvik vutukuri:
>>> > > > Hey oliver,
>>> > > >
>>> > > > When I tried to do in RHEL ceph-deploy is trying to get rpm's
>>> from
>>> > /nautilus/rhel7/noarch which is not available. Available for jewel
>>> version.
>>> > >
>>> > > I sadly still don't understand the exact issue you have.
>>> > > From your last mail, I thought your problem was not finding the
>>> > ceph-deploy RPMs fro EL 7.
>>> > >
>>> > > From this mail, it seems you have an issue installing
>>> ceph-radosgw?
>>> > >
>>> > > Last time I did that with ceph-deploy on CentOS 7, I used
>>> nautilus,
>>> > and it worked perfectly fine. As you can see, it installs the package
>>> > "ceph-radosgw":
>>> > >
>>> >
>>> https://github.com/ceph/ceph-deploy/blob/42f2b376542fde2d412505271fadbd22d73e5ea4/ceph_deploy/install.py#L56
>>> > > which for nautilus comes from here:
>>> > >  https://download.ceph.com/rpm-nautilus/el7/x86_64/
>>> > > and even with jewel, ceph-radosgw never was in noarch, but found
>>> > here:
>>> > >  https://download.ceph.com/rpm-jewel/el7/x86_64/
>>> > >
>>> > > Can you maybe share which command you tried and which error you
>>> got?
>>> > >
>>> > > Using my crystal ball, my best guess would be you've manually
>>> set up
>>> > the noarch repository only,
>>> > > instead of using ceph-deploy to set up both noarch and x86_64
>>> repos
>>> > (or doing that manually).
>>> > >
>>> > > Cheers,
>>> > > Oliver
>>> > >
>>> > > >
>>> > > >
>>> > > > On Sat, 25 Jul 2020, 21:36 Oliver Freyermuth, <
>>> > freyerm...@physik.uni-bonn.de 
>>> > > > freyerm...@physik.uni-bonn.de>>> wrote:
>>> > > >
>>> > > > Hi,
>>> > > >
>>> > > > Am 22.07.20 um 12:13 schrieb sathvik vutukuri:
>>> > > > > Hi,
>>> > > > >
>>> > > > > Did any one installed ceph-deploy on rhel7 with rados
>>> gate
>>>