[ceph-users] Looking up buckets in multi-site radosgw configuration

2019-03-18 Thread David Coles
I'm looking at setting up a multi-site radosgw configuration where
data is sharded over multiple clusters in a single physical location;
and would like to understand how Ceph handles requests in this
configuration.

Looking through the radosgw source[1], it looks like radosgw will
return a 301 redirect if I request a bucket that is not in the current
zonegroup. This redirect appears to be to the endpoint for the
zonegroup (I assume as configured by `radosgw-admin zonegroup create
--endpoints`). This seems like it would work well for multiple
geographic regions (e.g. us-east and us-west) for ensuring that a
request is redirected to the region (zonegroup) that hosts the bucket.
We could possibly improve this by using virtual-hosted-style buckets and having
DNS point to the correct region for each bucket.
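
(For concreteness, I'm assuming the endpoints get configured roughly like
this, where the zonegroup names and URLs are just placeholders:

  radosgw-admin zonegroup create --rgw-zonegroup=us-east \
      --endpoints=https://rgw.us-east.example.com:8080 --master --default

and similarly for us-west, so the 301 would send clients to that endpoint.)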

I notice that it's also possible to configure zones in a zonegroup
that don't perform replication[2] (e.g. us-east-1 and us-east-2). In
this case I assume that if I direct a request to the wrong zone, then
Ceph will just report the object as not found because, despite
the bucket metadata being replicated from the zonegroup master, the
objects will never be replicated from one zone to the other. Another
layer (like a consistent hash across the bucket name or database)
would be required for routing to the correct zone.
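
(Purely as an illustration of the kind of routing layer I mean, not anything
Ceph provides itself: a front-end could pick the zone from a naive hash of
the bucket name, e.g.

  bucket="my-bucket"
  zone_count=2
  idx=$(( 0x$(printf '%s' "$bucket" | md5sum | cut -c1-8) % zone_count ))
  echo "route $bucket to us-east-$((idx + 1))"

as long as the mapping stays consistent with wherever the bucket was created.)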

Is this mostly correct? Are there other ways of controlling which
cluster the data is placed on (i.e. placement groups)?

Thanks!

1. 
https://github.com/ceph/ceph/blob/affb7d396f76273e885cfdbcd363c1882496726c/src/rgw/rgw_op.cc#L653-L669
2. 
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_red_hat_enterprise_linux/multi_site#configuring_multiple_zones_without_replication
-- 
David Coles
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs error

2019-03-18 Thread huang jun
Marc Roos  wrote on Mon, Mar 18, 2019 at 5:46 AM:
>
>
>
>
> 2019-03-17 21:59:58.296394 7f97cbbe6700  0 --
> 192.168.10.203:6800/1614422834 >> 192.168.10.43:0/1827964483
> conn(0x55ba9614d000 :6800 s=STATE_OPEN pgs=8 cs=1 l=0).fault server,
> going to standby
>
> What does this mean?
That means the connection has been idle for some time, so the server goes
into standby state until new packets arrive.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Gregory Farnum
On Mon, Mar 18, 2019 at 7:28 PM Yan, Zheng  wrote:
>
> On Mon, Mar 18, 2019 at 9:50 PM Dylan McCulloch  wrote:
> >
> >
> > >please run following command. It will show where is 4.
> > >
> > >rados -p -p hpcfs_metadata getxattr 4. parent >/tmp/parent
> > >ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
> > >
> >
> > $ ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
> > {
> > "ino": 4,
> > "ancestors": [
> > {
> > "dirino": 1,
> > "dname": "lost+found",
> > "version": 1
> > }
> > ],
> > "pool": 20,
> > "old_pools": []
> > }
> >
> > I guess it may have a very large number of files from previous recovery 
> > operations?
> >
>
> Yes, these files are created by cephfs-data-scan. If you don't want
> them, you can delete "lost+found"

This certainly makes sense, but even with that pointer I can't find
how it's picking inode 4. That should probably be documented? :)
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Full L3 Ceph

2019-03-18 Thread Lazuardi Nasution
Hi Stefan,

I think I missed your reply. I'm interested to know how you manage
performance when running Ceph over a host-based VXLAN overlay. Maybe you can
share the comparison for a better understanding of the possible performance
impact.

Best regards,


> Date: Sun, 25 Nov 2018 21:17:34 +0100
> From: Stefan Kooman 
> To: "Robin H. Johnson" 
> Cc: Ceph Users 
> Subject: Re: [ceph-users] Full L3 Ceph
> Message-ID: <20181125201734.gc17...@shell.dmz.bit.nl>
> Content-Type: text/plain; charset="us-ascii"
>
> Quoting Robin H. Johnson (robb...@gentoo.org):
> > On Fri, Nov 23, 2018 at 04:03:25AM +0700, Lazuardi Nasution wrote:
> > > I'm looking example Ceph configuration and topology on full layer 3
> > > networking deployment. Maybe all daemons can use loopback alias
> address in
> > > this case. But how to set cluster network and public network
> configuration,
> > > using supernet? I think using loopback alias address can prevent the
> > > daemons down due to physical interfaces disconnection and can load
> balance
> > > traffic between physical interfaces without interfaces bonding, but
> with
> > > ECMP.
> > I can say I've done something similar**, but I don't have access to that
> > environment or most*** of the configuration anymore.
> >
> > One of the parts I do recall, was explicitly setting cluster_network
> > and public_network to empty strings, AND using public_addr+cluster_addr
> > instead, with routable addressing on dummy interfaces (NOT loopback).
>
> You can do this with MP-BGP (VXLAN) EVPN. We are running it like that.
> IPv6 overlay network only. ECMP to make use of all the links. We don't
> use a separate cluster network. That only complicates things, and
> there's no real use for it (trademark by Wido den Hollander). If you
> want to use BGP on the hosts themselves have a look at this post by
> Vincent Bernat (great writeups of complex networking stuff) [1]. You can
> use "MC-LAG" on the host to get redundant connectivity, or use "Type 4"
> EVPN to get endpoint redundancy (Ethernet Segment Route). FRR 6.0 has
> support for most of this (not yet "Type 4" EVPN support IIRC) [2].
>
> We use a network namespace to separate (IPv6) management traffic
> from production traffic. This complicates Ceph deployment a lot, but in
> the end it's worth it.
>
> Gr. Stefan
>
> [1]: https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn
> [2]: https://frrouting.org/
>
>
> --
> | BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados Gateway using S3 Api does not store file correctly

2019-03-18 Thread Dan Smith
Casey,

I am not sure if this is related, but I cannot seem to retrieve files that
are 524288001 bytes (500MB + 1 byte) or 629145601 bytes (600MB + 1 byte) when
using server side encryption. Without encryption, these files store and
retrieve without issue. I'm sure there are various other permutations of
lengths that would result in this issue as well. I still find it puzzling
that after ingesting 167 million files in this manner, I am discovering
this issue.
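
(For anyone trying to reproduce this, the test is essentially: create a file
of exactly one of those sizes, upload it with the SSE-C key, download it
again, and compare hashes, e.g.

  truncate -s 524288001 test.bin
  # upload with SSE-C, download as test-downloaded.bin
  sha256sum test.bin test-downloaded.bin

with the file and bucket names obviously being placeholders.)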

-Dan

On Mon, Mar 18, 2019 at 12:38 PM Dan Smith 
wrote:

> Hi Casey,
>
> Thanks for the quick response. I have just confirmed that if I set the
> PartSize to 500MB the file uploads correctly. I am hesitant to do this in
> production but I think we are on the right track. Interestingly enough,
> when I set the PartSize to 5242880 the file did not store correctly (it
> retrieved with a different hash, and not even the same different hash I saw
> before).
>
> Here are my details of my client. Please forgive me for not seeing how to
> attach this to the tracker item you mentioned in your email.
>
> I am using the AWSSDK.S3 nuget package version 3.3.31.24, with ceph
> version "ceph version 12.2.10-551-gbb089269ea
> (bb089269ea0c1272294c6b9777123ac81662b6d2) luminous (stable)"
>
> My relevant c# code is as follows:
>
> using (var client = new AmazonS3Client(AccessKey, SecretKey,
> GetS3Config()))
> using (var transferUtility = new
> Amazon.S3.Transfer.TransferUtility(client))
> {
> var transferRequest = new
> Amazon.S3.Transfer.TransferUtilityUploadRequest
> {
> BucketName = bucketName,
> Key = key,
> InputStream = stream,
> ServerSideEncryptionCustomerProvidedKey =
> EncryptionKey,
> ServerSideEncryptionCustomerMethod =
> string.IsNullOrWhiteSpace(EncryptionKey) ?
> ServerSideEncryptionCustomerMethod.None :
> ServerSideEncryptionCustomerMethod.AES256
> };
> transferUtility.Upload(transferRequest);
> }
>
> I am having hard time locating the default PartSize this library uses, but
> I think it is 5MB as the content-length header is a bit more than that
> size. Perhaps this sanitized header is useful:
>
> PUT
> [redacted]/delete-me?partNumber=18&uploadId=2~2dt4pYGY3vfKxBb9FcbVlAnbz_z3HTV
> HTTP/1.1
> Expect: 100-continue
> x-amz-server-side-encryption-customer-algorithm: AES256
> x-amz-server-side-encryption-customer-key: [redacted]
> x-amz-server-side-encryption-customer-key-MD5: [redacted]
> User-Agent: aws-sdk-dotnet-coreclr/3.3.31.24 aws-sdk-dotnet-core/3.3.32.2
> .NET_Core/4.6.27317.07 OS/Microsoft_Windows_10.0.17763 ClientAsync
> TransferManager/MultipartUploadCommand
> Host: [redacted]
> X-Amz-Date: [redacted]
> X-Amz-Decoded-Content-Length: 5242880
> X-Amz-Content-SHA256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD
> Authorization: [redacted]
> Content-Length: 5248726
> Content-Type: text/plain
>
> 14000;chunk-signature=[redacted]
> [payload here]
>
> Thank you again for help on this matter. I'll point our vendor towards
> this bug as well.
>
> Cheers,
> Dan
>
> On Mon, Mar 18, 2019 at 12:05 PM Casey Bodley  wrote:
>
>> Hi Dan,
>>
>> We just got a similar report about SSE-C in
>> http://tracker.ceph.com/issues/38700 that seems to be related to
>> multipart uploads. Could you please add some details there about your s3
>> client, its multipart chunk size, and your ceph version?
>>
>> On 3/18/19 2:38 PM, Dan Smith wrote:
>> > Hello,
>> >
>> > I have stored more than 167 million files in ceph using the S3 api.
>> > Out of those 167 million+ files, one file is not storing correctly.
>> >
>> > The file is 92MB in size. I have stored files much larger and much
>> > smaller. If I store the file WITHOUT using the Customer Provided
>> > 256-bit AES key using Server Side encryption, the file stores and
>> > retrieves just fine (SHA256 hashes match).
>> >
>> > If I store the file USING the 256-bit AES key using Server Side
>> > encryption, the file stores without error, however, when I retrieve
>> > the file and compare the hash of the file I retrieve from ceph against
>> > the hash of the original file, the hashes differ.
>> >
>> > If I store the file using Amazon S3, using the same AES key and their
>> > server side encryption, the file stores and retrieves without issue
>> > (hashes match).
>> >
>> > I can reproduce this issue in two different ceph environments.
>> > Thankfully, the file I am storing is not confidential, so I can share
>> > it out to anyone interested in this
>> > issue.(
>> https://s3.amazonaws.com/aws-website-afewgoodmenrankedfantasyfootballcom-j5gvt/delete-me
>> )
>> >
>> > I have opened a ticket with our vendor for support, but I am hoping
>> > someone might be able to give me some ideas on what might be going on
>> > as well.
>> >
>> > Cheers,
>> > Dan
>> >
>> > ___
>> > ceph-users mailing list
>> > 

Re: [ceph-users] Rados Gateway using S3 Api does not store file correctly

2019-03-18 Thread Dan Smith
Hi Casey,

Thanks for the quick response. I have just confirmed that if I set the
PartSize to 500MB the file uploads correctly. I am hesitant to do this in
production but I think we are on the right track. Interestingly enough,
when I set the PartSize to 5242880 the file did not store correctly (it
retrieved with a different hash, and not even the same different hash I saw
before).

Here are the details of my client. Please forgive me for not seeing how to
attach this to the tracker item you mentioned in your email.

I am using the AWSSDK.S3 nuget package version 3.3.31.24, with ceph version
"ceph version 12.2.10-551-gbb089269ea
(bb089269ea0c1272294c6b9777123ac81662b6d2) luminous (stable)"

My relevant c# code is as follows:

using (var client = new AmazonS3Client(AccessKey, SecretKey,
GetS3Config()))
using (var transferUtility = new
Amazon.S3.Transfer.TransferUtility(client))
{
var transferRequest = new
Amazon.S3.Transfer.TransferUtilityUploadRequest
{
BucketName = bucketName,
Key = key,
InputStream = stream,
ServerSideEncryptionCustomerProvidedKey = EncryptionKey,
ServerSideEncryptionCustomerMethod =
string.IsNullOrWhiteSpace(EncryptionKey) ?
ServerSideEncryptionCustomerMethod.None :
ServerSideEncryptionCustomerMethod.AES256
};
transferUtility.Upload(transferRequest);
}

I am having a hard time locating the default PartSize this library uses, but
I think it is 5MB as the content-length header is a bit more than that
size. Perhaps this sanitized header is useful:

PUT
[redacted]/delete-me?partNumber=18&uploadId=2~2dt4pYGY3vfKxBb9FcbVlAnbz_z3HTV
HTTP/1.1
Expect: 100-continue
x-amz-server-side-encryption-customer-algorithm: AES256
x-amz-server-side-encryption-customer-key: [redacted]
x-amz-server-side-encryption-customer-key-MD5: [redacted]
User-Agent: aws-sdk-dotnet-coreclr/3.3.31.24 aws-sdk-dotnet-core/3.3.32.2
.NET_Core/4.6.27317.07 OS/Microsoft_Windows_10.0.17763 ClientAsync
TransferManager/MultipartUploadCommand
Host: [redacted]
X-Amz-Date: [redacted]
X-Amz-Decoded-Content-Length: 5242880
X-Amz-Content-SHA256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD
Authorization: [redacted]
Content-Length: 5248726
Content-Type: text/plain

14000;chunk-signature=[redacted]
[payload here]

Thank you again for help on this matter. I'll point our vendor towards this
bug as well.

Cheers,
Dan

On Mon, Mar 18, 2019 at 12:05 PM Casey Bodley  wrote:

> Hi Dan,
>
> We just got a similar report about SSE-C in
> http://tracker.ceph.com/issues/38700 that seems to be related to
> multipart uploads. Could you please add some details there about your s3
> client, its multipart chunk size, and your ceph version?
>
> On 3/18/19 2:38 PM, Dan Smith wrote:
> > Hello,
> >
> > I have stored more than 167 million files in ceph using the S3 api.
> > Out of those 167 million+ files, one file is not storing correctly.
> >
> > The file is 92MB in size. I have stored files much larger and much
> > smaller. If I store the file WITHOUT using the Customer Provided
> > 256-bit AES key using Server Side encryption, the file stores and
> > retrieves just fine (SHA256 hashes match).
> >
> > If I store the file USING the 256-bit AES key using Server Side
> > encryption, the file stores without error, however, when I retrieve
> > the file and compare the hash of the file I retrieve from ceph against
> > the hash of the original file, the hashes differ.
> >
> > If I store the file using Amazon S3, using the same AES key and their
> > server side encryption, the file stores and retrieves without issue
> > (hashes match).
> >
> > I can reproduce this issue in two different ceph environments.
> > Thankfully, the file I am storing is not confidential, so I can share
> > it out to anyone interested in this
> > issue.(
> https://s3.amazonaws.com/aws-website-afewgoodmenrankedfantasyfootballcom-j5gvt/delete-me
> )
> >
> > I have opened a ticket with our vendor for support, but I am hoping
> > someone might be able to give me some ideas on what might be going on
> > as well.
> >
> > Cheers,
> > Dan
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados Gateway using S3 Api does not store file correctly

2019-03-18 Thread Casey Bodley

Hi Dan,

We just got a similar report about SSE-C in 
http://tracker.ceph.com/issues/38700 that seems to be related to 
multipart uploads. Could you please add some details there about your s3 
client, its multipart chunk size, and your ceph version?


On 3/18/19 2:38 PM, Dan Smith wrote:

Hello,

I have stored more than 167 million files in ceph using the S3 api. 
Out of those 167 million+ files, one file is not storing correctly.


The file is 92MB in size. I have stored files much larger and much 
smaller. If I store the file WITHOUT using the Customer Provided 
256-bit AES key using Server Side encryption, the file stores and 
retrieves just fine (SHA256 hashes match).


If I store the file USING the 256-bit AES key using Server Side 
encryption, the file stores without error, however, when I retrieve 
the file and compare the hash of the file I retrieve from ceph against 
the hash of the original file, the hashes differ.


If I store the file using Amazon S3, using the same AES key and their 
server side encryption, the file stores and retrieves without issue 
(hashes match).


I can reproduce this issue in two different ceph environments. 
Thankfully, the file I am storing is not confidential, so I can share 
it out to anyone interested in this 
issue.(https://s3.amazonaws.com/aws-website-afewgoodmenrankedfantasyfootballcom-j5gvt/delete-me)


I have opened a ticket with our vendor for support, but I am hoping 
someone might be able to give me some ideas on what might be going on 
as well.


Cheers,
Dan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rados Gateway using S3 Api does not store file correctly

2019-03-18 Thread Dan Smith
Hello,

I have stored more than 167 million files in ceph using the S3 api. Out of
those 167 million+ files, one file is not storing correctly.

The file is 92MB in size. I have stored files much larger and much smaller.
If I store the file WITHOUT using the Customer Provided 256-bit AES key
using Server Side encryption, the file stores and retrieves just fine
(SHA256 hashes match).

If I store the file USING the 256-bit AES key using Server Side encryption,
the file stores without error, however, when I retrieve the file and
compare the hash of the file I retrieve from ceph against the hash of the
original file, the hashes differ.

If I store the file using Amazon S3, using the same AES key and their
server side encryption, the file stores and retrieves without issue
(hashes match).

I can reproduce this issue in two different ceph environments. Thankfully,
the file I am storing is not confidential, so I can share it out to anyone
interested in this issue.(
https://s3.amazonaws.com/aws-website-afewgoodmenrankedfantasyfootballcom-j5gvt/delete-me
)

I have opened a ticket with our vendor for support, but I am hoping someone
might be able to give me some ideas on what might be going on as well.

Cheers,
Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd-target-api service fails to start with address family not supported

2019-03-18 Thread Wesley Dillingham
This worked perfectly, thanks.

From: Jason Dillaman 
Sent: Monday, March 18, 2019 9:19 AM
To: Wesley Dillingham
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] rbd-target-api service fails to start with address 
family not supported

Looks like you have the IPv6 stack disabled. You will need to override
the bind address from "[::]" to "0.0.0.0" via the "api_host" setting
[1] in "/etc/ceph/iscsi-gateway.cfg"

[1] 
https://github.com/ceph/ceph-iscsi/blob/master/ceph_iscsi_config/settings.py#L100

On Mon, Mar 18, 2019 at 11:09 AM Wesley Dillingham
 wrote:
>
> I am having some difficulties getting the iscsi gateway and api setup. 
> Working with a 12.2.8 cluster. And the gateways are Centos 7.6.1810 kernel 
> 3.10.0-957.5.1.el7.x86_64
>
> Using a previous version of ceph iscsi packages:
> ceph-iscsi-config-2.6-2.6.el7.noarch
> ceph-iscsi-tools-2.1-2.1.el7.noarch
> ceph-iscsi-cli-2.7-2.7.el7.noarch
>
> I encountered the error when attemping to create my rbd device:
>
> /disks> create rbd image=iscsi_poc_1 size=500g max_data_area_mb=8
> Failed : 500 INTERNAL SERVER ERROR
>
>
> This created the RBD but when attempting to list the iscsi disks in gwcli 
> nothing is displayed/registered. I consulted the mailing list etc and the 
> first suggestion was to ensure tcmu runner is running and it is. Also I am 
> running the latest tcmu-runner and libtcmu as acquired from shaman.
>
> Some further mailing reading suggested I run a more recent version of the 
> ceph iscsi packages and so I upgraded to the latest bulds from shaman:
> ceph-iscsi-config-2.6-80.g24deeb2.el7.noarch
> ceph-iscsi-cli-2.7-105.g4802654.el7.noarch
>
> This gives me the following errors from journalctl:
>
> ]:  * Running on http://[::]:5000/
> ]: Traceback (most recent call last):
> ]: File "/usr/bin/rbd-target-api", line 2053, in 
> ]: main()
> ]: File "/usr/bin/rbd-target-api", line 2002, in main
> ]: ssl_context=context)
> ]: File "/usr/lib/python2.
> ]: run_simple(host, port, self, **options)
> ]: File "/usr/lib/python2.
> ]: inner()
> ]: File "/usr/lib/python2.
> ]: passthrough_errors, ssl_context).serve_forever()
> ]: File "/usr/lib/python2.
> ]: passthrough_errors, ssl_context)
> ]: File "/usr/lib/python2.
> ]: HTTPServer.__init__(self, (host, int(port)), handler)
> ]: File "/usr/lib64/python2.
> ]: self.socket_type)
> ]: File "/usr/lib64/python2.
> ]: _sock = _realsocket(family, type, proto)
> socket.error: [Errno 97] Address family not supported by protocol
>
> my iscsi-gateway.conf is as follows:
>
> [config]
> cluster_name = ceph
> gateway_keyring = ceph.client.admin.keyring
> api_secure = false
> trusted_ip_list = 
> prometheus_exporter = false
> debug = true
>
> Firewalls are disabled and both nodes can speak to eachother on 5000
> the gw service is able to be started independently however the api service is 
> busted.
>
> I am hoping someone can advise on how to get past my most recent: 
> "socket.error: [Errno 97] Address family not supported by protocol" and 
> generally recommend the most stable current iteration of rbd iscsi to be used 
> with a luminous or later cluster.
>
> Thank you.
>
>
> Respectfully,
>
> Wes Dillingham
> wdilling...@godaddy.com
> Site Reliability Engineer IV - Platform Storage / Ceph
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebuild after upgrade

2019-03-18 Thread Brent Kennedy
Thanks for taking the time to answer!  It’s a really old cluster, so that
does make sense, thanks for confirming!

-Brent

-Original Message-
From: Hector Martin  
Sent: Monday, March 18, 2019 1:07 AM
To: Brent Kennedy ; 'Ceph Users'

Subject: Re: [ceph-users] Rebuild after upgrade

On 18/03/2019 13:24, Brent Kennedy wrote:
> I finally received approval to upgrade our old firefly(0.8.7) cluster 
> to Luminous.  I started the upgrade, upgrading to hammer(0.94.10), 
> then jewel(10.2.11), but after jewel, I ran the “ceph osd crush 
> tunables optimal” command, then “ceph –s” command showed 60% of the 
> objects were misplaced.  Now the cluster is just churning while it 
> does the recovery for that.
> 
> Is this something that happens when upgrading from firefly up?  I had 
> done a hammer upgrade to Jewel before, no rebalance occurred after 
> issuing that command.

Any time you change the CRUSH tunables, you can expect data movement. 
The exact impact can vary from nothing (if no changes were made or the
changes don't impact your actual pools/CRUSH rules) to a lot of data
movement. This is documented here:

http://docs.ceph.com/docs/master/rados/operations/crush-map/

In particular, you turned on CRUSH_TUNABLES5, which causes a large amount
of data movement:
http://docs.ceph.com/docs/master/rados/operations/crush-map/#jewel-crush-tun
ables5

Going from Firefly to Hammer has a much smaller impact (see the CRUSH_V4
section).
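
If you want to see exactly which profile/tunable values the cluster ended up
with after those commands, this should print them:

  ceph osd crush show-tunables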

--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] nautilus: dashboard configuration issue

2019-03-18 Thread Daniele Riccucci

Hello,
thank you for the fix, I'll be on the lookout for the next ceph/daemon 
container image.


Regards,
Daniele

On 18/03/19 12:59, Volker Theile wrote:

Hello Daniele,

your problem is tracked by https://tracker.ceph.com/issues/38528 and
fixed in the latest Ceph 14 builds. To work around the problem, simply
disable SSL for your specific manager.

$ ceph config set mgr mgr/dashboard//ssl false

See https://tracker.ceph.com/issues/38528#note-1.


Regards
Volker


Am 17.03.19 um 03:11 schrieb Daniele Riccucci:

Hello,
I have a small cluster deployed with ceph-ansible on containers.
Recently, without realizing that ceph_docker_image_tag was set to
latest by default, the cluster got upgraded to nautilus and I was
unable to roll back.
Everything seems to be running smoothly except for the dashboard.

$ ceph status

   cluster:
     id: d5c50302-0d8e-47cb-ab86-c15842372900
     health: HEALTH_ERR
     Module 'dashboard' has failed: IOError("Port 8443 not free
on '::'",)
[...]

I already have a service running on port 8443 and I don't need SSL so
I ran:

 ceph config set mgr mgr/dashboard/ssl false

and

 ceph config set mgr mgr/dashboard/server_port 

according to the docs, to change this behavior.
Running `ceph config dump` returns the following:

WHO MASK LEVEL    OPTION VALUE    RO
mgr  advanced mgr/dashboard/server_port  8088 *
mgr  advanced mgr/dashboard/ssl  false    *

By dumping the configuration I found that 2 keys were present:

$ ceph config-key dump
{
     [...]
     "config/mgr/mgr/dashboard/server_port": "8088",
     "config/mgr/mgr/dashboard/ssl": "false",
     "config/mgr/mgr/dashboard/username": "devster",
     "mgr/dashboard/key": "",
     "mgr/dashboard/crt": ""
}

which likely are the SSL certificates. I deleted them, disabled the
module and re-enabled it, however the following happened:

$ ceph status
   cluster:
     id: d5c50302-0d8e-47cb-ab86-c15842372900
     health: HEALTH_OK
[...]

$ ceph mgr module ls | jq .enabled_modules
[
   "dashboard",
   "iostat",
   "prometheus",
   "restful"
]

$ ceph mgr services
{
   "prometheus": "http://localhost:9283/"
}

Dashboard seems enabled but unavailable.
What am I missing?
Thank you.

Daniele

P.S. what is the difference between `ceph config` and `ceph config-key`?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Support for buster with nautilus?

2019-03-18 Thread Paul Emmerich
We'll provide Buster packages:

curl https://mirror.croit.io/keys/release.asc | apt-key add -
echo 'deb https://mirror.croit.io/debian-nautilus/ stretch main' >>
/etc/apt/sources.list.d/croit-ceph.list

The mirror currently contains the latest 14.1.1 release candidate.
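
Installing from it should then just be the usual apt flow, assuming the
standard package names:

  apt update && apt install ceph ceph-common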

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Mar 18, 2019 at 4:33 PM Lars Täuber  wrote:
>
> Hi there!
>
> I just started to install a ceph cluster.
> I'd like to take the nautilus release.
> Because of hardware restrictions (network driver modules) I had to take the 
> buster release of Debian.
>
> Will there be buster packages of nautilus available after the release?
>
> Thanks for this great storage!
>
> Cheers,
> Lars
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Nautilus for Ubuntu Cosmic?

2019-03-18 Thread John Hearns
Thank you, Marc.
I cloned the GitHub repo and am building the packages. No biggie really and
hey, I do like living on the edge.
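
Roughly what I'm doing, in case it helps anyone (exact steps may well need
tweaking for your setup):

  git clone https://github.com/ceph/ceph.git
  cd ceph && git checkout v14.1.1
  ./install-deps.sh
  dpkg-buildpackage -us -uc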

On Mon, 18 Mar 2019 at 16:04, Marc Roos  wrote:

>
>
> If you want the excitement, can I then wish you my possible future ceph
> cluster problems, so I won't have them ;)
>
>
>
>
> -Original Message-
> From: John Hearns
> Sent: 18 March 2019 17:00
> To: ceph-users
> Subject: [ceph-users] Ceph Nautilus for Ubuntu Cosmic?
>
> May I ask if there is a repository for the latest Ceph Nautilus for
> Ubuntu?
> Specifically Ubuntu 18.10 Cosmic Cuttlefish.
>
> Perhaps I am paying a penalty for living on the bleeding edge. But one
> does have to have some excitement in life.
>
> Thanks
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Nautilus for Ubuntu Cosmic?

2019-03-18 Thread Marc Roos
 

If you want the excitement, can I then wish you my possible future ceph 
cluster problems, so I won't have them ;)




-Original Message-
From: John Hearns 
Sent: 18 March 2019 17:00
To: ceph-users
Subject: [ceph-users] Ceph Nautilus for Ubuntu Cosmic?

May I ask if there is a repository for the latest Ceph Nautilus for 
Ubuntu?
Specifically Ubuntu 18.10 Cosmic Cuttlefish.

Perhaps I am paying a penalty for living on the bleeding edge. But one 
does have to have some excitement in life.

Thanks


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Nautilus for Ubuntu Cosmic?

2019-03-18 Thread John Hearns
May I ask if there is a repository for the latest Ceph Nautilus for Ubuntu?
Specifically Ubuntu 18.10 Cosmic Cuttlefish.

Perhaps I am paying a penalty for living on the bleeding edge. But one does
have to have some excitement in life.

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd pg-upmap-items not working

2019-03-18 Thread Dan van der Ster
The balancer optimizes # PGs / crush weight. That host looks already
quite balanced for that metric.

If the balancing is not optimal for a specific pool that has most of
the data, then you can use the `optimize myplan ` param.
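
For example, something like this (using the ec82_pool name from your earlier
mail; substitute your biggest pool):

  ceph balancer optimize myplan ec82_pool
  ceph balancer show myplan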

-- dan


On Mon, Mar 18, 2019 at 4:39 PM Kári Bertilsson  wrote:
>
> Because i have tested failing the mgr & rebooting all the servers in random 
> order multiple times. The upmap optimizer did never find more optimizations 
> to do after the initial optimizations. I tried leaving the balancer ON for 
> days and also OFF and running manually several times.
>
> i did manually move just a few pg's from the fullest disks to the lowest 
> disks and free space increased by 7% so it was clearly not perfectly 
> distributed.
>
> I have now replaced 10 disks with larger ones, and after finshing syncing & 
> then running the upmap balancer i am having similar results. The upmap 
> optimizer did a few optimizations, but now says "Error EALREADY: Unable to 
> find further optimization,or distribution is already perfect".
>
> Looking at a snippet from "ceph osd df tree"... you can see it's not quite 
> perfect. I am wondering if this could be because of the size difference 
> between OSD's ? as i am running disks ranging from 1-10TB in the same host.
>
> 17   hdd   1.09200  1.0 1.09TiB  741GiB  377GiB 66.27 1.09  13 
> osd.17
> 18   hdd   1.09200  1.0 1.09TiB  747GiB  370GiB 66.86 1.10  13 
> osd.18
> 19   hdd   1.09200  1.0 1.09TiB  572GiB  546GiB 51.20 0.84  10 
> osd.19
> 23   hdd   2.72899  1.0 2.73TiB 1.70TiB 1.03TiB 62.21 1.02  31 
> osd.23
> 29   hdd   1.09200  1.0 1.09TiB  627GiB  491GiB 56.11 0.92  11 
> osd.29
> 30   hdd   1.09200  1.0 1.09TiB  574GiB  544GiB 51.34 0.84  10 
> osd.30
> 32   hdd   2.72899  1.0 2.73TiB 1.73TiB 1023GiB 63.41 1.04  31 
> osd.32
> 43   hdd   2.72899  1.0 2.73TiB 1.57TiB 1.16TiB 57.37 0.94  28 
> osd.43
> 45   hdd   2.72899  1.0 2.73TiB 1.68TiB 1.05TiB 61.51 1.01  30 
> osd.45
>
>
> Config keys are as follows
> "mgr/balancer/max_misplaced" = 1
> "mgr/balancer/upmax_max_deviation" = 0.0001
> "mgr/balancer/upmax_max_iterations" = 1000
>
> Any ideas what could cause this ? Any info i can give to help diagnose ?
>
> On Fri, Mar 15, 2019 at 3:48 PM David Turner  wrote:
>>
>> Why do you think that it can't resolve this by itself?  You just said that 
>> the balancer was able to provide an optimization, but then that the 
>> distribution isn't perfect.  When there are no further optimizations, 
>> running `ceph balancer optimize plan` won't create a plan with any changes.  
>> Possibly the active mgr needs a kick.  When my cluster isn't balancing when 
>> it's supposed to, I just run `ceph mgr fail {active mgr}` and within a 
>> minute or so the cluster is moving PGs around.
>>
>> On Sat, Mar 9, 2019 at 8:05 PM Kári Bertilsson  wrote:
>>>
>>> Thanks
>>>
>>> I did apply https://github.com/ceph/ceph/pull/26179.
>>>
>>> Running manual upmap commands work now. I did run "ceph balancer optimize 
>>> new"and It did add a few upmaps.
>>>
>>> But now another issue. Distribution is far from perfect but the balancer 
>>> can't find further optimization.
>>> Specifically OSD 23 is getting way more pg's than the other 3tb OSD's.
>>>
>>> See https://pastebin.com/f5g5Deak
>>>
>>> On Fri, Mar 1, 2019 at 10:25 AM  wrote:

 > Backports should be available in v12.2.11.

 s/v12.2.11/ v12.2.12/

 Sorry for the typo.




 Original Message
 From: Xie Xingguo 10072465
 To: d...@vanderster.com ;
 Cc: ceph-users@lists.ceph.com ;
 Date: 2019-03-01 17:09
 Subject: Re: [ceph-users] ceph osd pg-upmap-items not working
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 See https://github.com/ceph/ceph/pull/26179

 Backports should be available in v12.2.11.

 Or you can manually do it by simply adopting 
 https://github.com/ceph/ceph/pull/26127   if you are eager to get out of 
 the trap right now.

 From: Dan van der Ster 
 To: Kári Bertilsson ;
 Cc: ceph-users ; Xie Xingguo 10072465;
 Date: 2019-03-01 14:48
 Subject: Re: [ceph-users] ceph osd pg-upmap-items not working
 It looks like that somewhat unusual crush rule is confusing the new
 upmap cleaning.
 (debug_mon 10 on the active mon should show those cleanups).

 I'm copying Xie Xingguo, and probably you should create a tracker for this.

 -- dan




 On Fri, Mar 1, 2019 at 3:12 AM Kári Bertilsson  
 wrote:
 >
 > This is the pool
 > pool 41 'ec82_pool' erasure size 10 min_size 8 crush_rule 1 object_hash 
 > rjenkins pg_num 512 pgp_num 512 last_change 63794 lfor 21731/21731 flags 
 > hashpspool,ec_overwrites stripe_width 32768 application 

Re: [ceph-users] ceph osd pg-upmap-items not working

2019-03-18 Thread Kári Bertilsson
Because I have tested failing the mgr & rebooting all the servers in random
order multiple times. The upmap optimizer never found more optimizations
to do after the initial ones. I tried leaving the balancer ON for
days and also OFF and running it manually several times.

I did manually move just a few PGs from the fullest disks to the emptiest
disks, and free space increased by 7%, so it was clearly not perfectly
distributed.

I have now replaced 10 disks with larger ones, and after finishing syncing &
then running the upmap balancer I am having similar results. The upmap
optimizer did a few optimizations, but now says "Error EALREADY: Unable to
find further optimization,or distribution is already perfect".

Looking at a snippet from "ceph osd df tree"... you can see it's not quite
perfect. I am wondering if this could be because of the size difference
between OSDs, as I am running disks ranging from 1-10TB in the same host.

17   hdd   1.09200  1.0 1.09TiB  741GiB  377GiB 66.27 1.09  13
osd.17
18   hdd   1.09200  1.0 1.09TiB  747GiB  370GiB 66.86 1.10  13
osd.18
19   hdd   1.09200  1.0 1.09TiB  572GiB  546GiB 51.20 0.84  10
osd.19
23   hdd   2.72899  1.0 2.73TiB 1.70TiB 1.03TiB 62.21 1.02  31
osd.23
29   hdd   1.09200  1.0 1.09TiB  627GiB  491GiB 56.11 0.92  11
osd.29
30   hdd   1.09200  1.0 1.09TiB  574GiB  544GiB 51.34 0.84  10
osd.30
32   hdd   2.72899  1.0 2.73TiB 1.73TiB 1023GiB 63.41 1.04  31
osd.32
43   hdd   2.72899  1.0 2.73TiB 1.57TiB 1.16TiB 57.37 0.94  28
osd.43
45   hdd   2.72899  1.0 2.73TiB 1.68TiB 1.05TiB 61.51 1.01  30
osd.45


Config keys are as follows
"mgr/balancer/max_misplaced" = 1
"mgr/balancer/upmax_max_deviation" = 0.0001
"mgr/balancer/upmax_max_iterations" = 1000

Any ideas what could cause this? Any info I can give to help diagnose?
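
I can provide, for example, the full output of:

  ceph balancer status
  ceph osd crush rule dump
  ceph osd df tree

if any of that is useful.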

On Fri, Mar 15, 2019 at 3:48 PM David Turner  wrote:

> Why do you think that it can't resolve this by itself?  You just said that
> the balancer was able to provide an optimization, but then that the
> distribution isn't perfect.  When there are no further optimizations,
> running `ceph balancer optimize plan` won't create a plan with any
> changes.  Possibly the active mgr needs a kick.  When my cluster isn't
> balancing when it's supposed to, I just run `ceph mgr fail {active mgr}`
> and within a minute or so the cluster is moving PGs around.
>
> On Sat, Mar 9, 2019 at 8:05 PM Kári Bertilsson 
> wrote:
>
>> Thanks
>>
>> I did apply https://github.com/ceph/ceph/pull/26179.
>>
>> Running manual upmap commands work now. I did run "ceph balancer
>> optimize new"and It did add a few upmaps.
>>
>> But now another issue. Distribution is far from perfect but the balancer
>> can't find further optimization.
>> Specifically OSD 23 is getting way more pg's than the other 3tb OSD's.
>>
>> See https://pastebin.com/f5g5Deak
>>
>> On Fri, Mar 1, 2019 at 10:25 AM  wrote:
>>
>>> > Backports should be available in v12.2.11.
>>>
>>> s/v12.2.11/ v12.2.12/
>>>
>>> Sorry for the typo.
>>>
>>>
>>>
>>>
>>> Original Message
>>> *From:* Xie Xingguo 10072465
>>> *To:* d...@vanderster.com ;
>>> *Cc:* ceph-users@lists.ceph.com ;
>>> *Date:* 2019-03-01 17:09
>>> *Subject:* *Re: [ceph-users] ceph osd pg-upmap-items not working*
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> See https://github.com/ceph/ceph/pull/26179
>>>
>>> Backports should be available in v12.2.11.
>>>
>>> Or you can manually do it by simply adopting
>>> https://github.com/ceph/ceph/pull/26127 if you are eager to get out
>>> of the trap right now.
>>>
>>> *From:* Dan van der Ster 
>>> *To:* Kári Bertilsson ;
>>> *Cc:* ceph-users ; Xie Xingguo 10072465;
>>> *Date:* 2019-03-01 14:48
>>> *Subject:* *Re: [ceph-users] ceph osd pg-upmap-items not working*
>>> It looks like that somewhat unusual crush rule is confusing the new
>>> upmap cleaning.
>>> (debug_mon 10 on the active mon should show those cleanups).
>>>
>>>
>>> I'm copying Xie Xingguo, and probably you should create a tracker for this.
>>>
>>> -- dan
>>>
>>>
>>>
>>>
>>> On Fri, Mar 1, 2019 at 3:12 AM Kári Bertilsson >> > wrote:
>>> >
>>> > This is the pool
>>>
>>> > pool 41 'ec82_pool' erasure 

[ceph-users] Support for buster with nautilus?

2019-03-18 Thread Lars Täuber
Hi there!

I just started to install a ceph cluster.
I'd like to take the nautilus release.
Because of hardware restrictions (network driver modules) I had to take the 
buster release of Debian.

Will there be buster packages of nautilus available after the release?

Thanks for this great storage!

Cheers,
Lars
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd-target-api service fails to start with address family not supported

2019-03-18 Thread Jason Dillaman
Looks like you have the IPv6 stack disabled. You will need to override
the bind address from "[::]" to "0.0.0.0" via the "api_host" setting
[1] in "/etc/ceph/iscsi-gateway.cfg"

[1] 
https://github.com/ceph/ceph-iscsi/blob/master/ceph_iscsi_config/settings.py#L100
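
i.e. something like this added to the existing [config] section of
/etc/ceph/iscsi-gateway.cfg (leaving the rest of the file as-is):

  api_host = 0.0.0.0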

On Mon, Mar 18, 2019 at 11:09 AM Wesley Dillingham
 wrote:
>
> I am having some difficulties getting the iscsi gateway and api setup. 
> Working with a 12.2.8 cluster. And the gateways are Centos 7.6.1810 kernel 
> 3.10.0-957.5.1.el7.x86_64
>
> Using a previous version of ceph iscsi packages:
> ceph-iscsi-config-2.6-2.6.el7.noarch
> ceph-iscsi-tools-2.1-2.1.el7.noarch
> ceph-iscsi-cli-2.7-2.7.el7.noarch
>
> I encountered the error when attemping to create my rbd device:
>
> /disks> create rbd image=iscsi_poc_1 size=500g max_data_area_mb=8
> Failed : 500 INTERNAL SERVER ERROR
>
>
> This created the RBD but when attempting to list the iscsi disks in gwcli 
> nothing is displayed/registered. I consulted the mailing list etc and the 
> first suggestion was to ensure tcmu runner is running and it is. Also I am 
> running the latest tcmu-runner and libtcmu as acquired from shaman.
>
> Some further mailing reading suggested I run a more recent version of the 
> ceph iscsi packages and so I upgraded to the latest bulds from shaman:
> ceph-iscsi-config-2.6-80.g24deeb2.el7.noarch
> ceph-iscsi-cli-2.7-105.g4802654.el7.noarch
>
> This gives me the following errors from journalctl:
>
> ]:  * Running on http://[::]:5000/
> ]: Traceback (most recent call last):
> ]: File "/usr/bin/rbd-target-api", line 2053, in 
> ]: main()
> ]: File "/usr/bin/rbd-target-api", line 2002, in main
> ]: ssl_context=context)
> ]: File "/usr/lib/python2.
> ]: run_simple(host, port, self, **options)
> ]: File "/usr/lib/python2.
> ]: inner()
> ]: File "/usr/lib/python2.
> ]: passthrough_errors, ssl_context).serve_forever()
> ]: File "/usr/lib/python2.
> ]: passthrough_errors, ssl_context)
> ]: File "/usr/lib/python2.
> ]: HTTPServer.__init__(self, (host, int(port)), handler)
> ]: File "/usr/lib64/python2.
> ]: self.socket_type)
> ]: File "/usr/lib64/python2.
> ]: _sock = _realsocket(family, type, proto)
> socket.error: [Errno 97] Address family not supported by protocol
>
> my iscsi-gateway.conf is as follows:
>
> [config]
> cluster_name = ceph
> gateway_keyring = ceph.client.admin.keyring
> api_secure = false
> trusted_ip_list = 
> prometheus_exporter = false
> debug = true
>
> Firewalls are disabled and both nodes can speak to eachother on 5000
> the gw service is able to be started independently however the api service is 
> busted.
>
> I am hoping someone can advise on how to get past my most recent: 
> "socket.error: [Errno 97] Address family not supported by protocol" and 
> generally recommend the most stable current iteration of rbd iscsi to be used 
> with a luminous or later cluster.
>
> Thank you.
>
>
> Respectfully,
>
> Wes Dillingham
> wdilling...@godaddy.com
> Site Reliability Engineer IV - Platform Storage / Ceph
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd-target-api service fails to start with address family not supported

2019-03-18 Thread Wesley Dillingham
I am having some difficulties getting the iSCSI gateway and API set up. I am
working with a 12.2.8 cluster, and the gateways are CentOS 7.6.1810, kernel
3.10.0-957.5.1.el7.x86_64.

Using a previous version of ceph iscsi packages:
ceph-iscsi-config-2.6-2.6.el7.noarch
ceph-iscsi-tools-2.1-2.1.el7.noarch
ceph-iscsi-cli-2.7-2.7.el7.noarch

I encountered the error when attempting to create my RBD device:


/disks> create rbd image=iscsi_poc_1 size=500g max_data_area_mb=8
Failed : 500 INTERNAL SERVER ERROR

This created the RBD, but when attempting to list the iSCSI disks in gwcli 
nothing is displayed/registered. I consulted the mailing list etc. and the first 
suggestion was to ensure tcmu-runner is running, and it is. Also I am running 
the latest tcmu-runner and libtcmu as acquired from shaman.

Some further mailing list reading suggested I run a more recent version of the ceph 
iscsi packages, and so I upgraded to the latest builds from shaman:
ceph-iscsi-config-2.6-80.g24deeb2.el7.noarch
ceph-iscsi-cli-2.7-105.g4802654.el7.noarch

This gives me the following errors from journalctl:

]:  * Running on http://[::]:5000/
]: Traceback (most recent call last):
]: File "/usr/bin/rbd-target-api", line 2053, in 
]: main()
]: File "/usr/bin/rbd-target-api", line 2002, in main
]: ssl_context=context)
]: File "/usr/lib/python2.
]: run_simple(host, port, self, **options)
]: File "/usr/lib/python2.
]: inner()
]: File "/usr/lib/python2.
]: passthrough_errors, ssl_context).serve_forever()
]: File "/usr/lib/python2.
]: passthrough_errors, ssl_context)
]: File "/usr/lib/python2.
]: HTTPServer.__init__(self, (host, int(port)), handler)
]: File "/usr/lib64/python2.
]: self.socket_type)
]: File "/usr/lib64/python2.
]: _sock = _realsocket(family, type, proto)
socket.error: [Errno 97] Address family not supported by protocol

my iscsi-gateway.conf is as follows:

[config]
cluster_name = ceph
gateway_keyring = ceph.client.admin.keyring
api_secure = false
trusted_ip_list = 
prometheus_exporter = false
debug = true

Firewalls are disabled and both nodes can speak to each other on port 5000.
The gw service is able to be started independently; however, the api service is 
busted.

I am hoping someone can advise on how to get past my most recent: 
"socket.error: [Errno 97] Address family not supported by protocol" and 
generally recommend the most stable current iteration of rbd iscsi to be used 
with a luminous or later cluster.

Thank you.


Respectfully,

Wes Dillingham
wdilling...@godaddy.com
Site Reliability Engineer IV - Platform Storage / Ceph

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Dylan McCulloch
>> >please run following command. It will show where is 4.
>> >
>> >rados -p -p hpcfs_metadata getxattr 4. parent >/tmp/parent
>> >ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
>> >
>>
>> $ ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
>> {
>> "ino": 4,
>> "ancestors": [
>> {
>> "dirino": 1,
>> "dname": "lost+found",
>> "version": 1
>> }
>> ],
>> "pool": 20,
>> "old_pools": []
>> }
>>
>> I guess it may have a very large number of files from previous recovery 
>> operations?
>>
>
>Yes, these files are created by cephfs-data-scan. If you don't want
>them, you can delete "lost+found"

That explains it. Many thanks for your help.

>> >On Mon, Mar 18, 2019 at 8:15 PM Dylan McCulloch  wrote:
>> >>
>> >> >> >> >cephfs does not create/use object "4.".  Please show us 
>> >> >> >> >some
>> >> >> >> >of its keys.
>> >> >> >> >
>> >> >> >>
>> >> >> >> https://pastebin.com/WLfLTgni
>> >> >> >> Thanks
>> >> >> >>
>> >> >> > Is the object recently modified?
>> >> >> >
>> >> >> >rados -p hpcfs_metadata stat 4.
>> >> >> >
>> >> >>
>> >> >> $ rados -p hpcfs_metadata stat 4.
>> >> >> hpcfs_metadata/4. mtime 2018-09-17 08:11:50.00, size 0
>> >> >>
>> >> >please check if 4. has omap header and xattrs
>> >> >
>> >> >rados -p hpcfs_data listxattr 4.
>> >> >
>> >> >rados -p hpcfs_data getomapheader 4.
>> >> >
>> >>
>> >> Not sure if that was a typo^^ and you would like the above commands run 
>> >> on the 4. object in the metadata pool.
>> >> Ran commands on both
>> >>
>> >> $ rados -p hpcfs_data listxattr 4.
>> >> error getting xattr set hpcfs_data/4.: (2) No such file or 
>> >> directory
>> >> $ rados -p hpcfs_data getomapheader 4.
>> >> error getting omap header hpcfs_data/4.: (2) No such file or 
>> >> directory
>> >>
>> >> $ rados -p hpcfs_metadata listxattr 4.
>> >> layout
>> >> parent
>> >> $ rados -p hpcfs_metadata getomapheader 4.
>> >> header (274 bytes) :
>> >>   04 03 0c 01 00 00 01 00  00 00 00 00 00 00 00 00  
>> >> ||
>> >> 0010  00 00 00 00 00 00 03 02  28 00 00 00 00 00 00 00  
>> >> |(...|
>> >> 0020  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  
>> >> ||
>> >> 0030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> >> ||
>> >> 0040  00 00 00 00 03 02 28 00  00 00 00 00 00 00 00 00  
>> >> |..(.|
>> >> 0050  00 00 00 00 00 00 00 00  00 00 01 00 00 00 00 00  
>> >> ||
>> >> 0060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> >> ||
>> >> 0070  00 00 03 02 38 00 00 00  00 00 00 00 00 00 00 00  
>> >> |8...|
>> >> 0080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> >> ||
>> >> *
>> >> 00b0  03 02 38 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> >> |..8.|
>> >> 00c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> >> ||
>> >> *
>> >> 00e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 01 00  
>> >> ||
>> >> 00f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> >> ||
>> >> *
>> >> 0110  00 00 |..|
>> >> 0112
>> >>
>> >> $ rados -p hpcfs_metadata getxattr 4. layout
>> >> 
>> >> $ rados -p hpcfs_metadata getxattr 4. parent
>> >> <
>> >> lost+found
>> >>
>> >> >> >> >On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch 
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi all,
>> >> >> >> >>
>> >> >> >> >> We have a large omap object warning on one of our Ceph clusters.
>> >> >> >> >> The only reports I've seen regarding the "large omap objects" 
>> >> >> >> >> warning from other users were related to RGW bucket sharding, 
>> >> >> >> >> however we do not have RGW configured on this cluster.
>> >> >> >> >> The large omap object ~10GB resides in a CephFS metadata pool.
>> >> >> >> >>
>> >> >> >> >> It's perhaps worth mentioning that we had to perform disaster 
>> >> >> >> >> recovery steps [1] on this cluster last year after a network 
>> >> >> >> >> issue, so we're not sure whether this large omap object is a 
>> >> >> >> >> result of those previous recovery processes or whether it's 
>> >> >> >> >> completely unrelated.
>> >> >> >> >>
>> >> >> >> >> Ceph version: 12.2.8
>> >> >> >> >> osd_objectstore: Bluestore
>> >> >> >> >> RHEL 7.5
>> >> >> >> >> Kernel: 4.4.135-1.el7.elrepo.x86_64
>> >> >> >> >>
>> >> >> >> >> We have set: "mds_bal_fragment_size_max": "50" (Default 
>> >> >> >> >> 10)
>> >> >> >> >>
>> >> >> >> >> $ ceph health detail
>> >> >> >> >> HEALTH_WARN 1 large omap objects
>> >> >> >> >> LARGE_OMAP_OBJECTS 1 large omap objects
>> >> >> >> >> 1 large objects found in pool 'hpcfs_metadata'
>> >> 

[ceph-users] Blustore disks without assigned PGs but with data left

2019-03-18 Thread Xavier Trilla
Hi there,

We have some small SSDs we use just to store radosgw metadata. I'm in the 
process of replacing some of them, but when I took them out of the ruleset I 
use for the radosgw pools I've seen something weird:

21   ssd   0.11099  1.0  111GiB 26.4GiB 84.8GiB 23.75 0.34   0 
osd.21
31   ssd   0.13899  1.0  139GiB 22.9GiB  116GiB 16.48 0.24   0 
osd.31
94   ssd   0.22299  1.0  223GiB 22.8GiB  200GiB 10.21 0.15   0 
osd.94
95   ssd   0.22299  1.0  223GiB 24.0GiB  199GiB 10.78 0.15   0 
osd.95

As you can see, each of them still has some 20-odd GB of data, but 0 PGs. Is 
this related to the BlueStore WAL and block.db? Or is there something weird 
going on here?
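
(In case it's relevant, I was planning to check the BlueFS/RocksDB usage on
one of them from the OSD host, assuming the perf counters expose it, with
something like:

  ceph daemon osd.21 perf dump | grep -A 20 '"bluefs"'

to see whether that accounts for the ~20GB.)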

Thanks!

Xavier Trilla P.
Clouding.io

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
On Mon, Mar 18, 2019 at 9:50 PM Dylan McCulloch  wrote:
>
>
> >please run following command. It will show where is 4.
> >
> >rados -p -p hpcfs_metadata getxattr 4. parent >/tmp/parent
> >ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
> >
>
> $ ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
> {
> "ino": 4,
> "ancestors": [
> {
> "dirino": 1,
> "dname": "lost+found",
> "version": 1
> }
> ],
> "pool": 20,
> "old_pools": []
> }
>
> I guess it may have a very large number of files from previous recovery 
> operations?
>

Yes, these files are created by cephfs-data-scan. If you don't want
them, you can delete "lost+found".
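
e.g. from a client mount, something like the following (the mount point is
just an example):

  rm -rf /mnt/cephfs/lost+found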


> >On Mon, Mar 18, 2019 at 8:15 PM Dylan McCulloch  wrote:
> >>
> >> >> >> >cephfs does not create/use object "4.".  Please show us some
> >> >> >> >of its keys.
> >> >> >> >
> >> >> >>
> >> >> >> https://pastebin.com/WLfLTgni
> >> >> >> Thanks
> >> >> >>
> >> >> > Is the object recently modified?
> >> >> >
> >> >> >rados -p hpcfs_metadata stat 4.
> >> >> >
> >> >>
> >> >> $ rados -p hpcfs_metadata stat 4.
> >> >> hpcfs_metadata/4. mtime 2018-09-17 08:11:50.00, size 0
> >> >>
> >> >please check if 4. has omap header and xattrs
> >> >
> >> >rados -p hpcfs_data listxattr 4.
> >> >
> >> >rados -p hpcfs_data getomapheader 4.
> >> >
> >>
> >> Not sure if that was a typo^^ and you would like the above commands run on 
> >> the 4. object in the metadata pool.
> >> Ran commands on both
> >>
> >> $ rados -p hpcfs_data listxattr 4.
> >> error getting xattr set hpcfs_data/4.: (2) No such file or 
> >> directory
> >> $ rados -p hpcfs_data getomapheader 4.
> >> error getting omap header hpcfs_data/4.: (2) No such file or 
> >> directory
> >>
> >> $ rados -p hpcfs_metadata listxattr 4.
> >> layout
> >> parent
> >> $ rados -p hpcfs_metadata getomapheader 4.
> >> header (274 bytes) :
> >>   04 03 0c 01 00 00 01 00  00 00 00 00 00 00 00 00  
> >> ||
> >> 0010  00 00 00 00 00 00 03 02  28 00 00 00 00 00 00 00  
> >> |(...|
> >> 0020  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  
> >> ||
> >> 0030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
> >> ||
> >> 0040  00 00 00 00 03 02 28 00  00 00 00 00 00 00 00 00  
> >> |..(.|
> >> 0050  00 00 00 00 00 00 00 00  00 00 01 00 00 00 00 00  
> >> ||
> >> 0060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
> >> ||
> >> 0070  00 00 03 02 38 00 00 00  00 00 00 00 00 00 00 00  
> >> |8...|
> >> 0080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
> >> ||
> >> *
> >> 00b0  03 02 38 00 00 00 00 00  00 00 00 00 00 00 00 00  
> >> |..8.|
> >> 00c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
> >> ||
> >> *
> >> 00e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 01 00  
> >> ||
> >> 00f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
> >> ||
> >> *
> >> 0110  00 00 |..|
> >> 0112
> >>
> >> $ rados -p hpcfs_metadata getxattr 4. layout
> >> 
> >> $ rados -p hpcfs_metadata getxattr 4. parent
> >> <
> >> lost+found
> >>
> >> >> >> >On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch 
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> Hi all,
> >> >> >> >>
> >> >> >> >> We have a large omap object warning on one of our Ceph clusters.
> >> >> >> >> The only reports I've seen regarding the "large omap objects" 
> >> >> >> >> warning from other users were related to RGW bucket sharding, 
> >> >> >> >> however we do not have RGW configured on this cluster.
> >> >> >> >> The large omap object ~10GB resides in a CephFS metadata pool.
> >> >> >> >>
> >> >> >> >> It's perhaps worth mentioning that we had to perform disaster 
> >> >> >> >> recovery steps [1] on this cluster last year after a network 
> >> >> >> >> issue, so we're not sure whether this large omap object is a 
> >> >> >> >> result of those previous recovery processes or whether it's 
> >> >> >> >> completely unrelated.
> >> >> >> >>
> >> >> >> >> Ceph version: 12.2.8
> >> >> >> >> osd_objectstore: Bluestore
> >> >> >> >> RHEL 7.5
> >> >> >> >> Kernel: 4.4.135-1.el7.elrepo.x86_64
> >> >> >> >>
> >> >> >> >> We have set: "mds_bal_fragment_size_max": "50" (Default 
> >> >> >> >> 10)
> >> >> >> >>
> >> >> >> >> $ ceph health detail
> >> >> >> >> HEALTH_WARN 1 large omap objects
> >> >> >> >> LARGE_OMAP_OBJECTS 1 large omap objects
> >> >> >> >> 1 large objects found in pool 'hpcfs_metadata'
> >> >> >> >> Search the cluster log for 'Large omap object found' for more 
> >> >> >> >> details.
> >> >> >> >>
> >> >> >> >> # Find 

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Dylan McCulloch

>please run following command. It will show where is 4.
>
>rados -p -p hpcfs_metadata getxattr 4. parent >/tmp/parent
>ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
>

$ ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
{
"ino": 4,
"ancestors": [
{
"dirino": 1,
"dname": "lost+found",
"version": 1
}
],
"pool": 20,
"old_pools": []
}

I guess it may have a very large number of files from previous recovery 
operations?

>On Mon, Mar 18, 2019 at 8:15 PM Dylan McCulloch  wrote:
>>
>> >> >> >cephfs does not create/use object "4.".  Please show us some
>> >> >> >of its keys.
>> >> >> >
>> >> >>
>> >> >> https://pastebin.com/WLfLTgni
>> >> >> Thanks
>> >> >>
>> >> > Is the object recently modified?
>> >> >
>> >> >rados -p hpcfs_metadata stat 4.
>> >> >
>> >>
>> >> $ rados -p hpcfs_metadata stat 4.
>> >> hpcfs_metadata/4. mtime 2018-09-17 08:11:50.00, size 0
>> >>
>> >please check if 4. has omap header and xattrs
>> >
>> >rados -p hpcfs_data listxattr 4.
>> >
>> >rados -p hpcfs_data getomapheader 4.
>> >
>>
>> Not sure if that was a typo^^ and you would like the above commands run on 
>> the 4. object in the metadata pool.
>> Ran commands on both
>>
>> $ rados -p hpcfs_data listxattr 4.
>> error getting xattr set hpcfs_data/4.: (2) No such file or directory
>> $ rados -p hpcfs_data getomapheader 4.
>> error getting omap header hpcfs_data/4.: (2) No such file or 
>> directory
>>
>> $ rados -p hpcfs_metadata listxattr 4.
>> layout
>> parent
>> $ rados -p hpcfs_metadata getomapheader 4.
>> header (274 bytes) :
>>   04 03 0c 01 00 00 01 00  00 00 00 00 00 00 00 00  
>> ||
>> 0010  00 00 00 00 00 00 03 02  28 00 00 00 00 00 00 00  
>> |(...|
>> 0020  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  
>> ||
>> 0030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> ||
>> 0040  00 00 00 00 03 02 28 00  00 00 00 00 00 00 00 00  
>> |..(.|
>> 0050  00 00 00 00 00 00 00 00  00 00 01 00 00 00 00 00  
>> ||
>> 0060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> ||
>> 0070  00 00 03 02 38 00 00 00  00 00 00 00 00 00 00 00  
>> |8...|
>> 0080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> ||
>> *
>> 00b0  03 02 38 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> |..8.|
>> 00c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> ||
>> *
>> 00e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 01 00  
>> ||
>> 00f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
>> ||
>> *
>> 0110  00 00 |..|
>> 0112
>>
>> $ rados -p hpcfs_metadata getxattr 4. layout
>> 
>> $ rados -p hpcfs_metadata getxattr 4. parent
>> <
>> lost+found
>>
>> >> >> >On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch  
>> >> >> >wrote:
>> >> >> >>
>> >> >> >> Hi all,
>> >> >> >>
>> >> >> >> We have a large omap object warning on one of our Ceph clusters.
>> >> >> >> The only reports I've seen regarding the "large omap objects" 
>> >> >> >> warning from other users were related to RGW bucket sharding, 
>> >> >> >> however we do not have RGW configured on this cluster.
>> >> >> >> The large omap object ~10GB resides in a CephFS metadata pool.
>> >> >> >>
>> >> >> >> It's perhaps worth mentioning that we had to perform disaster 
>> >> >> >> recovery steps [1] on this cluster last year after a network issue, 
>> >> >> >> so we're not sure whether this large omap object is a result of 
>> >> >> >> those previous recovery processes or whether it's completely 
>> >> >> >> unrelated.
>> >> >> >>
>> >> >> >> Ceph version: 12.2.8
>> >> >> >> osd_objectstore: Bluestore
>> >> >> >> RHEL 7.5
>> >> >> >> Kernel: 4.4.135-1.el7.elrepo.x86_64
>> >> >> >>
>> >> >> >> We have set: "mds_bal_fragment_size_max": "50" (Default 10)
>> >> >> >>
>> >> >> >> $ ceph health detail
>> >> >> >> HEALTH_WARN 1 large omap objects
>> >> >> >> LARGE_OMAP_OBJECTS 1 large omap objects
>> >> >> >> 1 large objects found in pool 'hpcfs_metadata'
>> >> >> >> Search the cluster log for 'Large omap object found' for more 
>> >> >> >> details.
>> >> >> >>
>> >> >> >> # Find pg with large omap object
>> >> >> >> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk 
>> >> >> >> '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep 
>> >> >> >> num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 
>> >> >> >> 1"
>> >> >> >> 20.103: 1
>> >> >> >>
>> >> >> >> # OSD log entry showing relevant object
>> >> >> >> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] 
>> >> >> >> 

Re: [ceph-users] Newly added OSDs will not stay up

2019-03-18 Thread Josh Haft
Turns out this was due to a switch misconfiguration on the cluster
network. I use jumbo frames, and the new servers' switch ports were not
configured with the matching MTU. Small frames still got through, so some
traffic flowed, but as soon as the servers sent frames larger than the
switch allowed, those frames were dropped, which prevented the OSDs from
receiving osd_ping messages.
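
In case it helps anyone else, a quick way to confirm end-to-end jumbo-frame support
between a new OSD host and an existing one is a don't-fragment ping at the full
payload size (eth0 is a placeholder for the cluster-network interface, and 8972
assumes a 9000-byte MTU minus 28 bytes of IP and ICMP headers):

$ ip link show eth0 | grep mtu         # MTU configured on the new host's NIC
$ ping -M do -s 8972 -c 3 10.8.76.23   # DF-bit ping across the cluster network; failures point at a smaller MTU on the path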

Sorry for the noise!
Josh


On Thu, Mar 14, 2019 at 3:45 PM Josh Haft  wrote:
>
> Hello fellow Cephers,
>
> My 12.2.2 cluster is pretty full so I've been adding new nodes/OSDs.
> Last week I added two new nodes with 12 OSDs each and they are still
> backfilling. I have max_backfills tuned quite low across the board to
> minimize client impact. Yesterday I brought two more nodes online each
> with 12 OSDs and added them to the crushmap under a staging root,
> planning to add those to root=default when the two from last week
> complete backfilling. When the OSDs processes came up they all did
> what I describe below and since it only takes two OSDs on different
> hosts... the mons started marking existing OSDs down. So I backed that
> out and am now just working with a single OSD on of the new nodes
> until I can figure this out.
>
> When the OSD process starts up it's listening on ports 6800 and 6801
> on both the cluster and public interfaces. It successfully gets the
> current osdmap from a monitor and chooses 10 OSDs to peer with, all of
> which fail.
>
> It doesn't appear to be a basic networking issue; I turned up debug
> osd and ms to 20 and based on the following it looks like a successful
> ping/reply with the OSD peer (osd.0), but after a while the log says
> it's never heard from this OSD.
>
> 2019-03-14 14:17:42.350902 7fe698776700 10 osd.403 103451
> _add_heartbeat_peer: new peer osd.0 10.8.78.23:6814/8498484
> 10.8.76.23:6805/8498484
> 2019-03-14 14:17:44.165460 7fe68df61700  1 -- 10.8.76.48:0/67279 -->
> 10.8.76.23:6805/8498484 -- osd_ping(ping e103451 stamp 2019-03-14
> 14:17:44.165415) v4 -- 0x55844222aa00 con 0
> 2019-03-14 14:17:44.165467 7fe68df61700 20 -- 10.8.76.48:0/67279 >>
> 10.8.76.23:6805/8498484 conn(0x558442368000 :-1 s=STATE_OPEN pgs=2349
> cs=1 l=1).prepare_send_message m osd_ping(ping e103451 stamp
> 2019-03-14 14:17:44.165415) v4
> 2019-03-14 14:17:44.165471 7fe68df61700 20 -- 10.8.76.48:0/67279 >>
> 10.8.76.23:6805/8498484 conn(0x558442368000 :-1 s=STATE_OPEN pgs=2349
> cs=1 l=1).prepare_send_message encoding features 2305244844532236283
> 0x55844222aa00 osd_ping(ping e103451 stamp 2019-03-14 14:17:44.165415)
> v4
> 2019-03-14 14:17:44.165691 7fe6a574e700  5 -- 10.8.76.48:0/67279 >>
> 10.8.76.23:6805/8498484 conn(0x558442368000 :-1
> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2349 cs=1 l=1). rx
> osd.0 seq 1 0x55844206ba00 osd_ping(ping_reply e103451 stamp
> 2019-03-14 14:17:44.165415) v4
> 2019-03-14 14:17:44.165697 7fe6a574e700  1 -- 10.8.76.48:0/67279 <==
> osd.0 10.8.76.23:6805/8498484 1  osd_ping(ping_reply e103451 stamp
> 2019-03-14 14:17:44.165415) v4  2004+0+0 (4204681659 0 0)
> 0x55844206ba00 con 0x558442368000
>
> ... seq 2-6...
>
> 2019-03-14 14:17:57.468338 7fe68df61700  1 -- 10.8.76.48:0/67279 -->
> 10.8.76.23:6805/8498484 -- osd_ping(ping e103451 stamp 2019-03-14
> 14:17:57.468301) v4 -- 0x5584422e2c00 con 0
> 2019-03-14 14:17:57.468343 7fe68df61700 20 -- 10.8.76.48:0/67279 >>
> 10.8.76.23:6805/8498484 conn(0x558442368000 :-1 s=STATE_OPEN pgs=2349
> cs=1 l=1).prepare_send_message m osd_ping(ping e103451 stamp
> 2019-03-14 14:17:57.468301) v4
> 2019-03-14 14:17:57.468348 7fe68df61700 20 -- 10.8.76.48:0/67279 >>
> 10.8.76.23:6805/8498484 conn(0x558442368000 :-1 s=STATE_OPEN pgs=2349
> cs=1 l=1).prepare_send_message encoding features 2305244844532236283
> 0x5584422e2c00 osd_ping(ping e103451 stamp 2019-03-14 14:17:57.468301)
> v4
> 2019-03-14 14:17:57.468554 7fe6a574e700  5 -- 10.8.76.48:0/67279 >>
> 10.8.76.23:6805/8498484 conn(0x558442368000 :-1
> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2349 cs=1 l=1). rx
> osd.0 seq 6 0x55844222a600 osd_ping(ping_reply e103451 stamp
> 2019-03-14 14:17:57.468301) v4
> 2019-03-14 14:17:57.468561 7fe6a574e700  1 -- 10.8.76.48:0/67279 <==
> osd.0 10.8.76.23:6805/8498484 6  osd_ping(ping_reply e103451 stamp
> 2019-03-14 14:17:57.468301) v4  2004+0+0 (306125004 0 0)
> 0x55844222a600 con 0x558442368000
> 2019-03-14 14:18:04.266809 7fe6a1f89700 -1 osd.403 103451
> heartbeat_check: no reply from 10.8.76.23:6805 osd.0 ever on either
> front or back, first ping sent 2019-03-14 14:17:44.165415 (cutoff
> 2019-03-14 14:17:44.266808)
> 2019-03-14 14:18:05.267163 7fe6a1f89700 -1 osd.403 103451
> heartbeat_check: no reply from 10.8.76.23:6805 osd.0 ever on either
> front or back, first ping sent 2019-03-14 14:17:44.165415 (cutoff
> 2019-03-14 14:17:45.267163)
> 2019-03-14 14:18:06.267296 7fe6a1f89700 -1 osd.403 103451
> heartbeat_check: no reply from 10.8.76.23:6805 osd.0 ever on either
> front or back, first ping sent 2019-03-14 

[ceph-users] mgr/balancer/upmap_max_deviation not working in Luminous 12.2.8

2019-03-18 Thread Xavier Trilla
Hi,

For one of our Ceph clusters, I'm trying to modify the balancer configuration so 
that it keeps optimizing until it achieves a better distribution.

After checking the mailing list, it looks like the key controlling this for 
upmap is mgr/balancer/upmap_max_deviation, but it does not seem to make any 
difference (even after restarting the active mgr daemon) and I keep getting the 
"Error EALREADY: Unable to find further optimization,or distribution is already 
perfect" message when trying to create a new plan.

On the other hand, if I download the osdmap and run osdmaptool with the 
--upmap-deviation parameter, it works as expected (a sketch of that offline 
workflow follows the config dump below).

Here is the output of ceph config-key dump:

{
"mgr/balancer/active": "1",
"mgr/balancer/max_misplaced": "0.01",
"mgr/balancer/mode": "upmap",
"mgr/balancer/upmap_max_deviation": ".002"
}
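
For reference, the offline path I mentioned is roughly the following (file names 
are placeholders, and --upmap-max 100 is just an example cap on the number of 
changes generated per run):

$ ceph osd getmap -o osdmap.bin
$ osdmaptool osdmap.bin --upmap upmap.sh --upmap-deviation .002 --upmap-max 100
$ cat upmap.sh    # review the generated 'ceph osd pg-upmap-items ...' commands
$ bash upmap.sh   # apply them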

Any ideas?

Thanks!

Xavier Trilla P.
Clouding.io

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
please run following command. It will show where is 4.

rados -p hpcfs_metadata getxattr 4. parent >/tmp/parent
ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json




On Mon, Mar 18, 2019 at 8:15 PM Dylan McCulloch  wrote:
>
> >> >> >cephfs does not create/use object "4.".  Please show us some
> >> >> >of its keys.
> >> >> >
> >> >>
> >> >> https://pastebin.com/WLfLTgni
> >> >> Thanks
> >> >>
> >> > Is the object recently modified?
> >> >
> >> >rados -p hpcfs_metadata stat 4.
> >> >
> >>
> >> $ rados -p hpcfs_metadata stat 4.
> >> hpcfs_metadata/4. mtime 2018-09-17 08:11:50.00, size 0
> >>
> >please check if 4. has omap header and xattrs
> >
> >rados -p hpcfs_data listxattr 4.
> >
> >rados -p hpcfs_data getomapheader 4.
> >
>
> Not sure if that was a typo^^ and you would like the above commands run on 
> the 4. object in the metadata pool.
> Ran commands on both
>
> $ rados -p hpcfs_data listxattr 4.
> error getting xattr set hpcfs_data/4.: (2) No such file or directory
> $ rados -p hpcfs_data getomapheader 4.
> error getting omap header hpcfs_data/4.: (2) No such file or directory
>
> $ rados -p hpcfs_metadata listxattr 4.
> layout
> parent
> $ rados -p hpcfs_metadata getomapheader 4.
> header (274 bytes) :
>   04 03 0c 01 00 00 01 00  00 00 00 00 00 00 00 00  ||
> 0010  00 00 00 00 00 00 03 02  28 00 00 00 00 00 00 00  |(...|
> 0020  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  ||
> 0030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> 0040  00 00 00 00 03 02 28 00  00 00 00 00 00 00 00 00  |..(.|
> 0050  00 00 00 00 00 00 00 00  00 00 01 00 00 00 00 00  ||
> 0060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> 0070  00 00 03 02 38 00 00 00  00 00 00 00 00 00 00 00  |8...|
> 0080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> *
> 00b0  03 02 38 00 00 00 00 00  00 00 00 00 00 00 00 00  |..8.|
> 00c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> *
> 00e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 01 00  ||
> 00f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> *
> 0110  00 00 |..|
> 0112
>
> $ rados -p hpcfs_metadata getxattr 4. layout
> 
> $ rados -p hpcfs_metadata getxattr 4. parent
> <
> lost+found
>
> >> >> >On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch  
> >> >> >wrote:
> >> >> >>
> >> >> >> Hi all,
> >> >> >>
> >> >> >> We have a large omap object warning on one of our Ceph clusters.
> >> >> >> The only reports I've seen regarding the "large omap objects" 
> >> >> >> warning from other users were related to RGW bucket sharding, 
> >> >> >> however we do not have RGW configured on this cluster.
> >> >> >> The large omap object ~10GB resides in a CephFS metadata pool.
> >> >> >>
> >> >> >> It's perhaps worth mentioning that we had to perform disaster 
> >> >> >> recovery steps [1] on this cluster last year after a network issue, 
> >> >> >> so we're not sure whether this large omap object is a result of 
> >> >> >> those previous recovery processes or whether it's completely 
> >> >> >> unrelated.
> >> >> >>
> >> >> >> Ceph version: 12.2.8
> >> >> >> osd_objectstore: Bluestore
> >> >> >> RHEL 7.5
> >> >> >> Kernel: 4.4.135-1.el7.elrepo.x86_64
> >> >> >>
> >> >> >> We have set: "mds_bal_fragment_size_max": "50" (Default 10)
> >> >> >>
> >> >> >> $ ceph health detail
> >> >> >> HEALTH_WARN 1 large omap objects
> >> >> >> LARGE_OMAP_OBJECTS 1 large omap objects
> >> >> >> 1 large objects found in pool 'hpcfs_metadata'
> >> >> >> Search the cluster log for 'Large omap object found' for more 
> >> >> >> details.
> >> >> >>
> >> >> >> # Find pg with large omap object
> >> >> >> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk 
> >> >> >> '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep 
> >> >> >> num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 
> >> >> >> 1"
> >> >> >> 20.103: 1
> >> >> >>
> >> >> >> # OSD log entry showing relevant object
> >> >> >> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] Large 
> >> >> >> omap object found. Object: 20:c0ce80d4:::4.:head Key count: 
> >> >> >> 24698995 Size (bytes): 11410935690
> >> >> >>
> >> >> >> # Confirm default warning thresholds for large omap object
> >> >> >> $ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
> >> >> >> "osd_deep_scrub_large_omap_object_key_threshold": "200",
> >> >> >> "osd_deep_scrub_large_omap_object_value_sum_threshold": 
> >> >> >> "1073741824",
> >> >> >>
> >> >> >> # Dump keys/values of problematic 

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Dylan McCulloch
>> >> >cephfs does not create/use object "4.".  Please show us some
>> >> >of its keys.
>> >> >
>> >>
>> >> https://pastebin.com/WLfLTgni
>> >> Thanks
>> >>
>> > Is the object recently modified?
>> >
>> >rados -p hpcfs_metadata stat 4.
>> >
>>
>> $ rados -p hpcfs_metadata stat 4.
>> hpcfs_metadata/4. mtime 2018-09-17 08:11:50.00, size 0
>>
>please check if 4. has omap header and xattrs
>
>rados -p hpcfs_data listxattr 4.
>
>rados -p hpcfs_data getomapheader 4.
>

Not sure if that was a typo^^ and whether you would like the above commands run 
on the 4. object in the metadata pool. I ran the commands on both pools:

$ rados -p hpcfs_data listxattr 4.
error getting xattr set hpcfs_data/4.: (2) No such file or directory
$ rados -p hpcfs_data getomapheader 4.
error getting omap header hpcfs_data/4.: (2) No such file or directory

$ rados -p hpcfs_metadata listxattr 4.
layout
parent
$ rados -p hpcfs_metadata getomapheader 4.
header (274 bytes) :
  04 03 0c 01 00 00 01 00  00 00 00 00 00 00 00 00  ||
0010  00 00 00 00 00 00 03 02  28 00 00 00 00 00 00 00  |(...|
0020  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  ||
0030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
0040  00 00 00 00 03 02 28 00  00 00 00 00 00 00 00 00  |..(.|
0050  00 00 00 00 00 00 00 00  00 00 01 00 00 00 00 00  ||
0060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
0070  00 00 03 02 38 00 00 00  00 00 00 00 00 00 00 00  |8...|
0080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
00b0  03 02 38 00 00 00 00 00  00 00 00 00 00 00 00 00  |..8.|
00c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
00e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 01 00  ||
00f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
0110  00 00 |..|
0112

$ rados -p hpcfs_metadata getxattr 4. layout

$ rados -p hpcfs_metadata getxattr 4. parent
<
lost+found

>> >> >On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch  
>> >> >wrote:
>> >> >>
>> >> >> Hi all,
>> >> >>
>> >> >> We have a large omap object warning on one of our Ceph clusters.
>> >> >> The only reports I've seen regarding the "large omap objects" warning 
>> >> >> from other users were related to RGW bucket sharding, however we do 
>> >> >> not have RGW configured on this cluster.
>> >> >> The large omap object ~10GB resides in a CephFS metadata pool.
>> >> >>
>> >> >> It's perhaps worth mentioning that we had to perform disaster recovery 
>> >> >> steps [1] on this cluster last year after a network issue, so we're 
>> >> >> not sure whether this large omap object is a result of those previous 
>> >> >> recovery processes or whether it's completely unrelated.
>> >> >>
>> >> >> Ceph version: 12.2.8
>> >> >> osd_objectstore: Bluestore
>> >> >> RHEL 7.5
>> >> >> Kernel: 4.4.135-1.el7.elrepo.x86_64
>> >> >>
>> >> >> We have set: "mds_bal_fragment_size_max": "50" (Default 10)
>> >> >>
>> >> >> $ ceph health detail
>> >> >> HEALTH_WARN 1 large omap objects
>> >> >> LARGE_OMAP_OBJECTS 1 large omap objects
>> >> >> 1 large objects found in pool 'hpcfs_metadata'
>> >> >> Search the cluster log for 'Large omap object found' for more 
>> >> >> details.
>> >> >>
>> >> >> # Find pg with large omap object
>> >> >> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk 
>> >> >> '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep 
>> >> >> num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
>> >> >> 20.103: 1
>> >> >>
>> >> >> # OSD log entry showing relevant object
>> >> >> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] Large 
>> >> >> omap object found. Object: 20:c0ce80d4:::4.:head Key count: 
>> >> >> 24698995 Size (bytes): 11410935690
>> >> >>
>> >> >> # Confirm default warning thresholds for large omap object
>> >> >> $ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
>> >> >> "osd_deep_scrub_large_omap_object_key_threshold": "200",
>> >> >> "osd_deep_scrub_large_omap_object_value_sum_threshold": 
>> >> >> "1073741824",
>> >> >>
>> >> >> # Dump keys/values of problematic object, creates 46.65GB file
>> >> >> $ rados -p hpcfs_metadata listomapvals '4.' > 
>> >> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> >> >> $ ll /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> >> >> -rw-r--r-- 1 root root 50089561860 Mar  4 18:16 
>> >> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> >> >>
>> >> >> # Confirm key count matches OSD log entry warning
>> >> >> $ rados -p hpcfs_metadata listomapkeys '4.' | wc -l
>> >> >> 

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
please check if 4. has omap header and xattrs

rados -p hpcfs_data listxattr 4.

rados -p hpcfs_data getomapheader 4.


On Mon, Mar 18, 2019 at 7:37 PM Dylan McCulloch  wrote:
>
> >> >
> >> >cephfs does not create/use object "4.".  Please show us some
> >> >of its keys.
> >> >
> >>
> >> https://pastebin.com/WLfLTgni
> >> Thanks
> >>
> > Is the object recently modified?
> >
> >rados -p hpcfs_metadata stat 4.
> >
>
> $ rados -p hpcfs_metadata stat 4.
> hpcfs_metadata/4. mtime 2018-09-17 08:11:50.00, size 0
>
rados -p hpcfs_metadata getomapheader 4.




> >> >On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch  
> >> >wrote:
> >> >>
> >> >> Hi all,
> >> >>
> >> >> We have a large omap object warning on one of our Ceph clusters.
> >> >> The only reports I've seen regarding the "large omap objects" warning 
> >> >> from other users were related to RGW bucket sharding, however we do not 
> >> >> have RGW configured on this cluster.
> >> >> The large omap object ~10GB resides in a CephFS metadata pool.
> >> >>
> >> >> It's perhaps worth mentioning that we had to perform disaster recovery 
> >> >> steps [1] on this cluster last year after a network issue, so we're not 
> >> >> sure whether this large omap object is a result of those previous 
> >> >> recovery processes or whether it's completely unrelated.
> >> >>
> >> >> Ceph version: 12.2.8
> >> >> osd_objectstore: Bluestore
> >> >> RHEL 7.5
> >> >> Kernel: 4.4.135-1.el7.elrepo.x86_64
> >> >>
> >> >> We have set: "mds_bal_fragment_size_max": "50" (Default 10)
> >> >>
> >> >> $ ceph health detail
> >> >> HEALTH_WARN 1 large omap objects
> >> >> LARGE_OMAP_OBJECTS 1 large omap objects
> >> >> 1 large objects found in pool 'hpcfs_metadata'
> >> >> Search the cluster log for 'Large omap object found' for more 
> >> >> details.
> >> >>
> >> >> # Find pg with large omap object
> >> >> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk 
> >> >> '{print $1}'`; do echo -n "$i: "; ceph pg $i query |grep 
> >> >> num_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1"
> >> >> 20.103: 1
> >> >>
> >> >> # OSD log entry showing relevant object
> >> >> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] Large 
> >> >> omap object found. Object: 20:c0ce80d4:::4.:head Key count: 
> >> >> 24698995 Size (bytes): 11410935690
> >> >>
> >> >> # Confirm default warning thresholds for large omap object
> >> >> $ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
> >> >> "osd_deep_scrub_large_omap_object_key_threshold": "200",
> >> >> "osd_deep_scrub_large_omap_object_value_sum_threshold": 
> >> >> "1073741824",
> >> >>
> >> >> # Dump keys/values of problematic object, creates 46.65GB file
> >> >> $ rados -p hpcfs_metadata listomapvals '4.' > 
> >> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >> >> $ ll /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >> >> -rw-r--r-- 1 root root 50089561860 Mar  4 18:16 
> >> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >> >>
> >> >> # Confirm key count matches OSD log entry warning
> >> >> $ rados -p hpcfs_metadata listomapkeys '4.' | wc -l
> >> >> 24698995
> >> >>
> >> >> # The omap keys/vals for that object appear to have been 
> >> >> unchanged/static for at least a couple of months:
> >> >> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >> >> fd00ceb68607b477626178b2d81fefb926460107  
> >> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >> >> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4__20190108
> >> >> fd00ceb68607b477626178b2d81fefb926460107  
> >> >> /tmp/hpcfs_metadata_object_omap_vals_4__20190108
> >> >>
> >> >> I haven't gone through all 24698995 keys yet, but while most appear to 
> >> >> relate to objects in the hpcfs_data CephFS data pool, there are a 
> >> >> significant number of keys (rough guess 25%) that don't appear to have 
> >> >> corresponding objects in the hpcfs_data pool.
> >> >>
> >> >> Any assistance or pointers to troubleshoot further would be very much 
> >> >> appreciated.
> >> >>
> >> >> Thanks,
> >> >> Dylan
> >> >>
> >> >> [1] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/
> >> >>
> >> >> ___
> >> >> ceph-users mailing list
> >> >> ceph-users@lists.ceph.com
> >> >> https://protect-au.mimecast.com/s/sfrDCq7By5sN9Nv1sQbY6q?domain=lists.ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] nautilus: dashboard configuration issue

2019-03-18 Thread Volker Theile
Hello Daniele,

your problem is tracked by https://tracker.ceph.com/issues/38528 and
fixed in the latest Ceph 14 builds. To work around the problem, simply
disable SSL for your specific manager.

$ ceph config set mgr mgr/dashboard//ssl false

See https://tracker.ceph.com/issues/38528#note-1.
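
A minimal sequence for a single manager instance, assuming it is named "x" (the 
real name seems to have been stripped from the command above), would be roughly:

$ ceph config set mgr mgr/dashboard/x/ssl false
$ ceph config set mgr mgr/dashboard/x/server_port 8088   # optional, if the default port clashes
$ ceph mgr module disable dashboard
$ ceph mgr module enable dashboard   # restart the module so the new settings take effect
$ ceph mgr services                  # the dashboard URL should now be listed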


Regards
Volker


Am 17.03.19 um 03:11 schrieb Daniele Riccucci:
> Hello,
> I have a small cluster deployed with ceph-ansible on containers.
> Recently, without my realizing that ceph_docker_image_tag was set to
> latest by default, the cluster got upgraded to nautilus and I was
> unable to roll back.
> Everything seems to be running smoothly except for the dashboard.
>
> $ ceph status
>
>   cluster:
>     id: d5c50302-0d8e-47cb-ab86-c15842372900
>     health: HEALTH_ERR
>     Module 'dashboard' has failed: IOError("Port 8443 not free
> on '::'",)
> [...]
>
> I already have a service running on port 8443 and I don't need SSL so
> I ran:
>
> ceph config set mgr mgr/dashboard/ssl false
>
> and
>
> ceph config set mgr mgr/dashboard/server_port 
>
> according to the docs, to change this behavior.
> Running `ceph config dump` returns the following:
>
> WHO MASK LEVEL    OPTION VALUE    RO
> mgr  advanced mgr/dashboard/server_port  8088 *
> mgr  advanced mgr/dashboard/ssl  false    *
>
> By dumping the configuration I found that 2 keys were present:
>
> $ ceph config-key dump
> {
>     [...]
>     "config/mgr/mgr/dashboard/server_port": "8088",
>     "config/mgr/mgr/dashboard/ssl": "false",
>     "config/mgr/mgr/dashboard/username": "devster",
>     "mgr/dashboard/key": "",
>     "mgr/dashboard/crt": ""
> }
>
> which are likely the SSL certificates. I deleted them, disabled the
> module and re-enabled it; however, the following happened:
>
> $ ceph status
>   cluster:
>     id: d5c50302-0d8e-47cb-ab86-c15842372900
>     health: HEALTH_OK
> [...]
>
> $ ceph mgr module ls | jq .enabled_modules
> [
>   "dashboard",
>   "iostat",
>   "prometheus",
>   "restful"
> ]
>
> $ ceph mgr services
> {
>   "prometheus": "http://localhost:9283/;
> }
>
> Dashboard seems enabled but unavailable.
> What am I missing?
> Thank you.
>
> Daniele
>
> P.S. what is the difference between `ceph config` and `ceph config-key`?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
-- 
Volker Theile
Software Engineer | openATTIC
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
Phone: +49 173 5876879
E-Mail: vthe...@suse.com




signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Dylan McCulloch
>> >
>> >cephfs does not create/use object "4.".  Please show us some
>> >of its keys.
>> >
>>
>> https://pastebin.com/WLfLTgni
>> Thanks
>>
> Is the object recently modified?
>
>rados -p hpcfs_metadata stat 4.
>

$ rados -p hpcfs_metadata stat 4.
hpcfs_metadata/4. mtime 2018-09-17 08:11:50.00, size 0

>> >On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch  wrote:
>> >>
>> >> Hi all,
>> >>
>> >> We have a large omap object warning on one of our Ceph clusters.
>> >> The only reports I've seen regarding the "large omap objects" warning 
>> >> from other users were related to RGW bucket sharding, however we do not 
>> >> have RGW configured on this cluster.
>> >> The large omap object ~10GB resides in a CephFS metadata pool.
>> >>
>> >> It's perhaps worth mentioning that we had to perform disaster recovery 
>> >> steps [1] on this cluster last year after a network issue, so we're not 
>> >> sure whether this large omap object is a result of those previous 
>> >> recovery processes or whether it's completely unrelated.
>> >>
>> >> Ceph version: 12.2.8
>> >> osd_objectstore: Bluestore
>> >> RHEL 7.5
>> >> Kernel: 4.4.135-1.el7.elrepo.x86_64
>> >>
>> >> We have set: "mds_bal_fragment_size_max": "50" (Default 10)
>> >>
>> >> $ ceph health detail
>> >> HEALTH_WARN 1 large omap objects
>> >> LARGE_OMAP_OBJECTS 1 large omap objects
>> >> 1 large objects found in pool 'hpcfs_metadata'
>> >> Search the cluster log for 'Large omap object found' for more details.
>> >>
>> >> # Find pg with large omap object
>> >> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk '{print 
>> >> $1}'`; do echo -n "$i: "; ceph pg $i query |grep num_large_omap_objects | 
>> >> head -1 | awk '{print $2}'; done | grep ": 1"
>> >> 20.103: 1
>> >>
>> >> # OSD log entry showing relevant object
>> >> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] Large omap 
>> >> object found. Object: 20:c0ce80d4:::4.:head Key count: 24698995 
>> >> Size (bytes): 11410935690
>> >>
>> >> # Confirm default warning thresholds for large omap object
>> >> $ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
>> >> "osd_deep_scrub_large_omap_object_key_threshold": "200",
>> >> "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",
>> >>
>> >> # Dump keys/values of problematic object, creates 46.65GB file
>> >> $ rados -p hpcfs_metadata listomapvals '4.' > 
>> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> >> $ ll /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> >> -rw-r--r-- 1 root root 50089561860 Mar  4 18:16 
>> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> >>
>> >> # Confirm key count matches OSD log entry warning
>> >> $ rados -p hpcfs_metadata listomapkeys '4.' | wc -l
>> >> 24698995
>> >>
>> >> # The omap keys/vals for that object appear to have been unchanged/static 
>> >> for at least a couple of months:
>> >> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> >> fd00ceb68607b477626178b2d81fefb926460107  
>> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> >> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4__20190108
>> >> fd00ceb68607b477626178b2d81fefb926460107  
>> >> /tmp/hpcfs_metadata_object_omap_vals_4__20190108
>> >>
>> >> I haven't gone through all 24698995 keys yet, but while most appear to 
>> >> relate to objects in the hpcfs_data CephFS data pool, there are a 
>> >> significant number of keys (rough guess 25%) that don't appear to have 
>> >> corresponding objects in the hpcfs_data pool.
>> >>
>> >> Any assistance or pointers to troubleshoot further would be very much 
>> >> appreciated.
>> >>
>> >> Thanks,
>> >> Dylan
>> >>
>> >> [1] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/
>> >>
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
On Mon, Mar 18, 2019 at 6:05 PM Dylan McCulloch  wrote:
>
>
> >
> >cephfs does not create/use object "4.".  Please show us some
> >of its keys.
> >
>
> https://pastebin.com/WLfLTgni
> Thanks
>
 Is the object recently modified?

rados -p hpcfs_metadata stat 4.



> >On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch  wrote:
> >>
> >> Hi all,
> >>
> >> We have a large omap object warning on one of our Ceph clusters.
> >> The only reports I've seen regarding the "large omap objects" warning from 
> >> other users were related to RGW bucket sharding, however we do not have 
> >> RGW configured on this cluster.
> >> The large omap object ~10GB resides in a CephFS metadata pool.
> >>
> >> It's perhaps worth mentioning that we had to perform disaster recovery 
> >> steps [1] on this cluster last year after a network issue, so we're not 
> >> sure whether this large omap object is a result of those previous recovery 
> >> processes or whether it's completely unrelated.
> >>
> >> Ceph version: 12.2.8
> >> osd_objectstore: Bluestore
> >> RHEL 7.5
> >> Kernel: 4.4.135-1.el7.elrepo.x86_64
> >>
> >> We have set: "mds_bal_fragment_size_max": "50" (Default 10)
> >>
> >> $ ceph health detail
> >> HEALTH_WARN 1 large omap objects
> >> LARGE_OMAP_OBJECTS 1 large omap objects
> >> 1 large objects found in pool 'hpcfs_metadata'
> >> Search the cluster log for 'Large omap object found' for more details.
> >>
> >> # Find pg with large omap object
> >> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk '{print 
> >> $1}'`; do echo -n "$i: "; ceph pg $i query |grep num_large_omap_objects | 
> >> head -1 | awk '{print $2}'; done | grep ": 1"
> >> 20.103: 1
> >>
> >> # OSD log entry showing relevant object
> >> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] Large omap 
> >> object found. Object: 20:c0ce80d4:::4.:head Key count: 24698995 
> >> Size (bytes): 11410935690
> >>
> >> # Confirm default warning thresholds for large omap object
> >> $ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
> >> "osd_deep_scrub_large_omap_object_key_threshold": "200",
> >> "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",
> >>
> >> # Dump keys/values of problematic object, creates 46.65GB file
> >> $ rados -p hpcfs_metadata listomapvals '4.' > 
> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >> $ ll /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >> -rw-r--r-- 1 root root 50089561860 Mar  4 18:16 
> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >>
> >> # Confirm key count matches OSD log entry warning
> >> $ rados -p hpcfs_metadata listomapkeys '4.' | wc -l
> >> 24698995
> >>
> >> # The omap keys/vals for that object appear to have been unchanged/static 
> >> for at least a couple of months:
> >> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >> fd00ceb68607b477626178b2d81fefb926460107  
> >> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> >> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4__20190108
> >> fd00ceb68607b477626178b2d81fefb926460107  
> >> /tmp/hpcfs_metadata_object_omap_vals_4__20190108
> >>
> >> I haven't gone through all 24698995 keys yet, but while most appear to 
> >> relate to objects in the hpcfs_data CephFS data pool, there are a 
> >> significant number of keys (rough guess 25%) that don't appear to have 
> >> corresponding objects in the hpcfs_data pool.
> >>
> >> Any assistance or pointers to troubleshoot further would be very much 
> >> appreciated.
> >>
> >> Thanks,
> >> Dylan
> >>
> >> [1] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Constant Compaction on one mimic node

2019-03-18 Thread Alex Litvak

From what I see, the message is generated by the mon container on each node.  
Does the mon issue a manual compaction of rocksdb at some point (the debug line 
itself comes from rocksdb)?
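
For what it's worth, I'll check whether the mons differ in the settings that 
trigger a compaction when they trim old paxos/map history; if I remember the 
option names correctly, something like this (mon name taken from the log prefix; 
with this containerized deployment the admin-socket commands have to be run 
inside the mon container):

$ ceph daemon mon.storage1n1-chi config get mon_compact_on_trim
$ ceph daemon mon.storage1n1-chi config get mon_compact_on_start
$ ceph tell mon.storage1n1-chi compact   # triggers the same kind of manual compaction on demand, for comparison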

On 3/18/2019 12:33 AM, Konstantin Shalygin wrote:

I am getting a huge number of messages on one out of three nodes showing "Manual 
compaction starting" all the time.  I see no such log entries on the other nodes 
in the cluster.

Mar 16 06:40:11 storage1n1-chi docker[24502]: debug 2019-03-16 06:40:11.441 
7f6967af4700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024]
[default] Manual compaction starting
Mar 16 06:40:11 storage1n1-chi docker[24502]: message repeated 4 times: [ debug 
2019-03-16 06:40:11.441 7f6967af4700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024]
[default] Manual compaction starting]
Mar 16 06:42:21 storage1n1-chi docker[24502]: debug 2019-03-16 06:42:21.466 
7f6970305700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:77]
[JOB 1021] Syncing log #194307

I am not sure what triggers those messages on one node and not on the others.

Checking config on all mons

debug_leveldb 4/5  override
debug_memdb   4/5  override
debug_mgr 0/5  override
debug_mgrc0/5  override
debug_rocksdb 4/5  override

The documentation says nothing about these compaction logs, or at least I 
couldn't find anything specific to my issue.


You should look at the Docker side, I think, because this is a manual compaction, 
like `ceph daemon osd.0 compact` from the admin socket or `ceph tell osd.0 
compact` from the admin CLI.



k


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Constant Compaction on one mimic node

2019-03-18 Thread Alex Litvak

Konstantin,

I am not sure I understand.  You mean something in the container does a manual 
compaction job sporadically? What would be doing that? I am confused.

On 3/18/2019 12:33 AM, Konstantin Shalygin wrote:

I am getting a huge number of messages on one out of three nodes showing "Manual 
compaction starting" all the time.  I see no such log entries on the other nodes 
in the cluster.

Mar 16 06:40:11 storage1n1-chi docker[24502]: debug 2019-03-16 06:40:11.441 
7f6967af4700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024]
[default] Manual compaction starting
Mar 16 06:40:11 storage1n1-chi docker[24502]: message repeated 4 times: [ debug 
2019-03-16 06:40:11.441 7f6967af4700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1024]
[default] Manual compaction starting]
Mar 16 06:42:21 storage1n1-chi docker[24502]: debug 2019-03-16 06:42:21.466 
7f6970305700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:77]
[JOB 1021] Syncing log #194307

I am not sure what triggers those messages on one node and not on the others.

Checking config on all mons

debug_leveldb 4/5  override
debug_memdb   4/5  override
debug_mgr 0/5  override
debug_mgrc0/5  override
debug_rocksdb 4/5  override

The documentation says nothing about these compaction logs, or at least I 
couldn't find anything specific to my issue.


You should look at the Docker side, I think, because this is a manual compaction, 
like `ceph daemon osd.0 compact` from the admin socket or `ceph tell osd.0 
compact` from the admin CLI.



k


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Dylan McCulloch

>
>cephfs does not create/use object "4.".  Please show us some
>of its keys.
>

https://pastebin.com/WLfLTgni
Thanks

>On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch  wrote:
>>
>> Hi all,
>>
>> We have a large omap object warning on one of our Ceph clusters.
>> The only reports I've seen regarding the "large omap objects" warning from 
>> other users were related to RGW bucket sharding, however we do not have RGW 
>> configured on this cluster.
>> The large omap object ~10GB resides in a CephFS metadata pool.
>>
>> It's perhaps worth mentioning that we had to perform disaster recovery steps 
>> [1] on this cluster last year after a network issue, so we're not sure 
>> whether this large omap object is a result of those previous recovery 
>> processes or whether it's completely unrelated.
>>
>> Ceph version: 12.2.8
>> osd_objectstore: Bluestore
>> RHEL 7.5
>> Kernel: 4.4.135-1.el7.elrepo.x86_64
>>
>> We have set: "mds_bal_fragment_size_max": "50" (Default 10)
>>
>> $ ceph health detail
>> HEALTH_WARN 1 large omap objects
>> LARGE_OMAP_OBJECTS 1 large omap objects
>> 1 large objects found in pool 'hpcfs_metadata'
>> Search the cluster log for 'Large omap object found' for more details.
>>
>> # Find pg with large omap object
>> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk '{print 
>> $1}'`; do echo -n "$i: "; ceph pg $i query |grep num_large_omap_objects | 
>> head -1 | awk '{print $2}'; done | grep ": 1"
>> 20.103: 1
>>
>> # OSD log entry showing relevant object
>> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] Large omap 
>> object found. Object: 20:c0ce80d4:::4.:head Key count: 24698995 Size 
>> (bytes): 11410935690
>>
>> # Confirm default warning thresholds for large omap object
>> $ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
>> "osd_deep_scrub_large_omap_object_key_threshold": "200",
>> "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",
>>
>> # Dump keys/values of problematic object, creates 46.65GB file
>> $ rados -p hpcfs_metadata listomapvals '4.' > 
>> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> $ ll /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> -rw-r--r-- 1 root root 50089561860 Mar  4 18:16 
>> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>>
>> # Confirm key count matches OSD log entry warning
>> $ rados -p hpcfs_metadata listomapkeys '4.' | wc -l
>> 24698995
>>
>> # The omap keys/vals for that object appear to have been unchanged/static 
>> for at least a couple of months:
>> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> fd00ceb68607b477626178b2d81fefb926460107  
>> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4__20190108
>> fd00ceb68607b477626178b2d81fefb926460107  
>> /tmp/hpcfs_metadata_object_omap_vals_4__20190108
>>
>> I haven't gone through all 24698995 keys yet, but while most appear to 
>> relate to objects in the hpcfs_data CephFS data pool, there are a 
>> significant number of keys (rough guess 25%) that don't appear to have 
>> corresponding objects in the hpcfs_data pool.
>>
>> Any assistance or pointers to troubleshoot further would be very much 
>> appreciated.
>>
>> Thanks,
>> Dylan
>>
>> [1] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
cephfs does not create/use object "4.".  Please show us some
of its keys.

On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch  wrote:
>
> Hi all,
>
> We have a large omap object warning on one of our Ceph clusters.
> The only reports I've seen regarding the "large omap objects" warning from 
> other users were related to RGW bucket sharding, however we do not have RGW 
> configured on this cluster.
> The large omap object ~10GB resides in a CephFS metadata pool.
>
> It's perhaps worth mentioning that we had to perform disaster recovery steps 
> [1] on this cluster last year after a network issue, so we're not sure 
> whether this large omap object is a result of those previous recovery 
> processes or whether it's completely unrelated.
>
> Ceph version: 12.2.8
> osd_objectstore: Bluestore
> RHEL 7.5
> Kernel: 4.4.135-1.el7.elrepo.x86_64
>
> We have set: "mds_bal_fragment_size_max": "50" (Default 10)
>
> $ ceph health detail
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
> 1 large objects found in pool 'hpcfs_metadata'
> Search the cluster log for 'Large omap object found' for more details.
>
> # Find pg with large omap object
> $ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk '{print 
> $1}'`; do echo -n "$i: "; ceph pg $i query |grep num_large_omap_objects | 
> head -1 | awk '{print $2}'; done | grep ": 1"
> 20.103: 1
>
> # OSD log entry showing relevant object
> osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] Large omap 
> object found. Object: 20:c0ce80d4:::4.:head Key count: 24698995 Size 
> (bytes): 11410935690
>
> # Confirm default warning thresholds for large omap object
> $ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
> "osd_deep_scrub_large_omap_object_key_threshold": "200",
> "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",
>
> # Dump keys/values of problematic object, creates 46.65GB file
> $ rados -p hpcfs_metadata listomapvals '4.' > 
> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> $ ll /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> -rw-r--r-- 1 root root 50089561860 Mar  4 18:16 
> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
>
> # Confirm key count matches OSD log entry warning
> $ rados -p hpcfs_metadata listomapkeys '4.' | wc -l
> 24698995
>
> # The omap keys/vals for that object appear to have been unchanged/static for 
> at least a couple of months:
> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> fd00ceb68607b477626178b2d81fefb926460107  
> /tmp/hpcfs_metadata_object_omap_vals_4._20190304
> $ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4__20190108
> fd00ceb68607b477626178b2d81fefb926460107  
> /tmp/hpcfs_metadata_object_omap_vals_4__20190108
>
> I haven't gone through all 24698995 keys yet, but while most appear to relate 
> to objects in the hpcfs_data CephFS data pool, there are a significant number 
> of keys (rough guess 25%) that don't appear to have corresponding objects in 
> the hpcfs_data pool.
>
> Any assistance or pointers to troubleshoot further would be very much 
> appreciated.
>
> Thanks,
> Dylan
>
> [1] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS - large omap object

2019-03-18 Thread Dylan McCulloch
Hi all,

We have a large omap object warning on one of our Ceph clusters.
The only reports I've seen regarding the "large omap objects" warning from 
other users were related to RGW bucket sharding, however we do not have RGW 
configured on this cluster.
The large omap object ~10GB resides in a CephFS metadata pool.

It's perhaps worth mentioning that we had to perform disaster recovery steps 
[1] on this cluster last year after a network issue, so we're not sure whether 
this large omap object is a result of those previous recovery processes or 
whether it's completely unrelated.

Ceph version: 12.2.8
osd_objectstore: Bluestore
RHEL 7.5
Kernel: 4.4.135-1.el7.elrepo.x86_64

We have set: "mds_bal_fragment_size_max": "50" (Default 10)

$ ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'hpcfs_metadata'
Search the cluster log for 'Large omap object found' for more details.

# Find pg with large omap object
$ for i in `ceph pg ls-by-pool hpcfs_metadata | tail -n +2 | awk '{print $1}'`; 
do echo -n "$i: "; ceph pg $i query |grep num_large_omap_objects | head -1 | 
awk '{print $2}'; done | grep ": 1"
20.103: 1

# OSD log entry showing relevant object
osd.143 osd.143 172.26.74.23:6826/3428317 1380 : cluster [WRN] Large omap 
object found. Object: 20:c0ce80d4:::4.:head Key count: 24698995 Size 
(bytes): 11410935690

# Confirm default warning thresholds for large omap object
$ ceph daemon osd.143 config show | grep osd_deep_scrub_large_omap
"osd_deep_scrub_large_omap_object_key_threshold": "200",
"osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",

# Dump keys/values of problematic object, creates 46.65GB file
$ rados -p hpcfs_metadata listomapvals '4.' > 
/tmp/hpcfs_metadata_object_omap_vals_4._20190304
$ ll /tmp/hpcfs_metadata_object_omap_vals_4._20190304
-rw-r--r-- 1 root root 50089561860 Mar  4 18:16 
/tmp/hpcfs_metadata_object_omap_vals_4._20190304

# Confirm key count matches OSD log entry warning
$ rados -p hpcfs_metadata listomapkeys '4.' | wc -l
24698995

# The omap keys/vals for that object appear to have been unchanged/static for 
at least a couple of months:
$ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4._20190304
fd00ceb68607b477626178b2d81fefb926460107  
/tmp/hpcfs_metadata_object_omap_vals_4._20190304
$ sha1sum /tmp/hpcfs_metadata_object_omap_vals_4__20190108
fd00ceb68607b477626178b2d81fefb926460107  
/tmp/hpcfs_metadata_object_omap_vals_4__20190108

I haven't gone through all 24698995 keys yet, but while most appear to relate 
to objects in the hpcfs_data CephFS data pool, there are a significant number 
of keys (rough guess 25%) that don't appear to have corresponding objects in 
the hpcfs_data pool.

Any assistance or pointers to troubleshoot further would be very much 
appreciated.

Thanks,
Dylan

[1] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Add to the slack channel

2019-03-18 Thread Mateusz Skała
Hi,
If possible, please also add my account. 
Regards
Mateusz
> Message written by Trilok Agarwal  on
> 15.03.2019, at 18:40:
> 
> Hi
> Can somebody over here invite me to join the ceph slack channel
> 
> Thanks
> TRILOK
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com