Re: [ceph-users] Rotating Cephx Keys

2018-07-09 Thread Konstantin Shalygin

On 07/10/2018 11:41 AM, Graeme Gillies wrote:

I think you are missing the part where if you update a key in ceph, in
the space between that and when you update it in ovirt-engine any new
connections to ceph by any ovirt nodes will fail


Yes, this should only take seconds. And actually the startup will not fail 
for the user, because if the first start attempt fails, ovirt-engine will keep 
retrying on other hosts.

If you start 1000+ VMs per second this will not work for you.


  (as the key they have
ovirt side no longer matches what you have in ovirt-engine and all the
ovirt nodes).
oVirt hosts do not have any cephx keys; they don't know anything about 
Ceph. The keys are always pushed by ovirt-engine to libvirt in the domain XML.






k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rotating Cephx Keys

2018-07-09 Thread Graeme Gillies



On 10/07/18 14:37, Konstantin Shalygin wrote:
>> If I
>> want to rotate the keys for that I can simply do that ceph cluster side,
>> but then I also need to do that on the client side (in my case virtual
>> machine hypervisors). DUring this window (which might be tiny with
>> decent tooling, but still non-zero) my clients can't do new connections
>> to the ceph cluster, which I assume will cause issues.
>
> It's depends on orchestrator. For example, oVirt maintain cephx keys
> by ovirt-engine. So, if key is changed we need to update key in oVirt,
> after this - every new client will use new key = zero downtime. Simple
> k,v storage.

I think you are missing the part where, if you update a key in ceph, then in
the window between that and when you update it in ovirt-engine, any new
connections to ceph by any ovirt nodes will fail (as the key they have on the
ovirt side no longer matches what the ceph cluster now expects).

That's the problem (unless I am misunderstanding what you are saying)

>
> Don't know how it looks in pure OpenStack, but oVirt hosts not need
> ceph.conf, keys always pushed by ovirt-engine.
>
>
>
> k
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rotating Cephx Keys

2018-07-09 Thread Konstantin Shalygin

If I
want to rotate the keys for that I can simply do that ceph cluster side,
but then I also need to do that on the client side (in my case virtual
machine hypervisors). DUring this window (which might be tiny with
decent tooling, but still non-zero) my clients can't do new connections
to the ceph cluster, which I assume will cause issues.


It depends on the orchestrator. For example, oVirt maintains cephx keys in 
ovirt-engine. So, if a key is changed we only need to update the key in oVirt; 
after that, every new client will use the new key = zero downtime. It's a 
simple k,v store.


I don't know how it looks in pure OpenStack, but oVirt hosts don't need a 
ceph.conf; the keys are always pushed by ovirt-engine.




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure coding RBD pool for OpenStack Glance, Nova and Cinder

2018-07-09 Thread Konstantin Shalygin

Does someone have used EC
pools with OpenStack in production ?



By chance, I found that link :
https://www.reddit.com/r/ceph/comments/72yc9m/ceph_openstack_with_ec/


Yes, this is a good post.

My configuration is:

cinder.conf:



[erasure-rbd-hdd]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = erasure-rbd-hdd
rbd_pool = erasure_rbd_meta
rbd_user = cinder_erasure_hdd
rbd_ceph_conf = /etc/ceph/ceph.conf


ceph.conf:



[client.cinder_erasure_hdd]
    rbd default data pool = erasure_rbd_data



Keep in mind that your minimum client version is Luminous.

So the trick is: tell everyone (Cinder, etc.) that your pool is "erasure_rbd_meta"; 
rbd clients will find the data pool "erasure_rbd_data" automatically.
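
For reference, the pools behind this are created in the usual way. A rough
sketch (the profile name, k/m values and pg counts below are just examples,
not my exact settings):

ceph osd erasure-code-profile set ec-hdd k=4 m=2 crush-device-class=hdd
ceph osd pool create erasure_rbd_data 512 512 erasure ec-hdd
ceph osd pool set erasure_rbd_data allow_ec_overwrites true
ceph osd pool create erasure_rbd_meta 64 64 replicated
ceph osd pool application enable erasure_rbd_data rbd
ceph osd pool application enable erasure_rbd_meta rbd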




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Recovering from no quorum (2/3 monitors down) via 1 good monitor

2018-07-09 Thread Syahrul Sazli Shaharir
Hi,

I am running Proxmox pve-5.1, with Ceph Luminous 12.2.4 as storage. I
had been running 3 monitors, up until an abrupt power outage, which left
2 monitors down and unable to start, and 1 monitor up but without quorum.

I tried extracting monmap from the good monitor and injecting it into
the other two, but got different errors for each:-

1. mon.mail1

# ceph-mon -i mail1 --inject-monmap /tmp/monmap
2018-07-10 11:29:03.562840 7f7d82845f80 -1 abort: Corruption: Bad
table magic number*** Caught signal (Aborted) **
 in thread 7f7d82845f80 thread_name:ceph-mon

 ceph version 12.2.4 (4832b6f0acade977670a37c20ff5dbe69e727416)
luminous (stable)
 1: (()+0x9439e4) [0x5652655669e4]
 2: (()+0x110c0) [0x7f7d81bfe0c0]
 3: (gsignal()+0xcf) [0x7f7d7ee12fff]
 4: (abort()+0x16a) [0x7f7d7ee1442a]
 5: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::list*)+0x2f9)
[0x5652650a2eb9]
 6: (main()+0x1377) [0x565264ec3c57]
 7: (__libc_start_main()+0xf1) [0x7f7d7ee002e1]
 8: (_start()+0x2a) [0x565264f5954a]
2018-07-10 11:29:03.563721 7f7d82845f80 -1 *** Caught signal (Aborted) **
 in thread 7f7d82845f80 thread_name:ceph-mon

2. mon.mail2

# ceph-mon -i mail2 --inject-monmap /tmp/monmap
2018-07-10 11:18:07.536097 7f161e2e3f80 -1 rocksdb: Corruption: Can't
access /065339.sst: IO error:
/var/lib/ceph/mon/ceph-mail2/store.db/065339.sst: No such file or
directory
Can't access /065337.sst: IO error:
/var/lib/ceph/mon/ceph-mail2/store.db/065337.sst: No such file or
directory

2018-07-10 11:18:07.536106 7f161e2e3f80 -1 error opening mon data
directory at '/var/lib/ceph/mon/ceph-mail2': (22) Invalid argument

Is there any other way I can recover, other than rebuilding the monitor store
from the OSDs?
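
(For reference, my understanding of that OSD-based rebuild, which I was hoping
to avoid, is roughly the following; paths and the keyring are placeholders:)

# with all ceph-osd daemons stopped, run for every OSD on every host,
# carrying /tmp/mon-store along from host to host:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$ID \
    --op update-mon-db --mon-store-path /tmp/mon-store
# then rebuild the store with a keyring holding the mon. and client.admin keys:
ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring
# and finally replace /var/lib/ceph/mon/ceph-mail1/store.db with the rebuilt one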

Thanks.

-- 
--sazli
Syahrul Sazli Shaharir 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-09 Thread Linh Vu
While we're on this topic, could someone please explain to me what 
`cephfs-table-tool all reset inode` does?


Does it only reset what the MDS has in its cache, so that after starting up again 
the MDS will read in a new inode range from the metadata pool?


If so, does it mean *before* we run `cephfs-table-tool take_inos`, we must run 
`cephfs-table-tool all reset inode`?


Cheers,

Linh


From: ceph-users  on behalf of Wido den 
Hollander 
Sent: Saturday, 7 July 2018 12:26:15 AM
To: John Spray
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors



On 07/06/2018 01:47 PM, John Spray wrote:
> On Fri, Jul 6, 2018 at 12:19 PM Wido den Hollander  wrote:
>>
>>
>>
>> On 07/05/2018 03:36 PM, John Spray wrote:
>>> On Thu, Jul 5, 2018 at 1:42 PM Dennis Kramer (DBS)  wrote:

 Hi list,

 I have a serious problem now... I think.

 One of my users just informed me that a file he created (.doc file) has
 a different content then before. It looks like the file's inode is
 completely wrong and points to the wrong object. I myself have found
 another file with the same symptoms. I'm afraid my (production) FS is
 corrupt now, unless there is a possibility to fix the inodes.
>>>
>>> You can probably get back to a state with some valid metadata, but it
>>> might not necessarily be the metadata the user was expecting (e.g. if
>>> two files are claiming the same inode number, one of them's is
>>> probably going to get deleted).
>>>
 Timeline of what happend:

 Last week I upgraded our Ceph Jewel to Luminous.
 This went without any problem.

 I already had 5 MDS available and went with the Multi-MDS feature and
 enabled it. The seemed to work okay, but after a while my MDS went
 beserk and went flapping (crashed -> replay -> rejoin -> crashed)

 The only way to fix this and get the FS back online was the disaster
 recovery procedure:

 cephfs-journal-tool event recover_dentries summary
 ceph fs set cephfs cluster_down true
 cephfs-table-tool all reset session
 cephfs-table-tool all reset inode
 cephfs-journal-tool --rank=cephfs:0 journal reset
 ceph mds fail 0
 ceph fs reset cephfs --yes-i-really-mean-it
>>>
>>> My concern with this procedure is that the recover_dentries and
>>> journal reset only happened on rank 0, whereas the other 4 MDS ranks
>>> would have retained lots of content in their journals.  I wonder if we
>>> should be adding some more multi-mds aware checks to these tools, to
>>> warn the user when they're only acting on particular ranks (a
>>> reasonable person might assume that recover_dentries with no args is
>>> operating on all ranks, not just 0).  Created
>>> http://tracker.ceph.com/issues/24780 to track improving the default
>>> behaviour.
>>>
 Restarted the MDS and I was back online. Shortly after I was getting a
 lot of "loaded dup inode". In the meanwhile the MDS kept crashing. It
 looks like it had trouble creating new inodes. Right before the crash
 it mostly complained something like:

 -2> 2018-07-05 05:05:01.614290 7f8f8574b700  4 mds.0.server
 handle_client_request client_request(client.324932014:1434 create
 #0x1360346/pyfiles.txt 2018-07-05 05:05:01.607458 caller_uid=0,
 caller_gid=0{}) v2
 -1> 2018-07-05 05:05:01.614320 7f8f7e73d700  5 mds.0.log
 _submit_thread 24100753876035~1070 : EOpen [metablob 0x1360346, 1
 dirs], 1 open files
  0> 2018-07-05 05:05:01.661155 7f8f8574b700 -1 /build/ceph-
 12.2.5/src/mds/MDCache.cc: In function 'void
 MDCache::add_inode(CInode*)' thread 7f8f8574b700 time 2018-07-05
 05:05:01.615123
 /build/ceph-12.2.5/src/mds/MDCache.cc: 262: FAILED assert(!p)

 I also tried to counter the create inode crash by doing the following:

 cephfs-journal-tool event recover_dentries
 cephfs-journal-tool journal reset
 cephfs-table-tool all reset session
 cephfs-table-tool all reset inode
 cephfs-table-tool all take_inos 10
>>>
>>> This procedure is recovering some metadata from the journal into the
>>> main tree, then resetting everything, but duplicate inodes are
>>> happening when the main tree has multiple dentries containing inodes
>>> using the same inode number.
>>>
>>> What you need is something that scans through all the metadata,
>>> notices which entries point to the a duplicate, and snips out those
>>> dentries.  I'm not quite up to date on the latest CephFS forward scrub
>>> bits, so hopefully someone else can chime in to comment on whether we
>>> have the tooling for this already.
>>
>> But to prevent these crashes setting take_inos to a higher number is a
>> good choice, right? You'll loose inodes numbers, but you will have it
>> running without duplicate (new inodes).
>
> Yes -- that's the motivation to skipping inode numbers after some
> damage (but 

Re: [ceph-users] Rotating Cephx Keys

2018-07-09 Thread Gregory Farnum
On Mon, Jul 9, 2018 at 4:57 PM Graeme Gillies  wrote:

> On 10/07/18 04:40, Gregory Farnum wrote:
>
> On Sun, Jul 8, 2018 at 6:06 PM Graeme Gillies  wrote:
>
>> Hi,
>>
>> I was wondering how (if?) people handle rotating cephx keys while
>> keeping cluster up/available.
>>
>> Part of meeting compliance standards such as PCI DSS is making sure that
>> data encryption keys and security credentials are rotated regularly and
>> during other key points (such as notable staff turnover).
>>
>> We are currently looking at using Ceph as a storage solution and was
>> wondering how people handle rotating cephx keys (at the very least, the
>> admin and client.$user keys) while causing minimal/no downtime to ceph
>> or the clients.
>>
>> My understanding is that if you change the keys stored in the ceph kv db
>> then any existing sessions should still continue to work, but any new
>> ones (say, a hypervisor establishing new connections to osds for a new
>> vm volume) will fail until the key on the client side is also updated.
>>
>> I attempted to set two keys against the same client to see if I can have
>> an "overlap" period of new and old keys before rotating out the old key,
>> but it seems that ceph only has the concept of 1 key per user.
>>
>> Any hints, advice, or any information on how to achieve this would be
>> much appreciated.
>>
>>
> This isn't something I've seen come up much. Your understanding sounds
> correct to me, so as a naive developer I'd assume you just change the key
> in the monitors and distribute the new one to whoever should have it.
> There's a small window in which the admin with the old key can't do
> anything, but presumably you can coordinate around that?
>
> The big issue I'm aware of is that orchestration systems like OpenStack
> don't always do a good job supporting those changes — eg, I think it embeds
> some keys in its database descriptor for the rbd volume? :/
> -Greg
>
>
> I think the biggest problem with simply changing keys is that lets say I
> have a client connecting to ceph using a ceph.client.user account. If I
> want to rotate the keys for that I can simply do that ceph cluster side,
> but then I also need to do that on the client side (in my case virtual
> machine hypervisors). DUring this window (which might be tiny with decent
> tooling, but still non-zero) my clients can't do new connections to the
> ceph cluster, which I assume will cause issues.
>

Well, it will depend on what they're doing, right? Most Ceph clients just
set up a monitor connection and then maintain it until they shut down.
Although I guess if they need to establish a session to a *new* monitor and
you've changed the key, that might not go well — the client libraries
really aren't set up for that. Hrm.


>
> I do wonder if an RFE to allow ceph auth to accept multiple keys for
> client would be accepted? That way I would add my new key to the ceph auth
> (so clients can authenticate with either key), then rotate it out on my
> hypervisors, then remove the old key from ceph auth when done.
>

PRs are always welcome! This would probably take some work, though — the
"AuthMonitor" storage would need a pretty serious change, and the client
libraries extended to deal with changing them online as well.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rotating Cephx Keys

2018-07-09 Thread Graeme Gillies
On 10/07/18 04:40, Gregory Farnum wrote:
> On Sun, Jul 8, 2018 at 6:06 PM Graeme Gillies  > wrote:
>
> Hi,
>
> I was wondering how (if?) people handle rotating cephx keys while
> keeping cluster up/available.
>
> Part of meeting compliance standards such as PCI DSS is making
> sure that
> data encryption keys and security credentials are rotated
> regularly and
> during other key points (such as notable staff turnover).
>
> We are currently looking at using Ceph as a storage solution and was
> wondering how people handle rotating cephx keys (at the very
> least, the
> admin and client.$user keys) while causing minimal/no downtime to ceph
> or the clients.
>
> My understanding is that if you change the keys stored in the ceph
> kv db
> then any existing sessions should still continue to work, but any new
> ones (say, a hypervisor establishing new connections to osds for a new
> vm volume) will fail until the key on the client side is also updated.
>
> I attempted to set two keys against the same client to see if I
> can have
> an "overlap" period of new and old keys before rotating out the
> old key,
> but it seems that ceph only has the concept of 1 key per user.
>
> Any hints, advice, or any information on how to achieve this would be
> much appreciated.
>
>
> This isn't something I've seen come up much. Your understanding sounds
> correct to me, so as a naive developer I'd assume you just change the
> key in the monitors and distribute the new one to whoever should have
> it. There's a small window in which the admin with the old key can't
> do anything, but presumably you can coordinate around that?
>
> The big issue I'm aware of is that orchestration systems like
> OpenStack don't always do a good job supporting those changes — eg, I
> think it embeds some keys in its database descriptor for the rbd
> volume? :/
> -Greg

I think the biggest problem with simply changing keys is this: let's say I
have a client connecting to ceph using a ceph.client.user account. If I
want to rotate the key for that I can simply do that on the ceph cluster side,
but then I also need to do it on the client side (in my case virtual
machine hypervisors). During this window (which might be tiny with
decent tooling, but still non-zero) my clients can't make new connections
to the ceph cluster, which I assume will cause issues.

I do wonder if an RFE to allow ceph auth to accept multiple keys for
client would be accepted? That way I would add my new key to the ceph
auth (so clients can authenticate with either key), then rotate it out
on my hypervisors, then remove the old key from ceph auth when done.

As for OpenStack, when I used it I was pretty sure it simply used the
ceph.conf on the nova-compute hosts to connect to ceph (at least for
libvirt); however, that doesn't mean other hypervisors or implementations
don't do something different.

Regards,

Graeme

>
>  
>
> Thanks in advance,
>
> Graeme
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-09 Thread Linh Vu
We're affected by something like this right now (the dup inode causing the MDS to 
crash via assert(!p) in the add_inode(CInode*) function).

In terms of behaviour, shouldn't the MDS simply skip to the next available 
free inode in the event of a dup, rather than crashing the entire FS because of one 
file? I'm probably missing something, but that seems like a no-brainer choice 
between the two.

From: ceph-users  on behalf of Wido den 
Hollander 
Sent: Saturday, 7 July 2018 12:26:15 AM
To: John Spray
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors



On 07/06/2018 01:47 PM, John Spray wrote:
> On Fri, Jul 6, 2018 at 12:19 PM Wido den Hollander  wrote:
>>
>>
>>
>> On 07/05/2018 03:36 PM, John Spray wrote:
>>> On Thu, Jul 5, 2018 at 1:42 PM Dennis Kramer (DBS)  wrote:

 Hi list,

 I have a serious problem now... I think.

 One of my users just informed me that a file he created (.doc file) has
 a different content then before. It looks like the file's inode is
 completely wrong and points to the wrong object. I myself have found
 another file with the same symptoms. I'm afraid my (production) FS is
 corrupt now, unless there is a possibility to fix the inodes.
>>>
>>> You can probably get back to a state with some valid metadata, but it
>>> might not necessarily be the metadata the user was expecting (e.g. if
>>> two files are claiming the same inode number, one of them's is
>>> probably going to get deleted).
>>>
 Timeline of what happend:

 Last week I upgraded our Ceph Jewel to Luminous.
 This went without any problem.

 I already had 5 MDS available and went with the Multi-MDS feature and
 enabled it. The seemed to work okay, but after a while my MDS went
 beserk and went flapping (crashed -> replay -> rejoin -> crashed)

 The only way to fix this and get the FS back online was the disaster
 recovery procedure:

 cephfs-journal-tool event recover_dentries summary
 ceph fs set cephfs cluster_down true
 cephfs-table-tool all reset session
 cephfs-table-tool all reset inode
 cephfs-journal-tool --rank=cephfs:0 journal reset
 ceph mds fail 0
 ceph fs reset cephfs --yes-i-really-mean-it
>>>
>>> My concern with this procedure is that the recover_dentries and
>>> journal reset only happened on rank 0, whereas the other 4 MDS ranks
>>> would have retained lots of content in their journals.  I wonder if we
>>> should be adding some more multi-mds aware checks to these tools, to
>>> warn the user when they're only acting on particular ranks (a
>>> reasonable person might assume that recover_dentries with no args is
>>> operating on all ranks, not just 0).  Created
>>> http://tracker.ceph.com/issues/24780 to track improving the default
>>> behaviour.
>>>
 Restarted the MDS and I was back online. Shortly after I was getting a
 lot of "loaded dup inode". In the meanwhile the MDS kept crashing. It
 looks like it had trouble creating new inodes. Right before the crash
 it mostly complained something like:

 -2> 2018-07-05 05:05:01.614290 7f8f8574b700  4 mds.0.server
 handle_client_request client_request(client.324932014:1434 create
 #0x1360346/pyfiles.txt 2018-07-05 05:05:01.607458 caller_uid=0,
 caller_gid=0{}) v2
 -1> 2018-07-05 05:05:01.614320 7f8f7e73d700  5 mds.0.log
 _submit_thread 24100753876035~1070 : EOpen [metablob 0x1360346, 1
 dirs], 1 open files
  0> 2018-07-05 05:05:01.661155 7f8f8574b700 -1 /build/ceph-
 12.2.5/src/mds/MDCache.cc: In function 'void
 MDCache::add_inode(CInode*)' thread 7f8f8574b700 time 2018-07-05
 05:05:01.615123
 /build/ceph-12.2.5/src/mds/MDCache.cc: 262: FAILED assert(!p)

 I also tried to counter the create inode crash by doing the following:

 cephfs-journal-tool event recover_dentries
 cephfs-journal-tool journal reset
 cephfs-table-tool all reset session
 cephfs-table-tool all reset inode
 cephfs-table-tool all take_inos 10
>>>
>>> This procedure is recovering some metadata from the journal into the
>>> main tree, then resetting everything, but duplicate inodes are
>>> happening when the main tree has multiple dentries containing inodes
>>> using the same inode number.
>>>
>>> What you need is something that scans through all the metadata,
>>> notices which entries point to the a duplicate, and snips out those
>>> dentries.  I'm not quite up to date on the latest CephFS forward scrub
>>> bits, so hopefully someone else can chime in to comment on whether we
>>> have the tooling for this already.
>>
>> But to prevent these crashes setting take_inos to a higher number is a
>> good choice, right? You'll loose inodes numbers, but you will have it
>> running without duplicate (new inodes).
>
> Yes -- that's the motivation to skipping inode numbers after some
> damage (but it won't 

Re: [ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Jason Dillaman
Is the link-local address of "fe80::219:99ff:fe9e:3a86%eth0" at least
present on the client computer you used? I would have expected the OSD to
determine the client address, so it's odd that it was able to get a
link-local address.

On Mon, Jul 9, 2018 at 3:43 PM Kevin Olbrich  wrote:

> 2018-07-09 21:25 GMT+02:00 Jason Dillaman :
>
>> BTW -- are you running Ceph on a one-node computer? I thought IPv6
>> addresses starting w/ fe80 were link-local addresses which would probably
>> explain why an interface scope id was appended. The current IPv6 address
>> parser stops reading after it encounters a non hex, colon character [1].
>>
>
> No, this is a compute machine attached to the storage vlan where I
> previously had also local disks.
>
>
>>
>>
>> On Mon, Jul 9, 2018 at 3:14 PM Jason Dillaman 
>> wrote:
>>
>>> Hmm ... it looks like there is a bug w/ RBD locks and IPv6 addresses
>>> since it is failing to parse the address as valid. Perhaps it's barfing on
>>> the "%eth0" scope id suffix within the address.
>>>
>>> On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich  wrote:
>>>
 Hi!

 I tried to convert an qcow2 file to rbd and set the wrong pool.
 Immediately I stopped the transfer but the image is stuck locked:

 Previusly when that happened, I was able to remove the image after 30
 secs.

 [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
 There is 1 exclusive lock on this image.
 Locker ID  Address

 client.1195723 auto 93921602220416
 [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089

 [root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
 93921602220416" client.1195723
 rbd: releasing lock failed: (22) Invalid argument
 2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
 address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
 2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
 client: (22) Invalid argument

 The image is not in use anywhere!

 How can I force removal of all locks for this image?

 Kind regards,
 Kevin
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>
>>>
>>> --
>>> Jason
>>>
>>
>> [1] https://github.com/ceph/ceph/blob/master/src/msg/msg_types.cc#L108
>>
>> --
>> Jason
>>
>
>

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Kevin Olbrich
2018-07-09 21:25 GMT+02:00 Jason Dillaman :

> BTW -- are you running Ceph on a one-node computer? I thought IPv6
> addresses starting w/ fe80 were link-local addresses which would probably
> explain why an interface scope id was appended. The current IPv6 address
> parser stops reading after it encounters a non hex, colon character [1].
>

No, this is a compute machine attached to the storage vlan where I
previously had also local disks.


>
>
> On Mon, Jul 9, 2018 at 3:14 PM Jason Dillaman  wrote:
>
>> Hmm ... it looks like there is a bug w/ RBD locks and IPv6 addresses
>> since it is failing to parse the address as valid. Perhaps it's barfing on
>> the "%eth0" scope id suffix within the address.
>>
>> On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich  wrote:
>>
>>> Hi!
>>>
>>> I tried to convert an qcow2 file to rbd and set the wrong pool.
>>> Immediately I stopped the transfer but the image is stuck locked:
>>>
>>> Previusly when that happened, I was able to remove the image after 30
>>> secs.
>>>
>>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
>>> There is 1 exclusive lock on this image.
>>> Locker ID  Address
>>>
>>> client.1195723 auto 93921602220416 [fe80::219:99ff:fe9e:3a86%
>>> eth0]:0/1200385089
>>>
>>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
>>> 93921602220416" client.1195723
>>> rbd: releasing lock failed: (22) Invalid argument
>>> 2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
>>> address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
>>> 2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
>>> client: (22) Invalid argument
>>>
>>> The image is not in use anywhere!
>>>
>>> How can I force removal of all locks for this image?
>>>
>>> Kind regards,
>>> Kevin
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>> --
>> Jason
>>
>
> [1] https://github.com/ceph/ceph/blob/master/src/msg/msg_types.cc#L108
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Jason Dillaman
BTW -- are you running Ceph on a one-node computer? I thought IPv6
addresses starting w/ fe80 were link-local addresses which would probably
explain why an interface scope id was appended. The current IPv6 address
parser stops reading after it encounters a non hex, colon character [1].

On Mon, Jul 9, 2018 at 3:14 PM Jason Dillaman  wrote:

> Hmm ... it looks like there is a bug w/ RBD locks and IPv6 addresses since
> it is failing to parse the address as valid. Perhaps it's barfing on the
> "%eth0" scope id suffix within the address.
>
> On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich  wrote:
>
>> Hi!
>>
>> I tried to convert an qcow2 file to rbd and set the wrong pool.
>> Immediately I stopped the transfer but the image is stuck locked:
>>
>> Previusly when that happened, I was able to remove the image after 30
>> secs.
>>
>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
>> There is 1 exclusive lock on this image.
>> Locker ID  Address
>>
>> client.1195723 auto 93921602220416
>> [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
>>
>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
>> 93921602220416" client.1195723
>> rbd: releasing lock failed: (22) Invalid argument
>> 2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
>> address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
>> 2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
>> client: (22) Invalid argument
>>
>> The image is not in use anywhere!
>>
>> How can I force removal of all locks for this image?
>>
>> Kind regards,
>> Kevin
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> Jason
>

[1] https://github.com/ceph/ceph/blob/master/src/msg/msg_types.cc#L108

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Kevin Olbrich
Is it possible to force-remove the lock or the image?

Kevin

2018-07-09 21:14 GMT+02:00 Jason Dillaman :

> Hmm ... it looks like there is a bug w/ RBD locks and IPv6 addresses since
> it is failing to parse the address as valid. Perhaps it's barfing on the
> "%eth0" scope id suffix within the address.
>
> On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich  wrote:
>
>> Hi!
>>
>> I tried to convert an qcow2 file to rbd and set the wrong pool.
>> Immediately I stopped the transfer but the image is stuck locked:
>>
>> Previusly when that happened, I was able to remove the image after 30
>> secs.
>>
>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
>> There is 1 exclusive lock on this image.
>> Locker ID  Address
>>
>> client.1195723 auto 93921602220416 [fe80::219:99ff:fe9e:3a86%
>> eth0]:0/1200385089
>>
>> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
>> 93921602220416" client.1195723
>> rbd: releasing lock failed: (22) Invalid argument
>> 2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
>> address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
>> 2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
>> client: (22) Invalid argument
>>
>> The image is not in use anywhere!
>>
>> How can I force removal of all locks for this image?
>>
>> Kind regards,
>> Kevin
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Jason Dillaman
Hmm ... it looks like there is a bug w/ RBD locks and IPv6 addresses since
it is failing to parse the address as valid. Perhaps it's barfing on the
"%eth0" scope id suffix within the address.

On Mon, Jul 9, 2018 at 2:47 PM Kevin Olbrich  wrote:

> Hi!
>
> I tried to convert an qcow2 file to rbd and set the wrong pool.
> Immediately I stopped the transfer but the image is stuck locked:
>
> Previusly when that happened, I was able to remove the image after 30 secs.
>
> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
> There is 1 exclusive lock on this image.
> Locker ID  Address
>
> client.1195723 auto 93921602220416
> [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
>
> [root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
> 93921602220416" client.1195723
> rbd: releasing lock failed: (22) Invalid argument
> 2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
> address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
> 2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
> client: (22) Invalid argument
>
> The image is not in use anywhere!
>
> How can I force removal of all locks for this image?
>
> Kind regards,
> Kevin
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous ceph-fuse with quotas breaks 'mount' and 'df'

2018-07-09 Thread Chad William Seys

Hi Greg,

Am i reading this right that you've got a 1-*byte* quota but have 
gigabytes of data in the tree?
I have no idea what that might do to the system, but it wouldn't totally 
surprise me if that was messing something up. Since <10KB definitely 
rounds towards 0...


Yeah, that directory only contains subdirectories, and those subdirs 
have separate quotas set.


E.g. getfattr --only-values -n ceph.quota.max_bytes 
/srv/smb/mcdermott-group/

2
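
(The per-subdirectory quotas themselves are set through the usual xattr
interface, i.e. something along the lines of:

setfattr -n ceph.quota.max_bytes -v <bytes> /srv/smb/<subdir>

with <bytes> being whatever limit that group gets.)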

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd lock remove unable to parse address

2018-07-09 Thread Kevin Olbrich
Hi!

I tried to convert a qcow2 file to rbd and set the wrong pool.
I immediately stopped the transfer, but the image is stuck locked:

Previously when that happened, I was able to remove the image after 30 secs.

[root@vm2003 images1]# rbd -p rbd_vms_hdd lock list fpi_server02
There is 1 exclusive lock on this image.
Locker ID  Address

client.1195723 auto 93921602220416
[fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089

[root@vm2003 images1]# rbd -p rbd_vms_hdd lock rm fpi_server02 "auto
93921602220416" client.1195723
rbd: releasing lock failed: (22) Invalid argument
2018-07-09 20:45:19.080543 7f6c2c267d40 -1 librados: unable to parse
address [fe80::219:99ff:fe9e:3a86%eth0]:0/1200385089
2018-07-09 20:45:19.080555 7f6c2c267d40 -1 librbd: unable to blacklist
client: (22) Invalid argument

The image is not in use anywhere!

How can I force removal of all locks for this image?

Kind regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rotating Cephx Keys

2018-07-09 Thread Gregory Farnum
On Sun, Jul 8, 2018 at 6:06 PM Graeme Gillies  wrote:

> Hi,
>
> I was wondering how (if?) people handle rotating cephx keys while
> keeping cluster up/available.
>
> Part of meeting compliance standards such as PCI DSS is making sure that
> data encryption keys and security credentials are rotated regularly and
> during other key points (such as notable staff turnover).
>
> We are currently looking at using Ceph as a storage solution and was
> wondering how people handle rotating cephx keys (at the very least, the
> admin and client.$user keys) while causing minimal/no downtime to ceph
> or the clients.
>
> My understanding is that if you change the keys stored in the ceph kv db
> then any existing sessions should still continue to work, but any new
> ones (say, a hypervisor establishing new connections to osds for a new
> vm volume) will fail until the key on the client side is also updated.
>
> I attempted to set two keys against the same client to see if I can have
> an "overlap" period of new and old keys before rotating out the old key,
> but it seems that ceph only has the concept of 1 key per user.
>
> Any hints, advice, or any information on how to achieve this would be
> much appreciated.
>
>
This isn't something I've seen come up much. Your understanding sounds
correct to me, so as a naive developer I'd assume you just change the key
in the monitors and distribute the new one to whoever should have it.
There's a small window in which the admin with the old key can't do
anything, but presumably you can coordinate around that?
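
Mechanically I'd expect it to look something like this (an untested sketch off
the top of my head; it assumes "ceph auth import" overwrites an existing entry,
that --gen-key leaves the caps in the exported keyring untouched, and that
client.foo is just a placeholder):

ceph auth get client.foo -o /tmp/rotate.keyring            # export entry, caps included
ceph-authtool /tmp/rotate.keyring -n client.foo --gen-key  # generate a new secret
ceph auth import -i /tmp/rotate.keyring                    # push the new key to the mons
ceph auth get client.foo                                   # distribute this to the clients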

The big issue I'm aware of is that orchestration systems like OpenStack
don't always do a good job supporting those changes — eg, I think it embeds
some keys in its database descriptor for the rbd volume? :/
-Greg



> Thanks in advance,
>
> Graeme
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow response while "tail -f" on cephfs

2018-07-09 Thread Gregory Farnum
On Mon, Jul 9, 2018 at 12:46 AM Zhou Choury  wrote:

> Hi all
>  We mounted cephfs with ceph-fuse on two machines. We found that if a
> process is writing a log file on node A, running "tail -f" on it from node B is
> quite slow. The mds server also complains with messages like:
> > 2018-07-09 15:10:35.516602 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : 2 slow requests, 1 included below; oldest blocked for > 8.421551
> secs
> > 2018-07-09 15:10:35.516608 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : slow request 5.866578 seconds old, received at 2018-07-09
> 15:10:29.649997: client_request(client.3777088:24818 getattr pAsLsXsFs
> #108a41e 2018-07-09 15:10:03.842337) currently failed to rdlock, waiting
> > 2018-07-09 15:10:48.517367 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : 2 slow requests, 2 included below; oldest blocked for > 5.860196
> secs
> > 2018-07-09 15:10:48.517373 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : slow request 5.860196 seconds old, received at 2018-07-09
> 15:10:42.657139: client_request(client.3777088:24826 getattr pAsLsXsFs
> #108a41e 2018-07-09 15:10:16.843077) currently failed to rdlock, waiting
> > 2018-07-09 15:10:48.517375 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : slow request 5.622276 seconds old, received at 2018-07-09
> 15:10:42.895059: client_request(client.3775872:34689 lookup
> #1011127/choury-test 2018-07-09 15:10:42.894941) currently failed to
> rdlock, waiting
> > 2018-07-09 15:10:51.517530 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : 2 slow requests, 1 included below; oldest blocked for > 8.622448
> secs
> > 2018-07-09 15:10:51.517536 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : slow request 5.399846 seconds old, received at 2018-07-09
> 15:10:46.117661: client_request(client.3775872:34690 lookup
> #1011127/choury-test 2018-07-09 15:10:46.117586) currently failed to
> rdlock, waiting
> > 2018-07-09 15:10:53.517640 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : 2 slow requests, 1 included below; oldest blocked for > 10.622560
> secs
> > 2018-07-09 15:10:53.517646 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : slow request 10.622560 seconds old, received at 2018-07-09
> 15:10:42.895059: client_request(client.3775872:34689 lookup
> #1011127/choury-test 2018-07-09 15:10:42.894941) currently failed to
> rdlock, waiting
> > 2018-07-09 15:10:56.517819 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : 1 slow requests, 1 included below; oldest blocked for > 10.400132
> secs
> > 2018-07-09 15:10:56.517826 7f32fa0c2700  0 log_channel(cluster) log
> [WRN] : slow request 10.400132 seconds old, received at 2018-07-09
> 15:10:46.117661: client_request(client.3775872:34690 lookup
> #1011127/choury-test 2018-07-09 15:10:46.117586) currently failed to
> rdlock, waiting
>
> We reproduced this problem in the test cluster. there's only two
> processed(on two machines) access the cluster, the writer, and tail.
> The test writer code:
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <stdbool.h>
> > #include <unistd.h>
> >
> > const char *s =
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
> > int main(int argc, char** argv){
> > FILE *f=fopen(argv[1],"ab+");
> > if(f==NULL){
> > printf("cant'to open destination file\n");
> > return 0;
> > }
> > int line = 0;
> > while(true){
> > char buff[1024]={0};
> > for(int i = 0; i< 200; i++){
> > buff[i] = s[rand()%62];
> > }
> > fprintf(f, "%d: %s\n", line++, buff);
> > fflush(f);
> > sleep(30);
> > }
> > fclose(f);
> > return 0;
> > }
> The version of ceph is 10.2.10.
> How can I reduce the latency and slow requests?
>

This is inherent to CephFS' strict consistency. When you issue a "tail" on
node B, you are forcing node A to flush out all of its data, then stop
writing, *then* the MDS returns the end of the file data to node B — then
node B drops its "capabilities" on the file and node A gets to start
writing again.

You can do things like this on some NFS systems because they don't actually
guarantee you'll see the results of the latest write to a file, but Ceph
doesn't work that way. There are some programmatic options for reducing
consistency, but I'm not sure any of them can be used to speed up a user
task like you have here.
-Greg


> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD for bluestore

2018-07-09 Thread Webert de Souza Lima
bluestore doesn't have a journal like filestore does, but there is the
WAL (Write-Ahead Log), which looks like a journal but works differently.
You can (or must, depending on your needs) have SSDs to serve this WAL (and
the RocksDB metadata).
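
For example, creating a bluestore OSD with its data on an HDD and the DB and
WAL on SSD/NVMe partitions would look roughly like this (device names are just
an illustration, adjust to your hardware):

ceph-volume lvm create --bluestore --data /dev/sdb \
    --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2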

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*


On Sun, Jul 8, 2018 at 11:58 AM Satish Patel  wrote:

> Folks,
>
> I'm just reading from multiple post that bluestore doesn't need SSD
> journel, is that true?
>
> I'm planning to build 5 node cluster so depending on that I purchase SSD
> for journel.
>
> If it does require SSD for journel then what would be the best vendor and
> model which last long? Any recommendation
>
> Sent from my iPhone
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous ceph-fuse with quotas breaks 'mount' and 'df'

2018-07-09 Thread Gregory Farnum
On Fri, Jul 6, 2018 at 10:30 AM Chad William Seys 
wrote:

> Hi all,
>I'm having a problem that when I mount cephfs with a quota in the
> root mount point, no ceph-fuse appears in 'mount' and df reports:
>
> Filesystem 1K-blocks  Used Available Use% Mounted on
> ceph-fuse  0 0 0- /srv/smb
>
> If I 'ls' I see the expected files:
> # ls -alh
> total 6.0K
> drwxrwxr-x+ 1 root smbadmin  18G Jul  5 17:06 .
> drwxr-xr-x  5 root smbadmin 4.0K Jun 16  2017 ..
> drwxrwx---+ 1 smbadmin smbadmin 3.0G Jan 18 10:50 bigfix-relay-cache
> drwxrwxr-x+ 1 smbadmin smbadmin  15G Jul  6 11:51 instr_files
> drwxrwx---+ 1 smbadmin smbadmin0 Jul  6 11:50 mcdermott-group
>
> Quotas are being used:
> getfattr --only-values -n ceph.quota.max_bytes /srv/smb
> 1
>

Am i reading this right that you've got a 1-*byte* quota but have
gigabytes of data in the tree?
I have no idea what that might do to the system, but it wouldn't totally
surprise me if that was messing something up. Since <10KB definitely rounds
towards 0...
-Greg


>
> Turning off the quota at the mountpoint allows df and mount to work
> correctly.
>
> I'm running 12.2.4 on the servers and 12.2.5 on the client.
>
> Is there a bug report for this?
> Thanks!
> Chad.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] iSCSI SCST not working with Kernel 4.17.5

2018-07-09 Thread Steven Vacaroaia
Hi,

Just wondering if any of you have managed to use SCST with kernel 4.17.5?
Apparently SCST works only with kernel 3.10.

Alternatively, is ceph-iscsi running properly with the newest kernel?

Installation and configuration went well, but accessing the LUN fails with
the following error:

".. kernel: [2643]: dev_vdisk: vdev_flush_end_io:7071:***ERROR***: FLUSH
bio failed:"

I am using latest SCST ( svn revision 7403 )

I am reluctant to downgrade kernel
Any advice / suggestions will be truly appreciated

Thanks
Steven
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mimic 13.2.1 release date

2018-07-09 Thread Wido den Hollander
Hi,

Is there a release date for Mimic 13.2.1 yet?

There are a few issues which currently make deploying with Mimic 13.2.0
a bit difficult, for example:

- https://tracker.ceph.com/issues/24423
- https://github.com/ceph/ceph/pull/22393

Especially the first one makes it difficult.

13.2.1 would be very welcome with these fixes in there.

Is there an ETA for this version yet?

Wido
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] client.bootstrap-osd authentication error - which keyring

2018-07-09 Thread Paul Emmerich
2018-07-09 16:10 GMT+02:00 Thomas Roth :

> Thanks, but doesn't work.
>
> It is always the subcommand
> /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>
> (also 'ceph ... osd tree -i - osd new NEWID')
>
> which fails with client.bootstrap-osd authentication error
>
> Of course, 'ceph osd tree' works just fine.
>
>
> Must be something I have missed when upgrading from Jewel. In fact, there
> where no
> boostrap-xxx/keyrings anywhere, I just have my /etc/ceph/ceph.mon.keyring
> which seems to have managed
> the magic before.
>

You have to put it there manually if you don't use any deployment tool like
ceph-deploy.

If you have a normal admin keyring on that server anyways, then you can
just run

ceph auth get client.bootstrap-osd >
/var/lib/ceph/bootstrap-osd/ceph.keyring


Paul


>
> Cheers,
> Thommas
>
> On 07/06/2018 09:36 PM, Paul Emmerich wrote:
> > Hi,
> >
> > both ceph-disk and ceph-volume need a keyring in the file
> >
> > /var/lib/ceph/bootstrap-osd/ceph.keyring
> >
> > The key should look like this:
> >
> > [client.bootstrap-osd]
> > key = 
> > caps mon = "allow profile bootstrap-osd"
> >
> >
> > Paul
> >
> >
> > 2018-07-06 16:47 GMT+02:00 Thomas Roth :
> >
> >> Hi all,
> >>
> >> I wonder which is the correct key to create/recreate an additional OSD
> >> with 12.2.5.
> >>
> >> Following
> >> http://docs.ceph.com/docs/master/rados/operations/
> >> bluestore-migration/#convert-existing-osds, I took
> >> one of my old OSD out of the cluster, but failed subsequently recreating
> >> it as a BlueStor OSD.
> >>
> >> I tried "ceph-volume" at first, now got one step further using
> "ceph-disk"
> >> with
> >> "ceph-disk prepare --bluestore /dev/sdh", which completed, I assume
> >> successfully.
> >>
> >> However, "ceph-disk activate" fails with basically the same error as
> >> "ceph-volume" before,
> >>
> >>
> >> ~# ceph-disk activate /dev/sdh1
> >> command_with_stdin: 2018-07-06 16:23:18.677429 7f905de45700  0 librados:
> >> client.bootstrap-osd
> >> authentication error (1) Operation not permitted
> >> [errno 1] error connecting to the cluster
> >>
> >>
> >>
> >> Now this test cluster was created under Jewel, where I created OSDs by
> >> "ceph-osd -i $ID --mkfs --mkkey --osd-uuid $UUID"
> >> and
> >> "ceph auth add osd.#{ID} osd 'allow *' mon 'allow profile osd' -i
> >> /var/lib/ceph/osd/ceph-#{ID}/keyring"
> >>
> >> This did not produce any "/var/lib/ceph/bootstrap-osd/ceph.keyring",
> but
> >> I found them on my mon hosts.
> >> "ceph-volume" and "ceph-disk" go looking for that file, so I put it
> there,
> >> to no avail.
> >>
> >>
> >>
> >> Btw, the target server has still several "up" and "in" OSDs running, so
> >> this is not a question of
> >> network or general authentication issues.
> >>
> >> Cheers,
> >> Thomas
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> >
> >
>
> --
> 
> Thomas Roth
> Department: Informationstechnologie
> Location: SB3 2.291
> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1
> 64291 Darmstadt
> www.gsi.de
>
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Darmstadt
> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>
> Geschäftsführung: Ursula Weyrich
> Professor Dr. Paolo Giubellino
> Jörg Blaurock
>
> Vorsitzende des Aufsichtsrates: St Dr. Georg Schütte
> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
>
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] client.bootstrap-osd authentication error - which keyring

2018-07-09 Thread Alfredo Deza
On Mon, Jul 9, 2018 at 10:10 AM, Thomas Roth  wrote:
> Thanks, but doesn't work.
>
> It is always the subcommand
> /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>
> (also 'ceph ... osd tree -i - osd new NEWID')
>
> which fails with client.bootstrap-osd authentication error
>
> Of course, 'ceph osd tree' works just fine.
>
>
> Must be something I have missed when upgrading from Jewel. In fact, there 
> where no
> boostrap-xxx/keyrings anywhere, I just have my /etc/ceph/ceph.mon.keyring 
> which seems to have managed
> the magic before.

Provisioning tools like ceph-deploy will add the bootstrap keyring if it is
not present. Please note that /etc/ceph/*.keyring is not sufficient
here; the bootstrap key itself must exist.
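
If you are not using a provisioning tool, putting it there by hand should look
roughly like this (assuming the client.bootstrap-osd entity already exists in
the cluster, which it normally does):

ceph auth get-or-create client.bootstrap-osd mon 'allow profile bootstrap-osd' \
    -o /var/lib/ceph/bootstrap-osd/ceph.keyring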


>
> Cheers,
> Thommas
>
> On 07/06/2018 09:36 PM, Paul Emmerich wrote:
>> Hi,
>>
>> both ceph-disk and ceph-volume need a keyring in the file
>>
>> /var/lib/ceph/bootstrap-osd/ceph.keyring
>>
>> The key should look like this:
>>
>> [client.bootstrap-osd]
>> key = 
>> caps mon = "allow profile bootstrap-osd"
>>
>>
>> Paul
>>
>>
>> 2018-07-06 16:47 GMT+02:00 Thomas Roth :
>>
>>> Hi all,
>>>
>>> I wonder which is the correct key to create/recreate an additional OSD
>>> with 12.2.5.
>>>
>>> Following
>>> http://docs.ceph.com/docs/master/rados/operations/
>>> bluestore-migration/#convert-existing-osds, I took
>>> one of my old OSD out of the cluster, but failed subsequently recreating
>>> it as a BlueStor OSD.
>>>
>>> I tried "ceph-volume" at first, now got one step further using "ceph-disk"
>>> with
>>> "ceph-disk prepare --bluestore /dev/sdh", which completed, I assume
>>> successfully.
>>>
>>> However, "ceph-disk activate" fails with basically the same error as
>>> "ceph-volume" before,
>>>
>>>
>>> ~# ceph-disk activate /dev/sdh1
>>> command_with_stdin: 2018-07-06 16:23:18.677429 7f905de45700  0 librados:
>>> client.bootstrap-osd
>>> authentication error (1) Operation not permitted
>>> [errno 1] error connecting to the cluster
>>>
>>>
>>>
>>> Now this test cluster was created under Jewel, where I created OSDs by
>>> "ceph-osd -i $ID --mkfs --mkkey --osd-uuid $UUID"
>>> and
>>> "ceph auth add osd.#{ID} osd 'allow *' mon 'allow profile osd' -i
>>> /var/lib/ceph/osd/ceph-#{ID}/keyring"
>>>
>>> This did not produce any "/var/lib/ceph/bootstrap-osd/ceph.keyring", but
>>> I found them on my mon hosts.
>>> "ceph-volume" and "ceph-disk" go looking for that file, so I put it there,
>>> to no avail.
>>>
>>>
>>>
>>> Btw, the target server has still several "up" and "in" OSDs running, so
>>> this is not a question of
>>> network or general authentication issues.
>>>
>>> Cheers,
>>> Thomas
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>>
>
> --
> 
> Thomas Roth
> Department: Informationstechnologie
> Location: SB3 2.291
> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1
> 64291 Darmstadt
> www.gsi.de
>
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Darmstadt
> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>
> Geschäftsführung: Ursula Weyrich
> Professor Dr. Paolo Giubellino
> Jörg Blaurock
>
> Vorsitzende des Aufsichtsrates: St Dr. Georg Schütte
> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] client.bootstrap-osd authentication error - which keyring

2018-07-09 Thread Thomas Roth
Thanks, but doesn't work.

It is always the subcommand
/usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json

(also 'ceph ... osd tree -i - osd new NEWID')

which fails with client.bootstrap-osd authentication error

Of course, 'ceph osd tree' works just fine.


It must be something I have missed when upgrading from Jewel. In fact, there were no
bootstrap-xxx/keyrings anywhere; I just have my /etc/ceph/ceph.mon.keyring, which
seems to have managed the magic before.

Cheers,
Thommas

On 07/06/2018 09:36 PM, Paul Emmerich wrote:
> Hi,
> 
> both ceph-disk and ceph-volume need a keyring in the file
> 
> /var/lib/ceph/bootstrap-osd/ceph.keyring
> 
> The key should look like this:
> 
> [client.bootstrap-osd]
> key = 
> caps mon = "allow profile bootstrap-osd"
> 
> 
> Paul
> 
> 
> 2018-07-06 16:47 GMT+02:00 Thomas Roth :
> 
>> Hi all,
>>
>> I wonder which is the correct key to create/recreate an additional OSD
>> with 12.2.5.
>>
>> Following
>> http://docs.ceph.com/docs/master/rados/operations/
>> bluestore-migration/#convert-existing-osds, I took
>> one of my old OSD out of the cluster, but failed subsequently recreating
>> it as a BlueStor OSD.
>>
>> I tried "ceph-volume" at first, now got one step further using "ceph-disk"
>> with
>> "ceph-disk prepare --bluestore /dev/sdh", which completed, I assume
>> successfully.
>>
>> However, "ceph-disk activate" fails with basically the same error as
>> "ceph-volume" before,
>>
>>
>> ~# ceph-disk activate /dev/sdh1
>> command_with_stdin: 2018-07-06 16:23:18.677429 7f905de45700  0 librados:
>> client.bootstrap-osd
>> authentication error (1) Operation not permitted
>> [errno 1] error connecting to the cluster
>>
>>
>>
>> Now this test cluster was created under Jewel, where I created OSDs by
>> "ceph-osd -i $ID --mkfs --mkkey --osd-uuid $UUID"
>> and
>> "ceph auth add osd.#{ID} osd 'allow *' mon 'allow profile osd' -i
>> /var/lib/ceph/osd/ceph-#{ID}/keyring"
>>
>> This did not produce any "/var/lib/ceph/bootstrap-osd/ceph.keyring", but
>> I found them on my mon hosts.
>> "ceph-volume" and "ceph-disk" go looking for that file, so I put it there,
>> to no avail.
>>
>>
>>
>> Btw, the target server has still several "up" and "in" OSDs running, so
>> this is not a question of
>> network or general authentication issues.
>>
>> Cheers,
>> Thomas
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 
> 
> 

-- 

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528

Geschäftsführung: Ursula Weyrich
Professor Dr. Paolo Giubellino
Jörg Blaurock

Vorsitzende des Aufsichtsrates: St Dr. Georg Schütte
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FYI - Mimic segv in OSD

2018-07-09 Thread Steffen Winther Sørensen


> On 9 Jul 2018, at 15.49, John Spray  wrote:
> 
> On Mon, Jul 9, 2018 at 2:37 PM Steffen Winther Sørensen
>  wrote:
>> 
>> Dunno if this has been seen before so just for info, 1 in 24 OSD just did 
>> this:
>> 
>> Jul  9 15:13:35 n4 ceph-osd: *** Caught signal (Segmentation fault) **
>> Jul  9 15:13:35 n4 ceph-osd: in thread 7ff209282700 thread_name:msgr-worker-2
>> Jul  9 15:13:35 n4 kernel: msgr-worker-2[4697]: segfault at 0 ip 
>> 7ff21002f42b sp 7ff20927b9c0 error 4 in 
>> libtcmalloc.so.4.4.5[7ff210008000+46000]
>> Jul  9 15:13:36 n4 systemd: ceph-osd@2.service: main process exited, 
>> code=killed, status=11/SEGV
>> Jul  9 15:13:36 n4 systemd: Unit ceph-osd@2.service entered failed state.
>> Jul  9 15:13:36 n4 systemd: ceph-osd@2.service failed.
> 
> Hopefully there's a stack trace above those lines in your OSD log?
Nope just what looks like relaunch events:
...
2018-07-09 14:45:17.158 7ff1ef75f700  0 log_channel(cluster) log [DBG] : 3.a0 
scrub starts
2018-07-09 14:45:17.185 7ff1ef75f700  0 log_channel(cluster) log [DBG] : 3.a0 
scrub ok
2018-07-09 15:07:04.398 7ff1eef5e700  0 log_channel(cluster) log [DBG] : 4.b0 
scrub starts
2018-07-09 15:07:04.412 7ff1eef5e700  0 log_channel(cluster) log [DBG] : 4.b0 
scrub ok
2018-07-09 15:13:56.365 7f31359411c0  0 set uid:gid to 167:167 (ceph:ceph)
2018-07-09 15:13:56.365 7f31359411c0  0 ceph version 13.2.0 
(79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable), process (unknown), 
pid 604987
2018-07-09 15:13:56.365 7f31359411c0  0 pidfile_write: ignore empty --pid-file
2018-07-09 15:13:56.442 7f31359411c0  0 load: jerasure load: lrc load: isa
2018-07-09 15:13:56.442 7f31359411c0  1 bdev create path 
/var/lib/ceph/osd/ceph-2/block type kernel
2018-07-09 15:13:56.442 7f31359411c0  1 bdev(0x559559628000 
/var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2018-07-09 15:13:56.443 7f31359411c0  1 bdev(0x559559628000 
/var/lib/ceph/osd/ceph-2/block) open size 146775474176 (0x222c80, 137 GiB) 
block_size 4096 (4 KiB) rotational
2018-07-09 15:13:56.444 7f31359411c0  1 bluestore(/var/lib/ceph/osd/ceph-2) 
_set_cache_sizes cache_size 1073741824 meta 0.5 kv 0.5 data 0
2018-07-09 15:13:56.444 7f31359411c0  1 bdev(0x559559628000 
/var/lib/ceph/osd/ceph-2/block) close
2018-07-09 15:13:56.700 7f31359411c0  1 bluestore(/var/lib/ceph/osd/ceph-2) 
_mount path /var/lib/ceph/osd/ceph-2
2018-07-09 15:13:56.700 7f31359411c0  1 bdev create path 
/var/lib/ceph/osd/ceph-2/block type kernel
2018-07-09 15:13:56.700 7f31359411c0  1 bdev(0x559559628000 
/var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2018-07-09 15:13:56.700 7f31359411c0  1 bdev(0x559559628000 
/var/lib/ceph/osd/ceph-2/block) open size 146775474176 (0x222c80, 137 GiB) 
block_size 4096 (4 KiB) rotational
2018-07-09 15:13:56.701 7f31359411c0  1 bluestore(/var/lib/ceph/osd/ceph-2) 
_set_cache_sizes cache_size 1073741824 meta 0.5 kv 0.5 data 0
2018-07-09 15:13:56.701 7f31359411c0  1 bdev create path 
/var/lib/ceph/osd/ceph-2/block type kernel
2018-07-09 15:13:56.701 7f31359411c0  1 bdev(0x559559628a80 
/var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2018-07-09 15:13:56.701 7f31359411c0  1 bdev(0x559559628a80 
/var/lib/ceph/osd/ceph-2/block) open size 146775474176 (0x222c80, 137 GiB) 
block_size 4096 (4 KiB) rotational
2018-07-09 15:13:56.701 7f31359411c0  1 bluefs add_block_device bdev 1 path 
/var/lib/ceph/osd/ceph-2/block size 137 GiB
2018-07-09 15:13:56.701 7f31359411c0  1 bluefs mount
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
compaction_readahead_size = 2097152
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option compression = 
kNoCompression
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
max_write_buffer_number = 4
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
min_write_buffer_number_to_merge = 1
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
recycle_log_file_num = 4
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
writable_file_max_buffer_size = 0
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option write_buffer_size = 
268435456
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
compaction_readahead_size = 2097152
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option compression = 
kNoCompression
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
max_write_buffer_number = 4
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
min_write_buffer_number_to_merge = 1
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
recycle_log_file_num = 4
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option 
writable_file_max_buffer_size = 0
2018-07-09 15:13:56.757 7f31359411c0  0  set rocksdb option write_buffer_size = 
268435456
2018-07-09 15:13:56.764 7f31359411c0  1 rocksdb: do_open column families: 
[default]
2018-07-09 15:13:56.764 7f31359411c0  4 rocksdb: RocksDB version: 5.13.0

2018-07-09 

Re: [ceph-users] FYI - Mimic segv in OSD

2018-07-09 Thread John Spray
On Mon, Jul 9, 2018 at 2:37 PM Steffen Winther Sørensen
 wrote:
>
> Dunno if this has been seen before so just for info, 1 in 24 OSD just did 
> this:
>
> Jul  9 15:13:35 n4 ceph-osd: *** Caught signal (Segmentation fault) **
> Jul  9 15:13:35 n4 ceph-osd: in thread 7ff209282700 thread_name:msgr-worker-2
> Jul  9 15:13:35 n4 kernel: msgr-worker-2[4697]: segfault at 0 ip 
> 7ff21002f42b sp 7ff20927b9c0 error 4 in 
> libtcmalloc.so.4.4.5[7ff210008000+46000]
> Jul  9 15:13:36 n4 systemd: ceph-osd@2.service: main process exited, 
> code=killed, status=11/SEGV
> Jul  9 15:13:36 n4 systemd: Unit ceph-osd@2.service entered failed state.
> Jul  9 15:13:36 n4 systemd: ceph-osd@2.service failed.

Hopefully there's a stack trace above those lines in your OSD log?

John
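
For reference, the crash dump usually lands in the OSD's own log file; a quick
way to pull it out, assuming default log paths and that osd.2 is the daemon
that crashed:

# print some context around the crash banner in the OSD log
grep -B 5 -A 40 'Caught signal' /var/log/ceph/ceph-osd.2.log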

> # ceph --version
> ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
> # cat /etc/centos-release
> CentOS Linux release 7.5.1804 (Core)
> # uname -r
> 3.10.0-862.3.3.el7.x86_64
>
> /Steffen
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous ceph-fuse with quotas breaks 'mount' and 'df'

2018-07-09 Thread John Spray
On Fri, Jul 6, 2018 at 6:30 PM Chad William Seys
 wrote:
>
> Hi all,
>I'm having a problem that when I mount cephfs with a quota in the
> root mount point, no ceph-fuse appears in 'mount' and df reports:
>
> Filesystem 1K-blocks  Used Available Use% Mounted on
> ceph-fuse  0 0 0- /srv/smb
>
> If I 'ls' I see the expected files:
> # ls -alh
> total 6.0K
> drwxrwxr-x+ 1 root smbadmin  18G Jul  5 17:06 .
> drwxr-xr-x  5 root smbadmin 4.0K Jun 16  2017 ..
> drwxrwx---+ 1 smbadmin smbadmin 3.0G Jan 18 10:50 bigfix-relay-cache
> drwxrwxr-x+ 1 smbadmin smbadmin  15G Jul  6 11:51 instr_files
> drwxrwx---+ 1 smbadmin smbadmin0 Jul  6 11:50 mcdermott-group
>
> Quotas are being used:
> getfattr --only-values -n ceph.quota.max_bytes /srv/smb
> 1
>
> Turning off the quota at the mountpoint allows df and mount to work
> correctly.
>
> I'm running 12.2.4 on the servers and 12.2.5 on the client.

That's pretty weird, not something I recall seeing before.  When
quotas are in use, Ceph is implementing the same statfs() hook to
report usage to the OS, but it's doing a getattr() call to the MDS
inside that function.  I wonder if something is going slowly, and
perhaps the OS is ignoring filesystems that don't return promptly, to
avoid hanging "df" on a misbehaving filesystem?

I'd debug this by setting "debug ms = 1", and finding the client's log
in /var/log/ceph.

John
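
A minimal sketch of that, assuming the default config path and the /srv/smb
mount from the original report. On the client host, add to /etc/ceph/ceph.conf:

[client]
debug ms = 1
debug client = 20    # optional, adds client-side detail to the same log

then remount so the options take effect:

umount /srv/smb && ceph-fuse /srv/smb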

>
> Is there a bug report for this?
> Thanks!
> Chad.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] FYI - Mimic segv in OSD

2018-07-09 Thread Steffen Winther Sørensen
Dunno if this has been seen before so just for info, 1 in 24 OSD just did this:

Jul  9 15:13:35 n4 ceph-osd: *** Caught signal (Segmentation fault) **
Jul  9 15:13:35 n4 ceph-osd: in thread 7ff209282700 thread_name:msgr-worker-2
Jul  9 15:13:35 n4 kernel: msgr-worker-2[4697]: segfault at 0 ip 
7ff21002f42b sp 7ff20927b9c0 error 4 in 
libtcmalloc.so.4.4.5[7ff210008000+46000]
Jul  9 15:13:36 n4 systemd: ceph-osd@2.service: main process exited, 
code=killed, status=11/SEGV
Jul  9 15:13:36 n4 systemd: Unit ceph-osd@2.service entered failed state.
Jul  9 15:13:36 n4 systemd: ceph-osd@2.service failed.

# ceph --version
ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
# cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core) 
# uname -r
3.10.0-862.3.3.el7.x86_64

/Steffen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Different write pools for RGW objects

2018-07-09 Thread Adrian Nicolae

Hi,

I was wondering if I can have different destination pools for the S3
objects uploaded to Ceph via RGW based on the object's size.


For example :

- smaller S3 objects (let's say smaller than 1MB) should go to a 
replicated pool


- medium and big objects should go to an EC pool

Is there any way to do that? I couldn't find such a configuration option
in the CRUSH rules docs or the RGW docs.


Thanks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fuse vs kernel client

2018-07-09 Thread Jake Grimmett
Hi Manuel,

My own experience is that the cephfs kernel client is significantly faster
than fuse; however, the fuse client is generally more reliable.

If you need the extra speed of the kernel client on CentOS, it may be
worth using the mainline (ml) kernel, as this gives you much more up-to-date
cephfs support.

If I understand your benchmarking correctly, the two machines you are using
to test cephfs-fuse vs cephfs-kernel are very different, and so are your
test parameters. To get an accurate comparison, why not mount both fuse
and kernel on the same machine, and then re-run your tests?

best,

Jake
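
A sketch of the side-by-side setup, assuming a monitor reachable as mon1 and
an admin secret in /etc/ceph/admin.secret on the client (both names are
illustrative):

sudo mkdir -p /mnt/cephfs-kernel /mnt/cephfs-fuse
# kernel client
sudo mount -t ceph mon1:6789:/ /mnt/cephfs-kernel -o name=admin,secretfile=/etc/ceph/admin.secret
# fuse client, using /etc/ceph/ceph.conf and the admin keyring
sudo ceph-fuse /mnt/cephfs-fuse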


On 09/07/18 09:18, Manuel Sopena Ballesteros wrote:
> Dear ceph community,
> 
>  
> 
> I just installed ceph luminous in a small NVMe cluster for testing and I
> tested 2 clients:
> 
>  
> 
> Client 1:
> 
> VM running centos 7
> 
> Ceph client: kernel
> 
> # cpus: 4
> 
> RAM: 16GB
> 
>  
> 
> Fio test
> 
>  
> 
> # sudo fio --name=xx --filename=/mnt/mycephfs/test.file3 --filesize=100G
> --iodepth=1 --rw=write --bs=4M --numjobs=2 --group_reporting
> 
> xx: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T)
> 4096KiB-4096KiB, ioengine=psync, iodepth=1
> 
> ...
> 
> fio-3.1
> 
> Starting 2 processes
> 
> xx: Laying out IO file (1 file / 102400MiB)
> 
> Jobs: 1 (f=1): [_(1),W(1)][100.0%][r=0KiB/s,w=2325MiB/s][r=0,w=581
> IOPS][eta 00m:00s]
> 
> xx: (groupid=0, jobs=2): err= 0: pid=24290: Mon Jul  9 17:54:57 2018
> 
>   write: IOPS=550, BW=2203MiB/s (2310MB/s)(200GiB/92946msec)
> 
>     clat (usec): min=946, max=464990, avg=3519.59, stdev=7031.97
> 
>  lat (usec): min=1010, max=465091, avg=3612.53, stdev=7035.85
> 
>     clat percentiles (usec):
> 
>  |  1.00th=[  1188],  5.00th=[  1631], 10.00th=[  2245], 20.00th=[ 
> 2409],
> 
>  | 30.00th=[  2540], 40.00th=[  2671], 50.00th=[  2802], 60.00th=[ 
> 2966],
> 
>  | 70.00th=[  3195], 80.00th=[  3654], 90.00th=[  5080], 95.00th=[ 
> 6521],
> 
>  | 99.00th=[ 11469], 99.50th=[ 16450], 99.90th=[100140],
> 99.95th=[149947],
> 
>  | 99.99th=[291505]
> 
>    bw (  MiB/s): min=  224, max= 2064, per=50.01%, avg=1101.97,
> stdev=205.16, samples=369
> 
>    iops    : min=   56, max=  516, avg=275.27, stdev=51.29, samples=369
> 
>   lat (usec)   : 1000=0.01%
> 
>   lat (msec)   : 2=7.89%, 4=75.24%, 10=15.42%, 20=1.09%, 50=0.22%
> 
>   lat (msec)   : 100=0.04%, 250=0.08%, 500=0.02%
> 
>   cpu  : usr=2.31%, sys=76.39%, ctx=15743, majf=1, minf=55
> 
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>>=64=0.0%
> 
>  submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>=64=0.0%
> 
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>=64=0.0%
> 
>  issued rwt: total=0,51200,0, short=0,0,0, dropped=0,0,0
> 
>  latency   : target=0, window=0, percentile=100.00%, depth=1
> 
>  
> 
> Run status group 0 (all jobs):
> 
>   WRITE: bw=2203MiB/s (2310MB/s), 2203MiB/s-2203MiB/s
> (2310MB/s-2310MB/s), io=200GiB (215GB), run=92946-92946msec
> 
>  
> 
> Client 2:
> 
> Physical machine running Ubuntu xenial
> 
> Ceph client: FUSE
> 
> # cpus: 56
> 
> RAM: 512
> 
>  
> 
> Fio test
> 
>  
> 
> $ sudo fio --name=xx --filename=/mnt/cephfs/test.file2 --filesize=5G
> --iodepth=1 --rw=write --bs=4M --numjobs=1 --group_reporting
> 
> xx: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=sync, iodepth=1
> 
> fio-2.2.10
> 
> Starting 1 process
> 
> xx: Laying out IO file(s) (1 file(s) / 5120MB)
> 
> Jobs: 1 (f=1): [W(1)] [91.7% done] [0KB/580.0MB/0KB /s] [0/145/0 iops]
> [eta 00m:01s]
> 
> xx: (groupid=0, jobs=1): err= 0: pid=6065: Mon Jul  9 17:44:02 2018
> 
>   write: io=5120.0MB, bw=497144KB/s, iops=121, runt= 10546msec
> 
>     clat (msec): min=3, max=159, avg= 7.94, stdev= 4.81
> 
>  lat (msec): min=3, max=159, avg= 8.08, stdev= 4.82
> 
>     clat percentiles (msec):
> 
>  |  1.00th=[    4],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
> 
>  | 30.00th=[    7], 40.00th=[    8], 50.00th=[    8], 60.00th=[    9],
> 
>  | 70.00th=[    9], 80.00th=[   10], 90.00th=[   11], 95.00th=[   11],
> 
>  | 99.00th=[   12], 99.50th=[   13], 99.90th=[   61], 99.95th=[  159],
> 
>  | 99.99th=[  159]
> 
>     bw (KB  /s): min=185448, max=726183, per=97.08%, avg=482611.80,
> stdev=118874.09
> 
>     lat (msec) : 4=1.64%, 10=88.20%, 20=10.00%, 100=0.08%, 250=0.08%
> 
>   cpu  : usr=1.63%, sys=34.44%, ctx=42266, majf=0, minf=1586
> 
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>>=64=0.0%
> 
>  submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>=64=0.0%
> 
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>>=64=0.0%
> 
>  issued    : total=r=0/w=1280/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> 
>  latency   : target=0, window=0, percentile=100.00%, depth=1
> 
>  
> 
> Run status group 0 (all jobs):
> 
>   WRITE: io=5120.0MB, aggrb=497143KB/s, minb=497143KB/s,
> maxb=497143KB/s, mint=10546msec, maxt=10546msec
> 
>  
> 

[ceph-users] radosgw frontend : civetweb vs fastcgi

2018-07-09 Thread Will Zhao
Hi:
I see that civetweb still uses poll and multiple threads. Compared with
fastcgi, which one should I use? Which one has better performance?
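
For reference, the front end is chosen per RGW instance in ceph.conf; a sketch
with an illustrative instance name and values:

[client.rgw.gateway1]
rgw frontends = civetweb port=7480 num_threads=512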
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests

2018-07-09 Thread Brad Hubbard
On Mon, Jul 9, 2018 at 5:28 PM, Benjamin Naber
 wrote:
> Hi @all,
>
> The problem seems to be solved after downgrading from kernel 4.17.2 to
> 3.10.0-862.
> Has anyone else had issues with newer kernels and OSD nodes?

I'd suggest you pursue that with whoever supports the kernel
exhibiting the problem.
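
For reference, ops dumps like the one quoted further down come from the OSD
admin socket; a sketch, run on the host of the affected OSD (replace 0 with
the id of the OSD reporting blocked requests):

ceph daemon osd.0 dump_ops_in_flight
ceph daemon osd.0 dump_historic_ops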

>
> kind regards
>
> Ben
>
>> Brad Hubbard  hat am 5. Juli 2018 um 01:16 geschrieben:
>>
>>
>> On Wed, Jul 4, 2018 at 6:26 PM, Benjamin Naber  
>> wrote:
>> > Hi @all,
>> >
>> > im currently in testing for setup an production environment based on the 
>> > following OSD Nodes:
>> >
>> > CEPH Version: luminous 12.2.5
>> >
>> > 5x OSD Nodes with following specs:
>> >
>> > - 8 Core Intel Xeon 2,0 GHZ
>> >
>> > - 96GB Ram
>> >
>> > - 10x 1,92 TB Intel DC S4500 connectet via SATA
>> >
>> > - 4x 10 Gbit NIC 2 bonded via LACP for Backend Network 2 bonded via LACP 
>> > for Backend Network.
>> >
>> > if i run some fio benchmark via a VM that ist running on a RBD Device on a 
>> > KVM testing Host. the cluster always runs into slow request warning. Also 
>> > the performance goes heavy down.
>> >
>> > If i dump the osd that stucks, i get the following output:
>> >
>> > {
>> > "ops": [
>> > {
>> > "description": "osd_op(client.141944.0:359346834 13.1da 
>> > 13:5b8b7fd3:::rbd_data.170a3238e1f29.00be:head [write 
>> > 2097152~1048576] snapc 0=[] ondisk+write+known_if_redirected e2755)",
>> > "initiated_at": "2018-07-04 10:00:49.475879",
>> > "age": 287.180328,
>> > "duration": 287.180355,
>> > "type_data": {
>> > "flag_point": "waiting for sub ops",
>> > "client_info": {
>> > "client": "client.141944",
>> > "client_addr": "10.111.90.1:0/3532639465",
>> > "tid": 359346834
>> > },
>> > "events": [
>> > {
>> > "time": "2018-07-04 10:00:49.475879",
>> > "event": "initiated"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.476935",
>> > "event": "queued_for_pg"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.477547",
>> > "event": "reached_pg"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.477578",
>> > "event": "started"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.477614",
>> > "event": "waiting for subops from 5,26"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.484679",
>> > "event": "op_commit"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.484681",
>> > "event": "op_applied"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.485588",
>> > "event": "sub_op_commit_rec from 5"
>> > }
>> > ]
>> > }
>> > },
>> > {
>> > "description": "osd_op(client.141944.0:359346835 13.1da 
>> > 13:5b8b7fd3:::rbd_data.170a3238e1f29.00be:head [write 
>> > 3145728~1048576] snapc 0=[] ondisk+write+known_if_redirected e2755)",
>> > "initiated_at": "2018-07-04 10:00:49.477065",
>> > "age": 287.179143,
>> > "duration": 287.179221,
>> > "type_data": {
>> > "flag_point": "waiting for sub ops",
>> > "client_info": {
>> > "client": "client.141944",
>> > "client_addr": "10.111.90.1:0/3532639465",
>> > "tid": 359346835
>> > },
>> > "events": [
>> > {
>> > "time": "2018-07-04 10:00:49.477065",
>> > "event": "initiated"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.478116",
>> > "event": "queued_for_pg"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.478178",
>> > "event": "reached_pg"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.478201",
>> > "event": "started"
>> > },
>> > {
>> > "time": "2018-07-04 10:00:49.478232",
>> > "event": "waiting for subops from 5,26"
>> > },
>> > 

Re: [ceph-users] fuse vs kernel client

2018-07-09 Thread Daniel Baumann
On 07/09/2018 10:18 AM, Manuel Sopena Ballesteros wrote:
> FUSE is supposed to run slower.

in our tests with ceph 11.2.x and 12.2.x clusters, cephfs-fuse is always
around 10 times slower than kernel cephfs.

Regards,
Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] fuse vs kernel client

2018-07-09 Thread Manuel Sopena Ballesteros
Dear ceph community,

I just installed ceph luminous in a small NVMe cluster for testing and I tested 
2 clients:

Client 1:
VM running centos 7
Ceph client: kernel
# cpus: 4
RAM: 16GB

Fio test

# sudo fio --name=xx --filename=/mnt/mycephfs/test.file3 --filesize=100G 
--iodepth=1 --rw=write --bs=4M --numjobs=2 --group_reporting
xx: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 
4096KiB-4096KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 2 processes
xx: Laying out IO file (1 file / 102400MiB)
Jobs: 1 (f=1): [_(1),W(1)][100.0%][r=0KiB/s,w=2325MiB/s][r=0,w=581 IOPS][eta 
00m:00s]
xx: (groupid=0, jobs=2): err= 0: pid=24290: Mon Jul  9 17:54:57 2018
  write: IOPS=550, BW=2203MiB/s (2310MB/s)(200GiB/92946msec)
clat (usec): min=946, max=464990, avg=3519.59, stdev=7031.97
 lat (usec): min=1010, max=465091, avg=3612.53, stdev=7035.85
clat percentiles (usec):
 |  1.00th=[  1188],  5.00th=[  1631], 10.00th=[  2245], 20.00th=[  2409],
 | 30.00th=[  2540], 40.00th=[  2671], 50.00th=[  2802], 60.00th=[  2966],
 | 70.00th=[  3195], 80.00th=[  3654], 90.00th=[  5080], 95.00th=[  6521],
 | 99.00th=[ 11469], 99.50th=[ 16450], 99.90th=[100140], 99.95th=[149947],
 | 99.99th=[291505]
   bw (  MiB/s): min=  224, max= 2064, per=50.01%, avg=1101.97, stdev=205.16, 
samples=369
   iops: min=   56, max=  516, avg=275.27, stdev=51.29, samples=369
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=7.89%, 4=75.24%, 10=15.42%, 20=1.09%, 50=0.22%
  lat (msec)   : 100=0.04%, 250=0.08%, 500=0.02%
  cpu  : usr=2.31%, sys=76.39%, ctx=15743, majf=1, minf=55
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued rwt: total=0,51200,0, short=0,0,0, dropped=0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=2203MiB/s (2310MB/s), 2203MiB/s-2203MiB/s (2310MB/s-2310MB/s), 
io=200GiB (215GB), run=92946-92946msec

Client 2:
Physical machine running Ubuntu xenial
Ceph client: FUSE
# cpus: 56
RAM: 512

Fio test

$ sudo fio --name=xx --filename=/mnt/cephfs/test.file2 --filesize=5G 
--iodepth=1 --rw=write --bs=4M --numjobs=1 --group_reporting
xx: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=sync, iodepth=1
fio-2.2.10
Starting 1 process
xx: Laying out IO file(s) (1 file(s) / 5120MB)
Jobs: 1 (f=1): [W(1)] [91.7% done] [0KB/580.0MB/0KB /s] [0/145/0 iops] [eta 
00m:01s]
xx: (groupid=0, jobs=1): err= 0: pid=6065: Mon Jul  9 17:44:02 2018
  write: io=5120.0MB, bw=497144KB/s, iops=121, runt= 10546msec
clat (msec): min=3, max=159, avg= 7.94, stdev= 4.81
 lat (msec): min=3, max=159, avg= 8.08, stdev= 4.82
clat percentiles (msec):
 |  1.00th=[4],  5.00th=[5], 10.00th=[6], 20.00th=[7],
 | 30.00th=[7], 40.00th=[8], 50.00th=[8], 60.00th=[9],
 | 70.00th=[9], 80.00th=[   10], 90.00th=[   11], 95.00th=[   11],
 | 99.00th=[   12], 99.50th=[   13], 99.90th=[   61], 99.95th=[  159],
 | 99.99th=[  159]
bw (KB  /s): min=185448, max=726183, per=97.08%, avg=482611.80, 
stdev=118874.09
lat (msec) : 4=1.64%, 10=88.20%, 20=10.00%, 100=0.08%, 250=0.08%
  cpu  : usr=1.63%, sys=34.44%, ctx=42266, majf=0, minf=1586
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=0/w=1280/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=5120.0MB, aggrb=497143KB/s, minb=497143KB/s, maxb=497143KB/s, 
mint=10546msec, maxt=10546msec

NOTE: I did an iperf test from Client 2 to ceph nodes and the bandwidth is 
~25GBs

QUESTION:
According to the documentation, FUSE is supposed to run slower. I found
client 2, which uses FUSE, to be much slower than client 1. Could someone
advise whether this is expected?

Thank you very much

Manuel Sopena Ballesteros | Big data Engineer
Garvan Institute of Medical Research
The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
T: + 61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E: 
manuel...@garvan.org.au


[ceph-users] Slow response while "tail -f" on cephfs

2018-07-09 Thread Zhou Choury
Hi all
 We mounted cephfs with ceph-fuse on two machines. We found that if a process
is writing a log file on node A, running "tail -f" on it from node B is quite
slow. The MDS server also complains, for example:
> 2018-07-09 15:10:35.516602 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 2 
> slow requests, 1 included below; oldest blocked for > 8.421551 secs
> 2018-07-09 15:10:35.516608 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 
> slow request 5.866578 seconds old, received at 2018-07-09 15:10:29.649997: 
> client_request(client.3777088:24818 getattr pAsLsXsFs #108a41e 2018-07-09 
> 15:10:03.842337) currently failed to rdlock, waiting
> 2018-07-09 15:10:48.517367 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 2 
> slow requests, 2 included below; oldest blocked for > 5.860196 secs
> 2018-07-09 15:10:48.517373 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 
> slow request 5.860196 seconds old, received at 2018-07-09 15:10:42.657139: 
> client_request(client.3777088:24826 getattr pAsLsXsFs #108a41e 2018-07-09 
> 15:10:16.843077) currently failed to rdlock, waiting
> 2018-07-09 15:10:48.517375 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 
> slow request 5.622276 seconds old, received at 2018-07-09 15:10:42.895059: 
> client_request(client.3775872:34689 lookup #1011127/choury-test 
> 2018-07-09 15:10:42.894941) currently failed to rdlock, waiting
> 2018-07-09 15:10:51.517530 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 2 
> slow requests, 1 included below; oldest blocked for > 8.622448 secs
> 2018-07-09 15:10:51.517536 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 
> slow request 5.399846 seconds old, received at 2018-07-09 15:10:46.117661: 
> client_request(client.3775872:34690 lookup #1011127/choury-test 
> 2018-07-09 15:10:46.117586) currently failed to rdlock, waiting
> 2018-07-09 15:10:53.517640 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 2 
> slow requests, 1 included below; oldest blocked for > 10.622560 secs
> 2018-07-09 15:10:53.517646 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 
> slow request 10.622560 seconds old, received at 2018-07-09 15:10:42.895059: 
> client_request(client.3775872:34689 lookup #1011127/choury-test 
> 2018-07-09 15:10:42.894941) currently failed to rdlock, waiting
> 2018-07-09 15:10:56.517819 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 1 
> slow requests, 1 included below; oldest blocked for > 10.400132 secs
> 2018-07-09 15:10:56.517826 7f32fa0c2700  0 log_channel(cluster) log [WRN] : 
> slow request 10.400132 seconds old, received at 2018-07-09 15:10:46.117661: 
> client_request(client.3775872:34690 lookup #1011127/choury-test 
> 2018-07-09 15:10:46.117586) currently failed to rdlock, waiting

We reproduced this problem in a test cluster. There are only two processes (on
two machines) accessing the cluster: the writer and tail.
The test writer code:
> #include <stdio.h>
> #include <stdlib.h>
> #include <stdbool.h>
> #include <unistd.h>
>
> const char *s =
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
> int main(int argc, char** argv){
>     FILE *f = fopen(argv[1], "ab+");
>     if(f == NULL){
>         printf("can't open destination file\n");
>         return 0;
>     }
>     int line = 0;
>     while(true){
>         /* append one 200-character random line, then flush, every 30 seconds */
>         char buff[1024] = {0};
>         for(int i = 0; i < 200; i++){
>             buff[i] = s[rand() % 62];
>         }
>         fprintf(f, "%d: %s\n", line++, buff);
>         fflush(f);
>         sleep(30);
>     }
>     fclose(f);
>     return 0;
> }
The version of ceph is 10.2.10.
How can I reduce the latency and slow requests?
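
For reference, the blocked getattr/lookup requests can also be watched live on
the active MDS through its admin socket; a sketch, assuming the daemon is
named mds.a:

ceph daemon mds.a dump_ops_in_flight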
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OT: Bad Sector Count - suggestions and experiences?

2018-07-09 Thread Götz Reinicke
Hi,

I apologize for the OT, but I hope some ceph users with bigger installations
have a lot more experience than the users reporting their home problems (NAS
with 2 disks …) that I found while googling the topic.

Luckily we have not had as many hard disk failures as some coworkers, but now,
with more and more disks in use, the topic of bad sector counts is getting more
important.

How do you react to Bad Sector Counts from SMART? At what point do you start to 
replace the disk? Any thresholds in percent or absolute numbers?

Currently I got the message from two 8 TB disks; one has 5 and the other 24 bad
sectors reported.
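
For reference, a quick way to pull the counters in question with smartmontools,
assuming /dev/sdX is the suspect disk:

# raw values of the SMART attributes usually behind "bad sector" warnings
smartctl -A /dev/sdX | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'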

Thanks for feedback and suggestions. Regards, Götz

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com