[ceph-users] Monitoring a rbd map rbd connection

2017-08-24 Thread Hauke Homburg
Hello,

I want to monitor the mapped connection between an rbd image (rbd map) and
its /dev/rbd device.

I want to do this with Icinga.

Does anyone have an idea how I can do this?

My first idea is to touch and remove a file in the mount point. I am not
sure whether that is the only thing I have to do.
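For illustration, a minimal Nagios/Icinga-style check along these lines
could look like the sketch below (image name and mount point are only
placeholders):

#!/bin/bash
# Sketch: is the image still mapped, and is the mount point writable?
IMAGE="rbdimage"          # image name as shown by "rbd showmapped"
MOUNT="/mnt/rbd"          # mount point of the mapped /dev/rbdX device

if ! rbd showmapped | grep -qw "$IMAGE"; then
    echo "CRITICAL: $IMAGE is not mapped"
    exit 2
fi

TESTFILE="$MOUNT/.icinga_check_$$"
if touch "$TESTFILE" 2>/dev/null && rm -f "$TESTFILE"; then
    echo "OK: $IMAGE is mapped and $MOUNT is writable"
    exit 0
else
    echo "CRITICAL: $IMAGE is mapped but $MOUNT is not writable"
    exit 2
fi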


Thanks for your help

Hauke

-- 
www.w3-creative.de

www.westchat.de



Re: [ceph-users] [SSD NVM FOR JOURNAL] Performance issues

2017-08-24 Thread Christian Balzer

Hello,

On Thu, 24 Aug 2017 14:49:24 -0300 Guilherme Steinmüller wrote:

> Hello Christian.
> 
> First of all, thanks for your considerations, I really appreciate it.
> 
> 2017-08-23 21:34 GMT-03:00 Christian Balzer :
> 
> >
> > Hello,
> >
> > On Wed, 23 Aug 2017 09:11:18 -0300 Guilherme Steinmüller wrote:
> >  
> > > Hello!
> > >
> > > I recently installed INTEL SSD 400GB 750 SERIES PCIE 3.0 X4 in 3 of my  
> > OSD  
> > > nodes.
> > >  
> > Well, you know what's coming now, don't you?
> >
> > That's a consumer device, with 70GB writes per day endurance.
> > Unless you essentially have a read-only cluster, you're throwing away
> > money.
> >  
> 
> Yes, we knew that we were going to buy a consumer device due to our limited
> budget and our objective of building a small pilot of a production
> cloud. This model seemed acceptable; it was at the top of the list of
> consumer models in Sebastien's benchmarks.
> 
> We are a lab that depends on different budget sources to acquire
> equipment, so they can vary, and most of the time we are limited by
> different budget ranges.
> 
Noted, I hope your tests won't last too long or move a lot of data. ^o^

> >  
> > > First of all, here's is an schema describing how my cluster is:
> > >
> > > [image: inline image 1]
> > >
> > > [image: inline image 2]
> > >
> > > I primarily use my ceph as a backend for OpenStack nova, glance, swift  
> > and  
> > > cinder. My crushmap is configured to have rulesets for SAS disks, SATA
> > > disks and another ruleset that resides in HPE nodes using SATA disks too.
> > >
> > > Before installing the new journal in HPE nodes, i was using one of the
> > > disks that today are OSDs (osd.35, osd.34 and osd.33). After upgrading  
> > the  
> > > journal, I noticed that a dd command writing 1 GB blocks in OpenStack nova
> > > instances doubled the throughput, but the expected value was actually 400%
> > > or 500%, since on the Dell nodes, where we have another nova pool, the
> > > throughput is around that value.
> > >  
> > Apples, oranges and bananas.
> > You're comparing different HW (and no, I'm not going to look this up)
> > which may or may not have vastly different capabilities (like HW cache),
> > RAM and (unlikely relevant) CPU.
> >  
> 
> 
> Indeed, we took this into account. The HP servers were cheaper and have a
> poorer configuration due to that limited budget source.
> 
> 
> > Your NVMe may also be plugged into a different, insufficient PCIe slot for
> > all we know.
> >  
> 
> I checked this. I compared the slot information
> between the 3 Dell nodes and 3 HP nodes by running:
> 
> # ls -l /sys/block/nvme0n1
> # lspci -vvv -s :06:00.0 <- slot identifier
> 
> The only difference is:
> 
> Dell has a parameter called *Cache Line Size: 32 bytes* and HP doesn't have
> this.
> 
That shouldn't be relevant, AFAIK.

> 
> 
> > You're also using very different HDDs, which definitely will be a factor.
> >
> >  
> I thought that the backend disks would not interfere that much. For example,
> the ceph journal has a parameter called filestore max sync interval, which
> means that the ceph journal will commit the transactions to the backend OSDs at
> a defined interval; ours is set to 35. So the client requests go first to the
> SSD and then are committed to the OSDs.
> 
As I wrote before, the journal does not come into play for any large amounts
of data unless massively tuned and/or under extreme pressure.

You need to touch many more of the journal and filestore parameters than
max_sync, which by itself will do nothing to prevent min_sync and other values
from triggering flushes more or less immediately.
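For reference, the kind of knobs involved looks roughly like this in
ceph.conf (values purely illustrative, not a recommendation):

[osd]
filestore min sync interval = 10    ; default 0.01, flushing starts almost immediately
filestore max sync interval = 35
filestore queue max ops = 3000
filestore queue max bytes = 1048576000
journal max write bytes = 1073741824
journal queue max bytes = 1073741824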

And tuning things so the journal is used extensively by default will
result in I/O storms slowing things to a crawl when it finally flushes to
the HDDs.

If your google foo is strong enough you should find the relevant
discussions about this, often in context with SSD OSDs where such tuning
makes some sense.

> 
> > But most importantly you're comparing 2 pools of vastly different OSD
> > count, no wonder a pool with 15 OSDs is faster in sequential writes than
> > one with 9.
> >
> > Here is a demonstration of the scenario and the difference in performance  
> > > between Dell nodes and HPE nodes:
> > >
> > >
> > >
> > > Scenario:
> > >
> > >
> > >-Using pools to store instance disks for OpenStack
> > >
> > >
> > >- Pool nova in "ruleset SAS" placed on c4-osd201, c4-osd202 and
> > >c4-osd203 with 5 osds per hosts
> > >  
> > SAS  
> > >
> > >- Pool nova_hpedl180 in "ruleset NOVA_HPEDL180" placed on  
> > c4-osd204,  
> > >c4-osd205, c4-osd206 with 3 osds per hosts
> > >  
> > SATA  
> > >
> > >- Every OSD has one partition of 35GB in a INTEL SSD 400GB 750
> > >SERIES PCIE 3.0 X4
> > >  
> > Overkill, but since your NVMe will die shortly anyway...
> >
> > With large sequential tests, the journal will have nearly NO impact on the
> > result, even if tuned to that effect.
> >  
> > >
> > >   

Re: [ceph-users] RBD encryption options?

2017-08-24 Thread Daniel K
Awesome -- I searched and all I could find was restricting access at the
pool level

I will investigate the dm-crypt/RBD path also.


Thanks again!

On Thu, Aug 24, 2017 at 7:40 PM, Alex Gorbachev 
wrote:

>
> On Mon, Aug 21, 2017 at 9:03 PM Daniel K  wrote:
>
>> Are there any client-side options to encrypt an RBD device?
>>
>> Using latest luminous RC, on Ubuntu 16.04 and a 4.10 kernel
>>
>> I assumed adding client-side encryption would be as simple as using
>> luks/dm-crypt/cryptsetup after adding the RBD device to /etc/ceph/rbdmap
>> and enabling the rbdmap service -- but I failed to consider the order of
>> things loading and it appears that the RBD gets mapped too late for
>> dm-crypt to recognize it as valid. It just keeps telling me it's not a valid
>> LUKS device.
>>
>> I know you can run the OSDs on an encrypted drive, but I was hoping for
>> something client side since it's not exactly simple (as far as I can tell)
>> to restrict client access to a single (or group) of RBDs within a shared
>> pool.
>>
>
> Daniel, we used info from here for single or multiple RBD mappings to
> client
>
> https://blog-fromsomedude.rhcloud.com/2016/04/26/Allowing-a-RBD-client-to-map-only-one-RBD
>
> Also, I ran into the race condition with zfs, and wound up putting zfs and
> rbdmap into rc.local.  It should work for dm-crypt as well.
>
> Regards,
> Alex
>
>
>
>> Any suggestions?
>>
>>
> --
> --
> Alex Gorbachev
> Storcium
>


Re: [ceph-users] RBD encryption options?

2017-08-24 Thread Alex Gorbachev
On Mon, Aug 21, 2017 at 9:03 PM Daniel K  wrote:

> Are there any client-side options to encrypt an RBD device?
>
> Using latest luminous RC, on Ubuntu 16.04 and a 4.10 kernel
>
> I assumed adding client-side encryption would be as simple as using
> luks/dm-crypt/cryptsetup after adding the RBD device to /etc/ceph/rbdmap
> and enabling the rbdmap service -- but I failed to consider the order of
> things loading and it appears that the RBD gets mapped too late for
> dm-crypt to recognize it as valid. It just keeps telling me it's not a valid
> LUKS device.
>
> I know you can run the OSDs on an encrypted drive, but I was hoping for
> something client side since it's not exactly simple (as far as I can tell)
> to restrict client access to a single (or group) of RBDs within a shared
> pool.
>

Daniel, we used info from here for single or multiple RBD mappings to a client:

https://blog-fromsomedude.rhcloud.com/2016/04/26/Allowing-a-RBD-client-to-map-only-one-RBD
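From memory, the gist of that post is to restrict the client's OSD caps by
object prefix, roughly like this (client name and image are examples; the id
used for rbd_data/rbd_header comes from the block_name_prefix shown by rbd
info):

rbd info rbd/myimage    # note block_name_prefix, e.g. rbd_data.102a74b0dc51
ceph auth get-or-create client.host1 mon 'allow r' \
  osd 'allow rwx object_prefix rbd_data.102a74b0dc51; allow rwx object_prefix rbd_header.102a74b0dc51; allow rx object_prefix rbd_id.myimage'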


Also, I ran into the race condition with zfs, and wound up putting zfs and
rbdmap into rc.local.  It should work for dm-crypt as well.
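In rough terms the rc.local entries look like this (image, key file and
mount point are placeholders):

# map the RBD first, then open the LUKS container on top of it and mount it
rbd map rbd/secure-image --id admin
cryptsetup luksOpen /dev/rbd/rbd/secure-image secure-crypt --key-file /root/luks.key
mount /dev/mapper/secure-crypt /mnt/secure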

Regards,
Alex



> Any suggestions?
>
>
>
-- 
--
Alex Gorbachev
Storcium


Re: [ceph-users] Ruleset vs replica count

2017-08-24 Thread Sinan Polat
Hi David,

 

Thank you for your reply, I will read about the min_size 1 value.

 

What about my initial question, anyone?

 

Thanks!

 

From: David Turner [mailto:drakonst...@gmail.com] 
Sent: Thursday, 24 August 2017 19:45
To: Sinan Polat; ceph-us...@ceph.com
Subject: Re: [ceph-users] Ruleset vs replica count

 

> min_size 1

STOP THE MADNESS.  Search the ML to realize why you should never use a 
min_size of 1.

 

I'm curious as well as to what this sort of configuration will do for how many 
copies are stored between DCs.

 

On Thu, Aug 24, 2017 at 1:03 PM Sinan Polat  wrote:

Hi,

 

In a Multi Datacenter Cluster I have the following rulesets:

--

rule ams5_ssd {

ruleset 1

type replicated

min_size 1

max_size 10

step take ams5-ssd

step chooseleaf firstn 2 type host

step emit

step take ams6-ssd

step chooseleaf firstn -2 type host

step emit

}

rule ams6_ssd {

ruleset 2

type replicated

min_size 1

max_size 10

step take ams6-ssd

step chooseleaf firstn 2 type host

step emit

step take ams5-ssd

step chooseleaf firstn -2 type host

step emit

}

--

 

The replication size is set to 3.

 

When for example ruleset 1 is used, how is the replication being done? Does it 
store 2 replicas in ams5-ssd and store 1 replica in ams6-ssd? Or does it store 
3 replicas in ams5-ssd and 3 replicas in ams6-ssd?

 

Thanks!

 

Sinan



Re: [ceph-users] RGW Multisite metadata sync init

2017-08-24 Thread David Turner
Apparently the data shards that are behind go in both directions, but only
one zone is aware of the problem.  Each cluster has objects in their data
pool that the other doesn't have.  I'm thinking about initiating a `data
sync init` on both sides (one at a time) to get them back on the same
page.  Does anyone know if that command will overwrite any local data that
the zone has that the other doesn't if you run `data sync init` on it?
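For clarity, the command I mean is the data counterpart of the metadata one,
roughly this, run on the zone that should re-sync (the source zone name is
just ours, as an example):

radosgw-admin data sync init --source-zone=public-atl01
# followed by a restart of the local radosgw daemons so the full sync is scheduled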

On Thu, Aug 24, 2017 at 1:51 PM David Turner  wrote:

> After restarting the 2 RGW daemons on the second site again, everything
> caught up on the metadata sync.  Is there something about having 2 RGW
> daemons on each side of the multisite that might be causing an issue with
> the sync getting stale?  I have another realm set up the same way that is
> having a hard time with its data shards being behind.  I haven't told them
> to resync, but yesterday I noticed 90 shards were behind.  It's caught back
> up to only 17 shards behind, but the oldest change not applied is 2 months
> old and no order of restarting RGW daemons is helping to resolve this.
>
> On Thu, Aug 24, 2017 at 10:59 AM David Turner 
> wrote:
>
>> I have a RGW Multisite 10.2.7 set up for bi-directional syncing.  This
>> has been operational for 5 months and working fine.  I recently created a
>> new user on the master zone, used that user to create a bucket, and put in
>> a public-acl object in there.  The Bucket created on the second site, but
>> the user did not and the object errors out complaining about the access_key
>> not existing.
>>
>> That led me to think that the metadata isn't syncing, while bucket and
>> data both are.  I've also confirmed that data is syncing for other buckets
>> as well in both directions. The sync status from the second site was this.
>>
>>
>>  metadata sync syncing
>>    full sync: 0/64 shards
>>    incremental sync: 64/64 shards
>>    metadata is caught up with master
>>  data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>>    syncing
>>    full sync: 0/128 shards
>>    incremental sync: 128/128 shards
>>    data is caught up with source
>>
>>
>>
>> Sync status leads me to think that the second site believes it is up to
>> date, even though it is missing a freshly created user.  I restarted all of
>> the rgw daemons for the zonegroup, but it didn't trigger anything to fix
>> the missing user in the second site.  I did some googling and found the
>> sync init commands mentioned in a few ML posts and used metadata sync init
>> and now have this as the sync status.
>>
>>
>>  metadata sync preparing for full sync
>>    full sync: 64/64 shards
>>    full sync: 0 entries to sync
>>    incremental sync: 0/64 shards
>>    metadata is behind on 70 shards
>>    oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
>>  data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>>    syncing
>>    full sync: 0/128 shards
>>    incremental sync: 128/128 shards
>>    data is caught up with source
>>
>>
>>
>> It definitely triggered a fresh sync and told it to forget about what
>> it's previously applied as the date of the oldest change not applied is the
>> day we initially set up multisite for this zone.  The problem is that was
>> over 12 hours ago and the sync stat hasn't caught up on any shards yet.
>>
>> Does anyone have any suggestions other than blast the second site and set
>> it back up with a fresh start (the only option I can think of at this
>> point)?
>>
>> Thank you,
>> David Turner
>>
>


Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...

2017-08-24 Thread David Turner
Not all storage requirements are made equal.  I'm sure you could create an
EC pool and a replica pool and utilize each for RBDs of associated
requirements.  Like I'd probably make smaller OS RBDs and larger EC data
drives or whatever works best for your use cases.  I'm not running SSD
pools, but I use EC vs Replica pools for my VMs to optimize space and
performance based on requirements.
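For what it's worth, a rough sketch of what that looks like on Luminous
(pool names, PG counts and the EC profile below are made up, and EC
overwrites also require BlueStore OSDs):

ceph osd pool create rbd-meta 64 64 replicated
ceph osd pool create rbd-ec-data 64 64 erasure myprofile
ceph osd pool set rbd-ec-data allow_ec_overwrites true
rbd create --size 100G --data-pool rbd-ec-data rbd-meta/vm-disk-1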

On Thu, Aug 24, 2017 at 3:49 PM Xavier Trilla <
xavier.tri...@silicontower.net> wrote:

> Mark, thanks for the information.
>
> Well, maybe EC and RBD once Luminous is  released makes sense for a lower
> speed storage tier. Where costs are more important than performance. Let's
> see if I can find some time -pretty busy with other projects- to test it
> with Luminous and Bluestore.
>
> Thanks!
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of
> Mark Nelson
> Sent: Thursday, 24 August 2017 2:18
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...
>
>
>
> On 08/23/2017 06:18 PM, Xavier Trilla wrote:
> > Oh man, what do you know!... I'm quite amazed. I've been reviewing more
> documentation about min_replica_size and it seems like it doesn't work as I
> thought (Although I remember specifically reading it somewhere some years
> ago :/ ).
> >
> > And, as all replicas need to be written before primary OSD informs the
> client about the write being completed, we cannot have the third replica on
> HDDs, no way. It would kill latency.
> >
> > Well, we'll just keep adding NVMs to our cluster (I mean, S4500 and
> P4500 price difference is negligible) and we'll decrease the primary
> affinity weight for SATA SSDs, just to be sure we get the most out of NVMe.
> >
> > BTW, does anybody have any experience so far with erasure coding and
> > rbd? A 2/3 profile, would really save space on SSDs but I'm afraid
> > about the extra calculations needed and how will it affect
> > performance... Well, maybe I'll check into it, and I'll start a new
> > thread :)
>
> There's a decent chance you'll get higher performance with something like
> EC 6+2 vs 3X replication for large writes due simply to having less data to
> write (we see somewhere between 2x and 3x rep performance in the lab for
> 4MB writes to RBD). Small random writes will almost certainly be slower due
> to increased latency.  Reads in general will be slower as well.  With
> replication the read comes entirely from the primary but in EC you have to
> fetch chunks from the secondaries and reconstruct the object before sending
> it back to the client.
>
> So basically compared to 3X rep you'll likely gain some performance on
> large writes, lose some performance on large reads, and lose more
> performance on small writes/reads (dependent on cpu speed and various other
> factors).
>
> Mark
>
> >
> > Anyway, thanks for the info!
> > Xavier.
> >
> > -----Original Message-----
> > From: Christian Balzer [mailto:ch...@gol.com] Sent: Tuesday, 22
> > August 2017 2:40
> > To: ceph-users@lists.ceph.com
> > CC: Xavier Trilla 
> > Subject: Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...
> >
> >
> > Hello,
> >
> >
> > Firstly, what David said.
> >
> > On Mon, 21 Aug 2017 20:25:07 + Xavier Trilla wrote:
> >
> >> Hi,
> >>
> >> I'm working on improving the costs of our current ceph cluster. We
> actually keep 3 x replicas, all of them in SSDs (That cluster hosts several
> hundred VMs RBD disks) and lately I've been wondering if the following
> setup would make sense, in order to improve cost / performance.
> >>
> >
> > Have you done a full analysis of your current cluster, as in utilization
> of your SSDs (IOPS), CPU, etc with atop/iostat/collectd/grafana?
> > During peak utilization times?
> >
> > If so, you should have a decent enough idea of what level IOPS you need
> and can design from there.
> >
> >> The ideal would be to move PG primaries to high performance nodes using
> NVMe, keep secondary replica in SSDs and move the third replica to HDDs.
> >>
> >> Most probably the hardware will be:
> >>
> >> 1st Replica: Intel P4500 NVMe (2TB)
> >> 2nd Replica: Intel S3520 SATA SSD (1.6TB)
> > Unless you have:
> > a) a lot of these and/or
> > b) very little writes
> > what David said.
> >
> > Aside from that whole replica idea not working. as you think.
> >
> >> 3rd Replica: WD Gold Harddrives (2 TB) (I'm considering either 1TB or
> >> 2TB model, as I want to have as many spins as possible)
> >>
> >> Also, hosts running OSDs would have a quite different HW
> >> configuration (In our experience NVMe need crazy CPU power in order
> >> to get the best out of them)
> >>
> > Correct, one might run into that with pure NVMe/SSD nodes.
> >
> >> I know the NVMe and SATA SSD replicas will work, no problem about that
> (We'll just adjust the primary affinity and crushmap in order to have the
> desired data layoff + primary OSDs) what 

Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...

2017-08-24 Thread Xavier Trilla
Mark, thanks for the information. 

Well, maybe EC and RBD once Luminous is  released makes sense for a lower speed 
storage tier. Where costs are more important than performance. Let's see if I 
can find some time -pretty busy with other projects- to test it with Luminous 
and Bluestore.

Thanks!

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Mark 
Nelson
Sent: Thursday, 24 August 2017 2:18
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...



On 08/23/2017 06:18 PM, Xavier Trilla wrote:
> Oh man, what do you know!... I'm quite amazed. I've been reviewing more 
documentation about min_replica_size and it seems like it doesn't work as I 
> thought (Although I remember specifically reading it somewhere some years ago 
> :/ ).
>
> And, as all replicas need to be written before primary OSD informs the client 
> about the write being completed, we cannot have the third replica on HDDs, no 
> way. It would kill latency.
>
> Well, we'll just keep adding NVMs to our cluster (I mean, S4500 and P4500 
> price difference is negligible) and we'll decrease the primary affinity 
> weight for SATA SSDs, just to be sure we get the most out of NVMe.
>
> BTW, does anybody have any experience so far with erasure coding and 
> rbd? A 2/3 profile, would really save space on SSDs but I'm afraid 
> about the extra calculations needed and how will it affect 
> performance... Well, maybe I'll check into it, and I'll start a new 
> thread :)

There's a decent chance you'll get higher performance with something like EC 
6+2 vs 3X replication for large writes due simply to having less data to write 
(we see somewhere between 2x and 3x rep performance in the lab for 4MB writes 
to RBD). Small random writes will almost certainly be slower due to increased 
latency.  Reads in general will be slower as well.  With replication the read 
comes entirely from the primary but in EC you have to fetch chunks from the 
secondaries and reconstruct the object before sending it back to the client.

So basically compared to 3X rep you'll likely gain some performance on large 
writes, lose some performance on large reads, and lose more performance on 
small writes/reads (dependent on cpu speed and various other factors).

Mark

>
> Anyway, thanks for the info!
> Xavier.
>
> -----Original Message-----
> From: Christian Balzer [mailto:ch...@gol.com] Sent: Tuesday, 22 
> August 2017 2:40
> To: ceph-users@lists.ceph.com
> CC: Xavier Trilla 
> Subject: Re: [ceph-users] NVMe + SSD + HDD RBD Replicas with Bluestore...
>
>
> Hello,
>
>
> Firstly, what David said.
>
> On Mon, 21 Aug 2017 20:25:07 + Xavier Trilla wrote:
>
>> Hi,
>>
>> I'm working on improving the costs of our current ceph cluster. We actually 
>> keep 3 x replicas, all of them in SSDs (That cluster hosts several hundred 
>> VMs RBD disks) and lately I've been wondering if the following setup would 
>> make sense, in order to improve cost / performance.
>>
>
> Have you done a full analysis of your current cluster, as in utilization of 
> your SSDs (IOPS), CPU, etc with atop/iostat/collectd/grafana?
> During peak utilization times?
>
> If so, you should have a decent enough idea of what level IOPS you need and 
> can design from there.
>
>> The ideal would be to move PG primaries to high performance nodes using 
>> NVMe, keep secondary replica in SSDs and move the third replica to HDDs.
>>
>> Most probably the hardware will be:
>>
>> 1st Replica: Intel P4500 NVMe (2TB)
>> 2nd Replica: Intel S3520 SATA SSD (1.6TB)
> Unless you have:
> a) a lot of these and/or
> b) very little writes
> what David said.
>
> Aside from that whole replica idea not working. as you think.
>
>> 3rd Replica: WD Gold Harddrives (2 TB) (I'm considering either 1TB or 
>> 2TB model, as I want to have as many spins as possible)
>>
>> Also, hosts running OSDs would have a quite different HW 
>> configuration (In our experience NVMe need crazy CPU power in order 
>> to get the best out of them)
>>
> Correct, one might run into that with pure NVMe/SSD nodes.
>
>> I know the NVMe and SATA SSD replicas will work, no problem about that 
>> (We'll just adjust the primary affinity and crushmap in order to have the 
>> desired data layoff + primary OSDs) what I'm worried is about the HDD 
>> replica.
>>
>> Also the pool will have min_size 1 (Would love to use min_size 2, but it 
>> would kill latency times) so, even if we have to do some maintenance in the 
>> NVMe nodes, writes to HDDs will be always "lazy".
>>
>> Before bluestore (we are planning to move to luminous most probably by the 
>> end of the year or beginning 2018, once it is released and tested properly) 
>> I would just use  SSD/NVMe journals for the HDDs. So, all writes would go to 
>> the SSD journal, and then moved to the HDD. But now, with Bluestore I don't 
>> think that's an option anymore.
>>
> Bluestore 

Re: [ceph-users] RGW multisite sync data sync shard stuck

2017-08-24 Thread David Turner
Andreas, did you find a solution to your multisite sync issues with the
stuck shards?  I'm also on 10.2.7 and having this problem.  One realm has
stuck shards for data sync and another realm says it's up to date, but
isn't receiving new users via metadata sync.  I ran metadata sync init on
it and it had all up to date metadata information when it finished, but
then new users weren't synced again.  I don't know what to do  to get these
working stably.  There are 2 RGW's for each realm in each zone in
master/master allowing data to sync in both directions.
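For reference, the resync attempt I mentioned was essentially this, run on
the zone that is behind, followed by restarting its radosgw daemons:

radosgw-admin metadata sync init
radosgw-admin sync status    # to watch whether the shards catch up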

On Mon, Jun 5, 2017 at 3:05 AM Andreas Calminder <
andreas.calmin...@klarna.com> wrote:

> Hello,
> I'm using Ceph jewel (10.2.7) and as far as I know I'm using the jewel
> multisite setup (multiple zones) as described here
> http://docs.ceph.com/docs/master/radosgw/multisite/ and two ceph
> clusters, one in each site. Stretching clusters over multiple sites
> is seldom/never worth the hassle in my opinion. The reason the
> replication ended up in a bad state, seems to be a mix of multiple
> issues, first it's that if you shove a lot of objects into a bucket
> +1M the bucket index starts to drag the rados gateways down, there's
> also some kind of memory leak in rgw when the sync has failed
> http://tracker.ceph.com/issues/19446, causing the rgw daemons to die
> left and right due to out of memory errors and some times also other
> parts of the system would be dragged down with them.
>
> On 4 June 2017 at 22:22,   wrote:
> > Hi Andreas.
> >
> > Well, we do _NOT_ need multisite in our environment, but unfortunately
> it is the basis for the announced "metasearch", based on ElasticSearch...
> so we have been trying to implement a "multisite" config on Kraken (v11.2.0) for
> weeks, but never succeeded so far. We have purged and started all over with
> the multisite config about ~5x by now.
> >
> > We have one CEPH cluster with two RadosGW's on top (so NOT two CEPH
> cluster!), not sure if this makes a difference!?
> >
> > Can you please share some infos about your (former working?!?) setup?
> Like
> > - which CEPH version are you on
> > - old deprecated "federated" or "new from Jewel" multisite setup
> > - one or multiple CEPH clusters
> >
> > Great to see that multisite seems to work somehow somewhere. We were
> really in doubt :O
> >
> > Thanks & regards
> >  Anton
> >
> > P.S.: If someone reads this, who has a working "one Kraken CEPH cluster"
> based multisite setup (or, let me dream, even a working ElasticSearch setup
> :| ) please step out of the dark and enlighten us :O
> >
> > Gesendet: Dienstag, 30. Mai 2017 um 11:02 Uhr
> > Von: "Andreas Calminder" 
> > An: ceph-users@lists.ceph.com
> > Betreff: [ceph-users] RGW multisite sync data sync shard stuck
> > Hello,
> > I've got a sync issue with my multisite setup. There's 2 zones in 1
> > zone group in 1 realm. The data sync in the non-master zone have stuck
> > on Incremental sync is behind by 1 shard, this wasn't noticed until
> > the radosgw instances in the master zone started dying from out of
> > memory issues, all radosgw instances in the non-master zone was then
> > shutdown to ensure services in the master zone while trying to
> > troubleshoot the issue.
> >
> > From the rgw logs in the master zone I see entries like:
> >
> > 2017-05-29 16:10:34.717988 7fbbc1ffb700 0 ERROR: failed to sync
> > object:
> 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_1.ext
> > 2017-05-29 16:10:34.718016 7fbbc1ffb700 0 ERROR: failed to sync
> > object:
> 12354/BUCKETNAME:be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2374181.27/dirname_1/dirname_2/filename_2.ext
> > 2017-05-29 16:10:34.718504 7fbbc1ffb700 0 ERROR: failed to fetch
> > remote data log info: ret=-5
> > 2017-05-29 16:10:34.719443 7fbbc1ffb700 0 ERROR: a sync operation
> > returned error
> > 2017-05-29 16:10:34.720291 7fbc167f4700 0 store->fetch_remote_obj()
> > returned r=-5
> >
> > sync status in the non-master zone reports that the metadata is up to
> > sync and that the data sync is behind on 1 shard and that the oldest
> > incremental change not applied is about 2 weeks back.
> >
> > I'm not quite sure how to proceed, is there a way to find out the id
> > of the shard and force some kind of re-sync of the data in it from the
> > master zone? I'm unable to have the non-master zone rgw's running
> > because it'll leave the master zone in a bad state with rgw dying
> > every now and then.
> >
> > Regards,
> > Andreas
> >
> >
>
>
>
> --
> Andreas Calminder
> System Administrator
> IT Operations Core Services
>
> Klarna AB (publ)
> Sveavägen 46, 111 34 Stockholm
> Tel: +46 8 120 120 00 <+46%208%20120%20120%2000>
> Reg no: 556737-0431
> klarna.com

[ceph-users] RGW Multisite metadata sync init

2017-08-24 Thread David Turner
I have a RGW Multisite 10.2.7 set up for bi-directional syncing.  This has
been operational for 5 months and working fine.  I recently created a new
user on the master zone, used that user to create a bucket, and put in a
public-acl object in there.  The Bucket created on the second site, but the
user did not and the object errors out complaining about the access_key not
existing.

That led me to think that the metadata isn't syncing, while bucket and data
both are.  I've also confirmed that data is syncing for other buckets as
well in both directions. The sync status from the second site was this.


  metadata sync syncing
    full sync: 0/64 shards
    incremental sync: 64/64 shards
    metadata is caught up with master
  data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
    syncing
    full sync: 0/128 shards
    incremental sync: 128/128 shards
    data is caught up with source



Sync status leads me to think that the second site believes it is up to
date, even though it is missing a freshly created user.  I restarted all of
the rgw daemons for the zonegroup, but it didn't trigger anything to fix
the missing user in the second site.  I did some googling and found the
sync init commands mentioned in a few ML posts and used metadata sync init
and now have this as the sync status.


  metadata sync preparing for full sync
    full sync: 64/64 shards
    full sync: 0 entries to sync
    incremental sync: 0/64 shards
    metadata is behind on 70 shards
    oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
  data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
    syncing
    full sync: 0/128 shards
    incremental sync: 128/128 shards
    data is caught up with source



It definitely triggered a fresh sync and told it to forget about what it's
previously applied as the date of the oldest change not applied is the day
we initially set up multisite for this zone.  The problem is that was over
12 hours ago and the sync stat hasn't caught up on any shards yet.

Does anyone have any suggestions other than blast the second site and set
it back up with a fresh start (the only option I can think of at this
point)?

Thank you,
David Turner


[ceph-users] Ceph rbd lock

2017-08-24 Thread lista

Dear all,

Some days ago I read about the commands rbd lock add and rbd lock
remove. Will these commands stay maintained in future Ceph versions, or is
the better way to use locking in Ceph the exclusive-lock feature, with these
commands becoming deprecated?
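For context, the commands in question look like this (pool, image, lock id
and locker are only examples):

rbd lock add mypool/myimage mylockid
rbd lock list mypool/myimage
rbd lock remove mypool/myimage mylockid client.4125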

Thanks a lot,
Marcelo


Re: [ceph-users] RGW Multisite metadata sync init

2017-08-24 Thread David Turner
After restarting the 2 RGW daemons on the second site again, everything
caught up on the metadata sync.  Is there something about having 2 RGW
daemons on each side of the multisite that might be causing an issue with
the sync getting stale?  I have another realm set up the same way that is
having a hard time with its data shards being behind.  I haven't told them
to resync, but yesterday I noticed 90 shards were behind.  It's caught back
up to only 17 shards behind, but the oldest change not applied is 2 months
old and no order of restarting RGW daemons is helping to resolve this.

On Thu, Aug 24, 2017 at 10:59 AM David Turner  wrote:

> I have a RGW Multisite 10.2.7 set up for bi-directional syncing.  This has
> been operational for 5 months and working fine.  I recently created a new
> user on the master zone, used that user to create a bucket, and put in a
> public-acl object in there.  The Bucket created on the second site, but the
> user did not and the object errors out complaining about the access_key not
> existing.
>
> That led me to think that the metadata isn't syncing, while bucket and
> data both are.  I've also confirmed that data is syncing for other buckets
> as well in both directions. The sync status from the second site was this.
>
>
>  metadata sync syncing
>    full sync: 0/64 shards
>    incremental sync: 64/64 shards
>    metadata is caught up with master
>  data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>    syncing
>    full sync: 0/128 shards
>    incremental sync: 128/128 shards
>    data is caught up with source
>
>
>
> Sync status leads me to think that the second site believes it is up to
> date, even though it is missing a freshly created user.  I restarted all of
> the rgw daemons for the zonegroup, but it didn't trigger anything to fix
> the missing user in the second site.  I did some googling and found the
> sync init commands mentioned in a few ML posts and used metadata sync init
> and now have this as the sync status.
>
>
>  metadata sync preparing for full sync
>    full sync: 64/64 shards
>    full sync: 0 entries to sync
>    incremental sync: 0/64 shards
>    metadata is behind on 70 shards
>    oldest incremental change not applied: 2017-03-01 21:13:43.0.126971s
>  data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
>    syncing
>    full sync: 0/128 shards
>    incremental sync: 128/128 shards
>    data is caught up with source
>
>
>
> It definitely triggered a fresh sync and told it to forget about what it's
> previously applied as the date of the oldest change not applied is the day
> we initially set up multisite for this zone.  The problem is that was over
> 12 hours ago and the sync stat hasn't caught up on any shards yet.
>
> Does anyone have any suggestions other than blast the second site and set
> it back up with a fresh start (the only option I can think of at this
> point)?
>
> Thank you,
> David Turner
>


Re: [ceph-users] [SSD NVM FOR JOURNAL] Performance issues

2017-08-24 Thread Guilherme Steinmüller
Hello Christian.

First of all, thanks for your considerations, I really appreciate it.

2017-08-23 21:34 GMT-03:00 Christian Balzer :

>
> Hello,
>
> On Wed, 23 Aug 2017 09:11:18 -0300 Guilherme Steinmüller wrote:
>
> > Hello!
> >
> > I recently installed INTEL SSD 400GB 750 SERIES PCIE 3.0 X4 in 3 of my
> OSD
> > nodes.
> >
> Well, you know what's coming now, don't you?
>
> That's a consumer device, with 70GB writes per day endurance.
> Unless you essentially have a read-only cluster, you're throwing away
> money.
>

Yes, we knew that we were going to buy a consumer device due to our limited
budget and our objective of building a small pilot of a production
cloud. This model seemed acceptable; it was at the top of the list of
consumer models in Sebastien's benchmarks.

We are a lab that depends on different budget sources to acquire
equipment, so they can vary, and most of the time we are limited by
different budget ranges.

>
> > First of all, here's is an schema describing how my cluster is:
> >
> > [image: inline image 1]
> >
> > [image: inline image 2]
> >
> > I primarily use my ceph as a backend for OpenStack nova, glance, swift
> and
> > cinder. My crushmap is configured to have rulesets for SAS disks, SATA
> > disks and another ruleset that resides in HPE nodes using SATA disks too.
> >
> > Before installing the new journal in HPE nodes, i was using one of the
> > disks that today are OSDs (osd.35, osd.34 and osd.33). After upgrading
> the
> > journal, I noticed that a dd command writing 1 GB blocks in OpenStack nova
> > instances doubled the throughput, but the expected value was actually 400%
> > or 500%, since on the Dell nodes, where we have another nova pool, the
> > throughput is around that value.
> >
> Apples, oranges and bananas.
> You're comparing different HW (and no, I'm not going to look this up)
> which may or may not have vastly different capabilities (like HW cache),
> RAM and (unlikely relevant) CPU.
>


Indeed, we took this into account. The HP servers were cheaper and have a
poorer configuration due to that limited budget source.


> Your NVMe may also be plugged into a different, insufficient PCIe slot for
> all we know.
>

I checked this. I compared the slot information
between the 3 Dell nodes and 3 HP nodes by running:

# ls -l /sys/block/nvme0n1
# lspci -vvv -s :06:00.0 <- slot identifier

The only difference is:

Dell has a parameter called *Cache Line Size: 32 bytes* and HP doesn't have
this.



> You're also using very different HDDs, which definitely will be a factor.
>
>
I thought that the backend disks would not interfere that much. For example,
the ceph journal has a parameter called filestore max sync interval, which
means that the ceph journal will commit the transactions to the backend OSDs at
a defined interval; ours is set to 35. So the client requests go first to the
SSD and then are committed to the OSDs.


> But most importantly you're comparing 2 pools of vastly different OSD
> count, no wonder a pool with 15 OSDs is faster in sequential writes than
> one with 9.
>
> Here is a demonstration of the scenario and the difference in performance
> > between Dell nodes and HPE nodes:
> >
> >
> >
> > Scenario:
> >
> >
> >-Using pools to store instance disks for OpenStack
> >
> >
> >- Pool nova in "ruleset SAS" placed on c4-osd201, c4-osd202 and
> >c4-osd203 with 5 osds per hosts
> >
> SAS
> >
> >- Pool nova_hpedl180 in "ruleset NOVA_HPEDL180" placed on
> c4-osd204,
> >c4-osd205, c4-osd206 with 3 osds per hosts
> >
> SATA
> >
> >- Every OSD has one partition of 35GB in a INTEL SSD 400GB 750
> >SERIES PCIE 3.0 X4
> >
> Overkill, but since your NVMe will die shortly anyway...
>
> With large sequential tests, the journal will have nearly NO impact on the
> result, even if tuned to that effect.
>
> >
> >- Internal link for cluster and public network of 10Gbps
> >
> >
> >- Deployment via ceph-ansible. Same configuration define in
> ansible
> >for every host on cluster
> >
> >
> >
> > *Instance on pool nova in ruleset SAS:*
> >
> >
> ># dd if=/dev/zero of=/mnt/bench bs=1G count=1 oflag=direct
> >1+0 records in
> >1+0 records out
> >1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.56255 s, 419 MB/s
> >
> This is a very small test for what you're trying to determine and not
> going to be very representative.
> If for example there _is_ a HW cache of 2GB on the Dell nodes, it would
> fit nicely in there.
>
>
 Dell has a PERC H730 Mini (Embedded), each with a cache memory size of 1024 MB,
whereas my HP uses a B140i dynamic array. Neither HP nor Dell uses any RAID
level for the OSDs, only Dell for the operating system.



> >
> > *Instance on pool nova in ruleset NOVA_HPEDL180:*
> >
> >  #  dd if=/dev/zero of=/mnt/bench bs=1G count=1 oflag=direct
> >  1+0 records in
> >  1+0 records out
> >  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.8243 s, 90.8 MB/s

Re: [ceph-users] Ruleset vs replica count

2017-08-24 Thread David Turner
> min_size 1
STOP THE MADNESS.  Search the ML to realize why you should never use a
min_size of 1.

I'm curious as well as to what this sort of configuration will do for how
many copies are stored between DCs.
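One way to check empirically would be to run the rule through crushtool
against the compiled map and look at which OSDs get picked for size 3 (the
file name is just an example):

ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings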

On Thu, Aug 24, 2017 at 1:03 PM Sinan Polat  wrote:

> Hi,
>
>
>
> In a Multi Datacenter Cluster I have the following rulesets:
>
> --
>
> rule ams5_ssd {
>
> ruleset 1
>
> type replicated
>
> min_size 1
>
> max_size 10
>
> step take ams5-ssd
>
> step chooseleaf firstn 2 type host
>
> step emit
>
> step take ams6-ssd
>
> step chooseleaf firstn -2 type host
>
> step emit
>
> }
>
> rule ams6_ssd {
>
> ruleset 2
>
> type replicated
>
> min_size 1
>
> max_size 10
>
> step take ams6-ssd
>
> step chooseleaf firstn 2 type host
>
> step emit
>
> step take ams5-ssd
>
> step chooseleaf firstn -2 type host
>
> step emit
>
> }
>
> --
>
>
>
> The replication size is set to 3.
>
>
>
> When for example ruleset 1 is used, how is the replication being done?
> Does it store 2 replicas in ams5-ssd and store 1 replica in ams6-ssd? Or
> does it store 3 replicas in ams5-ssd and 3 replicas in ams6-ssd?
>
>
>
> Thanks!
>
>
>
> Sinan
>


[ceph-users] Ruleset vs replica count

2017-08-24 Thread Sinan Polat
Hi,

 

In a Multi Datacenter Cluster I have the following rulesets:

--

rule ams5_ssd {

ruleset 1

type replicated

min_size 1

max_size 10

step take ams5-ssd

step chooseleaf firstn 2 type host

step emit

step take ams6-ssd

step chooseleaf firstn -2 type host

step emit

}

rule ams6_ssd {

ruleset 2

type replicated

min_size 1

max_size 10

step take ams6-ssd

step chooseleaf firstn 2 type host

step emit

step take ams5-ssd

step chooseleaf firstn -2 type host

step emit

}

--

 

The replication size is set to 3.

 

When for example ruleset 1 is used, how is the replication being done? Does
it store 2 replicas in ams5-ssd and store 1 replica in ams6-ssd? Or does it
store 3 replicas in ams5-ssd and 3 replicas in ams6-ssd?

 

Thanks!

 

Sinan



[ceph-users] Ceph Day Netherlands: 20-09-2017

2017-08-24 Thread Wido den Hollander
Hi,

In less than a month the Ceph Day in NL is coming up. It will be hosted by BIT 
[0] at their great venue in Ede, NL.

The schedule hasn't been posted yet, we are still working on that. There will 
be a great talk from the people of BIT showing off their SSD-only cluster 
spread out over multiple DCs.

There is a direct train from Schiphol Airport (Amsterdam) to Ede, where a 
taxi service will be arranged to bring you to and from the venue without any 
charge.

Registration is free! :-)

More information: http://ceph.com/cephdays/netherlands2017/

Wido

[0]: https://www.bit.nl/


[ceph-users] Re: cephfs, kernel(4.12.8) client version hung(D status), ceph version 0.94.9

2017-08-24 Thread donglifec...@gmail.com
ZhengYan,

On the host1 client, ls hangs too; the stack is:
cat /proc/863/stack
[] ceph_mdsc_do_request+0x183/0x240 [ceph]
[] __ceph_do_getattr+0xcd/0x1d0 [ceph]
[] ceph_getattr+0x2c/0x100 [ceph]
[] vfs_getattr_nosec+0x9c/0xf0
[] vfs_getattr+0x36/0x40
[] vfs_statx+0x8e/0xe0
[] SYSC_newlstat+0x3d/0x70
[] SyS_newlstat+0xe/0x10
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

Thanks a lot.



donglifec...@gmail.com
 
From: donglifec...@gmail.com
Sent: 2017-08-24 17:40
To: zyan
Cc: ceph-users
Subject: [ceph-users] cephfs, kernel(4.12.8) client version hung(D status), ceph 
version 0.94.9
ZhengYan,

I ran into a problem; the steps to reproduce are outlined below:

1.  create 30G file test823
2.  host1 client(kernel 4.12.8)
  cat /mnt/cephfs/a/test823 > /mnt/cephfs/a/test823-backup
  ls -al /mnt/cephfs/a/* 

3. host2 client(kernel 4.12.8)
  while true; do cp /home/scripts/512k.file /mnt/cephfs/a/512k.file$i ; 
done // loop copy file 
  cat /mnt/cephfs/a/test823-backup > /mnt/cephfs/a/newtestfile
  ls -al /mnt/cephfs/a/*
  
4. host2 client hung, stack is :
[ 9462.754853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 9462.756838] bashD0 32738  14988 0x0084
[ 9462.758568] Call Trace:
[ 9462.759945]  __schedule+0x28a/0x880
[ 9462.761414]  schedule+0x36/0x80
[ 9462.762835]  rwsem_down_write_failed+0x20d/0x380
[ 9462.764433]  call_rwsem_down_write_failed+0x17/0x30
[ 9462.766075]  ? __ceph_getxattr+0x340/0x340 [ceph]
[ 9462.767693]  down_write+0x2d/0x40
[ 9462.769175]  do_truncate+0x67/0xc0
[ 9462.770642]  path_openat+0xaba/0x13b0
[ 9462.772136]  do_filp_open+0x91/0x100
[ 9462.773616]  ? __check_object_size+0x159/0x190
[ 9462.775156]  ? __alloc_fd+0x46/0x170
[ 9462.776574]  do_sys_open+0x124/0x210
[ 9462.777972]  SyS_open+0x1e/0x20
[ 9462.779320]  do_syscall_64+0x67/0x150
[ 9462.780736]  entry_SYSCALL64_slow_path+0x25/0x25

[root@cephtest ~]# cat /proc/29541/stack
[] ceph_mdsc_do_request+0x183/0x240 [ceph]
[] __ceph_setattr+0x3fc/0x8b0 [ceph]
[] ceph_setattr+0x3c/0x60 [ceph]
[] notify_change+0x266/0x440
[] do_truncate+0x75/0xc0
[] path_openat+0xaba/0x13b0
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

[root@cephtest ~]# cat /proc/32738/stack
[] call_rwsem_down_write_failed+0x17/0x30
[] do_truncate+0x67/0xc0
[] path_openat+0xaba/0x13b0
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

ceph log is:
f pending pAsLsXs issued pAsLsXsFcb, sent 1921.069365 seconds ago
2017-08-24 17:16:00.219523 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 1000424 pending 
pAsLsXs issued pAsLsXsFcb, sent 1921.063079 seconds ago
2017-08-24 17:16:00.219534 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 1000521 pending 
pAsLsXs issued pAsLsXsFcb, sent 1921.026983 seconds ago
2017-08-24 17:16:00.219545 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 1000523 pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.985596 seconds ago
2017-08-24 17:16:00.219574 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 1000528 pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.866863 seconds ago
2017-08-24 17:16:00.219592 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 100052a pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.788282 seconds ago
2017-08-24 17:16:00.219606 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 100052c pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.712564 seconds ago
2017-08-24 17:16:00.219618 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 100052f pending 
pAsLsXs issued pAsLsXsFcb, sent 1920.563784 seconds ago
2017-08-24 17:16:00.219630 7f746db8f700  0 log_channel(cluster) log [WRN] : 
client.268113 isn't responding to mclientcaps(revoke), ino 100040b pending 
pAsLsXsFsc issued pAsLsXsFscb, sent 1920.506752 seconds ago
2017-08-24 17:16:00.219741 7f746db8f700  0 log_channel(cluster) log [WRN] : 4 
slow requests, 1 included below; oldest blocked for > 1941.487238 secs
2017-08-24 17:16:00.219753 7f746db8f700  0 log_channel(cluster) log [WRN] : 
slow request 1920.507384 seconds old, received at 2017-08-24 16:43:59.712319: 
client_request(client.268101:1122217 getattr pAsLsXsFs #100040b 2017-08-24 
16:44:00.463827) currently failed to rdlock, waiting

Thanks a lot.

 





donglifec...@gmail.com