Re: [ceph-users] [EXTERNAL] Ceph performance is too good (impossible..)...

2016-12-12 Thread Will . Boege
My understanding is that when using direct=1 on a raw block device, FIO (i.e. you) 
has to handle all the sector alignment yourself, or the request will get buffered 
to perform the alignment.

Try adding the --blockalign=512b option to your jobs, or better yet just use the 
native FIO RBD engine.

Something like this (untested) -

[A]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio_test
direct=1
group_reporting=1
unified_rw_reporting=1
time_based=1
rw=read
bs=4MB
numjobs=16
ramp_time=10
runtime=20
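
A job like the above needs an fio binary built with RBD support; a quick way to 
check for the engine and run the job (file names here are placeholders):

fio --enghelp | grep -i rbd              # the rbd engine only appears if fio was built against librbd
fio fio_rbd_read.job --output=a_rbd.txt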

From: ceph-users  on behalf of V Plus 

Date: Sunday, December 11, 2016 at 7:44 PM
To: "ceph-users@lists.ceph.com" 
Subject: [EXTERNAL] [ceph-users] Ceph performance is too good (impossible..)...

Hi Guys,
we have a Ceph cluster with 6 machines (6 OSDs per host).
1. I created 2 images in Ceph and mapped them to another host A (outside the Ceph 
cluster). On host A, I got /dev/rbd0 and /dev/rbd1.
2. I started two fio jobs to perform READ tests on rbd0 and rbd1 (the fio job 
descriptions can be found below):
"sudo fio fioA.job -output a.txt & sudo fio fioB.job -output b.txt & wait"
3. After the test, a.txt reports bw=1162.7MB/s and b.txt reports bw=3579.6MB/s.
The results do NOT make sense because there is only one NIC on host A, and its 
limit is 10 Gbps (1.25 GB/s).

I suspect it is because of the cache setting.
But I am sure that in the file /etc/ceph/ceph.conf on host A, I already added:
[client]
rbd cache = false

Could anyone give me a hint about what is missing and why this happens?
Thank you very much.
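
Worth noting: "rbd cache = false" only affects librbd clients; a kernel-mapped 
/dev/rbd device is still served through the host page cache. A quick cross-check 
on host A (run as root) is:

sync; echo 3 > /proc/sys/vm/drop_caches     # clear the page cache between runs
sar -n DEV 1                                # watch actual NIC throughput while fio runs

If fio reports far more bandwidth than the NIC is actually carrying, the reads are 
coming from local memory, not from the cluster.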

fioA.job:
[A]
direct=1
group_reporting=1
unified_rw_reporting=1
size=100%
time_based=1
filename=/dev/rbd0
rw=read
bs=4MB
numjobs=16
ramp_time=10
runtime=20

fioB.job:
[B]
direct=1
group_reporting=1
unified_rw_reporting=1
size=100%
time_based=1
filename=/dev/rbd1
rw=read
bs=4MB
numjobs=16
ramp_time=10
runtime=20

Thanks...
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning

2016-12-07 Thread Will . Boege
Thanks for the explanation.  I guess the case you outlined explains why the 
Ceph developers chose to make this a ‘safe’ default.

2 OSDs are transiently down and the third fails hard. The PGs on the third OSD 
with no remaining replicas are marked unfound.  You bring up 1 and 2, and these 
PGs will remain unfound because they were stale; at that point you can either 
revert or delete those PGs. Am I understanding that correctly?

I still think there is a cost/benefit conversation to be had around this 
setting.  A 2-OSD failure will be far, far more probable than the 
‘sequence of events’ type failure you outlined above.  There is an availability 
cost to several blocked-IO events per year, weighed against protection from a 
data loss event that might happen once every three years.

I guess it’s just where you want to put that needle on the spectrum of 
availability vs integrity.
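
For anyone weighing that trade-off, the relevant pool settings can be inspected 
and changed at runtime (the pool name below is a placeholder):

ceph osd pool get rbd size
ceph osd pool get rbd min_size
ceph osd pool set rbd min_size 2     # block I/O once only one replica is left
ceph osd pool set rbd min_size 1     # the manual override described below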

On 12/7/16, 2:10 PM, "Wido den Hollander"  wrote:


> On 7 December 2016 at 21:04, "Will.Boege"  wrote:
> 
> 
> Hi Wido,
> 
> Just curious how blocking IO to the final replica provides protection 
from data loss?  I’ve never really understood why this is a Ceph best practice. 
 In my head all 3 replicas would be on devices that have roughly the same odds 
of physically failing or getting logically corrupted in any given minute.  Not 
sure how blocking IO prevents this.
> 

Say, disk #1 fails and you have #2 and #3 left. Now #2 fails, leaving only 
#3.

By blocking you know that #2 and #3 still have the same data. Although #2 
failed, it could be that only the host went down while the disk itself is 
just fine. Maybe the SATA cable broke, you never know.

If disk #3 now fails you can still continue your operation if you bring #2 
back. It has the same data on disk as #3 had before it failed, since you 
didn't allow any I/O on #3 when #2 went down earlier.

If you had accepted writes on #3 while #1 and #2 were gone, you would have 
invalid/old data on #2 by the time it comes back.

Writes were made on #3, but that one really broke down. You managed to get 
#2 back, but it doesn't have the changes which #3 had.

The result is corrupted data.

Does this make sense?

Wido

> On 12/7/16, 9:11 AM, "ceph-users on behalf of LOIC DEVULDER" 
 wrote:
> 
> > -Original Message-
> > From: Wido den Hollander [mailto:w...@42on.com]
> > Sent: Wednesday, 7 December 2016 16:01
> > To: ceph-us...@ceph.com; LOIC DEVULDER - U329683 
> > Subject: RE: [ceph-users] 2x replication: A BIG warning
> > 
> > 
> > > On 7 December 2016 at 15:54 LOIC DEVULDER
> > > wrote:
> > >
> > >
> > > Hi Wido,
> > >
> > > > As a Ceph consultant I get numerous calls throughout the year to
> > > > help people with getting their broken Ceph clusters back online.
> > > >
> > > > The causes of downtime vary vastly, but one of the biggest causes is
> > > > that people use replication 2x. size = 2, min_size = 1.
> > >
> > > We are building a Ceph cluster for our OpenStack and for data integrity
> > > reasons we have chosen to set size=3. But we want to continue to access
> > > data if 2 of our 3 OSD servers are dead, so we decided to set min_size=1.
> > >
> > > Is it a (very) bad idea?
> > >
> > 
> > I would say so. Yes, downtime is annoying on your cloud, but data loss is
> > even worse, much worse.
> > 
> > I would always run with min_size = 2 and manually switch to min_size = 1
> > if the situation really requires it at that moment.
> > 
> > Losing two disks at the same time is something which doesn't happen that
> > much, but if it happens you don't want to modify any data on the only copy
> > which you still have left.
> > 
> > Setting min_size to 1 should be a manual action imho when size = 3 and you
> > lose two copies. In that case YOU decide at that moment if it is the
> > right course of action.
> > 
> > Wido
> 
> Thanks for your quick response!
> 
> That makes sense, I will try to convince my colleagues :-)
> 
> Loic
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
>




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning

2016-12-07 Thread Will . Boege
Hi Wido,

Just curious how blocking IO to the final replica provides protection from data 
loss?  I’ve never really understood why this is a Ceph best practice.  In my 
head all 3 replicas would be on devices that have roughly the same odds of 
physically failing or getting logically corrupted in any given minute.  Not 
sure how blocking IO prevents this.

On 12/7/16, 9:11 AM, "ceph-users on behalf of LOIC DEVULDER" 
 wrote:

> -Original Message-
> From: Wido den Hollander [mailto:w...@42on.com]
> Sent: Wednesday, 7 December 2016 16:01
> To: ceph-us...@ceph.com; LOIC DEVULDER - U329683 
> Subject: RE: [ceph-users] 2x replication: A BIG warning
> 
> 
> > On 7 December 2016 at 15:54 LOIC DEVULDER
> > wrote:
> >
> >
> > Hi Wido,
> >
> > > As a Ceph consultant I get numerous calls throughout the year to
> > > help people with getting their broken Ceph clusters back online.
> > >
> > > The causes of downtime vary vastly, but one of the biggest causes is
> > > that people use replication 2x. size = 2, min_size = 1.
> >
> > We are building a Ceph cluster for our OpenStack and for data integrity
> > reasons we have chosen to set size=3. But we want to continue to access
> > data if 2 of our 3 OSD servers are dead, so we decided to set min_size=1.
> >
> > Is it a (very) bad idea?
> >
> 
> I would say so. Yes, downtime is annoying on your cloud, but data loss is
> even worse, much worse.
> 
> I would always run with min_size = 2 and manually switch to min_size = 1
> if the situation really requires it at that moment.
> 
> Losing two disks at the same time is something which doesn't happen that
> much, but if it happens you don't want to modify any data on the only copy
> which you still have left.
> 
> Setting min_size to 1 should be a manual action imho when size = 3 and you
> lose two copies. In that case YOU decide at that moment if it is the
> right course of action.
> 
> Wido

Thanks for your quick response!

That makes sense, I will try to convince my colleagues :-)

Loic
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: ceph in an OSPF environment

2016-11-23 Thread Will . Boege
Check your MTU. I think OSPF has issues when fragmenting. Try setting your 
interface MTU to something obnoxiously small to ensure that nothing upstream 
is fragmenting - say 1200. If that works, try a saner value like 1496, which 
accounts for any VLAN headers.
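
One quick way to confirm whether fragmentation is the culprit is to test the path 
MTU directly between two Ceph nodes (the host name is a placeholder):

ping -M do -s 1472 osd-node-2     # 1472 = 1500-byte MTU minus 28 bytes of IP/ICMP headers
ping -M do -s 8972 osd-node-2     # the equivalent probe for a 9000-byte jumbo-frame path

If the larger probe fails while the smaller one succeeds, something in the path is 
dropping or fragmenting packets.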

If you're running a spine/leaf topology you might just want to consider segregating 
Ceph replication traffic by interface rather than by network.

I'd also be interested in seeing any reference arch around Ceph in spine leaf 
that anyone has implemented. 

> On Nov 23, 2016, at 11:29 AM, Darrell Enns  wrote:
> 
> You may also need to do something with the "public network" and/or "cluster 
> network" options in ceph.conf.
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Darrell Enns
> Sent: Wednesday, November 23, 2016 9:24 AM
> To: ceph-us...@ceph.com
> Subject: Re: [ceph-users] ceph in an OSPF environment
> 
> As far as I am aware, there is no broadcast or multicast traffic involved (at 
> least, I don't see any on my cluster). So there should be no issue with 
> routing it over layer 3. Have you checked the following:
> 
> - name resolution working on all hosts
> - firewall/acl rules
> - selinux
> - tcpdump the mon traffic (port 6789) to see that it's getting through 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: osd set noin ignored for old OSD ids

2016-11-23 Thread Will . Boege
From my experience noin doesn't stop new OSDs from being marked in. noin only 
works on OSDs already in the crushmap. To accomplish the behavior you want 
I've injected "mon osd auto mark new in = false" into MONs. This also seems to 
set their OSD weight to 0 when they are created.
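
A sketch of how that could look on the monitors (the mon id below is a placeholder; 
restarting the mons is the surer way to make the setting stick):

# ceph.conf on the monitor hosts
[mon]
mon osd auto mark new in = false

# or injected per monitor at runtime (assumed to apply to OSDs created afterwards):
ceph tell mon.a injectargs '--mon-osd-auto-mark-new-in=false'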

> On Nov 23, 2016, at 1:47 PM, Gregory Farnum  wrote:
> 
> On Tue, Nov 22, 2016 at 7:56 PM, Adrian Saul
>  wrote:
>> 
>> Hi ,
>> As part of migration between hardware I have been building new OSDs and 
>> cleaning up old ones  (osd rm osd.x, osd crush rm osd.x, auth del osd.x).   
>> To try and prevent rebalancing kicking in until all the new OSDs are created 
>> on a host I use "ceph osd set noin", however what I have seen is that if the 
>> new OSD that is created uses a new unique ID, then the flag is honoured and 
>> the OSD remains out until I bring it in.  However if the OSD re-uses a 
>> previous OSD id then it will go straight to in and start backfilling.  I 
>> have to manually out the OSD to stop it (or set nobackfill,norebalance).
>> 
>> Am I doing something wrong in this process or is there something about 
>> "noin" that is ignored for previously existing OSDs that have been removed 
>> from both the OSD map and crush map?
> 
> There are a lot of different pieces of an OSD ID that need to get
> deleted for it to be truly gone; my guess is you've missed some of
> those. The noin flag doesn't prevent unlinked-but-up CRUSH entries
> from getting placed back into the tree, etc.
> 
> We may also have a bug though, so if you can demonstrate that the ID
> doesn't exist in the CRUSH and OSD dumps then please create a ticket
> at tracker.ceph.com!
> -Greg
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: pg stuck with unfound objects on non exsisting osd's

2016-11-01 Thread Will . Boege
Start with a rolling restart of just the OSDs one system at a time, checking 
the status after each restart.
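
A minimal sketch of that rolling restart (this cluster is Hammer on Debian 8, so 
the init flavor may differ; the OSD id is just an example):

# on each OSD host in turn:
service ceph restart osd.62     # sysvinit-style; on systemd hosts: systemctl restart ceph-osd@62
ceph -s                         # wait for peering/recovery to settle before the next host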

On Nov 1, 2016, at 6:20 PM, Ronny Aasen 
> wrote:

thanks for the suggestion.

is a rolling reboot sufficient? or must all osd's be down at the same time ?
one is no problem.  the other takes some scheduling..

Ronny Aasen


On 01.11.2016 21:52, c...@elchaka.de wrote:
Hello Ronny,

if it is possible for you, try to Reboot all OSD Nodes.

I had this issue on my test cluster and it became healthy after rebooting.

Hth
- Mehmet

On 1 November 2016 19:55:07 CET, Ronny Aasen 
 wrote:

Hello.

I have a cluster with 2 PGs stuck undersized+degraded, with 25
unfound objects.

# ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs recovering; 2 pgs stuck degraded; 2 pgs stuck 
unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery 294599/149522370 
objects degraded (0.197%); recovery 640073/149522370 objects misplaced 
(0.428%); recovery 25/46579241 unfound (0.000%); noout flag(s) set
pg 6.d4 is stuck unclean for 8893374.380079, current state 
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck unclean for 8896787.249470, current state 
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is stuck undersized for 438122.427341, current state 
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck undersized for 416947.461950, current state 
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is stuck degraded for 438122.427402, current state 
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck degraded for 416947.462010, current state 
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is active+recovering+undersized+degraded+remapped, acting [62], 25 
unfound
pg 6.ab is active+recovering+undersized+degraded+remapped, acting [18,12]
recovery 294599/149522370 objects degraded (0.197%)
recovery 640073/149522370 objects misplaced (0.428%)
recovery 25/46579241 unfound (0.000%)
noout flag(s) set


have been following the troubleshooting guide at
http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
but gets stuck without a resolution.

luckily it is not critical data. so i wanted to mark the pg lost so it
could become health-ok

# ceph pg 6.d4 mark_unfound_lost delete
Error EINVAL: pg has 25 unfound objects but we haven't probed all
sources, not marking lost

querying the pg i see that it would want osd.80 and osd 36

  {
 "osd": "80",
 "status": "osd is down"
 },

trying to mark the OSDs lost does not work either, since the OSDs were
removed from the cluster a long time ago.

# ceph osd lost 80 --yes-i-really-mean-it
osd.80 is not down or doesn't exist

# ceph osd lost 36 --yes-i-really-mean-it
osd.36 is not down or doesn't exist


and this is where i am stuck.

have tried stopping and starting the 3 osd's but that did not have any
effect.

Anyone have any advice how to proceed ?

full output at:  http://paste.debian.net/hidden/be03a185/

this is hammer 0.94.9  on debian 8.


kind regards

Ronny Aasen






ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: Instance filesystem corrupt

2016-10-26 Thread Will . Boege
Strangely enough, I’m also seeing similar user issues - an unusually high volume 
of corrupt instance boot disks.

At this point I’m attributing it to the fact that our Ceph cluster is patched 9 
months ahead of our Red Hat OSP Kilo environment.  However, that’s a total guess 
at this point...

From: ceph-users  on behalf of 
"keynes_...@wistron.com" 
Date: Wednesday, October 26, 2016 at 8:28 PM
To: "ahmedmostafa...@gmail.com" , 
"dilla...@redhat.com" 
Cc: "ceph-users@lists.ceph.com" 
Subject: [EXTERNAL] Re: [ceph-users] Instance filesystem corrupt


Hmm ~~~ seems we have that in common.
We use
rbd snap create to make snapshots of instance volumes, and
rbd export and rbd export-diff commands to make daily backups.

Now we have 29 instances and 33 volumes.
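
For reference, that snapshot/backup pattern typically looks something like this 
(pool, image and snapshot names are placeholders):

rbd snap create rbd/volume-0001@daily-20161027
rbd export rbd/volume-0001@daily-20161027 /backup/volume-0001-full.img
rbd export-diff --from-snap daily-20161026 rbd/volume-0001@daily-20161027 /backup/volume-0001-20161027.diff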



Keynes Lee 李俊賢
Direct: +886-2-6612-1025
Mobile: +886-9-1882-3787
Fax: +886-2-6612-1991
E-Mail: keynes_...@wistron.com



From: Ahmed Mostafa [mailto:ahmedmostafa...@gmail.com]
Sent: Wednesday, October 26, 2016 9:57 PM
To: dilla...@redhat.com
Cc: Keynes Lee/WHQ/Wistron ; ceph-users 

Subject: Re: [ceph-users] Instance filesystem corrupt

Actually I have the same problem when starting an instance backed by librbd,
but it only happens when trying to start 60+ instances.

I decided that this is due to the fact that we are using old hardware that 
is not able to respond to high demand.

Could that be the same issue that you are facing?


On Wednesday, 26 October 2016, Jason Dillaman 
> wrote:
I am not aware of any similar reports against librbd on Firefly. Do you use any 
configuration overrides? Does the filesystem corruption appear while the 
instances are running, or only after a shutdown / restart of the instance?

On Wed, Oct 26, 2016 at 12:46 AM, 
>
 wrote:
No, we are using Firefly (0.80.7).
We are using HPE Helion OpenStack 2.1.5, and the version embedded with it is 
Firefly.

An upgrade is planned, but it will not happen soon.





From: Will.Boege 
[mailto:will.bo...@target.com]
Sent: Wednesday, October 26, 2016 12:03 PM
To: Keynes Lee/WHQ/Wistron 
>;
 
ceph-users@lists.ceph.com
Subject: Re: [EXTERNAL] [ceph-users] Instance filesystem corrupt

Just out of curiosity, did you recently upgrade to Jewel?

From: ceph-users 
>
 on behalf of 
"keynes_...@wistron.com"
 
>
Date: Tuesday, October 25, 2016 at 10:52 PM
To: 
"ceph-users@lists.ceph.com"
 
>
Subject: [EXTERNAL] [ceph-users] Instance filesystem corrupt

We are using OpenStack + Ceph.
Recently we have found a lot of filesystem corruption incidents on instances.
Some of them are correctable, fixed by fsck, but the others have no luck; they are 
just corrupt and can never start up again.

We found this issue on various operating systems of instances. They are
RedHat 4 / CentOS 7 / Windows 2012.

Could someone please advise us on a troubleshooting direction?



Keynes Lee 李俊賢
Direct: +886-2-6612-1025
Mobile: +886-9-1882-3787
Fax: +886-2-6612-1991
E-Mail: keynes_...@wistron.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason
___

Re: [ceph-users] [EXTERNAL] Instance filesystem corrupt

2016-10-25 Thread Will . Boege
Just out of curiosity, did you recently upgrade to Jewel?

From: ceph-users  on behalf of 
"keynes_...@wistron.com" 
Date: Tuesday, October 25, 2016 at 10:52 PM
To: "ceph-users@lists.ceph.com" 
Subject: [EXTERNAL] [ceph-users] Instance filesystem corrupt

We are using OpenStack + Ceph.
Recently we have found a lot of filesystem corruption incidents on instances.
Some of them are correctable, fixed by fsck, but the others have no luck; they are 
just corrupt and can never start up again.

We found this issue on various operating systems of instances. They are
RedHat 4 / CentOS 7 / Windows 2012.

Could someone please advise us on a troubleshooting direction?



Keynes Lee 李俊賢
Direct: +886-2-6612-1025
Mobile: +886-9-1882-3787
Fax: +886-2-6612-1991
E-Mail: keynes_...@wistron.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Benchmarks using fio tool gets stuck

2016-10-05 Thread Will . Boege
Because you do not have segregated networks, the cluster traffic is most likely 
drowning out the FIO user traffic.  This is especially exacerbated by the fact 
that there is only a 1 Gbps link between the cluster nodes.

If you are planning on using this cluster for anything other than testing, 
you’ll want to re-evaluate your network architecture.

+  >= 10gbe
+ Dedicated cluster network
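
A minimal ceph.conf sketch of that separation (subnets are placeholders; each OSD 
host needs an interface in both networks):

[global]
public network  = 10.0.10.0/24     # client and monitor traffic
cluster network = 10.0.20.0/24     # OSD replication and heartbeat traffic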


From: Mario Rodríguez Molins 
Date: Wednesday, October 5, 2016 at 8:38 AM
To: "Will.Boege" 
Cc: "ceph-users@lists.ceph.com" 
Subject: Re: [EXTERNAL] [ceph-users] Benchmarks using fio tool gets stuck

Hi,

Currently, we do not have a separated cluster network and our setup is:
 - 3 nodes for OSD with 1Gbps links. Each node is running a unique OSD daemon. 
Although we plan to increase the number of OSDs per host.
 - 3 virtual machines also with 1Gbps links, where each vm is running one 
monitor daemon (two of them are running a metadata server too).
 - The two clients used for testing purposes are also 2 vms.

In each run of FIO tool, we do the following steps (all of them in the client):
 1.- Create an rbd image of 1Gb within a pool and map this image to a block 
device
 2.- Create the ext4 filesystem in this block device
 3.- Unmap the device from the client
 4.- Before testing, drop caches (echo 3 | tee /proc/sys/vm/drop_caches && sync)
 5.- Perform the fio test, setting the pool and name of the rbd image. In each 
run, the block size used is changed.
 6.- Remove the image from the pool
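
For reference, steps 1 to 4 above roughly correspond to the following (pool and 
image names are placeholders):

rbd create test/fio_img --size 1024                  # 1 GB image
sudo rbd map test/fio_img                            # returns e.g. /dev/rbd0
sudo mkfs.ext4 /dev/rbd0
sudo rbd unmap /dev/rbd0
echo 3 | sudo tee /proc/sys/vm/drop_caches && sync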



Thanks in advance!

On Wed, Oct 5, 2016 at 2:57 PM, Will.Boege 
> wrote:
What does your network setup look like?  Do you have a separate cluster network?

Can you explain how you are performing the FIO test? Are you mounting a volume 
through krbd and testing that from a different server?

On Oct 5, 2016, at 3:11 AM, Mario Rodríguez Molins 
> wrote:
Hello,

We are setting a new cluster of Ceph and doing some benchmarks on it.
At this moment, our cluster consists of:
 - 3 nodes for OSD. In our current configuration one daemon per node.
 - 3 nodes for monitors (MON). In two of these nodes, there is a metadata 
server (MDS).

Benchmarks are performed with tools that ceph/rados provides us as well as with 
fio benchmark tool.
Our benchmark tests are based on this tutorial: 
http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance.

Using fio benchmark tool, we are having some issues. After some executions, the 
fio process gets stuck with futex_wait_queue_me call:
# cat /proc/14413/stack
[] futex_wait_queue_me+0xd2/0x140
[] futex_wait+0xff/0x260
[] wake_up_q+0x2d/0x60
[] futex_requeue+0x2c1/0x930
[] do_futex+0x2b1/0xb20
[] handle_mm_fault+0x14e1/0x1cd0
[] wake_up_new_task+0x108/0x1a0
[] SyS_futex+0x83/0x180
[] __do_page_fault+0x221/0x510
[] system_call_fast_compare_end+0xc/0x96
[] 0x

Logs of osd and mon daemons do not show any information or error about what the 
problem could be.

Executing strace command to trace the execution of the fio process show the 
following:

[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632809, {1475609725, 98199000}, ) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 98347}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 345690227}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632811, {1475609725, 348199000},  
[pid 14429] <... futex resumed> )   = -1 ETIMEDOUT (Connection timed out)
[pid 14429] clock_gettime(CLOCK_REALTIME, {1475609725, 127563261}) = 0
[pid 14429] futex(0x7cefc8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14429] futex(0x7cf01c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 
79103, {1475609727, 127563261},  
[pid 14416] <... futex resumed> )   = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 348403}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 595788486}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632813, {1475609725, 598199000}, ) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 598360}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 845712817}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632815, {1475609725, 848199000}, ) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 848353}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 95705677}) = 0
[pid 14416] futex(0x7fffdffa16fc, 

Re: [ceph-users] [EXTERNAL] Benchmarks using fio tool gets stuck

2016-10-05 Thread Will . Boege
What does your network setup look like?  Do you have a separate cluster network?

Can you explain how you are performing the FIO test? Are you mounting a volume 
through krbd and testing that from a different server?

On Oct 5, 2016, at 3:11 AM, Mario Rodríguez Molins 
> wrote:

Hello,

We are setting a new cluster of Ceph and doing some benchmarks on it.
At this moment, our cluster consists of:
 - 3 nodes for OSD. In our current configuration one daemon per node.
 - 3 nodes for monitors (MON). In two of these nodes, there is a metadata 
server (MDS).

Benchmarks are performed with tools that ceph/rados provides us as well as with 
fio benchmark tool.
Our benchmark tests are based on this tutorial: 
http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance.

Using fio benchmark tool, we are having some issues. After some executions, the 
fio process gets stuck with futex_wait_queue_me call:
# cat /proc/14413/stack
[] futex_wait_queue_me+0xd2/0x140
[] futex_wait+0xff/0x260
[] wake_up_q+0x2d/0x60
[] futex_requeue+0x2c1/0x930
[] do_futex+0x2b1/0xb20
[] handle_mm_fault+0x14e1/0x1cd0
[] wake_up_new_task+0x108/0x1a0
[] SyS_futex+0x83/0x180
[] __do_page_fault+0x221/0x510
[] system_call_fast_compare_end+0xc/0x96
[] 0x

Logs of osd and mon daemons do not show any information or error about what the 
problem could be.

Executing strace command to trace the execution of the fio process show the 
following:

[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632809, {1475609725, 98199000}, ) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 98347}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 345690227}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632811, {1475609725, 348199000},  
[pid 14429] <... futex resumed> )   = -1 ETIMEDOUT (Connection timed out)
[pid 14429] clock_gettime(CLOCK_REALTIME, {1475609725, 127563261}) = 0
[pid 14429] futex(0x7cefc8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14429] futex(0x7cf01c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 
79103, {1475609727, 127563261},  
[pid 14416] <... futex resumed> )   = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 348403}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 595788486}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632813, {1475609725, 598199000}, ) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 598360}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 845712817}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632815, {1475609725, 848199000}, ) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609725, 848353}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 95705677}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632817, {1475609726, 98199000}, ) = -1 ETIMEDOUT (Connection timed out)
[pid 14416] gettimeofday({1475609726, 98359}, NULL) = 0
[pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
[pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 345711731}) = 0
[pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 
632819, {1475609726, 348199000},  
[pid 14418] <... futex resumed> )   = -1 ETIMEDOUT (Connection timed out)
[pid 14418] futex(0x7c1f08, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14418] clock_gettime(CLOCK_REALTIME, {1475609726, 103526543}) = 0
[pid 14418] futex(0x7c1f5c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 
31641, {1475609731, 103526543},  
[pid 14419] <... futex resumed> )   = -1 ETIMEDOUT (Connection timed out)



[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730557149}) = 0
[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730727417}) = 0
[pid 14423] futex(0x7c8c34, FUTEX_CMP_REQUEUE_PRIVATE, 1, 
2147483647, 0x7c8b60, 15902 
[pid 14425] <... futex resumed> )   = 0
[pid 14425] futex(0x7c8b60, FUTEX_WAIT_PRIVATE, 2, NULL 
[pid 14423] <... futex resumed> )   = 1
[pid 14423] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1 
[pid 14425] <... futex resumed> )   = 0
[pid 14425] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 14425] clock_gettime(CLOCK_REALTIME, {1475609728, 731160249}) = 0
[pid 14425] sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, 
{"\200\4\364W\271\236\224+", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) 
= 9
[pid 14425] futex(0x7c8c34, FUTEX_WAIT_PRIVATE, 15903, NULL 
[pid 14423] <... futex resumed> )   = 1
[pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 

Re: [ceph-users] [EXTERNAL] Upgrading 0.94.6 -> 0.94.9 saturating mon node networking

2016-09-22 Thread Will . Boege
Just went through this upgrading a ~400 OSD cluster. I was in the EXACT spot 
you were in. The faster you can get all OSDs to the same version as the MONs 
the better. We decided to power forward and the performance got better for 
every OSD node we patched. 

I also discovered that your LevelDBs will start growing exponentially if you 
leave your cluster in that state for too long. 

Pretty sure the downrev OSDs are aggressively getting osdmaps from the MONs 
causing some kind of spinlock condition. 
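
Two quick checks that are useful while a cluster sits in that mixed-version state 
(the mon store path is the default and may differ on your hosts):

ceph tell osd.* version                     # confirm which OSDs are still down-rev
du -sh /var/lib/ceph/mon/*/store.db         # watch for runaway monitor LevelDB growth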

> On Sep 21, 2016, at 4:21 PM, Stillwell, Bryan J  
> wrote:
> 
> While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9 I've
> run into serious performance issues every time I restart an OSD.
> 
> At first I thought the problem I was running into was caused by the osdmap
> encoding bug that Dan and Wido ran into when upgrading to 0.94.7, because
> I was seeing a ton (millions) of these messages in the logs:
> 
> 2016-09-21 20:48:32.831040 osd.504 24.161.248.128:6810/96488 24 : cluster
> [WRN] failed to encode map e727985 with expected cry
> 
> Here are the links to their descriptions of the problem:
> 
> http://www.spinics.net/lists/ceph-devel/msg30450.html
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg30783.html
> 
> I tried the solution of using the following command to stop those errors
> from occurring:
> 
> ceph tell osd.* injectargs '--clog_to_monitors false'
> 
> Which did get the messages to stop spamming the log files, however, it
> didn't fix the performance issue for me.
> 
> Using dstat on the mon nodes I was able to determine that every time the
> osdmap is updated (by running 'ceph osd pool set data size 2' in this
> example) it causes the outgoing network on all mon nodes to be saturated
> for multiple seconds at a time:
> 
> system        total-cpu-usage --memory-usage- -net/total- -dsk/total- --io/total-
>      time     |usr sys idl wai hiq siq| used  buff  cach  free| recv  send| read  writ| read  writ
> 21-09 21:06:53|  1   0  99   0   0   0|11.8G  273M 18.7G  221G|2326k 9015k|   0  1348k|   0  16.0
> 21-09 21:06:54|  1   1  98   0   0   0|11.9G  273M 18.7G  221G|  15M   10M|   0  1312k|   0  16.0
> 21-09 21:06:55|  2   2  94   0   0   1|12.3G  273M 18.7G  220G|  14M  311M|   0    48M|   0   309
> 21-09 21:06:56|  2   3  93   0   0   3|12.2G  273M 18.7G  220G|7745k 1190M|   0    16M|   0  93.0
> 21-09 21:06:57|  1   2  96   0   0   1|12.0G  273M 18.7G  220G|8269k 1189M|   0  1956k|   0  10.0
> 21-09 21:06:58|  3   1  95   0   0   1|11.8G  273M 18.7G  221G|4854k  752M|   0  4960k|   0  21.0
> 21-09 21:06:59|  3   0  97   0   0   0|11.8G  273M 18.7G  221G|3098k   25M|   0  5036k|   0  26.0
> 21-09 21:07:00|  1   0  98   0   0   0|11.8G  273M 18.7G  221G|2247k   25M|   0  9980k|   0  45.0
> 21-09 21:07:01|  2   1  97   0   0   0|11.8G  273M 18.7G  221G|4149k   17M|   0    76M|   0   427
> 
> That would be 1190 MiB/s (or 9.982 Gbps).
> 
> Restarting every OSD on a node at once as part of the upgrade causes a
> couple minutes worth of network saturation on all three mon nodes.  This
> causes thousands of slow requests and many unhappy OpenStack users.
> 
> I'm now stuck about 15% into the upgrade and haven't been able to
> determine how to move forward (or even backward) without causing another
> outage.
> 
> I've attempted to run the same test on another cluster with 1300+ OSDs and
> the outgoing network on the mon nodes didn't exceed 15 MiB/s (0.126 Gbps).
> 
> Any suggestions on how I can proceed?
> 
> Thanks,
> Bryan
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: jewel blocked requests

2016-09-19 Thread Will . Boege
Sorry, make that 'ceph tell osd.* version'.

> On Sep 19, 2016, at 2:55 PM, WRIGHT, JON R (JON R)  
> wrote:
> 
> When you say client, we're actually doing everything through Openstack vms 
> and cinder block devices.
> 
> librbd and librados are:
> 
> /usr/lib/librbd.so.1.0.0
> 
> /usr/lib/librados.so.2
> 
> 
> But I think this problem may have been related to a disk going back.  We got 
> Disk I/O errors over the weekend and are replacing a disk, and I think the 
> blocked requests may have all been associated with PGs that included the bad 
> OSD/disk.
> 
> 
> Would this make sense?
> 
> 
> Jon
> 
> On 9/15/2016 3:49 AM, Wido den Hollander wrote:
> 
>>> On 13 September 2016 at 18:54 "WRIGHT, JON R (JON R)" 
>>> wrote:
>>> 
>>> 
>>> VM Client OS: ubuntu 14.04
>>> 
>>> Openstack: kilo
>>> 
>>> libvirt: 1.2.12
>>> 
>>> nova-compute-kvm: 1:2015.1.4-0ubuntu2
>> What librados/librbd version are you running on the client?
>> 
>> Wido
>> 
>>> Jon
>>> 
>>> On 9/13/2016 11:17 AM, Wido den Hollander wrote:
>>> 
> On 13 September 2016 at 15:58 "WRIGHT, JON R (JON R)" 
> wrote:
> 
> 
> Yes, I do have old clients running.  The clients are all vms.  Is it
> typical that vm clients have to be rebuilt after a ceph upgrade?
 No, not always, but it is just that I saw this happening recently after a 
 Jewel upgrade.
 
 What version are the client(s) still running?
 
 Wido
 
> Thanks,
> 
> Jon
> 
> 
> On 9/12/2016 4:05 PM, Wido den Hollander wrote:
>>> On 12 September 2016 at 18:47 "WRIGHT, JON R (JON R)" 
>>> wrote:
>>> 
>>> 
>>> Since upgrading to Jewel from Hammer, we're started to see HEALTH_WARN
>>> because of 'blocked requests > 32 sec'.   Seems to be related to writes.
>>> 
>>> Has anyone else seen this?  Or can anyone suggest what the problem 
>>> might be?
>> Do you by any chance have old clients connecting? I saw this after a 
>> Jewel upgrade as well and it was because of very old clients still 
>> connecting to the cluster.
>> 
>> Wido
>> 
>>> Thanks!
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: jewel blocked requests

2016-09-19 Thread Will . Boege
Do you still have OSDs that aren't upgraded?

What does a 'ceph tell osd.* show' ?

> On Sep 19, 2016, at 2:55 PM, WRIGHT, JON R (JON R)  
> wrote:
> 
> When you say client, we're actually doing everything through Openstack vms 
> and cinder block devices.
> 
> librbd and librados are:
> 
> /usr/lib/librbd.so.1.0.0
> 
> /usr/lib/librados.so.2
> 
> 
> But I think this problem may have been related to a disk going back.  We got 
> Disk I/O errors over the weekend and are replacing a disk, and I think the 
> blocked requests may have all been associated with PGs that included the bad 
> OSD/disk.
> 
> 
> Would this make sense?
> 
> 
> Jon
> 
> On 9/15/2016 3:49 AM, Wido den Hollander wrote:
> 
>>> On 13 September 2016 at 18:54 "WRIGHT, JON R (JON R)" 
>>> wrote:
>>> 
>>> 
>>> VM Client OS: ubuntu 14.04
>>> 
>>> Openstack: kilo
>>> 
>>> libvirt: 1.2.12
>>> 
>>> nova-compute-kvm: 1:2015.1.4-0ubuntu2
>> What librados/librbd version are you running on the client?
>> 
>> Wido
>> 
>>> Jon
>>> 
>>> On 9/13/2016 11:17 AM, Wido den Hollander wrote:
>>> 
> On 13 September 2016 at 15:58 "WRIGHT, JON R (JON R)" 
> wrote:
> 
> 
> Yes, I do have old clients running.  The clients are all vms.  Is it
> typical that vm clients have to be rebuilt after a ceph upgrade?
 No, not always, but it is just that I saw this happening recently after a 
 Jewel upgrade.
 
 What version are the client(s) still running?
 
 Wido
 
> Thanks,
> 
> Jon
> 
> 
> On 9/12/2016 4:05 PM, Wido den Hollander wrote:
>>> On 12 September 2016 at 18:47 "WRIGHT, JON R (JON R)" 
>>> wrote:
>>> 
>>> 
>>> Since upgrading to Jewel from Hammer, we're started to see HEALTH_WARN
>>> because of 'blocked requests > 32 sec'.   Seems to be related to writes.
>>> 
>>> Has anyone else seen this?  Or can anyone suggest what the problem 
>>> might be?
>> Do you by any chance have old clients connecting? I saw this after a 
>> Jewel upgrade as well and it was because of very old clients still 
>> connecting to the cluster.
>> 
>> Wido
>> 
>>> Thanks!
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [EXTERNAL] Re: Increase PG number

2016-09-18 Thread Will . Boege
How many PGs do you have - and how many are you increasing it to?

Increasing PG counts can be disruptive if you are increasing by a large 
proportion of the initial count, because of all the PG peering involved.  If you 
are doubling the number of PGs it might be good to do it in stages to minimize 
peering.  For example, if you are going from 1024 to 2048, consider 4 increases 
of 256, allowing the cluster to stabilize in between, rather than one event 
that doubles the number of PGs.

If you expect this cluster to grow, overshoot the recommended PG count by 50% 
or so.  This will allow you to minimize PG increase events, and thus the 
impact to your users.
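
A sketch of that staged approach for a 1024 -> 2048 increase (the pool name is a 
placeholder; wait for all PGs to go active+clean between steps):

for pgs in 1280 1536 1792 2048; do
    ceph osd pool set volumes pg_num  $pgs
    ceph osd pool set volumes pgp_num $pgs
    # pause here until 'ceph -s' shows all PGs active+clean
done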

From: ceph-users 
> 
on behalf of Matteo Dacrema >
Date: Sunday, September 18, 2016 at 3:29 PM
To: Goncalo Borges 
>, 
"ceph-users@lists.ceph.com" 
>
Subject: [EXTERNAL] Re: [ceph-users] Increase PG number

Hi, thanks for your reply.

Yes, I don’t have any near-full OSDs.

The problem is not the rebalancing process but the process of creating the new 
PGs.

I have only 2 hosts running the Ceph Firefly version, with 3 SSDs each for journaling.
During the creation of the new PGs all the attached volumes stop reading and 
writing, showing high iowait.
ceph -s tells me that there are thousands of slow requests.

When all the PGs are created the slow requests begin to decrease and the cluster 
starts the rebalancing process.

Matteo


On 18 September 2016, at 13:08, Goncalo Borges 
> wrote:

Hi
I am assuming that you do not have any near-full OSDs (either before or during 
the PG splitting process) and that your cluster is healthy.

To minimize the impact on the clients during recovery or operations like PG 
splitting, it is good to set the following configs. Obviously the whole operation 
will take longer to recover, but the impact on clients will be minimized.

#  ceph daemon mon.rccephmon1 config show | egrep 
"(osd_max_backfills|osd_recovery_threads|osd_recovery_op_priority|osd_client_op_priority|osd_recovery_max_active)"
   "osd_max_backfills": "1",
   "osd_recovery_threads": "1",
   "osd_recovery_max_active": "1"
   "osd_client_op_priority": "63",
   "osd_recovery_op_priority": "1"

Cheers
G.

From: ceph-users 
[ceph-users-boun...@lists.ceph.com] 
on behalf of Matteo Dacrema [mdacr...@enter.eu]
Sent: 18 September 2016 03:42
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Increase PG number

Hi All,

I need to expand my ceph cluster and I also need to increase pg number.
In a test environment I see that during pg creation all read and write 
operations are stopped.

Is that a normal behavior ?

Thanks
Matteo





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Keystone RADOSGW ACLs

2015-10-19 Thread Will . Boege
I'm working with some teams who would like to not only create ACLs within 
RADOSGW at a tenant level, but also tailor ACLs to individual users within that 
tenant.  After trial and error, I can only seem to get ACLs to stick at the 
tenant level, using the Keystone tenant ID UUID.

Is this expected behavior for RadosGW?  Can you only assign bucket ACLs at a 
tenant level with Keystone auth?  There doesn't seem to be a lot of documentation 
out there around RadosGW with Keystone auth and its implications.
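
For context, the kind of per-user grant being attempted looks roughly like this 
through the Swift API (names are placeholders; whether RadosGW honors the user 
portion rather than just the tenant is exactly the open question):

swift post -r 'projectA:alice' mybucket      # read ACL for a single user in a tenant
swift post -r 'projectA:*' mybucket          # read ACL for the whole tenant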

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is there a way to configure a cluster_network for a running cluster?

2015-08-17 Thread Will . Boege
Thinking this through, pretty sure you would need to take your cluster
offline to do this.  I can't think of a scenario where you could reliably
keep quorum as you swap your monitors to use the cluster network.

On 8/10/15, 8:59 AM, Daniel Marks daniel.ma...@codecentric.de wrote:

Hi all,

we just found out that our ceph-cluster communicates over the ceph public
network only. Looks like we forgot to configure the cluster_network
parameter during deployment ( :facepalm: ). We are running ceph version
0.94.1 on ubuntu 14.04.1

Is there any documentation or any known procedure to properly configure a
ceph_cluster network for a running cluster (maybe via injectargs)? In
which order should OSDs, MONs and MDSs be configured?

Best regards,
Daniel Marks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests going up and down

2015-07-14 Thread Will . Boege
In my experience I have seen something like this happen twice. The first
time there were unclean PGs because Ceph was down to one replica of a PG.
When that happens Ceph blocks IO to the remaining replicas when the number
falls below the 'min_size' parameter. That will manifest as blocked ops.
The second time a disk was 'soft-failing' - gaining many bad sectors while
SMART still reported the drive as OK.  Maybe check OSD.5 and OSD.7 for low
level media errors with a tool like MegaCli, or whatever controller
management tool comes with your hardware.
At any rate, restarting the problem-child OSDs is probably troubleshooting
step #1, which you have done.
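
A couple of quick checks for that kind of soft-failing drive (device names and 
the controller tool are assumptions - use whatever matches your hardware):

smartctl -a /dev/sdX | egrep -i 'reallocated|pending|uncorrect'   # growing defect counts
MegaCli64 -PDList -aALL | egrep -i 'error|predictive'             # LSI/MegaRAID controllers only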

On 7/14/15, 6:45 AM, Deneau, Tom tom.den...@amd.com wrote:

I don't think there were any stale or unclean PGs,  (when there are,
I have seen health detail list them and it did not in this case).
I have since restarted the 2 osds and the health went immediately to
HEALTH_OK.

-- Tom

 -Original Message-
 From: Will.Boege [mailto:will.bo...@target.com]
 Sent: Monday, July 13, 2015 10:19 PM
 To: Deneau, Tom; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] slow requests going up and down
 
 Does the ceph health detail show anything about stale or unclean PGs, or
 are you just getting the blocked ops messages?
 
 On 7/13/15, 5:38 PM, Deneau, Tom tom.den...@amd.com wrote:
 
 I have a cluster where over the weekend something happened and successive
 calls to ceph health detail show things like below.
 What does it mean when the number of blocked requests goes up and down
 like this?
 Some clients are still running successfully.
 
 -- Tom Deneau, AMD
 
 
 
 HEALTH_WARN 20 requests are blocked > 32 sec; 2 osds have slow requests
 20 ops are blocked > 536871 sec
 2 ops are blocked > 536871 sec on osd.5
 18 ops are blocked > 536871 sec on osd.7
 2 osds have slow requests
 
 HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests
 4 ops are blocked > 536871 sec
 2 ops are blocked > 536871 sec on osd.5
 2 ops are blocked > 536871 sec on osd.7
 2 osds have slow requests
 
 HEALTH_WARN 27 requests are blocked > 32 sec; 2 osds have slow requests
 27 ops are blocked > 536871 sec
 2 ops are blocked > 536871 sec on osd.5
 25 ops are blocked > 536871 sec on osd.7
 2 osds have slow requests
 
 HEALTH_WARN 34 requests are blocked > 32 sec; 2 osds have slow requests
 34 ops are blocked > 536871 sec
 9 ops are blocked > 536871 sec on osd.5
 25 ops are blocked > 536871 sec on osd.7
 2 osds have slow requests
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests going up and down

2015-07-13 Thread Will . Boege
Does the ceph health detail show anything about stale or unclean PGs, or
are you just getting the blocked ops messages?

On 7/13/15, 5:38 PM, Deneau, Tom tom.den...@amd.com wrote:

I have a cluster where over the weekend something happened and successive
calls to ceph health detail show things like below.
What does it mean when the number of blocked requests goes up and down
like this?
Some clients are still running successfully.

-- Tom Deneau, AMD



HEALTH_WARN 20 requests are blocked > 32 sec; 2 osds have slow requests
20 ops are blocked > 536871 sec
2 ops are blocked > 536871 sec on osd.5
18 ops are blocked > 536871 sec on osd.7
2 osds have slow requests

HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests
4 ops are blocked > 536871 sec
2 ops are blocked > 536871 sec on osd.5
2 ops are blocked > 536871 sec on osd.7
2 osds have slow requests

HEALTH_WARN 27 requests are blocked > 32 sec; 2 osds have slow requests
27 ops are blocked > 536871 sec
2 ops are blocked > 536871 sec on osd.5
25 ops are blocked > 536871 sec on osd.7
2 osds have slow requests

HEALTH_WARN 34 requests are blocked > 32 sec; 2 osds have slow requests
34 ops are blocked > 536871 sec
9 ops are blocked > 536871 sec on osd.5
25 ops are blocked > 536871 sec on osd.7
2 osds have slow requests
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com