Re: [ceph-users] hammer-0.94.5 + kernel-4.1.15 - cephfs stuck

2016-02-03 Thread Gregory Farnum
The quick and dirty cleanup is to restart the OSDs hosting those PGs.
They might have gotten some stuck ops which didn't get woken up; a few
bugs like that have gone by and are resolved in various stable
branches (I'm not sure what release binaries they're in).
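Something along these lines usually does it (a rough sketch only; the exact
service commands depend on your distro and init system):

  # find the stuck PGs and the OSDs in their acting sets
  ceph pg dump_stuck stale
  ceph pg map 6.11

  # then bounce the acting OSDs one at a time, e.g.
  service ceph restart osd.4          # sysvinit
  systemctl restart ceph-osd@4        # or the systemd equivalent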

On Wed, Feb 3, 2016 at 11:32 PM, Nikola Ciprich
 wrote:
>> Yeah, these inactive PGs are basically guaranteed to be the cause of
>> the problem. There are lots of threads about getting PGs healthy
>> again; you should dig around the archives and the documentation
>> troubleshooting page(s). :)
>> -Greg
>
> Hello Gregory,
>
> well, I wouldn't doubt it, but when the problems started, the only
> unclean PGs were some remapped ones, none inactive, so I guess it must've
> been something else..
>
> but I'm now struggling to get rid of those inactive ones, of course..
> however I've not been successful so far; I've probably read all
> the related docs and discussions and still haven't found a similar
> problem..
>
> pg 6.11 is stuck stale for 79285.647847, current state stale+active+clean, 
> last acting [4,10,8]
> pg 3.198 is stuck stale for 79367.532437, current state stale+active+clean, 
> last acting [8,13]
>
> those two are stale for some reason.. but OSDs 4, 8, 10, 13 are running, there
> are no network problems.. PG query on those just hangs..
>
> I'm running out of ideas here..
>
> nik
>
>
> --
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 00 Ostrava
>
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
>
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer-0.94.5 + kernel-4.1.15 - cephfs stuck

2016-02-03 Thread Nikola Ciprich
> Yeah, these inactive PGs are basically guaranteed to be the cause of
> the problem. There are lots of threads about getting PGs healthy
> again; you should dig around the archives and the documentation
> troubleshooting page(s). :)
> -Greg

Hello Gregory,

well, I wouldn't doubt it, but when the problems started, the only
unclean PGs were some remapped ones, none inactive, so I guess it must've
been something else..

but I'm now struggling to get rid of those inactive ones, of course..
however I've not been successful so far; I've probably read all
the related docs and discussions and still haven't found a similar
problem..

pg 6.11 is stuck stale for 79285.647847, current state stale+active+clean, last 
acting [4,10,8]
pg 3.198 is stuck stale for 79367.532437, current state stale+active+clean, 
last acting [8,13]

those two are stale for some reason.. but OSDs 4, 8, 10, 13 are running, there
are no network problems.. PG query on those just hangs..

I'm running out of ideas here..

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading with mon & osd on same host

2016-02-03 Thread Mika c
Hi,
> Do the packages (Debian) restart the services upon upgrade?
No, they do not; you need to restart the services yourself.

> Do I need to actually stop all OSDs, or can I upgrade them one by one?
No need to stop them all. Just upgrade the OSD servers one by one and restart each OSD
daemon.
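A rough sketch of the rolling sequence (assuming sysvinit-style service names;
with systemd it would be "systemctl restart ceph-osd@<id>", and setting noout
is optional but avoids needless rebalancing while daemons restart):

  ceph osd set noout
  # on each monitor node, after upgrading the packages:
  service ceph restart mon
  # then on each OSD node, one node at a time:
  service ceph restart osd.<id>
  ceph -s              # wait for HEALTH_OK / active+clean before the next node
  ceph osd unset noout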



Best wishes,
Mika


2016-02-03 18:55 GMT+08:00 Udo Waechter :

> Hi,
>
> I would like to upgrade my ceph cluster from hammer to infernalis.
>
> I'm reading the upgrade notes, that I need to upgrade & restart the
> monitors first, then the OSDs.
>
> Now, my cluster has OSDs and Mons on the same hosts (I know that should
> not be the case, but it is :( ).
>
> I'm just wondering:
> * Do the packages (Debian) restart the services upon upgrade?
>
>
> In theory it should work this way:
>
> * install / upgrade the new packages
> * restart all mons
> * stop OSD one by one and change the user accordingly.
>
> Another question then:
>
> Do I need to actually stop all OSDs, or can I upgrade them one by one?
> I don't want to take the whole cluster down :(
>
> Thanks very much,
> udo.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Performance issues related to scrubbing

2016-02-03 Thread Christian Balzer

Hello,

On Wed, 3 Feb 2016 17:48:02 -0800 Cullen King wrote:

> Hello,
> 
> I've been trying to nail down a nasty performance issue related to
> scrubbing. I am mostly using radosgw with a handful of buckets containing
> millions of various sized objects. When ceph scrubs, both regular and
> deep, radosgw blocks on external requests, and my cluster has a bunch of
> requests that have blocked for > 32 seconds. Frequently OSDs are marked
> down.
>   
From my own (painful) experiences let me state this:

1. When your cluster runs out of steam during deep-scrubs, drop what
you're doing and order more HW (OSDs).
Because this is a sign that it would also be in trouble when doing
recoveries. 

2. If your cluster is inconvenienced by even mere scrubs, you're really in
trouble. 
Threaten the penny pincher with bodily violence and have that new HW
phased in yesterday.

> According to atop, the OSDs being deep scrubbed are reading at only 5 MB/s
> to 8 MB/s, and a scrub of a 6.4 GB placement group takes 10-20 minutes.
> 
> Here's a screenshot of atop from a node:
> https://s3.amazonaws.com/rwgps/screenshots/DgSSRyeF.png
>   
This looks familiar. 
Basically at this point in time the competing read request for all the
objects clash with write requests and completely saturate your HD (about
120 IOPS and 85% busy according to your atop screenshot). 

There are ceph configuration options that can mitigate this to some
extend and which I don't see in your config, like
"osd_scrub_load_threshold" and "osd_scrub_sleep" along with the various IO
priority settings.
However the points above still stand.
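For reference, those knobs go into ceph.conf roughly like this (the values are
only illustrative starting points, not recommendations for your cluster, and
the ioprio settings only take effect with the CFQ disk scheduler):

  [osd]
  osd scrub load threshold = 2.5
  osd scrub sleep = 0.1
  osd max scrubs = 1                   # default anyway, shown for completeness
  osd disk thread ioprio class = idle
  osd disk thread ioprio priority = 7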

XFS defragmentation might help, significantly if your FS is badly
fragmented. But again, this is only a temporary band-aid.

> First question: is this a reasonable speed for scrubbing, given a very
> lightly used cluster? Here's some cluster details:
> 
> deploy@drexler:~$ ceph --version
> ceph version 0.94.1-5-g85a68f9 (85a68f9a8237f7e74f44a1d1fbbd6cb4ac50f8e8)
> 
> 
> 2x Xeon E5-2630 per node, 64gb of ram per node.
>  
More memory can help by keeping hot objects in the page cache (so the
actual disks need not be read and can write at their full IOPS capacity).
A lot of memory (and the correct sysctl settings) will also allow for a
large SLAB space, keeping all those directory entries and other bits in
memory without having to go to disk to get them.
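For what it's worth, the sysctls usually involved here are along these lines
(example values only, size them to your nodes):

  # /etc/sysctl.d/ceph.conf
  vm.vfs_cache_pressure = 10     # keep dentry/inode slabs cached longer
  vm.min_free_kbytes = 524288    # headroom so allocations don't stall under pressure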

You seem to be just fine CPU wise. 

> 
> deploy@drexler:~$ ceph status
> cluster 234c6825-0e2b-4256-a710-71d29f4f023e
>  health HEALTH_WARN
> 118 requests are blocked > 32 sec
>  monmap e1: 3 mons at {drexler=
> 10.0.0.36:6789/0,lucy=10.0.0.38:6789/0,paley=10.0.0.34:6789/0}
> election epoch 296, quorum 0,1,2 paley,drexler,lucy
>  mdsmap e19989: 1/1/1 up {0=lucy=up:active}, 1 up:standby
>  osdmap e1115: 12 osds: 12 up, 12 in
>   pgmap v21748062: 1424 pgs, 17 pools, 3185 GB data, 20493 kobjects
> 10060 GB used, 34629 GB / 44690 GB avail
> 1422 active+clean
>1 active+clean+scrubbing+deep
>1 active+clean+scrubbing
>   client io 721 kB/s rd, 33398 B/s wr, 53 op/s
>   
You want to avoid having scrubs going on willy-nilly in parallel and at
high peak times, even IF your cluster is capable of handling them.

Depending on how busy your cluster is and its usage pattern, you may do
what I did. 
Kick off a deep scrub of all OSDs ("ceph osd deep-scrub \*") at, say, 01:00 on a
Saturday morning. 
If your cluster is fast enough, it will finish before 07:00 (without
killing your client performance) and all regular scrubs will now happen in
that time frame as well (given default settings).
If your cluster isn't fast enough, see my initial 2 points. ^o^
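The kick-off itself can be a plain cron entry, something like (assumes root on
a node with a working admin keyring):

  # /etc/cron.d/ceph-deep-scrub
  0 1 * * 6   root   ceph osd deep-scrub \*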

> deploy@drexler:~$ ceph osd tree
> ID WEIGHT   TYPE NAMEUP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 43.67999 root default
> -2 14.56000 host paley
>  0  3.64000 osd.0 up  1.0  1.0
>  3  3.64000 osd.3 up  1.0  1.0
>  6  3.64000 osd.6 up  1.0  1.0
>  9  3.64000 osd.9 up  1.0  1.0
> -3 14.56000 host lucy
>  1  3.64000 osd.1 up  1.0  1.0
>  4  3.64000 osd.4 up  1.0  1.0
>  7  3.64000 osd.7 up  1.0  1.0
> 11  3.64000 osd.11up  1.0  1.0
> -4 14.56000 host drexler
>  2  3.64000 osd.2 up  1.0  1.0
>  5  3.64000 osd.5 up  1.0  1.0
>  8  3.64000 osd.8 up  1.0  1.0
> 10  3.64000 osd.10up  1.0  1.0
> 
> 
> My OSDs are 4tb 7200rpm Hitachi DeskStars, using XFS, with Samsung 850
> Pro journals (very slow, ordered s3700 replacements, but shouldn't pose
> problems for reading as far as I understand things).   

Just to make sure, these are genuine DeskStars?
I'm aski

Re: [ceph-users] Fwd: HEALTH_WARN pool vol has too few pgs

2016-02-03 Thread M Ranga Swami Reddy
Yes, if I change the pg_num on the current pool, a cluster rebalance starts..
Alternatively, I plan to do as below:
1. Create a new pool with the max possible pg_num (as per the pg calc).
2. Copy the current pool to the new pool (during this step IO should be stopped).
3. Rename the current pool to current.old and rename the new pool to the current pool's name.

After the 3rd step, I guess the cluster should be fine without a rebalance.
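In command form the plan would be roughly the following (only a sketch; pool
names and pg count are placeholders, and I understand rados cppool has caveats,
e.g. around snapshots):

  ceph osd pool create volumes-new 8192 8192
  rados cppool volumes volumes-new
  ceph osd pool rename volumes volumes-old
  ceph osd pool rename volumes-new volumes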

Thanks
Swami


On Thu, Feb 4, 2016 at 11:38 AM, Somnath Roy  wrote:
> You can increase it, but that will trigger rebalancing, and depending on the
> amount of data it will take some time before the cluster comes back into a clean
> state.
> Client IO performance will be affected during this.
> BTW this is not really an error, it is a warning, because performance on that
> pool will be affected by the low pg count.
>
> Thanks & Regards
> Somnath
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of M 
> Ranga Swami Reddy
> Sent: Wednesday, February 03, 2016 9:48 PM
> To: Ferhat Ozkasgarli
> Cc: ceph-users
> Subject: Re: [ceph-users] Fwd: HEALTH_WARN pool vol has too few pgs
>
> Current pg_num: 4096. As per the PG num formula, number of OSDs * 100 / pool size ->
> 184 * 100 / 3 = 6133, so I can increase to 8192. Does this solve the problem?
>
> Thanks
> Swami
>
> On Thu, Feb 4, 2016 at 2:14 AM, Ferhat Ozkasgarli  
> wrote:
>> As the message states, you must increase the placement group number for the pool,
>> because 108T of data requires a larger pg number.
>>
>> On Feb 3, 2016 8:09 PM, "M Ranga Swami Reddy"  wrote:
>>>
>>> Hi,
>>>
>>> I am using ceph for my storage cluster and health shows as WARN state
>>> with too few pgs.
>>>
>>> ==
>>> health HEALTH_WARN pool volumes has too few pgs ==
>>>
>>> The volume pool has 4096 pgs
>>> --
>>> ceph osd pool get volumes pg_num
>>> pg_num: 4096
>>> ---
>>>
>>> and
>>> >ceph df
>>> NAME   ID USED  %USED MAX AVAIL
>>> OBJECTS
>>> volumes4  2830G  0.82  108T
>>> 763509
>>> --
>>>
>>> How do we fix this, without downtime?
>>>
>>> Thanks
>>> Swami
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: HEALTH_WARN pool vol has too few pgs

2016-02-03 Thread Somnath Roy
You can increase it, but that will trigger rebalancing, and depending on the amount
of data it will take some time before the cluster comes back into a clean state.
Client IO performance will be affected during this.
BTW this is not really an error, it is a warning, because performance on that
pool will be affected by the low pg count.
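If you do go ahead, the change itself is just the two pool settings below
(pgp_num has to follow pg_num, and stepping up in smaller increments limits how
much data moves at once):

  ceph osd pool set volumes pg_num 8192
  ceph osd pool set volumes pgp_num 8192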

Thanks & Regards
Somnath
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of M 
Ranga Swami Reddy
Sent: Wednesday, February 03, 2016 9:48 PM
To: Ferhat Ozkasgarli
Cc: ceph-users
Subject: Re: [ceph-users] Fwd: HEALTH_WARN pool vol has too few pgs

Current pg_num: 4096. As per the PG num formula, number of OSDs * 100 / pool size ->
184 * 100 / 3 = 6133, so I can increase to 8192. Does this solve the problem?

Thanks
Swami

On Thu, Feb 4, 2016 at 2:14 AM, Ferhat Ozkasgarli  wrote:
> As the message states, you must increase the placement group number for the pool,
> because 108T of data requires a larger pg number.
>
> On Feb 3, 2016 8:09 PM, "M Ranga Swami Reddy"  wrote:
>>
>> Hi,
>>
>> I am using ceph for my storage cluster and health shows as WARN state
>> with too few pgs.
>>
>> ==
>> health HEALTH_WARN pool volumes has too few pgs ==
>>
>> The volume pool has 4096 pgs
>> --
>> ceph osd pool get volumes pg_num
>> pg_num: 4096
>> ---
>>
>> and
>> >ceph df
>> NAME   ID USED  %USED MAX AVAIL
>> OBJECTS
>> volumes4  2830G  0.82  108T
>> 763509
>> --
>>
>> How do we fix this, without downtime?
>>
>> Thanks
>> Swami
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: HEALTH_WARN pool vol has too few pgs

2016-02-03 Thread M Ranga Swami Reddy
Current pg_num: 4096. As per the PG num formula, number of OSDs * 100 / pool size ->
184 * 100 / 3 = 6133, so I can increase to 8192. Does this solve the problem?

Thanks
Swami

On Thu, Feb 4, 2016 at 2:14 AM, Ferhat Ozkasgarli  wrote:
> As the message states, you must increase the placement group number for the pool,
> because 108T of data requires a larger pg number.
>
> On Feb 3, 2016 8:09 PM, "M Ranga Swami Reddy"  wrote:
>>
>> Hi,
>>
>> I am using ceph for my storage cluster and health shows as WARN state
>> with too few pgs.
>>
>> ==
>> health HEALTH_WARN pool volumes has too few pgs
>> ==
>>
>> The volume pool has 4096 pgs
>> --
>> ceph osd pool get volumes pg_num
>> pg_num: 4096
>> ---
>>
>> and
>> >ceph df
>> NAME   ID USED  %USED MAX AVAIL
>> OBJECTS
>> volumes4  2830G  0.82  108T
>> 763509
>> --
>>
>> How do we fix this, without downtime?
>>
>> Thanks
>> Swami
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Set cache tier pool forward state automatically!

2016-02-03 Thread Robert LeBlanc


On Wed, Feb 3, 2016 at 9:00 PM, Christian Balzer  wrote:
> On Wed, 3 Feb 2016 16:57:09 -0700 Robert LeBlanc wrote:

> That's an interesting strategy, I suppose you haven't run into the issue I
> wrote about 2 days ago when switching to forward while running rbd bench?

We haven't, but we are running 0.94.5. If you are running Firefly,
that could be why.

> In my case I venture that the number of really hot objects is small enough
> to not overwhelm things and that 5K IOPS would be all that cluster ever
> needs to provide.

We have 48x Micron M600 1TB drives. They do not perform as well as the
Intel S3610 800GB, which seems to have the right balance of performance
and durability for our needs. Since we had to under-provision the
M600s, we should be able to get by with the 800GB just fine. Once we
get the drives swapped out we may do better than the 10K IOPS, and the
recency fix going into the next version of Hammer should also help
with writeback. With our new cluster, we did fine in writeback mode
until we hit that 10K IOPS limit and started getting slow I/O
messages; turning it to forward mode sped things up a lot.
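For completeness, the toggling itself is just the tier cache-mode command (pool
name is a placeholder; this is on Hammer, the exact syntax may differ on other
releases):

  ceph osd tier cache-mode <cache-pool> writeback
  # ...30-60 seconds later, or once blocked I/O gets too bad...
  ceph osd tier cache-mode <cache-pool> forward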


- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Set cache tier pool forward state automatically!

2016-02-03 Thread Robert LeBlanc

I think it depends. If there are no writes, then there probably won't
be any blocking if there are less than min_size OSDs to service a PG.
In an RBD workload, that is highly unlikely. If there is no blocking
then setting the full_ratio to near zero should flush out the blocks
even with less than min_size, but any write would stall it all, maybe.
You would have to test it.

Just be aware that you are trying something that I haven't heard
anyone doing, so it may or may not work. Just do a lot of testing and
think of all the possible failure scenarios that may be possible and
try them. Honestly, if you are writing to it, I wouldn't trust less
than size=3, min_size=2 with the automatic recovery. If you only have
2 copies and there is corruption, you can't easily tell which one is
the right one, at least with 3 you can have a vote. Ceph is also
supposed to get smarter about recovery and use the voting to auto-recover
the "best" candidate, and I hope with hashing it will be a
slam dunk for auto recovery.
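For reference, the replication settings being discussed here are ordinary pool
options (pool name is a placeholder):

  ceph osd pool set <pool> size 3
  ceph osd pool set <pool> min_size 2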

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Feb 3, 2016 at 5:07 PM, Mihai Gheorghe  wrote:
> Does the cache pool still flush when you set a very low ratio if the pool doesn't
> meet min_size? I mean, does ceph block only writes when an OSD fails in a
> pool of size 2, or does it block reads too?
>
> Because on paper it looks good for a small cache pool: in case of an OSD
> failure, set the lowest ratio for flushing, wait for it to finish and then set
> it to forward mode, or disable it completely until it's fixed.
>
> On 4 Feb 2016 01:57, "Robert LeBlanc"  wrote:
>>
>> My experience with Hammer is showing that setting the pool to forward
>> mode is not evicting objects, nor do I think it is flushing objects.
>> We have had our pool in forward mode for weeks now and we still have
>> almost the same amount of I/O to it. There has been a slight shift
>> between SSD and HDD, but I think that is because some objects have
>> cooled off and others have been newly accessed. You may have better
>> luck adjusting the ratios, but we see that there is a big hit to our
>> cluster to do that. We usually do 1% every minute or two to help
>> reduce the impact of evicting the data (we usually drop the cache full
>> ratio 10% or so to evict some objects and we then toggle the cache
>> mode between writeback and forward periodically to warm up the cache.
>> Setting it to writeback will promote so many objects at once that it
>> severely impact our cluster. There is also a limit that we reach at
>> about 10K IOPs when in writeback where with forward I've seen spikes
>> to 64K IOPs. So we turn on writeback for 30-60 seconds (or until the
>> blocked I/O is too much for us to handle), then set it to forward for
>> 60-120 second, rinse and repeat until the impact of writeback isn't so
>> bad, then set it back to forward for a couple more weeks).
>>
>> Needless to say, cache tiering could use some more love. If I get some
>> time, I'd like to try and help that section of code, but I have a
>> couple other more pressing issues I'd like to track down first.

Re: [ceph-users] Set cache tier pool forward state automatically!

2016-02-03 Thread Christian Balzer
On Wed, 3 Feb 2016 16:57:09 -0700 Robert LeBlanc wrote:

> My experience with Hammer is showing that setting the pool to forward
> mode is not evicting objects, nor do I think it is flushing objects.
>
Same here (with Firefly).

> We have had our pool in forward mode for weeks now and we still have
> almost the same amount of I/O to it. There has been a slight shift
> between SSD and HDD, but I think that is because some objects have
> cooled off and others have been newly accessed. You may have better
> luck adjusting the ratios, but we see that there is a big hit to our
> cluster to do that. We usually do 1% every minute or two to help
> reduce the impact of evicting the data (we usually drop the cache full
> ratio 10% or so to evict some objects and we then toggle the cache
> mode between writeback and forward periodically to warm up the cache.
> Setting it to writeback will promote so many objects at once that it
> severely impact our cluster. There is also a limit that we reach at
> about 10K IOPs when in writeback where with forward I've seen spikes
> to 64K IOPs. So we turn on writeback for 30-60 seconds (or until the
> blocked I/O is too much for us to handle), then set it to forward for
> 60-120 second, rinse and repeat until the impact of writeback isn't so
> bad, then set it back to forward for a couple more weeks).
>
That's an interesting strategy, I suppose you haven't run into the issue I
wrote about 2 days ago when switching to forward while running rbd bench?

In my case I venture that the number of really hot objects is small enough
to not overwhelm things and that 5K IOPS would be all that cluster ever
needs to provide.

Regards,

Christian
 
> Needless to say, cache tiering could use some more love. If I get some
> time, I'd like to try and help that section of code, but I have a
> couple other more pressing issues I'd like to track down first.
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Wed, Feb 3, 2016 at 10:01 AM, Nick Fisk  wrote:
> > I think this would be better to be done outside of Ceph. It should be
> > quite simple for whatever monitoring software you are using to pick up
> > the disk failure to set the target_dirty_ratio to a very low value or
> > change the actual caching mode.
> >
> > Doing it in Ceph would be complicated as you are then asking Ceph to
> > decide when you are in an at risk scenario, ie would you want it to
> > flush your cache after a quick service reload or node reboot?
> >
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> >> Of Mihai Gheorghe
> >> Sent: 03 February 2016 16:57
> >> To: ceph-users ; ceph-users  >> us...@ceph.com>
> >> Subject: [ceph-users] Set cache tier pool forward state automatically!
> >>
> >> Hi,
> >>
> >> Is there a built in setting in ceph that would set the cache pool from
> >> writeback to forward state automatically in case of an OSD fail from
> >> the pool?
> >>
> >> Let's say the size of the cache pool is 2. If an OSD fails, ceph
> >> blocks writes to the pool, making the VMs that use this pool
> >> inaccessible. But an earlier copy of the data is present on the cold
> >> storage pool prior to the last cache flush.
> >>
> >> In this case, is it possible that when an OSD fails, the data on the
> >> cache pool to be flushed onto the cold storage pool and set the
> >> forward flag automatically on the cache pool? So that the VM can
> >> resume write to the block device as soon as the cache is flushed from
> >> the pool and read/write directly from the cold storage pool until
> >> manual intervention on the cache pool is done to fix it and set it
> >> back to writeback?
> >>
> >> This way we can get away with a pool size of 2 without worrying for
> >> too much downtime!
> >>
> >> Hope i was explicit enough!
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-u

[ceph-users] Performance issues related to scrubbing

2016-02-03 Thread Cullen King
Hello,

I've been trying to nail down a nasty performance issue related to
scrubbing. I am mostly using radosgw with a handful of buckets containing
millions of various sized objects. When ceph scrubs, both regular and deep,
radosgw blocks on external requests, and my cluster has a bunch of requests
that have blocked for > 32 seconds. Frequently OSDs are marked down.

According to atop, the OSDs being deep scrubbed are reading at only 5 MB/s
to 8 MB/s, and a scrub of a 6.4 GB placement group takes 10-20 minutes.

Here's a screenshot of atop from a node:
https://s3.amazonaws.com/rwgps/screenshots/DgSSRyeF.png

First question: is this a reasonable speed for scrubbing, given a very
lightly used cluster? Here's some cluster details:

deploy@drexler:~$ ceph --version
ceph version 0.94.1-5-g85a68f9 (85a68f9a8237f7e74f44a1d1fbbd6cb4ac50f8e8)


2x Xeon E5-2630 per node, 64gb of ram per node.


deploy@drexler:~$ ceph status
cluster 234c6825-0e2b-4256-a710-71d29f4f023e
 health HEALTH_WARN
118 requests are blocked > 32 sec
 monmap e1: 3 mons at {drexler=
10.0.0.36:6789/0,lucy=10.0.0.38:6789/0,paley=10.0.0.34:6789/0}
election epoch 296, quorum 0,1,2 paley,drexler,lucy
 mdsmap e19989: 1/1/1 up {0=lucy=up:active}, 1 up:standby
 osdmap e1115: 12 osds: 12 up, 12 in
  pgmap v21748062: 1424 pgs, 17 pools, 3185 GB data, 20493 kobjects
10060 GB used, 34629 GB / 44690 GB avail
1422 active+clean
   1 active+clean+scrubbing+deep
   1 active+clean+scrubbing
  client io 721 kB/s rd, 33398 B/s wr, 53 op/s

deploy@drexler:~$ ceph osd tree
ID WEIGHT   TYPE NAMEUP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 43.67999 root default
-2 14.56000 host paley
 0  3.64000 osd.0 up  1.0  1.0
 3  3.64000 osd.3 up  1.0  1.0
 6  3.64000 osd.6 up  1.0  1.0
 9  3.64000 osd.9 up  1.0  1.0
-3 14.56000 host lucy
 1  3.64000 osd.1 up  1.0  1.0
 4  3.64000 osd.4 up  1.0  1.0
 7  3.64000 osd.7 up  1.0  1.0
11  3.64000 osd.11up  1.0  1.0
-4 14.56000 host drexler
 2  3.64000 osd.2 up  1.0  1.0
 5  3.64000 osd.5 up  1.0  1.0
 8  3.64000 osd.8 up  1.0  1.0
10  3.64000 osd.10up  1.0  1.0


My OSDs are 4tb 7200rpm Hitachi DeskStars, using XFS, with Samsung 850 Pro
journals (very slow, ordered s3700 replacements, but shouldn't pose
problems for reading as far as I understand things). MONs are co-located
with OSD nodes, but the nodes are fairly beefy and has very low load.
Drives are on a expanding backplane, with an LSI SAS3008 controller.

I have a fairly standard config as well:

https://gist.github.com/kingcu/aae7373eb62ceb7579da

I know that I don't have a ton of OSDs, but I'd expect a little better
performance than this. Checkout munin of my three nodes:

http://munin.ridewithgps.com/ridewithgps.com/drexler.ridewithgps.com/index.html#disk
http://munin.ridewithgps.com/ridewithgps.com/paley.ridewithgps.com/index.html#disk
http://munin.ridewithgps.com/ridewithgps.com/lucy.ridewithgps.com/index.html#disk


Any input would be appreciated, before I start trying to micro-optimize
config params, as well as upgrading to Infernalis.


Cheers,

Cullen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Set cache tier pool forward state automatically!

2016-02-03 Thread Mihai Gheorghe
Does the cache pool still flush when you set a very low ratio if the pool
doesn't meet min_size? I mean, does ceph block only writes when an OSD fails
in a pool of size 2, or does it block reads too?

Because on paper it looks good for a small cache pool: in case of an OSD
failure, set the lowest ratio for flushing, wait for it to finish and then set
it to forward mode, or disable it completely until it's fixed.
On 4 Feb 2016 01:57, "Robert LeBlanc"  wrote:

> My experience with Hammer is showing that setting the pool to forward
> mode is not evicting objects, nor do I think it is flushing objects.
> We have had our pool in forward mode for weeks now and we still have
> almost the same amount of I/O to it. There has been a slight shift
> between SSD and HDD, but I think that is because some objects have
> cooled off and others have been newly accessed. You may have better
> luck adjusting the ratios, but we see that there is a big hit to our
> cluster to do that. We usually do 1% every minute or two to help
> reduce the impact of evicting the data (we usually drop the cache full
> ratio 10% or so to evict some objects and we then toggle the cache
> mode between writeback and forward periodically to warm up the cache.
> Setting it to writeback will promote so many objects at once that it
> severely impact our cluster. There is also a limit that we reach at
> about 10K IOPs when in writeback where with forward I've seen spikes
> to 64K IOPs. So we turn on writeback for 30-60 seconds (or until the
> blocked I/O is too much for us to handle), then set it to forward for
> 60-120 second, rinse and repeat until the impact of writeback isn't so
> bad, then set it back to forward for a couple more weeks).
>
> Needless to say, cache tiering could use some more love. If I get some
> time, I'd like to try and help that section of code, but I have a
> couple other more pressing issues I'd like to track down first.
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Feb 3, 2016 at 10:01 AM, Nick Fisk  wrote:
> > I think this would be better to be done outside of Ceph. It should be
> quite simple for whatever monitoring software you are using to pick up the
> disk failure to set the target_dirty_ratio to a very low value or change
> the actual caching mode.
> >
> > Doing it in Ceph would be complicated as you are then asking Ceph to
> decide when you are in an at risk scenario, ie would you want it to flush
> your cache after a quick service reload or node reboot?
> >
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of
> >> Mihai Gheorghe
> >> Sent: 03 February 2016 16:57
> >> To: ceph-users ; ceph-users  >> us...@ceph.com>
> >> Subject: [ceph-users] Set cache tier pool forward state automatically!
> >>
> >> Hi,
> >>
> >> Is there a built in setting in ceph that would set the cache pool from
> >> writeback to forward state automatically in case of an OSD fail from
> the pool?
> >>
> >> Let;s say the size of the cache pool is 2. If an OSD fails ceph blocks
> write to
> >> the pool, making the VM that use this pool to be unaccesable. But an
> earlier
> >> copy of the data is present on the cold storage pool prior to the last
> cache
> >> flush.
> >>
> >> In this case, is it possible that when an OSD fails, the data on the
> cache pool
> >> to be flushed onto the cold storage pool and set the forward flag
> >> automatically on the cache pool? So that the VM can resume write to the
> >> block device as soon as the cache is flushed from the pool and
> read/write
> >> directly from the cold storage pool untill manual intervention on the
> cache
> >> pool is done to fix it and set it back to writeback?
> >>
> >> This way we can get away with a pool size of 2 without worrying for too
> much
> >> downtime!
> >>
> >> Hope i was explicit enough!
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/

Re: [ceph-users] Set cache tier pool forward state automatically!

2016-02-03 Thread Robert LeBlanc

My experience with Hammer is showing that setting the pool to forward
mode is not evicting objects, nor do I think it is flushing objects.
We have had our pool in forward mode for weeks now and we still have
almost the same amount of I/O to it. There has been a slight shift
between SSD and HDD, but I think that is because some objects have
cooled off and others have been newly accessed. You may have better
luck adjusting the ratios, but we see that there is a big hit to our
cluster to do that. We usually do 1% every minute or two to help
reduce the impact of evicting the data (we usually drop the cache full
ratio 10% or so to evict some objects and we then toggle the cache
mode between writeback and forward periodically to warm up the cache.
Setting it to writeback will promote so many objects at once that it
severely impacts our cluster. There is also a limit that we reach at
about 10K IOPS when in writeback, whereas with forward I've seen spikes
to 64K IOPS. So we turn on writeback for 30-60 seconds (or until the
blocked I/O is too much for us to handle), then set it to forward for
60-120 seconds, rinse and repeat until the impact of writeback isn't so
bad, then set it back to forward for a couple more weeks).
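The ratio nudging described above is nothing more exotic than repeated pool
sets, e.g. (numbers purely illustrative):

  # drop the full ratio ~10% to force some eviction...
  ceph osd pool set <cache-pool> cache_target_full_ratio 0.70
  # ...then walk it back up a percent at a time
  ceph osd pool set <cache-pool> cache_target_full_ratio 0.71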

Needless to say, cache tiering could use some more love. If I get some
time, I'd like to try and help that section of code, but I have a
couple other more pressing issues I'd like to track down first.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Feb 3, 2016 at 10:01 AM, Nick Fisk  wrote:
> I think this would be better to be done outside of Ceph. It should be quite 
> simple for whatever monitoring software you are using to pick up the disk 
> failure to set the target_dirty_ratio to a very low value or change the 
> actual caching mode.
>
> Doing it in Ceph would be complicated as you are then asking Ceph to decide 
> when you are in an at risk scenario, ie would you want it to flush your cache 
> after a quick service reload or node reboot?
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Mihai Gheorghe
>> Sent: 03 February 2016 16:57
>> To: ceph-users ; ceph-users > us...@ceph.com>
>> Subject: [ceph-users] Set cache tier pool forward state automatically!
>>
>> Hi,
>>
>> Is there a built in setting in ceph that would set the cache pool from
>> writeback to forward state automatically in case of an OSD fail from the 
>> pool?
>>
>> Let's say the size of the cache pool is 2. If an OSD fails, ceph blocks writes
>> to
>> the pool, making the VMs that use this pool inaccessible. But an earlier
>> copy of the data is present on the cold storage pool prior to the last cache
>> flush.
>>
>> In this case, is it possible that when an OSD fails, the data on the cache 
>> pool
>> to be flushed onto the cold storage pool and set the forward flag
>> automatically on the cache pool? So that the VM can resume write to the
>> block device as soon as the cache is flushed from the pool and read/write
>> directly from the cold storage pool until manual intervention on the cache
>> pool is done to fix it and set it back to writeback?
>>
>> This way we can get away with a pool size of 2 without worrying for too much
>> downtime!
>>
>> Hope i was explicit enough!
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Optimal OSD count for SSDs / NVMe disks

2016-02-03 Thread Robert LeBlanc

Once we put in our cache tier the I/O on the spindles was so low, we
just moved the journals off the SSDs onto the spindles and left the
SSD space for cache. There have been testing showing that better
performance can be achieved by putting more OSDs on an NVMe disk, but
you also have to balance that with OSDs not being evenly distributed
so some OSDs will use more space than others. I probably wouldn't go
more than 4 100 GB partitions, but it really depends on the number of
PGs and your data distribution. Also, even with all the data in the
cache, there is still a performance penalty for having the caching
tier vs. a native SSD pool. So if you are not using the tiering, move
to a straight SSD pool.
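If you do drop the tier, the teardown is roughly the sequence below (pool names
are placeholders; the flush/evict step can take a long time):

  ceph osd tier cache-mode <cache-pool> forward
  rados -p <cache-pool> cache-flush-evict-all
  ceph osd tier remove-overlay <base-pool>
  ceph osd tier remove <base-pool> <cache-pool>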
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Feb 3, 2016 at 5:01 AM, Sascha Vogt  wrote:
> Hi all,
>
> we recently tried adding a cache tier to our ceph cluster. We had 5
> spinning disks per hosts with a single journal NVMe disk, hosting the 5
> journals (1 OSD per spinning disk). We have 4 hosts up to now, so
> overall 4 NVMes hosting 20 journals for 20 spinning disks.
>
> As we had some space left on the NVMes so we made two additional
> partitions on each NVMe and created a 4 OSD cache tier.
>
> To our surprise the 4 OSD cache pool was able to deliver the same
> performance as the previous 20 OSD pool while reducing the OPs on the
> spinning disk to zero as long as the cache pool was sufficient to hold
> all / most data (ceph is used for very short living KVM virtual machines
> which do pretty heavy disk IO).
>
> As we don't need that much more storage right now we decided to extend
> our cluster by adding 8 additional NVMe disks solely as a cache pool and
> freeing the journal NVMes again. Now the question is: How to organize
> the OSDs on the NVMe disks (2 per host)?
>
> As the NVMes peak around 5-7 concurrent sequential writes (tested with
> fio) I thought about using 5 OSDs per NVMe. That would mean 10
> partitions (5 journals, 5 data). On the other hand the NVMes are only
> 400GB large, so that would result in OSD disk sizes for <80 GB
> (depending on the journal size).
>
> Would it make sense to skip the separate Journal partition and leave the
> journal on the data disk itself, limiting it to a rather small
> amount (let's say 1 GB or even less?), as SSDs typically don't like
> sequential writes anyway?
>
> Or, if I leave journal and data on separate partitions should I reduce
> the number of OSDs per disk to 3, as Ceph will most likely write to
> journal and data in parallel and I therefore already get 6 parallel
> "threads" of IO?
>
> Any feedback is highly appreciated :)
>
> Greetings
> -Sascha-
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] placement group lost by using force_create_pg ?

2016-02-03 Thread Nikola Ciprich
Hello cephers,

I think I've got myself into a pretty bad situation :(

I mistakenly ran force_create_pg on one placement group in a live cluster.
Now it's stuck in the creating state. I suppose the placement group's
content is lost, right? Is there a way to recover it? Or at least a
way to find out which objects are affected by it? I've only found
ways to find which placement group an object belongs to, but not the other
direction (apart from trying all objects).
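The brute-force version of "trying all objects" with that forward lookup is at
least scriptable, something like (pool name and pg id are placeholders):

  pgid=<pgid>          # the affected pg id
  rados -p <pool> ls | while read -r obj; do
      ceph osd map <pool> "$obj" | grep -qF "($pgid)" && echo "$obj"
  done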

some data are in rbd objects, some on cephfs...

is there a way to help?

it'd be really appreciated...

thanks a lot in advance

with best regards

nikola ciprich

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Tech Talk - High-Performance Production Databases on Ceph

2016-02-03 Thread Mark Nelson
Basically sticking to a single socket lets you avoid a lot of NUMA 
issues that can crop up on dual socket machines so long as you still 
have enough overall CPU power.  Ben England and Joe Mario here at Red 
Hat have been looking into some of these issues using C2C to observe 
things like remote cache line contention under Ceph workloads.


I think a very interesting high performance setup could be tightly 
packed nodes with 40GbE, high-clocked E3 processors, 2-3 2.5" NVMe 
devices per node, and optionally a single system disk.  That would match 
the form factor of a lot of blade or sled chassis so long as you have 
NVMe capable single-socket sleds.  You lose per-node disk density but 
partially make it back up with the cheap high clock speed processors and 
overall node density.


Mark

On 02/03/2016 03:01 PM, Josef Johansson wrote:

I was fascinated as well. This is how it should be done ☺

We are in the middle of ordering and I saw the notice that they use
single socket systems for the OSDs due to latency issues. I have only
seen dual socket systems on the OSD setups here. Is this something you
should do with new SSD clusters?

Regards,
Josef


On Sat, 30 Jan 2016 09:43 Nick Fisk  wrote:

Yes, thank you very much. I've just finished going through this and
found it very interesting. The dynamic nature of the infrastructure
from top to bottom is fascinating, especially the use of OSPF per
container.

One question though, are those latency numbers for writes on Ceph
correct? 9us is very fast or is it something to do with the 1/100
buffered nature of the test?

 > -Original Message-
 > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com
] On Behalf Of
 > Gregory Farnum
 > Sent: 29 January 2016 21:25
 > To: Patrick McGarry mailto:pmcga...@redhat.com>>
 > Cc: Ceph Devel mailto:ceph-de...@vger.kernel.org>>; Ceph-User  us...@ceph.com >
 > Subject: Re: [ceph-users] Ceph Tech Talk - High-Performance
Production
 > Databases on Ceph
 >
 > This is super cool — thanks, Thorvald, for the realistic picture
of how
 > databases behave on rbd!
 >
 > On Thu, Jan 28, 2016 at 11:56 AM, Patrick McGarry
mailto:pmcga...@redhat.com>>
 > wrote:
 > > Hey cephers,
 > >
 > > Here are the links to both the video and the slides from the
Ceph Tech
 > > Talk today. Thanks again to Thorvald and Medallia for stepping
forward
 > > to present.
 > >
 > > Video: https://youtu.be/OqlC7S3cUKs
 > >
 > > Slides:
 > > http://www.slideshare.net/Inktank_Ceph/2016jan28-high-performance-
 > prod
 > > uction-databases-on-ceph-57620014
 > >
 > >
 > > --
 > >
 > > Best Regards,
 > >
 > > Patrick McGarry
 > > Director Ceph Community || Red Hat
 > > http://ceph.com  || http://community.redhat.com @scuttlemonkey ||
 > > @ceph
 > > --
 > > To unsubscribe from this list: send the line "unsubscribe
ceph-devel"
 > > in the body of a message to majord...@vger.kernel.org
 More
 > majordomo
 > > info at http://vger.kernel.org/majordomo-info.html
 > ___
 > ceph-users mailing list
 > ceph-users@lists.ceph.com 
 > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Tech Talk - High-Performance Production Databases on Ceph

2016-02-03 Thread Josef Johansson
I was fascinated as well. This is how it should be done ☺

We are in the middle of ordering and I saw the notice that they use single
socket systems for the OSDs due to latency issues. I have only seen dual
socket systems on the OSD setups here. Is this something you should do with
new SSD clusters?

Regards,
Josef

On Sat, 30 Jan 2016 09:43 Nick Fisk  wrote:

> Yes, thank you very much. I've just finished going through this and found
> it very interesting. The dynamic nature of the infrastructure from top to
> bottom is fascinating, especially the use of OSPF per container.
>
> One question though, are those latency numbers for writes on Ceph correct?
> 9us is very fast or is it something to do with the 1/100 buffered nature of
> the test?
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Gregory Farnum
> > Sent: 29 January 2016 21:25
> > To: Patrick McGarry 
> > Cc: Ceph Devel ; Ceph-User  > us...@ceph.com>
> > Subject: Re: [ceph-users] Ceph Tech Talk - High-Performance Production
> > Databases on Ceph
> >
> > This is super cool — thanks, Thorvald, for the realistic picture of how
> > databases behave on rbd!
> >
> > On Thu, Jan 28, 2016 at 11:56 AM, Patrick McGarry 
> > wrote:
> > > Hey cephers,
> > >
> > > Here are the links to both the video and the slides from the Ceph Tech
> > > Talk today. Thanks again to Thorvald and Medallia for stepping forward
> > > to present.
> > >
> > > Video: https://youtu.be/OqlC7S3cUKs
> > >
> > > Slides:
> > > http://www.slideshare.net/Inktank_Ceph/2016jan28-high-performance-
> > prod
> > > uction-databases-on-ceph-57620014
> > >
> > >
> > > --
> > >
> > > Best Regards,
> > >
> > > Patrick McGarry
> > > Director Ceph Community || Red Hat
> > > http://ceph.com  ||  http://community.redhat.com @scuttlemonkey ||
> > > @ceph
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > > in the body of a message to majord...@vger.kernel.org More
> > majordomo
> > > info at  http://vger.kernel.org/majordomo-info.html
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: HEALTH_WARN pool vol has too few pgs

2016-02-03 Thread Ferhat Ozkasgarli
As the message states, you must increase the placement group number for the pool,
because 108T of data requires a larger pg number.
On Feb 3, 2016 8:09 PM, "M Ranga Swami Reddy"  wrote:

> Hi,
>
> I am using ceph for my storage cluster and health shows as WARN state
> with too few pgs.
>
> ==
> health HEALTH_WARN pool volumes has too few pgs
> ==
>
> The volume pool has 4096 pgs
> --
> ceph osd pool get volumes pg_num
> pg_num: 4096
> ---
>
> and
> >ceph df
> NAME   ID USED  %USED MAX AVAIL OBJECTS
> volumes4  2830G  0.82  108T  763509
> --
>
> How do we fix this, without downtime?
>
> Thanks
> Swami
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] e9 handle_probe ignoring

2016-02-03 Thread Oliver Dzombic
Hi,

after the cluster changed its cluster id, because we reissued a
ceph-deploy by mistake, we had to change everything to the new id.

Now we see on the nodes:

2016-02-03 19:59:51.729969 7f11ef540700  0 mon.ceph2@1(peon) e9
handle_probe ignoring fsid  != 


What does this mean ?

Thank you !


-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: HEALTH_WARN pool vol has too few pgs

2016-02-03 Thread M Ranga Swami Reddy
Hi,

I am using ceph for my storage cluster and health shows as WARN state
with too few pgs.

==
health HEALTH_WARN pool volumes has too few pgs
==

The volume pool has 4096 pgs
--
ceph osd pool get volumes pg_num
pg_num: 4096
---

and
>ceph df
NAME   ID USED  %USED MAX AVAIL OBJECTS
volumes4  2830G  0.82  108T  763509
--

How do we fix this, without downtime?

Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer-0.94.5 + kernel-4.1.15 - cephfs stuck

2016-02-03 Thread Gregory Farnum
On Wed, Feb 3, 2016 at 2:32 AM, Nikola Ciprich
 wrote:
> Hello Gregory,
>
> in the meantime, I managed to break it further :(
>
> I tried getting rid of active+remapped pgs and got some undersized
> instead.. not sure whether this can be related..
>
> anyways here's the status:
>
> ceph -s
> cluster ff21618e-5aea-4cfe-83b6-a0d2d5b4052a
>  health HEALTH_WARN
> 3 pgs degraded
> 2 pgs stale
> 3 pgs stuck degraded
> 1 pgs stuck inactive
> 2 pgs stuck stale
> 242 pgs stuck unclean
> 3 pgs stuck undersized
> 3 pgs undersized
> recovery 65/3374343 objects degraded (0.002%)
> recovery 186187/3374343 objects misplaced (5.518%)
> mds0: Behind on trimming (155/30)
>  monmap e3: 3 mons at 
> {remrprv1a=10.0.0.1:6789/0,remrprv1b=10.0.0.2:6789/0,remrprv1c=10.0.0.3:6789/0}
> election epoch 522, quorum 0,1,2 remrprv1a,remrprv1b,remrprv1c
>  mdsmap e342: 1/1/1 up {0=remrprv1c=up:active}, 2 up:standby
>  osdmap e4385: 21 osds: 21 up, 21 in; 238 remapped pgs
>   pgmap v18679192: 1856 pgs, 7 pools, 4223 GB data, 1103 kobjects
> 12947 GB used, 22591 GB / 35538 GB avail
> 65/3374343 objects degraded (0.002%)
> 186187/3374343 objects misplaced (5.518%)
> 1612 active+clean
>  238 active+remapped
>3 active+undersized+degraded
>2 stale+active+clean
>1 creating
>   client io 0 B/s rd, 40830 B/s wr, 17 op/s

Yeah, these inactive PGs are basically guaranteed to be the cause of
the problem. There are lots of threads about getting PGs healthy
again; you should dig around the archives and the documentation
troubleshooting page(s). :)
-Greg

>
>
>> What's the full output of "ceph -s"? Have you looked at the MDS admin
>> socket at all — what state does it say it's in?
>
> [root@remrprv1c ceph]# ceph --admin-daemon 
> /var/run/ceph/ceph-mds.remrprv1c.asok dump_ops_in_flight
> {
> "ops": [
> {
> "description": "client_request(client.3052096:83 getattr Fs 
> #1000288 2016-02-03 10:10:46.361591 RETRY=1)",
> "initiated_at": "2016-02-03 10:23:25.791790",
> "age": 3963.093615,
> "duration": 9.519091,
> "type_data": [
> "failed to rdlock, waiting",
> "client.3052096:83",
> "client_request",
> {
> "client": "client.3052096",
> "tid": 83
> },
> [
> {
> "time": "2016-02-03 10:23:25.791790",
> "event": "initiated"
> },
> {
> "time": "2016-02-03 10:23:35.310881",
> "event": "failed to rdlock, waiting"
> }
> ]
> ]
> }
> ],
> "num_ops": 1
> }
>
> seems there's some lock stuck here..
>
> Killing stuck client (it's postgres trying to access cephfs file
> doesn't help..)
>
>
>> -Greg
>>
>> >
>> > My question here is:
>> >
>> > 1) is there some known issue with hammer 0.94.5 or kernel 4.1.15
>> > which could lead to cephfs hangs?
>> >
>> > 2) what can I do to debug what is the cause of this hang?
>> >
>> > 3) is there a way to recover this without hard resetting
>> > node with hung cephfs mount?
>> >
>> > If I could provide more information, please let me know
>> >
>> > I'd really appreciate any help
>> >
>> > with best regards
>> >
>> > nik
>> >
>> >
>> >
>> >
>> > --
>> > -
>> > Ing. Nikola CIPRICH
>> > LinuxBox.cz, s.r.o.
>> > 28.rijna 168, 709 00 Ostrava
>> >
>> > tel.:   +420 591 166 214
>> > fax:+420 596 621 273
>> > mobil:  +420 777 093 799
>> > www.linuxbox.cz
>> >
>> > mobil servis: +420 737 238 656
>> > email servis: ser...@linuxbox.cz
>> > -
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>
> --
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
>
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS: bad/negative dir size

2016-02-03 Thread Gregory Farnum
On Wed, Feb 3, 2016 at 3:16 AM, Yan, Zheng  wrote:
>
>> On Feb 3, 2016, at 17:51, Markus Blank-Burian  wrote:
>>
>> Hi,
>>
>> on ceph mds startup, I see the following two errors in the our logfiles 
>> (using ceph 9.2.0 and linux 4.4 cephfs kernel client):
>>
>> Feb  2 19:27:13 server1 ceph-mds[1809]: 2016-02-02 19:27:13.363416 
>> 7fce9effd700 -1 log_channel(cluster) log [ERR] : bad/negative dir size on 
>> 603 f(v2008 m2016-02-02 18:52:25.870376 -327=-316+-11)
>> Feb  2 19:27:13 server1 ceph-mds[1809]: 2016-02-02 19:27:13.363442 
>> 7fce9effd700 -1 log_channel(cluster) log [ERR] : unmatched fragstat on 603, 
>> inode has f(v2009 m2016-02-02 18:52:25.870376 -327=-316+-11), dirfrags have 
>> f(v0 m2016-02-02 18:52:25.870376
>>
>> The filesystem is accessible, but I guess that there will we problems in one 
>> directory. How can I find out, which directory is affected and what can I do 
>> to fix this error? Would it be safe to delete the affected directory?
>> Markus
>
> Directory 603 is special directory which is used by MDS internally. The error 
> has already been fixed when MDS reveals these message. No need to worry about.

Hmm, that output is followed by an assert if "mds_verify_scatter" is
set to true, and it seems to fix the issue by just resetting all the
values to 0 — which is probably also insufficient, Zheng?

This also means that you must have enabled the "mds bal frag" option,
right Markus?
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Set cache tier pool forward state automatically!

2016-02-03 Thread Nick Fisk
I think this would be better done outside of Ceph. It should be quite 
simple for whatever monitoring software you are using to pick up the disk 
failure and then set the target_dirty_ratio to a very low value or change the 
actual caching mode.

Doing it in Ceph would be complicated, as you are then asking Ceph to decide 
when you are in an at-risk scenario, i.e. would you want it to flush your cache 
after a quick service reload or node reboot?
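
Something along these lines from the monitoring hook, as a hedged sketch (the
pool name and the values are placeholders):

# flush the cache aggressively, then stop new writes from landing in it
ceph osd pool set hot-cache cache_target_dirty_ratio 0.01
rados -p hot-cache cache-flush-evict-all
ceph osd tier cache-mode hot-cache forward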

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mihai Gheorghe
> Sent: 03 February 2016 16:57
> To: ceph-users ; ceph-users  us...@ceph.com>
> Subject: [ceph-users] Set cache tier pool forward state automatically!
> 
> Hi,
> 
> Is there a built in setting in ceph that would set the cache pool from
> writeback to forward state automatically in case of an OSD fail from the pool?
> 
> Let;s say the size of the cache pool is 2. If an OSD fails ceph blocks write 
> to
> the pool, making the VM that use this pool to be unaccesable. But an earlier
> copy of the data is present on the cold storage pool prior to the last cache
> flush.
> 
> In this case, is it possible that when an OSD fails, the data on the cache 
> pool
> to be flushed onto the cold storage pool and set the forward flag
> automatically on the cache pool? So that the VM can resume write to the
> block device as soon as the cache is flushed from the pool and read/write
> directly from the cold storage pool untill manual intervention on the cache
> pool is done to fix it and set it back to writeback?
> 
> This way we can get away with a pool size of 2 without worrying for too much
> downtime!
> 
> Hope i was explicit enough!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Set cache tier pool forward state automatically!

2016-02-03 Thread Mihai Gheorghe
Hi,

Is there a built in setting in ceph that would set the cache pool from
writeback to forward state automatically in case of an OSD fail from the
pool?

Let's say the size of the cache pool is 2. If an OSD fails ceph blocks
write to the pool, making the VMs that use this pool inaccessible. But
an earlier copy of the data is present on the cold storage pool prior to
the last cache flush.

In this case, is it possible that when an OSD fails, the data on the cache
pool is flushed onto the cold storage pool and the forward flag is set
automatically on the cache pool? That way the VM could resume writes to the
block device as soon as the cache is flushed from the pool and read/write
directly from the cold storage pool until manual intervention on the cache
pool is done to fix it and set it back to writeback.

This way we can get away with a pool size of 2 without worrying about too
much downtime!

Hope i was explicit enough!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and hadoop (fstab insted of CephFS)

2016-02-03 Thread Noah Watkins
Hi Jose,

I believe what you are referring to is using Hadoop over Ceph via the
VFS implementation of the Ceph client vs the user-space libcephfs
client library. The current Hadoop plugin for Ceph uses the client
library. You could run Hadoop over Ceph using a local Ceph mount
point, but it would take some configuration (I believe this is how
Gluster Hadoop works). To take advantage of data locality, you'd also
want to invoke an ioctl (if locality is currently exposed), or also
integrate with libcephfs for that functionality.
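
Roughly, the mount-based variant would look something like this (the monitor
address, mount point and key file are assumptions, and you lose the locality
integration of the plugin):

# mount CephFS on every Hadoop node
mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# then point dfs.namenode.name.dir / dfs.datanode.data.dir (or fs.defaultFS
# with file:///mnt/cephfs) at directories under that mount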

-Noah

On Tue, Feb 2, 2016 at 7:50 AM, John Spray  wrote:
> On Tue, Feb 2, 2016 at 3:42 PM, Jose M  wrote:
>> Hi,
>>
>>
>> One simple question, in the ceph docs says that to use Ceph as an HDFS
>> replacement, I can use the CephFs Hadoop plugin
>> (http://docs.ceph.com/docs/master/cephfs/hadoop/).
>>
>>
>> What I would like to know is whether, instead of using the plugin, I can mount ceph
>> in fstab and then point the hdfs dirs (namenode, datanode, etc.) to these mounted
>> "ceph" dirs instead of native local dirs.
>>
>> I understand that this may involve more configuration steps (configuring
>> fstab on each node), but will this work? Is there any problem with this type
>> of configuration?
>
> Without being a big HDFS expert, it seems like you would be
> essentially putting one distributed filesystem on top of another
> distributed filesystem.  I don't know if you're going to find anything
> that breaks as such, but it's probably not a good idea.
>
> John
>
>>
>> Thanks in advance,
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can not umount ceph osd partition

2016-02-03 Thread Max A. Krasilnikov
Hello! 

On Wed, Feb 03, 2016 at 04:59:30PM +0100, yoann.moulin wrote:

> Hello,

>> I am using 0.94.5. When I try to umount partition and fsck it I have issue:
>> root@storage003:~# stop ceph-osd id=13
>> ceph-osd stop/waiting
>> root@storage003:~# umount /var/lib/ceph/osd/ceph-13
>> root@storage003:~# fsck -yf /dev/sdf
>> fsck from util-linux 2.20.1
>> e2fsck 1.42.9 (4-Feb-2014)
>> /dev/sdf is in use.
>> e2fsck: Cannot continue, aborting.
>> 
>> There is no /var/lib/ceph/osd/ceph-13 in /proc mounts. But no ability to 
>> check
>> fs.
>> I can mount -o remount,rw, but I would like to umount device for maintenance
>> and, maybe, replace it.
>> 
>> Why I can't umount?

> is "lsof -n | grep /dev/sdf" give something ?

Nothing.

> and are you sure /dev/sdf is the disk for osd 13 ?

Absolutely. I have even tried fsck -yf /dev/disk/by-label/osd-13. No luck.

The disk is mounted using LABEL in fstab; the journal is a symlink to
/dev/disk/by-partlabel/j-13.
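
A few more things that might be worth checking on that host (just a sketch,
device name as above):

fuser -vm /dev/sdf                 # any process still holding the filesystem
grep sdf /proc/mounts /proc/swaps  # lingering mounts or swap on the device
lsblk /dev/sdf                     # partitions / holders that are still in use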

-- 
WBR, Max A. Krasilnikov
ColoCall Data Center
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Optimal OSD count for SSDs / NVMe disks

2016-02-03 Thread Wade Holler
AFAIK when using XFS, parallel write as you described is not enabled.

Regardless, in a way, the NVMe drives are so fast that it shouldn't matter
much whether you use a partitioned journal or another layout.

What I would be more interested in is your replication size on the cache
pool.

This might sound crazy, but if your KVM instances are really that
short-lived, could you get away with size=2 on the cache pool from an
availability perspective?
On Wed, Feb 3, 2016 at 7:44 AM Sascha Vogt  wrote:

> Hi Wade,
>
> Am 03.02.2016 um 13:26 schrieb Wade Holler:
> > What is your file system type, XFS or Btrfs ?
> We're using XFS, though for the new cache tier we could also switch to
> btrfs if that suggest a significant performance improvement...
>
> Greetings
> -Sascha-
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can not umount ceph osd partition

2016-02-03 Thread Yoann Moulin
Hello,

> I am using 0.94.5. When I try to umount partition and fsck it I have issue:
> root@storage003:~# stop ceph-osd id=13
> ceph-osd stop/waiting
> root@storage003:~# umount /var/lib/ceph/osd/ceph-13
> root@storage003:~# fsck -yf /dev/sdf
> fsck from util-linux 2.20.1
> e2fsck 1.42.9 (4-Feb-2014)
> /dev/sdf is in use.
> e2fsck: Cannot continue, aborting.
> 
> There is no /var/lib/ceph/osd/ceph-13 in /proc mounts. But no ability to check
> fs.
> I can mount -o remount,rw, but I would like to umount device for maintenance
> and, maybe, replace it.
> 
> Why I can't umount?

is "lsof -n | grep /dev/sdf" give something ?

and are you sure /dev/sdf is the disk for osd 13 ?

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] can not umount ceph osd partition

2016-02-03 Thread Max A. Krasilnikov
Hello!

I am using 0.94.5. When I try to umount the partition and fsck it I have an issue:
root@storage003:~# stop ceph-osd id=13
ceph-osd stop/waiting
root@storage003:~# umount /var/lib/ceph/osd/ceph-13
root@storage003:~# fsck -yf /dev/sdf
fsck from util-linux 2.20.1
e2fsck 1.42.9 (4-Feb-2014)
/dev/sdf is in use.
e2fsck: Cannot continue, aborting.

There is no /var/lib/ceph/osd/ceph-13 in /proc mounts. But no ability to check
fs.
I can mount -o remount,rw, but I would like to umount device for maintenance
and, maybe, replace it.

Why can't I umount?

-- 
WBR, Max A. Krasilnikov
ColoCall Data Center
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding Cache Tier breaks rbd access

2016-02-03 Thread Udo Waechter
Ah, I might have found the solution:

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg26441.html

Add access to the Cache-tier for libvirt.
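
A hedged guess at what that boils down to (the client name and pool names below
are placeholders for whatever libvirt actually authenticates as):

ceph auth caps client.libvirt \
    mon 'allow r' \
    osd 'allow rwx pool=vm-images, allow rwx pool=vm-images-cache'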

I'll try that later.

Talking about it sometimes really helps ;)

Thanks,
udo.

On 02/03/2016 04:25 PM, Udo Waechter wrote:
> Hello,
> 
> I am experimenting with adding a SSD-Cache tier to my existing Ceph
> 0.94.5 Cluster.
> 
> Currently I have:
> 10 OSDs on 5 hosts (spinning disks).
> 2 OSDs on 1 host (SSDs)
> 
> I have followed the cache tier docs:
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
> 
> 1st I created a new (spinning pool) and set up the SSDs as a cache tier.
> All is fine. I can create/access images with rbd.
> I can seed that it is used with "rados -p cache-pool ls"
> 
> 
> Now, when I add a cache pool to my existing pools, for example the one
> which hosts all my VM images all hell breaks loose.
> 
> The VMs all get I/O end request errors and remount their filesystems R/O.
> If I shut them down and try to start them again, they won't start, virsh
> gives me an error  "cannot read metadata header" (or something like that).
> 
> Also, "rados -p cache-pool ls" hangs forewer.
> 
> I have read here:
> https://software.intel.com/en-us/blogs/2015/03/03/ceph-cache-tiering-introduction
> that adding caches works on-the-fly.
> 
> Is there a special trick to add cache to a running, pool under load
> (which is not very high, though)?
> 
> And what about the "osd tier add-cache   "
> command? its supposed to add a cache to the 1st pool. I can't see this
> be used in bost of the above links.
> 
> Thanks very much,
> udo.
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding Cache Tier breaks rbd access

2016-02-03 Thread Mihai Gheorghe
Did you set read/write permissions on the cache pool? For example, in
openstack I need to set read/write permissions for cinder to be able to use
the cache pool.
On 3 Feb 2016 17:25, "Udo Waechter"  wrote:

> Hello,
>
> I am experimenting with adding a SSD-Cache tier to my existing Ceph
> 0.94.5 Cluster.
>
> Currently I have:
> 10 OSDs on 5 hosts (spinning disks).
> 2 OSDs on 1 host (SSDs)
>
> I have followed the cache tier docs:
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
>
> 1st I created a new (spinning pool) and set up the SSDs as a cache tier.
> All is fine. I can create/access images with rbd.
> I can seed that it is used with "rados -p cache-pool ls"
>
>
> Now, when I add a cache pool to my existing pools, for example the one
> which hosts all my VM images all hell breaks loose.
>
> The VMs all get I/O end request errors and remount their filesystems R/O.
> If I shut them down and try to start them again, they won't start, virsh
> gives me an error  "cannot read metadata header" (or something like that).
>
> Also, "rados -p cache-pool ls" hangs forewer.
>
> I have read here:
>
> https://software.intel.com/en-us/blogs/2015/03/03/ceph-cache-tiering-introduction
> that adding caches works on-the-fly.
>
> Is there a special trick to add cache to a running, pool under load
> (which is not very high, though)?
>
> And what about the "osd tier add-cache   "
> command? its supposed to add a cache to the 1st pool. I can't see this
> be used in bost of the above links.
>
> Thanks very much,
> udo.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Adding Cache Tier breaks rbd access

2016-02-03 Thread Udo Waechter
Hello,

I am experimenting with adding a SSD-Cache tier to my existing Ceph
0.94.5 Cluster.

Currently I have:
10 OSDs on 5 hosts (spinning disks).
2 OSDs on 1 host (SSDs)

I have followed the cache tier docs:
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/

1st I created a new (spinning) pool and set up the SSDs as a cache tier.
All is fine. I can create/access images with rbd.
I can see that it is used with "rados -p cache-pool ls"


Now, when I add a cache pool to my existing pools, for example the one
which hosts all my VM images all hell breaks loose.

The VMs all get I/O end request errors and remount their filesystems R/O.
If I shut them down and try to start them again, they won't start, virsh
gives me an error  "cannot read metadata header" (or something like that).

Also, "rados -p cache-pool ls" hangs forewer.

I have read here:
https://software.intel.com/en-us/blogs/2015/03/03/ceph-cache-tiering-introduction
that adding caches works on-the-fly.

Is there a special trick to add cache to a running, pool under load
(which is not very high, though)?

And what about the "osd tier add-cache   "
command? It's supposed to add a cache to the 1st pool. I can't see it
being used in both of the above links.

Thanks very much,
udo.



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-03 Thread Yan, Zheng

> On Feb 3, 2016, at 21:50, Michael Metz-Martini | SpeedPartner GmbH 
>  wrote:
> 
> Hi,
> 
> Am 03.02.2016 um 12:11 schrieb Yan, Zheng:
>>> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH 
>>>  wrote:
>>> Am 03.02.2016 um 10:26 schrieb Gregory Farnum:
 On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
 GmbH  wrote:
 Or maybe your kernels are too old; Zheng would know.
>>> We're already far away from centos-Dist-Kernel. but upgrading to 4.4.x
>>> for the clients should be possible if that might help.
>> mds log should contain messages like:
>> 
>> client. isn't responding to mclientcaps(revoke)
>> 
>> please send these messages to us.
> 2016-02-03 14:42:25.568800 7fadfd280700  2 mds.0.cache
> check_memory_usage total 17302804, rss 16604996, heap 42916, malloc
> -1008738 mmap 0, baseline 39844, buffers 0, max 1048576, 881503 /
> 388 inodes have caps, 882499 caps, 0.220625 caps per inode
> 2016-02-03 14:42:25.581494 7fadfd280700  0 log_channel(default) log
> [WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
> 100815bd349 pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.127500
> seconds ago
> 2016-02-03 14:42:25.581519 7fadfd280700  0 log_channel(default) log
> [WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
> 100815bf1af pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.085996
> seconds ago
> 2016-02-03 14:42:25.581527 7fadfd280700  0 log_channel(default) log
> [WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
> 100815bf4d3 pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.084284
> seconds ago
> 2016-02-03 14:42:25.581534 7fadfd280700  0 log_channel(default) log
> [WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
> 100815d2500 pending pAsLsXsFsc issued pAsLsXsFscb, sent 61.731320
> seconds ago
> 2016-02-03 14:42:25.581840 7fadfd280700  0 log_channel(default) log
> [WRN] : 7 slow requests, 6 included below; oldest blocked for >
> 62.125785 secs
> 2016-02-03 14:42:25.581849 7fadfd280700  0 log_channel(default) log
> [WRN] : slow request 62.125785 seconds old, received at 2016-02-03
> 14:41:23.455812: client_request(client.10199855:1313157 getattr
> pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to
> rdlock, waiting

This seems like dirty page writeback is too slow.  Is there any hung OSD 
request in /sys/kernel/debug/ceph/xxx/osdc?
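
For reference, on the client that would be something like this (requires
debugfs to be mounted; the directory name is the fsid plus client id and
differs per mount):

cat /sys/kernel/debug/ceph/*/osdc    # in-flight OSD requests
cat /sys/kernel/debug/ceph/*/mdsc    # in-flight MDS requests, for completeness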

> 
> -- 
> Kind regards
> Michael

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] HEALTH_WARN pool vol has too few pgs

2016-02-03 Thread M Ranga Swami Reddy
Hi,

I am using ceph for my storage cluster and health shows as WARN state
with too few pgs.

==
health HEALTH_WARN pool volumes has too few pgs
==

The volume pool has 4096 pgs
--
ceph osd pool get volumes pg_num
pg_num: 4096
---

and
>ceph df
NAME   ID USED  %USED MAX AVAIL OBJECTS
volumes4  2830G  0.82  108T  763509
--

How do we fix this, without downtime?

Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-03 Thread Michael Metz-Martini | SpeedPartner GmbH
Hi,

Am 03.02.2016 um 12:11 schrieb Yan, Zheng:
>> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH 
>>  wrote:
>> Am 03.02.2016 um 10:26 schrieb Gregory Farnum:
>>> On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
>>> GmbH  wrote:
>>> Or maybe your kernels are too old; Zheng would know.
>> We're already far away from centos-Dist-Kernel. but upgrading to 4.4.x
>> for the clients should be possible if that might help.
> mds log should contain messages like:
> 
> client. isn't responding to mclientcaps(revoke)
> 
> please send these messages to us.
2016-02-03 14:42:25.568800 7fadfd280700  2 mds.0.cache
check_memory_usage total 17302804, rss 16604996, heap 42916, malloc
-1008738 mmap 0, baseline 39844, buffers 0, max 1048576, 881503 /
388 inodes have caps, 882499 caps, 0.220625 caps per inode
2016-02-03 14:42:25.581494 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815bd349 pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.127500
seconds ago
2016-02-03 14:42:25.581519 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815bf1af pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.085996
seconds ago
2016-02-03 14:42:25.581527 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815bf4d3 pending pAsLsXsFsc issued pAsLsXsFscb, sent 62.084284
seconds ago
2016-02-03 14:42:25.581534 7fadfd280700  0 log_channel(default) log
[WRN] : client.10199852 isn't responding to mclientcaps(revoke), ino
100815d2500 pending pAsLsXsFsc issued pAsLsXsFscb, sent 61.731320
seconds ago
2016-02-03 14:42:25.581840 7fadfd280700  0 log_channel(default) log
[WRN] : 7 slow requests, 6 included below; oldest blocked for >
62.125785 secs
2016-02-03 14:42:25.581849 7fadfd280700  0 log_channel(default) log
[WRN] : slow request 62.125785 seconds old, received at 2016-02-03
14:41:23.455812: client_request(client.10199855:1313157 getattr
pAsLsXsFs #100815bd349 2016-02-03 14:41:23.452386) currently failed to
rdlock, waiting

-- 
Kind regards
 Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Monthly Dev Meeting Today

2016-02-03 Thread Patrick McGarry
Hey cephers,

Just a reminder that the monthly replacement for CDS (Ceph Developer
Summit) is today at 12:30p EST. This will be a short meeting to
discuss all pending work going on with Ceph, so if you have anything
to share or discuss, please drop a very brief summary in the wiki:

http://tracker.ceph.com/projects/ceph/wiki/CDM_03-FEB-2016

The discussion will be held on BlueJeans, as usual. For call info
please join with the following:

To join the Meeting:
https://bluejeans.com/707503600
To join via Browser:
https://bluejeans.com/707503600/browser
To join with Lync:
https://bluejeans.com/707503600/lync


To join via Room System:
Video Conferencing System: bjn.vc -or- 199.48.152.152
Meeting ID: 707503600

To join via Phone:
1) Dial:
+1 800 451 8679
+1 212 729 5016
(see all numbers -
https://www.intercallonline.com/listNumbersByCode.action?confCode=9891145960)
2) Enter Conference ID: 9891145960


If you have any questions, please don’t hesitate to ask. Thanks.

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hammer - remapped / undersized pgs + related questions

2016-02-03 Thread Nikola Ciprich
Hello,

I'd like to ask few rebalancing and related questions. On one
of my cluster, I got nearfull warning for one of OSDs.

Apart from that, the cluster health was perfectly OK,
all PGs active+clean.

Therefore I used reweight-by-utilization, which changed weights a
bit, causing about 30% of the data to be misplaced. After that, recovery
started, but it didn't get the cluster to a clean state - some pgs
ended up in remapped state and, even worse, some of them are left
undersized.

Even though I set the weights back to their values from before the rebalance, it didn't help.

I'd like to ask more experienced users:

1) when I have a cluster with evenly distributed OSDs and weights, why does
one OSD suddenly get much more filled than the others?

2) why does rebalancing weights lead to undersized pgs? Isn't this a bug
leading to unnecessary risk of data loss?

3) why does changing weights by only a little lead to such big data
transfers? I changed the weight of only one OSD (out of 15), and by only a
little, and it caused about 30% misplaced groups.. is this OK?

4) after some experiments, I also got a few pgs stuck in stale+active+clean
or creating state.. how do I get rid of those?

5) last but not least, how can I help my cluster get back to a clean
state?


here's df tree:

[root@remrprv1c ceph]# ceph osd df tree
ID WEIGHT   REWEIGHT SIZE   USEAVAIL  %USE  VAR  TYPE NAME   
-8 13.54486- 13534G  8018G  5515G 59.24 1.63 root ssd
-2  4.51495-  4511G  2869G  1641G 63.61 1.75 host remrprv1a-ssd  
 0  0.85999  0.38860   859G   594G   264G 69.22 1.90 osd.0   
 1  0.85999  0.33694   859G   557G   301G 64.89 1.78 osd.1   
 2  0.92999  0.44678   929G   617G   312G 66.43 1.82 osd.2   
 3  0.92999  0.32753   929G   580G   348G 62.46 1.71 osd.3   
 4  0.93500  0.31308   934G   519G   414G 55.60 1.53 osd.4   
-3  4.51495-  4511G  2595G  1915G 57.54 1.58 host remrprv1b-ssd  
 5  0.85999  0.31793   859G   456G   402G 53.16 1.46 osd.5   
 6  0.85999  0.40715   859G   502G   356G 58.47 1.60 osd.6   
 7  0.92999  0.38741   929G   500G   428G 53.87 1.48 osd.7   
 8  0.92999  0.38803   929G   607G   322G 65.30 1.79 osd.8   
 9  0.93500  0.36951   934G   529G   405G 56.64 1.55 osd.9   
-4  4.51495-  4511G  2552G  1958G 56.59 1.55 host remrprv1c-ssd  
10  0.85999  0.34116   859G   456G   402G 53.11 1.46 osd.10  
11  0.85999  0.38770   859G   488G   370G 56.88 1.56 osd.11  
12  0.92999  0.41499   929G   556G   372G 59.90 1.64 osd.12  
13  0.92999  0.35764   929G   534G   394G 57.53 1.58 osd.13  
14  0.93500  0.38669   934G   516G   417G 55.29 1.52 osd.14  
-1 21.59995- 22004G  4929G 17074G 22.40 0.61 root sata   
-7  7.19998-  7334G  1644G  5690G 22.42 0.62 host remrprv1c-sata 
19  3.5  1.0  3667G   819G  2848G 22.33 0.61 osd.19  
20  3.5  1.0  3667G   825G  2841G 22.51 0.62 osd.20  
-6  7.19998-  7334G  1642G  5691G 22.40 0.61 host remrprv1b-sata 
17  3.5  1.0  3667G   806G  2860G 21.99 0.60 osd.17  
18  3.5  1.0  3667G   836G  2831G 22.80 0.63 osd.18  
-5  7.19998-  7334G  1642G  5692G 22.39 0.61 host remrprv1a-sata 
15  3.5  1.0  3667G   853G  2813G 23.28 0.64 osd.15  
16  3.5  1.0  3667G   788G  2879G 21.49 0.59 osd.16  
   TOTAL 35538G 12948G 22590G 36.43  
MIN/MAX VAR: 0.59/1.90  STDDEV: 19.22

here's ceph -s:

[root@remrprv1c ceph]# ceph -s
cluster ff21618e-5aea-4cfe-83b6-a0d2d5b4052a
 health HEALTH_WARN
3 pgs degraded
2 pgs stale
3 pgs stuck degraded
1 pgs stuck inactive
2 pgs stuck stale
242 pgs stuck unclean
3 pgs stuck undersized
3 pgs undersized
recovery 75/3374541 objects degraded (0.002%)
recovery 186194/3374541 objects misplaced (5.518%)
mds0: Behind on trimming (155/30)
 monmap e3: 3 mons at 
{remrprv1a=10.0.0.1:6789/0,remrprv1b=10.0.0.2:6789/0,remrprv1c=10.0.0.3:6789/0}
election epoch 522, quorum 0,1,2 remrprv1a,remrprv1b,remrprv1c
 mdsmap e347: 1/1/1 up {0=remrprv1a=up:active}, 2 up:standby
 osdmap e4423: 21 osds: 21 up, 21 in; 238 remapped pgs
  pgmap v18686541: 1856 pgs, 7 pools, 4224 GB data, 1103 kobjects
12948 GB used, 22590 GB / 35538 GB avail
75/3374541 objects degraded (0.002%)
186194/3374541 objects misplaced (5.518%)
1612 active+clean
 238 active+remapped
   3 active+undersized+degraded
   2 stale+active+clean

Re: [ceph-users] Optimal OSD count for SSDs / NVMe disks

2016-02-03 Thread Sascha Vogt
Hi Wade,

Am 03.02.2016 um 13:26 schrieb Wade Holler:
> What is your file system type, XFS or Btrfs ?
We're using XFS, though for the new cache tier we could also switch to
btrfs if that suggest a significant performance improvement...

Greetings
-Sascha-

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Optimal OSD count for SSDs / NVMe disks

2016-02-03 Thread Wade Holler
Hi Sascha,

What is your file system type, XFS or Btrfs ?

Thanks
Wade
On Wed, Feb 3, 2016 at 7:01 AM Sascha Vogt  wrote:

> Hi all,
>
> we recently tried adding a cache tier to our ceph cluster. We had 5
> spinning disks per hosts with a single journal NVMe disk, hosting the 5
> journals (1 OSD per spinning disk). We have 4 hosts up to now, so
> overall 4 NVMes hosting 20 journals for 20 spinning disks.
>
> As we had some space left on the NVMes so we made two additional
> partitions on each NVMe and created a 4 OSD cache tier.
>
> To our surprise the 4 OSD cache pool was able to deliver the same
> performance then the previous 20 OSD pool while reducing the OPs on the
> spinning disk to zero as long as the cache pool was sufficient to hold
> all / most data (ceph is used for very short living KVM virtual machines
> which do pretty heavy disk IO).
>
> As we don't need that much more storage right now we decided to extend
> our cluster by adding 8 additional NVMe disks solely as a cache pool and
> freeing the journal NVMes again. Now the question is: How to organize
> the OSDs on the NVMe disks (2 per host)?
>
> As the NVMes peak around 5-7 concurrent sequential writes (tested with
> fio) I thought about using 5 OSDs per NVMe. That would mean 10
> partitions (5 journals, 5 data). On the other hand the NVMes are only
> 400GB large, so that would result in OSD disk sizes for <80 GB
> (depending on the journal size).
>
> Would it make sense to skip the separate Journal partition and leave the
> journal on the data disk itself and limitting it to a rather small
> amount (lets say 1 GB or even less?) as SSDs typically don't like
> sequential writes anyway?
>
> Or, if I leave journal and data on separate partitions should I reduce
> the number of OSDs per disk to 3 as Ceph will most likly write to
> journal and data in parallel and I therefore already get 6 parallel
> "threads" of IO?
>
> Any feedback is highly appreciated :)
>
> Greetings
> -Sascha-
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Optimal OSD count for SSDs / NVMe disks

2016-02-03 Thread Sascha Vogt
Hi all,

we recently tried adding a cache tier to our ceph cluster. We had 5
spinning disks per hosts with a single journal NVMe disk, hosting the 5
journals (1 OSD per spinning disk). We have 4 hosts up to now, so
overall 4 NVMes hosting 20 journals for 20 spinning disks.

As we had some space left on the NVMes so we made two additional
partitions on each NVMe and created a 4 OSD cache tier.

To our surprise the 4 OSD cache pool was able to deliver the same
performance as the previous 20 OSD pool while reducing the OPs on the
spinning disk to zero as long as the cache pool was sufficient to hold
all / most data (ceph is used for very short-lived KVM virtual machines
which do pretty heavy disk IO).

As we don't need that much more storage right now we decided to extend
our cluster by adding 8 additional NVMe disks solely as a cache pool and
freeing the journal NVMes again. Now the question is: How to organize
the OSDs on the NVMe disks (2 per host)?

As the NVMes peak around 5-7 concurrent sequential writes (tested with
fio) I thought about using 5 OSDs per NVMe. That would mean 10
partitions (5 journals, 5 data). On the other hand the NVMes are only
400GB large, so that would result in OSD disk sizes of <80 GB
(depending on the journal size).
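
For concreteness, a rough sketch of what 5 OSDs per NVMe would look like with
ceph-disk (the device name and partition numbering are just an example):

# assuming /dev/nvme0n1 already carries 5 journal partitions (p1-p5) and
# 5 data partitions (p6-p10); one ceph-disk prepare per data/journal pair
for i in 1 2 3 4 5; do
    ceph-disk prepare /dev/nvme0n1p$((i+5)) /dev/nvme0n1p$i
done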

Would it make sense to skip the separate Journal partition and leave the
journal on the data disk itself and limiting it to a rather small
amount (let's say 1 GB or even less?) as SSDs typically don't like
sequential writes anyway?

Or, if I leave journal and data on separate partitions should I reduce
the number of OSDs per disk to 3, as Ceph will most likely write to
journal and data in parallel and I therefore already get 6 parallel
"threads" of IO?

Any feedback is highly appreciated :)

Greetings
-Sascha-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS: bad/negative dir size

2016-02-03 Thread Yan, Zheng

> On Feb 3, 2016, at 17:51, Markus Blank-Burian  wrote:
> 
> Hi,
>  
> on ceph mds startup, I see the following two errors in the our logfiles 
> (using ceph 9.2.0 and linux 4.4 cephfs kernel client):
>  
> Feb  2 19:27:13 server1 ceph-mds[1809]: 2016-02-02 19:27:13.363416 
> 7fce9effd700 -1 log_channel(cluster) log [ERR] : bad/negative dir size on 603 
> f(v2008 m2016-02-02 18:52:25.870376 -327=-316+-11)
> Feb  2 19:27:13 server1 ceph-mds[1809]: 2016-02-02 19:27:13.363442 
> 7fce9effd700 -1 log_channel(cluster) log [ERR] : unmatched fragstat on 603, 
> inode has f(v2009 m2016-02-02 18:52:25.870376 -327=-316+-11), dirfrags have 
> f(v0 m2016-02-02 18:52:25.870376
>  
> The filesystem is accessible, but I guess that there will we problems in one 
> directory. How can I find out, which directory is affected and what can I do 
> to fix this error? Would it be safe to delete the affected directory?
> Markus

Directory 603 is a special directory which is used by the MDS internally. The error 
has already been fixed by the time the MDS reports these messages. No need to worry about it.

Regards
Yan, Zheng 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-03 Thread Yan, Zheng

> On Feb 3, 2016, at 17:39, Michael Metz-Martini | SpeedPartner GmbH 
>  wrote:
> 
> Hi,
> 
> Am 03.02.2016 um 10:26 schrieb Gregory Farnum:
>> On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
>> GmbH  wrote:
>>> Putting some higher load via cephfs on the cluster leads to messages
>>> like mds0: Client X failing to respond to capability release after some
>>> minutes. Requests from other clients start to block after a while.
>>> Rebooting the client named client resolves the issue.
>> There are some bugs around this functionality, but I *think* your
>> clients are new enough it shouldn't be an issue.
>> However, it's entirely possible your clients are actually making use
>> of enough inodes that the MDS server is running into its default
>> limits. If your MDS has memory available, you probably want to
>> increase the cache size from its default 100k (mds cache size = X).
> mds_cache_size is already 400 and so a lot higher than usual.
> (google said I should increase ...)
> 
>> Or maybe your kernels are too old; Zheng would know.
> We're already far away from centos-Dist-Kernel. but upgrading to 4.4.x
> for the clients should be possible if that might help.
> 

mds log should contain messages like:

client. isn't responding to mclientcaps(revoke)

please send these messages to us.

Regards
Yan, Zheng



> -- 
> Kind regards
> Michael
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Upgrading with mon & osd on same host

2016-02-03 Thread Udo Waechter
Hi,

I would like to upgrade my ceph cluster from hammer to infernalis.

I'm reading the upgrade notes, that I need to upgrade & restart the
monitors first, then the OSDs.

Now, my cluster has OSDs and Mons on the same hosts (I know that should
not be the case, but it is :( ).

I'm just wondering:
* Do the packages (Debian) restart the services upon upgrade?


In theory it should work this way:

* install / upgrade the new packages
* restart all mons
* stop the OSDs one by one and change the user accordingly.
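
Roughly, per OSD, something like this (init syntax varies between hammer's
sysvinit script and infernalis' systemd units; id and path are the defaults):

service ceph stop osd.0
chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
service ceph start osd.0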

Another question then:

Do I need to actually stop all OSDs, or can I upgrade them one by one?
I don't want to take the whole cluster down :(

Thanks very much,
udo.



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Same SSD-Cache-Pool for multiple Spinning-Disks-Pools?

2016-02-03 Thread Ferhat Ozkasgarli
Hello Udo,

You cannot use one cache pool for multiple back-end pools. You must create a
new caching pool for every back-end pool.
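
So for each backing pool it would be roughly this (names and pg counts are
placeholders):

ceph osd pool create cold-pool-cache 128 128
ceph osd tier add cold-pool cold-pool-cache
ceph osd tier cache-mode cold-pool-cache writeback
ceph osd tier set-overlay cold-pool cold-pool-cache
ceph osd pool set cold-pool-cache hit_set_type bloom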

On Wed, Feb 3, 2016 at 12:32 PM, Udo Waechter  wrote:

> Hello everyone,
>
> I'm using ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
> with debian 8
>
> I have now implemented a SSD (2 OSDs) cache tier for one of my pool.
>
> I am now wondering whether it is possible to use the same SSD-Pool for
> multiple pools as a cache tier? Or do I need to create a cache pool for
> each of my spinning-disks-pools?
>
> Thank you very much,
> udo.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer-0.94.5 + kernel-4.1.15 - cephfs stuck

2016-02-03 Thread Nikola Ciprich
Hello Gregory,

in the meantime, I managed to break it further :(

I tried getting rid of active+remapped pgs and got some undersized
instead.. not sure whether this can be related..

anyways here's the status:

ceph -s
cluster ff21618e-5aea-4cfe-83b6-a0d2d5b4052a
 health HEALTH_WARN
3 pgs degraded
2 pgs stale
3 pgs stuck degraded
1 pgs stuck inactive
2 pgs stuck stale
242 pgs stuck unclean
3 pgs stuck undersized
3 pgs undersized
recovery 65/3374343 objects degraded (0.002%)
recovery 186187/3374343 objects misplaced (5.518%)
mds0: Behind on trimming (155/30)
 monmap e3: 3 mons at 
{remrprv1a=10.0.0.1:6789/0,remrprv1b=10.0.0.2:6789/0,remrprv1c=10.0.0.3:6789/0}
election epoch 522, quorum 0,1,2 remrprv1a,remrprv1b,remrprv1c
 mdsmap e342: 1/1/1 up {0=remrprv1c=up:active}, 2 up:standby
 osdmap e4385: 21 osds: 21 up, 21 in; 238 remapped pgs
  pgmap v18679192: 1856 pgs, 7 pools, 4223 GB data, 1103 kobjects
12947 GB used, 22591 GB / 35538 GB avail
65/3374343 objects degraded (0.002%)
186187/3374343 objects misplaced (5.518%)
1612 active+clean
 238 active+remapped
   3 active+undersized+degraded
   2 stale+active+clean
   1 creating
  client io 0 B/s rd, 40830 B/s wr, 17 op/s


> What's the full output of "ceph -s"? Have you looked at the MDS admin
> socket at all — what state does it say it's in?

[root@remrprv1c ceph]# ceph --admin-daemon 
/var/run/ceph/ceph-mds.remrprv1c.asok dump_ops_in_flight
{
"ops": [
{
"description": "client_request(client.3052096:83 getattr Fs 
#1000288 2016-02-03 10:10:46.361591 RETRY=1)",
"initiated_at": "2016-02-03 10:23:25.791790",
"age": 3963.093615,
"duration": 9.519091,
"type_data": [
"failed to rdlock, waiting",
"client.3052096:83",
"client_request",
{
"client": "client.3052096",
"tid": 83
},
[
{
"time": "2016-02-03 10:23:25.791790",
"event": "initiated"
},
{
"time": "2016-02-03 10:23:35.310881",
"event": "failed to rdlock, waiting"
}
]
]
}
],
"num_ops": 1
}

seems there's some lock stuck here.. 

Killing the stuck client (it's postgres trying to access a cephfs
file) doesn't help..


> -Greg
> 
> >
> > My question here is:
> >
> > 1) is there some known issue with hammer 0.94.5 or kernel 4.1.15
> > which could lead to cephfs hangs?
> >
> > 2) what can I do to debug what is the cause of this hang?
> >
> > 3) is there a way to recover this without hard resetting
> > node with hung cephfs mount?
> >
> > If I could provide more information, please let me know
> >
> > I'd really appreciate any help
> >
> > with best regards
> >
> > nik
> >
> >
> >
> >
> > --
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


pgpGoG5McCNrp.pgp
Description: PGP signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Same SSD-Cache-Pool for multiple Spinning-Disks-Pools?

2016-02-03 Thread Udo Waechter
Hello everyone,

I'm using ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
with debian 8

I have now implemented a SSD (2 OSDs) cache tier for one of my pool.

I am now wondering whether it is possible to use the same SSD-Pool for
multiple pools as a cache tier? Or do I need to create a cache pool for
each of my spinning-disks-pools?

Thank you very much,
udo.



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer-0.94.5 + kernel-4.1.15 - cephfs stuck

2016-02-03 Thread Gregory Farnum
On Wed, Feb 3, 2016 at 1:21 AM, Nikola Ciprich
 wrote:
> Hello fellow ceph users and developers
>
> few days ago, I've update one our small cluster
> (three nodes) to kernel 4.1.15. Today I got cephfs
> stuck on one of the nodes.
>
> cpeh -s reports:
> mds0: Behind on trimming (155/30)
>
> restarting all MDS servers didn't help.
>
> all three cluster nodes are running hammer 0.94.5 on
> Centos 6, kernel 4.1.15.
>
> Each node runs 7 OSD daemons, monitor and MDS server
> (I know it's better to run those daemons separately, but
> we were tight on budget here and hardware should be sufficient)

What's the full output of "ceph -s"? Have you looked at the MDS admin
socket at all — what state does it say it's in?
-Greg

>
> My question here is:
>
> 1) is there some known issue with hammer 0.94.5 or kernel 4.1.15
> which could lead to cephfs hangs?
>
> 2) what can I do to debug what is the cause of this hang?
>
> 3) is there a way to recover this without hard resetting
> node with hung cephfs mount?
>
> If I could provide more information, please let me know
>
> I'd really appreciate any help
>
> with best regards
>
> nik
>
>
>
>
> --
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
>
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] MDS: bad/negative dir size

2016-02-03 Thread Markus Blank-Burian
Hi,

 

on ceph mds startup, I see the following two errors in our logfiles
(using ceph 9.2.0 and linux 4.4 cephfs kernel client):

 

Feb  2 19:27:13 server1 ceph-mds[1809]: 2016-02-02 19:27:13.363416
7fce9effd700 -1 log_channel(cluster) log [ERR] : bad/negative dir size on
603 f(v2008 m2016-02-02 18:52:25.870376 -327=-316+-11)

Feb  2 19:27:13 server1 ceph-mds[1809]: 2016-02-02 19:27:13.363442
7fce9effd700 -1 log_channel(cluster) log [ERR] : unmatched fragstat on 603,
inode has f(v2009 m2016-02-02 18:52:25.870376 -327=-316+-11), dirfrags have
f(v0 m2016-02-02 18:52:25.870376

 

The filesystem is accessible, but I guess that there will be problems in one
directory. How can I find out, which directory is affected and what can I do
to fix this error? Would it be safe to delete the affected directory?

Markus

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-03 Thread Michael Metz-Martini | SpeedPartner GmbH
Hi,

Am 03.02.2016 um 10:26 schrieb Gregory Farnum:
> On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
> GmbH  wrote:
>> Putting some higher load via cephfs on the cluster leads to messages
>> like mds0: Client X failing to respond to capability release after some
>> minutes. Requests from other clients start to block after a while.
>> Rebooting the client named client resolves the issue.
> There are some bugs around this functionality, but I *think* your
> clients are new enough it shouldn't be an issue.
> However, it's entirely possible your clients are actually making use
> of enough inodes that the MDS server is running into its default
> limits. If your MDS has memory available, you probably want to
> increase the cache size from its default 100k (mds cache size = X).
mds_cache_size is already 400 and so a lot higher than usual.
(google said I should increase ...)

> Or maybe your kernels are too old; Zheng would know.
We're already far away from the CentOS dist kernel, but upgrading to 4.4.x
for the clients should be possible if that might help.

-- 
Kind regards
 Michael

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Client X failing to respond to capability release

2016-02-03 Thread Gregory Farnum
On Tue, Feb 2, 2016 at 10:09 PM, Michael Metz-Martini | SpeedPartner
GmbH  wrote:
> Hi,
>
> we're experiencing some strange issues running ceph 0.87 in our, I
> think, quite large cluster (taking number of objects as a measurement).
>
>  mdsmap e721086: 1/1/1 up {0=storagemds01=up:active}, 2 up:standby
>  osdmap e143048: 92 osds: 92 up, 92 in
> flags noout,noscrub,nodeep-scrub
>   pgmap v45790682: 4736 pgs, 6 pools, 109 TB data, 3841 Mobjects
> 255 TB used, 48892 GB / 303 TB avail
>
> Putting some higher load via cephfs on the cluster leads to messages
> like mds0: Client X failing to respond to capability release after some
> minutes. Requests from other clients start to block after a while.
>
> Rebooting the client named client resolves the issue.

There are some bugs around this functionality, but I *think* your
clients are new enough it shouldn't be an issue.
However, it's entirely possible your clients are actually making use
of enough inodes that the MDS server is running into its default
limits. If your MDS has memory available, you probably want to
increase the cache size from its default 100k (mds cache size = X).
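
For example, something like this (the value is illustrative, and the daemon
name is just the one from the status output above):

# persistently, in ceph.conf on the MDS host:
#   [mds]
#   mds cache size = 4000000
# or at runtime via the admin socket:
ceph daemon mds.storagemds01 config set mds_cache_size 4000000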

Or maybe your kernels are too old; Zheng would know.
-Greg

>
> Clients are a mix of CentOS6 & CentOS7 running kernel
> 4.1.4-1.el7.elrepo.x86_64
> 4.1.4-1.el6.elrepo.x86_64
> 4.4.0-2.el6.elrepo.x86_64
> but other releases show the same behavior.
>
> Currently running 3 OSD Nodes and 3 combined MDS/MON-Nodes.
>
> What information do you need to further track down this issue? Quite
> unsure so this is only a rough overview of the setup.
>
>
> We have another issue with sometimes broken files ; bad checksums after
> storage, but I think I will start a new thread for this ;-)
>
> Thanks!
>
> --
> Kind Regards
>  Michael
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to monit ceph bandwidth?

2016-02-03 Thread Gregory Farnum
On Tue, Feb 2, 2016 at 9:23 PM, yang  wrote:
> Hello everyone,
> I have a ceph cluster (v0.94.5) with cephFS. There is several clients in the 
> cluster,
> every client use their own directory in cephFS with ceph-fuse.
>
> I want to monit the IO bandwidth of the cluster and the client.
> r/w bandwidth and op/s can be reached by `ceph -s`,
>  But I do not know how to monit the IO performance of those clients.
>
> Furthermore, is it better to separate read bandwidth and
> write bandwidth?
> Does the current version of ceph support this?

There's not a quick and easy way to do this. If you have access to the
client admin sockets, you can look at the objecter perf counters,
which include statistics on bytes read and written. That's the only
good idea I have though.
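
For example, against a ceph-fuse client's admin socket (the socket path is an
assumption and depends on the client name):

ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok perf dump
# look at the "objecter" section for the read/write op and byte counters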
-Greg

>
>
> Thanks,
> yang
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hammer-0.94.5 + kernel-4.1.15 - cephfs stuck

2016-02-03 Thread Nikola Ciprich
Hello fellow ceph users and developers

A few days ago, I updated one of our small clusters
(three nodes) to kernel 4.1.15. Today I got cephfs
stuck on one of the nodes.

ceph -s reports:
mds0: Behind on trimming (155/30)

restarting all MDS servers didn't help.

all three cluster nodes are running hammer 0.94.5 on
Centos 6, kernel 4.1.15. 

Each node runs 7 OSD daemons, monitor and MDS server
(I know it's better to run those daemons separately, but
we were tight on budget here and hardware should be sufficient)

My question here is:

1) is there some known issue with hammer 0.94.5 or kernel 4.1.15
which could lead to cephfs hangs?

2) what can I do to debug what is the cause of this hang?

3) is there a way to recover this without hard resetting
node with hung cephfs mount?

If I could provide more information, please let me know

I'd really appreciate any help

with best regards

nik




-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


pgpY4LNDIUhpv.pgp
Description: PGP signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com