Re: [ceph-users] "store is getting too big" on monitors after Firefly to Giant upgrade

2014-12-09 Thread Haomai Wang
Maybe you can enable "mon_compact_on_start = true" when restarting the mon;
it will compact the store data on startup.
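For example (a minimal sketch; the option goes in the monitor section of
ceph.conf and takes effect on the next restart):

[mon]
    mon compact on start = true

Alternatively, if my memory of the command is right, you can ask a single
monitor to compact without a config change:

ceph tell mon.cluster4-monitor001 compact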

On Wed, Dec 10, 2014 at 6:50 AM, Kevin Sumner  wrote:
> Hi all,
>
> We recently upgraded our cluster to Giant from Firefly.  Since then, we’ve been
> driving load tests against CephFS.  However, we’re getting “store is getting
> too big” warnings from the monitors and the mons have started consuming way
> more disk space, 40GB-60GB now as opposed to ~10GB pre-upgrade.  Is this
> expected?  Is there anything I can do to ease the store’s size?
>
> Thanks!
>
> :: ~ » ceph status
> cluster f1aefa73-b968-41e0-9a28-9a465db5f10b
>  health HEALTH_WARN mon.cluster4-monitor001 store is getting too big!
> 45648 MB >= 15360 MB; mon.cluster4-monitor002 store is getting too big!
> 56939 MB >= 15360 MB; mon.cluster4-monitor003 store is getting too big!
> 28647 MB >= 15360 MB; mon.cluster4-monitor004 store is getting too big!
> 60655 MB >= 15360 MB; mon.cluster4-monitor005 store is getting too big!
> 57335 MB >= 15360 MB
>  monmap e3: 5 mons at
> {cluster4-monitor001=17.138.96.12:6789/0,cluster4-monitor002=17.138.96.13:6789/0,cluster4-monitor003=17.138.96.14:6789/0,cluster4-monitor004=17.138.96.15:6789/0,cluster4-monitor005=17.138.96.16:6789/0},
> election epoch 34938, quorum 0,1,2,3,4
> cluster4-monitor001,cluster4-monitor002,cluster4-monitor003,cluster4-monitor004,cluster4-monitor005
>  mdsmap e6538: 1/1/1 up {0=cluster4-monitor001=up:active}
>  osdmap e49500: 501 osds: 470 up, 469 in
>   pgmap v1369307: 98304 pgs, 3 pools, 4933 GB data, 1976 kobjects
> 16275 GB used, 72337 GB / 93366 GB avail
>98304 active+clean
>   client io 3463 MB/s rd, 18710 kB/s wr, 7456 op/s
> --
> Kevin Sumner
> ke...@sumner.io
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple MDS servers...

2014-12-09 Thread Christopher Armstrong
JIten,

You simply start more metadata servers. You'll notice when you inspect the
cluster health that one will be the active, and the rest will be standbys.
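For example, with one extra MDS running, the mdsmap line in `ceph -s` should
look roughly like this (the names here are placeholders):

 mdsmap e10: 1/1/1 up {0=mds-a=up:active}, 1 up:standby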

Chris

On Tue, Dec 9, 2014 at 3:10 PM, JIten Shah  wrote:

> Hi Greg,
>
> Sorry for the confusion.  I am not looking for an active/active configuration,
> which I know is not supported, but what documentation can I refer to for
> installing active/standby MDSes?
>
> I tried looking on Ceph.com but could not find anything that explains how to
> set up an active/standby MDS cluster.
>
> Thanks.
>
> —Jiten
>
> On Dec 9, 2014, at 12:50 PM, Gregory Farnum  wrote:
>
> > You'll need to be a little more explicit about your question. In
> > general there is nothing special that needs to be done. If you're
> > trying to get multiple active MDSes (instead of active and
> > standby/standby-replay/etc) you'll need to tell the monitors to
> > increase the mds num (check the docs; this is not recommended right
> > now). You obviously need to add an MDS entry to one of your nodes
> > somewhere, the mechanism for which can differ based on how you're
> > managing your cluster.
> > But you don't need to do anything explicit like tell everybody
> > globally that there are multiple MDSes.
> > -Greg
> >
> > On Mon, Dec 8, 2014 at 10:48 AM, JIten Shah  wrote:
> >> Do I need to update the ceph.conf  to support multiple MDS servers?
> >>
> >> —Jiten
> >>
> >> On Nov 24, 2014, at 6:56 AM, Gregory Farnum  wrote:
> >>
> >>> On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah  wrote:
>  Hi Greg,
> 
>  I haven’t setup anything in ceph.conf as mds.cephmon002 nor in any
> ceph
>  folders. I have always tried to set it up as mds.lab-cephmon002, so I
> am
>  wondering where is it getting that value from?
> >>>
> >>> No idea, sorry. Probably some odd mismatch between expectations and
> >>> how the names are actually being parsed and saved.
> >>> -Greg
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple MDS servers...

2014-12-09 Thread JIten Shah
I have been trying to do that for quite some time now (using puppet) but it
keeps failing. Here’s what the error says.

Error: Could not start Service[ceph-mds]: Execution of '/sbin/service ceph 
start mds.Lab-cephmon003' returned 1: /etc/init.d/ceph: mds.Lab-cephmon003 not 
found (/etc/ceph/ceph.conf defines mon.Lab-cephmon003 , /var/lib/ceph defines 
mon.Lab-cephmon003)
Wrapped exception:
Execution of '/sbin/service ceph start mds.Lab-cephmon003' returned 1: 
/etc/init.d/ceph: mds.Lab-cephmon003 not found (/etc/ceph/ceph.conf defines 
mon.Lab-cephmon003 , /var/lib/ceph defines mon.Lab-cephmon003)
Error: /Stage[main]/Ceph::Mds/Service[ceph-mds]/ensure: change from stopped to 
running failed: Could not start Service[ceph-mds]: Execution of '/sbin/service 
ceph start mds.Lab-cephmon003' returned 1: /etc/init.d/ceph: mds.Lab-cephmon003 
not found (/etc/ceph/ceph.conf defines mon.Lab-cephmon003 , /var/lib/ceph 
defines mon.Lab-cephmon003)
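
My understanding is that the sysvinit script only starts daemons it can find
either as a section in ceph.conf or as a data directory under /var/lib/ceph,
so presumably something like the following is what's missing here (a sketch;
the exact paths and keys are my assumption):

[mds.Lab-cephmon003]
    host = Lab-cephmon003

mkdir -p /var/lib/ceph/mds/ceph-Lab-cephmon003
touch /var/lib/ceph/mds/ceph-Lab-cephmon003/sysvinit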



On Dec 9, 2014, at 3:12 PM, Christopher Armstrong  wrote:

> JIten,
> 
> You simply start more metadata servers. You'll notice when you inspect the 
> cluster health that one will be the active, and the rest will be standbys.
> 
> Chris
> 
> On Tue, Dec 9, 2014 at 3:10 PM, JIten Shah  wrote:
> Hi Greg,
> 
> Sorry for the confusion.  I am not looking for an active/active configuration,
> which I know is not supported, but what documentation can I refer to for
> installing active/standby MDSes?
> 
> I tried looking on Ceph.com but could not find anything that explains how to
> set up an active/standby MDS cluster.
> 
> Thanks.
> 
> —Jiten
> 
> On Dec 9, 2014, at 12:50 PM, Gregory Farnum  wrote:
> 
> > You'll need to be a little more explicit about your question. In
> > general there is nothing special that needs to be done. If you're
> > trying to get multiple active MDSes (instead of active and
> > standby/standby-replay/etc) you'll need to tell the monitors to
> > increase the mds num (check the docs; this is not recommended right
> > now). You obviously need to add an MDS entry to one of your nodes
> > somewhere, the mechanism for which can differ based on how you're
> > managing your cluster.
> > But you don't need to do anything explicit like tell everybody
> > globally that there are multiple MDSes.
> > -Greg
> >
> > On Mon, Dec 8, 2014 at 10:48 AM, JIten Shah  wrote:
> >> Do I need to update the ceph.conf  to support multiple MDS servers?
> >>
> >> —Jiten
> >>
> >> On Nov 24, 2014, at 6:56 AM, Gregory Farnum  wrote:
> >>
> >>> On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah  wrote:
>  Hi Greg,
> 
>  I haven’t setup anything in ceph.conf as mds.cephmon002 nor in any ceph
>  folders. I have always tried to set it up as mds.lab-cephmon002, so I am
>  wondering where is it getting that value from?
> >>>
> >>> No idea, sorry. Probably some odd mismatch between expectations and
> >>> how the names are actually being parsed and saved.
> >>> -Greg
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is mon initial members used after the first quorum?

2014-12-09 Thread Christopher Armstrong
Hi folks,

I think we have a bit of confusion around how initial members is used. I
understand that we can specify a single monitor (or a subset of monitors)
so that the cluster can form a quorum when it first comes up. This is how
we're using the setting now - so the cluster can come up with just one
monitor, with the other monitors to follow later.

However, a Deis user reported that when the monitor in his initial members
list went down, radosgw stopped functioning, even though there are three
mons in his config file. I would think that the radosgw client would
connect to any of the nodes in the config file to get the state of the
cluster, and that the initial members list is only used when the monitors
first come up and are trying to achieve quorum.
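
For reference, the relevant part of the config looks roughly like this
(hostnames are placeholders for the real ones):

[global]
    mon initial members = mon-a
    mon host = mon-a,mon-b,mon-c

i.e. only one monitor is listed in initial members, while all three are listed
in mon host.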

The issue he filed is here: https://github.com/deis/deis/issues/2711

He also found this Ceph issue filed: https://github.com/ceph/ceph/pull/1233

Is that what we're seeing here? Can anyone point us in the right direction?

Thanks,

Chris
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple MDS servers...

2014-12-09 Thread JIten Shah
Hi Greg,

Sorry for the confusion.  I am not looking for an active/active configuration,
which I know is not supported, but what documentation can I refer to for
installing active/standby MDSes?

I tried looking on Ceph.com but could not find anything that explains how to
set up an active/standby MDS cluster.

Thanks.

—Jiten

On Dec 9, 2014, at 12:50 PM, Gregory Farnum  wrote:

> You'll need to be a little more explicit about your question. In
> general there is nothing special that needs to be done. If you're
> trying to get multiple active MDSes (instead of active and
> standby/standby-replay/etc) you'll need to tell the monitors to
> increase the mds num (check the docs; this is not recommended right
> now). You obviously need to add an MDS entry to one of your nodes
> somewhere, the mechanism for which can differ based on how you're
> managing your cluster.
> But you don't need to do anything explicit like tell everybody
> globally that there are multiple MDSes.
> -Greg
> 
> On Mon, Dec 8, 2014 at 10:48 AM, JIten Shah  wrote:
>> Do I need to update the ceph.conf  to support multiple MDS servers?
>> 
>> —Jiten
>> 
>> On Nov 24, 2014, at 6:56 AM, Gregory Farnum  wrote:
>> 
>>> On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah  wrote:
 Hi Greg,
 
 I haven’t setup anything in ceph.conf as mds.cephmon002 nor in any ceph
 folders. I have always tried to set it up as mds.lab-cephmon002, so I am
 wondering where is it getting that value from?
>>> 
>>> No idea, sorry. Probably some odd mismatch between expectations and
>>> how the names are actually being parsed and saved.
>>> -Greg
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+degraded on an empty new cluster

2014-12-09 Thread Craig Lewis
When I first created a test cluster, I used 1 GiB disks.  That causes
problems.

Each OSD has a CRUSH weight.  By default, the weight is the size of the disk in
TiB, truncated to 2 decimal places, i.e. any disk smaller than 10 GiB will
have a weight of 0.00.

I increased all of my virtual disks to 10 GiB.  After rebooting the nodes
(to see the changes), everything healed.
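
An alternative I didn't try is to override the weights by hand instead of
growing the disks, e.g. for each OSD:

ceph osd crush reweight osd.0 0.01

which gives tiny test disks a non-zero CRUSH weight so data can actually be
placed on them.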


On Tue, Dec 9, 2014 at 9:45 AM, Gregory Farnum  wrote:

> It looks like your OSDs all have weight zero for some reason. I'd fix
> that. :)
> -Greg
>
> On Tue, Dec 9, 2014 at 6:24 AM Giuseppe Civitella <
> giuseppe.civite...@gmail.com> wrote:
>
>> Hi,
>>
>> thanks for the quick answer.
>> I did try the force_create_pg on a pg but it is stuck on "creating":
>> root@ceph-mon1:/home/ceph# ceph pg dump |grep creating
>> dumped all in format plain
>> 2.2f0   0   0   0   0   0   0   creating
>>2014-12-09 13:11:37.384808  0'0 0:0 []  -1  []
>>-1  0'0 0.000'0  0.00
>>
>> root@ceph-mon1:/home/ceph# ceph pg 2.2f query
>> { "state": "active+degraded",
>>   "epoch": 105,
>>   "up": [
>> 0],
>>   "acting": [
>> 0],
>>   "actingbackfill": [
>> "0"],
>>   "info": { "pgid": "2.2f",
>>   "last_update": "0'0",
>>   "last_complete": "0'0",
>>   "log_tail": "0'0",
>>   "last_user_version": 0,
>>   "last_backfill": "MAX",
>>   "purged_snaps": "[]",
>>   "last_scrub": "0'0",
>>   "last_scrub_stamp": "2014-12-06 14:15:11.499769",
>>   "last_deep_scrub": "0'0",
>>   "last_deep_scrub_stamp": "2014-12-06 14:15:11.499769",
>>   "last_clean_scrub_stamp": "0.00",
>>   "log_size": 0,
>>   "ondisk_log_size": 0,
>>   "stats_invalid": "0",
>>   "stat_sum": { "num_bytes": 0,
>>   "num_objects": 0,
>>   "num_object_clones": 0,
>>   "num_object_copies": 0,
>>   "num_objects_missing_on_primary": 0,
>>   "num_objects_degraded": 0,
>>   "num_objects_unfound": 0,
>>   "num_objects_dirty": 0,
>>   "num_whiteouts": 0,
>>   "num_read": 0,
>>   "num_read_kb": 0,
>>   "num_write": 0,
>>   "num_write_kb": 0,
>>   "num_scrub_errors": 0,
>>   "num_shallow_scrub_errors": 0,
>>   "num_deep_scrub_errors": 0,
>>   "num_objects_recovered": 0,
>>   "num_bytes_recovered": 0,
>>   "num_keys_recovered": 0,
>>   "num_objects_omap": 0,
>>   "num_objects_hit_set_archive": 0},
>>   "stat_cat_sum": {},
>>   "up": [
>> 0],
>>   "acting": [
>> 0],
>>   "up_primary": 0,
>>   "acting_primary": 0},
>>   "empty": 1,
>>   "dne": 0,
>>   "incomplete": 0,
>>   "last_epoch_started": 104,
>>   "hit_set_history": { "current_last_update": "0'0",
>>   "current_last_stamp": "0.00",
>>   "current_info": { "begin": "0.00",
>>   "end": "0.00",
>>   "version": "0'0"},
>>   "history": []}},
>>   "peer_info": [],
>>   "recovery_state": [
>> { "name": "Started\/Primary\/Active",
>>   "enter_time": "2014-12-09 12:12:52.760384",
>>   "might_have_unfound": [],
>>   "recovery_progress": { "backfill_targets": [],
>>   "waiting_on_backfill": [],
>>   "last_backfill_started": "0\/\/0\/\/-1",
>>   "backfill_info": { "begin": "0\/\/0\/\/-1",
>>   "end": "0\/\/0\/\/-1",
>>   "objects": []},
>>   "peer_backfill_info": [],
>>   "backfills_in_flight": [],
>>   "recovering": [],
>>   "pg_backend": { "pull_from_peer": [],
>>   "pushing": []}},
>>   "scrub": { "scrubber.epoch_start": "0",
>>   "scrubber.active": 0,
>>   "scrubber.block_writes": 0,
>>   "scrubber.finalizing": 0,
>>   "scrubber.waiting_on": 0,
>>   "scrubber.waiting_on_whom": []}},
>> { "name": "Started",
>>   "enter_time": "2014-12-09 12:12:51.845686"}],
>>   "agent_state": {}}root@ceph-mon1:/home/ceph#
>>
>>
>>
>> 2014-12-09 13:01 GMT+01:00 Irek Fasikhov :
>>
>>> Hi.
>>>
>>> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
>>>
>>> ceph pg force_create_pg 
>>>
>>>
>>> 2014-12-09 14:50 GMT+03:00 Giuseppe Civitella <
>>> giuseppe.civite...@gmail.com>:
>>>
 Hi all,

 last week I installed a new ceph cluster on 3 VMs running Ubuntu 14.04
 with the default kernel.
 There is a ceph monitor and two OSD hosts. Here are some details:
 ceph -s
 cluster c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
  health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
  monmap e1: 1 mons at {ceph-mon1=10.1.1.83:67

Re: [ceph-users] Monitors repeatedly calling for new elections

2014-12-09 Thread Jon Kåre Hellan

On 09. des. 2014 18:19, Sanders, Bill wrote:

Thanks for the response.  I did forget to mention that NTP is set up and does
appear to be running (just double-checked).


You probably know this, but just in case: If 'ntpq -p' shows a '*' in 
front of one of the servers, NTP has managed to synch up. If not, NTP 
has had no effect on your clock.
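
For example, reusing the loop from your earlier mail, something like this
(untested sketch) would show the selected peer and offset on every node at
once:

for node in $nodes; do ssh tvsa${node} ntpq -pn | grep '^\*'; done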


Jon

Jon Kåre Hellan, UNINETT AS, Trondheim Norway



Is this good enough resolution?

$ for node in $nodes; do ssh tvsa${node} sudo date --rfc-3339=ns; done
2014-12-09 09:15:39.404292557-08:00
2014-12-09 09:15:39.521762397-08:00
2014-12-09 09:15:39.641491188-08:00
2014-12-09 09:15:39.761937524-08:00
2014-12-09 09:15:39.911416676-08:00
2014-12-09 09:15:40.029777457-08:00

Bill

From: Rodrigo Severo [rodr...@fabricadeideias.com]
Sent: Tuesday, December 09, 2014 4:02 AM
To: Sanders, Bill
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors repeatedly calling for new elections

On Mon, Dec 8, 2014 at 5:23 PM, Sanders, Bill  wrote:


Under activity, we'll get monitors going into election cycles repeatedly,
OSD's being "wrongly marked down", as well as slow requests "osd.11
39.7.48.6:6833/21938 failed (3 reports from 1 peers after 52.914693 >= grace
20.00)" .  During this, ceph -w shows the cluster essentially idle.
None of the network, disks, or cpu's ever appear to max out.  It also
doesn't appear to be the same OSD's, MON's, or node causing the problem.
Top reports all 128 GB RAM (negligible swap) in use on the storage nodes.
Only Ceph is running on the storage nodes.



I'm really new to Ceph but my first bet is that your computers aren't
clock synchronized. Are all of them with working ntpds?


Regards,

Rodrigo Severo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] "store is getting too big" on monitors after Firefly to Giant upgrade

2014-12-09 Thread Kevin Sumner
Hi all,

We recently upgraded our cluster to Giant from Firefly.  Since then, we’ve been driving 
load tests against CephFS.  However, we’re getting “store is getting too big” 
warnings from the monitors and the mons have started consuming way more disk 
space, 40GB-60GB now as opposed to ~10GB pre-upgrade.  Is this expected?  Is 
there anything I can do to ease the store’s size?

Thanks!

:: ~ » ceph status
cluster f1aefa73-b968-41e0-9a28-9a465db5f10b
 health HEALTH_WARN mon.cluster4-monitor001 store is getting too big! 45648 
MB >= 15360 MB; mon.cluster4-monitor002 store is getting too big! 56939 MB >= 
15360 MB; mon.cluster4-monitor003 store is getting too big! 28647 MB >= 15360 
MB; mon.cluster4-monitor004 store is getting too big! 60655 MB >= 15360 MB; 
mon.cluster4-monitor005 store is getting too big! 57335 MB >= 15360 MB
 monmap e3: 5 mons at 
{cluster4-monitor001=17.138.96.12:6789/0,cluster4-monitor002=17.138.96.13:6789/0,cluster4-monitor003=17.138.96.14:6789/0,cluster4-monitor004=17.138.96.15:6789/0,cluster4-monitor005=17.138.96.16:6789/0},
 election epoch 34938, quorum 0,1,2,3,4 
cluster4-monitor001,cluster4-monitor002,cluster4-monitor003,cluster4-monitor004,cluster4-monitor005
 mdsmap e6538: 1/1/1 up {0=cluster4-monitor001=up:active}
 osdmap e49500: 501 osds: 470 up, 469 in
  pgmap v1369307: 98304 pgs, 3 pools, 4933 GB data, 1976 kobjects
16275 GB used, 72337 GB / 93366 GB avail
   98304 active+clean
  client io 3463 MB/s rd, 18710 kB/s wr, 7456 op/s
--
Kevin Sumner
ke...@sumner.io



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] seg fault

2014-12-09 Thread Philipp Strobl
Hi Samuel,

After reading your mail again carefully, I see that my last questions are
obsolete.
I will surely upgrade to 0.67.11 as soon as I can and take a closer look at the
improvements.

As far as I understand the release notes for 0.67.11, there are no
upgrade-specific fixes, so I hope there will be no issues when going to Firefly
afterwards.
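
The plan, roughly (a sketch of the usual rolling order; adjust the package
commands to your distribution):

apt-get update && apt-get install ceph    # upgrade the packages on each node
service ceph restart mon                  # restart monitors first
service ceph restart osd                  # then OSDs
service ceph restart mds                  # MDS last, if any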

Thank you (all) again

Best
Philipp 


> Am 08.12.2014 um 23:52 schrieb Samuel Just :
> 
> To start with, dumpling itself is up to v0.67.11.  You are running
> v0.67.0.  There have been many bug fixes just in dumpling in that
> time.  You should start with upgrading to v0.67.11 even if you plan on
> upgrading to firefly or giant later (there were bug fixes in dumpling
> for bugs which only happen when upgrading to later versions).  Beyond
> that, it depends on your needs.  Giant won't be maintained for a
> particularly long time (like emperor), but firefly will (like
> dumpling).
> -Sam
> 
> On Mon, Dec 8, 2014 at 2:47 PM, Philipp von Strobl-Albeg
>  wrote:
>> Thank you very much.
>> I planed this step already - so good to know ;-)
>> 
>> Do you recommend firefly or giant - without needing radosgw ?
>> 
>> 
>> Best
>> Philipp
>> 
>> 
>>> Am 08.12.2014 um 23:42 schrieb Samuel Just:
>>> 
>>> At a guess, this is something that has long since been fixed in
>>> dumpling, you probably want to upgrade to the current dumpling point
>>> release.
>>> -Sam
>>> 
>>> On Mon, Dec 8, 2014 at 2:40 PM, Philipp von Strobl-Albeg
>>>  wrote:
 
 Hi,
 
 after using the ceph cluster for months without any problems - thank you for
 that great piece of software - I noticed that one OSD crashed with the
 following output.
 What are the recommendations - just upgrading, or is this not a bug in
 0.67?
 
 
 -1> 2014-11-08 04:24:51.127924 7f0d92897700  5 --OSD::tracker-- reqid:
 client.9016.1:5037242, seq: 3484524, time: 2014-11-08 04:24:51.127924,
 event: waiting_for_osdmap, request: osd_op(client.9016.1:5037242
 rb.0.1798.6b8b4567.0076 [write 602112~4096] 2.c90060c7 snapc 7=[]
 e554) v4
 0> 2014-11-08 04:24:51.141626 7f0d88ff9700 -1 *** Caught signal
 (Segmentation fault) **
 in thread 7f0d88ff9700
 
 ceph version 0.67 (e3b7bc5bce8ab330ec1661381072368af3c218a0)
 1: ceph-osd() [0x802577]
 2: (()+0x113d0) [0x7f0db94d93d0]
 3: (std::string::compare(std::string const&) const+0xc)
 [0x7f0db7e81c4c]
 4: (PGLog::check()+0x90) [0x76b8d0]
 5: (PGLog::write_log(ObjectStore::Transaction&, hobject_t
 const&)+0x245)
 [0x7672c5]
 6: (PG::append_log(std::vector>>> std::allocator >&, eversion_t,
 ObjectStore::Transaction&)+0x31d) [0x71f03d]
 7: (ReplicatedPG::do_op(std::tr1::shared_ptr)+0x36f3)
 [0x623e63]
 8: (PG::do_request(std::tr1::shared_ptr,
 ThreadPool::TPHandle&)+0x619) [0x710a19]
 9: (OSD::dequeue_op(boost::intrusive_ptr,
 std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x330) [0x6663f0]
 10: (OSD::OpWQ::_process(boost::intrusive_ptr,
 ThreadPool::TPHandle&)+0x4a0) [0x67cbc0]
 11: (ThreadPool::WorkQueueVal,
 std::tr1::shared_ptr >, boost::intrusive_ptr
> 
> ::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6b893c]
 
 12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8bb156]
 13: (ThreadPool::WorkThread::entry()+0x10) [0x8bcf60]
 14: (()+0x91a7) [0x7f0db94d11a7]
 15: (clone()+0x6d) [0x7f0db76072cd]
 NOTE: a copy of the executable, or `objdump -rdS ` is
 needed to
 interpret this.
 
 --- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
 1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-osd.2.log
 
 --
 Philipp Strobl
 http://www.pilarkto.net
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> 
>> --
>> Philipp v. Strobl-Albeg
>> Dipl.-Ing.
>> 
>> Zellerstr. 19
>> 70180 Stuttgart
>> 
>> Tel   +49 711 121 58269
>> Mobil +49 151 270 39710
>> Fax   +49 711 658 3089
>> 
>> http://www.pilarkto.n

[ceph-users] Unable to start radosgw

2014-12-09 Thread Vivek Varghese Cherian
Hi,

I am trying to integrate OpenStack Juno Keystone with the Ceph Object
Gateway (radosgw).

I want to use keystone as the users authority. A user that keystone
authorizes to access the gateway will also be created on the radosgw.
Tokens that keystone validates will be considered as valid by the rados
gateway.

I am using the URL http://ceph.com/docs/master/radosgw/keystone/ as my
reference.

I have deployed a 4 node ceph cluster running on Ubuntu 14.04

Host1: ppm-c240-admin.xyz.com (10.x.x.123)
Host2: ppm-c240-ceph1.xyz.com (10.x.x.124)
Host3: ppm-c240-ceph2.xyz.com (10.x.x.125)
Host4: ppm-c240-ceph3.xyz.com (10.x.x.126)


ppm-c240-ceph3.xyz.com is the radosgw host. The radosgw service has
stopped working and I am unable to start it using /etc/init.d/radosgw start.

My /etc/ceph/ceph.conf on all the 4 nodes is as follows,


root@ppm-c240-ceph3:~# cat /etc/ceph/ceph.conf

[global]
fsid = df18a088-2a70-43f9-b07f-ce8cf7c3349c
mon_initial_members = ppm-c240-admin, ppm-c240-ceph1, ppm-c240-ceph2
mon_host = 10.x.x.123,10.x.x.124,10.x.x.125
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 10.x.x.0/24
cluster_network = 192.168.0.0/24
osd_pool_default_pg_num = 512
osd_pool_default_pgp_num = 512
debug rgw = 20

[osd]
osd_journal_size = 1

[osd.0]
osd_host = ppm-c240-admin
public_addr = 10.x.x.123
cluster_addr = 192.168.0.10

[osd.1]
osd_host = ppm-c240-admin
public_addr = 10.x.x.123
cluster_addr = 192.168.0.10

[osd.2]
osd_host = ppm-c240-admin
public_addr = 10.x.x.123
cluster_addr = 192.168.0.10

[osd.3]
osd_host = ppm-c240-ceph1
public_addr = 10.x.x.124
cluster_addr = 192.168.0.11

[osd.4]
osd_host = ppm-c240-ceph1
public_addr = 10.x.x.124
cluster_addr = 192.168.0.11

[osd.5]
osd_host = ppm-c240-ceph1
public_addr = 10.x.x.124
cluster_addr = 192.168.0.11

[osd.6]
osd_host = ppm-c240-ceph2
public_addr = 10.x.x.125
cluster_addr = 192.168.0.12

[osd.7]
osd_host = ppm-c240-ceph2
public_addr = 10.x.x.125
cluster_addr = 192.168.0.12

[osd.8]
osd_host = ppm-c240-ceph2
public_addr = 10.x.x.125
cluster_addr = 192.168.0.12

[osd.9]
osd_host = ppm-c240-ceph3
public_addr = 10.x.x.126
cluster_addr = 192.168.0.13

[osd.10]
osd_host = ppm-c240-ceph3
public_addr = 10.x.x.126
cluster_addr = 192.168.0.13

[osd.11]
osd_host = ppm-c240-ceph3
public_addr = 10.x.x.126
cluster_addr = 192.168.0.13

[client.radosgw.gateway]
host = ppm-c240-ceph3
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/radosgw/client.radosgw.gateway.log

rgw keystone url = 10.x.x.175:35357
rgw keystone admin token = xyz123
rgw keystone accepted roles = Member, admin
rgw keystone token cache size = 1
rgw keystone revocation interval = 15 * 60
rgw s3 auth use keystone = true
nss db path = /var/lib/nssdb
root@ppm-c240-ceph3:~#

I am including the coredump for reference:

root@ppm-c240-ceph3:~# /usr/bin/radosgw -n client.radosgw.gateway -d
log-to-stderr
2014-12-09 12:51:31.410944 7f073f6457c0  0 ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 5958
common/ceph_crypto.cc: In function 'void ceph::crypto::init(CephContext*)'
thread 7f073f6457c0 time 2014-12-09 12:51:31.412682
common/ceph_crypto.cc: 54: FAILED assert(s == SECSuccess)
 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (()+0x293ce8) [0x7f073e797ce8]
 2: (common_init_finish(CephContext*, int)+0x10) [0x7f073e76afa0]
 3: (main()+0x340) [0x4665a0]
 4: (__libc_start_main()+0xf5) [0x7f073c932ec5]
 5: /usr/bin/radosgw() [0x4695c7]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.
2014-12-09 12:51:31.413544 7f073f6457c0 -1 common/ceph_crypto.cc: In
function 'void ceph::crypto::init(CephContext*)' thread 7f073f6457c0 time
2014-12-09 12:51:31.412682
common/ceph_crypto.cc: 54: FAILED assert(s == SECSuccess)

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (()+0x293ce8) [0x7f073e797ce8]
 2: (common_init_finish(CephContext*, int)+0x10) [0x7f073e76afa0]
 3: (main()+0x340) [0x4665a0]
 4: (__libc_start_main()+0xf5) [0x7f073c932ec5]
 5: /usr/bin/radosgw() [0x4695c7]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- begin dump of recent events ---
   -13> 2014-12-09 12:51:31.407900 7f073f6457c0  5 asok(0xaf1180)
register_command perfcounters_dump hook 0xaf2c10
   -12> 2014-12-09 12:51:31.407944 7f073f6457c0  5 asok(0xaf1180)
register_command 1 hook 0xaf2c10
   -11> 2014-12-09 12:51:31.407953 7f073f6457c0  5 asok(0xaf1180)
register_command perf dump hook 0xaf2c10
   -10> 2014-12-09 12:51:31.407961 7f073f6457c0  5 asok(0xaf1180)
register_command perfcounters_schema hook 0xaf2c10
-9> 2014-12-09 12:51:31.407992 7f073f6457c0  5 asok(0xaf1180)
register_command 2 hook 0xaf2c10
-8> 2014-12-09 12:51:31.407995 7f073f6457c0  5 asok(0xaf1180)
register_command perf schema hook 0xaf2c10
-7> 2014-12-09 12:51:31.40

Re: [ceph-users] active+degraded on an empty new cluster

2014-12-09 Thread Irek Fasikhov
Hi.

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/

ceph pg force_create_pg 


2014-12-09 14:50 GMT+03:00 Giuseppe Civitella 
:

> Hi all,
>
> last week I installed a new ceph cluster on 3 VMs running Ubuntu 14.04 with
> the default kernel.
> There is a ceph monitor and two OSD hosts. Here are some details:
> ceph -s
> cluster c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
>  health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
>  monmap e1: 1 mons at {ceph-mon1=10.1.1.83:6789/0}, election epoch 1,
> quorum 0 ceph-mon1
>  osdmap e83: 6 osds: 6 up, 6 in
>   pgmap v231: 192 pgs, 3 pools, 0 bytes data, 0 objects
> 207 MB used, 30446 MB / 30653 MB avail
>  192 active+degraded
>
> root@ceph-mon1:/home/ceph# ceph osd dump
> epoch 99
> fsid c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
> created 2014-12-06 13:15:06.418843
> modified 2014-12-09 11:38:04.353279
> flags
> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 18 flags hashpspool
> crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 19 flags hashpspool stripe_width 0
> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 20 flags hashpspool stripe_width 0
> max_osd 6
> osd.0 up   in  weight 1 up_from 90 up_thru 90 down_at 89
> last_clean_interval [58,89) 10.1.1.84:6805/995 10.1.1.84:6806/4000995
> 10.1.1.84:6807/4000995 10.1.1.84:6808/4000995 exists,up
> e3895075-614d-48e2-b956-96e13dbd87fe
> osd.1 up   in  weight 1 up_from 88 up_thru 0 down_at 87
> last_clean_interval [8,87) 10.1.1.85:6800/23146 10.1.1.85:6815/7023146
> 10.1.1.85:6816/7023146 10.1.1.85:6817/7023146 exists,up
> 144bc6ee-2e3d-4118-a460-8cc2bb3ec3e8
> osd.2 up   in  weight 1 up_from 61 up_thru 0 down_at 60
> last_clean_interval [11,60) 10.1.1.85:6805/26784 10.1.1.85:6802/5026784
> 10.1.1.85:6811/5026784 10.1.1.85:6812/5026784 exists,up
> 8d5c7108-ef11-4947-b28c-8e20371d6d78
> osd.3 up   in  weight 1 up_from 95 up_thru 0 down_at 94
> last_clean_interval [57,94) 10.1.1.84:6800/810 10.1.1.84:6810/3000810
> 10.1.1.84:6811/3000810 10.1.1.84:6812/3000810 exists,up
> bd762b2d-f94c-4879-8865-cecd63895557
> osd.4 up   in  weight 1 up_from 97 up_thru 0 down_at 96
> last_clean_interval [74,96) 10.1.1.84:6801/9304 10.1.1.84:6802/2009304
> 10.1.1.84:6803/2009304 10.1.1.84:6813/2009304 exists,up
> 7d28a54b-b474-4369-b958-9e6bf6c856aa
> osd.5 up   in  weight 1 up_from 99 up_thru 0 down_at 98
> last_clean_interval [79,98) 10.1.1.85:6801/19513 10.1.1.85:6808/2019513
> 10.1.1.85:6810/2019513 10.1.1.85:6813/2019513 exists,up
> f4d76875-0e40-487c-a26d-320f8b8d60c5
>
> root@ceph-mon1:/home/ceph# ceph osd tree
> # idweight  type name   up/down reweight
> -1  0   root default
> -2  0   host ceph-osd1
> 0   0   osd.0   up  1
> 3   0   osd.3   up  1
> 4   0   osd.4   up  1
> -3  0   host ceph-osd2
> 1   0   osd.1   up  1
> 2   0   osd.2   up  1
> 5   0   osd.5   up  1
>
> Current HEALTH_WARN state says "192 active+degraded" since I rebooted an
> osd host. Previously it was "incomplete". It never reached a HEALTH_OK
> state.
> Any hint about what to do next to have an healthy cluster?
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
С уважением, Фасихов Ирек Нургаязович
Моб.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2014-12-09 Thread Luis Periquito
Hi Wido,
thanks for sharing.

fortunately I'm still running precise but planning on moving to trusty.

From what I'm aware, it's not a good idea to be running discard on the FS,
as it does have an impact on the delete operation, which some may even
consider an unnecessary amount of work for the SSD.

OTOH we should be running TRIM to improve write performance (and the only
reason we are running SSDs is for performance). Running it weekly seems to
be killing it also.

So what do you think will be the best way to do this?

And what about the journal? I'm using a raw partition for it, on an SSD.
Will ceph do a proper trimming of it?

On Tue, Dec 9, 2014 at 9:21 AM, Wido den Hollander  wrote:

> Hi,
>
> Last sunday I got a call early in the morning that a Ceph cluster was
> having some issues. Slow requests and OSDs marking each other down.
>
> Since this is a 100% SSD cluster I was a bit confused and started
> investigating.
>
> It took me about 15 minutes to see that fstrim was running and was
> utilizing the SSDs 100%.
>
> On Ubuntu 14.04 there is a weekly CRON which executes fstrim-all. It
> detects all mountpoints which can be trimmed and starts to trim those.
>
> On the Intel SSDs used here it caused them to become 100% busy for a
> couple of minutes. That was enough for them to no longer respond on
> heartbeats, thus timing out and being marked down.
>
> Luckily we had the "out interval" set to 1800 seconds on that cluster,
> so no OSD was marked as "out".
>
> fstrim-all does not execute fstrim with a ionice priority. From what I
> understand, but haven't tested yet, is that running fstrim with ionice
> -c Idle should solve this.
>
> It's weird that this issue didn't come up earlier on that cluster, but
> after killing fstrim all problems we resolved and the cluster ran
> happily again.
>
> So watch out for fstrim on early Sunday mornings on Ubuntu!
>
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Giant osd problems - loss of IO

2014-12-09 Thread Andrei Mikhailovsky
Following Jake's recommendation I have updated my sysctl.conf file and it seems 
to have helped with the problem of osds being marked down by other osd peers. 
It has been 3 days already. I am currently using the following settings in the 
sysctl.conf: 

# Increase Linux autotuning TCP buffer limits 
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE 
# Don't set tcp_mem itself! Let the kernel scale it based on RAM. 
net.core.rmem_max = 134217728 
net.core.wmem_max = 134217728 
net.core.rmem_default = 134217728 
net.core.wmem_default = 134217728 
net.core.optmem_max = 134217728 
net.ipv4.tcp_rmem = 4096 87380 67108864 
net.ipv4.tcp_wmem = 4096 65536 67108864 

# Make room for more TIME_WAIT sockets due to more clients, 
# and allow them to be reused if we run out of sockets 
# Also increase the max packet backlog 
net.core.somaxconn = 1024 
net.core.netdev_max_backlog = 25 
net.ipv4.tcp_max_syn_backlog = 3 
net.ipv4.tcp_max_tw_buckets = 200 
net.ipv4.tcp_tw_reuse = 1 
net.ipv4.tcp_fin_timeout = 10 

# Disable TCP slow start on idle connections 
net.ipv4.tcp_slow_start_after_idle = 0 

# If your servers talk UDP, also up these limits 
net.ipv4.udp_rmem_min = 8192 
net.ipv4.udp_wmem_min = 8192 

# Disable source routing and redirects 
net.ipv4.conf.all.send_redirects = 0 
net.ipv4.conf.all.accept_redirects = 0 
net.ipv4.conf.all.accept_source_route = 0 

# Mellanox recommended changes 
net.ipv4.tcp_timestamps = 0 
net.ipv4.tcp_sack = 1 
net.ipv4.tcp_low_latency = 1 
net.ipv4.tcp_adv_win_scale = 1 

Jake, thanks for your suggestions. 

Andrei 
- Original Message -

> From: "Jake Young" 
> To: "Andrei Mikhailovsky" ,
> ceph-users@lists.ceph.com
> Sent: Saturday, 6 December, 2014 5:02:15 PM
> Subject: Re: [ceph-users] Giant osd problems - loss of IO

> Forgot to copy the list.

> > I basically cobbled together the settings from examples on the
> > internet.
> 

> > I basically modified this sysctl.conf file with his suggestion for
> > 10gb nics
> 
> > http://www.nateware.com/linux-network-tuning-for-2013.html#.VIG_44eLTII
> 

> > I found these sites helpful as well:
> 

> > http://fasterdata.es.net/host-tuning/linux/
> 

> > This may be of interest to you, it has suggestions for your
> > Mellanox
> > hardware:
> > https://fasterdata.es.net/host-tuning/nic-tuning/mellanox-connectx-3/
> 

> > Fermilab website, link to university research papaer
> 
> > https://indico.fnal.gov/getFile.py/access?contribId=30&sessionId=19&resId=0&materialId=paper&confId=3377
> 

> > This has a great answer that explains different configurations for
> > servers vs clients. It seems to me that osds are both servers and
> > clients, so maybe some of the client tuning would benefit osds as
> > well. This is where I got the somaxconn setting from.
> 
> > http://stackoverflow.com/questions/410616/increasing-the-maximum-number-of-tcp-ip-connections-in-linux
> 

> > I forgot to mention, I'm also setting the txqueuelen for my ceph
> > public nic and ceph private nic in the /etc/rc.local file:
> 

> > /sbin/ifconfig eth0 txqueuelen 1
> 
> > /sbin/ifconfig eth1 txqueuelen 1
> 

> > I do push the same sysctl.conf and rc.local to all of my clients as
> > well. The clients are iSCSI servers which serve vmware hosts. My
> > ceph cluster is rbd only and I currently only have the iSCSI proxy
> > server clients. We'll be adding some KVM hypervisors soon, I'm
> > interested to see how they perform vs my vmware --> iSCSI Server
> > -->
> > Ceph setup.
> 

> > Regarding your sysctl.conf file:
> 

> > I've read on a few different sites that net.ipv4.tcp_mem should not
> > be tuned, since the defaults are good. I have not set it, and I
> > can't speak to the benefit/problems with setting it.
> 

> > You're configured to only use a 4MB TCP buffer, which is very
> > small.
> > It is actually smaller than the defaults for tcp_wmem, which is
> > 6MB.
> > The link above suggests up to a 128MB TCP buffer for the 40gb
> > Mellanox and/or 10gb over a WAN (not sure how to read that). I'm
> > using a 54MB buffer, but I may increase mine to 128MB to see if
> > there is any benefit. That 4MB buffer may be your problem.
> 

> > Your net.core.netdev_max_backlog is 5x bigger than mine. I think
> > I'll
> > increase my setting to 25 as well.
> 

> > Our issue looks like http://tracker.ceph.com/issues/9844 and my
> > crash
> > looks like http://tracker.ceph.com/issues/9788
> 

> > On Fri, Dec 5, 2014 at 5:35 AM, Andrei Mikhailovsky <
> > and...@arhont.com > wrote:
> 

> > > Jake,
> > 
> 

> > > very usefull indeed.
> > 
> 

> > > It looks like I had a similar problem regarding the heartbeat and
> > > as
> > > you' have mentioned, I've not seen such issues on Firefly.
> > > However,
> > > i've not seen any osd crashes.
> > 
> 

> > > Could you please let me know where you got the sysctl.conf tunings
> > > from? Was it recommended by the network vendor?
> > 
> 

> > > Also, did you make similar sysctl.conf changes 

Re: [ceph-users] Query about osd pool default flags & hashpspool

2014-12-09 Thread Gregory Farnum
On Tue, Dec 9, 2014 at 10:24 AM, Abhishek L
 wrote:
> Hi
>
> I was going through various conf options to customize a ceph cluster and
> came across `osd pool default flags` in pool-pg config ref[1]. Though
> the value specifies an integer, though I couldn't find a mention of
> possible values this can take in the docs. Looking a bit deeper onto
> ceph sources [2] a bunch of options are seen at osd_types.h which
> resemble
>
> FLAG_HASHPSPOOL = 1<<0, // hash pg seed and pool together (instead of adding)
> FLAG_FULL = 1<<1, // pool is full
> FLAG_DEBUG_FAKE_EC_POOL = 1<<2, // require ReplicatedPG to act like an EC pg
> FLAG_INCOMPLETE_CLONES = 1<<3, // may have incomplete clones (bc we 
> are/were an overlay)
>
> Are these the configurable options for the osd pool flags? Also in
> particular what does the `hashpspool' option do?

Yes, I think those are your options. Don't set this value. If you want
to set the hashpspool, use the dedicated flag for that ("osd pool
default flag hashpspool = true"). It's enabled by default (so you
shouldn't change it anyway) and changes how PGs are placed in a way
that makes them more random when there are multiple pools. The other
values are for internal or debugging use only.
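(For example, a quick way to see which flags a pool currently has is the osd
dump output; each pool line lists its flags:

ceph osd dump | grep "flags hashpspool" )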
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple MDS servers...

2014-12-09 Thread Gregory Farnum
You'll need to be a little more explicit about your question. In
general there is nothing special that needs to be done. If you're
trying to get multiple active MDSes (instead of active and
standby/standby-replay/etc) you'll need to tell the monitors to
increase the mds num (check the docs; this is not recommended right
now). You obviously need to add an MDS entry to one of your nodes
somewhere, the mechanism for which can differ based on how you're
managing your cluster.
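(If you do decide to experiment with multiple active MDSes anyway, my
recollection is that the knob is along these lines; double-check the docs for
your release:

ceph mds set_max_mds 2 )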
But you don't need to do anything explicit like tell everybody
globally that there are multiple MDSes.
-Greg

On Mon, Dec 8, 2014 at 10:48 AM, JIten Shah  wrote:
> Do I need to update the ceph.conf  to support multiple MDS servers?
>
> —Jiten
>
> On Nov 24, 2014, at 6:56 AM, Gregory Farnum  wrote:
>
>> On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah  wrote:
>>> Hi Greg,
>>>
>>> I haven’t setup anything in ceph.conf as mds.cephmon002 nor in any ceph
>>> folders. I have always tried to set it up as mds.lab-cephmon002, so I am
>>> wondering where is it getting that value from?
>>
>> No idea, sorry. Probably some odd mismatch between expectations and
>> how the names are actually being parsed and saved.
>> -Greg
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems running ceph commands.on custom linux system

2014-12-09 Thread Jeffrey Ollie
On Tue, Dec 9, 2014 at 10:15 AM, Patrick Darley
 wrote:
>
> I'm having a problem running commands such as `ceph --help` and `ceph -s`.
> These commands output the expected information, but then they hang
> indefinitely.

If you're using Python 2.7.8 it's probably this issue:

http://tracker.ceph.com/issues/8797

-- 
Jeff Ollie
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitors repeatedly calling for new elections

2014-12-09 Thread Sanders, Bill
Apologies for replying to myself, I thought I'd add a bit more information.  We 
don't think the ceph cluster is the issue, but maybe something on the clients 
(bad configuration setting?  Bug in our older version of ceph-client?).  I've 
attached our CrushMap and OSD tree, as well.  Neither /var/log/messages nor 
dmesg show anything interesting.

Below is typical of what I'll see.  This doesn't happen every time the system 
is active, but if I perform several tests during the day I will certainly see 
this.  During the tests, we'll also often find service times pretty high:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    46.50    0.00   17.20     0.00   509.60    29.63     1.74  101.19   1.33   2.28
rbd2              0.00     0.00    0.00    0.40     0.00    37.40    93.50     1.01   36.00 2500.00 100.00  <-
rbd3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd5              0.00     0.00    0.00    0.20     0.00     5.00    25.00     0.00   18.00  18.00   0.36
rbd6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     2.00    0.00   0.00 100.00
rbd7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     2.00    0.00   0.00 100.00
rbd8              0.00     0.00    0.00    0.10     0.00     5.00    50.00     0.00   20.00  20.00   0.20
rbd10             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd12             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd13             0.00     0.00    0.70    0.70    41.10    18.20    42.36     0.03   24.86  24.86   3.48
rbd14             0.00     0.00    0.00    0.50     0.00    30.60    61.20     1.02   48.00 2000.00 100.00  <-
rbd16             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd18             0.00     0.00    0.40    1.10     0.80    41.20    28.00     1.04   29.33 666.67 100.00
rbd19             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd21             0.00     0.00    0.40    0.80     2.00    28.80    25.67     2.04   31.67 833.33 100.00
rbd22             0.00     0.00    0.00    0.60     0.00    30.60    51.00     0.02   32.67  32.67   1.96
rbd23             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00
rbd24             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00   0.00 100.00

And from 'ceph -w' around that same time:

2014-12-09 11:43:07.446952 osd.14 [WRN] 22 slow requests, 4 included below; oldest blocked for > 43.561110 secs
2014-12-09 11:43:07.446958 osd.14 [WRN] slow request 30.322788 seconds old, received at 2014-12-09 11:42:37.124101: osd_op(client.8232.1:18380981 rb.0.195f.238e1f29.0d62 [write 0~12288] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:07.446966 osd.14 [WRN] slow request 30.321805 seconds old, received at 2014-12-09 11:42:37.125084: osd_op(client.8232.1:18380985 rb.0.195f.238e1f29.0d62 [write 12288~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:07.446972 osd.14 [WRN] slow request 30.321003 seconds old, received at 2014-12-09 11:42:37.125886: osd_op(client.8232.1:18380986 rb.0.195f.238e1f29.0d62 [write 520192~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:07.446977 osd.14 [WRN] slow request 30.320158 seconds old, received at 2014-12-09 11:42:37.126731: osd_op(client.8232.1:18380987 rb.0.195f.238e1f29.0d62 [write 1028096~507904] 3.c7335bdd ondisk+write e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:08.447263 osd.14 [WRN] 26 slow requests, 4 included below; oldest blocked for > 44.561424 secs
2014-12-09 11:43:08.447271 osd.14 [WRN] slow request 30.995124 seconds old, received at 2014-12-09 11:42:37.452079: osd_op(client.8232.1:1

Re: [ceph-users] normalizing radosgw

2014-12-09 Thread Abhishek L

Sage Weil writes:
[..]
> Thoughts?  Suggestions?
>
[..]
Suggestion:
radosgw should handle injectargs like other ceph clients do?

This is not a major annoyance, but it would be nice to have.


-- 
Abhishek


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2014-12-09 Thread Wido den Hollander
On 12/09/2014 12:12 PM, Luis Periquito wrote:
> Hi Wido,
> thanks for sharing.
> 
> fortunately I'm still running precise but planning on moving to trusty.
> 
> From what I'm aware it's not a good idea to be running discard on the FS,
> as it does have an impact on the delete operation, which some may even
> consider an unnecessary amount of work for the SSD.
> 

The 'discard' mount option is a real performance killer. You shouldn't
use that.

> OTOH we should be running TRIM to improve write performance (and the only
> reason we are running SSDs is for performance). Running it weekly seems to
> be killing it also.
> 
> So what do you think will be the best way to do this?
> 

I think that fstrim could still run if the proper ionice is used. I
haven't tested that yet, but next Sunday I'll know.

We modified the CRONs there and somebody will be on it to monitor how it
works out.

ionice -c Idle fstrim 
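
A sketch of the change we made (the cron path is the Ubuntu 14.04 default;
verify it on your systems):

# /etc/cron.weekly/fstrim -- wrap the existing call in the idle I/O class
ionice -c3 fstrim-all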

> And what about the journal? I'm using a raw partition for it, on a SSD.
> Will ceph do a proper trimming of it?
> 

No, Ceph will not. The best thing there is to partition just the
beginning of the brand-new SSD and leave 80%~90% unused. The Wear
Leveling algorithm inside the SSD will do the rest.
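
For example (a sketch; the device name and sizes are placeholders):

parted -s /dev/sdb mklabel gpt mkpart journal 0% 15%
# leave the remaining ~85% of the SSD unpartitioned for wear leveling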

Wido

> On Tue, Dec 9, 2014 at 9:21 AM, Wido den Hollander  wrote:
> 
>> Hi,
>>
>> Last sunday I got a call early in the morning that a Ceph cluster was
>> having some issues. Slow requests and OSDs marking each other down.
>>
>> Since this is a 100% SSD cluster I was a bit confused and started
>> investigating.
>>
>> It took me about 15 minutes to see that fstrim was running and was
>> utilizing the SSDs 100%.
>>
>> On Ubuntu 14.04 there is a weekly CRON which executes fstrim-all. It
>> detects all mountpoints which can be trimmed and starts to trim those.
>>
>> On the Intel SSDs used here it caused them to become 100% busy for a
>> couple of minutes. That was enough for them to no longer respond on
>> heartbeats, thus timing out and being marked down.
>>
>> Luckily we had the "out interval" set to 1800 seconds on that cluster,
>> so no OSD was marked as "out".
>>
>> fstrim-all does not execute fstrim with a ionice priority. From what I
>> understand, but haven't tested yet, is that running fstrim with ionice
>> -c Idle should solve this.
>>
>> It's weird that this issue didn't come up earlier on that cluster, but
>> after killing fstrim all problems we resolved and the cluster ran
>> happily again.
>>
>> So watch out for fstrim on early Sunday mornings on Ubuntu!
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Query about osd pool default flags & hashpspool

2014-12-09 Thread Abhishek L
Hi

I was going through various conf options to customize a ceph cluster and
came across `osd pool default flags` in pool-pg config ref[1]. Though
the value specifies an integer, though I couldn't find a mention of
possible values this can take in the docs. Looking a bit deeper onto
ceph sources [2] a bunch of options are seen at osd_types.h which
resemble

FLAG_HASHPSPOOL = 1<<0, // hash pg seed and pool together (instead of adding)
FLAG_FULL = 1<<1, // pool is full
FLAG_DEBUG_FAKE_EC_POOL = 1<<2, // require ReplicatedPG to act like an EC pg
FLAG_INCOMPLETE_CLONES = 1<<3, // may have incomplete clones (bc we are/were 
an overlay)

Are these the configurable options for the osd pool flags? Also in
particular what does the `hashpspool' option do? 


[1] http://ceph.com/docs/next/rados/configuration/pool-pg-config-ref/
[2] https://github.com/ceph/ceph/blob/giant/src/osd/osd_types.h#L815-820


-- 
Abhishek


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+degraded on an empty new cluster

2014-12-09 Thread Gregory Farnum
It looks like your OSDs all have weight zero for some reason. I'd fix that.
:)
-Greg
On Tue, Dec 9, 2014 at 6:24 AM Giuseppe Civitella <
giuseppe.civite...@gmail.com> wrote:

> Hi,
>
> thanks for the quick answer.
> I did try the force_create_pg on a pg but it is stuck on "creating":
> root@ceph-mon1:/home/ceph# ceph pg dump |grep creating
> dumped all in format plain
> 2.2f0   0   0   0   0   0   0   creating
>  2014-12-09 13:11:37.384808  0'0 0:0 []  -1  []
>  -1  0'0 0.000'0  0.00
>
> root@ceph-mon1:/home/ceph# ceph pg 2.2f query
> { "state": "active+degraded",
>   "epoch": 105,
>   "up": [
> 0],
>   "acting": [
> 0],
>   "actingbackfill": [
> "0"],
>   "info": { "pgid": "2.2f",
>   "last_update": "0'0",
>   "last_complete": "0'0",
>   "log_tail": "0'0",
>   "last_user_version": 0,
>   "last_backfill": "MAX",
>   "purged_snaps": "[]",
>   "last_scrub": "0'0",
>   "last_scrub_stamp": "2014-12-06 14:15:11.499769",
>   "last_deep_scrub": "0'0",
>   "last_deep_scrub_stamp": "2014-12-06 14:15:11.499769",
>   "last_clean_scrub_stamp": "0.00",
>   "log_size": 0,
>   "ondisk_log_size": 0,
>   "stats_invalid": "0",
>   "stat_sum": { "num_bytes": 0,
>   "num_objects": 0,
>   "num_object_clones": 0,
>   "num_object_copies": 0,
>   "num_objects_missing_on_primary": 0,
>   "num_objects_degraded": 0,
>   "num_objects_unfound": 0,
>   "num_objects_dirty": 0,
>   "num_whiteouts": 0,
>   "num_read": 0,
>   "num_read_kb": 0,
>   "num_write": 0,
>   "num_write_kb": 0,
>   "num_scrub_errors": 0,
>   "num_shallow_scrub_errors": 0,
>   "num_deep_scrub_errors": 0,
>   "num_objects_recovered": 0,
>   "num_bytes_recovered": 0,
>   "num_keys_recovered": 0,
>   "num_objects_omap": 0,
>   "num_objects_hit_set_archive": 0},
>   "stat_cat_sum": {},
>   "up": [
> 0],
>   "acting": [
> 0],
>   "up_primary": 0,
>   "acting_primary": 0},
>   "empty": 1,
>   "dne": 0,
>   "incomplete": 0,
>   "last_epoch_started": 104,
>   "hit_set_history": { "current_last_update": "0'0",
>   "current_last_stamp": "0.00",
>   "current_info": { "begin": "0.00",
>   "end": "0.00",
>   "version": "0'0"},
>   "history": []}},
>   "peer_info": [],
>   "recovery_state": [
> { "name": "Started\/Primary\/Active",
>   "enter_time": "2014-12-09 12:12:52.760384",
>   "might_have_unfound": [],
>   "recovery_progress": { "backfill_targets": [],
>   "waiting_on_backfill": [],
>   "last_backfill_started": "0\/\/0\/\/-1",
>   "backfill_info": { "begin": "0\/\/0\/\/-1",
>   "end": "0\/\/0\/\/-1",
>   "objects": []},
>   "peer_backfill_info": [],
>   "backfills_in_flight": [],
>   "recovering": [],
>   "pg_backend": { "pull_from_peer": [],
>   "pushing": []}},
>   "scrub": { "scrubber.epoch_start": "0",
>   "scrubber.active": 0,
>   "scrubber.block_writes": 0,
>   "scrubber.finalizing": 0,
>   "scrubber.waiting_on": 0,
>   "scrubber.waiting_on_whom": []}},
> { "name": "Started",
>   "enter_time": "2014-12-09 12:12:51.845686"}],
>   "agent_state": {}}root@ceph-mon1:/home/ceph#
>
>
>
> 2014-12-09 13:01 GMT+01:00 Irek Fasikhov :
>
>> Hi.
>>
>> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
>>
>> ceph pg force_create_pg 
>>
>>
>> 2014-12-09 14:50 GMT+03:00 Giuseppe Civitella <
>> giuseppe.civite...@gmail.com>:
>>
>>> Hi all,
>>>
>>> last week I installed a new ceph cluster on 3 VMs running Ubuntu 14.04
>>> with the default kernel.
>>> There is a ceph monitor and two OSD hosts. Here are some details:
>>> ceph -s
>>> cluster c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
>>>  health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
>>>  monmap e1: 1 mons at {ceph-mon1=10.1.1.83:6789/0}, election epoch
>>> 1, quorum 0 ceph-mon1
>>>  osdmap e83: 6 osds: 6 up, 6 in
>>>   pgmap v231: 192 pgs, 3 pools, 0 bytes data, 0 objects
>>> 207 MB used, 30446 MB / 30653 MB avail
>>>  192 active+degraded
>>>
>>> root@ceph-mon1:/home/ceph# ceph osd dump
>>> epoch 99
>>> fsid c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
>>> created 2014-12-06 13:15:06.418843
>>> modified 2014-12-09 11:38:04.353279
>>> flags
>>> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 last_change 1

Re: [ceph-users] Monitors repeatedly calling for new elections

2014-12-09 Thread Sanders, Bill
Thanks for the response.  I did forget to mention that NTP is set up and does
appear to be running (just double-checked).

Is this good enough resolution?

$ for node in $nodes; do ssh tvsa${node} sudo date --rfc-3339=ns; done
2014-12-09 09:15:39.404292557-08:00
2014-12-09 09:15:39.521762397-08:00
2014-12-09 09:15:39.641491188-08:00
2014-12-09 09:15:39.761937524-08:00
2014-12-09 09:15:39.911416676-08:00
2014-12-09 09:15:40.029777457-08:00

Bill

From: Rodrigo Severo [rodr...@fabricadeideias.com]
Sent: Tuesday, December 09, 2014 4:02 AM
To: Sanders, Bill
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors repeatedly calling for new elections

On Mon, Dec 8, 2014 at 5:23 PM, Sanders, Bill  wrote:

> Under activity, we'll get monitors going into election cycles repeatedly,
> OSD's being "wrongly marked down", as well as slow requests "osd.11
> 39.7.48.6:6833/21938 failed (3 reports from 1 peers after 52.914693 >= grace
> 20.00)" .  During this, ceph -w shows the cluster essentially idle.
> None of the network, disks, or cpu's ever appear to max out.  It also
> doesn't appear to be the same OSD's, MON's, or node causing the problem.
> Top reports all 128 GB RAM (negligible swap) in use on the storage nodes.
> Only Ceph is running on the storage nodes.


I'm really new to Ceph but my first bet is that your computers aren't
clock synchronized. Are all of them with working ntpds?


Regards,

Rodrigo Severo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problems running ceph commands.on custom linux system

2014-12-09 Thread Patrick Darley

Hi,

I'm having a problem running commands such as `ceph --help` and `ceph -s`.
These commands output the expected information, but then they hang indefinitely.


Using strace I have found that the system seems to get stuck with futex
operations running and timing out repeatedly. However, I'm unsure why these
futex operations appear, as I haven't had this problem before.

The output of strace is in this pastebin: http://pastebin.com/AZLA2w39
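
If more verbose client-side logging would help, I can also rerun the command
with debugging turned up, e.g. (a sketch, assuming CEPH_ARGS is honoured by
the ceph tool here):

CEPH_ARGS="--debug-ms 1 --debug-monc 20" ceph -s 2> ceph-client-debug.log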

Has anyone found this problem at any point? Or does anyone have any 
ideas about what might be going on?



I'm using ceph v0.88 and built it from source on a custom linux system.


Thanks,

Patrick
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] active+degraded on an empty new cluster

2014-12-09 Thread Giuseppe Civitella
Hi,

thanks for the quick answer.
I did try the force_create_pg on a pg but it is stuck on "creating":
root@ceph-mon1:/home/ceph# ceph pg dump |grep creating
dumped all in format plain
2.2f0   0   0   0   0   0   0   creating
 2014-12-09 13:11:37.384808  0'0 0:0 []  -1  []
 -1  0'0 0.000'0  0.00

root@ceph-mon1:/home/ceph# ceph pg 2.2f query
{ "state": "active+degraded",
  "epoch": 105,
  "up": [
0],
  "acting": [
0],
  "actingbackfill": [
"0"],
  "info": { "pgid": "2.2f",
  "last_update": "0'0",
  "last_complete": "0'0",
  "log_tail": "0'0",
  "last_user_version": 0,
  "last_backfill": "MAX",
  "purged_snaps": "[]",
  "last_scrub": "0'0",
  "last_scrub_stamp": "2014-12-06 14:15:11.499769",
  "last_deep_scrub": "0'0",
  "last_deep_scrub_stamp": "2014-12-06 14:15:11.499769",
  "last_clean_scrub_stamp": "0.00",
  "log_size": 0,
  "ondisk_log_size": 0,
  "stats_invalid": "0",
  "stat_sum": { "num_bytes": 0,
  "num_objects": 0,
  "num_object_clones": 0,
  "num_object_copies": 0,
  "num_objects_missing_on_primary": 0,
  "num_objects_degraded": 0,
  "num_objects_unfound": 0,
  "num_objects_dirty": 0,
  "num_whiteouts": 0,
  "num_read": 0,
  "num_read_kb": 0,
  "num_write": 0,
  "num_write_kb": 0,
  "num_scrub_errors": 0,
  "num_shallow_scrub_errors": 0,
  "num_deep_scrub_errors": 0,
  "num_objects_recovered": 0,
  "num_bytes_recovered": 0,
  "num_keys_recovered": 0,
  "num_objects_omap": 0,
  "num_objects_hit_set_archive": 0},
  "stat_cat_sum": {},
  "up": [
0],
  "acting": [
0],
  "up_primary": 0,
  "acting_primary": 0},
  "empty": 1,
  "dne": 0,
  "incomplete": 0,
  "last_epoch_started": 104,
  "hit_set_history": { "current_last_update": "0'0",
  "current_last_stamp": "0.00",
  "current_info": { "begin": "0.00",
  "end": "0.00",
  "version": "0'0"},
  "history": []}},
  "peer_info": [],
  "recovery_state": [
{ "name": "Started\/Primary\/Active",
  "enter_time": "2014-12-09 12:12:52.760384",
  "might_have_unfound": [],
  "recovery_progress": { "backfill_targets": [],
  "waiting_on_backfill": [],
  "last_backfill_started": "0\/\/0\/\/-1",
  "backfill_info": { "begin": "0\/\/0\/\/-1",
  "end": "0\/\/0\/\/-1",
  "objects": []},
  "peer_backfill_info": [],
  "backfills_in_flight": [],
  "recovering": [],
  "pg_backend": { "pull_from_peer": [],
  "pushing": []}},
  "scrub": { "scrubber.epoch_start": "0",
  "scrubber.active": 0,
  "scrubber.block_writes": 0,
  "scrubber.finalizing": 0,
  "scrubber.waiting_on": 0,
  "scrubber.waiting_on_whom": []}},
{ "name": "Started",
  "enter_time": "2014-12-09 12:12:51.845686"}],
  "agent_state": {}}root@ceph-mon1:/home/ceph#



2014-12-09 13:01 GMT+01:00 Irek Fasikhov :

> Hi.
>
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
>
> ceph pg force_create_pg <pgid>
>
>
> 2014-12-09 14:50 GMT+03:00 Giuseppe Civitella <
> giuseppe.civite...@gmail.com>:
>
>> Hi all,
>>
>> last week I installed a new ceph cluster on 3 VMs running Ubuntu 14.04
>> with default kernel.
>> There is a ceph monitor and two osd hosts. Here are some details:
>> ceph -s
>> cluster c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
>>  health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
>>  monmap e1: 1 mons at {ceph-mon1=10.1.1.83:6789/0}, election epoch
>> 1, quorum 0 ceph-mon1
>>  osdmap e83: 6 osds: 6 up, 6 in
>>   pgmap v231: 192 pgs, 3 pools, 0 bytes data, 0 objects
>> 207 MB used, 30446 MB / 30653 MB avail
>>  192 active+degraded
>>
>> root@ceph-mon1:/home/ceph# ceph osd dump
>> epoch 99
>> fsid c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
>> created 2014-12-06 13:15:06.418843
>> modified 2014-12-09 11:38:04.353279
>> flags
>> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 18 flags hashpspool
>> crash_replay_interval 45 stripe_width 0
>> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 64 pgp_num 64 last_change 19 flags hashpspool
>> stripe_width 0
>> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 20 flags hashpspool stripe_width 0
>> max_osd 6
>> osd.0 up   in 

Re: [ceph-users] Monitors repeatedly calling for new elections

2014-12-09 Thread Rodrigo Severo
On Mon, Dec 8, 2014 at 5:23 PM, Sanders, Bill  wrote:

> Under activity, we'll get monitors going into election cycles repeatedly,
> OSD's being "wrongly marked down", as well as slow requests "osd.11
> 39.7.48.6:6833/21938 failed (3 reports from 1 peers after 52.914693 >= grace
> 20.00)" .  During this, ceph -w shows the cluster essentially idle.
> None of the network, disks, or cpu's ever appear to max out.  It also
> doesn't appear to be the same OSD's, MON's, or node causing the problem.
> Top reports all 128 GB RAM (negligible swap) in use on the storage nodes.
> Only Ceph is running on the storage nodes.


I'm really new to Ceph but my first bet is that your computers aren't
clock synchronized. Are all of them with working ntpds?


Regards,

Rodrigo Severo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] active+degraded on an empty new cluster

2014-12-09 Thread Giuseppe Civitella
Hi all,

last week I installed a new ceph cluster on 3 VMs running Ubuntu 14.04 with
default kernel.
There is a ceph monitor and two osd hosts. Here are some details:
ceph -s
cluster c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
 health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
 monmap e1: 1 mons at {ceph-mon1=10.1.1.83:6789/0}, election epoch 1,
quorum 0 ceph-mon1
 osdmap e83: 6 osds: 6 up, 6 in
  pgmap v231: 192 pgs, 3 pools, 0 bytes data, 0 objects
207 MB used, 30446 MB / 30653 MB avail
 192 active+degraded

root@ceph-mon1:/home/ceph# ceph osd dump
epoch 99
fsid c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
created 2014-12-06 13:15:06.418843
modified 2014-12-09 11:38:04.353279
flags
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 18 flags hashpspool
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 19 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 20 flags hashpspool stripe_width 0
max_osd 6
osd.0 up   in  weight 1 up_from 90 up_thru 90 down_at 89
last_clean_interval [58,89) 10.1.1.84:6805/995 10.1.1.84:6806/4000995
10.1.1.84:6807/4000995 10.1.1.84:6808/4000995 exists,up
e3895075-614d-48e2-b956-96e13dbd87fe
osd.1 up   in  weight 1 up_from 88 up_thru 0 down_at 87 last_clean_interval
[8,87) 10.1.1.85:6800/23146 10.1.1.85:6815/7023146 10.1.1.85:6816/7023146
10.1.1.85:6817/7023146 exists,up 144bc6ee-2e3d-4118-a460-8cc2bb3ec3e8
osd.2 up   in  weight 1 up_from 61 up_thru 0 down_at 60 last_clean_interval
[11,60) 10.1.1.85:6805/26784 10.1.1.85:6802/5026784 10.1.1.85:6811/5026784
10.1.1.85:6812/5026784 exists,up 8d5c7108-ef11-4947-b28c-8e20371d6d78
osd.3 up   in  weight 1 up_from 95 up_thru 0 down_at 94 last_clean_interval
[57,94) 10.1.1.84:6800/810 10.1.1.84:6810/3000810 10.1.1.84:6811/3000810
10.1.1.84:6812/3000810 exists,up bd762b2d-f94c-4879-8865-cecd63895557
osd.4 up   in  weight 1 up_from 97 up_thru 0 down_at 96 last_clean_interval
[74,96) 10.1.1.84:6801/9304 10.1.1.84:6802/2009304 10.1.1.84:6803/2009304
10.1.1.84:6813/2009304 exists,up 7d28a54b-b474-4369-b958-9e6bf6c856aa
osd.5 up   in  weight 1 up_from 99 up_thru 0 down_at 98 last_clean_interval
[79,98) 10.1.1.85:6801/19513 10.1.1.85:6808/2019513 10.1.1.85:6810/2019513
10.1.1.85:6813/2019513 exists,up f4d76875-0e40-487c-a26d-320f8b8d60c5

root@ceph-mon1:/home/ceph# ceph osd tree
# id    weight  type name       up/down reweight
-1  0   root default
-2  0   host ceph-osd1
0   0   osd.0   up  1
3   0   osd.3   up  1
4   0   osd.4   up  1
-3  0   host ceph-osd2
1   0   osd.1   up  1
2   0   osd.2   up  1
5   0   osd.5   up  1

Current HEALTH_WARN state says "192 active+degraded" since I rebooted an
osd host. Previously it was "incomplete". It never reached a HEALTH_OK
state.
Any hint about what to do next to have a healthy cluster?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2014-12-09 Thread Sebastien Han
Good to know. Thanks for sharing!

> On 09 Dec 2014, at 10:21, Wido den Hollander  wrote:
> 
> Hi,
> 
> Last sunday I got a call early in the morning that a Ceph cluster was
> having some issues. Slow requests and OSDs marking each other down.
> 
> Since this is a 100% SSD cluster I was a bit confused and started
> investigating.
> 
> It took me about 15 minutes to see that fstrim was running and was
> utilizing the SSDs 100%.
> 
> On Ubuntu 14.04 there is a weekly CRON which executes fstrim-all. It
> detects all mountpoints which can be trimmed and starts to trim those.
> 
> On the Intel SSDs used here it caused them to become 100% busy for a
> couple of minutes. That was enough for them to no longer respond on
> heartbeats, thus timing out and being marked down.
> 
> Luckily we had the "out interval" set to 1800 seconds on that cluster,
> so no OSD was marked as "out".
> 
> fstrim-all does not execute fstrim with an ionice priority. From what I
> understand, but haven't tested yet, running fstrim with ionice -c Idle
> should solve this.
> 
> It's weird that this issue didn't come up earlier on that cluster, but
> after killing fstrim all problems were resolved and the cluster ran
> happily again.
> 
> So watch out for fstrim on early Sunday mornings on Ubuntu!
> 
> --
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Cheers.

Sébastien Han
Cloud Architect

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address : 11 bis, rue Roquépine - 75008 Paris
Web : www.enovance.com - Twitter : @enovance



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Watch for fstrim running on your Ubuntu systems

2014-12-09 Thread Wido den Hollander
Hi,

Last sunday I got a call early in the morning that a Ceph cluster was
having some issues. Slow requests and OSDs marking each other down.

Since this is a 100% SSD cluster I was a bit confused and started
investigating.

It took me about 15 minutes to see that fstrim was running and was
utilizing the SSDs 100%.

On Ubuntu 14.04 there is a weekly CRON which executes fstrim-all. It
detects all mountpoints which can be trimmed and starts to trim those.

On the Intel SSDs used here it caused them to become 100% busy for a
couple of minutes. That was enough for them to no longer respond on
heartbeats, thus timing out and being marked down.

Luckily we had the "out interval" set to 1800 seconds on that cluster,
so no OSD was marked as "out".
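
For reference, I assume the "out interval" here is mon_osd_down_out_interval,
i.e. something along these lines in ceph.conf:

[mon]
    mon osd down out interval = 1800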

fstrim-all does not execute fstrim with an ionice priority. From what I
understand, but haven't tested yet, running fstrim with ionice -c Idle
should solve this.
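
A minimal sketch of that idea (again untested, and assuming the weekly job is
/etc/cron.weekly/fstrim calling fstrim-all):

# wrap the weekly trim in the idle I/O scheduling class
ionice -c 3 fstrim-all

# or trim a single filesystem by hand at idle priority
ionice -c 3 fstrim -v /var/lib/ceph/osd/ceph-0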

It's weird that this issue didn't come up earlier on that cluster, but
after killing fstrim all problems were resolved and the cluster ran
happily again.

So watch out for fstrim on early Sunday mornings on Ubuntu!

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unexplainable slow request

2014-12-09 Thread Christian Balzer

Hello,

On Mon, 8 Dec 2014 19:51:00 -0800 Gregory Farnum wrote:

> On Mon, Dec 8, 2014 at 6:39 PM, Christian Balzer  wrote:
> >
> > Hello,
> >
> > Debian Jessie cluster, thus kernel 3.16, ceph 0.80.7.
> > 3 storage nodes with 8 OSDs (journals on 4 SSDs) each, 3 mons.
> > 2 compute nodes, everything connected via Infiniband.
> >
> > This is pre-production, currently there are only 3 VMs and 2 of them
> > were idle at the time. The non-idle one was having 600GB of maildirs
> > copied onto it, which stresses things but not Ceph as those millions
> > of small files coalesce nicely and result in rather few Ceph ops.
> >
> > A couple of hours into that copy marathon (the source FS and machine
> > are slow and rsync isn't particular speedy with this kind of operation
> > either) this happened:
> > ---
> > 2014-12-06 19:20:57.023974 osd.23 10.0.8.23:6815/3552 77 : [WRN] slow
> > request 30 .673939 seconds old, received at 2014-12-06
> > 19:20:26.346746: osd_op(client.33776 .0:743596
> > rb.0.819b.238e1f29.0003f52f [set-alloc-hint object_size 4194304 wr
> > ite_size 4194304,write 1748992~4096] 3.efa97e35 ack+ondisk+write e380)
> > v4 curren tly waiting for subops from 4,8 2014-12-06 19:20:57.023991
> > osd.23 10.0.8.23:6815/3552 78 : [WRN] slow request 30 .673886 seconds
> > old, received at 2014-12-06 19:20:26.346799:
> > osd_op(client.33776 .0:743597 rb.0.819b.238e1f29.0003f52f
> > [set-alloc-hint object_size 4194304 wr ite_size 4194304,write
> > 1945600~4096] 3.efa97e35 ack+ondisk+write e380) v4 curren tly waiting
> > for subops from 4,8 2014-12-06 19:20:57.323976 osd.1
> > 10.0.8.21:6815/4868 123 : [WRN] slow request 30 .910821 seconds old,
> > received at 2014-12-06 19:20:26.413051: osd_op(client.33776 .0:743604
> > rb.0.819b.238e1f29.0003e628 [set-alloc-hint object_size 4194304 wr
> > ite_size 4194304,write 1794048~1835008] 3.5e76b8ba ack+ondisk+write
> > e380) v4 cur rently waiting for subops from 8,17 ---
> >
> > There were a few more later, but they all involved OSD 8 as common
> > factor.
> >
> > Alas there's nothing in the osd-8.log indicating why:
> > ---
> > 2014-12-06 19:13:13.933636 7fce85552700  0 -- 10.0.8.22:6835/5389 >>
> > 10.0.8.6:0/ 716350435 pipe(0x7fcec3c25900 sd=23 :6835 s=0 pgs=0 cs=0
> > l=0 c=0x7fcebfad03c0).a ccept peer addr is really 10.0.8.6:0/716350435
> > (socket is 10.0.8.6:50592/0) 2014-12-06 19:20:56.595773 7fceac82f700
> > 0 log [WRN] : 3 slow requests, 3 included below; oldest blocked for >
> > 30.241397 secs 2014-12-06 19:20:56.595796 7fceac82f700  0 log [WRN] :
> > slow request 30.241397 seconds old, received at 2014-12-06
> > 19:20:26.354247: osd_sub_op(client.33776.0:743596 3.235
> > efa97e35/rb.0.819b.238e1f29.0003f52f/head//3 [] v 380'3783
> > snapset=0=[]:[] snapc=0=[]) v11 currently started 2014-12-06
> > 19:20:56.595825 7fceac82f700  0 log [WRN] : slow request 30.240286
> > seconds old, received at 2014-12-06 19:20:26.355358:
> > osd_sub_op(client.33776.0:743597 3.235
> > efa97e35/rb.0.819b.238e1f29.0003f52f/head//3 [] v 380'3784
> > snapset=0=[]:[] snapc=0=[]) v11 currently started 2014-12-06
> > 19:20:56.595837 7fceac82f700  0 log [WRN] : slow request 30.177186
> > seconds old, received at 2014-12-06 19:20:26.418458:
> > osd_sub_op(client.33776.0:743604 3.ba
> > 5e76b8ba/rb.0.819b.238e1f29.0003e628/head//3 [] v 380'6439
> > snapset=0=[]:[] snapc=0=[]) v11 currently started 
> 
> That these are started and nothing else suggests that they're probably
> waiting for one of the throttles to let them in, rather than
> themselves being particularly slow.
>

If this was indeed caused by one of the (rather numerous) throttles,
wouldn't it be a good idea to log that fact? 
A slow disk is one thing, Ceph permanently seizing up because something
exceeded a threshold sounds noteworthy to me.
 
> >
> > The HDDs and SSDs are new, there's nothing in the pertinent logs or
> > smart that indicates any problem with that HDD or its journal SSD, nor
> > the system in general.
> > This problem persisted (and the VM remained stuck) until OSD 8 was
> > restarted the next day when I discovered this.
> >
> > I suppose this is another "this can't/shouldn't happen" case, but I'd
> > be delighted about any suggestions as to what happened here, potential
> > prevention measures and any insights on how to maybe coax more
> > information out of Ceph if this happens again.
> 
> Nah, there are a million reasons stuff can be slow. 
This wasn't just slow. Those requests never completed even after half a
day had passed with the system and disks being basically idle.

> It might just be a
> transient overload of the disk compared to the others. 

Transient would be fine (though highly unlikely in this scenario), however
it never recovered, see above.


> If you see this
> again while it's happening I'd check the perfcounters; if you're
> keeping historical checks of them go look at the blocked-up times and
> see if any of them are at or near their maximum values.
> -Greg
> 
Ah

Re: [ceph-users] experimental features

2014-12-09 Thread Christian Balzer
On Mon, 08 Dec 2014 08:33:25 -0600 Mark Nelson wrote:

> I've been thinking for a while that we need another more general command 
> than Ceph health to more generally inform you about your cluster.  IE I 
> personally don't like having min/max PG warnings in Ceph health (they 
> can be independently controlled by ceph.conf options but that kind of 
> approach won't scale). I'd like another command that I can run that 
> tells me about this kind of thing.  Same thing with experimental 
> features.  I don't want ceph health warning me if they've been enabled, 
> but I do want to know if they've ever been enabled, when, and whether 
> they are still in effect.
> 

Very much agreed.

Expanding on this, setting a cluster to no-scrub will result in a warning
(very arguably). And while a slow request after 30 seconds is WRN worthy,
after something like a minute it ought to be ERR level, as this is likely
to have massive impact on clients.
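
(By "no-scrub" I mean the cluster-wide flags set along these lines, which then
show up in ceph health:)

ceph osd set noscrub
ceph osd set nodeep-scrub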

Christian

> Mark
> 
> On 12/08/2014 06:57 AM, Fred Yang wrote:
> > You will have to consider that in the real world whoever built the cluster
> > might not document the dangerous option to make support staff or
> > successors aware. Thus any experimental feature considered not safe for
> > production should be included in a warning message in 'ceph health',
> > and logs, either log it periodically or log the warning msg upon
> > restart. Feature-wise, 'ceph health detail' should give you a report
> > over all important features/options of the cluster as well.
> >
> > -Fred
> >
> > On Sun, Dec 7, 2014, 11:15 PM Justin Erenkrantz wrote:
> >
> > On Fri, Dec 5, 2014 at 12:46 PM, Mark Nelson
> > <mark.nel...@inktank.com> wrote:
> >  > I'm in favor of the "allow experimental features" but instead
> > call it:
> >  >
> >  > "ALLOW UNRECOVERABLE DATA CORRUPTING FEATURES" which makes
> >  > things
> > a little
> >  > more explicit. With great power comes great responsibility.
> >
> > +1.
> >
> > For Subversion, we utilize SVN_I_LOVE_CORRUPTED_XXX for a few
> > options that can cause data corruption.  -- justin
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > ceph-devel" in the body of a message to majord...@vger.kernel.org
> > 
> > More majordomo info at http://vger.kernel.org/__majordomo-info.html
> > 
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unexplainable slow request

2014-12-09 Thread Christian Balzer
On Mon, 8 Dec 2014 20:36:17 -0800 Gregory Farnum wrote:

> They never fixed themselves? 
As I wrote, it took a restart of OSD 8 to resolve this on the next day.

> Did the reported times ever increase?
Indeed, the last before the reboot was:
---
2014-12-07 13:12:42.933396 7fceac82f700  0 log [WRN] : 14 slow requests, 5 
included below; oldest blocked for > 64336.578995 secs
---

All IOPS hitting that osd.8 (eventually the other VM did as well during a
log write I suppose) were blocked.

> If not I think that's just a reporting bug which is fixed in an
> unreleased branch, but I'd have to check the tracker to be sure.
> 
> On Mon, Dec 8, 2014 at 8:23 PM, Christian Balzer  wrote:
> >
> > Hello,
> >
> > On Mon, 8 Dec 2014 19:51:00 -0800 Gregory Farnum wrote:
> >
> >> On Mon, Dec 8, 2014 at 6:39 PM, Christian Balzer 
> >> wrote:
> >> >
> >> > Hello,
> >> >
> >> > Debian Jessie cluster, thus kernel 3.16, ceph 0.80.7.
> >> > 3 storage nodes with 8 OSDs (journals on 4 SSDs) each, 3 mons.
> >> > 2 compute nodes, everything connected via Infiniband.
> >> >
> >> > This is pre-production, currently there are only 3 VMs and 2 of them
> >> > were idle at the time. The non-idle one was having 600GB of maildirs
> >> > copied onto it, which stresses things but not Ceph as those millions
> >> > of small files coalesce nicely and result in rather few Ceph ops.
> >> >
> >> > A couple of hours into that copy marathon (the source FS and machine
> >> > are slow and rsync isn't particular speedy with this kind of
> >> > operation either) this happened:
> >> > ---
> >> > 2014-12-06 19:20:57.023974 osd.23 10.0.8.23:6815/3552 77 : [WRN]
> >> > slow request 30 .673939 seconds old, received at 2014-12-06
> >> > 19:20:26.346746: osd_op(client.33776 .0:743596
> >> > rb.0.819b.238e1f29.0003f52f [set-alloc-hint object_size 4194304
> >> > wr ite_size 4194304,write 1748992~4096] 3.efa97e35 ack+ondisk+write
> >> > e380) v4 curren tly waiting for subops from 4,8 2014-12-06
> >> > 19:20:57.023991 osd.23 10.0.8.23:6815/3552 78 : [WRN] slow request
> >> > 30 .673886 seconds old, received at 2014-12-06 19:20:26.346799:
> >> > osd_op(client.33776 .0:743597 rb.0.819b.238e1f29.0003f52f
> >> > [set-alloc-hint object_size 4194304 wr ite_size 4194304,write
> >> > 1945600~4096] 3.efa97e35 ack+ondisk+write e380) v4 curren tly
> >> > waiting for subops from 4,8 2014-12-06 19:20:57.323976 osd.1
> >> > 10.0.8.21:6815/4868 123 : [WRN] slow request 30 .910821 seconds old,
> >> > received at 2014-12-06 19:20:26.413051:
> >> > osd_op(client.33776 .0:743604 rb.0.819b.238e1f29.0003e628
> >> > [set-alloc-hint object_size 4194304 wr ite_size 4194304,write
> >> > 1794048~1835008] 3.5e76b8ba ack+ondisk+write e380) v4 cur rently
> >> > waiting for subops from 8,17 ---
> >> >
> >> > There were a few more later, but they all involved OSD 8 as common
> >> > factor.
> >> >
> >> > Alas there's nothing in the osd-8.log indicating why:
> >> > ---
> >> > 2014-12-06 19:13:13.933636 7fce85552700  0 -- 10.0.8.22:6835/5389 >>
> >> > 10.0.8.6:0/ 716350435 pipe(0x7fcec3c25900 sd=23 :6835 s=0 pgs=0 cs=0
> >> > l=0 c=0x7fcebfad03c0).a ccept peer addr is really
> >> > 10.0.8.6:0/716350435 (socket is 10.0.8.6:50592/0) 2014-12-06
> >> > 19:20:56.595773 7fceac82f700 0 log [WRN] : 3 slow requests, 3
> >> > included below; oldest blocked for > 30.241397 secs 2014-12-06
> >> > 19:20:56.595796 7fceac82f700  0 log [WRN] : slow request 30.241397
> >> > seconds old, received at 2014-12-06 19:20:26.354247:
> >> > osd_sub_op(client.33776.0:743596 3.235
> >> > efa97e35/rb.0.819b.238e1f29.0003f52f/head//3 [] v 380'3783
> >> > snapset=0=[]:[] snapc=0=[]) v11 currently started 2014-12-06
> >> > 19:20:56.595825 7fceac82f700  0 log [WRN] : slow request 30.240286
> >> > seconds old, received at 2014-12-06 19:20:26.355358:
> >> > osd_sub_op(client.33776.0:743597 3.235
> >> > efa97e35/rb.0.819b.238e1f29.0003f52f/head//3 [] v 380'3784
> >> > snapset=0=[]:[] snapc=0=[]) v11 currently started 2014-12-06
> >> > 19:20:56.595837 7fceac82f700  0 log [WRN] : slow request 30.177186
> >> > seconds old, received at 2014-12-06 19:20:26.418458:
> >> > osd_sub_op(client.33776.0:743604 3.ba
> >> > 5e76b8ba/rb.0.819b.238e1f29.0003e628/head//3 [] v 380'6439
> >> > snapset=0=[]:[] snapc=0=[]) v11 currently started 
> >>
> >> That these are started and nothing else suggests that they're probably
> >> waiting for one of the throttles to let them in, rather than
> >> themselves being particularly slow.
> >>
> >
> > If this was indeed caused by one of the (rather numerous) throttles,
> > wouldn't it be a good idea to log that fact?
> > A slow disk is one thing, Ceph permanently seizing up because something
> > exceeded a threshold sounds noteworthy to me.
> 
> If it permanently seized up then this is not what happened; 

So am I looking at an unknown bug then?

> if the
> reporting just didn't go away then I'm not sure it's appropriate to
> log every time a throttle gets hit (some of them are suppo

Re: [ceph-users] Unexplainable slow request

2014-12-09 Thread Gregory Farnum
On Mon, Dec 8, 2014 at 6:39 PM, Christian Balzer  wrote:
>
> Hello,
>
> Debian Jessie cluster, thus kernel 3.16, ceph 0.80.7.
> 3 storage nodes with 8 OSDs (journals on 4 SSDs) each, 3 mons.
> 2 compute nodes, everything connected via Infiniband.
>
> This is pre-production, currently there are only 3 VMs and 2 of them were
> idle at the time. The non-idle one was having 600GB of maildirs copied
> onto it, which stresses things but not Ceph as those millions of small
> files coalesce nicely and result in rather few Ceph ops.
>
> A couple of hours into that copy marathon (the source FS and machine are
> slow and rsync isn't particular speedy with this kind of operation either)
> this happened:
> ---
> 2014-12-06 19:20:57.023974 osd.23 10.0.8.23:6815/3552 77 : [WRN] slow request 30.673939 seconds old, received at 2014-12-06 19:20:26.346746: osd_op(client.33776.0:743596 rb.0.819b.238e1f29.0003f52f [set-alloc-hint object_size 4194304 write_size 4194304,write 1748992~4096] 3.efa97e35 ack+ondisk+write e380) v4 currently waiting for subops from 4,8
> 2014-12-06 19:20:57.023991 osd.23 10.0.8.23:6815/3552 78 : [WRN] slow request 30.673886 seconds old, received at 2014-12-06 19:20:26.346799: osd_op(client.33776.0:743597 rb.0.819b.238e1f29.0003f52f [set-alloc-hint object_size 4194304 write_size 4194304,write 1945600~4096] 3.efa97e35 ack+ondisk+write e380) v4 currently waiting for subops from 4,8
> 2014-12-06 19:20:57.323976 osd.1 10.0.8.21:6815/4868 123 : [WRN] slow request 30.910821 seconds old, received at 2014-12-06 19:20:26.413051: osd_op(client.33776.0:743604 rb.0.819b.238e1f29.0003e628 [set-alloc-hint object_size 4194304 write_size 4194304,write 1794048~1835008] 3.5e76b8ba ack+ondisk+write e380) v4 currently waiting for subops from 8,17
> ---
>
> There were a few more later, but they all involved OSD 8 as common factor.
>
> Alas there's nothing in the osd-8.log indicating why:
> ---
> 2014-12-06 19:13:13.933636 7fce85552700  0 -- 10.0.8.22:6835/5389 >> 10.0.8.6:0/716350435 pipe(0x7fcec3c25900 sd=23 :6835 s=0 pgs=0 cs=0 l=0 c=0x7fcebfad03c0).accept peer addr is really 10.0.8.6:0/716350435 (socket is 10.0.8.6:50592/0)
> 2014-12-06 19:20:56.595773 7fceac82f700  0 log [WRN] : 3 slow requests, 3 
> included below; oldest blocked for > 30.241397 secs
> 2014-12-06 19:20:56.595796 7fceac82f700  0 log [WRN] : slow request 30.241397 
> seconds old, received at 2014-12-06 19:20:26.354247: 
> osd_sub_op(client.33776.0:743596 3.235 
> efa97e35/rb.0.819b.238e1f29.0003f52f/head//3 [] v 380'3783 
> snapset=0=[]:[] snapc=0=[]) v11 currently started
> 2014-12-06 19:20:56.595825 7fceac82f700  0 log [WRN] : slow request 30.240286 
> seconds old, received at 2014-12-06 19:20:26.355358: 
> osd_sub_op(client.33776.0:743597 3.235 
> efa97e35/rb.0.819b.238e1f29.0003f52f/head//3 [] v 380'3784 
> snapset=0=[]:[] snapc=0=[]) v11 currently started
> 2014-12-06 19:20:56.595837 7fceac82f700  0 log [WRN] : slow request 30.177186 
> seconds old, received at 2014-12-06 19:20:26.418458: 
> osd_sub_op(client.33776.0:743604 3.ba 
> 5e76b8ba/rb.0.819b.238e1f29.0003e628/head//3 [] v 380'6439 
> snapset=0=[]:[] snapc=0=[]) v11 currently started
> 

That these are started and nothing else suggests that they're probably
waiting for one of the throttles to let them in, rather than
themselves being particularly slow.

>
> The HDDs and SSDs are new, there's nothing in the pertinent logs or smart
> that indicates any problem with that HDD or its journal SSD, nor the
> system in general.
> This problem persisted (and the VM remained stuck) until OSD 8 was
> restarted the next day when I discovered this.
>
> I suppose this is another "this can't/shouldn't happen" case, but I'd be
> delighted about any suggestions as to what happened here, potential
> prevention measures and any insights on how to maybe coax more information
> out of Ceph if this happens again.

Nah, there are a million reasons stuff can be slow. It might just be a
transient overload of the disk compared to the others. If you see this
again while it's happening I'd check the perfcounters; if you're
keeping historical checks of them go look at the blocked-up times and
see if any of them are at or near their maximum values.
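
For reference, a sketch of pulling those from the admin socket of the affected
OSD (assuming the default socket path), which also dumps the throttle-* values
and the in-flight/recent slow ops:

ceph --admin-daemon /var/run/ceph/ceph-osd.8.asok perf dump
ceph --admin-daemon /var/run/ceph/ceph-osd.8.asok dump_ops_in_flight
ceph --admin-daemon /var/run/ceph/ceph-osd.8.asok dump_historic_ops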
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com