Re: lltng enabled by default and qemu apparmor|selinux problems
If I can add my $0.02 - we were unable to use the libradosstriper library in RHEL6 because it uses the same initialisation tags as librados and lttng does not like that. We had no problems with the RHEL7 version of ceph because lttng is not enabled. Please do not re-enable lttng in RHEL7 and later branches…. Regards Paul On 11/10/2015 18:06, "ceph-devel-ow...@vger.kernel.org on behalf of Alexandre DERUMIER" wrote: >Hi, > >it seem that since this commit > >https://github.com/ceph/ceph/pull/4261/files > >lltng is enabled by default. > >But this give error with qemu when apparmor|selinux is enabled. > >That's why ubuntu && redhat now disable it for their own packages. > >https://bugzilla.redhat.com/show_bug.cgi?id=1223319 >https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1432644 > >In the ubuntu launchpad, Sage has made a reply > >" >Sage Weil (sage-newdream) wrote on 2015-04-02: #21 >FWIW, we are disabling the lttng support in the final hammer release to avoid >this issue (until we come up with a better solution)." > > >It seem that it's still enabled by default in ceph git and ceph.com packages. > >Is it still planned to disable by default ? > > > >-- >To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >the body of a message to majord...@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html
Fwd: [newstore (again)] how disable double write WAL
Hello everybody, is the fragment stored in rocksdb before being written to "/fragments"? I separated "/db" and "/fragments", but during the bench everything is written to "/db". I changed the "newstore_sync_*" options without success. Is there any way to write all metadata in "/db" and all data in "/fragments"? -- -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: [newstore (again)] how disable double write WAL
On Mon, 12 Oct 2015, David Casier wrote: > Hello everybody, > fragment is stored in rocksdb before being written to "/fragments" ? > I separed "/db" and "/fragments" but during the bench, everything is writing > to "/db" > I changed options "newstore_sync_*" without success. > > Is there any way to write all metadata in "/db" and all data in "/fragments" ? You can set newstore_overlay_max = 0 to avoid most data landing in db/. But if you are overwriting an existing object, doing write-ahead logging is usually unavoidable because we need to make the update atomic (and the underlying posix fs doesn't provide that). The wip-newstore-frags branch mitigates this somewhat for larger writes by limiting fragment size, but for small IOs this is pretty much always going to be the case. For small IOs, though, putting things in db/ is generally better since we can combine many small ios into a single (rocksdb) journal/wal write. And often leave them there (via the 'overlay' behavior). sage -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
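To make the suggestion above concrete, here is a minimal ceph.conf sketch (assuming the newstore options are read from the [osd] section, and that db/ and fragments/ sit inside the OSD data directory where they can be pointed at separate devices via mount points or symlinks):

    [osd]
        # keep small writes out of the rocksdb overlay so object data
        # ends up in fragments/ instead of db/
        newstore_overlay_max = 0

With overlays disabled, mostly metadata and the write-ahead log should land in db/; overwrites of existing objects will still pass through the WAL, as explained above.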
Re: Reply: [PATCH] rbd: prevent kernel stack blow up on rbd map
On Mon, Oct 12, 2015 at 4:22 AM, Caoxudong wrote: > By the way, do you think it's necessary that we add the clone-chain-length > limit in user-space code too? librbd is different in a lot of ways and there isn't a clean separation between the client part (i.e. what is essentially reimplemented in the kernel) and the rest (management and maintenance parts, etc). It's certainly not necessary; whether it's desirable, I'm not sure. Also, a librbd limit, if we were to introduce one, would probably have to be bigger. Thanks, Ilya -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re:Re: The questions of data collection and cache tiering in Ceph
Greg, Thank you a lot for your timely reply. These are really helpful for me. I also have some doubts. In Ceph, besides monitoring pools, PGs and objects, it can also acquire other statistics such as CPU, IOPS and bandwidth. To acquire that information, does Ceph need to call other tools, or are these functions implemented in the source code? I could only find simple arithmetic (like the division) in the source code. An object has two parts: data and attributes. Are they ultimately stored in different places? I ask because I found some attribute information in the OMap. In your reply you said, "any subsequent operations on that object will wait until that durable op is readable before completing." So if I set the system to flush objects from the journal to disk every 15s, does that mean I cannot read the object for 15s, because I have only written the object to the journal and not yet to the disk? Could that cause problems? Thank you so much. Yours, Chay On 2015-10-09 02:34:25, "Gregory Farnum" wrote: >On Thu, Oct 8, 2015 at 9:09 AM, 蔡毅 wrote: >> >> Dear developers, >> >>Recently I met some troubles when I read the Ceph’s source code and >> understand the architecture. >> The details of problems are as followed. >> >>1.In monitoring tools, they can collect much data when Ceph runs. I >> wonder what >> kind of data the Ceph can provide (object data, PG data or other data?). >> Could the >> Ceph provide every object’s data (e.g. The times the object is read or wrote >> ,the latest time the object is used ,etc.) ,if Ceph could ,in source code >> ,where >> could I find these details. I really want to know the monitoring data the >> Ceph >> can provides and where they are in source code so that I could know how to >> use it >> more efficiently. For example, I know the Ceph could provide the data of the >> objects’ number per PG, the read and write bandwidth, but I couldn’t find >> how to >> achieve these in source code. > >I'm not quite sure what you're asking here, but I think you'll want to >look at the MPGStats.h message (in ceph/src/messages), and trace >backwards through the OSD code (ceph/src/osd/) which creates them and >then forwards through the monitor code (ceph/src/mon/OSDMonitor.cc) > >> >>2.From official documents, Ceph provides the cache tiering to improve >> performance. But I couldn’t find more details to describe the cache tiering >> like which kind of algorithm the cache agent uses. In the source code, where >> could I find these? > >The cache tiering is part of the OSD. Look at the TierAgentState.h >file and the parts of ReplicatedPG.cc which reference it. > >> >> 3.In write process , there are two responses to client ,first is from >> journal and >> second is occurred when object writes to real disk .so when I write a object >> to >> Ceph using librbd, does not the write finish until the second response >> occurs and >> what mean the first and second responses for clients? When a object writes >> to journal >> but not to filestore (that is not to disk ), could I read this object? If I >> could, >> where could I read this object? > >You get a response from the OSDs: >1) when the write operation is durable. >2) when the write operation is readable. > >The order these arrive in will depend on your OSD configuration (btrfs >can send readable before durable; xfs always sends durable first; >etc). If you get a "durable" response, any subsequent operations on >that object will wait until that durable op is readable before >completing.
>-Greg
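A small illustration of the two acks Greg describes, using the librados Python binding (a sketch only: it assumes python-rados is installed and that a pool named "rbd" exists; the object name and payload are made up):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')            # example pool name

    def on_complete(completion):
        # the write is readable on the OSDs (applied)
        print("readable (ack)")

    def on_safe(completion):
        # the write is durable (committed to the OSD journals)
        print("durable (commit)")

    comp = ioctx.aio_write('demo-object', b'payload',
                           oncomplete=on_complete, onsafe=on_safe)
    comp.wait_for_safe()                         # block until the durable ack
    ioctx.close()
    cluster.shutdown()

Which callback fires first depends on the backend, as Greg notes above; a client that only needs read-after-write can stop at the readable ack, while durability requires the safe/commit ack.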
ceph branch status
-- All Branches -- Adam C. Emerson2015-09-14 12:32:18 -0400 wip-cxx11time 2015-09-15 12:09:20 -0400 wip-cxx11concurrency Adam Crume 2014-12-01 20:45:58 -0800 wip-doc-rbd-replay Alfredo Deza 2015-03-23 16:39:48 -0400 wip-11212 Alfredo Deza 2014-07-08 13:58:35 -0400 wip-8679 2014-09-04 13:58:14 -0400 wip-8366 2014-10-13 11:10:10 -0400 wip-9730 Ali Maredia 2015-09-22 15:10:10 -0400 wip-cmake 2015-10-09 14:58:17 -0400 wip-10587-split-servers Boris Ranto 2015-09-04 15:19:11 +0200 wip-bash-completion Casey Bodley 2015-09-28 17:09:11 -0400 wip-cxx14-test 2015-09-29 15:18:17 -0400 wip-fio-objectstore Dan Mick 2013-07-16 23:00:06 -0700 wip-5634 Daniel Gryniewicz 2015-10-05 09:28:40 -0400 wip-dang-cmake Danny Al-Gaaf 2015-04-23 16:32:00 +0200 wip-da-SCA-20150421 2015-04-23 17:18:57 +0200 wip-nosetests 2015-04-23 18:20:16 +0200 wip-unify-num_objects_degraded 2015-09-28 16:05:12 +0200 wip-da-SCA-20150910 David Zafman 2014-08-29 10:41:23 -0700 wip-libcommon-rebase 2015-04-24 13:14:23 -0700 wip-cot-giant 2015-08-04 07:39:00 -0700 wip-12577-hammer 2015-09-28 11:33:11 -0700 wip-12983 Dongmao Zhang 2014-11-14 19:14:34 +0800 thesues-master Greg Farnum 2015-04-29 21:44:11 -0700 wip-init-names 2015-07-16 09:28:24 -0700 hammer-12297 2015-10-01 22:46:38 -0700 greg-fs-testing 2015-10-02 13:00:59 -0700 greg-infernalis-lock-testing 2015-10-02 13:09:05 -0700 greg-infernalis-lock-testing-cacher 2015-10-07 00:45:24 -0700 greg-infernalis-fs Greg Farnum 2014-10-23 13:33:44 -0700 wip-forward-scrub Guang G Yang 2015-06-26 20:31:44 + wip-ec-readall 2015-07-23 16:13:19 + wip-12316 Guang Yang 2014-08-08 10:41:12 + wip-guangyy-pg-splitting 2014-09-25 00:47:46 + wip-9008 2014-09-30 10:36:39 + guangyy-wip-9614 Haomai Wang 2014-07-27 13:37:49 +0800 wip-flush-set 2015-04-20 00:47:59 +0800 update-organization 2015-07-21 19:33:56 +0800 fio-objectstore 2015-08-26 09:57:27 +0800 wip-recovery-attr Ilya Dryomov 2014-09-05 16:15:10 +0400 wip-rbd-notify-errors Ivo Jimenez 2015-08-24 23:12:45 -0700 hammer-with-new-workunit-for-wip-12551 Jason Dillaman 2015-07-31 13:55:23 -0400 wip-12383-next 2015-08-31 23:17:53 -0400 wip-12698 2015-09-01 10:17:02 -0400 wip-11287 Jenkins 2015-09-30 12:59:03 -0700 rhcs-v0.94.3-ubuntu Jenkins 2014-07-29 05:24:39 -0700 wip-nhm-hang 2015-02-02 10:35:28 -0800 wip-sam-v0.92 2015-08-21 12:46:32 -0700 last 2015-08-21 12:46:32 -0700 loic-v9.0.3 2015-09-15 10:23:18 -0700 rhcs-v0.80.8 2015-09-21 16:48:32 -0700 rhcs-v0.94.1-ubuntu Joao Eduardo Luis 2014-09-10 09:39:23 +0100 wip-leveldb-get.dumpling Joao Eduardo Luis 2014-07-22 15:41:42 +0100 wip-leveldb-misc Joao Eduardo Luis 2014-09-02 17:19:52 +0100 wip-leveldb-get 2014-10-17 16:20:11 +0100 wip-paxos-fix 2014-10-21 21:32:46 +0100 wip-9675.dumpling 2015-07-27 21:56:42 +0100 wip-11470.hammer 2015-09-09 15:45:45 +0100 wip-11786.hammer Joao Eduardo Luis 2014-11-17 16:43:53 + wip-mon-osdmap-cleanup 2014-12-15 16:18:56 + wip-giant-mon-backports 2014-12-17 17:13:57 + wip-mon-backports.firefly 2014-12-17 23:15:10 + wip-mon-sync-fix.dumpling 2015-01-07 23:01:00 + wip-mon-blackhole-mlog-0.87.7 2015-01-10 02:40:42 + wip-dho-joao 2015-01-10 02:46:31 + wip-mon-paxos-fix 2015-01-26 13:00:09 + wip-mon-datahealth-fix 2015-02-04 22:36:14 + wip-10643 2015-09-09 15:43:51 +0100 wip-11786.firefly Joao Eduardo Luis 2015-05-27 23:48:45 +0100 wip-mon-scrub 2015-05-29 12:21:43 +0100 wip-11545 2015-06-05 16:12:57 +0100 wip-10507 2015-06-16 14:34:11 +0100 wip-11470 2015-06-25 00:16:41 +0100 wip-10507-2 2015-07-14 16:52:35 +0100 wip-joao-testing 2015-09-08 09:48:41 +0100 wip-leveldb-hang John
Initial performance cluster SimpleMessenger vs AsyncMessenger results
Hi Guy, Given all of the recent data on how different memory allocator configurations improve SimpleMessenger performance (and the effect of memory allocators and transparent hugepages on RSS memory usage), I thought I'd run some tests looking how AsyncMessenger does in comparison. We spoke about these a bit at the last performance meeting but here's the full write up. The rough conclusion as of right now appears to be: 1) AsyncMessenger performance is not dependent on the memory allocator like with SimpleMessenger. 2) AsyncMessenger is faster than SimpleMessenger with TCMalloc + 32MB (ie default) thread cache. 3) AsyncMessenger is consistently faster than SimpleMessenger for 128K random reads. 4) AsyncMessenger is sometimes slower than SimpleMessenger when memory allocator optimizations are used. 5) AsyncMessenger currently uses far more RSS memory than SimpleMessenger. Here's a link to the paper: https://drive.google.com/file/d/0B2gTBZrkrnpZS1Q4VktjZkhrNHc/view Mark -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
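For anyone who wants to reproduce the comparison, a sketch of the knobs involved (assumptions: ms_type selects the messenger in this build — in some releases the async messenger was still experimental and had to be whitelisted as well — and the TCMalloc thread cache is adjusted through the environment of the OSD processes, whose exact plumbing is distro-specific):

    [global]
        ms_type = async        # or "simple" for the SimpleMessenger runs

    # larger TCMalloc thread cache for the SimpleMessenger comparison runs,
    # set in the ceph-osd environment:
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128 MB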
Re: wip-addr
On Mon, 12 Oct 2015, David Zafman wrote: > I don't understand how encode/decode of entity_addr_t is changing without > versioning in the encode/decode. This means that this branch is changing the > ceph-objectstore-tool export format if CEPH_FEATURE_MSG_ADDR2 is part of the > features. So we could bump super_header::super_ver if the export format must > change. > > Now that I look at it, I'm sure I can clear the watchers and old_watchers in > object_info_t during export because that is dynamic information and it happens > to include entity_addr_t. I need to verify this, but that may be the only > reason that the objectstore tool needs a valid features value to be passed > there. Ah, yeah... clearing watchers (perhaps optionally, though) sounds fine. sage > > David > > On 10/9/15 2:49 PM, Sage Weil wrote: > > > 2. > > > >(about line 2067 in src/tools/ceph_objectstore_tool.cc) > > > >(use via ceph cmd?) tools - "object store tool". > > > >This has a way to serialize objects which includes a watch list > > > >which includes an address. There should be an option here to say > > > >whether to include exported addresses. > > I think it's safe to use defaults here.. what do you think, David? > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: wip-addr
I don't understand how encode/decode of entity_addr_t is changing without versioning in the encode/decode. This means that this branch is changing the ceph-objectstore-tool export format if CEPH_FEATURE_MSG_ADDR2 is part of the features. So we could bump super_header::super_ver if the export format must change. Now that I look at it, I'm sure I can clear the watchers and old_watchers in object_info_t during export because that is dynamic information and it happens to include entity_addr_t. I need to verify this, but that may be the only reason that the objectstore tool needs a valid features value to be passed there. David On 10/9/15 2:49 PM, Sage Weil wrote: 2. >(about line 2067 in src/tools/ceph_objectstore_tool.cc) >(use via ceph cmd?) tools - "object store tool". >This has a way to serialize objects which includes a watch list >which includes an address. There should be an option here to say >whether to include exported addresses. I think it's safe to use defaults here.. what do you think, David? -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: [newstore (again)] how disable double write WAL
Ok, great. With these settings : // newstore_max_dir_size = 4096 newstore_sync_io = true newstore_sync_transaction = true newstore_sync_submit_transaction = true newstore_sync_wal_apply = true newstore_overlay_max = 0 // and direct IO in the benchmark tool (fio), I see that the HDD is 100% loaded and there is no transfer from /db to /fragments after stopping the benchmark: great! But when I launch a bench with random blocks of 256k, I see random blocks between 32k and 256k on the HDD. Any idea? Throughput to the HDD is about 8 MB/s when it could be higher with larger blocks (~30 MB/s), and 70 MB/s without fsync (hard drive cache disabled). Other questions: newstore_sync_io -> true = fsync immediately, false = fsync later (thread fsync_wq)? newstore_sync_transaction -> true = sync in DB? newstore_sync_submit_transaction -> if false then kv_queue (only if newstore_sync_transaction=false)? newstore_sync_wal_apply = true -> if false then WAL later (thread wal_wq)? Is that right? Is there a way to use a battery-backed cache (sync DB and no sync data)? Thanks for everything! On 10/12/2015 03:01 PM, Sage Weil wrote: On Mon, 12 Oct 2015, David Casier wrote: Hello everybody, fragment is stored in rocksdb before being written to "/fragments" ? I separed "/db" and "/fragments" but during the bench, everything is writing to "/db" I changed options "newstore_sync_*" without success. Is there any way to write all metadata in "/db" and all data in "/fragments" ? You can set newstore_overlay_max = 0 to avoid most data landing in db/. But if you are overwriting an existing object, doing write-ahead logging is usually unavoidable because we need to make the update atomic (and the underlying posix fs doesn't provide that). The wip-newstore-frags branch mitigates this somewhat for larger writes by limiting fragment size, but for small IOs this is pretty much always going to be the case. For small IOs, though, putting things in db/ is generally better since we can combine many small ios into a single (rocksdb) journal/wal write. And often leave them there (via the 'overlay' behavior). sage -- Cordialement, *David CASIER DCConsulting SARL 4 Trait d'Union 77127 LIEUSAINT **Ligne directe: _01 75 98 53 85_ Email: _david.casier@aevoo.fr_ * -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lltng enabled by default and qemu apparmor|selinux problems
I have an open PR [1] to dynamically enable LTTng-UST via new config options. This change will hopefully trickle down to older releases and will avoid the SElinux / AppArmor issues in the default case (which led to downstream Ubuntu and Fedora disabling LTTng-UST support). Anyone that wants to use LTTng-UST (i.e. for generating RBD replay traces) can enable the support and adjust their SElinux / AppArmor rules to accommodate. [1] https://github.com/ceph/ceph/pull/6135 -- Jason Dillaman - Original Message - > From: "Paul HEWLETT (Paul)"> To: "Alexandre DERUMIER" , "ceph-devel" > > Cc: "Sage Weil" > Sent: Monday, October 12, 2015 4:28:06 AM > Subject: Re: lltng enabled by default and qemu apparmor|selinux problems > > IF I can add my $0.02 - we were unable to use the libradosstriper library in > RHEL6 because it uses the same initialisation tags as librados and lttng > does not like that. We had no problems with RHEL7 version of ceph because > lttng is not enabled. Please do not re-enable lttng in RHEL7 and later > branches…. > > Regards > Paul > > > > > On 11/10/2015 18:06, "ceph-devel-ow...@vger.kernel.org on behalf of Alexandre > DERUMIER" aderum...@odiso.com> wrote: > > >Hi, > > > >it seem that since this commit > > > >https://github.com/ceph/ceph/pull/4261/files > > > >lltng is enabled by default. > > > >But this give error with qemu when apparmor|selinux is enabled. > > > >That's why ubuntu && redhat now disable it for their own packages. > > > >https://bugzilla.redhat.com/show_bug.cgi?id=1223319 > >https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1432644 > > > >In the ubuntu launchpad, Sage has made a reply > > > >" > >Sage Weil (sage-newdream) wrote on 2015-04-02: #21 > >FWIW, we are disabling the lttng support in the final hammer release to > >avoid this issue (until we come up with a better solution)." > > > > > >It seem that it's still enabled by default in ceph git and ceph.com > >packages. > > > >Is it still planned to disable by default ? > > > > > > > >-- > >To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >the body of a message to majord...@vger.kernel.org > >More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
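If the PR lands in roughly its current shape, usage would look something like the sketch below. The option name is an assumption based on the PR's description, not confirmed here; check the merged change for the final spelling:

    [client]
        # assumed option name: tracing stays off by default, so
        # SELinux/AppArmor-confined qemu is unaffected; enable only
        # when capturing rbd-replay traces
        rbd_tracing = true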
Re: Fwd: [newstore (again)] how disable double write WAL
Hi David- On Mon, 12 Oct 2015, David Casier wrote: > Ok, > Great. > > With these settings : > // > newstore_max_dir_size = 4096 > newstore_sync_io = true > newstore_sync_transaction = true > newstore_sync_submit_transaction = true Is this a hard disk? Those settings probably don't make sense since it does every IO synchronously, blocking the submitting IO path... > newstore_sync_wal_apply = true > newstore_overlay_max = 0 > // > > And direct IO in the benchmark tool (fio) > > I see that the HDD is 100% charged and there are notransfer of /db to > /fragments after stopping benchmark : Great ! > > But when i launch a bench with random blocs of 256k, i see random blocs > between 32k and 256k on HDD. Any idea ? Random IOs have to be write ahead logged in rocksdb, which has its own IO pattern. Since you made everything sync above I think it'll depend on how many osd threads get batched together at a time.. maybe. Those settings aren't something I've really tested, and probably only make sense with very fast NVMe devices. > Debits to the HDD are about 8MBps when they could be higher with larger > blocs> (~30MBps) > And 70 MBps without fsync (hard drive cache disabled). > > Other questions : > newstore_sync_io -> true = fsync immediatly, false = fsync later (Thread > fsync_wq) ? yes > newstore_sync_transaction -> true = sync in DB ? synchronously do the rocksdb commit too > newstore_sync_submit_transaction -> if false then kv_queue (only if > newstore_sync_transaction=false) ? yeah.. there is an annoying rocksdb behavior that makes an async transaction submit block if a sync one is in progress, so this queues them up and explicitly batches them. > newstore_sync_wal_apply = true -> if false then WAL later (thread wal_wq) ? the txn commit completion threads can do the wal work synchronously.. this is only a good idea if it's doing aio (which it generally is). > Is it true ? > > Way for cache with battery (sync DB and no sync data) ? ? s > > Thanks for everything ! > > On 10/12/2015 03:01 PM, Sage Weil wrote: > > On Mon, 12 Oct 2015, David Casier wrote: > > > Hello everybody, > > > fragment is stored in rocksdb before being written to "/fragments" ? > > > I separed "/db" and "/fragments" but during the bench, everything is > > > writing > > > to "/db" > > > I changed options "newstore_sync_*" without success. > > > > > > Is there any way to write all metadata in "/db" and all data in > > > "/fragments" ? > > You can set newstore_overlay_max = 0 to avoid most data landing in db/. > > But if you are overwriting an existing object, doing write-ahead logging > > is usually unavoidable because we need to make the update atomic (and the > > underlying posix fs doesn't provide that). The wip-newstore-frags branch > > mitigates this somewhat for larger writes by limiting fragment size, but > > for small IOs this is pretty much always going to be the case. For small > > IOs, though, putting things in db/ is generally better since we can > > combine many small ios into a single (rocksdb) journal/wal write. And > > often leave them there (via the 'overlay' behavior). 
> > > > sage > > > > > -- > > > Cordialement, > > *David CASIER > DCConsulting SARL > > > 4 Trait d'Union > 77127 LIEUSAINT > > **Ligne directe: _01 75 98 53 85_ > Email: _david.casier@aevoo.fr_ > * > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
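Putting Sage's answers together, a sketch of what the settings might look like on a spinning disk (the option names are the ones discussed in this thread; the values follow the "leave the sync paths off on HDDs" advice above and are not a tested recommendation):

    [osd]
        newstore_sync_io = false                  # fsync later, in the fsync_wq thread
        newstore_sync_transaction = false         # rocksdb commit done asynchronously
        newstore_sync_submit_transaction = false  # batch kv submits via kv_queue
        newstore_sync_wal_apply = true            # completion threads apply WAL via aio
        newstore_overlay_max = 0                  # keep bulk data out of db/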
Re: rgw and the next hammer release v0.94.4
Yeah, it should be fine. On Mon, Oct 12, 2015 at 3:56 PM, Loic Dacharywrote: > Hi, > > After todays private discussion and the merge of > https://github.com/ceph/ceph/pull/6161, I will assume the current hammer > branch (7f485ed5aa620fe982561663bf64356b7e2c38f2) is ready for QE to start > their own round of testing. If I misinterpreted what you wrote, please speak > up and I'll do what's needed ;-) > > Cheers > > On 02/10/2015 22:31, Loic Dachary wrote: >> Hi Yehuda, >> >> The next hammer release as found at https://github.com/ceph/ceph/tree/hammer >> passed the rgw suite (http://tracker.ceph.com/issues/12701#note-58). >> Do you think the hammer branch is ready for QE to start their own round of >> testing ? >> >> Cheers >> >> P.S. http://tracker.ceph.com/issues/12701#Release-information has direct >> links to the pull requests merged into hammer since v0.94.3 in case you need >> more context about one of them. >> > > -- > Loïc Dachary, Artisan Logiciel Libre > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: throttles
dump_historic_ops, slow requests
I have a small ceph cluster (3 nodes, 5 osds each, journals all just partitions on the spinner disks) and I have noticed that when I hit it with a bunch of rados bench clients all doing writes of large (40 MB) objects with --no-cleanup, the rados bench commands seem to finish OK but I often get health warnings like

HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests
3 ops are blocked > 32.768 sec on osd.9
1 ops are blocked > 32.768 sec on osd.10
2 osds have slow requests

After a couple of minutes, health goes back to HEALTH_OK. But if I go to the node containing osd.10, for example, and do dump_historic_ops, I see lots of durations of around 20 sec but nothing over 32 sec. The 20-sec or so ops are always "ack+ondisk+write+known_if_redirected" with type_data = "commit sent: apply or cleanup", and the following are typical event timings:

initiated: 14:06:58.205937
reached_pg: 14:07:01.823288, gap= 3617.351
started: 14:07:01.823359, gap= 0.071
waiting for subops from 3: 14:07:01.855259, gap=31.900
commit_queued_for_journal_write: 14:07:03.132697, gap= 1277.438
write_thread_in_journal_buffer: 14:07:03.143356, gap=10.659
journaled_completion_queued: 14:07:04.175863, gap= 1032.507
op_commit: 14:07:04.585040, gap= 409.177
op_applied: 14:07:04.589751, gap= 4.711
sub_op_commit_rec from 3: 14:07:14.682925, gap= 10093.174
commit_sent: 14:07:14.683081, gap= 0.156
done: 14:07:14.683119, gap= 0.038

Should I expect to see a historic op with duration greater than 32 sec? -- Tom Deneau -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
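A sketch of the admin-socket side of this (run on the node hosting the OSD; osd.10 is just the example id). Note that osd_op_history_size / osd_op_history_duration bound how many completed ops dump_historic_ops keeps, so a >32 sec op can age out of the list before it is queried, while the slow-request warnings themselves are driven by osd_op_complaint_time:

    ceph daemon osd.10 dump_ops_in_flight        # ops currently blocked
    ceph daemon osd.10 dump_historic_ops         # recently completed (slow) ops
    ceph daemon osd.10 config show | grep -E 'osd_op_(history|complaint)'
    # keep more / older entries while investigating (runtime change):
    ceph tell osd.10 injectargs '--osd_op_history_size 200 --osd_op_history_duration 3600'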
throttles
Looking at the perf counters on my osds, I see wait counts for the following throttle related perf counters: (This is from trying to benchmark using multiple rados bench client processes). throttle-filestore_bytes throttle-msgr_dispatch_throttler-client throttle-osd_client_bytes throttle-osd_client_messages What are the config variables that would allow me to experiment with these throttle limits? (When I look at the output from --admin-daemon osd.xx.asok config show, it's not clear which items these correspond to). -- Tom Deneau -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
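For what it's worth, a sketch of the config options those counters appear to map to, as far as I can tell from hammer-era code (the values below are only examples to experiment with, not recommendations):

    [osd]
        filestore_queue_max_bytes = 209715200      # throttle-filestore_bytes
        ms_dispatch_throttle_bytes = 209715200     # throttle-msgr_dispatch_throttler-client
        osd_client_message_size_cap = 1048576000   # throttle-osd_client_bytes
        osd_client_message_cap = 256               # throttle-osd_client_messages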
enable rbd on ec pool ?
Is there a patch available to enable rbd over an EC pool? Currently it's restricted: 2015-10-12 10:52:23.042085 7f4721ca1840 -1 librbd: error adding image to directory: (95) Operation not supported rbd: create error: (95) Operation not supported Thanks, tomy -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
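As far as I know there is no such patch: RBD images cannot live directly on an EC pool because the omap operations librbd needs are not supported there. The workaround usually suggested is to front the EC pool with a replicated cache tier, roughly like the sketch below (pool names and pg counts are only examples):

    ceph osd pool create ecpool 128 128 erasure
    ceph osd pool create cachepool 128 128
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    rbd create --pool ecpool --size 10240 myimage   # IO is proxied through the cache tier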
Re: lltng enabled by default and qemu apparmor|selinux problems
>>I have an open PR [1] to dynamically enable LTTng-UST via new config options Great, thanks Jason! - Original Message - From: "Jason Dillaman" To: "Paul HEWLETT (Paul)" Cc: "aderumier" , "ceph-devel" , "Sage Weil" Sent: Monday, 12 October 2015 20:32:43 Subject: Re: lltng enabled by default and qemu apparmor|selinux problems I have an open PR [1] to dynamically enable LTTng-UST via new config options. This change will hopefully trickle down to older release and will avoid the SElinux / AppArmor issues in the default case (which led to downstream Ubuntu and Fedora disabling LTTng-UST support). Anyone that wants to use LTTng-UST (i.e. for generating RBD replay traces) can enable the support and adjust their SElinux / AppArmor rules to accommodate. [1] https://github.com/ceph/ceph/pull/6135 -- Jason Dillaman - Original Message - > From: "Paul HEWLETT (Paul)" > To: "Alexandre DERUMIER" , "ceph-devel" > > Cc: "Sage Weil" > Sent: Monday, October 12, 2015 4:28:06 AM > Subject: Re: lltng enabled by default and qemu apparmor|selinux problems > > IF I can add my $0.02 - we were unable to use the libradosstriper library in > RHEL6 because it uses the same initialisation tags as librados and lttng > does not like that. We had no problems with RHEL7 version of ceph because > lttng is not enabled. Please do not re-enable lttng in RHEL7 and later > branches…. > > Regards > Paul > > > > > On 11/10/2015 18:06, "ceph-devel-ow...@vger.kernel.org on behalf of Alexandre > DERUMIER" aderum...@odiso.com> wrote: > > >Hi, > > > >it seem that since this commit > > > >https://github.com/ceph/ceph/pull/4261/files > > > >lltng is enabled by default. > > > >But this give error with qemu when apparmor|selinux is enabled. > > > >That's why ubuntu && redhat now disable it for their own packages. > > > >https://bugzilla.redhat.com/show_bug.cgi?id=1223319 > >https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1432644 > > > >In the ubuntu launchpad, Sage has made a reply > > > >" > >Sage Weil (sage-newdream) wrote on 2015-04-02: #21 > >FWIW, we are disabling the lttng support in the final hammer release to > >avoid this issue (until we come up with a better solution)." > > > > > >It seem that it's still enabled by default in ceph git and ceph.com > >packages. > > > >Is it still planned to disable by default ? > > > > > > > >-- > >To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > >the body of a message to majord...@vger.kernel.org > >More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rgw and the next hammer release v0.94.4
Hi, After today's private discussion and the merge of https://github.com/ceph/ceph/pull/6161, I will assume the current hammer branch (7f485ed5aa620fe982561663bf64356b7e2c38f2) is ready for QE to start their own round of testing. If I misinterpreted what you wrote, please speak up and I'll do what's needed ;-) Cheers On 02/10/2015 22:31, Loic Dachary wrote: > Hi Yehuda, > > The next hammer release as found at https://github.com/ceph/ceph/tree/hammer > passed the rgw suite (http://tracker.ceph.com/issues/12701#note-58). > Do you think the hammer branch is ready for QE to start their own round of > testing ? > > Cheers > > P.S. http://tracker.ceph.com/issues/12701#Release-information has direct > links to the pull requests merged into hammer since v0.94.3 in case you need > more context about one of them. > -- Loïc Dachary, Artisan Logiciel Libre
hammer branch for v0.94.4 ready for QE
Hi Yuri, The hammer branch for v0.94.4 as found at https://github.com/ceph/ceph/commits/hammer has been approved by Yehuda, Josh and Sam (there are no CephFS related commits according to Greg, hence his approval was not relevant) and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/7f485ed5aa620fe982561663bf64356b7e2c38f2 and the details of the tests run are at http://tracker.ceph.com/issues/12701. This time around, instead of adding the table to the description, I propose you add it as a comment (which can be edited later on). It is easier because it's not overloaded with unrelated content. There also is the matter of the maximum size of the description field: there is a real risk of exceeding it and truncating the result. Cheers -- Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Initial performance cluster SimpleMessenger vs AsyncMessenger results
resend On Tue, Oct 13, 2015 at 10:56 AM, Haomai Wangwrote: > COOL > > Interesting that async messenger will consume more memory than simple, in my > mind I always think async should use less memory. I will give a look at this > > On Tue, Oct 13, 2015 at 12:50 AM, Mark Nelson wrote: >> >> Hi Guy, >> >> Given all of the recent data on how different memory allocator >> configurations improve SimpleMessenger performance (and the effect of memory >> allocators and transparent hugepages on RSS memory usage), I thought I'd run >> some tests looking how AsyncMessenger does in comparison. We spoke about >> these a bit at the last performance meeting but here's the full write up. >> The rough conclusion as of right now appears to be: >> >> 1) AsyncMessenger performance is not dependent on the memory allocator >> like with SimpleMessenger. >> >> 2) AsyncMessenger is faster than SimpleMessenger with TCMalloc + 32MB (ie >> default) thread cache. >> >> 3) AsyncMessenger is consistently faster than SimpleMessenger for 128K >> random reads. >> >> 4) AsyncMessenger is sometimes slower than SimpleMessenger when memory >> allocator optimizations are used. >> >> 5) AsyncMessenger currently uses far more RSS memory than SimpleMessenger. >> >> Here's a link to the paper: >> >> https://drive.google.com/file/d/0B2gTBZrkrnpZS1Q4VktjZkhrNHc/view >> >> Mark >> ___ >> ceph-users mailing list >> ceph-us...@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > -- > > Best Regards, > > Wheat -- Best Regards, Wheat -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Initial performance cluster SimpleMessenger vs AsyncMessenger results
On Mon, Oct 12, 2015 at 9:50 AM, Mark Nelsonwrote: > Hi Guy, > > Given all of the recent data on how different memory allocator > configurations improve SimpleMessenger performance (and the effect of memory > allocators and transparent hugepages on RSS memory usage), I thought I'd run > some tests looking how AsyncMessenger does in comparison. We spoke about > these a bit at the last performance meeting but here's the full write up. > The rough conclusion as of right now appears to be: > > 1) AsyncMessenger performance is not dependent on the memory allocator like > with SimpleMessenger. > > 2) AsyncMessenger is faster than SimpleMessenger with TCMalloc + 32MB (ie > default) thread cache. > > 3) AsyncMessenger is consistently faster than SimpleMessenger for 128K > random reads. > > 4) AsyncMessenger is sometimes slower than SimpleMessenger when memory > allocator optimizations are used. > > 5) AsyncMessenger currently uses far more RSS memory than SimpleMessenger. > > Here's a link to the paper: > > https://drive.google.com/file/d/0B2gTBZrkrnpZS1Q4VktjZkhrNHc/view Can you clarify these tests a bit more? I can't make the number of nodes, OSDs, and SSDs work out properly. Were the FIO jobs 256 concurrent ops per job, or in aggregate? Is there any more info that might suggest why the 128KB rand-read (but not read nor write, and not 4k rand-read) was so asymmetrical? -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html