Notification
Dear email user! Your email address has won 1,200,000.00 (ONE MILLION TWO HUNDRED THOUSAND EUROS) with the lucky numbers 9-3-8-26-28-4-64 in the EURO MILLIONS EMAIL LOTTERY. The sum results from a prize pool of 22,800,000.00 (TWENTY-TWO MILLION EIGHT HUNDRED THOUSAND), which was shared among 19 winners in the same category.

Please contact the agent responsible for your prize, Ms. Christiane Hamann, by email: christiane_hama...@aol.com

PLEASE FILL IN YOUR DETAILS BELOW. Lucky numbers:___ NAME: ___ SURNAME:_ ADDRESS:__ CITY: POSTAL CODE: COUNTRY: ___ DATE OF BIRTH: __ OCCUPATION: LANDLINE TEL. NO.: MOBILE PHONE NO.: ___ FAX: ___ EMAIL:___ DATE, SIGNATURE:_

Please fill out the attached form completely and send it back by email! Respectfully, Inmaculada Garcia Martinez, Coordinator.
Charity/Donation
Hi, My name is Jeffrey Skoll, a philanthropist and the founder of one of the largest private foundations in the world. I believe strongly in ‘giving while living.’ I had one idea that never changed in my mind: that you should use your wealth to help people. I have decided to secretly give USD 2.498 million to a randomly selected individual. On receipt of this email, you should count yourself as that individual. Kindly get back to me at your earliest convenience, so I know your email address is valid.

Visit this web page to learn more about me: http://www.theglobeandmail.com/news/national/meet-the-canadian-billionaire-whos-giving-it-all-away/article4209888/ or you can read an article about me on Wikipedia.

Regards, Jeffrey Skoll.
Re: Long peering - throttle at FileStore::queue_transactions
On Mon, 4 Jan 2016, Guang Yang wrote:
> Hi Cephers,
> Happy New Year! I have a question regarding the long PG peering.
>
> Over the last several days I have been looking into the *long peering* problem when we start an OSD / OSD host. What I observed was that the two peering worker threads were throttled (stuck) when trying to queue new transactions (writing the pg log), so the peering process was dramatically slowed down.
>
> The first question that came to me was: what were the transactions in the queue? The major ones, as I saw, included:
>
> - The osd_map and incremental osd_map. This happens if the OSD had been down for a while (in a large cluster), or when the cluster got upgraded, which left the osd_map epoch the down OSD had far behind the latest osd_map epoch. During OSD boot, it would need to persist all those osd_maps and generate lots of filestore transactions (linear in the epoch gap).
> > As the PG was not involved in most of those epochs, could we only take and persist those osd_maps which matter to the PGs on the OSD?

This part should happen before the OSD sends the MOSDBoot message, before anyone knows it exists. There is a tunable threshold that controls how recent the map has to be before the OSD tries to boot. If you're seeing this in the real world, we probably just need to adjust that value way down to something small(er).

sage
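For illustration only, the boot gating Sage describes amounts to comparing the locally persisted osdmap epoch against the cluster's latest epoch and holding off MOSDBoot while the gap is too large. The sketch below is a minimal stand-in; the function name and the max_epoch_gap tunable are hypothetical placeholders, not the actual Ceph identifiers or config options:

    #include <cstdint>
    #include <iostream>

    using epoch_t = uint32_t;

    // Hypothetical sketch: only announce the OSD (send MOSDBoot) once the
    // locally persisted map is close enough to the latest epoch, so the
    // number of osdmap transactions queued at boot (linear in the epoch gap)
    // stays small.
    static bool ready_to_boot(epoch_t local_epoch, epoch_t latest_epoch,
                              epoch_t max_epoch_gap /* hypothetical tunable */)
    {
      return latest_epoch - local_epoch <= max_epoch_gap;
    }

    int main()
    {
      epoch_t local = 1000, latest = 5000, gap = 100;
      if (ready_to_boot(local, latest, gap))
        std::cout << "send MOSDBoot" << std::endl;
      else
        std::cout << "keep catching up on osdmaps before booting" << std::endl;
      return 0;
    }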
Re: Long peering - throttle at FileStore::queue_transactions
We need every OSDMap persisted before persisting later ones because we rely on there being no holes, for a bunch of reasons.

The deletion transactions are more interesting. They're not part of the boot process; these are deletions resulting from merging in a log from a peer which logically removed an object. It's more noticeable on boot because all PGs will see these operations at once (if there are a bunch of deletes happening). We currently need to process these transactions before we can serve reads (before we activate), since we use the on-disk state (modulo the objectcontext locks) as authoritative. That transaction, IIRC, also contains the updated PGLog. We can't avoid writing down the PGLog prior to activation, but we *can* delay the deletes (and even batch/throttle them) if we do some work:

1) During activation, we need to maintain a set of to-be-deleted objects. For each of these objects, we need to populate the objectcontext cache with an exists=false objectcontext so that we don't erroneously read the deleted data. Each of the entries in the to-be-deleted object set would have a reference to the context to keep it alive until the deletion is processed. (A rough sketch of this bookkeeping follows after this message.)

2) Any write operation which references one of these objects needs to be preceded by a delete if one has not yet been queued (and the to-be-deleted set updated appropriately). The tricky part is that the primary and replicas may have different objects in this set... The replica would have to insert deletes ahead of any subop (or the EC equivalent) it gets from the primary. For that to work, it needs to have something like the obc cache. I have a wip-replica-read branch which refactors object locking to allow the replica to maintain locks (to avoid replica reads conflicting with writes). That machinery would probably be the right place to put it.

3) We need to make sure that if a node restarts anywhere in this process, it correctly repopulates the set of to-be-deleted entries. We might consider a deleted-to version in the log? Not sure about this one, since it would be different on the replica and the primary.

Anyway, it's actually more complicated than you'd expect and will require more design (and probably depends on wip-replica-read landing).
-Sam

On Mon, Jan 4, 2016 at 3:32 PM, Guang Yang wrote:
> Hi Cephers,
> Happy New Year! I have a question regarding the long PG peering.
>
> Over the last several days I have been looking into the *long peering* problem when we start an OSD / OSD host. What I observed was that the two peering worker threads were throttled (stuck) when trying to queue new transactions (writing the pg log), so the peering process was dramatically slowed down.
>
> The first question that came to me was: what were the transactions in the queue? The major ones, as I saw, included:
>
> - The osd_map and incremental osd_map. This happens if the OSD had been down for a while (in a large cluster), or when the cluster got upgraded, which left the osd_map epoch the down OSD had far behind the latest osd_map epoch. During OSD boot, it would need to persist all those osd_maps and generate lots of filestore transactions (linear in the epoch gap).
>> As the PG was not involved in most of those epochs, could we only take and persist those osd_maps which matter to the PGs on the OSD?
>
> - There are lots of deletion transactions: as a PG boots, it needs to merge the PG log from its peers, and for each deletion PG log entry it queues the deletion transaction immediately.
>> Could we delay queueing these transactions until all PGs on the host are peered?
>
> Thanks,
> Guang
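To make Sam's step (1) above concrete, here is a rough, self-contained sketch of the proposed bookkeeping: a set of to-be-deleted objects whose cached contexts are marked exists=false so reads don't trust the stale on-disk data, with the real delete queued lazily before any later write. All type and member names here are hypothetical, not actual Ceph identifiers, and the real object context cache is considerably more involved:

    #include <map>
    #include <memory>
    #include <string>

    // Hypothetical stand-in for an object context; the only field that
    // matters for this sketch is 'exists', which reads would consult
    // instead of trusting the on-disk state.
    struct ObjectContext {
      std::string oid;
      bool exists = true;
    };
    using ObjectContextRef = std::shared_ptr<ObjectContext>;

    struct ToBeDeletedSet {
      // oid -> context, pinned until the deferred delete is actually queued.
      std::map<std::string, ObjectContextRef> pending;

      // Called while merging a peer's log during activation, instead of
      // queueing the delete transaction immediately.
      void defer_delete(const std::string& oid) {
        auto obc = std::make_shared<ObjectContext>();
        obc->oid = oid;
        obc->exists = false;   // reads must not see the deleted object
        pending[oid] = obc;
      }

      // Called before any write touching 'oid': the caller queues the real
      // delete first, then the entry (and its pinned context) is dropped.
      bool flush_before_write(const std::string& oid) {
        auto it = pending.find(oid);
        if (it == pending.end())
          return false;          // nothing deferred for this object
        // queue_transaction(make_delete(oid));   // real delete would go here
        pending.erase(it);
        return true;
      }
    };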
Re: OSD data files are OSD logs
Thanks Sam for the confirmation. Thanks, Guang On Mon, Jan 4, 2016 at 3:59 PM, Samuel Just wrote: > IIRC, you are running giant. I think that's the log rotate dangling > fd bug (not fixed in giant since giant is eol). Fixed upstream > 8778ab3a1ced7fab07662248af0c773df759653d, firefly backport is > b8e3f6e190809febf80af66415862e7c7e415214. > -Sam > > On Mon, Jan 4, 2016 at 3:37 PM, Guang Yang wrote: >> Hi Cephers, >> Before I open a tracker, I would like check if it is a known issue or not.. >> >> One one of our clusters, there was OSD crash during repairing, the >> crash happened after we issued a PG repair for inconsistent PGs, which >> failed because the recorded file size (within xattr) mismatched with >> the actual file size. >> >> The mismatch was caused by the fact that the content of the data file >> are OSD logs, following is from osd.354 on c003: >> >> -rw-r--r-- 1 yahoo root 75168 Jan 3 07:30 >> default.12061.9\u8396947527\u52ac8b3ec6\uo.jpg__head_A2478171__3__7 >> -bash-4.1$ head >> "default.12061.9\u8396947527\u52ac8b3ec6\uo.jpg__head_A2478171__3__7" >> 2016-01-03 07:30:01.600119 7f7fe2096700 15 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) getattrs >> 3.171s7_head/a2478171/default.12061.9_8396947527_52ac8b3ec6_o.jpg/head//3/18446744073709551615/7 >> 2016-01-03 07:30:01.604967 7f7fe2096700 10 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) -ERANGE, len is 494 >> 2016-01-03 07:30:01.604984 7f7fe2096700 10 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) -ERANGE, got 247 >> 2016-01-03 07:30:01.604986 7f7fe2096700 20 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting >> '_user.rgw.idtag' >> 2016-01-03 07:30:01.604996 7f7fe2096700 20 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting '_' >> 2016-01-03 07:30:01.605007 7f7fe2096700 20 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting >> 'snapset' >> 2016-01-03 07:30:01.605013 7f7fe2096700 20 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting >> '_user.rgw.manifest' >> 2016-01-03 07:30:01.605026 7f7fe2096700 20 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting >> 'hinfo_key' >> 2016-01-03 07:30:01.605042 7f7fe2096700 20 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting >> '_user.rgw.x-amz-meta-origin' >> 2016-01-03 07:30:01.605049 7f7fe2096700 20 >> filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting >> '_user.rgw.acl' >> >> >> This only happens on the clusters we turned on the verbose log >> (debug_osd/filestore=20). And we are running ceph v0.87. >> >> Thanks, >> Guang >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: OSD data files are OSD logs
IIRC, you are running giant. I think that's the log rotate dangling fd bug (not fixed in giant since giant is eol). Fixed upstream 8778ab3a1ced7fab07662248af0c773df759653d, firefly backport is b8e3f6e190809febf80af66415862e7c7e415214. -Sam On Mon, Jan 4, 2016 at 3:37 PM, Guang Yang wrote: > Hi Cephers, > Before I open a tracker, I would like check if it is a known issue or not.. > > One one of our clusters, there was OSD crash during repairing, the > crash happened after we issued a PG repair for inconsistent PGs, which > failed because the recorded file size (within xattr) mismatched with > the actual file size. > > The mismatch was caused by the fact that the content of the data file > are OSD logs, following is from osd.354 on c003: > > -rw-r--r-- 1 yahoo root 75168 Jan 3 07:30 > default.12061.9\u8396947527\u52ac8b3ec6\uo.jpg__head_A2478171__3__7 > -bash-4.1$ head > "default.12061.9\u8396947527\u52ac8b3ec6\uo.jpg__head_A2478171__3__7" > 2016-01-03 07:30:01.600119 7f7fe2096700 15 > filestore(/home/y/var/lib/ceph/osd/ceph-354) getattrs > 3.171s7_head/a2478171/default.12061.9_8396947527_52ac8b3ec6_o.jpg/head//3/18446744073709551615/7 > 2016-01-03 07:30:01.604967 7f7fe2096700 10 > filestore(/home/y/var/lib/ceph/osd/ceph-354) -ERANGE, len is 494 > 2016-01-03 07:30:01.604984 7f7fe2096700 10 > filestore(/home/y/var/lib/ceph/osd/ceph-354) -ERANGE, got 247 > 2016-01-03 07:30:01.604986 7f7fe2096700 20 > filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting > '_user.rgw.idtag' > 2016-01-03 07:30:01.604996 7f7fe2096700 20 > filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting '_' > 2016-01-03 07:30:01.605007 7f7fe2096700 20 > filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting > 'snapset' > 2016-01-03 07:30:01.605013 7f7fe2096700 20 > filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting > '_user.rgw.manifest' > 2016-01-03 07:30:01.605026 7f7fe2096700 20 > filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting > 'hinfo_key' > 2016-01-03 07:30:01.605042 7f7fe2096700 20 > filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting > '_user.rgw.x-amz-meta-origin' > 2016-01-03 07:30:01.605049 7f7fe2096700 20 > filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting > '_user.rgw.acl' > > > This only happens on the clusters we turned on the verbose log > (debug_osd/filestore=20). And we are running ceph v0.87. > > Thanks, > Guang > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
OSD data files are OSD logs
Hi Cephers,
Before I open a tracker, I would like to check whether this is a known issue.

On one of our clusters, there was an OSD crash during repair; the crash happened after we issued a PG repair for inconsistent PGs, which failed because the recorded file size (within the xattr) mismatched the actual file size.

The mismatch was caused by the fact that the contents of the data file are OSD logs; the following is from osd.354 on c003:

-rw-r--r-- 1 yahoo root 75168 Jan 3 07:30 default.12061.9\u8396947527\u52ac8b3ec6\uo.jpg__head_A2478171__3__7
-bash-4.1$ head "default.12061.9\u8396947527\u52ac8b3ec6\uo.jpg__head_A2478171__3__7"
2016-01-03 07:30:01.600119 7f7fe2096700 15 filestore(/home/y/var/lib/ceph/osd/ceph-354) getattrs 3.171s7_head/a2478171/default.12061.9_8396947527_52ac8b3ec6_o.jpg/head//3/18446744073709551615/7
2016-01-03 07:30:01.604967 7f7fe2096700 10 filestore(/home/y/var/lib/ceph/osd/ceph-354) -ERANGE, len is 494
2016-01-03 07:30:01.604984 7f7fe2096700 10 filestore(/home/y/var/lib/ceph/osd/ceph-354) -ERANGE, got 247
2016-01-03 07:30:01.604986 7f7fe2096700 20 filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting '_user.rgw.idtag'
2016-01-03 07:30:01.604996 7f7fe2096700 20 filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting '_'
2016-01-03 07:30:01.605007 7f7fe2096700 20 filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting 'snapset'
2016-01-03 07:30:01.605013 7f7fe2096700 20 filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting '_user.rgw.manifest'
2016-01-03 07:30:01.605026 7f7fe2096700 20 filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting 'hinfo_key'
2016-01-03 07:30:01.605042 7f7fe2096700 20 filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting '_user.rgw.x-amz-meta-origin'
2016-01-03 07:30:01.605049 7f7fe2096700 20 filestore(/home/y/var/lib/ceph/osd/ceph-354) fgetattrs 61 getting '_user.rgw.acl'

This only happens on the clusters where we turned on verbose logging (debug_osd/filestore=20). And we are running ceph v0.87.

Thanks,
Guang
Long peering - throttle at FileStore::queue_transactions
Hi Cephers,
Happy New Year! I have a question regarding the long PG peering.

Over the last several days I have been looking into the *long peering* problem when we start an OSD / OSD host. What I observed was that the two peering worker threads were throttled (stuck) when trying to queue new transactions (writing the pg log), so the peering process was dramatically slowed down.

The first question that came to me was: what were the transactions in the queue? The major ones, as I saw, included:

- The osd_map and incremental osd_map. This happens if the OSD had been down for a while (in a large cluster), or when the cluster got upgraded, which left the osd_map epoch the down OSD had far behind the latest osd_map epoch. During OSD boot, it would need to persist all those osd_maps and generate lots of filestore transactions (linear in the epoch gap).
> As the PG was not involved in most of those epochs, could we only take and persist those osd_maps which matter to the PGs on the OSD?

- There are lots of deletion transactions: as a PG boots, it needs to merge the PG log from its peers, and for each deletion PG log entry it queues the deletion transaction immediately.
> Could we delay queueing these transactions until all PGs on the host are peered?

Thanks,
Guang
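For readers unfamiliar with the throttle mentioned in the subject line: the effect is that of a bounded queue. Once the number of outstanding filestore ops (or bytes) reaches the configured limit, the submitting thread, here a peering worker calling queue_transactions(), blocks until the backend drains. The following is a generic, minimal sketch of that behaviour, not the actual FileStore code; the class and member names are placeholders:

    #include <condition_variable>
    #include <cstddef>
    #include <mutex>

    // Minimal stand-in for a throttle that blocks submitters once too many
    // ops are in flight; the real filestore throttle also counts bytes.
    class OpThrottle {
      std::mutex m;
      std::condition_variable cv;
      std::size_t in_flight = 0;
      const std::size_t max_ops;
    public:
      explicit OpThrottle(std::size_t max) : max_ops(max) {}

      // Called on the submission path (e.g. from a peering worker queueing
      // a pg log or deletion transaction); blocks while the queue is full.
      void get() {
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [this] { return in_flight < max_ops; });
        ++in_flight;
      }

      // Called when the backend finishes applying a transaction.
      void put() {
        std::lock_guard<std::mutex> l(m);
        --in_flight;
        cv.notify_one();
      }
    };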
Re: Re: Reboot blocked when undoing unmap op.
On Mon, Jan 4, 2016 at 10:51 AM, Wukongming wrote:
> Hi, Ilya,
>
> It is an old problem. When you say "when you issue a reboot, daemons get killed and the kernel client ends up waiting for them to come back, because of outstanding writes issued by umount called by systemd (or whatever)",
>
> do you mean that if the rbd is umounted successfully, the kernel client process will stop waiting? What kind of communication mechanism is there between libceph and the daemons (or ceph userspace)?

If you umount the filesystem on top of rbd and unmap the rbd image, there won't be anything to wait for. In fact, if there aren't any other rbd images mapped, libceph will clean up after itself and exit.

If you umount the filesystem on top of rbd but don't unmap the image, libceph will remain there, along with some amount of communication (keepalive messages, watch requests, etc). However, all of that is internal and is unlikely to block reboot.

If you don't umount the filesystem, your init system will try to umount it, issuing FS requests to the rbd device. We don't want to drop those requests, so, if the daemons are gone by then, libceph ends up blocking.

Thanks,
Ilya
ceph branch status
-- All Branches --
Abhishek Varshney: 2015-11-23 11:45:29 +0530 infernalis-backports
Adam C. Emerson: 2015-12-21 16:51:39 -0500 wip-cxx11concurrency
Adam Crume: 2014-12-01 20:45:58 -0800 wip-doc-rbd-replay
Alfredo Deza: 2015-03-23 16:39:48 -0400 wip-11212; 2015-12-23 11:25:13 -0500 wip-doc-style
Alfredo Deza: 2014-07-08 13:58:35 -0400 wip-8679; 2014-09-04 13:58:14 -0400 wip-8366; 2014-10-13 11:10:10 -0400 wip-9730
Ali Maredia: 2015-11-25 13:45:29 -0500 wip-10587-split-servers; 2015-12-23 12:01:46 -0500 wip-cmake; 2015-12-30 19:35:39 -0500 wip-cmake-rocksdb; 2016-01-01 00:13:31 -0500 wip-cmake-reorg
Barbora Ančincová: 2015-11-04 16:43:45 +0100 wip-doc-RGW
Boris Ranto: 2015-09-04 15:19:11 +0200 wip-bash-completion
Daniel Gryniewicz: 2015-11-11 09:06:00 -0500 wip-rgw-storage-class; 2015-12-09 12:56:37 -0500 cmake-dang
Danny Al-Gaaf: 2015-04-23 16:32:00 +0200 wip-da-SCA-20150421; 2015-04-23 17:18:57 +0200 wip-nosetests; 2015-04-23 18:20:16 +0200 wip-unify-num_objects_degraded; 2015-11-03 14:10:47 +0100 wip-da-SCA-20151029; 2015-11-03 14:40:44 +0100 wip-da-SCA-20150910
David Zafman: 2014-08-29 10:41:23 -0700 wip-libcommon-rebase; 2015-04-24 13:14:23 -0700 wip-cot-giant; 2015-09-28 11:33:11 -0700 wip-12983; 2015-12-22 16:19:25 -0800 wip-zafman-testing
Dongmao Zhang: 2014-11-14 19:14:34 +0800 thesues-master
Greg Farnum: 2015-04-29 21:44:11 -0700 wip-init-names; 2015-07-16 09:28:24 -0700 hammer-12297; 2015-10-02 13:00:59 -0700 greg-infernalis-lock-testing; 2015-10-02 13:09:05 -0700 greg-infernalis-lock-testing-cacher; 2015-10-07 00:45:24 -0700 greg-infernalis-fs; 2015-10-21 17:43:07 -0700 client-pagecache-norevoke; 2015-10-27 11:32:46 -0700 hammer-pg-replay; 2015-11-24 07:17:33 -0800 greg-fs-verify; 2015-12-11 00:24:40 -0800 greg-fs-testing
Greg Farnum: 2014-10-23 13:33:44 -0700 wip-forward-scrub
Guang G Yang: 2015-06-26 20:31:44 + wip-ec-readall; 2015-07-23 16:13:19 + wip-12316
Guang Yang: 2014-09-25 00:47:46 + wip-9008; 2015-10-20 15:30:41 + wip-13441
Haomai Wang: 2015-10-26 00:02:04 +0800 wip-13521
Haomai Wang: 2014-07-27 13:37:49 +0800 wip-flush-set; 2015-04-20 00:47:59 +0800 update-organization; 2015-07-21 19:33:56 +0800 fio-objectstore; 2015-08-26 09:57:27 +0800 wip-recovery-attr; 2015-10-24 23:39:07 +0800 fix-compile-warning
Hector Martin: 2015-12-03 03:07:02 +0900 wip-cython-rbd
Ilya Dryomov: 2014-09-05 16:15:10 +0400 wip-rbd-notify-errors
Ivo Jimenez: 2015-08-24 23:12:45 -0700 hammer-with-new-workunit-for-wip-12551
James Page: 2015-11-04 11:08:42 + javacruft-wip-ec-modules
Jason Dillaman: 2015-08-31 23:17:53 -0400 wip-12698; 2015-11-13 02:00:21 -0500 wip-11287-rebased
Jenkins: 2015-11-04 14:31:13 -0800 rhcs-v0.94.3-ubuntu
Jenkins: 2014-07-29 05:24:39 -0700 wip-nhm-hang; 2014-10-14 12:10:38 -0700 wip-2; 2015-02-02 10:35:28 -0800 wip-sam-v0.92; 2015-08-21 12:46:32 -0700 last; 2015-08-21 12:46:32 -0700 loic-v9.0.3; 2015-09-15 10:23:18 -0700 rhcs-v0.80.8; 2015-09-21 16:48:32 -0700 rhcs-v0.94.1-ubuntu
Joao Eduardo Luis: 2014-09-10 09:39:23 +0100 wip-leveldb-get.dumpling
Joao Eduardo Luis: 2014-07-22 15:41:42 +0100 wip-leveldb-misc
Joao Eduardo Luis: 2014-09-02 17:19:52 +0100 wip-leveldb-get; 2014-10-17 16:20:11 +0100 wip-paxos-fix; 2014-10-21 21:32:46 +0100 wip-9675.dumpling; 2015-07-27 21:56:42 +0100 wip-11470.hammer; 2015-09-09 15:45:45 +0100 wip-11786.hammer
Joao Eduardo Luis: 2014-11-17 16:43:53 + wip-mon-osdmap-cleanup; 2014-12-15 16:18:56 + wip-giant-mon-backports; 2014-12-17 17:13:57 + wip-mon-backports.firefly; 2014-12-17 23:15:10 + wip-mon-sync-fix.dumpling; 2015-01-07 23:01:00 + wip-mon-blackhole-mlog-0.87.7; 2015-01-10 02:40:42 + wip-dho-joao; 2015-01-10 02:46:31 + wip-mon-paxos-fix; 2015-01-26 13:00:09 + wip-mon-datahealth-fix; 2015-02-04 22:36:14 + wip-10643; 2015-09-09 15:43:51 +0100 wip-11786.firefly
Joao Eduardo Luis: 2015-05-27 23:48:45 +0100 wip-mon-scrub; 2015-05-29 12:21:43 +0100 wip-11545; 2015-06-05 16:12:57 +0100 wip-10507; 2015-06-16 14:34:11 +0100 wip-11470; 2015-06-25 00:16:41 +0100 wip-10507-2; 2015-07-14 16:52:35 +0100 wip-joao-testing; 2015-09-08 09:48:41 +0100 wip-leveldb-hang
Joe Julian: 2015-10-13 14:50:22 +
Re: Speeding up rbd_stat() in libvirt
On 04-01-16 16:38, Jason Dillaman wrote:
> Short term, assuming there wouldn't be an objection from the libvirt community, I think spawning a thread pool and concurrently executing several rbd_stat calls would be the easiest and cleanest solution. I wouldn't suggest trying to roll your own solution for retrieving image sizes for format 1 and 2 RBD images directly within libvirt.

I'll ask in the libvirt community if they allow such a thing.

> Longer term, given this use case, perhaps it would make sense to add an async version of rbd_open. The rbd_stat call itself just reads the data from memory initialized by rbd_open. On the Jewel branch, librbd has had some major rework and image loading is asynchronous under the hood already.

Hmm, that would be nice. In the callback I could call rbd_stat() and populate the volume list within libvirt. I would very much like to go that route since it saves me a lot of code inside libvirt ;)

Wido
Re: Speeding up rbd_stat() in libvirt
Short term, assuming there wouldn't be an objection from the libvirt community, I think spawning a thread pool and concurrently executing several rbd_stat calls would be the easiest and cleanest solution. I wouldn't suggest trying to roll your own solution for retrieving image sizes for format 1 and 2 RBD images directly within libvirt.

Longer term, given this use case, perhaps it would make sense to add an async version of rbd_open. The rbd_stat call itself just reads the data from memory initialized by rbd_open. On the Jewel branch, librbd has had some major rework and image loading is asynchronous under the hood already.

-- Jason Dillaman

- Original Message -
> From: "Wido den Hollander"
> To: ceph-devel@vger.kernel.org
> Sent: Monday, December 28, 2015 8:48:40 AM
> Subject: Speeding up rbd_stat() in libvirt
>
> Hi,
>
> The storage pools of libvirt have a mechanism called 'refresh' which will scan a storage pool to refresh its contents.
>
> The current implementation does:
> * List all images via rbd_list()
> * Call rbd_stat() on each image
>
> Source: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/storage/storage_backend_rbd.c;h=cdbfdee98505492407669130712046783223c3cf;hb=master#l329
>
> This works, but an RBD pool with 10k images takes a couple of minutes to scan.
>
> Now, Ceph is distributed, so this could be done in parallel, but before I start on this I was wondering if somebody had a good idea to fix this?
>
> I don't know if it is allowed in libvirt to spawn multiple threads and have workers do this, but it was something which came to mind.
>
> libvirt only wants to know the size of an image, and this is not stored in the rbd_directory object, so the rbd_stat() is required.
>
> Suggestions or ideas? I would like this process to be as fast as possible.
>
> Wido
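As a rough illustration of the short-term suggestion above (not libvirt code, and with error handling mostly omitted), a worker-thread sketch against the public librados/librbd C API could look like the following. The pool name "rbd" and the thread count are arbitrary assumptions, as is the assumption that sharing one ioctx across threads for open/stat/close is acceptable here:

    #include <rados/librados.h>
    #include <rbd/librbd.h>
    #include <cerrno>
    #include <cstdio>
    #include <string>
    #include <thread>
    #include <vector>

    // Stat one image: open it, read the header into rbd_image_info_t, close it.
    static void stat_image(rados_ioctx_t io, const std::string& name)
    {
      rbd_image_t image;
      if (rbd_open(io, name.c_str(), &image, NULL) < 0)
        return;
      rbd_image_info_t info;
      if (rbd_stat(image, &info, sizeof(info)) == 0)
        printf("%s: %llu bytes\n", name.c_str(), (unsigned long long)info.size);
      rbd_close(image);
    }

    int main()
    {
      rados_t cluster;
      rados_ioctx_t io;
      if (rados_create(&cluster, NULL) < 0 ||
          rados_conf_read_file(cluster, NULL) < 0 ||
          rados_connect(cluster) < 0)
        return 1;
      if (rados_ioctx_create(cluster, "rbd", &io) < 0)   // pool name assumed
        return 1;

      // rbd_list() fills a buffer with '\0'-separated image names; retry with
      // a bigger buffer if the first guess is too small.
      size_t len = 1024;
      std::vector<char> buf(len);
      int r = rbd_list(io, buf.data(), &len);
      if (r == -ERANGE) {
        buf.assign(len, '\0');
        r = rbd_list(io, buf.data(), &len);
      }
      std::vector<std::string> names;
      if (r >= 0) {
        for (size_t off = 0; off < len && buf[off] != '\0'; ) {
          names.emplace_back(&buf[off]);
          off += names.back().size() + 1;
        }
      }

      // Fan the rbd_stat() calls out over a few workers instead of statting
      // the images one by one.
      const size_t nthreads = 4;
      std::vector<std::thread> workers;
      for (size_t t = 0; t < nthreads; ++t)
        workers.emplace_back([&, t] {
          for (size_t i = t; i < names.size(); i += nthreads)
            stat_image(io, names[i]);
        });
      for (auto& w : workers)
        w.join();

      rados_ioctx_destroy(io);
      rados_shutdown(cluster);
      return 0;
    }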
Re: Create one million empty files with cephfs
On Tue, Dec 29, 2015 at 4:55 AM, Fengguang Gong wrote:
> hi,
> We create one million empty files through filebench. Here is the test env:
> MDS: one MDS
> MON: one MON
> OSD: two OSDs, each with one Intel P3700; data on the OSDs with 2x replication
> Network: all nodes are connected through a 10 gigabit network
>
> We use more than one client to create files, to test the scalability of the MDS. Here are the results:
> IOPS under one client: 850
> IOPS under two clients: 1150
> IOPS under four clients: 1180
>
> As we can see, the IOPS remains almost unchanged when the number of clients increases from 2 to 4.
>
> CephFS may have low scalability under one MDS, and we think it's the big lock in MDSDaemon::ms_dispatch()::Mutex::Locker (every request acquires this lock) which limits the scalability of the MDS.
>
> We think this big lock could be removed through the following steps: 1. separate the processing of ClientRequest from other requests, so we can parallelize the processing of ClientRequest; 2. use some finer-grained locks instead of the big lock to ensure consistency.
>
> Wondering whether this idea is reasonable?

Parallelizing the MDS is probably a very big job; it's on our radar but not for a while yet. If one were to do it, yes, breaking down the big MDS lock would be the way forward. I'm not sure entirely what that involves — you'd need to significantly chunk up the locking on our more critical data structures, most especially the MDCache. Luckily there is *some* help there in terms of the file cap locking structures we already have in place, but it's a *huge* project and not one to be undertaken lightly. A special processing mechanism for ClientRequests versus other requests is not an assumption I'd start with.

I think you'll find that file creates are just about the least scalable thing you can do on CephFS right now, though, so there is some easier ground. One obvious approach is to extend the current inode preallocation — it already allocates inodes per-client and has a fast path inside of the MDS for handing them back. It'd be great if clients were aware of that preallocation and could create files without waiting for the MDS to talk back to them! The issue with this is two-fold: 1) we need to update the cap flushing protocol to deal with files newly created by the client; 2) we need to handle all the backtrace stuff normally performed by the MDS on file create (which still needs to happen, on either the client or the server).

There's also clean-up in case of a client failure, but we've already got a model for that in how we figure out real file sizes and things based on max size. I think there's a ticket about this somewhere, but I can't find it off-hand...
-Greg
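A toy illustration of the inode preallocation idea Greg mentions (purely hypothetical names; the real client/MDS structures, cap flushing, and backtrace handling are far more involved): if the client already holds a range of inode numbers handed out by the MDS, a create could consume one locally instead of waiting for a round trip, and fall back to asking the MDS only when the range runs out.

    #include <cstdint>
    #include <optional>

    // Hypothetical: a contiguous run of inode numbers the MDS has reserved
    // for this client; a real implementation would track an interval set.
    struct PreallocatedInos {
      uint64_t next = 0;
      uint64_t end = 0;   // half-open range [next, end)

      // Take one preallocated inode number, or nothing if the range is
      // exhausted and the client must ask the MDS for more.
      std::optional<uint64_t> take() {
        if (next >= end)
          return std::nullopt;
        return next++;
      }
    };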
Is rbd map/unmap op. configured like an event?
Hi All,

Is the rbd map/unmap operation configured as an event in the /etc/init directory, so that we can use the init system (upstart) to manage it automatically?

- wukongming ID: 12019 Tel: 0571-86760239 Dept: 2014 UIS2 ONEStor

This e-mail and its attachments contain confidential information from H3C, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!