Large time shift causes OSD to hit suicide timeout and ABRT
Hello,

Not sure if this matches any real-world problem:

step time server 192.168.10.125 offset 30763065.968946 sec

#0  0x7f2d0294d405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f2d02950b5b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x7f2d0324b875 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x7f2d03249996 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x7f2d032499c3 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x7f2d03249bee in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x0090d2fa in ceph::__ceph_assert_fail (assertion=0xa38ab1 "0 == \"hit suicide timeout\"", file=<optimized out>, line=79, func=0xa38c60 "bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)") at common/assert.cc:77
#7  0x0087914b in ceph::HeartbeatMap::_check (this=this@entry=0x35b40e0, h=h@entry=0x36d1050, who=who@entry=0xa38aef "reset_timeout", now=now@entry=1380797379) at common/HeartbeatMap.cc:79
#8  0x0087940e in ceph::HeartbeatMap::reset_timeout (this=0x35b40e0, h=0x36d1050, grace=15, suicide_grace=150) at common/HeartbeatMap.cc:89
#9  0x0070ada7 in OSD::process_peering_events (this=0x375, pgs=..., handle=...) at osd/OSD.cc:6808
#10 0x0074c2e4 in OSD::PeeringWQ::_process (this=<optimized out>, pgs=..., handle=...) at osd/OSD.h:869
#11 0x00903dca in ThreadPool::worker (this=0x3750478, wt=0x4ef6fa80) at common/WorkQueue.cc:119
#12 0x00905070 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:316
#13 0x7f2d046c2e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x7f2d02a093dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x in ?? ()

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Build failure after merge of ceph tree
After merging the ceph tree into -next an x86 allmodconfig build fails with:

fs/ceph/file.c: In function ‘ceph_sync_read’:
fs/ceph/file.c:437:25: error: ‘struct iov_iter’ has no member named ‘iov’
   void __user *data = i->iov[0].iov_base + i->iov_offset;
                        ^
fs/ceph/file.c:438:18: error: ‘struct iov_iter’ has no member named ‘iov’
   size_t len = i->iov[0].iov_len - i->iov_offset;
                 ^
fs/ceph/file.c:470:26: error: ‘struct iov_iter’ has no member named ‘iov’
    void __user *data = i->iov[0].iov_base
                         ^
In file included from include/linux/cache.h:4:0,
                 from include/linux/time.h:4,
                 from include/linux/stat.h:18,
                 from include/linux/module.h:10,
                 from fs/ceph/file.c:3:
fs/ceph/file.c:472:14: error: ‘struct iov_iter’ has no member named ‘iov’
    l = min(i->iov[0].iov_len - i->iov_offset,
             ^
include/linux/kernel.h:670:9: note: in definition of macro ‘min’
  typeof(x) _min1 = (x);   \
         ^
fs/ceph/file.c:472:14: error: ‘struct iov_iter’ has no member named ‘iov’
    l = min(i->iov[0].iov_len - i->iov_offset,
             ^
include/linux/kernel.h:670:21: note: in definition of macro ‘min’
  typeof(x) _min1 = (x);   \
                     ^
include/linux/kernel.h:672:17: warning: comparison of distinct pointer types lacks a cast [enabled by default]
  (void) (&_min1 == &_min2);  \
                 ^
fs/ceph/file.c:472:9: note: in expansion of macro ‘min’
    l = min(i->iov[0].iov_len - i->iov_offset,
        ^
fs/ceph/file.c: In function ‘ceph_sync_direct_write’:
fs/ceph/file.c:586:24: error: ‘struct iov_iter’ has no member named ‘iov’
   void __user *data = i.iov->iov_base + i.iov_offset;
                        ^
fs/ceph/file.c:587:14: error: ‘struct iov_iter’ has no member named ‘iov’
   u64 len = i.iov->iov_len - i.iov_offset;
              ^
make[2]: *** [fs/ceph/file.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [fs/ceph] Error 2

Caused by commits 53d028160 ("ceph: implement readv/preadv for sync operation") and 2f0a7a180 ("ceph: Implement writev/pwritev for sync operation") interacting with commit f6794d33a5ec ("iov_iter: hide iovec details behind ops function pointers") from the aio-direct tree.
I extended Stephen's previous fix to this:

From 577435f0a97e67b735f355aef0ef55732814818c Mon Sep 17 00:00:00 2001
From: Mark Brown <broo...@linaro.org>
Date: Thu, 3 Oct 2013 13:05:20 +0100
Subject: [PATCH] ceph: Fix up for iov_iter changes

Extend an earlier fixup by Stephen Rothwell.

Signed-off-by: Mark Brown <broo...@linaro.org>
---
 fs/ceph/file.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c4419e8..37b5b5c 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -434,8 +434,8 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *i,
 
 	if (file->f_flags & O_DIRECT) {
 		while (iov_iter_count(i)) {
-			void __user *data = i->iov[0].iov_base + i->iov_offset;
-			size_t len = i->iov[0].iov_len - i->iov_offset;
+			void __user *data = iov_iter_iovec(i)->iov_base + i->iov_offset;
+			size_t len = iov_iter_iovec(i)->iov_len - i->iov_offset;
 
 			num_pages = calc_pages_for((unsigned long)data, len);
 			pages = ceph_get_direct_page_vector(data,
@@ -467,9 +467,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *i,
 		size_t left = len = ret;
 
 		while (left) {
-			void __user *data = i->iov[0].iov_base
+			void __user *data = iov_iter_iovec(i)->iov_base
 						+ i->iov_offset;
-			l = min(i->iov[0].iov_len - i->iov_offset,
+			l = min(iov_iter_iovec(i)->iov_len - i->iov_offset,
 				left);
 
 			ret = ceph_copy_page_vector_to_user(pages[k],
@@ -583,8 +583,8 @@ ceph_sync_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	iov_iter_init(&i, iov, nr_segs, count, 0);
 
 	while (iov_iter_count(&i) > 0) {
-		void __user *data = i.iov->iov_base + i.iov_offset;
-		u64 len = i.iov->iov_len - i.iov_offset;
+		void __user *data = iov_iter_iovec(&i)->iov_base + i.iov_offset;
+		u64 len = iov_iter_iovec(&i)->iov_len - i.iov_offset;
 
 		page_align = (unsigned long)data & ~PAGE_MASK;
-- 
1.8.4.rc3
Re: [ceph-users] ceph-create-keys hung
On 10/03/2013 02:44 PM, Abhay Sachan wrote:
> Hi All,
> I have tried setting up a ceph cluster with 3 nodes (3 monitors). I am
> using RHEL 6.4 as OS with the dumpling (0.67.3) release. During cluster
> creation (using ceph-deploy as well as mkcephfs), ceph-create-keys
> doesn't return on any of the servers. Whereas, if I create a cluster
> with only 1 node (1 monitor), key creation goes through. Has anybody
> seen this problem, or any ideas what I might be missing?
> Regards,
> Abhay

Those symptoms tell me that your monitors are not forming quorum.
'ceph-create-keys' needs the monitors to first establish a quorum,
otherwise it will hang waiting for that to happen.

Please make sure all your monitors are running. If so, try running
'ceph -s' on your cluster. If that hangs as well, try accessing each
monitor's admin socket to check what's happening [1]. If that too fails,
try looking into the logs for something obviously wrong. If you are not
able to discern anything useful at that point, upload the logs to some
place and point us to them -- we'll then be happy to take a look.

Hope this helps.

  -Joao

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
Re: Large time shift causes OSD to hit suicide timeout and ABRT
On Thu, 3 Oct 2013, Andrey Korolyov wrote:
> Hello,
>
> Not sure if this matches any real-world problem:
>
> step time server 192.168.10.125 offset 30763065.968946 sec

Heh.. yeah, we use timestamps in lots of places for things like
timeouts. Small time steps are fine but big ones can easily cause
problems.

sage

> #0  0x7f2d0294d405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> [...full backtrace quoted in the original report above...]
> #15 0x in ?? ()
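Sage's point can be illustrated with a toy model (Python; the class and names below are hypothetical stand-ins for the real check in ceph::HeartbeatMap::_check): a timeout measured against a steppable wall clock fires spuriously after a large step, while one measured against a monotonic clock does not.

```python
import time

SUICIDE_GRACE = 150  # seconds, matching suicide_grace=150 in the backtrace

class Heartbeat:
    """Toy suicide-timeout check driven by an injectable clock source."""
    def __init__(self, clock):
        self.clock = clock
        self.last_reset = clock()

    def reset(self):
        self.last_reset = self.clock()

    def check(self):
        # Healthy iff the last reset was within the grace period.
        return self.clock() - self.last_reset < SUICIDE_GRACE

# With a wall-clock source, a step like the ntpd one above
# (offset 30763065.968946 sec) instantly "expires" every timeout:
now = [1380797379.0]            # fake wall clock we can step
hb = Heartbeat(lambda: now[0])
now[0] += 30763065.968946       # time server steps the clock forward
assert not hb.check()           # hit suicide timeout -> assert/abort

# A monotonic clock is immune to steps, which is why timeouts are
# generally better measured against one:
hb2 = Heartbeat(time.monotonic)
assert hb2.check()              # no spurious expiry regardless of wall time
```

This is only a sketch of the failure mode, not the actual OSD code path.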
Re: hung task on page invalidate
Milosz Tanski <mil...@adfin.com> wrote:
> Stores : ops=10257 run=67477 pgs=57220 rxd=62216 olm=14

I think this line probably shows the problem. The olm=14 indicates that
14 pages were found over the store limit set on the object.

Look in fscache_write_op() for:

	if (page->index > op->store_limit) {
		fscache_stat(&fscache_n_store_pages_over_limit);
		goto superseded;
	}

If we find a page that's over the store limit, we immediately abandon
the storage attempt - which is wrong. We need to do something similar
to fscache_end_page_write() but clearing COOKIE_PENDING_TAG - and then
we need to continue and clear all pages over the limit.

David
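A toy Python model (not kernel code; the function and variable names are illustrative) of the behaviour David describes: abandoning the storage attempt at the first over-limit page leaves every later page's pending state set, matching the 14 stuck pages suggested by olm=14, while clearing all over-limit pages lets the waiters proceed.

```python
# 'pending' maps page index -> whether a writer is still waiting on it
# (standing in for PG_fscache / COOKIE_PENDING_TAG in the real code).

def write_op_buggy(pending, store_limit):
    # Mirrors the quoted fscache_write_op() logic: on the first page over
    # the store limit the whole attempt is abandoned ("goto superseded"),
    # so every remaining over-limit page keeps its pending state forever.
    for index in sorted(pending):
        if index > store_limit:
            return              # abandon; later waiters hang
        pending[index] = False

def write_op_fixed(pending, store_limit):
    # The suggested behaviour: continue past the limit and clear the
    # pending state of over-limit pages too, so waiters are woken.
    for index in sorted(pending):
        # under the limit: stored; over the limit: storage is skipped,
        # but the pending tag is cleared either way
        pending[index] = False

pages = {i: True for i in range(20)}
write_op_buggy(pages, store_limit=5)
assert sum(pages.values()) == 14     # 14 pages left pending -> hung task

pages = {i: True for i in range(20)}
write_op_fixed(pages, store_limit=5)
assert not any(pages.values())       # everyone gets woken
```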
A call for teuthology users
Hi Ross Patrick,

During tonight's talk about teuthology I suggested that, if you think
it's appropriate, you could say a word about it during your talk in
London next week. Although teuthology is still rough around the edges,
it's definitely usable and useful, even outside Inktank. If you could
invite people to join our weekly meeting and/or try to install and run
teuthology, that may help grow the user base.

What do you think?

--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
Weekly teuthology meeting #3
Hi,

Today at 8pm Paris time / CEST was the third teuthology meeting. The
general idea of these meetings is to get together developers using
teuthology (or willing to use teuthology) outside of Inktank and people
using it inside Inktank. The goal is to help transition from an internal
tool to something that's installable and upgradeable by any Ceph
developer.

The next meeting will be held at 6pm Paris time / CEST on Wednesday,
October 9th, 2013. The IRC channel is irc.oftc.net#ceph-devel and the
conference room is mumble.upstream-university.org
( http://www.mumble.com/mumble-download.php ). It is timeboxed to one
hour. This edition will be immediately after the Ceph day
http://cephdaylondon-eorg.eventbrite.com/ and I'll do my best to recruit
participants :-)

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
a couple hot/cold classification storage papers
http://131.107.65.14/pubs/176690/ColdDataClassification-icde2013-cr.pdf

- identify hot/cold records for an in-memory database
- in-memory LRU is discarded out of hand due to overhead
- they do a simple log (or log a sample of, say, 10% of accesses) and
  present various algorithms for estimating the K hottest items from that.
- their 'backward' algorithm scans the log in reverse chronological
  order. once it figures out no further items can be found that compete
  with what is hottest so far, it can terminate early.
- they seem to assume that every record is in the log, or that anything
  not in the log is already known cold and not of interest. so, not
  quite the same problem as us unless we log for all time.

Thought: We could only trim a hitset/bloom filter/whatever once every
hash key that appears in that set but not later sets has been
demoted/purged. In our case, that could mean:

- an initial pass that enumerates all objects and pushes untouched stuff
  (as we've previously discussed)
- thereafter, the agent scans from 0..2^32 and enumerates any hash
  values appearing in the oldest sets but not newer ones and only pushes
  those down.

Not sure how tractable that might be. If we explicitly listed object
names in each hitset it would certainly work.

---

http://dmclab.hanyang.ac.kr/wikidata/ssd/2012_ssd_seminar/MSST_2011/HotDataIdentification_DongchulPark_MSST_2011.pdf

- identify hot data in an SSD
- bloom filters because DRAM is precious (and mostly needed for the FTL)
- round-robin set of bloom filters
- estimate both frequency (how many BFs does an item appear in) and
  recency (oldest/newest access)

Thoughts:

- Any DRAM not spent on hot/cold tracking is spent on caching, which
  improves performance.
- We could use counting bloom filters. Although that may not be that
  useful if we have multiple bins and can count how many bins accesses
  appear in.
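The round-robin bloom filter scheme from the second paper can be sketched like this (Python; the class names and sizing parameters are illustrative, not from the paper): each filter covers one time epoch, frequency is estimated by how many filters an item appears in, and rotating ages out stale accesses.

```python
import hashlib
from collections import deque

class Bloom:
    """Minimal bloom filter over an integer bitmap."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.bitmap = 0

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.bitmap |= 1 << p

    def __contains__(self, key):
        return all(self.bitmap >> p & 1 for p in self._positions(key))

class HotTracker:
    """Round-robin set of bloom filters: record into the newest filter;
    estimate frequency as the number of filters an item appears in."""
    def __init__(self, nfilters=4, **bloom_kw):
        self.filters = deque(maxlen=nfilters)
        self.bloom_kw = bloom_kw
        self.rotate()

    def rotate(self):
        # Called periodically; the oldest filter falls off the end,
        # aging out accesses that are no longer recent.
        self.filters.appendleft(Bloom(**self.bloom_kw))

    def record(self, key):
        self.filters[0].add(key)

    def frequency(self, key):
        return sum(key in f for f in self.filters)

t = HotTracker(nfilters=4)
t.record("obj_a"); t.rotate()
t.record("obj_a"); t.record("obj_b"); t.rotate()
assert t.frequency("obj_a") == 2   # seen in two epochs -> hotter
assert t.frequency("obj_b") == 1   # (modulo bloom false positives)
```

The recency estimate from the paper falls out of the same structure: the index of the newest filter containing the key.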
xattr limits
I want to record with the ceph-devel archive results from testing the
limits of xattrs for Linux filesystems used with Ceph.

A script creates xattrs with names user.test1, user.test2, ... on a
single file, on a 3.10 Linux kernel.

ext4:

  value bytes    number of entries
  1              148
  16             103
  256            14
  512            7
  1024           3
  4036           1

Beyond this you immediately get ENOSPC.

btrfs:

  value bytes    number of entries
  8              10k
  16             10k
  32             10k
  64             10k
  128            10k
  256            10k
  512            10k   (slow but worked; 1,000,000 got completely hung
                       for minutes at a time during removal -- strace
                       showed no forward progress)
  1024           10k
  2048           10k
  3096           10k

Beyond this you start getting ENOSPC after fewer entries.

xfs (entries limited to 1k due to an xfs crash with 10k entries):

  value bytes    number of entries
  1              1k
  8              1k
  16             1k
  32             1k
  64             1k
  128            1k
  256            1k
  512            1k
  1024           1k
  2048           1k
  4096           1k
  8192           1k
  16384          1k
  32768          1k
  65536          1k
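A minimal sketch of the kind of script described above (Python, assuming Linux and os.setxattr; the path in the usage note and the helper names are illustrative, not the original script): it sets user.test1, user.test2, ... with values of a given size until the filesystem returns ENOSPC.

```python
import errno
import os

def attr_name(i):
    # Names follow the pattern used in the tests above: user.test1, user.test2, ...
    return f"user.test{i}"

def fill_xattrs(path, value_bytes, max_entries=10_000):
    """Set xattrs on `path` until ENOSPC; return how many fit."""
    value = b"x" * value_bytes
    for i in range(1, max_entries + 1):
        try:
            os.setxattr(path, attr_name(i), value)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return i - 1
            raise
    return max_entries

assert attr_name(3) == "user.test3"

# Usage, on a real ext4/btrfs/xfs mount (hypothetical path):
#   fill_xattrs("/mnt/ext4/testfile", 256)
# The ext4 table above reports 14 entries at 256-byte values.
```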