Large time shift causes OSD to hit suicide timeout and ABRT

2013-10-03 Thread Andrey Korolyov
Hello,

Not sure if this matches any real-world problem:

step time server 192.168.10.125 offset 30763065.968946 sec

#0  0x7f2d0294d405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f2d02950b5b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x7f2d0324b875 in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x7f2d03249996 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x7f2d032499c3 in std::terminate() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x7f2d03249bee in __cxa_throw () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x0090d2fa in ceph::__ceph_assert_fail (assertion=0xa38ab1
"0 == \"hit suicide timeout\"", file=<optimized out>, line=79,
func=0xa38c60 "bool
ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*,
time_t)") at common/assert.cc:77
#7  0x0087914b in ceph::HeartbeatMap::_check
(this=this@entry=0x35b40e0, h=h@entry=0x36d1050,
who=who@entry=0xa38aef "reset_timeout", now=now@entry=1380797379)
at common/HeartbeatMap.cc:79
#8  0x0087940e in ceph::HeartbeatMap::reset_timeout
(this=0x35b40e0, h=0x36d1050, grace=15, suicide_grace=150) at
common/HeartbeatMap.cc:89
#9  0x0070ada7 in OSD::process_peering_events (this=0x375,
pgs=..., handle=...) at osd/OSD.cc:6808
#10 0x0074c2e4 in OSD::PeeringWQ::_process (this=<optimized out>,
pgs=..., handle=...) at osd/OSD.h:869
#11 0x00903dca in ThreadPool::worker (this=0x3750478,
wt=0x4ef6fa80) at common/WorkQueue.cc:119
#12 0x00905070 in ThreadPool::WorkThread::entry
(this=<optimized out>) at common/WorkQueue.h:316
#13 0x7f2d046c2e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#14 0x7f2d02a093dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x0000000000000000 in ?? ()


Build failure after merge of ceph tree

2013-10-03 Thread Mark Brown
After merging the ceph tree into -next an x86 allmodconfig build fails
with:

fs/ceph/file.c: In function ‘ceph_sync_read’:
fs/ceph/file.c:437:25: error: ‘struct iov_iter’ has no member named ‘iov’
  void __user *data = i->iov[0].iov_base + i->iov_offset;
                         ^
fs/ceph/file.c:438:18: error: ‘struct iov_iter’ has no member named ‘iov’
  size_t len = i->iov[0].iov_len - i->iov_offset;
                  ^
fs/ceph/file.c:470:26: error: ‘struct iov_iter’ has no member named ‘iov’
   void __user *data = i->iov[0].iov_base
                          ^
In file included from include/linux/cache.h:4:0,
                 from include/linux/time.h:4,
                 from include/linux/stat.h:18,
                 from include/linux/module.h:10,
                 from fs/ceph/file.c:3:
fs/ceph/file.c:472:14: error: ‘struct iov_iter’ has no member named ‘iov’
   l = min(i->iov[0].iov_len - i->iov_offset,
              ^
include/linux/kernel.h:670:9: note: in definition of macro ‘min’
   typeof(x) _min1 = (x);   \
           ^
fs/ceph/file.c:472:14: error: ‘struct iov_iter’ has no member named ‘iov’
   l = min(i->iov[0].iov_len - i->iov_offset,
              ^
include/linux/kernel.h:670:21: note: in definition of macro ‘min’
   typeof(x) _min1 = (x);   \
                     ^
include/linux/kernel.h:672:17: warning: comparison of distinct pointer types lacks a cast [enabled by default]
   (void) (_min1 == _min2);  \
                 ^
fs/ceph/file.c:472:9: note: in expansion of macro ‘min’
   l = min(i->iov[0].iov_len - i->iov_offset,
         ^
fs/ceph/file.c: In function ‘ceph_sync_direct_write’:
fs/ceph/file.c:586:24: error: ‘struct iov_iter’ has no member named ‘iov’
    void __user *data = i.iov->iov_base + i.iov_offset;
                        ^
fs/ceph/file.c:587:14: error: ‘struct iov_iter’ has no member named ‘iov’
    u64 len = i.iov->iov_len - i.iov_offset;
              ^
make[2]: *** [fs/ceph/file.o] Error 1
make[2]: *** Waiting for unfinished jobs
make[1]: *** [fs/ceph] Error 2

Caused by commits 53d028160 (ceph: implement readv/preadv for sync
operation) and 2f0a7a180 (ceph: Implement writev/pwritev for sync
operation) interacting with commit f6794d33a5ec (iov_iter: hide iovec
details behind ops function pointers) from the aio-direct tree.

I extended Stephen's previous fix to this:

From 577435f0a97e67b735f355aef0ef55732814818c Mon Sep 17 00:00:00 2001
From: Mark Brown broo...@linaro.org
Date: Thu, 3 Oct 2013 13:05:20 +0100
Subject: [PATCH] ceph: Fix up for iov_iter changes

Extend an earlier fixup by Stephen Rothwell.

Signed-off-by: Mark Brown broo...@linaro.org
---
 fs/ceph/file.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c4419e8..37b5b5c 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -434,8 +434,8 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *i,
 
 	if (file->f_flags & O_DIRECT) {
 		while (iov_iter_count(i)) {
-			void __user *data = i->iov[0].iov_base + i->iov_offset;
-			size_t len = i->iov[0].iov_len - i->iov_offset;
+			void __user *data = iov_iter_iovec(i)->iov_base + i->iov_offset;
+			size_t len = iov_iter_iovec(i)->iov_len - i->iov_offset;
 
 			num_pages = calc_pages_for((unsigned long)data, len);
 			pages = ceph_get_direct_page_vector(data,
@@ -467,9 +467,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *i,
 			size_t left = len = ret;
 
 			while (left) {
-				void __user *data = i->iov[0].iov_base
+				void __user *data = iov_iter_iovec(i)->iov_base
 							+ i->iov_offset;
-				l = min(i->iov[0].iov_len - i->iov_offset,
+				l = min(iov_iter_iovec(i)->iov_len - i->iov_offset,
 					left);
 
 				ret = ceph_copy_page_vector_to_user(pages[k],
@@ -583,8 +583,8 @@ ceph_sync_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	iov_iter_init(&i, iov, nr_segs, count, 0);
 
 	while (iov_iter_count(&i) > 0) {
-		void __user *data = i.iov->iov_base + i.iov_offset;
-		u64 len = i.iov->iov_len - i.iov_offset;
+		void __user *data = iov_iter_iovec(&i)->iov_base + i.iov_offset;
+		u64 len = iov_iter_iovec(&i)->iov_len - i.iov_offset;
 
 		page_align = (unsigned long)data & ~PAGE_MASK;
 
-- 
1.8.4.rc3


Re: [ceph-users] ceph-create-keys hung

2013-10-03 Thread Joao Eduardo Luis

On 10/03/2013 02:44 PM, Abhay Sachan wrote:

Hi All,
I have tried setting up a Ceph cluster with 3 nodes (3 monitors). I am
using RHEL 6.4 as the OS with the dumpling (0.67.3) release. During
cluster creation (using ceph-deploy as well as mkcephfs),
ceph-create-keys doesn't return on any of the servers, whereas if I
create a cluster with only 1 node (1 monitor), key creation goes
through. Has anybody seen this problem, or does anyone have an idea of
what I might be missing?

Regards,
Abhay


Those symptoms tell me that your monitors are not forming quorum. 
'ceph-create-keys' needs the monitors to first establish a quorum, 
otherwise it will hang waiting for that to happen.


Please make sure all your monitors are running.  If so, try running 
'ceph -s' on your cluster.  If that hangs as well, try accessing each 
monitor's admin socket to check what's happening [1].  If that too 
fails, try looking into the logs for something obviously wrong.  If you 
are not able to discern anything useful at that point, upload the logs 
to some place and point us to them -- we'll then be happy to take a look.
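
For a dumpling-era monitor, querying the admin socket looks something
like this (the socket path below is the default and may differ on your
install):

    ceph --admin-daemon /var/run/ceph/ceph-mon.<id>.asok mon_status

The "state" field in the output shows whether that monitor is probing,
electing, synchronizing, or in quorum, and the "quorum" field lists the
monitors it currently believes are in quorum.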


Hope this helps.

  -Joao

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Large time shift causes OSD to hit suicide timeout and ABRT

2013-10-03 Thread Sage Weil
On Thu, 3 Oct 2013, Andrey Korolyov wrote:
 Hello,
 
 Not sure if this matches any real-world problem:
 
 step time server 192.168.10.125 offset 30763065.968946 sec

Heh.. yeah, we use timestamps in lots of places for things like timeouts.
Small time steps are fine, but big ones can easily cause problems.
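
To illustrate the failure mode, a minimal sketch (illustrative only, not
the actual ceph::HeartbeatMap code):

    /* Each worker thread periodically "touches" its heartbeat handle,
     * which recomputes a wall-clock deadline.  A watchdog then compares
     * time(NULL) against that deadline.  If the clock steps forward by
     * ~30 million seconds between the touch and the check, the deadline
     * is instantly far in the past and the suicide assert fires even
     * though the thread is healthy. */
    #include <assert.h>
    #include <time.h>

    struct hb_handle {
        time_t suicide_deadline;        /* last touch + suicide_grace */
    };

    void hb_touch(struct hb_handle *h, time_t suicide_grace)
    {
        h->suicide_deadline = time(NULL) + suicide_grace;
    }

    void hb_check(const struct hb_handle *h)
    {
        time_t now = time(NULL);        /* wall clock: steps with NTP */
        assert(now <= h->suicide_deadline && "hit suicide timeout");
    }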

sage


Re: hung task on page invalidate

2013-10-03 Thread David Howells
Milosz Tanski mil...@adfin.com wrote:

 Stores : ops=10257 run=67477 pgs=57220 rxd=62216 olm=14

I think this line probably shows the problem.  The olm=14 indicates that 14
pages were found over the store limit set on the object.  Look in
fscache_write_op() for:

	if (page->index >= op->store_limit) {
		fscache_stat(&fscache_n_store_pages_over_limit);
		goto superseded;
	}

If we find a page that's over the store limit, we immediately abandon the
storage attempt - which is wrong.  We need to do something similar to
fscache_end_page_write() but clearing COOKIE_PENDING_TAG - and then we need to
continue and clear all pages over the limit.
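
A toy model of the shape of that fix (hypothetical toy_page type; this is
not fscache code):

    #include <stdbool.h>
    #include <stddef.h>

    struct toy_page { unsigned long index; bool write_pending; };

    /* Buggy shape: bail out at the first over-limit page; every later
     * page stays marked write_pending and anyone waiting on it hangs. */
    void store_buggy(struct toy_page *p, size_t n, unsigned long limit)
    {
        for (size_t i = 0; i < n; i++) {
            if (p[i].index >= limit)
                return;                  /* p[i..n-1] never completed */
            p[i].write_pending = false;  /* written and completed */
        }
    }

    /* Fixed shape: over-limit pages are not written, but each one is
     * still completed (cf. fscache_end_page_write() plus clearing
     * COOKIE_PENDING_TAG), so the loop always drains the whole set. */
    void store_fixed(struct toy_page *p, size_t n, unsigned long limit)
    {
        for (size_t i = 0; i < n; i++)
            p[i].write_pending = false;  /* written, or abandoned+ended */
    }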

David


A call for teuthology users

2013-10-03 Thread Loic Dachary
Hi Ross & Patrick,

During tonight's talk about teuthology I suggested that, if you think it's
appropriate, you could say a word about it during your talk in London next
week. Although teuthology is still rough around the edges, it's definitely
usable and useful, even outside Inktank. If you could invite people to join
our weekly meeting and/or try to install and run teuthology, that may help
grow the user base.

What do you think?

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.


Weekly teuthology meeting #3

2013-10-03 Thread Loic Dachary
Hi,

Today at 8pm Paris time / CEST we held the third teuthology meeting. The
general idea of these meetings is to bring together developers using
teuthology (or willing to use teuthology) outside of Inktank and people
using it inside Inktank. The goal is to help the transition from an
internal tool to something that is installable and upgradeable by any Ceph
developer.

The next meeting will be held at 6pm Paris time / CEST on Wednesday,
October 9th, 2013. The IRC channel is irc.oftc.net#ceph-devel and the
conference room is mumble.upstream-university.org
( http://www.mumble.com/mumble-download.php ). It is timeboxed to one hour.

This edition will be immediately after the Ceph Day in London
( http://cephdaylondon-eorg.eventbrite.com/ ) and I'll do my best to
recruit participants :-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.


a couple hot/cold classification storage papers

2013-10-03 Thread Sage Weil
http://131.107.65.14/pubs/176690/ColdDataClassification-icde2013-cr.pdf

 - identifies hot/cold records for an in-memory database
 - in-memory LRU is discarded out of hand due to overhead
 - they keep a simple log (or log a sample of, say, 10% of accesses) and
present various algorithms for estimating the K hottest items from that.
 - their 'backward' algorithm scans the log in reverse chronological
order.  Once it figures out that no further items could compete with the
hottest found so far, it can terminate early.
 - they seem to assume that every record is in the log, or that anything
not in the log is already known to be cold and not of interest.  So it's
not quite the same problem as ours unless we log for all time.
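
A toy sketch of the backward idea, under simplifying assumptions (hotness
= exponentially decayed access count; the per-record upper-bound
bookkeeping from the paper is omitted, and all names are made up):

    #include <math.h>
    #include <stdio.h>
    #include <string.h>

    #define NREC 8    /* record ids 0..7 */
    #define K    2    /* how many hottest records we want */

    /* descending insertion sort; fine at toy sizes */
    static void sort_desc(double *a, int n)
    {
        for (int i = 1; i < n; i++)
            for (int j = i; j > 0 && a[j] > a[j-1]; j--) {
                double t = a[j]; a[j] = a[j-1]; a[j-1] = t;
            }
    }

    int main(void)
    {
        int log[] = {0, 1, 2, 1, 3, 1, 2, 5, 1, 1, 5, 1}; /* oldest..newest */
        int n = (int)(sizeof(log) / sizeof(log[0]));
        double decay = 0.5, score[NREC] = {0};
        int scanned = 0;

        for (int age = 0; age < n; age++) {
            score[log[n - 1 - age]] += pow(decay, age);   /* newest first */
            scanned = age + 1;

            /* geometric tail bound: the most a record we have NOT yet
             * seen could still earn from the unread, older suffix */
            double tail = pow(decay, age + 1) / (1.0 - decay);

            double tmp[NREC];
            memcpy(tmp, score, sizeof(tmp));
            sort_desc(tmp, NREC);
            if (tmp[K - 1] > tail)   /* K-th best beats any unseen record */
                break;
        }
        printf("stopped after %d of %d log entries\n", scanned, n);
        return 0;
    }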

Thought:
 We could only trim a hitset/bloom filter/whatever once every hash key
 that appears in that set but not in later sets has been demoted/purged.
 In our case, that could mean:

  - an initial pass that enumerates all objects and pushes untouched
    stuff (as we've previously discussed)
  - thereafter, the agent scans from 0..2^32 and enumerates any hash
    values appearing in the oldest sets but not newer ones, and only
    pushes those down.

 Not sure how tractable that might be.  If we explicitly listed object
 names in each hitset it would certainly work.
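
A toy model of that second bullet, assuming hitsets are explicit lists of
hash values (with bloom filters you could only test membership, not
enumerate):

    #include <stdint.h>
    #include <stdio.h>

    static int contains(const uint32_t *set, int n, uint32_t h)
    {
        for (int i = 0; i < n; i++)
            if (set[i] == h)
                return 1;
        return 0;
    }

    int main(void)
    {
        /* hash values recorded in three consecutive hitsets */
        uint32_t oldest[] = {11, 42, 99, 512};
        uint32_t newer1[] = {42, 777};
        uint32_t newer2[] = {99};

        /* demotion candidates: in the oldest set, in none of the newer */
        for (int i = 0; i < 4; i++) {
            uint32_t h = oldest[i];
            if (!contains(newer1, 2, h) && !contains(newer2, 1, h))
                printf("push down objects hashing to %u\n", h);
        }
        return 0;
    }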

---

http://dmclab.hanyang.ac.kr/wikidata/ssd/2012_ssd_seminar/MSST_2011/HotDataIdentification_DongchulPark_MSST_2011.pdf

 - identifies hot data in an SSD
 - bloom filters, because DRAM is precious (and mostly needed for the FTL)
 - a round-robin set of bloom filters
  - estimates both frequency (how many bloom filters the item appears in)
and recency (oldest/newest access)

Thoughts:
 - Any DRAM not spent on hot/cold tracking is spent on caching, which
   improves performance.
 - We could use counting bloom filters, although that may not be that
   useful if we have multiple bins and can count how many bins accesses
   appear in.
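
A rough sketch of the scheme as I read it (window count, filter size, and
hash functions here are made up for illustration):

    #include <stdint.h>
    #include <string.h>

    #define NWIN  4        /* number of round-robin time windows */
    #define NBITS 4096     /* bits per filter (tiny, for the demo) */

    static uint8_t bf[NWIN][NBITS / 8];
    static int newest;     /* index of the current window's filter */

    static void set_bit(uint8_t *f, uint32_t h)
    { f[(h % NBITS) >> 3] |= (uint8_t)(1 << (h & 7)); }
    static int get_bit(const uint8_t *f, uint32_t h)
    { return f[(h % NBITS) >> 3] & (1 << (h & 7)); }

    static uint32_t hash1(uint64_t key) { return (uint32_t)(key * 2654435761u); }
    static uint32_t hash2(uint64_t key) { return (uint32_t)((key >> 13) * 40503u + 1); }

    void bf_record_access(uint64_t block)
    {
        set_bit(bf[newest], hash1(block));
        set_bit(bf[newest], hash2(block));
    }

    /* frequency estimate: how many recent windows the block shows up in;
     * for recency, check bf[newest] (or the few newest filters) alone */
    int bf_hotness(uint64_t block)
    {
        int hits = 0;
        for (int w = 0; w < NWIN; w++)
            if (get_bit(bf[w], hash1(block)) && get_bit(bf[w], hash2(block)))
                hits++;
        return hits;
    }

    /* on each window boundary, recycle the oldest filter */
    void bf_next_window(void)
    {
        newest = (newest + 1) % NWIN;
        memset(bf[newest], 0, sizeof(bf[newest]));
    }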



xattr limits

2013-10-03 Thread David Zafman

I want to record in the ceph-devel archive the results from testing the
limits of xattrs on the Linux filesystems used with Ceph.

The test script creates xattrs named user.test1, user.test2, ... on a
single file, running on a 3.10 Linux kernel.
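
A minimal sketch of such a test loop (not the original script; the path,
default value size, and error handling are illustrative):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/xattr.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "testfile";
        size_t vlen = argc > 2 ? (size_t)atoi(argv[2]) : 64;  /* value bytes */
        static char value[65536];
        char name[64];

        if (vlen > sizeof(value))
            vlen = sizeof(value);
        memset(value, 'x', sizeof(value));

        /* add user.test1, user.test2, ... until the filesystem refuses
         * (typically ENOSPC), then report how far we got */
        for (int i = 1; ; i++) {
            snprintf(name, sizeof(name), "user.test%d", i);
            if (setxattr(path, name, value, vlen, 0) < 0) {
                fprintf(stderr, "failed at entry %d: %s\n", i, strerror(errno));
                return 1;
            }
        }
    }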

ext4
value bytes   number of entries
    1         148
   16         103
  256          14
  512           7
 1024           3
 4036           1
Beyond this you immediately get ENOSPC.

btrfs
value bytes   number of entries
    8         10k
   16         10k
   32         10k
   64         10k
  128         10k
  256         10k
  512         10k  (slow but worked; a 1,000,000-entry run hung
                    completely for minutes at a time during removal;
                    strace showed no forward progress)
 1024         10k
 2048         10k
 3096         10k
Beyond this you start getting ENOSPC after fewer entries.

xfs (entries limited to 1k due to an xfs crash with 10k entries)
value bytes   number of entries
    1         1k
    8         1k
   16         1k
   32         1k
   64         1k
  128         1k
  256         1k
  512         1k
 1024         1k
 2048         1k
 4096         1k
 8192         1k
16384         1k
32768         1k
65536         1k
