librbd bug?

2013-03-07 Thread Wolfgang Hennerbichler
Hi,

I have a libvirt VM that gets format 2 RBD children 'fed' to it by the superhost.
It crashed recently with this in the logs:

osdc/ObjectCacher.cc: In function 'void
ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t,
tid_t, int)' thread 7f0cab5fd700 time 2013-03-01 22:02:37.374410
osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)
 ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
 1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long,
unsigned long, int)+0xd68) [0x7f0d087cda28]
 2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x7f0d087d460b]
 3: (Context::complete(int)+0xa) [0x7f0d0878c9fa]
 4: (librbd::C_Request::finish(int)+0x85) [0x7f0d087bc325]
 5: (Context::complete(int)+0xa) [0x7f0d0878c9fa]
 6: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f0d087a1387]
 7: (librados::C_AioSafe::finish(int)+0x1d) [0x7f0d07b5834d]
 8: (Finisher::finisher_thread_entry()+0x1c0) [0x7f0d07bc20d0]
 9: (()+0x7e9a) [0x7f0d0546be9a]
 10: (clone()+0x6d) [0x7f0d05198cbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'

Any clue why that happened?
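
For context, a minimal C illustration of the invariant that assert enforces (a
sketch only, not the actual ObjectCacher code, which is C++): write-commit
callbacks for a cached object are expected to arrive with strictly increasing
transaction ids, so a duplicated or out-of-order commit trips the assert.

#include <assert.h>
#include <stdint.h>

/* Illustrative sketch only: each cached object remembers the last committed
 * tid, and every commit callback must carry a strictly larger tid. */
struct cached_object {
	uint64_t last_commit_tid;
};

static void bh_write_commit(struct cached_object *ob, uint64_t tid)
{
	assert(ob->last_commit_tid < tid);	/* the assert at ObjectCacher.cc:834 */
	ob->last_commit_tid = tid;
}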

-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at


[PATCH 1/2] ceph: increase i_release_count when clear I_COMPLETE flag

2013-03-07 Thread Yan, Zheng
From: "Yan, Zheng" 

If some dentries were pruned or the FILE_SHARED cap was revoked while
readdir was in progress, make sure ceph_readdir() does not mark the
directory as complete.

Signed-off-by: Yan, Zheng 
---
 fs/ceph/caps.c |  1 +
 fs/ceph/dir.c  | 13 +++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 76634f4..35cebf3 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -500,6 +500,7 @@ static void __check_cap_issue(struct ceph_inode_info *ci, 
struct ceph_cap *cap,
if (S_ISDIR(ci->vfs_inode.i_mode)) {
dout(" marking %p NOT complete\n", &ci->vfs_inode);
ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
+   ci->i_release_count++;
}
}
 }
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 76821be..068304c 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -909,7 +909,11 @@ static int ceph_rename(struct inode *old_dir, struct 
dentry *old_dentry,
 */
 
/* d_move screws up d_subdirs order */
-   ceph_i_clear(new_dir, CEPH_I_COMPLETE);
+   struct ceph_inode_info *ci = ceph_inode(new_dir);
+   spin_lock(&ci->i_ceph_lock);
+   ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
+   ci->i_release_count++;
+   spin_unlock(&ci->i_ceph_lock);
 
d_move(old_dentry, new_dentry);
 
@@ -1073,6 +1077,7 @@ static int ceph_snapdir_d_revalidate(struct dentry 
*dentry,
  */
 static void ceph_d_prune(struct dentry *dentry)
 {
+   struct ceph_inode_info *ci;
dout("ceph_d_prune %p\n", dentry);
 
/* do we have a valid parent? */
@@ -1087,7 +1092,11 @@ static void ceph_d_prune(struct dentry *dentry)
 * we hold d_lock, so d_parent is stable, and d_fsdata is never
 * cleared until d_release
 */
-   ceph_i_clear(dentry->d_parent->d_inode, CEPH_I_COMPLETE);
+   ci = ceph_inode(dentry->d_parent->d_inode);
+   spin_lock(&ci->i_ceph_lock);
+   ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
+   ci->i_release_count++;
+   spin_unlock(&ci->i_ceph_lock);
 }
 
 /*
-- 
1.7.11.7



[PATCH 2/2] fs: fix dentry_lru_prune()

2013-03-07 Thread Yan, Zheng
From: "Yan, Zheng" 

dentry_lru_prune() should always call the file system's d_prune callback.

Signed-off-by: Yan, Zheng 
---
 fs/dcache.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 19153a0..f0060aa 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -344,14 +344,9 @@ static void dentry_lru_del(struct dentry *dentry)
  */
 static void dentry_lru_prune(struct dentry *dentry)
 {
-   if (!list_empty(&dentry->d_lru)) {
-   if (dentry->d_flags & DCACHE_OP_PRUNE)
-   dentry->d_op->d_prune(dentry);
-
-   spin_lock(&dcache_lru_lock);
-   __dentry_lru_del(dentry);
-   spin_unlock(&dcache_lru_lock);
-   }
+   if (dentry->d_flags & DCACHE_OP_PRUNE)
+   dentry->d_op->d_prune(dentry);
+   dentry_lru_del(dentry);
 }
 
 static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list)
-- 
1.7.11.7



Re: CephFS First product release discussion

2013-03-07 Thread Jimmy Tang

On 5 Mar 2013, at 17:03, Greg Farnum wrote:

> This is a companion discussion to the blog post at 
> http://ceph.com/dev-notes/cephfs-mds-status-discussion/ — go read that!
> 
> The short and slightly alternate version: I spent most of about two weeks 
> working on bugs related to snapshots in the MDS, and we started realizing 
> that we could probably do our first supported release of CephFS and the 
> related infrastructure much sooner if we didn't need to support all of the 
> whizbang features. (This isn't to say that the base feature set is stable 
> now, but it's much closer than when you turn on some of the other things.) 
> I'd like to get feedback from you in the community on what minimum supported 
> feature set would prompt or allow you to start using CephFS in real 
> environments — not what you'd *like* to see, but what you *need* to see. This 
> will allow us at Inktank to prioritize more effectively and hopefully get out 
> a supported release much more quickly! :)
> 
> The current proposed feature set is basically what's left over after we've 
> trimmed off everything we can think to split off, but if any of the proposed 
> included features are also particularly important or don't matter, be sure to 
> mention them (NFS export in particular — it works right now but isn't in 
> great shape due to NFS filehandle caching).
> 

An fsck would be desirable; even something that just tells me that something is
'corrupted' or 'dangling' would be useful. Quotas on sub-trees, along the lines
of how the du feature is currently implemented, would be nice.

Some sort of smarter exporting of sub-trees would be nice too, e.g. if I
mounted /ceph/fileset_1 as /myfs1 on a client, I'd like /myfs1 to report
100 GB when I run df instead of the 100 TB that the entire /ceph/ system has.
We're currently using RBDs here to limit what the users get, so we can present
a subset of the storage managed by Ceph to end users and they don't get excited
at seeing 100 TB available in CephFS (the numbers here are fictional). Managing
one CephFS is probably easier than managing lots of RBDs in certain cases.

Regards,
Jimmy Tang

--
Senior Software Engineer, Digital Repository of Ireland (DRI)
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | jt...@tchpc.tcd.ie
Tel: +353-1-896-3847



Re: stuff for v0.56.4

2013-03-07 Thread Travis Rhoden
As long as the fix for...

osdc/ObjectCacher.cc: In function 'void
ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t,
tid_t, int)' thread 7fd316a50700 time 2013-03-07 15:03:21.641190
osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)

...is in there (which you already put on the bobtail branch, I believe)
I will be happy.  This particular bug crashes several VMs a day for
me.

 - Travis

On Wed, Mar 6, 2013 at 3:37 AM, Wido den Hollander  wrote:
> On 03/06/2013 12:10 AM, Sage Weil wrote:
>>
>> There have been a few important bug fixes that people are hitting or
>> want:
>>
>> - the journal replay bug (5d54ab154ca790688a6a1a2ad5f869c17a23980a)
>> - the - _ pool name vs cap parsing thing that is biting openstack users
>> - ceph-disk-* changes to support latest ceph-deploy
>>
>> If there are other things that we want to include in 0.56.4, let's get them
>> into the bobtail branch sooner rather than later.
>>
>> Possible items:
>>
>> - pg log trimming (probably a conservative subset) to avoid memory bloat
>> - omap scrub?
>> - pg temp collection removal?
>> - buffer::cmp fix from loic?
>>
>> Are there other items that we are missing?
>>
>
> I'm still seeing #3816 on my systems. The fix in wip-3816 did not resolve it
> for me.
>
> Wido
>
>
>> sage
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CephFS Space Accounting and Quotas

2013-03-07 Thread Jim Schutt
On 03/06/2013 05:18 PM, Greg Farnum wrote:
> On Wednesday, March 6, 2013 at 3:14 PM, Jim Schutt wrote:
>> When I'm doing these stat operations the file system is otherwise
>> idle.
> 
> What's the cluster look like? This is just one active MDS and a couple 
> hundred clients?

1 mds, 1 mon, 576 osds, 198 cephfs clients.

> 
>> What is happening is that once one of these slow stat operations
>> on a file completes, it never happens again for that file, from
>> any client. At least, that's the case if I'm not writing to
>> the file any more. I haven't checked if appending to the files
>> restarts the behavior.
> 
> I assume it'll come back, but if you could verify that'd be good.

OK, I'll check it out.

> 
>  
>> On the client side I'm running with 3.8.2 + the ceph patch queue
>> that was merged into 3.9-rc1.
>>
>> On the server side I'm running recent next branch (commit 0f42eddef5),
>> with the tcp receive socket buffer option patches cherry-picked.
>> I've also got a patch that allows mkcephfs to use osd_pool_default_pg_num
>> rather than pg_bits to set initial number of PGs (same for pgp_num),
>> and a patch that lets me run with just one pool that contains both
>> data and metadata. I'm testing data distribution uniformity with 512K PGs.
>>
>> My MDS tunables are all at default settings.
>>
>>>
>>> We'll probably want to get a high-debug log of the MDS during these slow 
>>> stats as well.
>>
>> OK.
>>
>> Do you want me to try to reproduce with a more standard setup?
> No, this is fine. 
>  
>> Also, I see Sage just pushed a patch to pgid decoding - I expect
>> I need that as well, if I'm running the latest client code.
> 
> Yeah, if you've got the commit it references you'll want it.
> 
>> Do you want the MDS log at 10 or 20?
> More is better. ;)

OK, thanks.

-- Jim

> 
> 
> 




Re: librbd bug?

2013-03-07 Thread Sage Weil
On Thu, 7 Mar 2013, Wolfgang Hennerbichler wrote:
> Hi,
> 
> I've a libvirt-VM that gets format 2 rbd-childs 'fed' by the superhost.
> It crashed recently with this in the logs:
> 
> osdc/ObjectCacher.cc: In function 'void
> ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t,
> tid_t, int)' thread 7f0cab5fd700 time 2013-03-01 22:02:37.374410
> osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)
>  ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
>  1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long,
> unsigned long, int)+0xd68) [0x7f0d087cda28]
>  2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x7f0d087d460b]
>  3: (Context::complete(int)+0xa) [0x7f0d0878c9fa]
>  4: (librbd::C_Request::finish(int)+0x85) [0x7f0d087bc325]
>  5: (Context::complete(int)+0xa) [0x7f0d0878c9fa]
>  6: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f0d087a1387]
>  7: (librados::C_AioSafe::finish(int)+0x1d) [0x7f0d07b5834d]
>  8: (Finisher::finisher_thread_entry()+0x1c0) [0x7f0d07bc20d0]
>  9: (()+0x7e9a) [0x7f0d0546be9a]
>  10: (clone()+0x6d) [0x7f0d05198cbd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> 
> Any clue why that happened?

We fixed one bug that triggered this behavior, but I just saw another 
occurrence yesterday.  I'm working on reproducing it now.  Once I have 
some confidence it is fully resolved I will backport the fix(es) to the 
bobtail branch.

Thanks!
sage

> 
> -- 
> DI (FH) Wolfgang Hennerbichler
> Software Development
> Unit Advanced Computing Technologies
> RISC Software GmbH
> A company of the Johannes Kepler University Linz
> 
> IT-Center
> Softwarepark 35
> 4232 Hagenberg
> Austria
> 
> Phone: +43 7236 3343 245
> Fax: +43 7236 3343 250
> wolfgang.hennerbich...@risc-software.at
> http://www.risc-software.at
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


Re: MDS running at 100% CPU, no clients

2013-03-07 Thread Greg Farnum
This isn't bringing up anything in my brain, but I don't know what that 
_sample() function is actually doing — did you get any farther into it?
-Greg

On Wednesday, March 6, 2013 at 6:23 PM, Noah Watkins wrote:

> Which, looks to be in a tight loop in the memory model _sample…
>  
> (gdb) bt
> #0 0x7f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x7f027046dd88 in std::__basic_file<char>::xsgetn(char*, long) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #2 0x7f027046f4c5 in std::basic_filebuf<char, std::char_traits<char> >::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3 0x7f0270467ceb in std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4 0x0072bdd4 in MemoryModel::_sample(MemoryModel::snap*) ()
> #5 0x005658db in MDCache::check_memory_usage() ()
> #6 0x004ba929 in MDS::tick() ()
> #7 0x00794c65 in SafeTimer::timer_thread() ()
> #8 0x007958ad in SafeTimerThread::entry() ()
> #9 0x7f0270d7de9a in start_thread () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
>  
> On Mar 6, 2013, at 6:18 PM, Noah Watkins  (mailto:jayh...@cs.ucsc.edu)> wrote:
>  
> >  
> > On Mar 6, 2013, at 5:57 PM, Noah Watkins  > (mailto:jayh...@cs.ucsc.edu)> wrote:
> >  
> > > The MDS process in my cluster is running at 100% CPU. In fact I thought 
> > > the cluster came down, but rather an ls was taking a minute. There aren't 
> > > any clients active. I've left the process running in case there is any 
> > > probing you'd like to do on it:
> > >  
> > > virt res cpu
> > > 4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds
> > >  
> > > Thanks,
> > > Noah
> >  
> >  
> >  
> >  
> > This is a ceph-mds child thread under strace. The only thread
> > that appears to be doing anything.
> >  
> > root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372
> > Process 3372 attached - interrupt to quit
> > read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050
> > read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050
> > read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050
> > read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020
> > read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020
> > read(1649, "7f0217d8-7f0217e8 rw-p 0"..., 8191) = 4020
> > read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020
> > ...
> >  
> > That file looks to be:
> >  
> > ceph-mds 3337 root 1649r REG 0,3 0 266903 /proc/3337/maps
> >  
> > (3337 is the parent process).
>  
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org 
> (mailto:majord...@vger.kernel.org)
> More majordomo info at http://vger.kernel.org/majordomo-info.html





Re: MDS running at 100% CPU, no clients

2013-03-07 Thread Noah Watkins

On Mar 7, 2013, at 9:24 AM, Greg Farnum  wrote:

> This isn't bringing up anything in my brain, but I don't know what that 
> _sample() function is actually doing — did you get any farther into it?

_sample reads /proc/self/maps in a loop until EOF or some other condition. I
couldn't figure out if the thread was stuck in _sample or a level up. Anyhow,
my gdb-foo isn't stellar and I managed to crash the MDS. I'm gonna stick some
log points in and try to reproduce it.
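
As a rough sketch of the sort of loop _sample() runs (hedged; the real code is
C++ and uses std::getline, per the backtrace), it walks /proc/self/maps line by
line until EOF and tallies the mappings:

#include <stdio.h>

/* Rough illustration only, not the actual MemoryModel code: read
 * /proc/self/maps one "start-end perms ..." line at a time and count
 * the mappings. */
static int sample_maps(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[512];
	int mappings = 0;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		mappings++;
	fclose(f);
	return mappings;
}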


> -Greg
> 
> On Wednesday, March 6, 2013 at 6:23 PM, Noah Watkins wrote:
> 
>> Which, looks to be in a tight loop in the memory model _sample…
>> 
>> (gdb) bt
>> #0 0x7f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1 0x7f027046dd88 in std::__basic_file<char>::xsgetn(char*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #2 0x7f027046f4c5 in std::basic_filebuf<char, std::char_traits<char> >::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #3 0x7f0270467ceb in std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #4 0x0072bdd4 in MemoryModel::_sample(MemoryModel::snap*) ()
>> #5 0x005658db in MDCache::check_memory_usage() ()
>> #6 0x004ba929 in MDS::tick() ()
>> #7 0x00794c65 in SafeTimer::timer_thread() ()
>> #8 0x007958ad in SafeTimerThread::entry() ()
>> #9 0x7f0270d7de9a in start_thread () from 
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> 
>> On Mar 6, 2013, at 6:18 PM, Noah Watkins > (mailto:jayh...@cs.ucsc.edu)> wrote:
>> 
>>> 
>>> On Mar 6, 2013, at 5:57 PM, Noah Watkins >> (mailto:jayh...@cs.ucsc.edu)> wrote:
>>> 
 The MDS process in my cluster is running at 100% CPU. In fact I thought 
 the cluster came down, but rather an ls was taking a minute. There aren't 
 any clients active. I've left the process running in case there is any 
 probing you'd like to do on it:
 
 virt res cpu
 4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds
 
 Thanks,
 Noah
>>> 
>>> 
>>> 
>>> 
>>> This is a ceph-mds child thread under strace. The only thread
>>> that appears to be doing anything.
>>> 
>>> root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372
>>> Process 3372 attached - interrupt to quit
>>> read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0217d8-7f0217e8 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020
>>> ...
>>> 
>>> That file looks to be:
>>> 
>>> ceph-mds 3337 root 1649r REG 0,3 0 266903 /proc/3337/maps
>>> 
>>> (3337 is the parent process).
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org 
>> (mailto:majord...@vger.kernel.org)
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> 
> 
> 



Re: stuff for v0.56.4

2013-03-07 Thread Yehuda Sadeh
On Tue, Mar 5, 2013 at 3:10 PM, Sage Weil  wrote:
> There have been a few important bug fixes that people are hitting or
> want:
>
> - the journal replay bug (5d54ab154ca790688a6a1a2ad5f869c17a23980a)
> - the - _ pool name vs cap parsing thing that is biting openstack users
> - ceph-disk-* changes to support latest ceph-deploy
>
> If there are other things that we want to include in 0.56.4, let's get them
> into the bobtail branch sooner rather than later.
>
> Possible items:
>
> - pg log trimming (probably a conservative subset) to avoid memory bloat
> - omap scrub?
> - pg temp collection removal?
> - buffer::cmp fix from loic?
>
> Are there other items that we are missing?
>

wip-4247-bobtail (pending review)

Yehuda


Is linux 3.8.2 up to date with all ceph patches?

2013-03-07 Thread Nick Bartos
I'm looking at upgrading to 3.8.2 from 3.5.7 with patches, and I just
wanted to make sure that there weren't any additional ceph fixes that
should be applied to 3.8.2.


changes to rados command

2013-03-07 Thread Andrew Hume

In order to make the rados command more useful in scripts,
I'd like to make a change, specifically to change the command to

rados -p pool getomapval obj key [fmt]

where fmt is an optional formatting parameter.
I've implemented 'str', which will print the value as an unadorned string.
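
For illustration, a hedged example of the proposed invocation (the pool, object
and key names below are placeholders):

  # print the omap value for 'mykey' as a raw string instead of the default output
  rados -p mypool getomapval myobject mykey str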

What is the process for doing this?

andrew hume


Re: [ceph-users] Using different storage types on same osd hosts?

2013-03-07 Thread Stefan Priebe

On 06.03.2013 09:58, Martin B Nielsen wrote:

Hi,

We did the opposite here; adding some SSD in free slots after having a
normal cluster running with SATA.

Thanks for your answer. Why did you do this? Was it too slow with SATA?

Stefan


Re: changes to rados command

2013-03-07 Thread Greg Farnum
On Thursday, March 7, 2013 at 11:25 AM, Andrew Hume wrote:
> 
> in order to make the rados command more useful in scripts,
> i'd like to make a change, specifically change to
> 
> rados -p pool getomapval obj key [fmt]
> 
> where fmt is an optional formatting parameter.
> i've implemented 'str' which will print the value as an unadorned string.
> 
> what is the process for doing this?
> 
Patch submission, you mean? Github pull requests, sending a pull request with a
git URL to the list, or sending straight patches to the list are all good. I'll
like you more if you give me a URL of some form instead of making me get the
patches out of email and into my git repo correctly, though. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com




Re: changes to rados command

2013-03-07 Thread Greg Farnum
(Re-added the list for future reference)

Well, you'll need to learn how to use git at a basic level in order to be able 
to work effectively on Ceph (or most other open-source projects).
Some links that might be helpful:
http://www.joelonsoftware.com/items/2010/03/17.html
http://www.ibm.com/developerworks/library/l-git-subversion-1/
http://try.github.com/

I haven't been through these all thoroughly, but the first one should describe 
the mental model changes that motivate git, the second looks to be a deep 
tutorial, and the third will teach you the mechanics. :)


Github pull requests are a Github nicety; their website will teach you how to 
use them. A simple git URL just requires that your git repository be accessible 
over the internet, and then you tell us what the URL is and what branch to pull 
from, and we can do so. (This of course requires that your changes actually be 
in a branch, so you'll need to have the commits arranged nicely and such.)
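
As a hedged example of the git-URL route (the remote, URL and branch names below
are placeholders): push the branch somewhere publicly reachable and generate a
pull request summary to mail to the list:

  # push your work to a publicly reachable repository
  git push my-public-remote my-feature-branch
  # summarize what changed since the base and where to pull it from
  git request-pull origin/master https://example.com/my-repo.git my-feature-branch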
-Greg


On Thursday, March 7, 2013 at 12:42 PM, Andrew Hume wrote:

> I don't know how to do the first two (but I am a quickish learner).
> I know how to type git diff | mail already.
> If you can guide me a little on how to do the git things, I'll do those.
> 
> 
> On Mar 7, 2013, at 1:37 PM, Greg Farnum wrote:
> > On Thursday, March 7, 2013 at 11:25 AM, Andrew Hume wrote:
> > > 
> > > in order to make the rados command more useful in scripts,
> > > i'd like to make a change, specifically change to
> > > 
> > > rados -p pool getomapval obj key [fmt]
> > > 
> > > where fmt is an optional formatting parameter.
> > > i've implemented 'str' which will print the value as an unadorned string.
> > > 
> > > what is the process for doing this?
> > Patch submission, you mean? Github pull requests, sending a pull request 
> > with git URL to the list, or sending straight patches the list are all 
> > good. I'll like you more if you give me a URL of some form instead of 
> > making me get the patches out of email and into my git repo correctly, 
> > though. :)
> > -Greg
> > Software Engineer #42 @ http://inktank.com | http://ceph.com
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majord...@vger.kernel.org 
> > (mailto:majord...@vger.kernel.org)
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> ---
> Andrew Hume
> 623-551-2845 (VO and best)
> 973-236-2014 (NJ)
> and...@research.att.com (mailto:and...@research.att.com)





Re: [PATCH 1/2] ceph: increase i_release_count when clear I_COMPLETE flag

2013-03-07 Thread Greg Farnum
I'm pulling this in for now to make sure this clears out that ENOENT bug we hit 
— but shouldn't we be fixing ceph_i_clear() to always bump the i_release_count? 
It doesn't seem like it would ever be correct without it, and these are the 
only two callers.  
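
A minimal sketch of what folding both steps into one helper could look like (an
illustration based on the lock and field names in this patch, not the actual
follow-up change; the helper name is hypothetical):

/* Sketch only: clear the I_COMPLETE flag and bump i_release_count together
 * under i_ceph_lock, so ceph_readdir() can notice that the directory
 * contents changed while it was scanning. */
static void ceph_i_clear_complete(struct ceph_inode_info *ci)
{
	spin_lock(&ci->i_ceph_lock);
	ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
	ci->i_release_count++;
	spin_unlock(&ci->i_ceph_lock);
}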

The second one looks good to us and we'll test it but of course that can't go 
upstream through our tree.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Thursday, March 7, 2013 at 3:36 AM, Yan, Zheng wrote:

> From: "Yan, Zheng" mailto:zheng.z@intel.com)>
>  
> If some dentries were pruned or FILE_SHARED cap was revoked while
> readdir is in progress. make sure ceph_readdir() does not mark the
> directory as complete.
>  
> Signed-off-by: Yan, Zheng  (mailto:zheng.z@intel.com)>
> ---
> fs/ceph/caps.c | 1 +
> fs/ceph/dir.c | 13 +++--
> 2 files changed, 12 insertions(+), 2 deletions(-)
>  
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 76634f4..35cebf3 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -500,6 +500,7 @@ static void __check_cap_issue(struct ceph_inode_info *ci, 
> struct ceph_cap *cap,
> if (S_ISDIR(ci->vfs_inode.i_mode)) {
> dout(" marking %p NOT complete\n", &ci->vfs_inode);
> ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
> + ci->i_release_count++;
> }
> }
> }
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 76821be..068304c 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -909,7 +909,11 @@ static int ceph_rename(struct inode *old_dir, struct 
> dentry *old_dentry,
> */
>  
> /* d_move screws up d_subdirs order */
> - ceph_i_clear(new_dir, CEPH_I_COMPLETE);
> + struct ceph_inode_info *ci = ceph_inode(new_dir);
> + spin_lock(&ci->i_ceph_lock);
> + ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
> + ci->i_release_count++;
> + spin_unlock(&ci->i_ceph_lock);
>  
> d_move(old_dentry, new_dentry);
>  
> @@ -1073,6 +1077,7 @@ static int ceph_snapdir_d_revalidate(struct dentry 
> *dentry,
> */
> static void ceph_d_prune(struct dentry *dentry)
> {
> + struct ceph_inode_info *ci;
> dout("ceph_d_prune %p\n", dentry);
>  
> /* do we have a valid parent? */
> @@ -1087,7 +1092,11 @@ static void ceph_d_prune(struct dentry *dentry)
> * we hold d_lock, so d_parent is stable, and d_fsdata is never
> * cleared until d_release
> */
> - ceph_i_clear(dentry->d_parent->d_inode, CEPH_I_COMPLETE);
> + ci = ceph_inode(dentry->d_parent->d_inode);
> + spin_lock(&ci->i_ceph_lock);
> + ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
> + ci->i_release_count++;
> + spin_unlock(&ci->i_ceph_lock);
> }
>  
> /*
> --  
> 1.7.11.7





Re: stuff for v0.56.4

2013-03-07 Thread Bryan K. Wright

s...@inktank.com said:
> - pg log trimming (probably a conservative subset) to avoid memory bloat 

Anything that reduces the size of OSD processes would be appreciated.

Bryan
-- 

Bryan Wright  |"If you take cranberries and stew them like 
Physics Department| applesauce, they taste much more like prunes
University of Virginia| than rhubarb does."  --  Groucho 
Charlottesville, VA  22901| 
(434) 924-7218| br...@virginia.edu





Re: Is linux 3.8.2 up to date with all ceph patches?

2013-03-07 Thread Sage Weil
On Thu, 7 Mar 2013, Nick Bartos wrote:
> I'm looking at upgrading to 3.8.2 from 3.5.7 with patches, and I just
> wanted to make sure that there weren't any additional ceph fixes that
> should be applied to 3.8.2.

Nothing that I'm aware of.  Alex?

sage


Re: stuff for v0.56.4

2013-03-07 Thread Sage Weil
On Thu, 7 Mar 2013, Bryan K. Wright wrote:
> 
> s...@inktank.com said:
> > - pg log trimming (probably a conservative subset) to avoid memory bloat 
> 
> Anything that reduces the size of OSD processes would be appreciated.

You can probably do this with just

 log max recent = 1000

By default it's keeping 100k lines of logs in memory, which can eat a lot 
of ram (but is great when debugging issues).
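
For example, in ceph.conf that could look like the following (a hedged sketch;
the option can also be set globally):

  [osd]
          ; keep ~1000 recent log entries in memory instead of the 100k default
          log max recent = 1000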

s


> 
>   Bryan
> -- 
> 
> Bryan Wright  |"If you take cranberries and stew them like 
> Physics Department| applesauce, they taste much more like prunes
> University of Virginia| than rhubarb does."  --  Groucho 
> Charlottesville, VA  22901|   
> (434) 924-7218| br...@virginia.edu
> 
> 
> 
> 


Re: OpenStack summit : Ceph design session

2013-03-07 Thread Loic Dachary
Hi Yehuda,

I'm not sure if one keystone for all zones would be better than one keystone 
per zone. If you think it's worth discussing during the OpenStack summit and 
you create a session http://summit.openstack.org/cfp/create in the keystone 
track, I will definitely attend :-). Or I can do it if you like. But there is 
no way to share the edit permissions.

Cheers

On 03/07/2013 12:12 AM, Yehuda Sadeh wrote:
> On Wed, Mar 6, 2013 at 3:05 PM, Neil Levine  wrote:
>> On Wed, Mar 6, 2013 at 2:45 PM, Loic Dachary  wrote:
>>> Hi Neil,
>>>
>>> On 03/06/2013 08:27 PM, Neil Levine wrote:
 I think the multi-site RGW stuff is somewhat orthogonal to OpenStack
>>>
>>> Even when keystone is involved ?
>>
>> Good question.
>>
>> Yehuda: how would the asynchronously replicated user metadata interact
>> with Keystone?
> 
> Depends. Are there any special requirements? From what I understand,
> you'd define the keystone backend that each zone interacts with per-zone,
> and the fact that it's a replicated region/zone doesn't change much.
> 
>>
 Who approves the session at ODS and when is this decision made?
>>>
>>> I suspect Josh knows more than I do about this. During the cinder meeting 
>>> earlier today J. Griffith said that if the nova track is too busy to host 
>>> the "Roadmap for Ceph integration with OpenStack" session he was in favor 
>>> of having it in the cinder track. Following his advice I suggested to 
>>> Thierry Carrez to open a "Cross project" track ( 
>>> http://lists.openstack.org/pipermail/openstack-dev/2013-March/006365.html )
>>
>> I don't think Cinder is such a bad place for it to be as presumably
>> the interaction to copy the block device to a secondary location would
>> be triggered through a Cinder API call no?
>>
>> Neil
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: Is linux 3.8.2 up to date with all ceph patches?

2013-03-07 Thread Alex Elder
On 03/07/2013 03:23 PM, Sage Weil wrote:
> On Thu, 7 Mar 2013, Nick Bartos wrote:
>> I'm looking at upgrading to 3.8.2 from 3.5.7 with patches, and I just
>> wanted to make sure that there weren't any additional ceph fixes that
>> should be applied to 3.8.2.
> 
> Nothing that I'm aware of.  Alex?

I will look, but unfortunately I'm headed out for
a concert right now so you'll have to wait until
tomorrow for an answer.

I am not aware of anything major, but there might
be one or two things we should back port.

-Alex

> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



Re: [PATCH 1/2] ceph: increase i_release_count when clear I_COMPLETE flag

2013-03-07 Thread Yan, Zheng
On Fri, Mar 8, 2013 at 5:03 AM, Greg Farnum  wrote:
> I'm pulling this in for now to make sure this clears out that ENOENT bug we 
> hit — but shouldn't we be fixing ceph_i_clear() to always bump the 
> i_release_count? It doesn't seem like it would ever be correct without it, 
> and these are the only two callers.

Yes, it's better to put it in ceph_i_clear(). I will update the patches once
they pass the test.

Regards
Yan, Zheng

>
> The second one looks good to us and we'll test it but of course that can't go 
> upstream through our tree.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Thursday, March 7, 2013 at 3:36 AM, Yan, Zheng wrote:
>
>> From: "Yan, Zheng" mailto:zheng.z@intel.com)>
>>
>> If some dentries were pruned or FILE_SHARED cap was revoked while
>> readdir is in progress. make sure ceph_readdir() does not mark the
>> directory as complete.
>>
>> Signed-off-by: Yan, Zheng > (mailto:zheng.z@intel.com)>
>> ---
>> fs/ceph/caps.c | 1 +
>> fs/ceph/dir.c | 13 +++--
>> 2 files changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> index 76634f4..35cebf3 100644
>> --- a/fs/ceph/caps.c
>> +++ b/fs/ceph/caps.c
>> @@ -500,6 +500,7 @@ static void __check_cap_issue(struct ceph_inode_info 
>> *ci, struct ceph_cap *cap,
>> if (S_ISDIR(ci->vfs_inode.i_mode)) {
>> dout(" marking %p NOT complete\n", &ci->vfs_inode);
>> ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
>> + ci->i_release_count++;
>> }
>> }
>> }
>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
>> index 76821be..068304c 100644
>> --- a/fs/ceph/dir.c
>> +++ b/fs/ceph/dir.c
>> @@ -909,7 +909,11 @@ static int ceph_rename(struct inode *old_dir, struct 
>> dentry *old_dentry,
>> */
>>
>> /* d_move screws up d_subdirs order */
>> - ceph_i_clear(new_dir, CEPH_I_COMPLETE);
>> + struct ceph_inode_info *ci = ceph_inode(new_dir);
>> + spin_lock(&ci->i_ceph_lock);
>> + ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
>> + ci->i_release_count++;
>> + spin_unlock(&ci->i_ceph_lock);
>>
>> d_move(old_dentry, new_dentry);
>>
>> @@ -1073,6 +1077,7 @@ static int ceph_snapdir_d_revalidate(struct dentry 
>> *dentry,
>> */
>> static void ceph_d_prune(struct dentry *dentry)
>> {
>> + struct ceph_inode_info *ci;
>> dout("ceph_d_prune %p\n", dentry);
>>
>> /* do we have a valid parent? */
>> @@ -1087,7 +1092,11 @@ static void ceph_d_prune(struct dentry *dentry)
>> * we hold d_lock, so d_parent is stable, and d_fsdata is never
>> * cleared until d_release
>> */
>> - ceph_i_clear(dentry->d_parent->d_inode, CEPH_I_COMPLETE);
>> + ci = ceph_inode(dentry->d_parent->d_inode);
>> + spin_lock(&ci->i_ceph_lock);
>> + ci->i_ceph_flags &= ~CEPH_I_COMPLETE;
>> + ci->i_release_count++;
>> + spin_unlock(&ci->i_ceph_lock);
>> }
>>
>> /*
>> --
>> 1.7.11.7
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] fs: fix dentry_lru_prune()

2013-03-07 Thread Dave Chinner
On Thu, Mar 07, 2013 at 07:37:36PM +0800, Yan, Zheng wrote:
> From: "Yan, Zheng" 
> 
> dentry_lru_prune() should always call file system's d_prune callback.

Why? What bug does this fix?

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 2/2] fs: fix dentry_lru_prune()

2013-03-07 Thread Yan, Zheng
On 03/08/2013 10:04 AM, Dave Chinner wrote:
> On Thu, Mar 07, 2013 at 07:37:36PM +0800, Yan, Zheng wrote:
>> From: "Yan, Zheng" 
>>
>> dentry_lru_prune() should always call file system's d_prune callback.
> 
> Why? What bug does this fix?
> 

Ceph uses a flag to track if the dcache contents for a directory are complete,
and it relies on d_prune() to clear the flag when some dentries are trimmed.
We noticed that dentry_lru_prune() sometimes does not call ceph_d_prune().
It seems the dentry in question is an ancestor trimmed by try_prune_one_dentry().

Regards
Yan, Zheng


Re: Mon losing touch with OSDs

2013-03-07 Thread Chris Dunlop
On Thu, Feb 28, 2013 at 09:00:24PM -0800, Sage Weil wrote:
> On Fri, 1 Mar 2013, Chris Dunlop wrote:
>> On Sat, Feb 23, 2013 at 01:02:53PM +1100, Chris Dunlop wrote:
>>> On Fri, Feb 22, 2013 at 05:52:11PM -0800, Sage Weil wrote:
 On Sat, 23 Feb 2013, Chris Dunlop wrote:
> On Fri, Feb 22, 2013 at 05:30:04PM -0800, Sage Weil wrote:
>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>> On Fri, Feb 22, 2013 at 04:13:21PM -0800, Sage Weil wrote:
 On Sat, 23 Feb 2013, Chris Dunlop wrote:
> On Fri, Feb 22, 2013 at 03:43:22PM -0800, Sage Weil wrote:
>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>> On Fri, Feb 22, 2013 at 01:57:32PM -0800, Sage Weil wrote:
 I just looked at the logs.  I can't tell what happend to cause 
 that 10 
 second delay.. strangely, messages were passing from 0 -> 1, but 
 nothing 
 came back from 1 -> 0 (although 1 was queuing, if not sending, 
 them).
> 
> Is there any way of telling where they were delayed, i.e. in the 1's 
> output
> queue or 0's input queue?
 
 Yeah, if you bump it up to 'debug ms = 20'.  Be aware that that will 
 generate a lot of logging, though.
>>> 
>>> I really don't want to load the system with too much logging, but I'm 
>>> happy
>>> modifying code...  Are there specific interesting debug outputs which I 
>>> can
>>> modify so they're output under "ms = 1"?
>> 
>> I'm basically interested in everything in writer() and write_message(), 
>> and reader() and read_message()...
> 
> Like this?
 
 Yeah.  You could do 2 instead of 1 so you can turn it down.  I suspect 
 that this is the lions share of what debug 20 will spam to the log, but 
 hopefully the load is manageable!
>>> 
>>> Good idea on the '2'. I'll get that installed and wait for it to happen 
>>> again.
>> 
>> FYI...
>> 
>> To avoid running out of disk space for the massive logs, I
>> started using logrotate on the ceph logs every two hours, which
>> does a 'service ceph reload' to re-open the log files.
>> 
>> In the week since doing that I haven't seen any 'slow requests'
>> at all (the load has stayed the same as before the change),
>> which means the issue with the osds dropping out, then the
>> system not recovering properly, also hasn't happened.
>> 
>> That's a bit suspicious, no?
> 
> I suspect the logging itself is changing the timing.  Let's wait and see 
> if we get lucky... 

We got "lucky"...

ceph-mon.0.log:
2013-03-08 03:46:44.786682 7fcc62172700  1 -- 192.168.254.132:0/20298 --> 
192.168.254.133:6801/23939 -- osd_ping(ping e815 stamp 2013-03-08 
03:46:44.786679) v2 -- ?+0 0x765b180 con 0x6ab6160
  [no ping_reply logged, then later...]
2013-03-08 03:46:56.211993 7fcc71190700 -1 osd.0 815 heartbeat_check: no reply 
from osd.1 since 2013-03-08 03:46:35.986327 (cutoff 2013-03-08 03:46:36.211992)

ceph-mon.1.log:
2013-03-08 03:46:44.786848 7fe6f47a4700  1 -- 192.168.254.133:6801/23939 <== 
osd.0 192.168.254.132:0/20298 178549  osd_ping(ping e815 stamp 2013-03-08 
03:46:44.786679) v2  47+0+0 (1298645350 0 0) 0x98256c0 con 0x7bd2160
2013-03-08 03:46:44.786880 7fe6f47a4700  1 -- 192.168.254.133:6801/23939 --> 
192.168.254.132:0/20298 -- osd_ping(ping_reply e815 stamp 2013-03-08 
03:46:44.786679) v2 -- ?+0 0x29876c0 con 0x7bd2160

Interestingly, the matching ping_reply from osd.1 never appears in the
osd.0 log, in contrast to the previous incident upthread where the
"missing" ping replies were all seen in a rush (but after osd.1 had been
marked down).
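
For reference, a hedged C sketch of the check behind that heartbeat_check
message (names are illustrative, not the actual OSD code): a peer is reported
when its newest ping reply is older than the cutoff, i.e. now minus the
heartbeat grace period; the cutoff above is exactly 20 seconds before the
03:46:56.211992 timestamp, consistent with the default 20s grace.

#include <stdio.h>

/* Illustrative only: flag a peer whose newest ping reply predates
 * the cutoff (now - grace). */
static void heartbeat_check(int peer, double last_reply, double now, double grace)
{
	double cutoff = now - grace;
	if (last_reply < cutoff)
		printf("heartbeat_check: no reply from osd.%d (cutoff %.6f)\n",
		       peer, cutoff);
}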

The missing ping_reply caused osd.1 to get marked down, then it marked
itself up again a bit later ("map e818 wrongly marked me down"). However
the system still hadn't recovered before 07:46:29 when a 'service ceph
restart' was done on the machine holding mon.b5 and osd.1, bringing things
back to life.

Before the restart:

# ceph -s
   health HEALTH_WARN 273 pgs peering; 2 pgs recovery_wait; 273 pgs stuck 
inactive; 576 pgs stuck unclean; recovery 43/293224 degraded (0.015%)
   monmap e9: 3 mons at 
{b2=10.200.63.130:6789/0,b4=10.200.63.132:6789/0,b5=10.200.63.133:6789/0}, 
election epoch 898, quorum 0,1,2 b2,b4,b5
   osdmap e825: 2 osds: 2 up, 2 in
pgmap v3545580: 576 pgs: 301 active, 2 active+recovery_wait, 273 peering; 
560 GB data, 1348 GB used, 2375 GB / 3724 GB avail; 43/293224 degraded (0.015%)
   mdsmap e1: 0/0/1 up

After the restart:

# ceph -s
   health HEALTH_WARN 19 pgs recovering; 24 pgs recovery_wait; 43 pgs stuck 
unclean; recovery 66/293226 degraded (0.023%)
   monmap e9: 3 mons at 
{b2=10.200.63.130:6789/0,b4=10.200.63.132:6789/0,b5=10.200.63.133:6789/0}, 
election epoch 902, quorum 0,1,2 b2,b4,b5
   osdmap e828: 2 osds: 2 up, 2 in
pgmap v3545603: 576 pgs: 533 active+clean, 24 active+recovery_wait, 19 
active+recovering; 560 GB data, 1348 GB used, 2375 GB / 3724 GB avail; 0B/s rd, 
8135KB/s w

Re: librbd bug?

2013-03-07 Thread Dan Mick



On 03/07/2013 02:16 AM, Wolfgang Hennerbichler wrote:

Hi,

I've a libvirt-VM that gets format 2 rbd-childs 'fed' by the superhost.
It crashed recently with this in the logs:

osdc/ObjectCacher.cc: In function 'void
ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t,
tid_t, int)' thread 7f0cab5fd700 time 2013-03-01 22:02:37.374410
osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)
  ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
  1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long,
unsigned long, int)+0xd68) [0x7f0d087cda28]
  2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x7f0d087d460b]
  3: (Context::complete(int)+0xa) [0x7f0d0878c9fa]
  4: (librbd::C_Request::finish(int)+0x85) [0x7f0d087bc325]
  5: (Context::complete(int)+0xa) [0x7f0d0878c9fa]
  6: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f0d087a1387]
  7: (librados::C_AioSafe::finish(int)+0x1d) [0x7f0d07b5834d]
  8: (Finisher::finisher_thread_entry()+0x1c0) [0x7f0d07bc20d0]
  9: (()+0x7e9a) [0x7f0d0546be9a]
  10: (clone()+0x6d) [0x7f0d05198cbd]
  NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'

Any clue why that happened?



This looks like

http://tracker.ceph.com/issues/4271

--
Dan Mick, Filesystem Engineering
Inktank Storage, Inc.   http://inktank.com
Ceph docs: http://ceph.com/docs


Re: librbd bug?

2013-03-07 Thread Sage Weil
On Thu, 7 Mar 2013, Dan Mick wrote:
> On 03/07/2013 02:16 AM, Wolfgang Hennerbichler wrote:
> > Hi,
> > 
> > I've a libvirt-VM that gets format 2 rbd-childs 'fed' by the superhost.
> > It crashed recently with this in the logs:
> > 
> > osdc/ObjectCacher.cc: In function 'void
> > ObjectCacher::bh_write_commit(int64_t, sobject_t, loff_t, uint64_t,
> > tid_t, int)' thread 7f0cab5fd700 time 2013-03-01 22:02:37.374410
> > osdc/ObjectCacher.cc: 834: FAILED assert(ob->last_commit_tid < tid)
> >   ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
> >   1: (ObjectCacher::bh_write_commit(long, sobject_t, long, unsigned long,
> > unsigned long, int)+0xd68) [0x7f0d087cda28]
> >   2: (ObjectCacher::C_WriteCommit::finish(int)+0x6b) [0x7f0d087d460b]
> >   3: (Context::complete(int)+0xa) [0x7f0d0878c9fa]
> >   4: (librbd::C_Request::finish(int)+0x85) [0x7f0d087bc325]
> >   5: (Context::complete(int)+0xa) [0x7f0d0878c9fa]
> >   6: (librbd::rados_req_cb(void*, void*)+0x47) [0x7f0d087a1387]
> >   7: (librados::C_AioSafe::finish(int)+0x1d) [0x7f0d07b5834d]
> >   8: (Finisher::finisher_thread_entry()+0x1c0) [0x7f0d07bc20d0]
> >   9: (()+0x7e9a) [0x7f0d0546be9a]
> >   10: (clone()+0x6d) [0x7f0d05198cbd]
> >   NOTE: a copy of the executable, or `objdump -rdS ` is
> > needed to interpret this.
> > terminate called after throwing an instance of 'ceph::FailedAssertion'
> > 
> > Any clue why that happened?
> > 
> 
> This looks like
> 
> http://tracker.ceph.com/issues/4271

I am chasing http://tracker.ceph.com/issues/4369, which may indicate a 
problem with the fix for #4271.  Once this is sorted out, I'll cherry-pick 
the fix to bobtail.

sage



Re: [PATCH 2/2] fs: fix dentry_lru_prune()

2013-03-07 Thread Dave Chinner
On Fri, Mar 08, 2013 at 10:43:00AM +0800, Yan, Zheng wrote:
> On 03/08/2013 10:04 AM, Dave Chinner wrote:
> > On Thu, Mar 07, 2013 at 07:37:36PM +0800, Yan, Zheng wrote:
> >> From: "Yan, Zheng" 
> >>
> >> dentry_lru_prune() should always call file system's d_prune callback.
> > 
> > Why? What bug does this fix?
> > 
> 
> Ceph uses a flag to track if the dcache contents for a directory are complete,
> and it relies on d_prune() to clear the flag when some dentries are trimmed.
> We noticed that dentry_lru_prune() sometimes does not call ceph_d_prune().
> It seems the dentry in question is ancestor trimmed by try_prune_one_dentry().

That doesn't sound right to me. Any dentry that goes through
try_prune_one_dentry() is on an LRU list, and will end up in
dentry_kill() if the reference count drops to zero and hence calls
dentry_lru_prune() with a non-empty LRU pointer.

If it has a non-zero reference count, it gets removed from the LRU,
and the next call to dput() that drops the reference count to zero
will add it back to the LRU and it will go around again. So it
sounds to me like there is something else going on here.

FWIW, if the dentry is not on the LRU, why would it need pruning?
If it needs pruning regardless of its status on the LRU, then
dentry_lru_prune() should go away entirely and pruning be done
explicitly where it is needed rather than wrapped up in an unrelated
LRU operation.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 2/2] fs: fix dentry_lru_prune()

2013-03-07 Thread Yan, Zheng
On 03/08/2013 02:27 PM, Dave Chinner wrote:
> On Fri, Mar 08, 2013 at 10:43:00AM +0800, Yan, Zheng wrote:
>> On 03/08/2013 10:04 AM, Dave Chinner wrote:
>>> On Thu, Mar 07, 2013 at 07:37:36PM +0800, Yan, Zheng wrote:
 From: "Yan, Zheng" 

 dentry_lru_prune() should always call file system's d_prune callback.
>>>
>>> Why? What bug does this fix?
>>>
>>
>> Ceph uses a flag to track if the dcache contents for a directory are 
>> complete,
>> and it relies on d_prune() to clear the flag when some dentries are trimmed.
>> We noticed that dentry_lru_prune() sometimes does not call ceph_d_prune().
>> It seems the dentry in question is ancestor trimmed by 
>> try_prune_one_dentry().
> 
> That doesn't sound right to me. Any dentry that goes through
> try_prune_one_dentry() is on a LRU list, and will end up in
> dentry_kill() if the reference count drops to zero and hence calls
> dentry_lru_prune() with a non-emtpy LRU pointer.
> 
> If it has a non-zero reference count, it gets removed from the LRU,
> and the next call to dput() that drops the reference count to zero
> will add it back to the LRU and it will go around again. So it
> sounds to me like there is something else going on here.
> 
> FWIW, if the dentry is not on the LRU, why would it need pruning?
> If it needs pruning regardless of it's status on the LRU, then
> dentry_lru_prune() should go away entirely and pruning be done
> explicity where it is needed rather than wrapped up in an unrelated
> LRU operation
> 

I didn't describe it clearly:

static void try_prune_one_dentry(struct dentry *dentry)
	__releases(dentry->d_lock)
{
	...
	/* Prune ancestors. */
	dentry = parent;
	while (dentry) {
		spin_lock(&dentry->d_lock);
		if (dentry->d_count > 1) {
			dentry->d_count--;
			spin_unlock(&dentry->d_lock);
			return;
		}
		dentry = dentry_kill(dentry, 1);  <--- I mean the dentries that are pruned here
	}
}

Regards
Yan, Zheng