Re: Race between __sync_single_inode() and LogFS garbage collector

2007-02-20 Thread Jörn Engel
On Mon, 19 February 2007 21:06:33 -0600, Dave Kleikamp wrote:
> 
> You've obviously given this a lot more thought that I have, but this
> sounds like something that has possibilities.  You couldn't implement
> your own drop_inode method that does better locking against the garbage
> collector?

Bloody hell!  What a great question for an unexpected reason.  GC also
races against iput().

Basically whenever anyone locks any inode, I may have a problem.  If
that inode remains locked until some writeout happens, I will have a
problem.  Afaics, locking happens in three places:

__sync_single_inode()
Causes writeout of inode pages and inode itself

iput()
Causes writeout of inode itself

get_new_inode() or get_new_inode_fast()
No writes from here

Both __sync_single_inode() and iput() would need to prevent GC before
locking the inode.  GC can be caused by any writes to LogFS, so they
need to lock against all write paths in LogFS.  Once they start writeout
themselves, they also need to pass information that they have taken the
lock onward, otherwise taking the lock itself would deadlock.  Nasty.

Alternatively there could be a non-blocking version of iget() that
simply returns when an inode is locked.  Now GC would have to spin in a
loop, waiting for one of two possible events.  Either the inode is
getting unlocked or someone calls into LogFS with the inode in question
and GC needs to take that by some means.  Again, nasty.

The more I think about it, the more I like the idea of double-caching
inodes.  As ugly as it sounds, it causes the least amount of problems.

Thanks, Shaggy!

Jörn

-- 
Linux [...] existed just for discussion between people who wanted
to show off how geeky they were.
-- Rob Enderle
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Race between __sync_single_inode() and LogFS garbage collector

2007-02-19 Thread Dave Kleikamp
On Mon, 2007-02-19 at 23:23 +, Jörn Engel wrote:
> On Mon, 19 February 2007 17:05:55 -0600, Dave Kleikamp wrote:
> > 
> > It'd be nice if you could drop s_w_mutex when the garbage collector
> > calls i_get().
> 
> Not possible.  Garbage collection only happens when space is getting
> tight.  At that moment all writes must be serialized or this race will
> be the least of my problems. :(
> 
> > Otherwise, you may be able to call ilookup5_nowait() in the garbage
> > collector, and skip that inode if I_LOCK is set.
> 
> Also not possible.  I cannot skip that inode, or again this race will be
> a minor problem.  The inode exists on the medium and I must get it by
> some means.  Re-reading it from the medium is fine, writing is not and
> waiting for the write to happen brings me back to square one.

Okay, I get it now.  You've got more constraints than I initially
realized.

> It is a nasty problem that has been haunting me for about a year now.
> For a while I tried ilookup5_nowait() and just used the inode in spite
> of the lock.  But that will explode spectacularly when racing against
> generic_drop_inode().

You've obviously given this a lot more thought that I have, but this
sounds like something that has possibilities.  You couldn't implement
your own drop_inode method that does better locking against the garbage
collector?

> Double-caching or a common lock seem to be the only solutions.
> 
> Jörn
> 
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Race between __sync_single_inode() and LogFS garbage collector

2007-02-19 Thread Jörn Engel
On Mon, 19 February 2007 17:05:55 -0600, Dave Kleikamp wrote:
> 
> It'd be nice if you could drop s_w_mutex when the garbage collector
> calls i_get().

Not possible.  Garbage collection only happens when space is getting
tight.  At that moment all writes must be serialized or this race will
be the least of my problems. :(

> Otherwise, you may be able to call ilookup5_nowait() in the garbage
> collector, and skip that inode if I_LOCK is set.

Also not possible.  I cannot skip that inode, or again this race will be
a minor problem.  The inode exists on the medium and I must get it by
some means.  Re-reading it from the medium is fine, writing is not and
waiting for the write to happen brings me back to square one.

It is a nasty problem that has been haunting me for about a year now.
For a while I tried ilookup5_nowait() and just used the inode in spite
of the lock.  But that will explode spectacularly when racing against
generic_drop_inode().

Double-caching or a common lock seem to be the only solutions.

Jörn

-- 
My second remark is that our intellectual powers are rather geared to
master static relations and that our powers to visualize processes
evolving in time are relatively poorly developed.
-- Edsger W. Dijkstra
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Race between __sync_single_inode() and LogFS garbage collector

2007-02-19 Thread Dave Kleikamp
On Mon, 2007-02-19 at 21:31 +, Jörn Engel wrote:
> Looks like I really write the first log-structured filesystem for Linux.
> At least I can into a fairly arcane race that seems to be generic to all
> of them.
> 
> Writing when space is tight may involve calling the garbage collector.
> The garbage collector will iget() random inodes, either to verify if a
> block is valid or to copy the block around.  At this point, all writes
> to LogFS are serialized.
> 
> __sync_single_inode() will first lock a random inode, then call
> write_inode(), then unlock the inode.  So we can get this:
> 
> 
> __sync_single_inode() garbage collector
> -
> inode->i_state |= I_LOCK; ...
> ...   mutex_lock(&super->s_w_mutex);
> write_inode(inode, wait); ...
>   ... iget(sb, ino);
>   mutex_lock(&super->s_w_mutex);  ...
>   ...   wait_on_inode(inode);
>   mutex_unlock(&super->s_w_mutex);
>   ... 
> ...
> inode->i_state &= ~I_LOCK;
> 
> 
> And once in a blue moon, those two will race for the same inode.  As far
> as I can see, the race can only get fixed in two ways:
> 1. Never iget() inside the garbage collector.  That would require having
>a private inode cache for LogFS.
> 2. Synchonize __sync_single_inode() and the garbage collector somehow.
> 
> Variant 1 would result in double caching for the same object, something
> I would like to avoid.  So does anyone have suggestions how variant 2
> could be achieved?  Essentially what I need is a way to say "don't sync
> any inodes right now, I'll be back in 5 milliseconds or so".

It'd be nice if you could drop s_w_mutex when the garbage collector
calls i_get().

Otherwise, you may be able to call ilookup5_nowait() in the garbage
collector, and skip that inode if I_LOCK is set.

> 
> Jörn
> 
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Race between __sync_single_inode() and LogFS garbage collector

2007-02-19 Thread Jörn Engel
Looks like I really write the first log-structured filesystem for Linux.
At least I can into a fairly arcane race that seems to be generic to all
of them.

Writing when space is tight may involve calling the garbage collector.
The garbage collector will iget() random inodes, either to verify if a
block is valid or to copy the block around.  At this point, all writes
to LogFS are serialized.

__sync_single_inode() will first lock a random inode, then call
write_inode(), then unlock the inode.  So we can get this:


__sync_single_inode()   garbage collector
-
inode->i_state |= I_LOCK;   ...
... mutex_lock(&super->s_w_mutex);
write_inode(inode, wait);   ...
  ...   iget(sb, ino);
  mutex_lock(&super->s_w_mutex);...
  ... wait_on_inode(inode);
  mutex_unlock(&super->s_w_mutex);  
  ...   
...
inode->i_state &= ~I_LOCK;


And once in a blue moon, those two will race for the same inode.  As far
as I can see, the race can only get fixed in two ways:
1. Never iget() inside the garbage collector.  That would require having
   a private inode cache for LogFS.
2. Synchonize __sync_single_inode() and the garbage collector somehow.

Variant 1 would result in double caching for the same object, something
I would like to avoid.  So does anyone have suggestions how variant 2
could be achieved?  Essentially what I need is a way to say "don't sync
any inodes right now, I'll be back in 5 milliseconds or so".

Jörn

-- 
Courage is not the absence of fear, but rather the judgement that
something else is more important than fear.
-- Ambrose Redmoon
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html