-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jan Kara wrote:
>   Hello,
> 
>> Following this message is a series of 11 patches that rework the reiserfs
>> xattr code. The current implementation open codes a number of functions that
>> are well tested and stable elsewhere, but will slight modifications. It also
>> does a number of things suboptimally, such as operations that affect all of
>> the xattrs associated with an inode, as well as not handling journalling as
>> well as can be.
>>
>> I've run them through a weekend of stress testing successfully, but I'd like
>> some additional testing before considering them safe.
>>
>> Here's the run down:
>>
>> * 01 - Make internal xattr operations use vfs ops
>>   This eliminates the open coding of the read/write loops in favor of using
>>   vfs_read and vfs_write. Performance-wise, it's very little difference
>>   from the open coding, and implementation-wise, it's more tested. Yes, this
>>   violates the no-file-io-in-the-kernel rules, but this rule has been 
>> violated
>>   since the day the original patches were accepted.
>   I'm not completely sure but from briefly looking at the patches I
> think there might be the following problem: you first start a transaction
> and then call a VFS write operation. That will lock a page it wants to
> write. But if bdflush works on the xattr file and sends some dirty data
> to disk, it will first take page lock and then start the transaction.
> This introduces essentially a lock inversion (as starting a transaction
> behaves like taking a lock) and hence deadlocks... I've been solving
> these for the quota code some time ago (quota also has similar needs as
> your xattr code) - I really had some deadlock reports for heavily loaded
> machines. There the only reasonable solution was an extra write
> functions bypassing the page cache. Maybe in your case you can solve the
> problems differently as you don't need the working solution for all the
> filesystems but just for Reiserfs. 

Sigh. Ok. The way you describe it definitely makes it a lock inversion
issue. I haven't run into it yet, but as you say, it occurs on heavily
loaded machines. I've done some load testing, but apparently not enough,
since your analysis is sound.

But, I think there is a silver lining after all. It sounds like you've
already worked around these issues for the journaled quota code. What do
you think about turning the quota read/write functions into something
more generic and using that for xattrs as well as quotas?

Ultimately, I think that quota files and xattrs are the same things -
"normal" files read from and written to during the I/O path. The changes
to journaled quotas would be minimal - just turn
reiserfs_quota_{read,write} into small wrappers that make the
type->inode mapping and then call the remainder of the existing code as
reiserfs_internal_{read,write} with the appropriate inode. The xattr
code could juse the reiserfs_internal_{read,write} similarly and get all
the deadlock avoidance work you've already done for free.

- -Jeff

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEB2l0LPWxlyuTD7IRAsuaAKCHnEKlcgxcIGEV5n5f3m4CY5c/pACeK4CH
h1cj7wLkY+irYbaagmUHtm4=
=o4nE
-----END PGP SIGNATURE-----

Reply via email to