CFP: Storage Security and Survivability Workshop

2007-03-28 Thread Valerie Henson
Hi folks,

I'm chairing a workshop on storage security and survivability this
fall.  I'd really like to have participation from the Linux file
systems and storage community.  It's not much work; a good 2-6 page
paper or a really killer abstract is enough to get you into the
workshop.  This is also your chance to mingle with security experts.
The full CFP isn't included because it keeps tripping spam filters,
but here's the URL:

http://www.storagess.org/2007/

Thanks for your consideration,

-VAL
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NFS] [PATCH 2/18] exportfs: add fid type

2007-03-28 Thread J. Bruce Fields
On Sat, Mar 17, 2007 at 01:10:09AM +, Christoph Hellwig wrote:
> Add a structured fid type so that we don't have to pass an array
> of u32 values around everywhere.  It's a union of possible layouts.
> As a start there's only the u32 array and the traditional 32bit
> inode format, but there will be more in one of my next patchsets
> when I start to document the various filehandle formats we have
> in lowlevel filesystems better.
> 
> Also add an enum that gives the various filehandle types human-readable
> names.
> 
> Note:  Some people might think the struct containing an anonymous
> union is ugly, but I didn't want to pass around a raw union type.

A basic question: what do these apply against?  For example, I don't
have this file:

> Index: linux-2.6/include/linux/exportfs.h
> ===
> --- linux-2.6.orig/include/linux/exportfs.h   2007-03-16 15:44:02.0 +0100
> +++ linux-2.6/include/linux/exportfs.h 2007-03-16 15:44:09.0 +0100

--b.


Re: [NFS] [PATCH 0/18] export operations rewrite

2007-03-28 Thread Christoph Hellwig
On Sat, Mar 17, 2007 at 01:09:46AM +, Christoph Hellwig wrote:
> This patchset is a medium scale rewrite of the export operations
> interface.  The goal is to make the interface less complex, and
> easier to understand from the filesystem side, as well as preparing
> generic support for exporting of 64bit inode numbers.

Any comments on this?  So far I've got two positive responses in private
but no comments at all in public.


Re: Linux page cache issue?

2007-03-28 Thread Xin Zhao

You are right. If the device is very big, the radix tree could be huge
as well, and the lookup may not be that cheap. But the per-device tree
can be optimized too. A simple way I can immediately imagine is: evenly
split a device into N parts by sector number and maintain a radix tree
for each part. Let's do the math. Suppose I have a 32G partition
(2^35 bytes) and each data block is 4K bytes (2^12), so the partition
has 2^23 blocks. I divide the blocks into 4096 (2^12) groups, so each
group only has 2^11 blocks. With a radix tree, the average lookup
overhead for each tree would be log(2^11) steps, i.e. 11 in-memory
tree traversals to locate a page. This cost seems acceptable, though I
haven't actually measured it. As for the memory used to maintain the
radix trees, I believe it is trivial given the memory size of modern
computers.

Xin

On 3/28/07, John Anthony Kazos Jr. <[EMAIL PROTECTED]> wrote:

> The lookup of the per-device radix tree may incur some overhead. But
> compared to the slow disk access, looking up an in-memory radix tree is
> much cheaper and should be trivial, I guess.

I would consider whether or not it really is trivial. You'd have to think
hard about just how much of your filesystem is going to be sharing data
blocks. If you fail to find in the per-file tree, then fail to find in the
per-device tree, then still have to read the block from the device, and
this is happening too often, then the additional overhead of the
per-device tree check for non-cached items may end up cancelling the
savings for cached items.




Re: Linux page cache issue?

2007-03-28 Thread Xin Zhao

Thanks a lot, folks!

Your reply addressed my concern.

Now I want to explain the problem that led me to explore Linux disk
cache management.  It comes from my project. In a file system I am
working on, two files may have different inodes but share the same
data blocks. Of course, additional block-level reference counting and
copy-on-write mechanisms are needed to prevent operations on one file
from disrupting the other. But the point is that the two files share
the same data blocks.

I hope that consecutive reads of the two files can benefit from the
disk cache, since they have the same data blocks. But I noticed that
Linux splits the disk buffer cache into many small parts and
associates a file's data with its mapping object. Linux determines
whether a data page is cached by looking it up in the file's mapping
radix tree, so this is a per-file radix tree. This design obviously
makes each tree smaller and faster to look up, but it eliminates the
possibility of sharing disk cache across two files. For example,
suppose a process reads file 2 right after file 1 (where both files
share the same data block set). Even if the data blocks are already
loaded in memory, they can only be located via file 1's mapping
object, so when Linux reads file 2 it still thinks the data is not
present in memory, and the process must load the data from disk again.

Would it make sense to build a per-device radix tree indexed by (dev,
sect_no)?  The loaded data pages can still be associated with a
per-file radix tree in the file's mapping object, but it is also
associated with the per-device radix tree. When looking up cached
pages, Linux can first check the per-file radix tree. The per-device
radix tree is checked only if Linux fails to find a cached page in the
per-file radix tree. The lookup of the per-device radix tree may incur
some overhead. But compared to the slow disk access, looking up an
in-memory radix tree is much cheaper and should be trivial, I guess.

Any thought about this?

Thanks,
-x

On 3/28/07, Dave Kleikamp <[EMAIL PROTECTED]> wrote:

On Wed, 2007-03-28 at 02:45 -0400, Xin Zhao wrote:
> Hi,
>
> If a Linux process opens and reads a file A, then it closes the file.
> Will Linux keep the file A's data in cache for a while in case another
> process opens and reads the same in a short time? I think that is what
> I heard before.

Yes.

> But after I dug into the kernel code, I am confused.
>
> When a process closes the file A, iput() will be called, which in turn
> calls the following two functions:
> iput_final()->generic_drop_inode()

A comment from the top of fs/dcache.c:

/*
 * Notes on the allocation strategy:
 *
 * The dcache is a master of the icache - whenever a dcache entry
 * exists, the inode will always exist. "iput()" is done either when
 * the dcache entry is deleted or garbage collected.
 */

Basically, as long as a dentry is present, iput_final won't be called on
the inode.

> But from the following calling chain, we can see that a file close will
> eventually lead to evicting and freeing all cached pages. Actually in
> truncate_complete_page(), the pages will be freed.  This seems to
> imply that Linux has to re-read the same data from disk even if
> another process B read the same file right after process A closes the
> file. That does not make sense to me.
>
> /***calling chain ***/
> generic_delete_inode/generic_forget_inode()->
> truncate_inode_pages()->truncate_inode_pages_range()->
> truncate_complete_page()->remove_from_page_cache()->
> __remove_from_page_cache()->radix_tree_delete()
>
> Am I missing something? Can someone please provide some advice?
>
> Thanks a lot
> -x

Shaggy
--
David Kleikamp
IBM Linux Technology Center





Re: forced umount?

2007-03-28 Thread Phillip Susi

Pekka J Enberg wrote:
> We never want to _abort_ pending updates, only pending reads. So, even
> with revoke(), we need to be careful, which is why we do do_fsync() in
> generic_revoke_file() to make sure pending updates are flushed before we
> declare the inode revoked.
>
> But, I haven't looked at forced unmount that much so there may be other
> issues I am not aware of.

For the purposes of this thread we _do_ want to abort pending updates, to
force the system to give up on a broken block device rather than block a
bunch of tasks in the D state forever.




Re: Linux page cache issue?

2007-03-28 Thread Dave Kleikamp
On Wed, 2007-03-28 at 02:45 -0400, Xin Zhao wrote:
> Hi,
> 
> If a Linux process opens and reads a file A, then it closes the file.
> Will Linux keep the file A's data in cache for a while in case another
> process opens and reads the same in a short time? I think that is what
> I heard before.

Yes.

> But after I dug into the kernel code, I am confused.
> 
> When a process closes the file A, iput() will be called, which in turn
> calls the following two functions:
> iput_final()->generic_drop_inode()

A comment from the top of fs/dcache.c:

/*
 * Notes on the allocation strategy:
 *
 * The dcache is a master of the icache - whenever a dcache entry
 * exists, the inode will always exist. "iput()" is done either when
 * the dcache entry is deleted or garbage collected.
 */

Basically, as long as a dentry is present, iput_final won't be called on
the inode.

> But from the following calling chain, we can see that a file close will
> eventually lead to evicting and freeing all cached pages. Actually in
> truncate_complete_page(), the pages will be freed.  This seems to
> imply that Linux has to re-read the same data from disk even if
> another process B read the same file right after process A closes the
> file. That does not make sense to me.
> 
> /***calling chain ***/
> generic_delete_inode/generic_forget_inode()->
> truncate_inode_pages()->truncate_inode_pages_range()->
> truncate_complete_page()->remove_from_page_cache()->
> __remove_from_page_cache()->radix_tree_delete()
> 
> Am I missing something? Can someone please provide some advice?
> 
> Thanks a lot
> -x

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center



Re: Linux page cache issue?

2007-03-28 Thread Matthias Kaehlcke
According to the chapter "Linux Kernel Overview" of the
kernelhacking-HOWTO, the page cache holds pages associated with *open*
files:

The Page Cache

The page cache is made up of pages, each of which refers to a 4kB
portion of data associated with an open file. The data contained in a
page may come from several disk blocks, which may or may not be
physical neighbours on the disk. The page cache is largely used to
interface the requirements of the memory management subsystem (which
uses fixed, 4kB pages) to the VFS subsystem (which uses different size
blocks for different devices).

The page cache has two important data structures, a page hash table
and an inode queue. The page hash table is used to quickly find the
page descriptor of the page holding data associated with an inode and
offset within a file. The inode queue contains lists of page
descriptors relating to open files.

http://www.kernelhacking.org/docs/kernelhacking-HOWTO/indexs03.html

m.

On Wed, Mar 28, 2007 at 02:45:23AM -0400, Xin Zhao said:

> Hi,
> 
> If a Linux process opens and reads a file A, then it closes the file.
> Will Linux keep the file A's data in cache for a while in case another
> process opens and reads the same in a short time? I think that is what
> I heard before.
> 
> But after I dug into the kernel code, I am confused.
> 
> When a process closes the file A, iput() will be called, which in turn
> calls the following two functions:
> iput_final()->generic_drop_inode()
> 
> But from the following calling chain, we can see that a file close will
> eventually lead to evicting and freeing all cached pages. Actually in
> truncate_complete_page(), the pages will be freed.  This seems to
> imply that Linux has to re-read the same data from disk even if
> another process B read the same file right after process A closes the
> file. That does not make sense to me.
> 
> /***calling chain ***/
> generic_delete_inode/generic_forget_inode()->
> truncate_inode_pages()->truncate_inode_pages_range()->
> truncate_complete_page()->remove_from_page_cache()->
> __remove_from_page_cache()->radix_tree_delete()
> 
> Am I missing something? Can someone please provide some advice?
> 
> Thanks a lot
> -x

-- 
   For to be free is not merely to cast off
   one's chains, but to live in a way that
   respects and enhances the freedom of others
   (Nelson Mandela)

using free software / Debian GNU/Linux | http://debian.org
gpg --keyserver pgp.mit.edu --recv-keys 47D8E5D4