Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 11:49, David Chinner wrote:
 On Wed, Sep 12, 2007 at 01:27:33AM +1000, Nick Piggin wrote:

  I just gave 4 things which combined might easily reduce xfs vmap overhead
  by several orders of magnitude, all without changing much code at all.

 Patches would be greatly appreciated. You obviously understand this
 vm code much better than I do, so if it's easy to fix by adding some
 generic vmap cache thingy, please do.

Well, it may not be easy to _fix_, but it's easy to try a few improvements ;)

How do I make an image and run a workload that will coerce XFS into
doing a significant number of vmaps?


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Mel Gorman
On (12/09/07 16:17), Christoph Lameter didst pronounce:
 On Wed, 12 Sep 2007, Nick Piggin wrote:
 
  I will still argue that my approach is the better technical solution for large
  block support than yours, I don't think we made progress on that. And I'm
  quite sure we agreed at the VM summit not to rely on your patches for
  VM or IO scalability.
 
 The approach has already been tried (see the XFS layer) and found lacking. 
 
 Having a fake linear block through vmalloc means that a special software 
 layer must be introduced and we may face special casing in the block / fs 
 layer to check if we have one of these strange vmalloc blocks.
 

One of Nick's points is that to have a 100% reliable solution, that is
what is required. We already have a layering between the VM and the FS
but my understanding is that fsblock replaces rather than adds to it.

Surely, we'll be able to detect the situation where the memory is really
contiguous as a fast path and have a slower path where fragmentation was
a problem.

  But you just showed in two emails that you don't understand what the
  problem is. To reiterate: lumpy reclaim does *not* invalidate my formulae;
  and antifrag does *not* isolate the issue.
 
 I do understand what the problem is. I just do not get what your problem 
 with this is and why you have this drive to demand perfection. We are 
 working a variety of approaches on the (potential) issue but you 
 categorically state that it cannot be solved.
 

This is going in circles.

His point is that we also cannot prove it is 100% correct in all
situations. Without taking additional (expensive) steps, there will be a
workload that fragments physical memory. He doesn't know what it is and neither
do we, but that does not mean that someone else won't find it. He also has a
point about the slow degradation from fragmentation that is woefully difficult
to reproduce. We've had this provability-of-correctness problem before.

His initial problem was not with the patches as such, but with the fact that they
seemed to be presented as a 1st class feature that we fully support and that is
a potential solution for some VM and IO scalability problems.  This is
not the case, we have to treat it as a 2nd class feature until we *know* no
situation exists where it breaks down. These patches on their own would have
to run for months if not a year or so before we could be really sure about it.

The only implementation question about these patches that hasn't been addressed
is the mmap() support: what's wrong with it in its current form, whether it can
be fixed, or whether it's fundamentally screwed, etc. That has fallen by the
wayside.

  But what do you say about viable alternatives that do not have to
  worry about these unlikely scenarios, full stop? So, why should we
  not use fs block for higher order page support?
 
 Because it has already been rejected in another form and adds more 
 layering to the filesystem and more checking for special cases in which 
 we only have virtual linearity? It does not reduce the number of page 
 structs that have to be handled by the lower layers etc.
 

Unless callers always use an iterator for blocks that is optimised in the
physically linear case to be a simple array offset and when not physically
linear it either walks chains (complex) or uses vmap (must deal with TLB
flushes among other things). If it optimistically uses physically contiguous
memory, we may find a way to use only one page struct as well.
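To make that concrete, something like the following is the kind of iterator I
mean (purely illustrative, hypothetical names, not fsblock and not from any
posted patch):

#include <linux/mm.h>
#include <linux/kernel.h>

/*
 * Callers never assume physical linearity; they ask the iterator for the
 * next mapped chunk.  The linear case is a plain offset, the fallback
 * walks a page array.
 */
struct blk_iter {
	void		*vaddr;		/* non-NULL if linear (or vmapped) */
	struct page	**pages;	/* fallback: discontiguous pages */
	size_t		size;		/* total block size */
	size_t		offset;		/* current position */
};

/* Return a kernel address for the next chunk and its length; the caller
 * advances it->offset by *len when it is done with the chunk. */
static void *blk_iter_next(struct blk_iter *it, size_t *len)
{
	if (it->offset >= it->size)
		return NULL;

	if (it->vaddr) {
		/* Fast path: physically (or virtually) linear block. */
		*len = it->size - it->offset;
		return it->vaddr + it->offset;
	}

	/* Slow path: one lowmem page at a time from the page array. */
	*len = min_t(size_t, PAGE_SIZE - (it->offset & ~PAGE_MASK),
		     it->size - it->offset);
	return page_address(it->pages[it->offset >> PAGE_SHIFT]) +
		(it->offset & ~PAGE_MASK);
}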

 Maybe we could get to something like a hybrid that avoids some of these 
 issues?

Or gee whiz, I don't know. Start with your patches as a strictly 2nd class
citizen and build fsblock in while trying to keep use of physically contiguous
memory where possible and it makes sense.

 Add support so something like a virtual compound page can be 
 handled transparently in the filesystem layer with special casing if 
 such a beast reaches the block layer?
 
  I didn't skip that. We have large page pools today. How does that give
  first class of support to those allocations if you have to have memory
  reserves?
 
 See my other mail. That portion is not complete yet. Sorry.
 

I am *very* wary of using reserve pools for anything other than
emergency situations. If nothing else pools == wasted memory + a sizing
problem. But hey, it is one option.

Are we going to agree on some sort of plan or are we just going to
handwave ourselves to death?

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Protect update of b_transaction by j_list_lock

2007-09-13 Thread Amit K. Arora
Hello,

At all other places, j_list_lock is being held when b_transaction is
being modified. But at one place in transaction.c, this lock is not
held. Also, the journal-head.h file which defines journal_head
structure says that b_transaction should be protected by two locks -
j_list_lock and jbd_lock_bh_state()  :

/*
 * Pointer to the compound transaction which owns this buffer's
 * metadata: either the running transaction or the committing
 * transaction (if there is one).  Only applies to buffers on a
 * transaction's data or metadata journaling list.
 * [j_list_lock] [jbd_lock_bh_state()]
 */
transaction_t *b_transaction;


This was observed while debugging a problem of 'b_transaction' getting
corrupted. Here is the trace :

-
[c267bc50] d00e0654 .__journal_refile_buffer+0x100/0x16c [jbd]
[c267bcf0] d00e48bc .journal_commit_transaction+0x136c/0x16e0 [jbd]
[c267be00] d00e9968 .kjournald+0xf0/0x2e8 [jbd]
[c267bee0] c0080fb8 .kthread+0x128/0x178
[c267bf90] c00264bc .kernel_thread+0x4c/0x68
3:mon e
cpu 0x3: Vector: 700 (Program Check) at [c267b930]
pc: d00e02b0: .__journal_file_buffer+0x74/0x2e0 [jbd]
lr: d00e0654: .__journal_refile_buffer+0x100/0x16c [jbd]
sp: c267bbb0
   msr: 80029032
  current = 0xc241d3d0
  paca= 0xc0464900
pid   = 16224, comm = kjournald
kernel BUG in __journal_file_buffer at fs/jbd/transaction.c:1951!
3:mon r
R00 = 0001   R16 = 41c0
R01 = c267bbb0   R17 = c0371060
R02 = d0102d20   R18 = 
R03 = c00038b25208   R19 = 002bf000
R04 = c001e6dfee80   R20 = c27e4680
R05 = 0002   R21 = 
R06 = 0283   R22 = 0fdc
R07 =    R23 = 
R08 = c0464900   R24 = c000e8ad5024
R09 = c001e6dfee80   R25 = c001b6b726e8
R10 = c00038b25208   R26 = c1f5c780
R11 = 0008   R27 = 0002
R12 = 2448   R28 = c001e6dfee80
R13 = c0464900   R29 = c00038b25208
R14 =    R30 = d0101f80
R15 = c0372620   R31 = c00163ce7908
pc  = d00e02b0 .__journal_file_buffer+0x74/0x2e0 [jbd]
lr  = d00e0654 .__journal_refile_buffer+0x100/0x16c [jbd]
msr = 80029032   cr  = 4442
ctr = d00e0170   xer = 2000   trap =  700
-

The assertion is coming from following code (fs/jbd/transaction.c):

/*
 * File a buffer on the given transaction list.
 */
void __journal_file_buffer(struct journal_head *jh,
			transaction_t *transaction, int jlist)
{
	struct journal_head **list = NULL;
	int was_dirty = 0;
	struct buffer_head *bh = jh2bh(jh);

	J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));

	assert_spin_locked(&transaction->t_journal->j_list_lock);   <== HERE

	J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
	J_ASSERT_JH(jh, jh->b_transaction == transaction ||
				jh->b_transaction == 0);
:
:


It looks like the transaction pointer got corrupted. On code inspection,
I could find a place where b_transaction is being updated without
holding the j_list_lock. I tried to fix this in the patch below, and the
bug is no longer seen. Please see if this change is okay.


The following patch has been tested successfully for more than two weeks:

Signed-off-by: Amit Arora [EMAIL PROTECTED]

diff -Nuarp a/fs/jbd/transaction.c b/fs/jbd/transaction.c
--- a/fs/jbd/transaction.c  2007-08-10 15:34:08.0 +0530
+++ b/fs/jbd/transaction.c  2007-08-10 15:56:02.0 +0530
@@ -693,15 +693,15 @@ repeat:
 * sure it doesn't get written to disk before the caller actually
 * commits the new data
 */
+	spin_lock(&journal->j_list_lock);
 	if (!jh->b_transaction) {
 		JBUFFER_TRACE(jh, "no transaction");
 		J_ASSERT_JH(jh, !jh->b_next_transaction);
 		jh->b_transaction = transaction;
 		JBUFFER_TRACE(jh, "file as BJ_Reserved");
-		spin_lock(&journal->j_list_lock);
 		__journal_file_buffer(jh, transaction, BJ_Reserved);
-		spin_unlock(&journal->j_list_lock);
 	}
+	spin_unlock(&journal->j_list_lock);
 
 done:
if (need_copy) {


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread David Chinner
On Thu, Sep 13, 2007 at 03:23:21AM +1000, Nick Piggin wrote:
 On Thursday 13 September 2007 11:49, David Chinner wrote:
  On Wed, Sep 12, 2007 at 01:27:33AM +1000, Nick Piggin wrote:
 
   I just gave 4 things which combined might easily reduce xfs vmap overhead
   by several orders of magnitude, all without changing much code at all.
 
  Patches would be greatly appreciated. You obviously understand this
  vm code much better than I do, so if it's easy to fix by adding some
  generic vmap cache thingy, please do.
 
 Well, it may not be easy to _fix_, but it's easy to try a few improvements ;)
 
 How do I make an image and run a workload that will coerce XFS into
 doing a significant number of vmaps?

# mkfs.xfs -n size=16384 <dev>

to create a filesystem with a 16k directory block size on a 4k page
machine.

Then just do operations on directories with lots of files in them
(tens of thousands). Every directory operation will require at
least one vmap in this situation - e.g. a traversal will result in
lots and lots of blocks being read that will require vmap() for every
directory block read from disk and an unmap almost immediately
afterwards when the reference is dropped
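
Something along these lines should generate the workload (a rough,
hypothetical userspace sketch; the directory path and file count are
arbitrary, and you want to drop caches or remount between the create and
traversal phases so the directory blocks really come back from disk):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
	const char *dir = argc > 1 ? argv[1] : "/mnt/xfs/bigdir";
	char path[4096];
	struct dirent *de;
	DIR *d;
	int i, fd;

	/* Phase one: fill a single directory with tens of thousands of files. */
	mkdir(dir, 0755);
	for (i = 0; i < 50000; i++) {
		snprintf(path, sizeof(path), "%s/f%06d", dir, i);
		fd = open(path, O_CREAT | O_WRONLY, 0644);
		if (fd < 0) {
			perror("open");
			return 1;
		}
		close(fd);
	}

	/* Phase two: traverse, so every 16k directory block is read back
	 * (and hence vmapped) again. */
	d = opendir(dir);
	if (!d) {
		perror("opendir");
		return 1;
	}
	while ((de = readdir(d)) != NULL)
		;	/* touching every entry is enough */
	closedir(d);
	return 0;
}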

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [PATCH] 9p: rename uid and gid parameters

2007-09-13 Thread Eric Van Hensbergen
On 9/12/07, Latchesar Ionkov [EMAIL PROTECTED] wrote:
 Change the names of 'uid' and 'gid' parameters to the more appropriate
 'dfltuid' and 'dfltgid'.


...

 strcpy(v9ses->name, V9FS_DEFUSER);
 strcpy(v9ses->remotename, V9FS_DEFANAME);
 +   v9ses->dfltuid = V9FS_DEFUID;
 +   v9ses->dfltgid = V9FS_DEFGID;

...
 +#define V9FS_DEFUID	(0)
 +#define V9FS_DEFGID	(0)

I'm not sure if there is a good solution here, but I'm uncomfortable
with using uid=0 as the default.  I'm not sure if there is a default
uid for nobody, but anything is probably better than 0.  Looks like
nfsnobody is 65534, we could use that - even if only as a marker for
the server to map it to nobody on the target system?  What do you
think?
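
For illustration, the defaults could hypothetically become something like
this (not a tested change, and as noted the value is only a convention,
not guaranteed to be the same everywhere):

/* Map the default owner to the customary overflow/nobody id instead of
 * root; a marker for the server rather than a real local identity. */
#define V9FS_DEFUID	(65534)
#define V9FS_DEFGID	(65534)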

Particularly with attach-per-user, we probably need to look at
interacting with idmapd or create our own variant real soon.

  -eric


Re: [PATCH] 9p: attach-per-user

2007-09-13 Thread Eric Van Hensbergen
On 9/12/07, Latchesar Ionkov [EMAIL PROTECTED] wrote:

 - allow only one user to access the tree (access=uid)
  Only the user with uid can access the v9fs tree. Other users that attempt
  to access it will get EPERM error.


While access=uid and dfltuid=some-other-uid creates an interesting
flexibility in the way things can be used, I'm wondering if
access=uid dfltuid=DEFAULT_UID is intuitive, it might be nice for
the default behavior to be setting dfltuid to the uid specified in
access when that access option is used.  This can be overridden with
the dfltuid option, but I think it makes more sense to attach as the
uid you are restricting access to.
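
Illustratively, something like this is what I have in mind (the struct,
field, and option names here are made up, not the actual v9fs
option-parsing code):

#include <linux/types.h>

struct v9_opts {
	int	access_single;	/* access=<uid> was specified */
	uid_t	access_uid;	/* the uid given with access= */
	int	dfltuid_given;	/* dfltuid= was specified explicitly */
	uid_t	dfltuid;
};

static uid_t v9_pick_dfltuid(const struct v9_opts *o, uid_t builtin_default)
{
	if (o->dfltuid_given)
		return o->dfltuid;	/* explicit override wins */
	if (o->access_single)
		return o->access_uid;	/* default to the restricted uid */
	return builtin_default;		/* otherwise the V9FS_DEFUID value */
}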

If that's the way we want to go, I think that can be handled in a
separate patch.

I've merged this stuff into my test tree, as soon as regressions pass
and I confirm they compile clean on another architecture I'll push
them into my devel to be picked up by -mm.

 -eric


Re: Distributed storage. Security attributes and ducumentation update.

2007-09-13 Thread Paul E. McKenney
On Thu, Sep 13, 2007 at 04:22:59PM +0400, Evgeniy Polyakov wrote:
 Hi Paul.
 
 On Mon, Sep 10, 2007 at 03:14:45PM -0700, Paul E. McKenney ([EMAIL PROTECTED]) wrote:
   Further TODO list includes:
   * implement optional saving of mirroring/linear information on the remote
 nodes (simple)
   * implement netlink based setup (simple)
   * new redundancy algorithm (complex)
   
   Homepage:
   http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst
  
  A couple questions below, but otherwise looks good from an RCU viewpoint.
  
  Thanx, Paul
 
 Thanks for your comments, and sorry for late reply I was at KS/London
 trip.
   + if (--num) {
   + list_for_each_entry_rcu(n, &node->shared, shared) {
  
  This function is called under rcu_read_lock() or similar, right?
  (Can't tell from this patch.)  It is also OK to call it from under the
  update-side mutex, of course.
 
 Actually not, but it does not require it, since entry can not be removed
 during this operations since appropriate reference counter for given node is
 being held. It should not be RCU at all.

Ah!  Yes, it is OK to use _rcu in this case, but should be avoided
unless doing so eliminates duplicate code or some such.  So, agree
with dropping _rcu in this case.
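
For illustration only (a sketch of the distinction, not the DST code
itself, and the struct name here is made up):

#include <linux/list.h>

struct dst_node_sketch {
	struct list_head	shared;
	/* ... */
};

static void process_node(struct dst_node_sketch *n)
{
	/* whatever per-node work is needed */
}

static void walk_shared(struct dst_node_sketch *node)
{
	struct dst_node_sketch *n;

	/*
	 * The list is pinned by a reference count held by the caller, so
	 * the plain iterator is enough; list_for_each_entry_rcu() plus
	 * rcu_read_lock() is only needed when readers rely on RCU to keep
	 * entries from disappearing underneath them.
	 */
	list_for_each_entry(n, &node->shared, shared)
		process_node(n);
}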

   +static int dst_mirror_read(struct dst_request *req)
   +{
   + struct dst_node *node = req->node, *n, *min_dist_node;
   + struct dst_mirror_priv *priv = node->priv;
   + u64 dist, d;
   + int err;
   +
   + req->bio_endio = dst_mirror_read_endio;
   +
   + do {
   + err = -ENODEV;
   + min_dist_node = NULL;
   + dist = -1ULL;
   + 
   + /*
   +  * Reading is never performed from the node under resync.
   +  * If this will cause any troubles (like all nodes must be
   +  * resynced between each other), this check can be removed
   +  * and per-chunk dirty bit can be tested instead.
   +  */
   +
   + if (!test_bit(DST_NODE_NOTSYNC, &node->flags)) {
   + priv = node->priv;
   + if (req->start > priv->last_start)
   + dist = req->start - priv->last_start;
   + else
   + dist = priv->last_start - req->start;
   + min_dist_node = req->node;
   + }
   +
   + list_for_each_entry_rcu(n, &node->shared, shared) {
  
  I see one call to this function that appears to be under the update-side
  mutex, but I cannot tell if the other calls are safe.  (Safe as in either
  under the update-side mutex or under rcu_read_lock() and friends.)
 
 The same here - those processing function are called from
 generic_make_request() from any lock on top of them. Each node is linked
 into the list of the first added node, which reference counter is
 increased in higher layer. Right now there is no way to add or remove
 nodes after array was started, such functionality requires storage tree
 lock to be taken and RCU can not be used (since it requires sleeping and
 I did not investigate sleepable RCU for this purpose).
 
 So, essentially RCU is not used in DST :)

Works for me!  Use the right tool for the job!

 Thanks for review, Paul.

Thanx, Paul


Re: [PATCH] 9p: rename uid and gid parameters

2007-09-13 Thread Bernd Petrovitsch
On Thu, 2007-09-13 at 08:51 -0600, Latchesar Ionkov wrote:
 Zero was the value that was used before, even though it wasn't defined
 explicitly. I just defined a macro so we can see and eventually change
 it to something better. I don't know if there is a good default value.
 Is nfsnobody the same on all Linux distributions?

Not necessarily.

[]
 On 9/13/07, Eric Van Hensbergen [EMAIL PROTECTED] wrote:
  On 9/12/07, Latchesar Ionkov [EMAIL PROTECTED] wrote:
   Change the names of 'uid' and 'gid' parameters to the more appropriate
   'dfltuid' and 'dfltgid'.
  
 
  ...
 
    strcpy(v9ses->name, V9FS_DEFUSER);
    strcpy(v9ses->remotename, V9FS_DEFANAME);
    +   v9ses->dfltuid = V9FS_DEFUID;
    +   v9ses->dfltgid = V9FS_DEFGID;
  
  ...
    +#define V9FS_DEFUID	(0)
    +#define V9FS_DEFGID	(0)
 
  I'm not sure if there is a good solution here, but I'm uncomfortable
  with using uid=0 as the default.  I'm not sure if there is a default
  uid for nobody, but anything is probably better than 0.  Looks like
  nfsnobody is 65534, we could use that - even if only as a marker for

On 32bit hardware. On 64bit it is (similar to 32bit): (unsigned int)-2.

  the server to map it to nobody on the target system?  What do you
  think?
 
  Particularly with attach-per-user, we probably need to look at
  interacting with idmapd or create our own variant real soon.

Bernd
-- 
Firmix Software GmbH   http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
  Embedded Linux Development and Services




Re: 2.6.22.6: kernel BUG at fs/locks.c:171

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote:
 Dear all,

 I've just seen this in dmesg on a AMD K7 / kernel 2.6.22.6 machine
 (config attached).

 Any ideas / which further information needed ?

Thanks for the report. Is it reproducible? It seems like the
locks_free_lock call that's oopsing is coming from __posix_lock_file.
The actual function looks fine, but the lock being freed could have
been corrupted if there was slab corruption, or a hardware corruption.

You could try running memtest86+ overnight. You could also apply the following
patch, turn on slab debugging, and then try to reproduce the problem.



 Soeren

 [ cut here ]
 kernel BUG at fs/locks.c:171!
 invalid opcode:  [#1]
 Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
 ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
 iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
 x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
 nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x
 ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core
 dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp
 ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart CPU:0
 EIP:0060:[c0158f59]Not tainted VLI
 EFLAGS: 00010206   (2.6.22.6 #1)
 EIP is at locks_free_lock+0xb/0x3b
 eax: e1d07f9c   ebx: e1d07f80   ecx: f5f5e2f0   edx: 
 esi:    edi:    ebp:    esp: da3d7f04
 ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
 Process mrtg-load (pid: 19688, ti=da3d6000 task=f5e3a030 task.ti=da3d6000)
 Stack:  c015972b 0002 c04889c8 c012b920 f5f5e290 c048541c
 f0ed3ca0 01485414  e1d07f80  f0f39f58 44ef35f1 f62fc2ac
   f5f5e290  d23106c0 c015a891  0007
 0004 Call Trace:
  [c015972b] __posix_lock_file+0x44e/0x47f
  [c012b920] getnstimeofday+0x2b/0xaf
  [c015a891] fcntl_setlk+0xff/0x1f6
  [c011d836] do_setitimer+0xfa/0x226
  [c0156b87] sys_fcntl64+0x74/0x85
  [c0103ade] syscall_call+0x7/0xb
  ===
 Code: 74 1b 8b 15 30 93 48 c0 8d 43 04 89 53 04 89 42 04 a3 30 93 48 c0 c7
 40 04 30 93 48 c0 5b 5e c3 53 89 c3 8d 40 1c 39 43 1c 74 04 0f 0b eb fe
 8d 43 0c 39 43 0c 74 04 0f 0b eb fe 8d 43 04 39 43 EIP: [c0158f59]
 locks_free_lock+0xb/0x3b SS:ESP 0068:da3d7f04
 BUG: unable to handle kernel paging request at virtual address 9ee420b0
  printing eip:
 c014ab7d
 *pde = 
 Oops: 0002 [#2]
 Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
 ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
 iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
 x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
 nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x
 ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core
 dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp
 ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart CPU:0
 EIP:0060:[c014ab7d]Not tainted VLI
 EFLAGS: 00010082   (2.6.22.6 #1)
 EIP is at free_block+0x61/0xfb
 eax: a75b2c19   ebx: c1cf6c10   ecx: e1d070c4   edx: 9ee420ac
 esi: e1d07000   edi: dfde6960   ebp: dfde7620   esp: dfd87f44
 ds: 007b   es: 007b   fs:   gs:   ss: 0068
 Process events/0 (pid: 4, ti=dfd86000 task=dfdc4a50 task.ti=dfd86000)
 Stack: 0012  0018  c1cf6c10 c1cf6c10 0018
 c1cf6c00 dfde7620 c014ac86  dfde6960 dfde7620 c0521d20 
 c014b869   dfde69e0 c0521d20 c014b827 c0125955 dfdc4b5c
 8f0c99c0 Call Trace:
  [c014ac86] drain_array+0x6f/0x89
  [c014b869] cache_reap+0x42/0xde
  [c014b827] cache_reap+0x0/0xde
  [c0125955] run_workqueue+0x6b/0xdf
  [c0125ec7] worker_thread+0x0/0xbd
  [c0125f79] worker_thread+0xb2/0xbd
  [c0128221] autoremove_wake_function+0x0/0x35
  [c01280cc] kthread+0x36/0x5a
  [c0128096] kthread+0x0/0x5a
  [c0104607] kernel_thread_helper+0x7/0x10
  ===
 Code: 8b 02 25 00 40 02 00 3d 00 40 02 00 75 03 8b 52 0c 8b 02 84 c0 78 04
 0f 0b eb fe 8b 72 1c 8b 54 24 28 8b 46 04 8b 7c 95 4c 8b 16 89 42 04 89
 10 2b 4e 0c c7 06 00 01 10 00 c7 46 04 00 02 20 00 EIP: [c014ab7d]
 free_block+0x61/0xfb SS:ESP 0068:dfd87f44
 [ cut here ]
 kernel BUG at fs/locks.c:171!
 invalid opcode:  [#3]
 Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
 ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
 iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
 x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
 nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x
 ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core
 dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp
 ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart CPU:

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 23:03, David Chinner wrote:
 On Thu, Sep 13, 2007 at 03:23:21AM +1000, Nick Piggin wrote:

  Well, it may not be easy to _fix_, but it's easy to try a few
  improvements ;)
 
  How do I make an image and run a workload that will coerce XFS into
  doing a significant number of vmaps?

 # mkfs.xfs -n size=16384 <dev>

 to create a filesystem with a 16k directory block size on a 4k page
 machine.

 Then just do operations on directories with lots of files in them
 (tens of thousands). Every directory operation will require at
 least one vmap in this situation - e.g. a traversal will result in
 lots and lots of blocks being read that will require vmap() for every
 directory block read from disk and an unmap almost immediately
 afterwards when the reference is dropped

Ah, wow, thanks: I can reproduce it.


Re: [NFS] [PATH 08/19] jfs: new export ops

2007-09-13 Thread J. Bruce Fields
On Thu, Sep 13, 2007 at 03:59:34PM +0200, Christoph Hellwig wrote:
 Yes, please add them to the queue.  I've put a quilt series of patches
 with the whitespace fixes on http://verein.lst.de/~hch/patches.nfsd.tgz.
 Do you want to take it that way or should I resend everything to the
 list?

Yeah, I hate to be so picky, but would it be possible to get them as
mail?  It just saves me some scripting.  (I tried git-quiltimport and it
doesn't get the silly subject lines right.)  They don't even have to be
mailed out, an mbox I can wget would be fine.

--b.


Re: [NFS] [PATH 08/19] jfs: new export ops

2007-09-13 Thread Christoph Hellwig
On Thu, Sep 13, 2007 at 04:57:01PM -0400, J. Bruce Fields wrote:
 On Thu, Sep 13, 2007 at 03:59:34PM +0200, Christoph Hellwig wrote:
  Yes, please add them to the queue.  I've put a quilt series of patches
  with the whitespace fixes on http://verein.lst.de/~hch/patches.nfsd.tgz.
  Do you want to take it that way or should I resend everything to the
  list?
 
 Yeah, I hate to be so picky, but would it be possible to get them as
 mail?  It just saves me some scripting.  (I tried git-quiltimport and it
 doesn't get the silly subject lines right.)  They don't even have to be
 mailed out, an mbox I can wget would be fine.

No problem, I'll do that tomorrow.


Re: [PATCH] [-mm] FS: file name must be unique in the same dir in procfs

2007-09-13 Thread Andrew Morton
 
 From: Zhang Rui [EMAIL PROTECTED]
 File name should be unique in the same directory.
 
 In order to keep the back-compatibility, only a warning is given
 currently, but actions must be taken to fix it when such duplicates
 are detected.
 
 Bug report and a simple fix can be found here: 
 http://bugzilla.kernel.org/show_bug.cgi?id=8798
 
 Signed-off-by: Zhang Rui [EMAIL PROTECTED]
 ---
  fs/proc/generic.c |   12 
  1 file changed, 12 insertions(+)
 
 Index: linux-2.6/fs/proc/generic.c
 ===
 --- linux-2.6.orig/fs/proc/generic.c
 +++ linux-2.6/fs/proc/generic.c
 @@ -523,6 +523,7 @@ static const struct inode_operations pro
  
  static int proc_register(struct proc_dir_entry * dir, struct
 proc_dir_entry * dp)

Your email client is wordwrapping the text.

  {
 + struct proc_dir_entry *tmp = NULL;

That initialisation is unneeded - I removed it.

`tmp' is always a crappy name for anything.  I renamed it to `de'.

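For reference, the added check ends up looking roughly like this inside
proc_register(), after the declaration shown above (a sketch, not the
exact -mm diff; proc_subdir locking and the rest of the function are
elided):

	struct proc_dir_entry *de;

	/* Walk the parent's existing children and warn on a name clash. */
	for (de = dir->subdir; de; de = de->next) {
		if (strcmp(de->name, dp->name) == 0) {
			printk(KERN_WARNING
			       "proc_dir_entry '%s' already registered\n",
			       dp->name);
			dump_stack();
			break;
		}
	}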



Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Christoph Lameter
On Thu, 13 Sep 2007, Mel Gorman wrote:

 Surely, we'll be able to detect the situation where the memory is really
 contiguous as a fast path and have a slower path where fragmentation was
 a problem.

Yes I have a draft here now of a virtual compound page solution that I am 
testing with SLUB. Page allocator first tries to allocate a contiguous 
page. If that is not possible then we string together a contiguous page 
through order 0 allocs and vmalloc.
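
Roughly like this (a sketch of the fallback idea, not the actual draft
patch; the function name is made up, and teardown -- vunmap(), freeing the
order-0 pages and the pages[] array -- plus most error handling are elided
or simplified):

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

static void *alloc_maybe_virtual_compound(gfp_t gfp, unsigned int order)
{
	unsigned int i, nr = 1 << order;
	struct page *page, **pages;
	void *addr;

	/* Fast path: a really contiguous higher-order page. */
	page = alloc_pages(gfp | __GFP_NOWARN, order);
	if (page)
		return page_address(page);

	/* Fallback: string order-0 pages together with a virtual mapping. */
	pages = kmalloc(nr * sizeof(*pages), gfp);
	if (!pages)
		return NULL;

	for (i = 0; i < nr; i++) {
		pages[i] = alloc_page(gfp);
		if (!pages[i])
			goto out_free;
	}

	addr = vmap(pages, nr, VM_MAP, PAGE_KERNEL);
	if (addr)
		return addr;	/* caller must keep pages[] for later teardown */

out_free:
	while (i--)
		__free_page(pages[i]);
	kfree(pages);
	return NULL;
}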
 
 The only implementation question about these patches that hasn't been addressed
 is the mmap() support: what's wrong with it in its current form, whether it can
 be fixed, or whether it's fundamentally screwed, etc. That has fallen by the
 wayside.

Yes I have not heard about those. Got some more ideas how to clean it up 
even more in the meantime. No feedback usually means no objections. 

 I am *very* wary of using reserve pools for anything other than
 emergency situations. If nothing else pools == wasted memory + a sizing
 problem. But hey, it is one option.

Well we need the page pools anyways for large page sizes > MAX_ORDER. But 

 Are we going to agree on some sort of plan or are we just going to
 handwave ourselves to death?

I'll try to do the virtual mapped compound pages. Hopefully that will 
allow fallback for compound pages in a generic way.