RE: Unable to mount the SD card formatted using the DIGITAL CAMERA on Linux box

2005-08-08 Thread Mukund JB.

Dear Erik & all,

Sorry for the delay in my latest update.
I had to fix the partition problem first.

I have gathered some partition information from SD cards formatted in
Windows and in the camera.

I printed the data at offset 0x1BE of sector 0 on a Windows-formatted
SD card and on a camera-formatted SD card.

Please see the details below.

Windows SD_0_Sector details at 0x1BE offset
-------------------------------------------

bootable        = 111    (offset 0x01BE)
beg-chs.heads   = 116
beg-chs.sect    = 104
beg-chs.cylin   = 101
sys-type        = 114
end-chs.heads   = 32
end-chs.sect    = 109
end-chs.cylin   = 101
start sect      = 778135908
n/o sec in part = 1141509631


Camera SD_0_Sector details at 0x1BE offset
------------------------------------------

bootable        = 128
beg-chs.heads   = 1
beg-chs.sect    = 26
beg-chs.cylin   = 0
sys-type        = 1
end-chs.heads   = 1
end-chs.sect    = 96
end-chs.cylin   = 193
start sect      = 57
n/o sec in part = 28743

To my surprise, the Windows partition table data looks corrupted and the
camera-created partition table looks meaningful to me. But the Windows
SD card mounts and the camera SD card fails to mount.

I am planning to get the FAT12 details from the first partition of
camera starting at sector 57 as you can see in the 'start sect'
variable.
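
The ten values printed for each card correspond to one 16-byte DOS partition table entry. Below is a minimal C sketch of decoding such an entry, assuming the standard MBR layout with little-endian start sector and sector count. The names part_entry, get_le32, parse_part_entry and entry_looks_plausible are made up for this illustration and do not come from the driver under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 16-byte DOS/MBR partition table entry; the first one sits at 0x1BE. */
struct part_entry {
    uint8_t  boot_flag;   /* 0x80 = bootable, 0x00 = not bootable */
    uint8_t  beg_head;
    uint8_t  beg_sect;    /* raw byte: bits 0-5 sector, bits 6-7 cylinder bits 8-9 */
    uint8_t  beg_cyl;     /* raw byte: cylinder bits 0-7 */
    uint8_t  sys_type;    /* 0x01 = FAT12 */
    uint8_t  end_head;
    uint8_t  end_sect;
    uint8_t  end_cyl;
    uint32_t start_sect;  /* first sector of the partition (logical) */
    uint32_t nr_sects;    /* number of sectors in the partition */
};

/* Read a 32-bit little-endian value regardless of host byte order. */
uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 16 raw bytes found at offset 0x1BE of sector 0. */
void parse_part_entry(const uint8_t raw[16], struct part_entry *e)
{
    assert(raw != NULL && e != NULL);
    e->boot_flag  = raw[0];
    e->beg_head   = raw[1];
    e->beg_sect   = raw[2];
    e->beg_cyl    = raw[3];
    e->sys_type   = raw[4];
    e->end_head   = raw[5];
    e->end_sect   = raw[6];
    e->end_cyl    = raw[7];
    e->start_sect = get_le32(raw + 8);
    e->nr_sects   = get_le32(raw + 12);
}

/* Rough sanity check (an assumption of this sketch, not a kernel rule):
 * the boot flag must be 0x00 or 0x80 and type and size must be non-zero. */
int entry_looks_plausible(const struct part_entry *e)
{
    return (e->boot_flag == 0x00 || e->boot_flag == 0x80) &&
           e->sys_type != 0 && e->nr_sects != 0;
}
```

With the camera values this gives a boot flag of 0x80, type 0x01 (FAT12), start sector 57 and 28743 sectors, so the entry passes the check. With the Windows values the boot flag is 111, which is neither 0x00 nor 0x80, and the first twelve bytes read as the ASCII text "other media.". That would be consistent with sector 0 of the Windows-formatted card being a FAT boot sector with no partition table in front of it, though this is an inference from the printed numbers only.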

Please help me find a way to fix this.

Thanks & Regards,
Mukund Jampala


-----Original Message-----
From: Erik Mouw [mailto:[EMAIL PROTECTED]
Sent: Friday, July 29, 2005 5:56 PM
To: Srinivas G.
Cc: [EMAIL PROTECTED]
Subject: Re: Unable to mount the SD card formatted using the DIGITAL 
CAMREA on Linux box

(don't write to me personally, I do read the list)

On Fri, Jul 29, 2005 at 04:56:43PM +0530, Srinivas G. wrote:
> We have developed a Block Device Driver to handle the flash media
> devices in Linux 2.6.x kernel. It is working fine. We are able to
> mount the SD cards that are formatted on Windows systems, but we
> unable mount the cards that are formatted using the DIGITAL CAMERA.
>
> We have found one thing that the Windows and Digital Camera both are
> formatting the SD cards in FAT12 only. But why we are not able to
> mount the SD cards on Linux Box that are formatted using the Digital
> Camera.

Probably because the camera and linux disagree about the geometry in 
CHS (cylinder, head, sector) of the flash device.

Each partition table entry contains the start and end CHS of that 
partition. However, since a flash device (and also modern hard drives) 
doesn't have a meaningful geometry value, the same information is also 
encoded in logical sectors (start and size of the partition).

If the logical information is zero, the kernel falls back onto the CHS 
information in the partition table and has to assume a certain 
geometry. If that assumption differs from the assumption of the camera,
the partition boundaries will be wrong and you will not be able to 
mount the partition directly. However, you can figure out the start of 
the partition by hand, and use a loop device to get at the correct 
offset.
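
Working out the loop device offset by hand comes down to multiplying the partition's start sector by the sector size. A small C sketch of that arithmetic follows, assuming 512-byte sectors. The helper names are hypothetical, and the device and mount point passed in are placeholders; the loop and offset= mount options themselves are standard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Byte offset of a partition that starts at the given logical sector. */
uint64_t partition_byte_offset(uint32_t start_sect, uint32_t sector_size)
{
    assert(sector_size != 0);
    return (uint64_t)start_sect * sector_size;
}

/* Build a mount command line that uses a loop device at that offset.
 * Returns the snprintf() result; dev and mnt are supplied by the caller. */
int format_loop_mount(char *buf, size_t len, const char *dev,
                      const char *mnt, uint64_t offset)
{
    return snprintf(buf, len, "mount -t vfat -o loop,offset=%llu %s %s",
                    (unsigned long long)offset, dev, mnt);
}
```

For a partition that starts at sector 57, as reported for the camera-formatted card elsewhere in this thread, partition_byte_offset(57, 512) gives 29184.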


Erik

--
Erik Mouw
[EMAIL PROTECTED]  [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] New system call, unshare

2005-08-08 Thread Janak Desai


[PATCH 2/2] unshare system call: System Call setup for i386 arch

Signed-off-by: Janak Desai


 arch/i386/kernel/syscall_table.S |    1 +
 include/asm-i386/unistd.h        |    3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)



diff -Naurp 2.6.13-rc5-mm1/arch/i386/kernel/syscall_table.S 2.6.13-rc5-mm1+unshare/arch/i386/kernel/syscall_table.S
--- 2.6.13-rc5-mm1/arch/i386/kernel/syscall_table.S	2005-08-07 15:33:07.0 +
+++ 2.6.13-rc5-mm1+unshare/arch/i386/kernel/syscall_table.S	2005-08-07 18:35:57.0 +
@@ -300,3 +300,4 @@ ENTRY(sys_call_table)
 	.long sys_vperfctr_control
 	.long sys_vperfctr_write
 	.long sys_vperfctr_read
+	.long sys_unshare		/* 300 */
diff -Naurp 2.6.13-rc5-mm1/include/asm-i386/unistd.h 2.6.13-rc5-mm1+unshare/include/asm-i386/unistd.h
--- 2.6.13-rc5-mm1/include/asm-i386/unistd.h	2005-08-07 15:33:40.0 +
+++ 2.6.13-rc5-mm1+unshare/include/asm-i386/unistd.h	2005-08-07 18:36:37.0 +
@@ -305,8 +305,9 @@
 #define __NR_vperfctr_control	(__NR_vperfctr_open+1)
 #define __NR_vperfctr_write	(__NR_vperfctr_open+2)
 #define __NR_vperfctr_read	(__NR_vperfctr_open+3)
+#define __NR_unshare		300
 
-#define NR_syscalls 300
+#define NR_syscalls 301
 
 /*
  * user-visible error numbers are in the range -1 - -128: see



Re: [PATCH 1/2] New system call, unshare

2005-08-08 Thread Alan Cox
On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
>
> [PATCH 1/2] unshare system call: System Call handler function sys_unshare


Given the complexity of the kernel code involved and the obscurity of
the functionality why not just do another clone() in userspace to
unshare the things you want to unshare and then _exit the parent ?



Mount the SD card formatted using the DIGITAL CAMERA on Linux box but HOW?

2005-08-08 Thread Mukund JB.

Dear all,

I have an SD card that mounts when formatted on Windows but fails to
mount when formatted on the camera, as you all know.

I am now able to mount the camera-formatted SD card on the Linux box by
using the first 512 bytes of a Windows-formatted SD card.

This will NOT serve as a permanent solution, BUT it lets me narrow the
scope of the issue. I used a raw method for this. Here it is.

I formatted the card in Windows and copied its first sector to a file
called disk-image.

(Dump the first 512 bytes to the file)
dd if=/dev/tfa0 of=disk-image count=1

Then I copied disk-image to the camera-formatted device, which is the
one that does not mount.

dd if=disk-image of=/dev/tfa0 count=1

Then I tried to mount the device. It mounts, but with error messages
caused by not being able to find sector 57, which contains the FAT
details.

Attempt to access beyond the end of device
FAT: Directory bread(block 24) failed
FAT: Directory bread(block ..) failed
..
..
FAT: Directory bread(block 55) failed

It works fine when I increase the count to 60. I am able to copy files,
remount and see the files.
Could someone please help me work out where the problem is?

Regards,
Mukund Jampala



Re: [PATCH 1/2] New system call, unshare

2005-08-08 Thread Avi Kivity

Alan Cox wrote:

> On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
>
>> [PATCH 1/2] unshare system call: System Call handler function sys_unshare
>
> Given the complexity of the kernel code involved and the obscurity of
> the functionality why not just do another clone() in userspace to
> unshare the things you want to unshare and then _exit the parent ?

suppose somebody wait()s for the parent? you've turned a synchronous 
operation into an asynchronous one.



FW: oops in 2.4.25 prune_icache() called from kswapd

2005-08-08 Thread Srivastava, Rahul
Hi All,

I was just wondering if any of you guys had a chance to validate the
hypothesis and the proposed fix. 

Thanks,
Rahul


-----Original Message-----
From: Srivastava, Rahul 
Sent: Tuesday, August 02, 2005 8:32 AM
To: 'Marcelo Tosatti'; 'Ernie Petrides'; 'Larry Woodman'
Subject: RE: oops in 2.4.25 prune_icache() called from kswapd


Hi,

Thanks for reviewing the mail. I was wondering whether the changes below
in clear_inode() will close the race window:

In clear_inode(), replace the line:

inode->i_state = I_CLEAR;

with the following piece of code:

**********
spin_lock(&inode_lock);
while (inode->i_state & I_LOCK) {
	spin_unlock(&inode_lock);
	__wait_on_inode(inode);
	spin_lock(&inode_lock);
}
inode->i_state = I_CLEAR;
spin_unlock(&inode_lock);
**********

I feel the race is between __sync_one() and iput()/clear_inode()
(also suggested by Albert), which is as follows:

********** race condition **********

engine 0:
    calls iput() and locks inode_lock. iput() removes the inode from
    the i_list and unlocks inode_lock.

        engine 1:
            grabs inode_lock and calls __sync_one().

engine 0:
    calls clear_inode() and gets past the call to wait_on_inode(),
    which checks whether I_LOCK is set.
    /* From this point onwards clear_inode() and the remainder of
       iput() do not care about I_LOCK or inode_lock. */

        engine 1:
            sets I_LOCK.

engine 0:
    sets i_state = I_CLEAR.
    iput() calls destroy_inode().
    kmem_cache_free() returns the inode to the free list of the
    inode cache.

        engine 1:
            goes ahead and inserts the freed inode into one of the
            three possible lists.


And we end up with a corrupted inode on the inode list.

Your thoughts please.

Thanks,
Rahul




-----Original Message-----
From: Marcelo Tosatti [mailto:[EMAIL PROTECTED]
Sent: Monday, August 01, 2005 6:08 AM
To: Srivastava, Rahul; Ernie Petrides; Larry Woodman
Subject: Re: oops in 2.4.25 prune_icache() called from kswapd


On Thu, Jul 28, 2005 at 05:57:43PM -0500, Srivastava, Rahul wrote:
> Hi Marcelo,
>
> Thanks a lot for your detailed response.
>
> However, I still don't see how adding I_CLEAR in __refile_inode is
> avoiding the race mentioned below.
>
> I understand that the I_CLEAR flag addition is done to ensure that
> __refile_inode() doesn't insert the inode in one of the four lists and
> should just return (in this particular scenario). Please correct me if
> I am wrong in my assumption.
>
> Now if we see the code, without I_CLEAR flag (unpatched code), we are
> checking for I_FREEING flag in __refile_inode(). And if a inode is
> marked for freeing (i.e., I_FREEING is set) current code also returns
> without adding it into one of the four lists.
>
> And there is no scenario, in code, wherein we can have a inode with
> I_CLEAR flag set but I_FREEING unset.

I fail to disagree: the I_FREEING flag will always be set when an attempt
is made to set I_CLEAR (there are asserts to guarantee that).

So, I also can't understand the addition of I_CLEAR check on
__refile_inode() and its purpose.

Larry, can you clarify for us?

> In nutshell: If I_CLEAR is set that means I_FREEING will also be set.
> And since we are already checking for I_FREEING in __refile_inode,
> checking for I_CLEAR will be a kind of duplication?
>
> Hope I have not misinterpreted the complete story.
>
> Thanks,
> Rahul
>
>
> -----Original Message-----
> From: Marcelo Tosatti [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 28, 2005 5:38 AM
> To: Srivastava, Rahul
> Subject: Re: oops in 2.4.25 prune_icache() called from kswapd
>
>
> On Thu, Jul 28, 2005 at 10:39:12AM -0500, Srivastava, Rahul wrote:
> > Hi Marcelo,
> >
> > I was seeing your fix in inode.c and need a clarification.
> >
> > In the patch you have added I_CLEAR flag. What is confusing me is
> > that I_CLEAR flag is set only in clear_inode().  And in this same
> > function we have:
> >
> > **
> > if (!(inode->i_state & I_FREEING))
> >         BUG();
> > **
> >
> > So we are setting I_CLEAR only if I_FREEING is set. If that is the
> > case, shouldn't just a check for I_FREEING be enough in
> > refile_inode?
>
> The problem is that it is a race between two processors.
>
> > Basically, I am not able to make out the significance of adding
> > I_CLEAR in _refile_inode().
> >
> > +   if (inode->i_state & (I_FREEING|I_CLEAR))
>
> That will avoid __refile_inode() from putting a freed inode into
> the lists...
>
> > Thanks for your time and sorry to bother you,
> > Rahul
>
> No problem.
>
> The description goes like:
>
> The first scenario is:
>
> 1.) cpu0 is in __sync_one() just about to call __refile_inode() after
> taking the inode_lock and clearing I_LOCK.
>
> spin_lock(&inode_lock);
> inode->i_state &= ~I_LOCK;
> if (!(inode->i_state & I_FREEING))
>         __refile_inode(inode);
> wake_up(&inode->i_wait);
>
> 2.) cpu1 is in [dispose_list()] where it has dropped the inode_lock
> and calls clear_inode(). It doesn't block because I_LOCK is clear so it
> sets the inode state.
>
> void clear_inode(struct inode *inode)
> {
> ...
>

Re: FW: oops in 2.4.25 prune_icache() called from kswapd

2005-08-08 Thread Ernie Petrides
On Monday, 8-Aug-2005 at 11:43 CDT, Rahul Srivastava wrote:

> I was just wondering if any of you guys had a chance to validate the
> hypothesis and the proposed fix.

Larry Woodman is the one who worked on this problem, and he agreed (last
week) to follow up on this discussion.  Unfortunately, he's away this
week.

I've forwarded his mail (posted to an internal Red Hat patch review
mailing list 3 months ago) in the interim, and this contains the patch
that has been committed to Red Hat Enterprise Linux version 3 (of which
Update 6 is currently in beta).

Hopefully, Larry will follow up when he gets back.

Cheers.  -ernie



--- Forwarded Message


 From: Larry Woodman [EMAIL PROTECTED]
 Date: Mon, 09 May 2005 11:39:11 -0400
 Subject: [Taroon Patch] fix inode cache deadlock/race...




Over the past couple of weeks we have seen two races in the inode cache
code. The first is between [dispose_list()] and __refile_inode() and the
second is between prune_icache() and truncate_inodes(). I posted both of
these patches but wanted to make sure they got properly reviewed and
included in RHEL3-U6.

Fixes [RHEL3 bugzillas 149636 and] 155289.


The first scenario is:

1.) cpu0 is in __sync_one() just about to call __refile_inode() after
taking the inode_lock and clearing I_LOCK.
----------
spin_lock(&inode_lock);
inode->i_state &= ~I_LOCK;
if (!(inode->i_state & I_FREEING))
	__refile_inode(inode);
wake_up(&inode->i_wait);
----------

2.) cpu1 is in [dispose_list()] where it has dropped the inode_lock and
calls clear_inode(). It doesn't block because I_LOCK is clear, so it
sets the inode state.
----------
void clear_inode(struct inode *inode)
{
	...
	wait_on_inode(inode);
	...
	inode->i_state = I_CLEAR;
	...
}
----------

3.) cpu0 calls __refile_inode(), which places it on one of the four
possible inode lists.
----------
static inline void __refile_inode(struct inode *inode)
{
	struct list_head *to;

	if (inode->i_state & I_DIRTY)
		to = &inode->i_sb->s_dirty;
	else if (atomic_read(&inode->i_count))
		to = &inode_in_use;
	else if (inode->i_data.nrpages)
		to = &inode_unused_pagecache;
	else
		to = &inode_unused;
	list_del(&inode->i_list);
	list_add(&inode->i_list, to);
}
----------

4.) cpu1 returns from clear_inode() then calls destroy_inode(), which
kmem_cache_free()s it.
----------
static void destroy_inode(struct inode *inode)
{
	if (inode->i_sb->s_op->destroy_inode)
		inode->i_sb->s_op->destroy_inode(inode);
	else
		kmem_cache_free(inode_cachep, inode);
}
----------

5.) At this point we have an inode that has been kmem_cache_free()'d
that is also sitting on one of the lists determined by __refile_inode().
That can't be good!!! Also, the code looks the same in RHEL4.



The second scenario is:

CPU0 is in prune_icache() called by kswapd and CPU1 is in
invalidate_inodes() called by the auto-mount daemon.

1.) CPU0: prune_icache() sets the I_LOCK bit in an inode on the
inode_unused_pagecache list, releases the inode_lock and calls
invalidate_inode_pages().

2.) CPU1: invalidate_inodes() calls invalidate_list() for the
inode_unused_pagecache list with the inode_lock held and sets the
I_FREEING bit in inode->i_state.

3.) CPU0: prune_icache() acquires the inode_lock and clears the I_LOCK
bit in inode->i_state.

4.) CPU1: dispose_list() calls clear_inode() without the inode_lock
held. Since the I_LOCK bit is clear, clear_inode() sets
inode->i_state = I_CLEAR, clearing the I_FREEING bit.

5.) CPU0: prune_icache() calls __refile_inode() because clear_inode()
cleared I_FREEING without holding the inode_lock. This inode is no
longer on the inode_unused_pagecache list, which results in it being
placed on the inode_unused list.

6.) CPU1: dispose_list() calls destroy_inode(), which kmem_cache_free()s
an inode that is also on the inode_unused list.


At this point there is an inode that has been kmem_cache_free()'d and is
also on the inode_unused list.

This patch to clear_inode() acquires the inode_lock before manipulating
the inode->i_state field. This is the only place in the kernel that
manipulates the inode without holding the inode_lock.
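
The key detail in the second scenario is that clear_inode() assigns I_CLEAR to i_state rather than OR-ing it in, so I_FREEING is lost. The userspace C sketch below illustrates only that bit arithmetic and the effect of widening the test in __refile_inode(). The flag values and helper names are assumptions chosen for the illustration; the authoritative definitions are in include/linux/fs.h, and no locking is modelled here.

```c
#include <assert.h>

/* Illustrative flag values only; see include/linux/fs.h for the real ones. */
#define I_LOCK     8UL
#define I_FREEING 16UL
#define I_CLEAR   32UL

/* Test used by __refile_inode() before the patch: skip if I_FREEING. */
int skip_refile_old(unsigned long i_state)
{
    return (i_state & I_FREEING) != 0;
}

/* Test used after the patch: skip if I_FREEING or I_CLEAR is set. */
int skip_refile_new(unsigned long i_state)
{
    return (i_state & (I_FREEING | I_CLEAR)) != 0;
}

/* Models what clear_inode() does to i_state: a plain assignment,
 * which drops every other bit, including I_FREEING. */
unsigned long clear_inode_state(unsigned long i_state)
{
    assert(i_state & I_FREEING);  /* mirrors the BUG() check in clear_inode() */
    return I_CLEAR;
}
```

Starting from an i_state of I_FREEING, clear_inode_state() returns I_CLEAR alone. The old test then lets the inode through to be refiled, while the new test rejects it.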



--- linux-2.4.21/fs/inode.c.orig
+++ linux-2.4.21/fs/inode.c
@@ -296,7 +296,7 @@ static inline void __refile_inode(struct
 {
 	struct list_head *to;
 
-	if (inode->i_state & I_FREEING)
+	if (inode->i_state & (I_FREEING|I_CLEAR))
 		return;
 	if (list_empty(&inode->i_hash))
 		return;
@@ -636,7 +636,9 @@ void clear_inode(struct inode *inode)
 		cdput(inode->i_cdev);
 		inode->i_cdev = NULL;
 	}
+	spin_lock(&inode_lock);

[RFC] atomic open(..., O_CREAT | ...)

2005-08-08 Thread Miklos Szeredi
I'd like to make my filesystem be able to do file creation and opening
atomically.  This is needed for filesystems which cannot separate
checking open permission from the actual open operation.

Usually any filesystem served from userspace by an unprivileged (no
CAP_DAC_OVERRIDE) process will be such (ftp, sftp, etc.).

With nameidata->intent.open.* it is possible to do the actual open
from ->lookup() or ->create().  However there's no easy way to
associate the 'struct file *' returned by dentry_open() with the
filesystem's private file object.  Also if there's some error after
the file has been opened but before a successful return of the file
pointer, the filesystem has no way to know that it should destroy the
private file object.

The following patch makes this possible through a new file pointer
field in the open intent data, through which the filesystem can pass
an opened file to be returned by filp_open().

The filesystem can call dentry_open() from ->lookup() or ->create(),
and fill it in with its private file data.  If there's an error the file
can be properly destroyed through f_op->release().
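
The control flow this proposal gives filp_open() can be modelled in userspace roughly as follows. This is only a sketch under simplifying assumptions: struct file and struct open_intent here are small stand-ins rather than the kernel types, release_file() stands in for fput() leading to f_op->release(), and finish_open() is a hypothetical name for the tail of filp_open().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins, not the kernel structures. */
struct file {
    int released;            /* set once the release path has run */
};

struct open_intent {
    int flags;
    int create_mode;
    struct file *file;       /* set by the fs if it already opened the file */
};

/* Stands in for fput(), which ends up in f_op->release(). */
void release_file(struct file *f)
{
    f->released = 1;
}

/* 'error' is the lookup result; 'fallback' stands in for the file that
 * dentry_open() would return when the fs did not open one itself. */
struct file *finish_open(struct open_intent *it, int error,
                         struct file *fallback)
{
    assert(it != NULL);
    if (!error)
        return it->file ? it->file : fallback;
    if (it->file)
        release_file(it->file);  /* fs gets to destroy its private data */
    return NULL;                 /* the kernel would return ERR_PTR(error) */
}
```

On success the file opened by the filesystem takes precedence over the fallback; on failure a file that was already opened is released instead of being leaked.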

There's one question on which I'm not sure what is the best solution:

The filesystem needs to know whether its f_op->open() method was
called from lookup/create, or from filp_open(), because in the
first case it need not do anything (the private file object will be
created outside dentry_open()), but in the second case it must
actually prepare the private file object.

Two solutions come to mind:

  1) pass a special open flag to dentry_open() which will be passed on
     to f_op->open() in filp->f_flags

  2) create a new 'dentry_open_noopen()' variant, which doesn't call
     f_op->open()

Does one sound better?  Or something else?

Comments are welcome.

Thanks,
Miklos

Index: linux/include/linux/namei.h
===================================================================
--- linux.orig/include/linux/namei.h	2005-06-17 21:48:29.0 +0200
+++ linux/include/linux/namei.h	2005-08-06 17:12:55.0 +0200
@@ -8,6 +8,10 @@ struct vfsmount;
 struct open_intent {
 	int	flags;
 	int	create_mode;
+	int	orig_flags;
+
+	/* Fs may want to do dentry_open() in ->lookup(), or in ->create() */
+	struct file *file;
 };
 
 enum { MAX_NESTED_LINKS = 5 };
Index: linux/fs/open.c
===================================================================
--- linux.orig/fs/open.c	2005-08-06 12:34:14.0 +0200
+++ linux/fs/open.c	2005-08-08 13:03:08.0 +0200
@@ -762,9 +762,22 @@ struct file *filp_open(const char * file
 	if (namei_flags & O_TRUNC)
 		namei_flags |= 2;
 
+	/* Fill in the open() intent data */
+	nd.intent.open.flags = namei_flags;
+	nd.intent.open.create_mode = mode;
+	nd.intent.open.orig_flags = flags;
+	nd.intent.open.file = NULL;
+
 	error = open_namei(filename, namei_flags, mode, &nd);
-	if (!error)
-		return dentry_open(nd.dentry, nd.mnt, flags);
+	if (!error) {
+		if (nd.intent.open.file)
+			return nd.intent.open.file;
+		else
+			return dentry_open(nd.dentry, nd.mnt, flags);
+	}
+
+	if (nd.intent.open.file && !IS_ERR(nd.intent.open.file))
+		fput(nd.intent.open.file);
 
 	return ERR_PTR(error);
 }
Index: linux/fs/namei.c
===================================================================
--- linux.orig/fs/namei.c	2005-08-06 12:35:59.0 +0200
+++ linux/fs/namei.c	2005-08-06 17:12:55.0 +0200
@@ -1423,10 +1423,6 @@ int open_namei(const char * pathname, in
 	if (flag & O_APPEND)
 		acc_mode |= MAY_APPEND;
 
-	/* Fill in the open() intent data */
-	nd->intent.open.flags = flag;
-	nd->intent.open.create_mode = mode;
-
 	/*
 	 * The simplest case - just a plain lookup.
 	 */


Re: [RFC] atomic open(..., O_CREAT | ...)

2005-08-08 Thread Trond Myklebust
On Tue, 09.08.2005 at 00:27 (+0200), Miklos Szeredi wrote:
> I'd like to make my filesystem be able to do file creation and opening
> atomically.  This is needed for filesystems which cannot separate
> checking open permission from the actual open operation.
>
> Usually any filesystem served from userspace by an unprivileged (no
> CAP_DAC_OVERRIDE) process will be such (ftp, sftp, etc.).
>
> With nameidata->intent.open.* it is possible to do the actual open
> from ->lookup() or ->create().  However there's no easy way to
> associate the 'struct file *' returned by dentry_open() with the
> filesystem's private file object.  Also if there's some error after
> the file has been opened but before a successful return of the file
> pointer, the filesystem has no way to know that it should destroy the
> private file object.

We've already got a patch that does this, and that I'm queueing up for
inclusion. See

http://client.linux-nfs.org/Linux-2.6.x/2.6.12/linux-2.6.12-63-open_file_intents.dif

As for the orig flags thing. What is the point of that?

Cheers,
  Trond



Re: FW: oops in 2.4.25 prune_icache() called from kswapd

2005-08-08 Thread Marcelo Tosatti
On Mon, Aug 08, 2005 at 04:03:38PM -0500, Srivastava, Rahul wrote:
> Hi,
>
> Thanks a lot. The second race mentioned below explains it all.
>
> Now I understood the significance of adding I_CLEAR. I actually never
> noticed that I_CLEAR flag is directly assigned to i_state. Since this
> will clear up the I_FREEING flag, the addition of I_CLEAR in
> __refile_inodes() does make sense.

Right - I also missed the clearing of the I_FREEING flag.

Thanks! 


Re: FW: oops in 2.4.25 prune_icache() called from kswapd

2005-08-08 Thread Marcelo Tosatti
On Mon, Aug 08, 2005 at 11:45:28AM -0500, Srivastava, Rahul wrote:
> Hi All,
>
> I was just wondering if any of you guys had a chance to validate the
> hypothesis and the proposed fix.
>
> Thanks,
> Rahul
>
>
> -----Original Message-----
> From: Srivastava, Rahul
> Sent: Tuesday, August 02, 2005 8:32 AM
> To: 'Marcelo Tosatti'; 'Ernie Petrides'; 'Larry Woodman'
> Subject: RE: oops in 2.4.25 prune_icache() called from kswapd
>
>
> Hi,
>
> Thanks for reviewing the mail. I was wondering whether the changes below
> in clear_inode() will close the race window:
>
> In clear_inode(), replace the line:
>
> inode->i_state = I_CLEAR;
>
> with the following piece of code:
>
> **********
> spin_lock(&inode_lock);
> while (inode->i_state & I_LOCK) {
>         spin_unlock(&inode_lock);
>         __wait_on_inode(inode);
>         spin_lock(&inode_lock);
> }
> inode->i_state = I_CLEAR;
> spin_unlock(&inode_lock);
> **********
>
> I feel the race is between __sync_one() and iput()/clear_inode()
> (also suggested by Albert), which is as follows:
>
> ********** race condition **********
>
> engine 0:
>     calls iput() and locks inode_lock. iput() removes the inode from
>     the i_list and unlocks inode_lock.
>
>         engine 1:
>             grabs inode_lock and calls __sync_one().
>
> engine 0:
>     calls clear_inode() and gets past the call to wait_on_inode(),
>     which checks whether I_LOCK is set.
>     /* From this point onwards clear_inode() and the remainder of
>        iput() do not care about I_LOCK or inode_lock. */
>
>         engine 1:
>             sets I_LOCK.
>
> engine 0:
>     sets i_state = I_CLEAR.
>     iput() calls destroy_inode().
>     kmem_cache_free() returns the inode to the free list of the
>     inode cache.
>
>         engine 1:
>             goes ahead and inserts the freed inode into one of the
>             three possible lists.

As stated in private, Larry's fix should catch that in __refile_inode() and
ignore the I_CLEAR inode.

> And we end up with a corrupted inode on the inode list.
>
> Your thoughts please.