Re: FW: oops in 2.4.25 prune_icache() called from kswapd

2005-08-08 Thread Marcelo Tosatti
On Mon, Aug 08, 2005 at 11:45:28AM -0500, Srivastava, Rahul wrote:
> Hi All,
> 
> I was just wondering if any of you guys had a chance to validate the
> hypothesis and the proposed fix. 
> 
> Thanks,
> Rahul
> 
> 
> -Original Message-
> From: Srivastava, Rahul 
> Sent: Tuesday, August 02, 2005 8:32 AM
> To: 'Marcelo Tosatti'; 'Ernie Petrides'; 'Larry Woodman'
> Subject: RE: oops in 2.4.25 prune_icache() called from kswapd
> 
> 
> Hi,
> 
> Thanks for reviewing the mail. I was thinking whether below changes in
> clear_inode() will close the race window:
> 
> in clear_inode(), change line:
> 
> inode->i_state = I_CLEAR;
> 
> with below piece of code:
> 
> *
> spin_lock(&inode_lock);
> while (inode->i_state & I_LOCK) {
>  spin_unlock(&inode_lock);
>  __wait_on_inode(inode);
>  spin_lock(&inode_lock);
> }
> inode->i_state = I_CLEAR;
> spin_unlock(&inode_lock);
> *
> 
> I feel the race is between "__sync_one()" and "iput()/clear_inode()"
> (also suggested by Albert) which is as follows:
> 
> *** race condition ***
>
> engine 0:
>   calls iput() and lock inode_lock. iput removes the inode from the i_list
>   and unlocks inode_lock
>
>                 engine 1:
>                   grab inode_lock and calls __sync_one()
>
> engine 0:
>   calls clear_inode(), get past the call to "wait_on_inode()" which looks
>   if I_LOCK is set.
>   /* From this point onwards clear_inode() and the remainder of iput()
>      does not care about I_LOCK or inode_lock. */
>
>                 engine 1:
>                   Sets I_LOCK.
>
> engine 0:
>   sets i_state = I_CLEAR
>   iput() calls destroy_inode()
>   kmem_cache_free() returns the inode to free list of inode cache.
>
>                 engine 1:
>                   Goes ahead and inserts the freed inode into one of the
>                   three possible lists.

As stated in private, Larry's fix should catch that in __refile_inode() and
ignore the I_CLEAR inode.

> And we end up having a corrupted inode on the inode list.
> 
> Your thoughts please.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: FW: oops in 2.4.25 prune_icache() called from kswapd

2005-08-08 Thread Marcelo Tosatti
On Mon, Aug 08, 2005 at 04:03:38PM -0500, Srivastava, Rahul wrote:
> Hi,
> 
> Thanks a lot. The second race mentioned below explains it all.
> 
> Now I understood the significance of adding I_CLEAR. I actually never
> noticed that I_CLEAR flag is directly assigned to i_state. Since this
> will clear up the I_FREEING flag, the addition of I_CLEAR in
> __refile_inodes() does make sense. 

Right - I also missed the clearing of the I_FREEING flag.

Thanks! 
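
The point about the plain assignment can be checked with a small userspace sketch. This is not kernel code: the bit values are illustrative (the authoritative ones are in include/linux/fs.h), and model_clear_inode() and the two refile_skips_*() helpers are hypothetical names used only here.

```c
#include <stdbool.h>

/* Illustrative bit values; the authoritative ones are in include/linux/fs.h. */
enum { I_LOCK = 8, I_FREEING = 16, I_CLEAR = 32 };

/* Hypothetical model of clear_inode(): a plain assignment to i_state. */
static unsigned long model_clear_inode(unsigned long i_state)
{
	(void)i_state;		/* the old bits, I_FREEING included, are discarded */
	return I_CLEAR;
}

/* The check __refile_inode() made before the patch. */
static bool refile_skips_unpatched(unsigned long i_state)
{
	return (i_state & I_FREEING) != 0;
}

/* The check after the patch: I_CLEAR alone is enough to bail out. */
static bool refile_skips_patched(unsigned long i_state)
{
	return (i_state & (I_FREEING | I_CLEAR)) != 0;
}
```

Feeding I_FREEING through model_clear_inode() gives a state that the unpatched check lets through and the patched check rejects, which is the case the I_CLEAR test is meant to cover.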


Re: [RFC] atomic open(..., O_CREAT | ...)

2005-08-08 Thread Trond Myklebust
On Tue, 09.08.2005 at 00:27 (+0200), Miklos Szeredi wrote:
> I'd like to make my filesystem be able to do file creation and opening
> atomically.  This is needed for filesystems which cannot separate
> checking open permission from the actual open operation.
> 
> Usually any filesystem served from userspace by an unprivileged (no
> CAP_DAC_OVERRIDE) process will be such (ftp, sftp, etc.).
> 
> With nameidata->intent.open.* it is possible to do the actual open
> from ->lookup() or ->create().  However there's no easy way to
> associate the 'struct file *' returned by dentry_open() with the
> filesystem's private file object.  Also if there's some error after
> the file has been opened but before a successful return of the file
> pointer, the filesystem has no way to know that it should destroy the
> private file object.

We've already got a patch that does this, and that I'm queueing up for
inclusion. See

http://client.linux-nfs.org/Linux-2.6.x/2.6.12/linux-2.6.12-63-open_file_intents.dif

As for the "orig flags" thing. What is the point of that?

Cheers,
  Trond



[RFC] atomic open(..., O_CREAT | ...)

2005-08-08 Thread Miklos Szeredi
I'd like to make my filesystem be able to do file creation and opening
atomically.  This is needed for filesystems which cannot separate
checking open permission from the actual open operation.

Usually any filesystem served from userspace by an unprivileged (no
CAP_DAC_OVERRIDE) process will be such (ftp, sftp, etc.).

With nameidata->intent.open.* it is possible to do the actual open
from ->lookup() or ->create().  However there's no easy way to
associate the 'struct file *' returned by dentry_open() with the
filesystem's private file object.  Also if there's some error after
the file has been opened but before a successful return of the file
pointer, the filesystem has no way to know that it should destroy the
private file object.

The following patch makes this possible through a new file pointer
field in the open intent data, through which the filesystem can pass
an opened file to be returned by filp_open().

The filesystem can call dentry_open() from ->lookup() or ->create(),
and fill it in with its private file data.  If there's an error the file
can be properly destroyed through f_op->release().
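
A minimal userspace sketch of that control flow, under stated assumptions: struct file and struct open_intent are cut-down stand-ins for the kernel types, lookup_fn is a hypothetical callback standing in for open_namei() plus the filesystem's ->lookup()/->create(), and the model returns NULL plus an error code where the kernel would return ERR_PTR(error).

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct file; only what the model needs. */
struct file {
	int released;		/* set once the model's fput() has run */
};

/* Cut-down open intent: 'file' is the field the patch introduces. */
struct open_intent {
	int flags;
	int create_mode;
	struct file *file;
};

/* Hypothetical stand-in for open_namei() plus the fs ->lookup()/->create(). */
typedef int (*lookup_fn)(struct open_intent *intent);

static struct file fs_file;		/* file opened by the filesystem itself */
static struct file fallback_file;	/* what a plain dentry_open() would return */

/* Three sample filesystems: opens early, opens early then fails, never opens. */
static int fs_opens_ok(struct open_intent *it)         { it->file = &fs_file; return 0; }
static int fs_opens_then_fails(struct open_intent *it) { it->file = &fs_file; return -5; }
static int fs_plain_lookup(struct open_intent *it)     { (void)it; return 0; }

static void model_fput(struct file *f)
{
	f->released = 1;
}

/* Mirrors the control flow the patch adds to filp_open(). */
static struct file *model_filp_open(lookup_fn lookup, int flags, int mode, int *err)
{
	struct open_intent intent = { flags, mode, NULL };

	*err = lookup(&intent);
	if (!*err)
		return intent.file ? intent.file : &fallback_file;

	/* Failure after the fs already opened the file: release it. */
	if (intent.file)
		model_fput(intent.file);
	return NULL;
}
```

The third branch is the one the original description is most concerned with: without a place to stash the file in the intent, the caller would have no handle to release after a late failure.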

There's one question on which I'm not sure what is the best solution:

The filesystem needs to know whether its f_op->open() method was
called from lookup/create, or from the filp_open(), because in the
first case it need not do anything (the private file object will be
created outside dentry_open()), but in the second case it must
actually prepare the private file object.

Two solutions come to mind:

  1) pass a special open flag to dentry_open() which will be passed on
 to f_op->open() in filp->f_flags

  2) create a new 'dentry_open_noopen()' variant, which doesn't call
 f_op->open()

Does one sound better?  Or something else?

Comments are welcome.

Thanks,
Miklos

Index: linux/include/linux/namei.h
===
--- linux.orig/include/linux/namei.h	2005-06-17 21:48:29.0 +0200
+++ linux/include/linux/namei.h	2005-08-06 17:12:55.0 +0200
@@ -8,6 +8,10 @@ struct vfsmount;
 struct open_intent {
 	int flags;
 	int create_mode;
+	int orig_flags;
+
+	/* Fs may want to do dentry_open() in ->lookup(), or in ->create() */
+	struct file *file;
 };
 
 enum { MAX_NESTED_LINKS = 5 };
Index: linux/fs/open.c
===
--- linux.orig/fs/open.c	2005-08-06 12:34:14.0 +0200
+++ linux/fs/open.c	2005-08-08 13:03:08.0 +0200
@@ -762,9 +762,22 @@ struct file *filp_open(const char * file
 	if (namei_flags & O_TRUNC)
 		namei_flags |= 2;
 
+	/* Fill in the open() intent data */
+	nd.intent.open.flags = namei_flags;
+	nd.intent.open.create_mode = mode;
+	nd.intent.open.orig_flags = flags;
+	nd.intent.open.file = NULL;
+
 	error = open_namei(filename, namei_flags, mode, &nd);
-	if (!error)
-		return dentry_open(nd.dentry, nd.mnt, flags);
+	if (!error) {
+		if (nd.intent.open.file)
+			return nd.intent.open.file;
+		else
+			return dentry_open(nd.dentry, nd.mnt, flags);
+	}
+
+	if (nd.intent.open.file && !IS_ERR(nd.intent.open.file))
+		fput(nd.intent.open.file);
 
 	return ERR_PTR(error);
 }
Index: linux/fs/namei.c
===
--- linux.orig/fs/namei.c	2005-08-06 12:35:59.0 +0200
+++ linux/fs/namei.c	2005-08-06 17:12:55.0 +0200
@@ -1423,10 +1423,6 @@ int open_namei(const char * pathname, in
 	if (flag & O_APPEND)
 		acc_mode |= MAY_APPEND;
 
-	/* Fill in the open() intent data */
-	nd->intent.open.flags = flag;
-	nd->intent.open.create_mode = mode;
-
 	/*
 	 * The simplest case - just a plain lookup.
 	 */


RE: FW: oops in 2.4.25 prune_icache() called from kswapd

2005-08-08 Thread Srivastava, Rahul
Hi,

Thanks a lot. The second race mentioned below explains it all.

Now I understood the significance of adding I_CLEAR. I actually never
noticed that I_CLEAR flag is directly assigned to i_state. Since this
will clear up the I_FREEING flag, the addition of I_CLEAR in
__refile_inodes() does make sense. 

Thanks,
Rahul

-Original Message-
From: Ernie Petrides [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 08, 2005 3:20 PM
To: Srivastava, Rahul
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
linux-fsdevel@vger.kernel.org
Subject: Re: FW: oops in 2.4.25 prune_icache() called from kswapd


On Monday, 8-Aug-2005 at 11:43 CDT, "Rahul Srivastava" wrote:

> I was just wondering if any of you guys had a chance to validate the 
> hypothesis and the proposed fix.

Larry Woodman is the one who worked on this problem, and he agreed (last
week) to follow up on this discussion.  Unfortunately, he's away this
week.

I've forwarded his mail (posted to an internal Red Hat patch review
mailing list 3 months ago) in the interim, and this contains the patch
that has been committed to Red Hat Enterprise Linux version 3 (of which
Update 6 is currently in beta).

Hopefully, Larry will follow up when he gets back.

Cheers.  -ernie



--- Forwarded Message


 From: Larry Woodman <[EMAIL PROTECTED]>
 Date: Mon, 09 May 2005 11:39:11 -0400
 Subject: [Taroon Patch] fix inode cache deadlock/race...




Over the past couple of weeks we have seen two races in the inode cache
code. The first is between [dispose_list()] and __refile_inode() and the
second is between prune_icache() and invalidate_inodes(). I posted both of
these patches but wanted to make sure they got properly reviewed and
included in RHEL3-U6.

Fixes [RHEL3 bugzillas 149636 and] 155289.


The first scenario is:

1.) cpu0 is in __sync_one() just about to call __refile_inode() after
taking the inode_lock and clearing I_LOCK.
-
	spin_lock(&inode_lock);
	inode->i_state &= ~I_LOCK;
	if (!(inode->i_state & I_FREEING))
		__refile_inode(inode);
	wake_up(&inode->i_wait);
-

2.) cpu1 is in [dispose_list()] where it has dropped the inode_lock and
calls clear_inode(). It doesn't block because I_LOCK is clear, so it sets
the inode state.
-
void clear_inode(struct inode *inode)
{
	...
	wait_on_inode(inode);
	...
	inode->i_state = I_CLEAR;
	...
}
-

3.) cpu0 calls __refile_inode(), which places it on one of the four
possible inode lists.
-
static inline void __refile_inode(struct inode *inode)
{
	if (inode->i_state & I_DIRTY)
		to = &inode->i_sb->s_dirty;
	else if (atomic_read(&inode->i_count))
		to = &inode_in_use;
	else if (inode->i_data.nrpages)
		to = &inode_unused_pagecache;
	else
		to = &inode_unused;
	list_del(&inode->i_list);
	list_add(&inode->i_list, to);
}
-

4.) cpu1 returns from clear_inode() then calls destroy_inode() which
kmem_cache_free()s it.
-
static void destroy_inode(struct inode *inode)
{
	if (inode->i_sb->s_op->destroy_inode)
		inode->i_sb->s_op->destroy_inode(inode);
	else
		kmem_cache_free(inode_cachep, inode);
}
-

5.) at this point we have an inode that has been kmem_cache_free()'d
that is also sitting on one of the lists determined by __refile_inode(),
that can't be good!!! Also, the code looks the same in RHEL4.



The second scenario is:

CPU0 is in prune_icache() called by kswapd and CPU1 is in
invalidate_inodes() called by the auto-mount daemon.

1.) CPU0: prune_icache() sets the I_LOCK bit in an inode on the
inode_unused_pagecache list, releases the inode_lock and calls
invalidate_inode_pages().

2.) CPU1: invalidate_inodes() calls invalidate_list() for the
inode_unused_pagecache list with the inode_lock held and sets the
I_FREEING bit in the inode->i_state.

3.) CPU0: prune_icache() acquires the inode_lock and clears the I_LOCK
bit in the inode->i_state.

4.) CPU1: dispose_list() calls clear_inode() without the inode_lock
held. Since the I_LOCK bit is clear, clear_inode() sets inode->i_state =
I_CLEAR, clearing the I_FREEING bit.

5.) CPU0: prune_icache() calls __refile_inode() because clear_inode()
cleared I_FREEING without holding the inode_lock. This inode is no
longer on the inode_unused_pagecache list, which results in that inode
being placed on the inode_unused list.

6.) CPU1: dispose_list() calls destroy_inode() which kmem_cache_free()s
an inode that is also on the inode_unused list.


At this point there is an inode that has been kmem_cache_free()'d and is
also on the inode_unused list.

This patch to clear_inode() acquires the inode_lock before manipulating
the inode->

Re: FW: oops in 2.4.25 prune_icache() called from kswapd

2005-08-08 Thread Ernie Petrides
On Monday, 8-Aug-2005 at 11:43 CDT, "Rahul Srivastava" wrote:

> I was just wondering if any of you guys had a chance to validate the
> hypothesis and the proposed fix.

Larry Woodman is the one who worked on this problem, and he agreed (last
week) to follow up on this discussion.  Unfortunately, he's away this
week.

I've forwarded his mail (posted to an internal Red Hat patch review
mailing list 3 months ago) in the interim, and this contains the patch
that has been committed to Red Hat Enterprise Linux version 3 (of which
Update 6 is currently in beta).

Hopefully, Larry will follow up when he gets back.

Cheers.  -ernie



--- Forwarded Message


 From: Larry Woodman <[EMAIL PROTECTED]>
 Date: Mon, 09 May 2005 11:39:11 -0400
 Subject: [Taroon Patch] fix inode cache deadlock/race...




Over the past couple of weeks we have seen two races in the inode cache
code. The first is between [dispose_list()] and __refile_inode() and the
second is between prune_icache() and invalidate_inodes(). I posted both of
these patches but wanted to make sure they got properly reviewed and
included in RHEL3-U6.

Fixes [RHEL3 bugzillas 149636 and] 155289.


The first scenario is:

1.) cpu0 is in __sync_one() just about to call __refile_inode() after
taking the inode_lock and clearing I_LOCK.
-
	spin_lock(&inode_lock);
	inode->i_state &= ~I_LOCK;
	if (!(inode->i_state & I_FREEING))
		__refile_inode(inode);
	wake_up(&inode->i_wait);
-

2.) cpu1 is in [dispose_list()] where it has dropped the inode_lock and
calls clear_inode(). It doesn't block because I_LOCK is clear, so it sets
the inode state.
-
void clear_inode(struct inode *inode)
{
	...
	wait_on_inode(inode);
	...
	inode->i_state = I_CLEAR;
	...
}
-

3.) cpu0 calls __refile_inode(), which places it on one of the four
possible inode lists.
-
static inline void __refile_inode(struct inode *inode)
{
	if (inode->i_state & I_DIRTY)
		to = &inode->i_sb->s_dirty;
	else if (atomic_read(&inode->i_count))
		to = &inode_in_use;
	else if (inode->i_data.nrpages)
		to = &inode_unused_pagecache;
	else
		to = &inode_unused;
	list_del(&inode->i_list);
	list_add(&inode->i_list, to);
}
-

4.) cpu1 returns from clear_inode() then calls destroy_inode() which
kmem_cache_free()s it.
-
static void destroy_inode(struct inode *inode)
{
	if (inode->i_sb->s_op->destroy_inode)
		inode->i_sb->s_op->destroy_inode(inode);
	else
		kmem_cache_free(inode_cachep, inode);
}
-

5.) at this point we have an inode that has been kmem_cache_free()'d
that is also sitting on one of the lists determined by __refile_inode(),
that can't be good!!! Also, the code looks the same in RHEL4.



The second scenario is:

CPU0 is in prune_icache() called by kswapd and CPU1 is in
invalidate_inodes() called by the auto-mount daemon.

1.) CPU0: prune_icache() sets the I_LOCK bit in an inode on the
inode_unused_pagecache list, releases the inode_lock and calls
invalidate_inode_pages().

2.) CPU1: invalidate_inodes() calls invalidate_list() for the
inode_unused_pagecache list with the inode_lock held and sets the
I_FREEING bit in the inode->i_state.

3.) CPU0: prune_icache() acquires the inode_lock and clears the I_LOCK
bit in the inode->i_state.

4.) CPU1: dispose_list() calls clear_inode() without the inode_lock
held. Since the I_LOCK bit is clear, clear_inode() sets inode->i_state =
I_CLEAR, clearing the I_FREEING bit.

5.) CPU0: prune_icache() calls __refile_inode() because clear_inode()
cleared I_FREEING without holding the inode_lock. This inode is no
longer on the inode_unused_pagecache list, which results in that inode
being placed on the inode_unused list.

6.) CPU1: dispose_list() calls destroy_inode() which kmem_cache_free()s
an inode that is also on the inode_unused list.


At this point there is an inode that has been kmem_cache_free()'d and is
also on the inode_unused list.

This patch to clear_inode() acquires the inode_lock before manipulating
the inode->i_state field. This is the only place in the kernel that
manipulates the inode without holding the inode_lock.
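
The six steps of the second scenario can be replayed in order in a single thread to see which check catches the stale inode. This is only a userspace sketch: struct model_inode, model_refile() and replay_second_scenario() are hypothetical names, the bit values are illustrative, and the model covers just the __refile_inode() check, not the locking change in clear_inode().

```c
#include <stdbool.h>

enum { I_LOCK = 8, I_FREEING = 16, I_CLEAR = 32 };	/* illustrative values */

struct model_inode {
	unsigned long i_state;
	bool on_unused_list;
	bool freed;
};

/* Model of __refile_inode(): bail out if any bit in skip_mask is set. */
static void model_refile(struct model_inode *inode, unsigned long skip_mask)
{
	if (inode->i_state & skip_mask)
		return;
	inode->on_unused_list = true;
}

/*
 * Replays steps 1-6 in the order given in the description.
 * Returns true if the inode ends up freed while still on a list.
 */
static bool replay_second_scenario(unsigned long skip_mask)
{
	struct model_inode inode = { 0, false, false };

	inode.i_state |= I_LOCK;			/* 1: CPU0 prune_icache() */
	inode.i_state |= I_FREEING;			/* 2: CPU1 invalidate_list() */
	inode.i_state &= ~(unsigned long)I_LOCK;	/* 3: CPU0 clears I_LOCK */
	inode.i_state = I_CLEAR;			/* 4: CPU1 clear_inode() */
	model_refile(&inode, skip_mask);		/* 5: CPU0 __refile_inode() */
	inode.freed = true;				/* 6: CPU1 destroy_inode() */

	return inode.freed && inode.on_unused_list;
}
```

Called with I_FREEING as the mask (the old check) the replay reports a freed inode on a list; called with I_FREEING | I_CLEAR (the patched check) it does not.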



--- linux-2.4.21/fs/inode.c.orig
+++ linux-2.4.21/fs/inode.c
@@ -296,7 +296,7 @@ static inline void __refile_inode(struct
 {
 	struct list_head *to;
 
-	if (inode->i_state & I_FREEING)
+	if (inode->i_state & (I_FREEING|I_CLEAR))
 		return;
 	if (list_empty(&inode->i_hash))
 		return;
@@ -636,7 +636,9 @@ void clear_inode(struct inode *inode)
cdput(inode->i_cdev);
inode->i_cdev = NULL;

FW: oops in 2.4.25 prune_icache() called from kswapd

2005-08-08 Thread Srivastava, Rahul
Hi All,

I was just wondering if any of you guys had a chance to validate the
hypothesis and the proposed fix. 

Thanks,
Rahul


-Original Message-
From: Srivastava, Rahul 
Sent: Tuesday, August 02, 2005 8:32 AM
To: 'Marcelo Tosatti'; 'Ernie Petrides'; 'Larry Woodman'
Subject: RE: oops in 2.4.25 prune_icache() called from kswapd


Hi,

Thanks for reviewing the mail. I was thinking whether below changes in
clear_inode() will close the race window:

in clear_inode(), change line:

inode->i_state = I_CLEAR;

with below piece of code:

*
spin_lock(&inode_lock);
while (inode->i_state & I_LOCK) {
 spin_unlock(&inode_lock);
 __wait_on_inode(inode);
 spin_lock(&inode_lock);
}
inode->i_state = I_CLEAR;
spin_unlock(&inode_lock);
*

I feel the race is between "__sync_one()" and "iput()/clear_inode()"
(also suggested by Albert) which is as follows:

*** race condition ***

engine 0:
  calls iput() and lock inode_lock. iput removes the inode from the i_list
  and unlocks inode_lock

                engine 1:
                  grab inode_lock and calls __sync_one()

engine 0:
  calls clear_inode(), get past the call to "wait_on_inode()" which looks
  if I_LOCK is set.
  /* From this point onwards clear_inode() and the remainder of iput()
     does not care about I_LOCK or inode_lock. */

                engine 1:
                  Sets I_LOCK.

engine 0:
  sets i_state = I_CLEAR
  iput() calls destroy_inode()
  kmem_cache_free() returns the inode to free list of inode cache.

                engine 1:
                  Goes ahead and inserts the freed inode into one of the
                  three possible lists.


And we end up having a corrupted inode on the inode list.

Your thoughts please.

Thanks,
Rahul




-Original Message-
From: Marcelo Tosatti [mailto:[EMAIL PROTECTED]
Sent: Monday, August 01, 2005 6:08 AM
To: Srivastava, Rahul; Ernie Petrides; Larry Woodman
Subject: Re: oops in 2.4.25 prune_icache() called from kswapd


On Thu, Jul 28, 2005 at 05:57:43PM -0500, Srivastava, Rahul wrote:
> Hi Marcelo,
>
> Thanks a lot for your detailed response.
>
> However, I still don't see how adding I_CLEAR in __refile_inode is
> avoiding the race mentioned below.
>
> I understand that the I_CLEAR flag addition is done to ensure that
> __refile_inode() doesn't insert the inode in one of the four lists and
> should just return (in this particular scenario). Please correct me if
> I am wrong in my assumption.
>
> Now if we see the code, without I_CLEAR flag (unpatched code), we are
> checking for I_FREEING flag in __refile_inode(). And if a inode is
> marked for freeing (i.e., I_FREEING is set) current code also returns
> without adding it into one of the four lists.
>
> And there is no scenario, in code, wherein we can have a inode with
> I_CLEAR flag set but I_FREEING unset.

I fail to disagree: the I_FREEING flag will always be set when an attempt
is made to set I_CLEAR (there are asserts to guarantee that).

So, I also can't understand the addition of I_CLEAR check on
__refile_inode() and its purpose.

Larry, can you clarify for us?

> In nutshell: If I_CLEAR is set that means I_FREEING will also be set.
> And since we are already checking for I_FREEING in __refile_inode,
> checking for I_CLEAR will be a kind of duplication?
>
> Hope I have not misinterpreted the complete story.
>
> Thanks,
> Rahul
>
>
> -Original Message-
> From: Marcelo Tosatti [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 28, 2005 5:38 AM
> To: Srivastava, Rahul
> Subject: Re: oops in 2.4.25 prune_icache() called from kswapd
>
>
> On Thu, Jul 28, 2005 at 10:39:12AM -0500, Srivastava, Rahul wrote:
> > Hi Marcelo,
> >
> > I was seeing your fix in inode.c and need a clarification.
> >
> > In the patch you have added I_CLEAR flag. What is confusing me is
> > that "I_CLEAR" flag is set only in "clear_inode()".  And in this same
> > function we have:
> >
> > **
> > if (!(inode->i_state & I_FREEING))
> >         BUG();
> > **
> >
> > So we are setting I_CLEAR only if I_FREEING is set. If that is the
> > case, shouldn't just a check for I_FREEING be enough in
> > __refile_inode()?
>
> The problem is that it is a race between two processors.
>
> > Basically, I am not able to make out the significance of adding
> > I_CLEAR in __refile_inode().
> >
> > +   if (inode->i_state & (I_FREEING|I_CLEAR))
>
> That will avoid __refile_inode() from putting a freed inode into
> the lists...
>
> > Thanks for your time and sorry to bother you,
> > Rahul
>
> No problem.
>
> The description goes like:
>
> The first scenario is:
>
> 1.) cpu0 is in __sync_one() just about to call __refile_inode() after
> taking the inode_lock and clearing I_LOCK.
>
> spin_lock(&inode_lock);
> inode->i_state &= ~I_LOCK;
> if (!(inode->i_state & I_FREEING))
> __refile_inode(inode);
> wake_up(&inode->i_wait);
>
> 2.) cpu1 is in [dispose_list()] where it has dropped the inode_lock
> and calls clear_inod

Re: [PATCH 1/2] New system call, unshare

2005-08-08 Thread Avi Kivity

Alan Cox wrote:

> On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
>
> > [PATCH 1/2] unshare system call: System Call handler function sys_unshare
>
> Given the complexity of the kernel code involved and the obscurity of
> the functionality why not just do another clone() in userspace to
> unshare the things you want to unshare and then _exit the parent ?

suppose somebody wait()s for the parent? you've turned a synchronous 
operation into an asynchronous one.
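
A rough userspace sketch of that objection, with fork() standing in for the clone() call under discussion. waiter_returns_early() is a hypothetical helper: the process being waited on spawns a survivor and exits at once, and two pipes are used to prove that the waiter's waitpid() returned before the survivor did any work.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if waitpid() on the original child returned before the
 * surviving process did its work, 0 otherwise, -1 on setup failure.
 */
static int waiter_returns_early(void)
{
	int release_fd[2], done_fd[2];
	pid_t child;
	char c = 0;
	int result;

	if (pipe(release_fd) < 0 || pipe(done_fd) < 0)
		return -1;

	child = fork();
	if (child < 0)
		return -1;

	if (child == 0) {
		/* The process somebody is waiting on. */
		pid_t survivor = fork();	/* stand-in for clone() */

		if (survivor == 0) {
			/* Block until the waiter has already seen the exit. */
			if (read(release_fd[0], &c, 1) == 1 &&
			    write(done_fd[1], "d", 1) == 1)
				_exit(0);
			_exit(1);
		}
		_exit(0);	/* the original process exits immediately */
	}

	close(done_fd[1]);	/* so read() below sees EOF if nobody writes */

	if (waitpid(child, NULL, 0) != child)
		return -1;

	/* Only now is the survivor allowed to do its "work". */
	if (write(release_fd[1], "g", 1) != 1)
		return -1;
	result = (read(done_fd[0], &c, 1) == 1 && c == 'd') ? 1 : 0;

	close(release_fd[0]);
	close(release_fd[1]);
	close(done_fd[0]);
	return result;
}
```

The survivor cannot proceed until the waiter writes to the release pipe, and the waiter only does that after waitpid() has returned, so a result of 1 means the wait completed while the real work was still pending.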



Mount the SD card formatted using the DIGITAL CAMERA on Linux box but HOW?

2005-08-08 Thread Mukund JB.

Dear all,

I have an SD card that mounts when formatted on Windows but fails to
mount when formatted on the camera, as you all know.

Now, I am able to mount the SD card formatted using the DIGITAL CAMERA on
a Linux box by using the first 512 bytes of the Windows-formatted SD card.

This will NOT serve as a permanent solution BUT I am able to reduce the
scope of the issue. I used a raw method for this. Here it is.

I formatted the card in Windows and copied its first sector to disk-image.

(Dump the first 512 bytes to the file)
dd if=/dev/tfa0 of=disk-image count=1

Then I copied the disk-image to the camera-formatted device, which was
not mounting.

dd if=disk-image of=/dev/tfa0 count=1

Then I tried to mount the device. It mounts with some error messages
caused by not being able to find sector 57, which contains the FAT
details.

Attempt to access beyond the end of device
FAT: Directory bread(block 24) failed
FAT: Directory bread(block ..) failed
..
..
FAT: Directory bread(block 55) failed

It works fine when I increase the count to 60. I am able to copy files,
remount and see the files.
Could someone please help me work out where the problem is?
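
One way to narrow this down might be to decode the 512 bytes saved by dd from each card and compare the layout fields. The sketch below assumes sector 0 is a FAT12/16 boot sector with a standard BIOS Parameter Block (a camera may instead write a partition table there, in which case these offsets do not apply). struct fat_bpb, parse_fat_bpb() and first_data_sector() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a FAT12/16 BIOS Parameter Block needed for a layout check. */
struct fat_bpb {
	uint16_t bytes_per_sector;
	uint16_t reserved_sectors;
	uint8_t  num_fats;
	uint16_t root_entries;
	uint16_t sectors_per_fat;
	uint32_t total_sectors;
};

/* Little-endian readers; on-disk FAT fields are unaligned. */
static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)le16(p) | ((uint32_t)le16(p + 2) << 16);
}

/* Returns 0 on success, -1 if the sector lacks the 0x55 0xAA signature. */
static int parse_fat_bpb(const uint8_t sector[512], struct fat_bpb *out)
{
	if (sector[510] != 0x55 || sector[511] != 0xAA)
		return -1;

	out->bytes_per_sector = le16(sector + 11);
	out->reserved_sectors = le16(sector + 14);
	out->num_fats         = sector[16];
	out->root_entries     = le16(sector + 17);
	out->sectors_per_fat  = le16(sector + 22);
	out->total_sectors    = le16(sector + 19);
	if (out->total_sectors == 0)		/* large volumes use the 32-bit field */
		out->total_sectors = le32(sector + 32);
	return 0;
}

/* First sector of the data area: reserved + all FATs + root directory. */
static uint32_t first_data_sector(const struct fat_bpb *b)
{
	uint32_t root_sectors;

	if (b->bytes_per_sector == 0)
		return 0;
	root_sectors = ((uint32_t)b->root_entries * 32 + b->bytes_per_sector - 1)
			/ b->bytes_per_sector;
	return b->reserved_sectors
		+ (uint32_t)b->num_fats * b->sectors_per_fat
		+ root_sectors;
}
```

If the two cards disagree on total_sectors or on where the data area starts, that would at least be consistent with the "access beyond the end of device" messages once one card's boot sector is copied over the other's.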

Regards,
Mukund Jampala



Re: [PATCH 1/2] New system call, unshare

2005-08-08 Thread serge
Quoting Alan Cox ([EMAIL PROTECTED]):
> On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
> > 
> > [PATCH 1/2] unshare system call: System Call handler function sys_unshare
> 
> 
> Given the complexity of the kernel code involved and the obscurity of
> the functionality why not just do another clone() in userspace to
> unshare the things you want to unshare and then _exit the parent ?

The problem I had when I tried using just clone() was that it's
not possible to have a pam library clone() and have the process
being authenticated end up with the new namespace.  At least not
that I could figure out.  Seemed possible that cloning, exiting the
original thread, and returning from the new thread could work, but
it didn't seem to work when I tried it.

thanks,
-serge


Re: [PATCH 1/2] New system call, unshare

2005-08-08 Thread Alan Cox
On Mon, 2005-08-08 at 09:33 -0400, Janak Desai wrote:
> 
> [PATCH 1/2] unshare system call: System Call handler function sys_unshare


Given the complexity of the kernel code involved and the obscurity of
the functionality why not just do another clone() in userspace to
unshare the things you want to unshare and then _exit the parent ?



[PATCH 2/2] New system call, unshare

2005-08-08 Thread Janak Desai


[PATCH 2/2] unshare system call: System Call setup for i386 arch

Signed-off-by: Janak Desai


 arch/i386/kernel/syscall_table.S |1 +
 include/asm-i386/unistd.h|3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)



diff -Naurp 2.6.13-rc5-mm1/arch/i386/kernel/syscall_table.S 2.6.13-rc5-mm1+unshare/arch/i386/kernel/syscall_table.S
--- 2.6.13-rc5-mm1/arch/i386/kernel/syscall_table.S	2005-08-07 15:33:07.0 +
+++ 2.6.13-rc5-mm1+unshare/arch/i386/kernel/syscall_table.S	2005-08-07 18:35:57.0 +
@@ -300,3 +300,4 @@ ENTRY(sys_call_table)
 	.long sys_vperfctr_control
 	.long sys_vperfctr_write
 	.long sys_vperfctr_read
+	.long sys_unshare	/* 300 */
diff -Naurp 2.6.13-rc5-mm1/include/asm-i386/unistd.h 2.6.13-rc5-mm1+unshare/include/asm-i386/unistd.h
--- 2.6.13-rc5-mm1/include/asm-i386/unistd.h	2005-08-07 15:33:40.0 +
+++ 2.6.13-rc5-mm1+unshare/include/asm-i386/unistd.h	2005-08-07 18:36:37.0 +
@@ -305,8 +305,9 @@
 #define __NR_vperfctr_control	(__NR_vperfctr_open+1)
 #define __NR_vperfctr_write	(__NR_vperfctr_open+2)
 #define __NR_vperfctr_read	(__NR_vperfctr_open+3)
+#define __NR_unshare		300
 
-#define NR_syscalls 300
+#define NR_syscalls 301
 
 /*
  * user-visible error numbers are in the range -1 - -128: see



[PATCH 1/2] New system call, unshare

2005-08-08 Thread Janak Desai


[PATCH 1/2] unshare system call: System Call handler function sys_unshare

Signed-off-by: Janak Desai



 fork.c |  202 +++--
 1 files changed, 196 insertions(+), 6 deletions(-)



diff -Naurp 2.6.13-rc5-mm1/kernel/fork.c 2.6.13-rc5-mm1+unshare/kernel/fork.c
--- 2.6.13-rc5-mm1/kernel/fork.c	2005-08-07 15:33:45.0 +
+++ 2.6.13-rc5-mm1+unshare/kernel/fork.c	2005-08-07 19:03:49.0 +
@@ -57,6 +57,17 @@ int nr_threads;  /* The idle threads do
 
 int max_threads;   /* tunable limit on nr_threads */
 
+/*
+ * mm_copy gets called from clone or unshare system calls. When called
+ * from clone, mm_struct may be shared depending on the clone flags
+ * argument, however, when called from the unshare system call, a private
+ * copy of mm_struct is made.
+ */
+enum mm_copy_share {
+	MAY_SHARE,
+	UNSHARE,
+};
+
 DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 
  __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
@@ -447,16 +458,26 @@ void mm_release(struct task_struct *tsk,
}
 }
 
-static int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
+static int copy_mm(unsigned long clone_flags, struct task_struct * tsk,
+				enum mm_copy_share copy_share_action)
 {
 	struct mm_struct * mm, *oldmm;
 	int retval;
 
-	tsk->min_flt = tsk->maj_flt = 0;
-	tsk->nvcsw = tsk->nivcsw = 0;
+	/*
+	 * If the process memory is being duplicated as part of the
+	 * unshare system call, we are working with the current process
+	 * and not a newly allocated task structure, and should not
+	 * zero out fault info, context switch counts, mm and active_mm
+	 * fields.
+	 */
+	if (copy_share_action == MAY_SHARE) {
+		tsk->min_flt = tsk->maj_flt = 0;
+		tsk->nvcsw = tsk->nivcsw = 0;
 
-	tsk->mm = NULL;
-	tsk->active_mm = NULL;
+		tsk->mm = NULL;
+		tsk->active_mm = NULL;
+	}
 
/*
 * Are we cloning a kernel thread?
@@ -973,7 +994,7 @@ static task_t *copy_process(unsigned lon
 		goto bad_fork_cleanup_fs;
 	if ((retval = copy_signal(clone_flags, p)))
 		goto bad_fork_cleanup_sighand;
-	if ((retval = copy_mm(clone_flags, p)))
+	if ((retval = copy_mm(clone_flags, p, MAY_SHARE)))
 		goto bad_fork_cleanup_signal;
 	if ((retval = copy_keys(clone_flags, p)))
 		goto bad_fork_cleanup_mm;
@@ -1288,3 +1309,172 @@ void __init proc_caches_init(void)
sizeof(struct mm_struct), 0,
SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
 }
+
+/*
+ * unshare_mm is called from the unshare system call handler function to
+ * make a private copy of the mm_struct structure. It calls copy_mm with
+ * CLONE_VM flag cleared, to ensure that a private copy of mm_struct is made,
+ * and with mm_copy_share enum set to UNSHARE, to ensure that copy_mm
+ * does not clear fault info, context switch counts, mm and active_mm
+ * fields of the mm_struct.
+ */
+static int unshare_mm(unsigned long unshare_flags, struct task_struct *tsk)
+{
+	int retval = 0;
+	struct mm_struct *mm = tsk->mm;
+
+	/*
+	 * If the virtual memory is being shared, make a private
+	 * copy and disassociate the process from the shared virtual
+	 * memory.
+	 */
+	if (atomic_read(&mm->mm_users) > 1) {
+		retval = copy_mm((unshare_flags & ~CLONE_VM), tsk, UNSHARE);
+
+		/*
+		 * If copy_mm was successful, decrement the number of users
+		 * on the original, shared, mm_struct.
+		 */
+		if (!retval)
+			atomic_dec(&mm->mm_users);
+	}
+	return retval;
+}
+
+/*
+ * unshare_sighand is called from the unshare system call handler function to
+ * make a private copy of the sighand_struct structure. It calls copy_sighand
+ * with CLONE_SIGHAND cleared to ensure that a new signal handler structure
+ * is cloned from the current shared one.
+ */
+static int unshare_sighand(unsigned long unshare_flags, struct task_struct *tsk)
+{
+	int retval = 0;
+	struct sighand_struct *sighand = tsk->sighand;
+
+	/*
+	 * If the signal handlers are being shared, make a private
+	 * copy and disassociate the process from the shared signal
+	 * handlers.
+	 */
+	if (atomic_read(&sighand->count) > 1) {
+		retval = copy_sighand((unshare_flags & ~CLONE_SIGHAND), tsk);
+
+		/*
+		 * If copy_sighand was successful, decrement the use count
+		 * on the original, shared, sighand_struct.
+		 */
+		if (!retval)
+			atomic_dec(&sighand->count);
+	}
+	return retval;
+}
+
+/*
+ * unshare_namespa

[PATCH 0/3] New system call, unshare

2005-08-08 Thread Janak Desai

Patch Summary:
This patch implements a new system call, unshare.  unshare allows
a process to disassociate parts of the process context that were 
initially being shared using the clone() system call.

The patch consists of two parts:
[1/2] Implements the system call handler function sys_unshare.
[2/2] Implements system call setup for x86 architecture.

Patch Justification:
Inspiration for this patch came from the 4/20/05 post by Al Viro
on the linux-fsdevel mailing list and from the needs of per-process
namespace based polyinstantiated directories. In his post, Mr. Viro
noted the usefulness of being able to create a private namespace
without forking. He also mentioned that "There used to be a kinda-sorta 
agreement on a new syscall: unshare(bitmap) with arguments like 
those of clone(2)".

Polyinstantiated directories provide an instance of a directory
based on the process security context (user id and/or extended
SELinux attributes). Polyinstantiation of public directories such
as /tmp provides better separation of processes and prevents
illegal information flow through file names. Polyinstantiated
directories are needed for Common Criteria certification using
Mandatory Access Control based Protection Profiles.

Legacy Mandatory Access Control based UNIX operating systems
often modified the kernel's pathname translation routines to
implement polyinstantiated directories. We are currently working
on a userspace polyinstantiation mechanism that was proposed by 
Stephen Smalley on the selinux mailing list and that uses the
per-process namespace.  Without the unshare system call, namespace
separation can only be achieved by clone(2), which would require 
porting and maintaining all commands such as login, su, gdm, ssh,
cron, newrole, etc, that establish a user session.  With unshare,
namespace setup can be done using PAM session management functions
without patching individual commands. 

This patch was first submitted on linux-fsdevel in mid-May, and
suggestions for improvement have been incorporated. It is now
ported to the latest rc5-mm tree and is being submitted for
consideration for inclusion in the mm tree for 2.6.14.

Overall Approach:
The overall approach follows the clone system call and its permission
enforcement. However, instead of clone's "what do we leave shared?"
logic, the logic here is "what do we unshare that was previously
being shared?". Unlike clone, which operates on a newly allocated
and not-yet-schedulable task structure, unshare has to work on the
current process, so additional task_lock()s are taken to avoid race
conditions. Before unsharing any part of the context, a check is
made to ensure that that part of the context is being shared in the
first place. If it is not being shared to begin with, the system
call returns success. If it is being shared, the system call makes
a private copy of that context and updates the appropriate pointers
of the current task structure to point to this new private copy. If
allocation and setup of the private copy fails, the system call
restores the current task structure so that it continues to use the
shared context.

Currently, the system call only allows "unsharing" of namespace, 
signal handlers and virtual memory, because those three were deemed 
useful on the linux-fsdevel mailing list.
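
The check/copy/swap sequence described above can be modelled in plain
userspace C. This is only a rough sketch of the idea, not the code
from the patch: struct shared_ctx, struct task and unshare_ctx() are
hypothetical names, a plain int stands in for the atomic use count,
and no locking is shown.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a shared kernel context and a task. */
struct shared_ctx {
	int users;	/* use count, in the role of mm_users or sighand->count */
	int payload;	/* placeholder for the real contents */
};

struct task {
	struct shared_ctx *ctx;
};

/*
 * Give tsk a private copy of its context if the context is shared.
 * Returns 0 on success (including "nothing to do") and -1 if the
 * copy could not be allocated; on failure tsk keeps the shared one.
 */
static int unshare_ctx(struct task *tsk)
{
	struct shared_ctx *old = tsk->ctx;
	struct shared_ctx *copy;

	if (old->users <= 1)
		return 0;		/* not shared: succeed without copying */

	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;		/* tsk->ctx still points at the shared context */

	copy->payload = old->payload;
	copy->users = 1;
	old->users--;			/* drop this task's reference on the original */
	tsk->ctx = copy;		/* switch the task to its private copy */
	return 0;
}
```

With two tasks pointing at one context whose use count is 2, calling
unshare_ctx() on the first leaves it with a private copy and the
original with a use count of 1; calling it on the second is then a
no-op that returns success.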

Testing:
The patch has been tested on a uniprocessor i386 Fedora Core 3
system.

Signed-off-by: Janak Desai

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Unable to mount the SD card formatted using the DIGITAL CAMREA on Linux box

2005-08-08 Thread Mukund JB.

Dear Erik & all,

Sorry for the delay in sending my latest update.
I had to fix the partition problem first.

I have gathered some partition information from SD cards
formatted in Windows and in the camera.

I printed the data at offset 0x1BE of sector 0 on a Windows
formatted SD card and on a camera formatted SD card.

Please see the details below.

Windows SD_0_Sector details at 0x1BE offset
-------------------------------------------

bootable        = 111   (0x01BE)
beg-chs.heads   = 116
beg-chs.sect    = 104
beg-chs.cylin   = 101
sys-type        = 114
end-chs.heads   = 32
end-chs.sect    = 109
end-chs.cylin   = 101
start sect      = 778135908
n/o sec in part = 1141509631


Camera SD_0_Sector details at 0x1BE offset
------------------------------------------

bootable        = 128
beg-chs.heads   = 1
beg-chs.sect    = 26
beg-chs.cylin   = 0
sys-type        = 1
end-chs.heads   = 1
end-chs.sect    = 96
end-chs.cylin   = 193
start sect      = 57
n/o sec in part = 28743

To my surprise, the Windows partition table data looks corrupted
and the camera created partition table looks meaningful to me. But
the Windows SD card mounts and the camera SD card fails to mount.
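
One possible reading of the Windows values, not confirmed anywhere in
this thread: every single-byte field is a printable ASCII code. The
rough sketch below rebuilds the 16 raw bytes of the entry, assuming
the two 32-bit fields were read as little-endian on the machine that
printed them. The helper names put_le32() and rebuild_entry() are
made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the Windows table above, in the order printed. */
static const uint8_t win_bytes[8] = { 111, 116, 104, 101, 114, 32, 109, 101 };
static const uint32_t win_start_sect = 778135908u;
static const uint32_t win_num_sect = 1141509631u;

/* Store a 32-bit value as four bytes, least significant byte first. */
static void put_le32(uint32_t v, unsigned char *out)
{
	out[0] = v & 0xff;
	out[1] = (v >> 8) & 0xff;
	out[2] = (v >> 16) & 0xff;
	out[3] = (v >> 24) & 0xff;
}

/* Rebuild the 16 raw bytes found at offset 0x1BE into out. */
static void rebuild_entry(unsigned char out[16])
{
	size_t i;

	for (i = 0; i < 8; i++)
		out[i] = win_bytes[i];
	put_le32(win_start_sect, out + 8);	/* assumed little-endian */
	put_le32(win_num_sect, out + 12);	/* assumed little-endian */
}
```

Under that assumption the 16 bytes spell "other media." followed by
0xFF, CR, LF and 'D', which looks like message text from boot code
rather than a partition entry. If so, sector 0 of the Windows card
may be a FAT boot sector with no partition table at all, which would
be consistent with that card mounting as a whole device.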

I am planning to read the FAT12 details from the first partition of
the camera card, which starts at sector 57 as shown in the
'start sect' field.

Please help me find a way to fix this.
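
As a cross-check of the camera entry, the sketch below converts the
begin and end CHS values to logical sector numbers and turns a start
sector into a byte offset. Several things here are assumptions: that
the values printed above are the raw bytes of the entry (so the top
two bits of the sector byte carry cylinder bits 8-9, as in the usual
PC partition table layout), that sectors are 512 bytes, and that the
geometry is 2 heads by 32 sectors per track, which is only inferred
because it makes the numbers agree. The struct and function names
are hypothetical.

```c
#include <stdint.h>

/* Raw begin or end CHS bytes as stored in a partition table entry. */
struct chs_raw {
	uint8_t head;
	uint8_t sect;	/* bits 0-5: sector (1-based), bits 6-7: cylinder bits 8-9 */
	uint8_t cyl;	/* cylinder bits 0-7 */
};

/* Sector number within the track, 1-based. */
static unsigned chs_sector(struct chs_raw c)
{
	return c.sect & 0x3f;
}

/* Full 10-bit cylinder number. */
static unsigned chs_cylinder(struct chs_raw c)
{
	return ((unsigned)(c.sect & 0xc0) << 2) | c.cyl;
}

/* Convert CHS to a logical sector number for an assumed geometry. */
static uint32_t chs_to_lba(struct chs_raw c, unsigned heads, unsigned spt)
{
	return (chs_cylinder(c) * heads + c.head) * spt + (chs_sector(c) - 1);
}

/* Byte offset of a logical sector, assuming 512-byte sectors. */
static uint64_t sector_to_offset(uint32_t lba)
{
	return (uint64_t)lba * 512;
}
```

With 2 heads and 32 sectors per track, the begin CHS (head 1, sector
byte 26, cylinder byte 0) gives sector 57, matching 'start sect', and
the end CHS (head 1, sector byte 96, cylinder byte 193) gives sector
28799, which equals 57 + 28743 - 1. The byte offset of sector 57 is
29184, which is the kind of value one could try with the loop device
offset option (mount -o loop,offset=...).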

Thanks & Regards,
Mukund Jampala


>-----Original Message-----
>From: Erik Mouw [mailto:[EMAIL PROTECTED]
>Sent: Friday, July 29, 2005 5:56 PM
>To: Srinivas G.
>Cc: [EMAIL PROTECTED]
>Subject: Re: Unable to mount the SD card formatted using the DIGITAL 
>CAMREA on Linux box
>
>(don't write to me personally, I do read the list)
>
>On Fri, Jul 29, 2005 at 04:56:43PM +0530, Srinivas G. wrote:
>> We have developed a Block Device Driver to handle the flash media 
>> devices in Linux 2.6.x kernel. It is working fine. We are able to 
>> mount the SD cards that are formatted on Windows systems, but we 
>> unable mount the cards that are formatted using the DIGITAL CAMERA.
>>
>> We have found one thing that the Windows and Digital Camera both are 
>> formatting the SD cards in FAT12 only. But why we are not able to 
>> mount the SD cards on Linux Box that are formatted using the Digital 
>> Camera.
>
>Probably because the camera and linux disagree about the geometry in 
>CHS (cylinder, head, sector) of the flash device.
>
>Each partition table entry contains the start and end CHS of that 
>partition. However, since a flash device (and also modern hard drives) 
>doesn't have a meaningful geometry value, the same information is also 
>encoded in logical sectors (start and size of the partition).
>
>If the logical information is zero, the kernel falls back onto the CHS 
>information in the partition table and has to assume a certain 
>geometry. If that assumption differs from the assumption of the camera,
>the partition boundaries will be wrong and you will not be able to 
>mount the partition directly. However, you can figure out the start of 
>the partition by hand, and use a loop device to get at the correct 
>offset.
>
>
>Erik
>
>--
>Erik Mouw
>[EMAIL PROTECTED]  [EMAIL PROTECTED]