from:"Badari Pulavarty"

Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier

2008-02-14 Thread Badari Pulavarty

On Thu, 2008-02-14 at 09:12 -0800, Dave Hansen wrote:
..
> > > > - Use currently other not exported functions in kernel/resource.c, like
> > > >   walk_memory_resource (where we would still need the maximum possible
> > > >   number of pages NR_MEM_SECTIONS)
> > >
> > > It isn't the act of exporting that's the problem.  It's making sure that
> > > the exports won't be prone to abuse and that people are using them
> > > properly.  You should assume that you can export and use
> > > walk_memory_resource().
> > 
> > So this seems to come down to a basic question:
> > New hardware seems to have a tendency to get "private MMUs",
> > which need private mappings from the kernel address space into a
> > "HW defined address space with potentially unique characteristics"
> > RDMA in Openfabrics with global MR is the most prominent example heading
> > there
> 
> That's not a question. ;)
> 
> Please explain to me why walk_memory_resource() is insufficient for your
> needs.  I've now pointed it out to you at least 3 times.  

I am not sure what you are trying to do with walk_memory_resource(). The
behavior is different on ppc64. Hotplug memory usage assumes that all
the memory resources (all system memory, not just IOMEM) are represented
in /proc/iomem. That's the case with i386 and ia64. But on ppc64 it
contains ONLY iomem-related resources. Paulus didn't want to export all
the system memory into /proc/iomem on ppc64. So I had to work around it
by providing an arch-specific walk_memory_resource() function for ppc64.
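
For illustration only, here is a minimal userspace sketch of what a
walker along these lines might look like. The range table,
walk_memory_ranges() and count_pages() are hypothetical names invented
for this sketch; the actual ppc64 code is not shown in this thread. The
idea is that the walker consults an architecture-owned description of
system memory rather than /proc/iomem, and calls back on each
overlapping piece.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of system-memory ranges, standing in for whatever
 * the architecture uses to describe RAM when /proc/iomem does not. */
struct mem_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

static const struct mem_range ranges[] = {
	{ 0x0000, 0x1000 },
	{ 0x2000, 0x0800 },
};

typedef int (*walk_fn)(unsigned long start_pfn, unsigned long nr_pages,
		       void *arg);

/* Call func on each part of [start_pfn, start_pfn + nr_pages) that
 * overlaps a known range; stop at the first non-zero return value.
 * Returns -1 if nothing overlapped. */
static int walk_memory_ranges(unsigned long start_pfn, unsigned long nr_pages,
			      void *arg, walk_fn func)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	int ret = -1;
	size_t i;

	assert(func != NULL);

	for (i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
		unsigned long s = ranges[i].start_pfn;
		unsigned long e = s + ranges[i].nr_pages;

		/* Clamp the table entry to the requested window. */
		if (s < start_pfn)
			s = start_pfn;
		if (e > end_pfn)
			e = end_pfn;
		if (s >= e)
			continue;	/* no overlap with this range */

		ret = func(s, e - s, arg);
		if (ret)
			break;
	}
	return ret;
}

/* Example callback: add up the pages visited. */
static int count_pages(unsigned long start_pfn, unsigned long nr_pages,
		       void *arg)
{
	(void)start_pfn;
	*(unsigned long *)arg += nr_pages;
	return 0;
}
```

A caller would pass a pfn window and a callback; only the parts of the
window backed by a table entry are visited.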

Thanks,
Badari

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [-mm PATCH] register_memory/unregister_memory clean ups

2008-02-13 Thread Badari Pulavarty

On Wed, 2008-02-13 at 14:09 +0900, Yasunori Goto wrote:
> Thanks Badari-san.
> 
> I understand what occurred. :-)
> 
> > On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > > > > +       /*
> > > > > +        * It's ugly, but this is the best I can do - HELP !!
> > > > > +        * We don't know where the allocations for section memmap and usemap
> > > > > +        * came from. If they are allocated at the boot time, they would come
> > > > > +        * from bootmem. If they are added through hot-memory-add they could be
> > > > > +        * from slab or vmalloc. If they are allocated as part of hot-mem-add
> > > > > +        * free them up properly. If they are allocated at boot, no easy way
> > > > > +        * to correctly free them :(
> > > > > +        */
> > > > > +       if (usemap) {
> > > > > +               if (PageSlab(virt_to_page(usemap))) {
> > > > > +                       kfree(usemap);
> > > > > +                       if (memmap)
> > > > > +                               __kfree_section_memmap(memmap, nr_pages);
> > > > > +               }
> > > > > +       }
> > > > > +}
> > > > 
> > > > Do what we did with the memmap and store some of its origination
> > > > information in the low bits.
> > > 
> > > Hmm. my understanding of memmap is limited. Can you help me out here ?
> > 
> > Never mind.  That was a bad suggestion.  I do think it would be a good
> > idea to mark the 'struct page' of every page we use as bootmem in some
> > way.  Perhaps page->private? 
> 
> I agree. page->private is not used by bootmem allocator.
> 
> I would like to mark not only memmap but also pgdat (and so on)
> for next step. It will be necessary for removing whole node. :-)
> 
> 
> >  Otherwise, you can simply try all of the
> > possibilities and consider the remainder bootmem.  Did you ever find out
> > if we properly initialize the bootmem 'struct page's?
> > 
> > Please have mercy and put this in a helper, first of all.
> > 
> > static void free_usemap(unsigned long *usemap)
> > {
> >         if (!usemap)
> >                 return;
> > 
> >         if (PageSlab(virt_to_page(usemap))) {
> >                 kfree(usemap);
> >         } else if (is_vmalloc_addr(usemap)) {
> >                 vfree(usemap);
> >         } else {
> >                 int nid = page_to_nid(virt_to_page(usemap));
> >                 /* placeholder name: the bootmem free path is still to be written */
> >                 bootmem_fun_here(NODE_DATA(nid), usemap);
> >         }
> > }
> > 
> > right?
> 
> It may work. But, to be honest, I feel there are TOO MANY allocation/free
> ways for memmap (usemap and so on). If possible, I would like to
> unify some of them. I would like to try it.

Thank you for the offer. Here is the latest patch, feel free to
rip it out.
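
The idea quoted above, recording where an allocation came from so that
the free path can pick the right routine, can be sketched in userspace
roughly as follows. This is only an illustration under stated
assumptions: struct tracked_buf, tracked_alloc() and tracked_free() are
hypothetical, and the record stands in for a mark on the struct page
(for example in page->private) rather than reproducing any kernel API.

```c
#include <assert.h>
#include <stdlib.h>

/* Where an allocation came from; recorded once, at allocation time. */
enum alloc_origin { ORIGIN_BOOTMEM, ORIGIN_SLAB, ORIGIN_VMALLOC, ORIGIN_MAX };

/* Hypothetical record standing in for a mark on the struct page. */
struct tracked_buf {
	enum alloc_origin origin;
	void *ptr;
};

/* Counts how often each free path ran, so the dispatch can be observed. */
static unsigned int free_path_hits[ORIGIN_MAX];

static struct tracked_buf tracked_alloc(enum alloc_origin origin, size_t size)
{
	struct tracked_buf buf = { origin, malloc(size) };
	return buf;
}

static void tracked_free(struct tracked_buf *buf)
{
	if (!buf->ptr)
		return;		/* nothing to free, or already freed */

	assert(buf->origin < ORIGIN_MAX);

	switch (buf->origin) {
	case ORIGIN_SLAB:	/* the kernel would call kfree() here */
	case ORIGIN_VMALLOC:	/* ... vfree() here */
	case ORIGIN_BOOTMEM:	/* ... and a per-node bootmem free here */
	default:
		free(buf->ptr);	/* userspace stand-in for all three */
		break;
	}
	free_path_hits[buf->origin]++;
	buf->ptr = NULL;
}
```

With the origin stored next to the allocation, the free side no longer
has to guess by probing PageSlab() or the vmalloc range.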

Thanks,
Badari

Generic helper function to remove section mappings and sysfs entries
for the section of the memory we are removing.  offline_pages() correctly 
adjusted zone and marked the pages reserved.

Issue: Need help on freeing up allocation made from bootmem. 

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>

---
 include/linux/memory_hotplug.h |    4 +++
 mm/memory_hotplug.c            |   34 +++
 mm/sparse.c                    |   44 ++---
 3 files changed, 79 insertions(+), 3 deletions(-)

Index: linux-2.6.24/mm/memory_hotplug.c
===
--- linux-2.6.24.orig/mm/memory_hotplug.c   2008-02-12 15:07:09.0 -0800
+++ linux-2.6.24/mm/memory_hotplug.c    2008-02-12 15:08:50.0 -0800
@@ -102,6 +102,15 @@ static int __add_section(struct zone *zo
        return register_new_memory(__pfn_to_section(phys_start_pfn));
 }
 
+static void __remove_section(struct zone *zone, struct mem_section *ms)
+{
+       if (!valid_section(ms))
+               return;
+
+       unregister_memory_section(ms);
+       sparse_remove_one_section(zone, ms);
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -135,6 +144,31 @@ int __add_pages(struct zone *zone, unsig
 }
 EXPORT_SYMBOL_GPL(__add_pages);
 
+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+                unsigned long nr_pages)
+{
+       unsigned long i;
+       int sections_to_remove;
+       unsigned long flags;
+       struct pglist_data *pgdat = zone->zone_pgdat;
+
+       /*
+        * We can only rem

Re: [-mm PATCH] register_memory/unregister_memory clean ups

2008-02-12 Thread Badari Pulavarty

On Tue, 2008-02-12 at 14:15 -0800, Dave Hansen wrote:
> On Tue, 2008-02-12 at 14:07 -0800, Badari Pulavarty wrote:
> > On Tue, 2008-02-12 at 13:57 -0800, Dave Hansen wrote:
> > > On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > > > 
> > > > +static void __remove_section(struct zone *zone, unsigned long section_nr)
> > > > +{
> > > > +       if (!valid_section_nr(section_nr))
> > > > +               return;
> > > > +
> > > > +       unregister_memory_section(__nr_to_section(section_nr));
> > > > +       sparse_remove_one_section(zone, section_nr);
> > > > +}
> > > 
> > > I do think passing in a mem_section* here is highly superior.  It makes
> > > it impossible to pass a pfn in and not get a warning.
> > > 
> > 
> > Only problem is, I need to hold pgdat_resize_lock() if pass *ms. 
> > If I don't hold the resize_lock, I have to re-evaluate.
> 
> What's wrong with holding the resize lock?  What races, precisely, are
> you trying to avoid?
> 
> > And also,
> > I need to pass section_nr for decoding the mem_map anyway :(
> 
> See sparse.c::__section_nr().  It takes a mem_section* and returns a
> section_nr.

Here is the version with your suggestion. Do you like this better ?

Thanks,
Badari

Generic helper function to remove section mappings and sysfs entries
for the section of the memory we are removing.  offline_pages() correctly 
adjusted zone and marked the pages reserved.

Issue: The mem_map and usemap allocations could come from different
places - kmalloc, vmalloc, alloc_pages or bootmem. There is no easy way
to find out which one and free them properly. Especially for bootmem,
we need to know which node the allocation came from.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>

---
 include/linux/memory_hotplug.h |    4 
 mm/memory_hotplug.c            |   34 ++
 mm/sparse.c                    |   39 ---
 3 files changed, 74 insertions(+), 3 deletions(-)

Index: linux-2.6.24/mm/memory_hotplug.c
===
--- linux-2.6.24.orig/mm/memory_hotplug.c   2008-02-07 17:16:52.0 -0800
+++ linux-2.6.24/mm/memory_hotplug.c    2008-02-12 14:49:07.0 -0800
@@ -102,6 +102,15 @@ static int __add_section(struct zone *zo
        return register_new_memory(__pfn_to_section(phys_start_pfn));
 }
 
+static void __remove_section(struct zone *zone, struct mem_section *ms)
+{
+       if (!valid_section(ms))
+               return;
+
+       unregister_memory_section(ms);
+       sparse_remove_one_section(zone, ms);
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -135,6 +144,31 @@ int __add_pages(struct zone *zone, unsig
 }
 EXPORT_SYMBOL_GPL(__add_pages);
 
+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+                unsigned long nr_pages)
+{
+       unsigned long i;
+       int sections_to_remove;
+       unsigned long flags;
+       struct pglist_data *pgdat = zone->zone_pgdat;
+
+       /*
+        * We can only remove entire sections
+        */
+       BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
+       BUG_ON(nr_pages % PAGES_PER_SECTION);
+
+       sections_to_remove = nr_pages / PAGES_PER_SECTION;
+
+       for (i = 0; i < sections_to_remove; i++) {
+               unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+               pgdat_resize_lock(pgdat, &flags);
+               __remove_section(zone, pfn_to_section(pfn));
+               pgdat_resize_unlock(pgdat, &flags);
+       }
+}
+EXPORT_SYMBOL_GPL(__remove_pages);
+
 static void grow_zone_span(struct zone *zone,
                unsigned long start_pfn, unsigned long end_pfn)
 {
Index: linux-2.6.24/mm/sparse.c
===
--- linux-2.6.24.orig/mm/sparse.c   2008-02-07 17:16:52.0 -0800
+++ linux-2.6.24/mm/sparse.c    2008-02-12 14:51:07.0 -0800
@@ -198,12 +198,13 @@ static unsigned long sparse_encode_mem_m
 }
 
 /*
- * We need this if we ever free the mem_maps.  While not implemented yet,
- * this function is included for parity with its sibling.
+ * Decode mem_map from the coded memmap
  */
-static __attribute((unused))
+static
 struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
 {
+       /* mask off the extra low bits of information */
+       coded_mem_map &= SECTION_MAP_MASK;
        return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
 }
 
@@ -415,4 +416,36 @@ out:
        }
        return ret;
 }
+
+void sparse_remove_one_section(struct zone *zone, struct mem_secti

Re: [-mm PATCH] register_memory/unregister_memory clean ups

2008-02-12 Thread Badari Pulavarty

On Tue, 2008-02-12 at 14:15 -0800, Dave Hansen wrote:
> On Tue, 2008-02-12 at 14:07 -0800, Badari Pulavarty wrote:
> > On Tue, 2008-02-12 at 13:57 -0800, Dave Hansen wrote:
> > > On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > > > 
> > > > +static void __remove_section(struct zone *zone, unsigned long section_nr)
> > > > +{
> > > > +       if (!valid_section_nr(section_nr))
> > > > +               return;
> > > > +
> > > > +       unregister_memory_section(__nr_to_section(section_nr));
> > > > +       sparse_remove_one_section(zone, section_nr);
> > > > +}
> > > 
> > > I do think passing in a mem_section* here is highly superior.  It makes
> > > it impossible to pass a pfn in and not get a warning.
> > > 
> > 
> > Only problem is, I need to hold pgdat_resize_lock() if pass *ms. 
> > If I don't hold the resize_lock, I have to re-evaluate.
> 
> What's wrong with holding the resize lock?  What races, precisely, are
> you trying to avoid?

I was trying to avoid holding the resize lock for the entire duration
of remove_section(), which includes removing sysfs entries etc. It's
needed only to decode and clear out the section map. (I am no longer
passing pfns).

What's wrong with passing section_nr ? It simply checks if that
section exists and if so removes the sysfs entries and the corresponding
section map.  What wrong thing can happen ?

> 
> > And also,
> > I need to pass section_nr for decoding the mem_map anyway :(
> 
> See sparse.c::__section_nr().  It takes a mem_section* and returns a
> section_nr.

I know. It looked like a roundabout way of getting section_nr when
we have that information easily available.

If you are really passionate about passing mem_section*, sure I
can do that :)
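
A small userspace sketch of the trade-off being discussed: taking a
struct mem_section * lets the compiler reject a stray pfn, while the
section number can still be recovered from the pointer. It assumes a
flat, tiny section table; nr_to_section(), section_nr() and
remove_section() here are simplified stand-ins for this sketch, not the
kernel's __nr_to_section(), __section_nr() or __remove_section().

```c
#include <assert.h>
#include <stddef.h>

#define NR_MEM_SECTIONS 8	/* hypothetical, tiny for the sketch */

struct mem_section {
	unsigned long section_mem_map;	/* non-zero when the section is in use */
};

/* Flat table; the tag and the variable share a name, as C allows. */
static struct mem_section mem_section[NR_MEM_SECTIONS];

static struct mem_section *nr_to_section(unsigned long nr)
{
	return nr < NR_MEM_SECTIONS ? &mem_section[nr] : NULL;
}

/* Recover the index from the pointer by subtracting the table base. */
static unsigned long section_nr(const struct mem_section *ms)
{
	assert(ms >= mem_section && ms < mem_section + NR_MEM_SECTIONS);
	return (unsigned long)(ms - mem_section);
}

static int valid_section(const struct mem_section *ms)
{
	return ms && ms->section_mem_map != 0;
}

/* Because the parameter is a pointer type, passing a pfn or a physical
 * address (both plain integers) draws a compiler diagnostic. */
static int remove_section(struct mem_section *ms)
{
	if (!valid_section(ms))
		return -1;
	ms->section_mem_map = 0;
	return 0;
}
```

The cost of the pointer interface is one subtraction when the number is
needed; the benefit is that pfn, paddr and section_nr can no longer be
mixed up silently at the call site.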

Thanks,
Badari


Re: [-mm PATCH] register_memory/unregister_memory clean ups

2008-02-12 Thread Badari Pulavarty

On Tue, 2008-02-12 at 13:57 -0800, Dave Hansen wrote:
> On Tue, 2008-02-12 at 13:56 -0800, Badari Pulavarty wrote:
> > 
> > +static void __remove_section(struct zone *zone, unsigned long section_nr)
> > +{
> > +       if (!valid_section_nr(section_nr))
> > +               return;
> > +
> > +       unregister_memory_section(__nr_to_section(section_nr));
> > +       sparse_remove_one_section(zone, section_nr);
> > +}
> 
> I do think passing in a mem_section* here is highly superior.  It makes
> it impossible to pass a pfn in and not get a warning.
> 

The only problem is, I need to hold pgdat_resize_lock() if I pass *ms.
If I don't hold the resize_lock, I have to re-evaluate. And also,
I need to pass section_nr for decoding the mem_map anyway :(

Thanks,
Badari


Re: [-mm PATCH] register_memory/unregister_memory clean ups

2008-02-12 Thread Badari Pulavarty

On Tue, 2008-02-12 at 12:59 -0800, Dave Hansen wrote:
> On Tue, 2008-02-12 at 09:22 -0800, Badari Pulavarty wrote:
> > +static void __remove_section(struct zone *zone, unsigned long phys_start_pfn)
> > +{
> > +       if (!pfn_valid(phys_start_pfn))
> > +               return;
> 
> I think you need at least a WARN_ON() there.  
> 
> I'd probably also not use pfn_valid(), personally.  
> 
> > +   unregister_memory_section(__pfn_to_section(phys_start_pfn));
> > +   __remove_zone(zone, phys_start_pfn);
> > +   sparse_remove_one_section(zone, phys_start_pfn, PAGES_PER_SECTION);
> > +}
> 
> Can none of this ever fail?
> 
> I also think having a function called __remove_section() that takes a
> pfn is a bad idea.  How about passing an actual 'struct mem_section *'
> into it?  One of the reasons I even made that structure was so that you
> could hand it around to things and never be confused about pfn vs. paddr
> vs. vaddr vs. section_nr.  Please use it.

Yes. I got similar feedback from Andy. I was closely trying to mimic
__add_pages() for easy review/understanding.

I have an updated version (not fully tested) which takes section_nr as
argument instead of playing with pfns. Please review this one and see if
it matches your taste :)

> 
> >  /*
> >   * Reasonably generic function for adding memory.  It is
> >   * expected that archs that support memory hotplug will
> > @@ -135,6 +153,21 @@ int __add_pages(struct zone *zone, unsig
> >  }
> >  EXPORT_SYMBOL_GPL(__add_pages);
> > 
> > +void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
> > +                unsigned long nr_pages)
> > +{
> > +       unsigned long i;
> > +       int start_sec, end_sec;
> > +
> > +       start_sec = pfn_to_section_nr(phys_start_pfn);
> > +       end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> > +
> > +       for (i = start_sec; i <= end_sec; i++)
> > +               __remove_section(zone, i << PFN_SECTION_SHIFT);
> > +
> > +}
> > +EXPORT_SYMBOL_GPL(__remove_pages);
> 
> I'd like to see some warnings in there if nr_pages or phys_start_pfn are
> not section-aligned and some other sanity checks.  If someone is trying
> to remove non-section-aligned areas, we either have something wrong, or
> some other work to do, first keeping track of what section portions are
> "removed".
> 

Yes. I did most of this already (thanks for pointing out again).
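
As a rough userspace illustration of the alignment checks being asked
for: the helper below refuses a range unless both the start pfn and the
page count are whole multiples of the section size. PFN_SECTION_SHIFT
is set to a made-up small value for the sketch, and sections_in_range()
and section_start_pfn() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/* Hypothetical section size for the sketch: 2^4 = 16 pages per section. */
#define PFN_SECTION_SHIFT	4
#define PAGES_PER_SECTION	(1UL << PFN_SECTION_SHIFT)
#define PAGE_SECTION_MASK	(~(PAGES_PER_SECTION - 1))

/* Returns the number of whole sections in the range, or -1 if either
 * the start pfn or the page count is not section-aligned. */
static long sections_in_range(unsigned long start_pfn, unsigned long nr_pages)
{
	if (start_pfn & ~PAGE_SECTION_MASK)
		return -1;	/* start is inside a section */
	if (nr_pages % PAGES_PER_SECTION)
		return -1;	/* range ends inside a section */
	return (long)(nr_pages / PAGES_PER_SECTION);
}

/* First pfn of the i-th section of an aligned range. */
static unsigned long section_start_pfn(unsigned long start_pfn, unsigned long i)
{
	return start_pfn + i * PAGES_PER_SECTION;
}
```

Whether a misaligned request should warn, fail, or BUG is a policy
decision for the real code; the sketch only shows the arithmetic.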


> > +void sparse_remove_one_section(struct zone *zone, unsigned long start_pfn,
> > +  int nr_pages)
..
> 
> > +   usemap = ms->pageblock_flags;
> > +   memmap = sparse_decode_mem_map((unsigned long)memmap,
> > +   section_nr);
> > +   ms->section_mem_map = 0;
> > +   ms->pageblock_flags = NULL;
> > +   }
> > +   pgdat_resize_unlock(pgdat, &flags);
> 
> Ugh.  Please put this in its own helper.  Also, sparse_decode_mem_map()
> has absolutely no other users.  Please modify it so that you don't have
> to do this gunk, like put the '& SECTION_MAP_MASK' in there.  You
> probably just need:
> 
> struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
> {
>       /*
>        * mask off the extra low bits of information
>        */
>       coded_mem_map &= SECTION_MAP_MASK;
>       return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
> }
> 
> Then, you can just do this:
> 
>   memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> 
> No casting, no temp variables.  *PLEASE* look around at things and feel
> free to modify them.  Otherwise, it'll just become a mess.
> (oh, and get rid of the unused attribute on it).

Good suggestion. 
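
To make the masking step concrete, here is a userspace sketch of the
encode/decode round trip. It assumes the low bits of the coded value
carry flags and that the stored address is biased by the section's
first pfn. The constants and the integer-based encode_mem_map() and
decode_mem_map() are simplified stand-ins chosen for this sketch; the
arithmetic is done on uintptr_t so that no out-of-bounds pointer is
formed in userspace.

```c
#include <assert.h>
#include <stdint.h>

struct page { unsigned long flags; };	/* stand-in for the kernel's struct page */

/* Hypothetical values for this sketch; the kernel defines its own. */
#define PFN_SECTION_SHIFT	4UL
#define SECTION_MARKED_PRESENT	(1UL << 0)
#define SECTION_HAS_MEM_MAP	(1UL << 1)
#define SECTION_MAP_LAST_BIT	(1UL << 2)
#define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT - 1))

static unsigned long section_nr_to_pfn(unsigned long pnum)
{
	return pnum << PFN_SECTION_SHIFT;
}

/* Store the mem_map address biased by the section's first pfn, so that
 * adding a pfn back yields the matching struct page. */
static uintptr_t encode_mem_map(struct page *mem_map, unsigned long pnum)
{
	return (uintptr_t)mem_map -
	       section_nr_to_pfn(pnum) * sizeof(struct page);
}

static struct page *decode_mem_map(uintptr_t coded, unsigned long pnum)
{
	/* mask off the extra low bits of information */
	coded &= SECTION_MAP_MASK;
	return (struct page *)(coded +
			       section_nr_to_pfn(pnum) * sizeof(struct page));
}
```

Doing the masking inside the decode helper means callers can hand over
the raw stored value, flags and all, without a cast or a temporary.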

> 
> > +
> > +       /*
> > +        * It's ugly, but this is the best I can do - HELP !!
> > +        * We don't know where the allocations for section memmap and usemap
> > +        * came from. If they are allocated at the boot time, they would come
> > +        * from bootmem. If they are added through hot-memory-add they could be
> > +        * from slab or vmalloc. If they are allocated as part of hot-mem-add
> > +        * free them up properly. If they are allocated at boot, no easy way
> > +        * to correctly free them :(
> > +        */
> > +       if (usemap) {
> > +               if (PageSlab(virt_to_page(usemap))) {
> > +                       kfree(usemap);
> > +                       if (memmap)
> > +                               __kfree_section_memmap(memmap, nr_pages);
> > +               }
> > +       }
> > +}

Re: [-mm PATCH] register_memory/unregister_memory clean ups

2008-02-12 Thread Badari Pulavarty

On Tue, 2008-02-12 at 17:06 +0900, Yasunori Goto wrote:
> > On Mon, 2008-02-11 at 11:48 -0800, Andrew Morton wrote:
> > > On Mon, 11 Feb 2008 09:23:18 -0800
> > > Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Hi Andrew,
> > > > 
> > > > While testing hotplug memory remove against -mm, I noticed
> > > > that unregister_memory() is not cleaning up /sysfs entries
> > > > correctly. It also de-references structures after destroying
> > > > them (luckily in the code which never gets used). So, I cleaned
> > > > up the code and fixed the extra reference issue.
> > > > 
> > > > Could you please include it in -mm ?
> > > > 
> > > > Thanks,
> > > > Badari
> > > > 
> > > > register_memory()/unregister_memory() never gets called with
> > > > "root". unregister_memory() is accessing kobject_name of
> > > > the object just freed up. Since no one uses the code,
> > > > lets take the code out. And also, make register_memory() static.  
> > > > 
> > > > Another bug fix - before calling unregister_memory()
> > > > remove_memory_block() gets a ref on kobject. unregister_memory()
> > > > need to drop that ref before calling sysdev_unregister().
> > > > 
> > > 
> > > I'd say this:
> > > 
> > > > Subject: [-mm PATCH] register_memory/unregister_memory clean ups
> > > 
> > > is rather tame.  These are more than cleanups!  These sound like
> > > machine-crashing bugs.  Do they crash machines?  How come nobody noticed
> > > it?
> > > 
> > 
> > No they don't crash machine - mainly because, they never get called
> > with "root" argument (where we have the bug). They were never tested
> > before, since we don't have memory remove work yet. All it does
> > is, it leave /sysfs directory laying around and causing next
> > memory add failure. 
> 
> Badari-san.
> 
> Which function does call unregister_memory() or unregister_memory_section()?
> I can't find its caller in current 2.6.24-mm1.
> 
> 
> ???()
>   |
>   |nothing calls?
>   |
>   +-->unregister_memory_section()
>        |
>        |call
>        |
>        +---> remove_memory_block()
>               |
>               |call
>               |
>               +---> unregister_memory()
> 
> unregister_memory_section() is only externed in linux/memory.h.
> 
> Do you have any another patch to call it?
> I think it is necessary for physical memory removing.
> 
> If you have not posted it or it is not merged to -mm,
> I can understand why this bug remains.
> If you posted it, could you point it to me?

Yes. I am trying to complete the hotplug memory remove
support, so that I can use it in the ppc64 DLPAR
environment.

Here is the patch to finish up some of the generic work
left. As you can see, I still need to finish up some work :(
Any help is appreciated :)

Thanks,
Badari

---
 include/linux/memory_hotplug.h |    4 
 mm/memory_hotplug.c            |   33 +
 mm/sparse.c                    |   40 
 3 files changed, 77 insertions(+)

Index: linux-2.6.24/mm/memory_hotplug.c
===
--- linux-2.6.24.orig/mm/memory_hotplug.c   2008-02-07 17:16:52.0 -0800
+++ linux-2.6.24/mm/memory_hotplug.c    2008-02-07 17:17:57.0 -0800
@@ -81,6 +81,14 @@ static int __add_zone(struct zone *zone,
        return 0;
 }
 
+static void __remove_zone(struct zone *zone, unsigned long phys_start_pfn)
+{
+       /*
+        * TODO - Check to see if the zone is correctly adjusted
+        *        Need to mark pages reserved ?
+        */
+}
+
 static int __add_section(struct zone *zone, unsigned long phys_start_pfn)
 {
        int nr_pages = PAGES_PER_SECTION;
@@ -102,6 +110,16 @@ static int __add_section(struct zone *zo
        return register_new_memory(__pfn_to_section(phys_start_pfn));
 }
 
+static void __remove_section(struct zone *zone, unsigned long phys_start_pfn)
+{
+       if (!pfn_valid(phys_start_pfn))
+               return;
+
+       unregister_memory_section(__pfn_to_section(phys_start_pfn));
+       __remove_zone(zone, phys_start_pfn);
+       sparse_remove_one_section(zone, phys_start_pfn, PAGES_PER_SECTION);
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -135,6 +153,21 @@ int __add_pages(struct zone *zone, unsig
 }
 EXPORT_SYMBOL_GPL(__add_pages);
 
+void __remove_pages(struct z

Re: [-mm PATCH] register_memory/unregister_memory clean ups

2008-02-11 Thread Badari Pulavarty

On Mon, 2008-02-11 at 11:48 -0800, Andrew Morton wrote:
> On Mon, 11 Feb 2008 09:23:18 -0800
> Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > Hi Andrew,
> > 
> > While testing hotplug memory remove against -mm, I noticed
> > that unregister_memory() is not cleaning up /sysfs entries
> > correctly. It also de-references structures after destroying
> > them (luckily in the code which never gets used). So, I cleaned
> > up the code and fixed the extra reference issue.
> > 
> > Could you please include it in -mm ?
> > 
> > Thanks,
> > Badari
> > 
> > register_memory()/unregister_memory() never gets called with
> > "root". unregister_memory() is accessing kobject_name of
> > the object just freed up. Since no one uses the code,
> > lets take the code out. And also, make register_memory() static.  
> > 
> > Another bug fix - before calling unregister_memory()
> > remove_memory_block() gets a ref on kobject. unregister_memory()
> > need to drop that ref before calling sysdev_unregister().
> > 
> 
> I'd say this:
> 
> > Subject: [-mm PATCH] register_memory/unregister_memory clean ups
> 
> is rather tame.  These are more than cleanups!  These sound like
> machine-crashing bugs.  Do they crash machines?  How come nobody noticed
> it?
> 

No, they don't crash machines - mainly because they never get called
with the "root" argument (where we have the bug). They were never tested
before, since we don't have memory remove working yet. All it does
is leave the /sysfs directory lying around, which causes the next
memory add to fail.

Thanks,
Badari


Re: [-mm PATCH] register_memory/unregister_memory clean ups

2008-02-11 Thread Badari Pulavarty

On Mon, 2008-02-11 at 09:54 -0800, Greg KH wrote:
> On Mon, Feb 11, 2008 at 09:23:18AM -0800, Badari Pulavarty wrote:
> > Hi Andrew,
> > 
> > While testing hotplug memory remove against -mm, I noticed
> > that unregister_memory() is not cleaning up /sysfs entries
> > correctly. It also de-references structures after destroying
> > them (luckily in the code which never gets used). So, I cleaned
> > up the code and fixed the extra reference issue.
> > 
> > Could you please include it in -mm ?
> 
> Want me to add this to my tree and send it in my next update for the
> driver core to Linus?
> 
> I'll be glad to do that.
> 
> thanks,
> 
> greg k-h

Please do. The only reason I wanted to push it through -mm is that I
didn't test this with mainline (since I have patches in -mm for hotplug
memory remove).

Thanks,
Badari


[-mm PATCH] register_memory/unregister_memory clean ups

2008-02-11 Thread Badari Pulavarty

Hi Andrew,

While testing hotplug memory remove against -mm, I noticed
that unregister_memory() is not cleaning up /sysfs entries
correctly. It also de-references structures after destroying
them (luckily in the code which never gets used). So, I cleaned
up the code and fixed the extra reference issue.

Could you please include it in -mm ?

Thanks,
Badari

register_memory()/unregister_memory() never get called with
"root". unregister_memory() is accessing kobject_name of
the object just freed up. Since no one uses the code,
let's take the code out. And also, make register_memory() static.

Another bug fix - before calling unregister_memory(),
remove_memory_block() gets a ref on the kobject. unregister_memory()
needs to drop that ref before calling sysdev_unregister().

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 drivers/base/memory.c |   22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

Index: linux-2.6.24/drivers/base/memory.c
===
--- linux-2.6.24.orig/drivers/base/memory.c 2008-02-07 16:59:52.0 -0800
+++ linux-2.6.24/drivers/base/memory.c  2008-02-08 15:54:45.0 -0800
@@ -62,8 +62,8 @@ void unregister_memory_notifier(struct n
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
-int register_memory(struct memory_block *memory, struct mem_section *section,
-               struct node *root)
+static
+int register_memory(struct memory_block *memory, struct mem_section *section)
 {
        int error;
 
@@ -71,26 +71,18 @@ int register_memory(struct memory_block 
        memory->sysdev.id = __section_nr(section);
 
        error = sysdev_register(&memory->sysdev);
-
-       if (root && !error)
-               error = sysfs_create_link(&root->sysdev.kobj,
-                                         &memory->sysdev.kobj,
-                                         kobject_name(&memory->sysdev.kobj));
-
        return error;
 }
 
 static void
-unregister_memory(struct memory_block *memory, struct mem_section *section,
-               struct node *root)
+unregister_memory(struct memory_block *memory, struct mem_section *section)
 {
        BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
        BUG_ON(memory->sysdev.id != __section_nr(section));
 
+       /* drop the ref. we got in remove_memory_block() */
+       kobject_put(&memory->sysdev.kobj);
        sysdev_unregister(&memory->sysdev);
-       if (root)
-               sysfs_remove_link(&root->sysdev.kobj,
-                                 kobject_name(&memory->sysdev.kobj));
 }
 
 /*
@@ -361,7 +353,7 @@ static int add_memory_block(unsigned lon
        mutex_init(&mem->state_mutex);
        mem->phys_device = phys_device;
 
-       ret = register_memory(mem, section, NULL);
+       ret = register_memory(mem, section);
        if (!ret)
                ret = mem_create_simple_file(mem, phys_index);
        if (!ret)
@@ -415,7 +407,7 @@ int remove_memory_block(unsigned long no
        mem_remove_simple_file(mem, state);
        mem_remove_simple_file(mem, phys_device);
        mem_remove_simple_file(mem, removable);
-       unregister_memory(mem, section, NULL);
+       unregister_memory(mem, section);
 
        return 0;
 }



Re: [-mm PATCH] sysdev_unregister() should call kobject_del()

2008-02-08 Thread Badari Pulavarty


On Thu, 2008-02-07 at 20:55 -0800, Greg KH wrote:
> On Thu, Feb 07, 2008 at 05:25:46PM -0800, Badari Pulavarty wrote:
> > On Thu, 2008-02-07 at 16:38 -0800, Greg KH wrote:
> > > On Thu, Feb 07, 2008 at 03:56:58PM -0800, Badari Pulavarty wrote:
> > > > Hi Greg,
> > > > 
> > > > While playing with hotplug memory remove on 2.6.24-mm1, I 
> > > > noticed that /sysfs directory entries are not getting removed.
> > > > 
> > > > sysdev_unregister() used to call kobject_unregister().
> > > > But in 2.6.24-mm1, its only dropping the ref. It should
> > > > call kobject_del() to remove the object. Correct ?
> > > > 
> > > > With this change, the directories are getting removed
> > > > correctly. Comments ?
> > > 
> > > Ick, no, this shouldn't be needed, someone else must be holding a
> > > reference to the kobject device somewhere.  See the kobject documentation
> > > for more info.
> > > 
> > > I'll try to see where we grab 2 references...
> > 
> > I will take a closer look then. I was taking easy way out :(
> 
> Hm, I don't see anything obvious in the sys.c core.  What code is
> controlling these objects that you are creating and removing from the
> system?
> 

I found the culprit. It's not in the sys.c core :)

remove_memory_block() takes a reference on the kobject by calling
find_memory_block(). It then calls unregister_memory() to free up the
sysfs entry. That drops only one ref (the one taken at register time),
so the kobject never gets freed. I am going to send out a separate
patch for it.
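
The leak described here can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions: struct obj and its
helpers are hypothetical stand-ins for a reference-counted object such
as a kobject, and find_obj() plays the role of a lookup that returns
its result with an extra reference held.

```c
#include <assert.h>

/* Minimal stand-in for a reference-counted object such as a kobject. */
struct obj {
	int refcount;
	int released;	/* set once the last reference is dropped */
};

static void obj_init(struct obj *o)
{
	o->refcount = 1;	/* the reference taken at register time */
	o->released = 0;
}

static struct obj *obj_get(struct obj *o)
{
	o->refcount++;
	return o;
}

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->released = 1;	/* the release callback would run here */
}

/* Lookup that hands back the object with an extra reference held. */
static struct obj *find_obj(struct obj *o)
{
	return obj_get(o);
}

/* Buggy teardown: only one put, so the lookup reference leaks. */
static void remove_buggy(struct obj *o)
{
	struct obj *found = find_obj(o);

	obj_put(found);
}

/* Fixed teardown: drop the lookup reference as well. */
static void remove_fixed(struct obj *o)
{
	struct obj *found = find_obj(o);

	obj_put(found);		/* the reference taken by the lookup */
	obj_put(found);		/* the reference taken at register time */
}
```

In the buggy path the count never reaches zero, which matches the
symptom of the sysfs directory staying behind after removal.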

Thanks for your help.

Thanks,
Badari




Re: [-mm PATCH] sysdev_unregister() should call kobject_del()

2008-02-08 Thread Badari Pulavarty

On Thu, 2008-02-07 at 21:41 -0800, Greg KH wrote:
> On Thu, Feb 07, 2008 at 09:08:42PM -0800, Badari Pulavarty wrote:
> > 
> > On Thu, 2008-02-07 at 20:55 -0800, Greg KH wrote:
> > > On Thu, Feb 07, 2008 at 05:25:46PM -0800, Badari Pulavarty wrote:
> > > > On Thu, 2008-02-07 at 16:38 -0800, Greg KH wrote:
> > > > > On Thu, Feb 07, 2008 at 03:56:58PM -0800, Badari Pulavarty wrote:
> > > > > > Hi Greg,
> > > > > > 
> > > > > > While playing with hotplug memory remove on 2.6.24-mm1, I 
> > > > > > noticed that /sysfs directory entries are not getting removed.
> > > > > > 
> > > > > > sysdev_unregister() used to call kobject_unregister().
> > > > > > But in 2.6.24-mm1, its only dropping the ref. It should
> > > > > > call kobject_del() to remove the object. Correct ?
> > > > > > 
> > > > > > With this change, the directories are getting removed
> > > > > > correctly. Comments ?
> > > > > 
> > > > > Ick, no, this shouldn't be needed, someone else must be holding a
> > > > > reference to the kobject device somewhere.  See the kobject
> > > > > documentation for more info.
> > > > > 
> > > > > I'll try to see where we grab 2 references...
> > > > 
> > > > I will take a closer look then. I was taking easy way out :(
> > > 
> > > Hm, I don't see anything obvious in the sys.c core.  What code is
> > > controlling these objects that you are creating and removing from the
> > > system?
> > 
> > add_memory_block()/register_memory() is creating sysfs entries
> > for the memory blocks.
> > 
> > I am trying to make use of remove_memory_block() to clean up
> > sysfs entries. BTW, remove_memory_block() is never tested
> > before, since we don't support memory remove yet.
> 
> So how are you testing the sysdev removal logic then if you can't remove
> memory?

I am now adding code to remove memory. I am making use of the
existing unregister_memory() code and ran into this.
> 
> Oh, one thing, remove the link in unregister_memory, before you call
> sysdev_unregister().  You are trying to get the name of a kobject, and
> the whole object, that has just been blown away, not nice...

I can add remove link before unregister() as you suggested.

Regarding accessing the kobject after it is freed: no one calls these
functions with "root". Why not clean that code up first, like this?

Dave, are you okay with this ?

Thanks,
Badari

register_memory()/unregister_memory() never get called with
"root". unregister_memory() is accessing kobject_name of
the object just freed up. Since no one uses the code,
let's take the code out. Also, make register_memory()
static (since unregister_memory() is already static).

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 drivers/base/memory.c |   20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

Index: linux-2.6.24/drivers/base/memory.c
===
--- linux-2.6.24.orig/drivers/base/memory.c 2008-02-07 16:59:52.0 -0800
+++ linux-2.6.24/drivers/base/memory.c  2008-02-08 08:26:33.0 -0800
@@ -62,8 +62,8 @@ void unregister_memory_notifier(struct n
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
-int register_memory(struct memory_block *memory, struct mem_section *section,
-   struct node *root)
+static
+int register_memory(struct memory_block *memory, struct mem_section *section)
 {
int error;
 
@@ -71,26 +71,16 @@ int register_memory(struct memory_block 
memory->sysdev.id = __section_nr(section);
 
error = sysdev_register(&memory->sysdev);
-
-   if (root && !error)
-   error = sysfs_create_link(&root->sysdev.kobj,
- &memory->sysdev.kobj,
- kobject_name(&memory->sysdev.kobj));
-
return error;
 }
 
 static void
-unregister_memory(struct memory_block *memory, struct mem_section *section,
-   struct node *root)
+unregister_memory(struct memory_block *memory, struct mem_section *section)
 {
BUG_ON(memory->sysdev.cls != &memory_sysdev_class);
BUG_ON(memory->sysdev.id != __section_nr(section));
 
sysdev_unregister(&memory->sysdev);
-   if (root)
-   sysfs_remove_link(&root->sysdev.kobj,
- kobject_name(&memory->sysdev.kobj));
 }
 
 /*
@@ -361,7 +351,7 @@ static int add_memory_block(
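
The ordering problem Greg points out (reading a kobject's name after the object has been torn down) can be illustrated outside the kernel. This is only a userspace sketch: struct toy_dev, toy_register() and toy_unregister() are hypothetical names, not kernel APIs, and free() stands in for the final unregister.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object that owns its own name. */
struct toy_dev {
    char name[32];
};

static struct toy_dev *toy_register(const char *name)
{
    struct toy_dev *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    snprintf(d->name, sizeof(d->name), "%s", name);
    return d;
}

/*
 * Tear down in the safe order: use the name first ("remove the link"),
 * and only then release the object that owns the name.
 */
static void toy_unregister(struct toy_dev *d, char *removed_link, size_t len)
{
    snprintf(removed_link, len, "%s", d->name);
    free(d);
}
```

Swapping the two statements in toy_unregister() would read d->name after free(), which is the same class of bug as calling kobject_name() after sysdev_unregister().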

Re: [-mm PATCH] sysdev_unregister() should call kobject_del()

2008-02-07 Thread Badari Pulavarty


On Thu, 2008-02-07 at 20:55 -0800, Greg KH wrote:
> On Thu, Feb 07, 2008 at 05:25:46PM -0800, Badari Pulavarty wrote:
> > On Thu, 2008-02-07 at 16:38 -0800, Greg KH wrote:
> > > On Thu, Feb 07, 2008 at 03:56:58PM -0800, Badari Pulavarty wrote:
> > > > Hi Greg,
> > > > 
> > > > While playing with hotplug memory remove on 2.6.24-mm1, I 
> > > > noticed that /sysfs directory entries are not getting removed.
> > > > 
> > > > sysdev_unregister() used to call kobject_unregister().
> > > > But in 2.6.24-mm1, its only dropping the ref. It should
> > > > call kobject_del() to remove the object. Correct ?
> > > > 
> > > > With this change, the directories are getting removed
> > > > correctly. Comments ?
> > > 
> > > Ick, no, this shouldn't be needed, someone else must be holding a
> > > > reference to the kobject device somewhere.  See the kobject documentation
> > > for more info.
> > > 
> > > I'll try to see where we grab 2 references...
> > 
> > I will take a closer look then. I was taking easy way out :(
> 
> Hm, I don't see anything obvious in the sys.c core.  What code is
> controlling these objects that you are creating and removing from the
> system?

add_memory_block()/register_memory() is creating sysfs entries
for the memory blocks.

I am trying to make use of remove_memory_block() to clean up
sysfs entries. BTW, remove_memory_block() is never tested
before, since we don't support memory remove yet.

Thanks,
Badari


Re: [-mm PATCH] sysdev_unregister() should call kobject_del()

2008-02-07 Thread Badari Pulavarty

On Thu, 2008-02-07 at 16:38 -0800, Greg KH wrote:
> On Thu, Feb 07, 2008 at 03:56:58PM -0800, Badari Pulavarty wrote:
> > Hi Greg,
> > 
> > While playing with hotplug memory remove on 2.6.24-mm1, I 
> > noticed that /sysfs directory entries are not getting removed.
> > 
> > sysdev_unregister() used to call kobject_unregister().
> > But in 2.6.24-mm1, its only dropping the ref. It should
> > call kobject_del() to remove the object. Correct ?
> > 
> > With this change, the directories are getting removed
> > correctly. Comments ?
> 
> Ick, no, this shouldn't be needed, someone else must be holding a
> > reference to the kobject device somewhere.  See the kobject documentation
> for more info.
> 
> I'll try to see where we grab 2 references...

I will take a closer look then. I was taking easy way out :(

Thanks,
Badari


[-mm PATCH] sysdev_unregister() should call kobject_del()

2008-02-07 Thread Badari Pulavarty

Hi Greg,

While playing with hotplug memory remove on 2.6.24-mm1, I 
noticed that /sysfs directory entries are not getting removed.

sysdev_unregister() used to call kobject_unregister().
But in 2.6.24-mm1, it's only dropping the ref. It should
call kobject_del() to remove the object. Correct ?

With this change, the directories are getting removed
correctly. Comments ?

Thanks,
Badari

sysdev_unregister() should call kobject_del() to remove
the object.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 drivers/base/sys.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.24/drivers/base/sys.c
===
--- linux-2.6.24.orig/drivers/base/sys.c 2008-02-05 09:56:56.0 -0800
+++ linux-2.6.24/drivers/base/sys.c 2008-02-07 15:38:17.0 -0800
@@ -265,6 +265,7 @@ void sysdev_unregister(struct sys_device
}
mutex_unlock(&sysdev_drivers_lock);
 
+   kobject_del(&sysdev->kobj);
kobject_put(&sysdev->kobj);
 }
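
The difference between dropping a reference and deleting the object can be modelled in a few lines of userspace C. This is a toy sketch, not the kernel's kobject code: struct toy_kobj and the toy_* helpers are hypothetical, and the model assumes the last put also unlinks the directory.

```c
#include <stdbool.h>

/* Toy stand-in for a kobject: NOT the kernel's struct kobject. */
struct toy_kobj {
    int refcount;   /* outstanding references */
    bool in_sysfs;  /* is the directory still visible? */
};

static void toy_init_add(struct toy_kobj *k)
{
    k->refcount = 1;
    k->in_sysfs = true;
}

static void toy_get(struct toy_kobj *k)
{
    k->refcount++;
}

/* Models kobject_del(): unlink the directory, leave the references alone. */
static void toy_del(struct toy_kobj *k)
{
    k->in_sysfs = false;
}

/* Models kobject_put(): drop one reference; the last put cleans up. */
static void toy_put(struct toy_kobj *k)
{
    if (--k->refcount == 0)
        k->in_sysfs = false;
}
```

With a stray extra toy_get(), a single toy_put() leaves the directory visible. Calling toy_del() hides it but does not account for the extra reference, which matches the concern raised in the replies to this patch.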
 



Re: [PATCH] drivers/base: export (un)register_memory_notifier

2008-02-01 Thread Badari Pulavarty

On Fri, 2008-02-01 at 17:16 +0100, Jan-Bernd Themann wrote:
> Drivers like eHEA need memory notifiers in order to 
> update their internal DMA memory map when memory is added
> to or removed from the system.
> 
> Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
> 
> ---
>  Comment: eHEA patches that exploit these functions will follow
> 
> 
>  drivers/base/memory.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 7ae413f..1e1bd4c 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -52,11 +52,13 @@ int register_memory_notifier(struct notifier_block *nb)
>  {
>  return blocking_notifier_chain_register(&memory_chain, nb);
>  }
> +EXPORT_SYMBOL(register_memory_notifier);
> 
>  void unregister_memory_notifier(struct notifier_block *nb)
>  {
>  blocking_notifier_chain_unregister(&memory_chain, nb);
>  }
> +EXPORT_SYMBOL(unregister_memory_notifier);
> 
>  /*
>   * register_memory - Setup a sysfs device for a memory block

Is there a reason for not making them EXPORT_SYMBOL_GPL() ?
Otherwise, looks good to me.

I have been planning to send this as part of my next update
with ppc64 arch-specific remove support and generic __remove_pages()
support. If this is blocking your work, let's get this in.

Acked-by: Badari Pulavarty <[EMAIL PROTECTED]>

Thanks,
Badari
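
For readers unfamiliar with notifier chains, here is a rough userspace model of what a driver gets from register_memory_notifier(): a callback that runs whenever the core announces an event. Everything named toy_* is hypothetical; the kernel's blocking notifier chains add locking and priorities that this sketch leaves out.

```c
#include <stddef.h>

struct toy_notifier {
    int (*call)(struct toy_notifier *nb, unsigned long action, void *data);
    struct toy_notifier *next;
};

static struct toy_notifier *toy_chain;   /* head of the callback list */
static unsigned long toy_last_action;    /* what the sample driver saw last */

static void toy_register_notifier(struct toy_notifier *nb)
{
    nb->next = toy_chain;
    toy_chain = nb;
}

static void toy_unregister_notifier(struct toy_notifier *nb)
{
    struct toy_notifier **pp = &toy_chain;

    while (*pp) {
        if (*pp == nb) {
            *pp = nb->next;
            return;
        }
        pp = &(*pp)->next;
    }
}

/* Call every registered callback; return how many were called. */
static int toy_notify(unsigned long action, void *data)
{
    int calls = 0;

    for (struct toy_notifier *nb = toy_chain; nb; nb = nb->next) {
        nb->call(nb, action, data);
        calls++;
    }
    return calls;
}

/* Sample driver callback: a real driver would update its DMA map here. */
static int toy_driver_cb(struct toy_notifier *nb, unsigned long action, void *data)
{
    (void)nb;
    (void)data;
    toy_last_action = action;
    return 0;
}
```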


Re: [PATCH] dio: falling through to buffered I/O when invalidation of a page fails

2007-12-14 Thread Badari Pulavarty

On Tue, 2007-12-11 at 17:00 -0800, Zach Brown wrote:
> Hisashi Hifumi wrote:
> > Hi.
> > 
> > Current dio has some problems:
> > 1, In ext3 ordered, dio write can return with EIO because of the race
> > between invalidation of
> > a page and jbd. jbd pins the bhs while committing journal so
> > try_to_release_page fails when jbd
> > is committing the transaction.
> 
> Yeah.  It sure would be fantastic if some ext3 expert could stop this
> from happening somehow.  But that hasn't happened in.. uh.. Badari, for
> how many years has this been on the radar? :)

I used to have a test case that would reproduce the problem somewhat
consistently. But with the introduction of invalidate_range() and Jan Kara's
rewrite of journal commit handling, I can't reproduce the problem
anymore. So I gave up on fixing a problem that I can't reproduce.

If anyone has a test case, I can take a look at the problem again.

Thanks,
Badari




Re: 2.6.24-rc4-mm1 kobject changes broken with hvcs driver on powerpc

2007-12-06 Thread Badari Pulavarty

On Thu, 2007-12-06 at 12:31 -0800, Greg KH wrote:
> On Fri, Dec 07, 2007 at 12:28:58AM +0530, Balbir Singh wrote:
> > Greg KH wrote:
> > 
> > >> Why release the spinlock here? It's done after the count is incremented.
> > >> This patch does not seem correct.
> > > 
> > > Doh, you are correct, I'll make sure that I fix this up before applying
> > > it.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > Hi, Greg,
> > 
> > I ran some tests with the fixed up version of this patch and the system
> > fails to come up.
> > 
> > I see the WARN_ON in lib/kref.c:33 and the system fails to boot beyond
> > that point. I have not yet found time to debug it though.
> 
> That's not good, that warning means that someone has tried to use this
> kref _before_ it was initialized, so there is a logic error in the code
> that was previously being papered over with the lack of this message in
> the kobject code.
> 
> I do have this same message available as a patch for the kobject core, it
> would be interesting if you could just run 2.6.24-rc4 with just this
> patch:
>   
> http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver/kobject-warn.patch
> 
> it might take some fuzz to fit properly, but all you really want to do
> is add:
>   WARN_ON(atomic_read(&kobj->kref.refcount));
> before the kref_init() call in kobject_init().
> 
> thanks,
> 
> greg k-h

2.6.24-rc4 with the above patch booted fine without any warnings.
But 2.6.24-rc4-mm1 doesn't boot; it hangs after the following messages.


e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
ipr :d0:01.0: Found IOA with IRQ: 119
ipr :d0:01.0: Starting IOA initialization sequence.
ipr :d0:01.0: Adapter firmware version: 020A005E
ipr :d0:01.0: IOA initialized.
scsi0 : IBM 570B Storage Adapter
scsi 0:0:3:0: Direct-Access IBM   H0 HUS103014FL3800  RPQF PQ: 0 ANSI: 4
scsi 0:0:5:0: Direct-Access IBM   H0 HUS103014FL3800  RPQF PQ: 0 ANSI: 4
scsi 0:0:8:0: Direct-Access IBM   H0 HUS103014FL3800  RPQF PQ: 0 ANSI: 4
scsi 0:0:15:0: Enclosure IBM  VSBPD4E2  U4SCSI 7134 PQ: 0 ANSI: 2
[ cut here ]
Badness at lib/kref.c:33
NIP: c02e1254 LR: c02dfbd8 CTR: c02e60f0
REGS: c0003f0db050 TRAP: 0700   Not tainted  (2.6.24-rc4-mm1)
MSR: 80029032   CR: 28002042  XER: 000f
TASK = c0003f0d78d0[1] 'swapper' THREAD: c0003f0d8000 CPU: 0
GPR00:  c0003f0db2d0 c0724098 c0003f131620
GPR04: fff1 fffe 000a 
GPR08: c0003d4d9000 c0003f0cbfe0 c0556591 0073
GPR12: 24002084 c0651980  
GPR16:  d8008008 c064d6f0 c0003d4d9570
GPR20: c0003d4d94b8 0002 c0003d4d9170 c0003d4d9170
GPR24: c0003d4d9000 0001 c0003d570d58 c0003d570d18
GPR28:  c0003d4d9260 c06b5400 c0003f131618
NIP [c02e1254] .kref_get+0x10/0x2c
LR [c02dfbd8] .kobject_get+0x24/0x40
Call Trace:
[c0003f0db2d0] [c0003f0db360] 0xc0003f0db360 (unreliable)
[c0003f0db350] [c02e00e8] .kobject_add+0x8c/0x21c
[c0003f0db3e0] [c0344b00] .device_add+0xd4/0x680
[c0003f0db4a0] [c03a1c4c] .scsi_alloc_target+0x218/0x404
[c0003f0db570] [c03a1fb4] .__scsi_scan_target+0xa8/0x640
[c0003f0db6b0] [c03a25c4] .scsi_scan_channel+0x78/0xdc
[c0003f0db750] [c03a26f8] .scsi_scan_host_selected+0xd0/0x140
[c0003f0db7f0] [c03c3ff4] .ipr_probe+0x1270/0x1348
[c0003f0db960] [c02f4808] .pci_device_probe+0x124/0x194
[c0003f0dba10] [c0347e8c] .driver_probe_device+0x110/0x1f0
[c0003f0dbaa0] [c0348014] .__driver_attach+0xa8/0x134
[c0003f0dbb30] [c03472ac] .bus_for_each_dev+0x80/0xd0
[c0003f0dbbe0] [c0347c14] .driver_attach+0x28/0x40
[c0003f0dbc60] [c0346788] .bus_add_driver+0xfc/0x2d0
[c0003f0dbd10] [c03482cc] .driver_register+0x80/0x9c
[c0003f0dbd90] [c02f4bb0] .__pci_register_driver+0x5c/0xcc
[c0003f0dbe20] [c0604b38] .ipr_init+0x38/0x50
[c0003f0dbea0] [c05d6428] .kernel_init+0x214/0x3ec
[c0003f0dbf90] [c0026734] .kernel_thread+0x4c/0x68
Instruction dump:
e8410028 3921 38210080 7d234b78 e8010010 ebc1fff0 7c0803a6 4e800020
8003 7c0007b4 2f80 409e0008 <0fe0> 7c001828 3001


Thanks,
Badari
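
The "Badness at lib/kref.c:33" in the log above is reported from kref_get(). A toy userspace model of the two checks discussed in this thread, assuming one flags a get on a counter that is still zero and the other flags an init on a counter that is already live, might look like this. struct toy_kref and both helpers are hypothetical and use a plain int instead of an atomic.

```c
struct toy_kref {
    int refcount;
};

/*
 * Models the proposed check before kref_init(): returns 1 if the counter
 * was already non-zero, i.e. the object was in use or initialized twice.
 */
static int toy_kref_init_checked(struct toy_kref *k)
{
    int was_live = (k->refcount != 0);

    k->refcount = 1;
    return was_live;
}

/*
 * Models the check in kref_get(): returns 1 if a reference is taken on a
 * counter that was never initialized.
 */
static int toy_kref_get_checked(struct toy_kref *k)
{
    int uninitialized = (k->refcount == 0);

    k->refcount++;
    return uninitialized;
}
```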


Re: 2.6.24-rc4-mm1 kobject changes broken with hvcs driver on powerpc

2007-12-06 Thread Badari Pulavarty

On Fri, 2007-12-07 at 00:28 +0530, Balbir Singh wrote:
> Greg KH wrote:
> 
> >> Why release the spinlock here? It's done after the count is incremented.
> >> This patch does not seem correct.
> > 
> > Doh, you are correct, I'll make sure that I fix this up before applying
> > it.
> > 
> > thanks,
> > 
> > greg k-h
> 
> Hi, Greg,
> 
> I ran some tests with the fixed up version of this patch and the system
> fails to come up.
> 
> I see the WARN_ON in lib/kref.c:33 and the system fails to boot beyond
> that point. I have not yet found time to debug it though.


Are you running into the same issue I am getting on my machine? Are you
using the IPR driver?

Thanks,
Badari


e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
ipr: IBM Power RAID SCSI Device Driver version: 2.4.1 (April 24, 2007)
ipr :d0:01.0: Found IOA with IRQ: 119
ipr :d0:01.0: Starting IOA initialization sequence.
ipr :d0:01.0: Adapter firmware version: 020A005E
ipr :d0:01.0: IOA initialized.
scsi0 : IBM 570B Storage Adapter
scsi 0:0:3:0: Direct-Access IBM   H0 HUS103014FL3800  RPQF PQ: 0 ANSI: 4
scsi 0:0:5:0: Direct-Access IBM   H0 HUS103014FL3800  RPQF PQ: 0 ANSI: 4
scsi 0:0:8:0: Direct-Access IBM   H0 HUS103014FL3800  RPQF PQ: 0 ANSI: 4
scsi 0:0:15:0: Enclosure IBM  VSBPD4E2  U4SCSI 7134 PQ: 0 ANSI: 2
[ cut here ]
Badness at lib/kref.c:33
NIP: c02e1254 LR: c02dfbd8 CTR: c02e60f0
REGS: c0003f0db050 TRAP: 0700   Not tainted  (2.6.24-rc4-mm1)
MSR: 80029032   CR: 28002042  XER: 000f
TASK = c0003f0d78d0[1] 'swapper' THREAD: c0003f0d8000 CPU: 0
GPR00:  c0003f0db2d0 c0724098 c0003f131620
GPR04: fff1 fffe 000a 
GPR08: c0003d4d9000 c0003f0cbfe0 c0556591 0073
GPR12: 24002084 c0651980  
GPR16:  d8008008 c064d6f0 c0003d4d9570
GPR20: c0003d4d94b8 0002 c0003d4d9170 c0003d4d9170
GPR24: c0003d4d9000 0001 c0003d570d58 c0003d570d18
GPR28:  c0003d4d9260 c06b5400 c0003f131618
NIP [c02e1254] .kref_get+0x10/0x2c
LR [c02dfbd8] .kobject_get+0x24/0x40
Call Trace:
[c0003f0db2d0] [c0003f0db360] 0xc0003f0db360 (unreliable)
[c0003f0db350] [c02e00e8] .kobject_add+0x8c/0x21c
[c0003f0db3e0] [c0344b00] .device_add+0xd4/0x680
[c0003f0db4a0] [c03a1c4c] .scsi_alloc_target+0x218/0x404
[c0003f0db570] [c03a1fb4] .__scsi_scan_target+0xa8/0x640
[c0003f0db6b0] [c03a25c4] .scsi_scan_channel+0x78/0xdc
[c0003f0db750] [c03a26f8] .scsi_scan_host_selected+0xd0/0x140
[c0003f0db7f0] [c03c3ff4] .ipr_probe+0x1270/0x1348
[c0003f0db960] [c02f4808] .pci_device_probe+0x124/0x194
[c0003f0dba10] [c0347e8c] .driver_probe_device+0x110/0x1f0
[c0003f0dbaa0] [c0348014] .__driver_attach+0xa8/0x134
[c0003f0dbb30] [c03472ac] .bus_for_each_dev+0x80/0xd0
[c0003f0dbbe0] [c0347c14] .driver_attach+0x28/0x40
[c0003f0dbc60] [c0346788] .bus_add_driver+0xfc/0x2d0
[c0003f0dbd10] [c03482cc] .driver_register+0x80/0x9c
[c0003f0dbd90] [c02f4bb0] .__pci_register_driver+0x5c/0xcc
[c0003f0dbe20] [c0604b38] .ipr_init+0x38/0x50
[c0003f0dbea0] [c05d6428] .kernel_init+0x214/0x3ec
[c0003f0dbf90] [c0026734] .kernel_thread+0x4c/0x68
Instruction dump:
e8410028 3921 38210080 7d234b78 e8010010 ebc1fff0 7c0803a6 4e800020
8003 7c0007b4 2f80 409e0008 <0fe0> 7c001828 3001




Re: mm snapshot broken-out-2007-11-06-02-32.tar.gz uploaded

2007-11-06 Thread Badari Pulavarty

On Tue, 2007-11-06 at 02:33 -0800, [EMAIL PROTECTED] wrote:
> The mm snapshot broken-out-2007-11-06-02-32.tar.gz has been uploaded to
> 
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-06-02-32.tar.gz
> 
> It contains the following patches against 2.6.24-rc1:


Getting OOPS on shutdown.

Unable to handle kernel paging request for data at address 0x60040
Faulting instruction address: 0xc0341a68
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c0341a68 LR: c0341ab4 CTR: c02e8904
REGS: c8403910 TRAP: 0300   Not tainted  (2.6.24-rc1-mm1)
MSR: 80009032   CR: 24002444  XER: 2001
DAR: 00060040, DSISR: 4000
TASK = cdff12a0[12966] 'reboot' THREAD: c840 CPU: 3
GPR00: c3131480 c8403b90 c071cb38 c331c000
GPR04: 0001  c07dac20 c000f528
GPR08: c31310a8 0006 c31311a8 c077c760
GPR12: 44002428 c0648c80 0001 
GPR16: 1009ba50  1007 
GPR20:  0001  
GPR24: 0001   4000
GPR28: fee1dead 28121969 c06b2368 c0621bd0
NIP [c0341a68] .device_shutdown+0x4c/0xd4
LR [c0341ab4] .device_shutdown+0x98/0xd4
Call Trace:
[c8403b90] [c0341ab4] .device_shutdown+0x98/0xd4 (unreliable)
[c8403c10] [c0067780] .kernel_restart+0x40/0x98
[c8403c90] [c00679f4] .sys_reboot+0x214/0x25c
[c8403e30] [c000852c] syscall_exit+0x0/0x40
Instruction dump:
6000 6000 e969 e92b0008 3909ff00 7fa95800 e9280108 3be9ff00
419e0080 e9280178 2fa9 419e0014  7d034378 2fa9 409e0020



Re: mm snapshot broken-out-2007-11-06-02-32 - powerpc link failure

2007-11-06 Thread Badari Pulavarty

On Tue, 2007-11-06 at 23:47 +0530, Kamalesh Babulal wrote:
> Hi Andrew,
> 
> The kernel linking fails on the powerpc, with following error message
> 
>   CC  init/version.o
>   LD  init/built-in.o
>   LD  .tmp_vmlinux1
> arch/powerpc/kernel/built-in.o(.toc+0x1550): undefined reference to `devices_subsys'
> make: *** [.tmp_vmlinux1] Error 1

I sent a fix for this 30 minutes ago ..

Here it is anyway.

Thanks,
Badari

---
 arch/powerpc/kernel/vio.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.24-rc1/arch/powerpc/kernel/vio.c
===
--- linux-2.6.24-rc1.orig/arch/powerpc/kernel/vio.c 2007-10-23 20:50:57.0 -0700
+++ linux-2.6.24-rc1/arch/powerpc/kernel/vio.c  2007-11-06 10:31:56.0 -0800
@@ -37,7 +37,7 @@
 #include 
 #include 
 
-extern struct kset devices_subsys; /* needed for vio_find_name() */
+extern struct kset *devices_kset;  /* needed for vio_find_name() */
 
 static struct bus_type vio_bus_type;
 
@@ -369,7 +369,7 @@ static struct vio_dev *vio_find_name(con
 {
struct kobject *found;
 
-   found = kset_find_obj(&devices_subsys, kobj_name);
+   found = kset_find_obj(devices_kset, kobj_name);
if (!found)
return NULL;
 




Re: mm snapshot broken-out-2007-11-06-02-32.tar.gz uploaded

2007-11-06 Thread Badari Pulavarty

On Tue, 2007-11-06 at 02:33 -0800, [EMAIL PROTECTED] wrote:
> The mm snapshot broken-out-2007-11-06-02-32.tar.gz has been uploaded to
> 
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-06-02-32.tar.gz
> 

> gregkh-driver-kset-convert-sys-devices-to-use-kset_create.patch

The above patch renamed devices_subsys to devices_kset to catch all users of
the variable. vio needs a matching fix.


# make -j8 zImage
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
:1389:2: warning: #warning syscall revokeat not implemented
:1393:2: warning: #warning syscall frevoke not implemented
  CHK include/linux/compile.h
  GEN .version
  CHK include/linux/compile.h
  UPD include/linux/compile.h
  CC  init/version.o
  LD  init/built-in.o
  LD  .tmp_vmlinux1
arch/powerpc/kernel/built-in.o(.toc+0x1548): undefined reference to `devices_subsys'
make: *** [.tmp_vmlinux1] Error 1


Here is the patch. Is this the correct usage?

Thanks,
Badari

---
 arch/powerpc/kernel/vio.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.24-rc1/arch/powerpc/kernel/vio.c
===
--- linux-2.6.24-rc1.orig/arch/powerpc/kernel/vio.c 2007-10-23 20:50:57.0 -0700
+++ linux-2.6.24-rc1/arch/powerpc/kernel/vio.c  2007-11-06 10:31:56.0 -0800
@@ -37,7 +37,7 @@
 #include 
 #include 
 
-extern struct kset devices_subsys; /* needed for vio_find_name() */
+extern struct kset *devices_kset;  /* needed for vio_find_name() */
 
 static struct bus_type vio_bus_type;
 
@@ -369,7 +369,7 @@ static struct vio_dev *vio_find_name(con
 {
struct kobject *found;
 
-   found = kset_find_obj(&devices_subsys, kobj_name);
+   found = kset_find_obj(devices_kset, kobj_name);
if (!found)
return NULL;
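
The shape of this fix (an exported object replaced by an exported pointer, so callers drop the "&") can be shown with a small userspace sketch. struct toy_kset, toy_devices, toy_devices_kset and toy_find_obj() are hypothetical stand-ins, not the kernel's kset API.

```c
#include <stddef.h>
#include <string.h>

struct toy_kset {
    const char *names[4];
    size_t count;
};

/* Old style: the object itself was visible to callers. */
static struct toy_kset toy_devices = { { "vio", "pci0" }, 2 };

/* New style: callers only see a pointer to it. */
static struct toy_kset *toy_devices_kset = &toy_devices;

/* Look a name up in the set; returns NULL when it is not present. */
static const char *toy_find_obj(const struct toy_kset *kset, const char *name)
{
    for (size_t i = 0; i < kset->count; i++)
        if (strcmp(kset->names[i], name) == 0)
            return kset->names[i];
    return NULL;
}
```

A caller that used to write toy_find_obj(&toy_devices, ...) now writes toy_find_obj(toy_devices_kset, ...). Both reach the same object, which is all the two-line vio.c change does.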
 




Re: mm snapshot broken-out-2007-11-06-02-32.tar.gz uploaded - build failure - rpadlpar_sysfs

2007-11-06 Thread Badari Pulavarty

On Tue, 2007-11-06 at 21:04 +0530, Kamalesh Babulal wrote:
> Hi Andrew,
> 
> The build fails with following error
>  
>   CC  drivers/pci/hotplug/rpadlpar_sysfs.o
> drivers/pci/hotplug/rpadlpar_sysfs.c:133: error: initializer element is not constant
> drivers/pci/hotplug/rpadlpar_sysfs.c:133: error: (near initialization for `dlpar_io_kset.kobj.parent')
> drivers/pci/hotplug/rpadlpar_sysfs.c:133: error: initializer element is not constant
> drivers/pci/hotplug/rpadlpar_sysfs.c:133: error: (near initialization for `dlpar_io_kset.kobj')
> drivers/pci/hotplug/rpadlpar_sysfs.c:134: error: unknown field `ktype' specified in initializer
> drivers/pci/hotplug/rpadlpar_sysfs.c:134: warning: initialization from incompatible pointer type
> make[3]: *** [drivers/pci/hotplug/rpadlpar_sysfs.o] Error 1
> make[2]: *** [drivers/pci/hotplug] Error 2
> make[1]: *** [drivers/pci] Error 2
> make: *** [drivers] Error 2
> 
> The patch, 
> gregkh-driver-kset-convert-pci-hotplug-to-use-kset_create_and_register.patch
> is causing the build failure.

Here is the fix (against 24-rc1 mm-brokenout). Can you try it ?

Thanks,
Badari

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]> 
---
 drivers/pci/hotplug/rpadlpar_sysfs.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: linux-2.6.24-rc1/drivers/pci/hotplug/rpadlpar_sysfs.c
===
--- linux-2.6.24-rc1.orig/drivers/pci/hotplug/rpadlpar_sysfs.c  2007-11-06 10:09:10.0 -0800
+++ linux-2.6.24-rc1/drivers/pci/hotplug/rpadlpar_sysfs.c   2007-11-06 10:11:33.0 -0800
@@ -129,14 +129,13 @@ struct kobj_type ktype_dlpar_io = {
 };
 
 struct kset dlpar_io_kset = {
-   .kobj = {.ktype = &ktype_dlpar_io,
-.parent = &pci_hotplug_slots_kset->kobj},
-   .ktype = &ktype_dlpar_io,
+   .kobj = {.ktype = &ktype_dlpar_io},
 };
 
 int dlpar_sysfs_init(void)
 {
kobject_set_name(&dlpar_io_kset.kobj, DLPAR_KOBJ_NAME);
+   dlpar_io_kset.kobj.parent = &pci_hotplug_slots_kset->kobj;
if (kset_register(&dlpar_io_kset)) {
printk(KERN_ERR "rpadlpar_io: cannot register kset for %s\n",
kobject_name(&dlpar_io_kset.kobj));
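
The "initializer element is not constant" error comes from plain C rules: a file-scope initializer cannot read the value of a pointer variable, so the parent has to be filled in at run time. A minimal userspace sketch, with hypothetical toy_* names standing in for the kset structures:

```c
#include <stddef.h>

struct toy_kobj {
    struct toy_kobj *parent;
};

struct toy_kset {
    struct toy_kobj kobj;
};

static struct toy_kset toy_slots;

/* Stands in for a kset that is now only reachable through a pointer. */
static struct toy_kset *toy_slots_kset = &toy_slots;

/*
 * This would NOT compile at file scope, because it reads the variable
 * toy_slots_kset inside a static initializer:
 *
 *   static struct toy_kset toy_dlpar = {
 *       .kobj = { .parent = &toy_slots_kset->kobj },
 *   };
 */
static struct toy_kset toy_dlpar;

/* Fill in the parent at init time instead, as the patch does. */
static int toy_dlpar_init(void)
{
    toy_dlpar.kobj.parent = &toy_slots_kset->kobj;
    return 0;
}
```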



Re: mm snapshot broken-out-2007-11-06-02-32.tar.gz uploaded - build fails on powerpc

2007-11-06 Thread Badari Pulavarty

On Tue, 2007-11-06 at 18:27 +0530, Kamalesh Babulal wrote:
> [EMAIL PROTECTED] wrote:
> > powerpc-move-_rtc_time-routines-under-config_adb_cuda.patch
> 
>   CC  net/9p/error.o
> arch/powerpc/platforms/powermac/time.c:168: error: implicit declaration of function ‘from_rtc_time’
> arch/powerpc/platforms/powermac/time.c:225: error: implicit declaration of function ‘to_rtc_time’
> make[2]: *** [arch/powerpc/platforms/powermac/time.o] Error 1
> make[1]: *** [arch/powerpc/platforms/powermac] Error 2
> make: *** [arch/powerpc/platforms] Error 2
> 
> The above patch causes the build failure, because from_rtc_time() and
> to_rtc_time() were moved under the ifdef CONFIG_ADB_CUDA, but they are
> being called in pmu_set_rtc_time() and pmac_get_rtc_time() under
> CONFIG_ADB_PMU.


Yes. My fault. This patch needs to be dropped. Sorry.

Thanks,
Badari
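
For illustration only: a helper shared by two optional features can be guarded on either option, so it is neither unused nor missing. The thread itself settles on dropping the patch, so this is not what was applied. The TOY_CONFIG_* switches and toy_* functions below are hypothetical stand-ins for CONFIG_ADB_CUDA, CONFIG_ADB_PMU and the rtc helpers.

```c
/* Pretend only the PMU option is enabled in this build. */
#define TOY_CONFIG_PMU 1
/* TOY_CONFIG_CUDA is deliberately left undefined. */

#if defined(TOY_CONFIG_CUDA) || defined(TOY_CONFIG_PMU)
/* Shared helper: must be built when either user is built. */
static unsigned long toy_from_rtc(unsigned long hours, unsigned long minutes)
{
    return hours * 3600UL + minutes * 60UL;
}
#endif

#ifdef TOY_CONFIG_CUDA
static unsigned long toy_cuda_time(void)
{
    return toy_from_rtc(1, 0);
}
#endif

#ifdef TOY_CONFIG_PMU
static unsigned long toy_pmu_time(void)
{
    return toy_from_rtc(0, 30);
}
#endif
```

Guarding toy_from_rtc() on TOY_CONFIG_CUDA alone would reproduce the "implicit declaration" failure reported above in a PMU-only build.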


Re: [PATCH] Add IORESOURCE_BUSY flag for System RAM (Re: [Question] How to represent SYSTEM_RAM in kernel/resource.c)

2007-11-01 Thread Badari Pulavarty

On Thu, 2007-11-01 at 18:21 +0900, Yasunori Goto wrote:
> Hello.
> 
> I was asked from Kame-san to write this patch.
> 
> Please apply.
> 
> -
> i386 and x86-64 registers System RAM as IORESOURCE_MEM | IORESOURCE_BUSY.
> 
> But ia64 registers it as IORESOURCE_MEM only.
> In addition, memory hotplug code registers new memory as IORESOURCE_MEM too.
> 
> This patch adds IORESOURCE_BUSY for them to avoid potential overlap mapping
> by PCI device.
> 
> Signed-off-by: Yasunori Goto <[EMAIL PROTECTED]>
> 
> ---
>  arch/ia64/kernel/efi.c |6 ++
>  mm/memory_hotplug.c|2 +-
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> Index: current/arch/ia64/kernel/efi.c
> ===
> --- current.orig/arch/ia64/kernel/efi.c   2007-11-01 15:24:05.0 +0900
> +++ current/arch/ia64/kernel/efi.c2007-11-01 15:24:18.0 +0900
> @@ -,7 +,7 @@ efi_initialize_iomem_resources(struct re
>   if (md->num_pages == 0) /* should not happen */
>   continue;
> 
> - flags = IORESOURCE_MEM;
> + flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>   switch (md->type) {
> 
>   case EFI_MEMORY_MAPPED_IO:
> @@ -1133,12 +1133,11 @@ efi_initialize_iomem_resources(struct re
> 
>   case EFI_ACPI_MEMORY_NVS:
>   name = "ACPI Non-volatile Storage";
> - flags |= IORESOURCE_BUSY;
>   break;
> 
>   case EFI_UNUSABLE_MEMORY:
>   name = "reserved";
> - flags |= IORESOURCE_BUSY | IORESOURCE_DISABLED;
> + flags |= IORESOURCE_DISABLED;
>   break;
> 
>   case EFI_RESERVED_TYPE:
> @@ -1147,7 +1146,6 @@ efi_initialize_iomem_resources(struct re
>   case EFI_ACPI_RECLAIM_MEMORY:
>   default:
>   name = "reserved";
> - flags |= IORESOURCE_BUSY;
>   break;
>   }
> 
> Index: current/mm/memory_hotplug.c
> ===
> --- current.orig/mm/memory_hotplug.c  2007-11-01 15:24:16.0 +0900
> +++ current/mm/memory_hotplug.c   2007-11-01 15:41:27.0 +0900
> @@ -39,7 +39,7 @@ static struct resource *register_memory_
>   res->name = "System RAM";
>   res->start = start;
>   res->end = start + size - 1;
> - res->flags = IORESOURCE_MEM;
> + res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>   if (request_resource(&iomem_resource, res) < 0) {
>   printk("System RAM resource %llx - %llx cannot be added\n",
>   (unsigned long long)res->start, (unsigned long long)res->end);
> 


Not quite. You need the following patch on top of this to make
hotplug memory remove work on ia64/x86-64.

Thanks,
Badari

Once you mark a memory resource BUSY, walk_memory_resource() won't be able
to find it, because it still looks for resources flagged IORESOURCE_MEM only.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 kernel/resource.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.24-rc1/kernel/resource.c
===
--- linux-2.6.24-rc1.orig/kernel/resource.c 2007-10-23 20:50:57.0 -0700
+++ linux-2.6.24-rc1/kernel/resource.c  2007-11-01 08:19:59.0 -0700
@@ -277,7 +277,7 @@ walk_memory_resource(unsigned long start
int ret = -1;
res.start = (u64) start_pfn << PAGE_SHIFT;
res.end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
-   res.flags = IORESOURCE_MEM;
+   res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
orig_end = res.end;
while ((res.start < res.end) && (find_next_system_ram(&res) >= 0)) {
pfn = (unsigned long)(res.start >> PAGE_SHIFT);
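
Why the one-line change matters can be seen in a toy lookup that, by assumption, compares flags for exact equality (which is what the change above implies the real walker does). TOY_MEM, TOY_BUSY, struct toy_res and toy_find_ram() are hypothetical names for this sketch.

```c
#include <stddef.h>

#define TOY_MEM  0x1u
#define TOY_BUSY 0x2u

struct toy_res {
    unsigned long start;
    unsigned long end;
    unsigned int flags;
};

/*
 * Return the first entry overlapping [start, end] whose flags equal
 * 'want' exactly, or NULL if there is none.
 */
static const struct toy_res *toy_find_ram(const struct toy_res *tbl, size_t n,
                                          unsigned long start,
                                          unsigned long end,
                                          unsigned int want)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].flags != want)
            continue;
        if (tbl[i].start > end || tbl[i].end < start)
            continue;
        return &tbl[i];
    }
    return NULL;
}
```

Once the table entry carries TOY_MEM | TOY_BUSY, a search for TOY_MEM alone finds nothing, so the searcher has to ask for both flags as well.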



[PATCH] vortex_up should initialize "err"

2007-10-17 Thread Badari Pulavarty

Simple compile warning fix. (against 2.6.23-git12)

Thanks,
Badari

vortex_up() should initialize 'err' for a successful return.

drivers/net/3c59x.c: In function `vortex_up':
drivers/net/3c59x.c:1494: warning: `err' might be used uninitialized in this function


Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 drivers/net/3c59x.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23/drivers/net/3c59x.c
===
--- linux-2.6.23.orig/drivers/net/3c59x.c   2007-10-17 15:33:07.0 -0700
+++ linux-2.6.23/drivers/net/3c59x.c2007-10-17 16:07:10.0 -0700
@@ -1491,7 +1491,7 @@ vortex_up(struct net_device *dev)
struct vortex_private *vp = netdev_priv(dev);
void __iomem *ioaddr = vp->ioaddr;
unsigned int config;
-   int i, mii_reg1, mii_reg5, err;
+   int i, mii_reg1, mii_reg5, err = 0;
 
if (VORTEX_PCI(vp)) {
pci_set_power_state(VORTEX_PCI(vp), PCI_D0);/* Go active */
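
The warning pattern is easy to reproduce in isolation: a return value that is assigned on only one branch must be initialized for the other. A minimal sketch, assuming a simplified control flow; toy_step() and toy_up() are hypothetical and do not mirror the driver's real logic.

```c
/* Hypothetical fallible step, standing in for a PCI enable call. */
static int toy_step(int fail)
{
    return fail ? -5 : 0;
}

static int toy_up(int is_pci, int fail)
{
    int err = 0;    /* without "= 0" the non-PCI path returns garbage */

    if (is_pci) {
        err = toy_step(fail);
        if (err)
            goto out;
    }
    /* ... remaining setup that cannot fail ... */
out:
    return err;
}
```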



[PATCH] Move _rtc_time() routines under CONFIG_ADB_CUDA

2007-10-17 Thread Badari Pulavarty

Fix to clean up compile warnings (against 2.6.23-git12)

Thanks,
Badari

to_rtc_time() and from_rtc_time() seem to be used only if CONFIG_ADB_CUDA
is defined. Move them under that ifdef.

arch/powerpc/platforms/powermac/time.c:88: warning: `to_rtc_time' defined but not used
arch/powerpc/platforms/powermac/time.c:95: warning: `from_rtc_time' defined but not used

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 arch/powerpc/platforms/powermac/time.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23/arch/powerpc/platforms/powermac/time.c
===
--- linux-2.6.23.orig/arch/powerpc/platforms/powermac/time.c 2007-10-09 13:31:38.0 -0700
+++ linux-2.6.23/arch/powerpc/platforms/powermac/time.c 2007-10-17 15:59:56.0 -0700
@@ -84,6 +84,7 @@ long __init pmac_time_init(void)
return delta;
 }
 
+#ifdef CONFIG_ADB_CUDA
 static void to_rtc_time(unsigned long now, struct rtc_time *tm)
 {
to_tm(now, tm);
@@ -97,7 +98,6 @@ static unsigned long from_rtc_time(struc
  tm->tm_hour, tm->tm_min, tm->tm_sec);
 }
 
-#ifdef CONFIG_ADB_CUDA
 static unsigned long cuda_get_time(void)
 {
struct adb_request req;



[PATCH] ip_frag_reasm() should set "err" in case of skb_clone() failure

2007-10-17 Thread Badari Pulavarty

Simple error handling fix (against 2.6.23-git12).

Thanks,
Badari

Need to initialize "err" in case of skb_clone() failure.

net/ipv4/ip_fragment.c: In function `ip_defrag':
net/ipv4/ip_fragment.c:540: warning: `err' might be used uninitialized in this function
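
The idiom behind this fix can be shown outside the kernel. The following is
only a user-space sketch, not the ip_fragment.c code: clone_buf() is a
hypothetical stand-in for skb_clone(), and reasm_sketch() is an invented name.
It shows why "err" has to be assigned before an allocation whose failure
jumps to a shared error label.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for skb_clone(): returns NULL when asked to fail. */
static char *clone_buf(const char *src, int simulate_failure)
{
	char *copy;

	if (simulate_failure)
		return NULL;
	copy = malloc(strlen(src) + 1);
	if (copy)
		strcpy(copy, src);
	return copy;
}

/* Same shape as the patch: set err first, then try the allocation. */
int reasm_sketch(const char *head, int simulate_failure)
{
	int err;
	char *fp;

	err = -ENOMEM;		/* without this, out_fail returns garbage */
	fp = clone_buf(head, simulate_failure);
	if (!fp)
		goto out_fail;

	free(fp);
	return 0;

out_fail:
	return err;
}
```

Calling reasm_sketch() with simulate_failure set returns -ENOMEM; without the
early assignment the value returned on that path would be indeterminate,
which is what the compiler warning is pointing at.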

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 net/ipv4/ip_fragment.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.23/net/ipv4/ip_fragment.c
===
--- linux-2.6.23.orig/net/ipv4/ip_fragment.c    2007-10-17 15:33:27.0 -0700
+++ linux-2.6.23/net/ipv4/ip_fragment.c 2007-10-17 15:50:51.0 -0700
@@ -544,6 +544,7 @@ static int ip_frag_reasm(struct ipq *qp,
/* Make the one we just received the head. */
if (prev) {
head = prev->next;
+   err = -ENOMEM;
fp = skb_clone(head, GFP_ATOMIC);
 
if (!fp)




Re: 2.6.23-git11 compile issues

2007-10-17 Thread Badari Pulavarty

On Wed, 2007-10-17 at 18:38 +0300, Ismail Dönmez wrote:
> On Wednesday 17 October 2007 at 18:33:18, it was written:
> > Known issue ?
> >
> >
> > Thanks,
> > Badari
> >
> >   CHK include/linux/version.h
> >   CHK include/linux/utsrelease.h
> >   CC  arch/x86/kernel/asm-offsets.s
> > In file included from arch/x86/kernel/asm-offsets_64.c:7,
> >  from arch/x86/kernel/asm-offsets.c:4:
> > include/linux/crypto.h:20:24: error: asm/atomic.h: No such file or directory
> > In file included from include/linux/types.h:14,
> 
> Please try running make mrproper.

That fixed it. Thank you.

Thanks,
Badari


Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c

2007-10-05 Thread Badari Pulavarty

On Fri, 2007-10-05 at 15:41 +0200, Valerie Clement wrote:
> Badari Pulavarty wrote:
> > On Thu, 2007-10-04 at 18:13 +0200, Valerie Clement wrote:
> >> While running ffsb tests on my ext4 filesystem, I got an Oops in 
> >> cache_alloc_refill().
> >> I turned on SLAB debugging and here is the message I got:
> >>
> >> slab: Internal list corruption detected in cache 'buffer_head'(30), 
> >> slabp 81007e100100(1515870810). Hexdump:
> > 
> > slabp->inuse = 1515870810 looks bogus. Is this easily reproducible ?
> 
> Hi Badari,
> Thanks for your answer.
> I didn't reproduce it without the latest ext4 patches. So I suspect a 
> bug in one of them.
> But how debugging this?
> Which other debug traces can I turn on?

Let me make sure I understand: you applied the latest ext4 patchset? If so,
Mingming has some slab-cleanup changes in that patchset. You can try backing
them out and see whether the problem goes away.

Thanks,
Badari


Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c

2007-10-04 Thread Badari Pulavarty

On Thu, 2007-10-04 at 18:13 +0200, Valerie Clement wrote:
> While running ffsb tests on my ext4 filesystem, I got an Oops in 
> cache_alloc_refill().
> I turned on SLAB debugging and here is the message I got:
> 
> slab: Internal list corruption detected in cache 'buffer_head'(30), 
> slabp 81007e100100(1515870810). Hexdump:

slabp->inuse = 1515870810 looks bogus. Is this easily reproducible ?
What tests are you running through ffsb ?

> 000: 5a 5a 5a 5a 5a 5a 5a 5a b8 23 34 7e 00 81 ff ff
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 020: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 030: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 040: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 050: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 060: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a a5
> 070: c0 88 56 63 c5 56 41 d8 f1 37 4a 80 ff ff ff ff
> 080: c0 88 56 63 c5 56 41 d8 80 33 53 7d 00 81 ff ff
> 090: e8 25 60 7d 00 81 ff ff 68 cb 3b 01 00 81 ff ff
> 0a0: 18 68 50 7d 00 81 ff ff
> [ cut here ]
> kernel BUG at /home/clementv/src/linux-2.6.23-rc9/mm/slab.c:2923!
> invalid opcode:  [1] SMP
> CPU 2
> Modules linked in: qla2xxx
> Pid: 4041, comm: ffsb Not tainted 2.6.23-rc9 #2
> RIP: 0010:[]  [] check_slabp+0xb5/0xc1
> RSP: 0018:8100774bb958  EFLAGS: 00010096
> RAX: 0001 RBX: 81007e100100 RCX: 6d20
> RDX:  RSI: 0046 RDI: 81007e347280
> RBP: 00a8 R08: 0005 R09: 8060bb10
> R10: 000ae468 R11: 00050002 R12: 00a8
> R13: 81007e347280 R14: 81007e347280 R15: 0002
> FS:  41802950(0063) GS:81007e0c4728() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 5f83d00c CR3: 78149000 CR4: 06e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process ffsb (pid: 4041, threadinfo 8100774ba000, task 81007dbdc7a0)
> Stack:  000d 000e 81007e100100 81007e342398
>   81007e078488 80277069 8050 81007e347280
>   8050 0246 80299539 f000
> Call Trace:
>   [] cache_alloc_refill+0xc8/0x23f
>   [] alloc_buffer_head+0x14/0x45
>   [] kmem_cache_alloc+0x94/0xe9
>   [] alloc_buffer_head+0x14/0x45
>   [] alloc_page_buffers+0x38/0xd5
>   [] create_empty_buffers+0x14/0x9b
>   [] __block_prepare_write+0x7c/0x45b
>   [] ext4_get_block+0x0/0x139
>   [] block_prepare_write+0x1a/0x25
>   [] ext4_prepare_write+0xaf/0x175
>   [] generic_file_buffered_write+0x288/0x631
>   [] __generic_file_aio_write_nolock+0x33f/0x3a9
>   [] enqueue_entity+0x17c/0x1a3
>   [] generic_file_aio_write+0x61/0xc1
>   [] __check_preempt_curr_fair+0x56/0x76
>   [] ext4_file_write+0x16/0x91
>   [] do_sync_write+0xc9/0x10c
>   [] file_move+0x1d/0x4c
>   [] autoremove_wake_function+0x0/0x2e
>   [] do_filp_open+0x2a/0x38
>   [] poison_obj+0x26/0x30
>   [] vfs_write+0xad/0x136
>   [] sys_write+0x45/0x6e
>   [] system_call+0x7e/0x83
> 
> 
> Valérie



Re: kernel Oops in ext3 code

2007-09-28 Thread Badari Pulavarty

On Fri, 2007-09-28 at 06:54 +0200, Norbert Preining wrote:
> Hi Mingming,
> 
> On Do, 27 Sep 2007, Mingming Cao wrote:
> > Could you please sent the objdump of the ext4_discard_reservation
> > function? It doesn't match what I see here.
> 
> I assume you meant ext3_ I made
>   objdump -x -D -s super.o
> (the only place where I found this function in the source code). If you
> want something else, let me know, but a bit more specific. Can I do the
> objdump directly from the kernel image file?
> 

objdump -DlS balloc.o 

would give us ext3_discard_reservation()

Thanks,
Badari


Re: 2.6.23-rc8-mm1

2007-09-25 Thread Badari Pulavarty

Hi Andy,

One of the patches you created in -mm is causing a compile warning.
Here is the fix. Please verify.

Thanks,
Badari

arch/powerpc/mm/init_64.c: In function `vmemmap_populated':
arch/powerpc/mm/init_64.c:211: warning: passing arg 1 of `vmemmap_section_start' makes pointer from integer without a cast

vmemmap_section_start() gets called with an argument which is unsigned long.
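
The arithmetic in the function is easier to follow with numbers plugged in.
The sketch below is a user-space model only: the struct size and section size
are made-up values, and section_start_pfn() is an invented name, not the
kernel function. It takes the address as an unsigned long, as the patched
prototype does, and avoids pointer subtraction on a possibly unaligned
address.

```c
#include <assert.h>

/* Assumed values, for illustration only; real ones are arch/config specific. */
#define SKETCH_PAGE_STRUCT_SIZE		64UL
#define SKETCH_PAGES_PER_SECTION	4096UL
#define SKETCH_SECTION_MASK		(~(SKETCH_PAGES_PER_SECTION - 1))

/* Return the pfn of the start of the section that page_addr falls in. */
unsigned long section_start_pfn(unsigned long page_addr,
				unsigned long vmemmap_base)
{
	unsigned long offset;

	assert(page_addr >= vmemmap_base);
	offset = page_addr - vmemmap_base;

	/* Integer division tolerates an address that is not struct-aligned. */
	return (offset / SKETCH_PAGE_STRUCT_SIZE) & SKETCH_SECTION_MASK;
}
```

With these assumed constants, an address 8 bytes into entry 5000 divides down
to pfn 5000, which the mask rounds down to 4096, the first pfn of its section.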

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>

Index: linux-2.6.23-rc8/arch/powerpc/mm/init_64.c
===
--- linux-2.6.23-rc8.orig/arch/powerpc/mm/init_64.c 2007-09-25 09:18:13.0 -0700
+++ linux-2.6.23-rc8/arch/powerpc/mm/init_64.c  2007-09-25 14:50:44.0 -0700
@@ -189,10 +189,9 @@ void pgtable_cache_init(void)
  * do this by hand as the proffered address may not be correctly aligned.
  * Subtraction of non-aligned pointers produces undefined results.
  */
-unsigned long __meminit vmemmap_section_start(struct page *page)
+unsigned long __meminit vmemmap_section_start(unsigned long page)
 {
-   unsigned long offset = ((unsigned long)page) -
-   ((unsigned long)(vmemmap));
+   unsigned long offset = page - ((unsigned long)(vmemmap));
 
/* Return the pfn of the start of the section. */
return (offset / sizeof(struct page)) & PAGE_SECTION_MASK;



Re: 2.6.23-rc8-mm1 - powerpc memory hotplug link failure

2007-09-25 Thread Badari Pulavarty

On Wed, 2007-09-26 at 01:30 +0530, Kamalesh Babulal wrote:
> Hi Andrew,
> 
> The 2.6.23-rc8-mm1 kernel linking fails on the powerpc (P5+) box
> 
>   CC  init/version.o
>   LD  init/built-in.o
>   LD  .tmp_vmlinux1
> drivers/built-in.o: In function `memory_block_action':
> /root/scrap/linux-2.6.23-rc8/drivers/base/memory.c:188: undefined reference 
> to `.remove_memory'
> make: *** [.tmp_vmlinux1] Error 1
> 

I ran into the same thing earlier. Here is the fix I made.

Thanks,
Badari

Memory hotplug remove is currently supported only on IA64

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>

Index: linux-2.6.23-rc8/mm/Kconfig
===
--- linux-2.6.23-rc8.orig/mm/Kconfig2007-09-25 14:44:03.0 -0700
+++ linux-2.6.23-rc8/mm/Kconfig 2007-09-25 14:44:48.0 -0700
@@ -143,6 +143,7 @@ config MEMORY_HOTREMOVE
bool "Allow for memory hot remove"
depends on MEMORY_HOTPLUG
depends on MIGRATION
+   depends on (IA64)
 
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address



Re: Linux 2.6.23-rc8 (build failure from -rc1)

2007-09-25 Thread Badari Pulavarty

On Tue, 2007-09-25 at 10:02 -0700, Andrew Morton wrote:
> On Tue, 25 Sep 2007 09:57:01 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, 2007-09-24 at 17:46 -0700, Linus Torvalds wrote:
> > > Ok, I think I'm getting close to releasing a real 2.6.23. Things seem to 
> > > have calmed down, and I think Thomas Gleixner may have found the 
> > > suspend/resume regression that has dogged us for a while, so I'm feeling 
> > > happy about things.
> > > 
> > > Of course, me feeling happy is usually immediately followed by some nasty 
> > > person finding new problems, but I'll just ignore that and enjoy the 
> > > feeling anyway, however fleeting it may be.
> > 
> > I don't want to be the "nasty" person, but one of my machines doesn't
> > like 2.6.23-rc8. Infact, 2.6.23-rc1 was the first kernel this is broken.
> > (I didn't get hands on this machine till now).
> > 
> > Since my other x86-64 machines are doing fine, I am going to blame it
> > on my machine specific config :)
> > 
> > Thanks,
> > Badari
> > 
> > 
> > elm3a242:/usr/src/linux-2.6.23-rc8 # make -j4 bzImage
> >   CHK include/linux/version.h
> >   CHK include/linux/utsrelease.h
> >   CALL    scripts/checksyscalls.sh
> >   CHK include/linux/compile.h
> >   SYSCALL arch/x86_64/vdso/vdso.so
> > /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: section .text [ff700500 -> ff700797] overlaps section .dynstr [ff7004b8 -> ff700510]
> > collect2: ld returned 1 exit status
> > make[1]: *** [arch/x86_64/vdso/vdso.so] Error 1
> > make: *** [arch/x86_64/vdso] Error 2
> > make: *** Waiting for unfinished jobs
> > 
> 
> 
> 
> box:/usr/src/25> grep vdso series  
> x86_64-mm-vdso-text-offset.patch
> x86_64-mm-vdso-compat-install-unstripped-copies-on-disk.patch
> x86_64-mm-vdso-64bit-install-unstripped-copies-on-disk.patch
> fix-discrepancy-between-vdso-based-gettimeofday-and-sys_gettimeofday.patch
> 
> 
> x86_64-mm-vdso-text-offset.patch looks likely - can you test it please?
> 
> 
> 
> 
> Increase VDSO_TEXT_OFFSET for ancient binutils
> 
> For some reason old binutils generate larger headers, so
> increase the text offset of the vdso to avoid linker errors.
> 
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
> 
> ---
>  arch/x86_64/vdso/voffset.h |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux/arch/x86_64/vdso/voffset.h
> ===
> --- linux.orig/arch/x86_64/vdso/voffset.h
> +++ linux/arch/x86_64/vdso/voffset.h
> @@ -1 +1 @@
> -#define VDSO_TEXT_OFFSET 0x500
> +#define VDSO_TEXT_OFFSET 0x600
> 

Yep. That's what I did earlier to fix my build.

Thanks,
Badari


Re: Linux 2.6.23-rc8 (build failure from -rc1)

2007-09-25 Thread Badari Pulavarty

On Mon, 2007-09-24 at 17:46 -0700, Linus Torvalds wrote:
> Ok, I think I'm getting close to releasing a real 2.6.23. Things seem to 
> have calmed down, and I think Thomas Gleixner may have found the 
> suspend/resume regression that has dogged us for a while, so I'm feeling 
> happy about things.
> 
> Of course, me feeling happy is usually immediately followed by some nasty 
> person finding new problems, but I'll just ignore that and enjoy the 
> feeling anyway, however fleeting it may be.

I don't want to be the "nasty" person, but one of my machines doesn't
like 2.6.23-rc8. In fact, 2.6.23-rc1 was the first kernel where this broke.
(I didn't get my hands on this machine until now.)

Since my other x86-64 machines are doing fine, I am going to blame it
on my machine specific config :)

Thanks,
Badari


elm3a242:/usr/src/linux-2.6.23-rc8 # make -j4 bzImage
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK include/linux/compile.h
  SYSCALL arch/x86_64/vdso/vdso.so
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld: section .text [ff700500 -> ff700797] overlaps section .dynstr [ff7004b8 -> ff700510]
collect2: ld returned 1 exit status
make[1]: *** [arch/x86_64/vdso/vdso.so] Error 1
make: *** [arch/x86_64/vdso] Error 2
make: *** Waiting for unfinished jobs



Re: 2.6.23-rc6: hanging ext3 dbench tests

2007-09-24 Thread Badari Pulavarty

On Mon, 2007-09-24 at 13:04 -0700, Linus Torvalds wrote:
> 
> On Mon, 24 Sep 2007, Badari Pulavarty wrote:
> > 
> > Whats happening on my machine is ..
> > 
> > dbench forks of 4 children and sends them a signal to start the work.
> > 3 out of 4 children gets the signal and does the work. One of the child
> > never gets the signal so, it waits forever in pause(). So, parent waits
> > for a longtime to kill it.
> 
> Since this *seems* to have nothing to do with the filesystem, and since it 
> *seems* to have been introduced between -rc3 and -rc4, I did
> 
>   gitk v2.6.23-rc3..v2.6.23-rc4 -- kernel/
> 
> to see what has changed. One of the commits was signal-related, and that 
> one doesn't look like it could possibly matter.
> 
> The rest were scheduler-related, which doesn't surprise me. In fact, even 
> before I looked, my reaction to your bug report was "That sounds like an 
> application race condition".
> 
> Applications shouldn't use "pause()" for waiting for a signal. It's a 
> fundamentally racy interface - the signal could have happened just 
> *before* calling pause. So it's almost always a bug to use pause(), and 
> any users should be fixed to use "sigsuspend()" instead, which can 
> atomically (and correctly) pause for a signal while the process has masked 
> it outside of the system call.
> 
> Now, I took a look at the dbench sources, and I have to say that the race 
> looks *very* unlikely (there's quite a small window in which it does
> 
>   children[i].status = getpid();
>   ** race window here **
>   pause();
> 
> and it would require *just* the right timing so that the parent doesn't 
> end up doing the "sleep(1)" (which would make the window even less likely 
> to be hit), but there does seem to be a race condition there. And it 
> *could* be that you just happen to hit it on your hw setup.
> 
> So before you do anything else, does this patch (TOTALLY UNTESTED! DONE 
> ENTIRELY LOOKING AT THE SOURCE! IT MAY RAPE ALL YOUR PETS, AND CALL YOU 
> BAD NAMES!) make any difference?
> 
> (patch against unmodified dbench-2.0)
> 
>   Linus
> 
> ---
> diff --git a/dbench.c b/dbench.c
> index ccf5624..4be5712 100644
> --- a/dbench.c
> +++ b/dbench.c
> @@ -91,10 +91,15 @@ static double create_procs(int nprocs, void (*fn)(struct child_struct * ))
>  
>   for (i=0;i<nprocs;i++) {
>   if (fork() == 0) {
> + sigset_t old, blocked;
> +
> + sigemptyset(&blocked);
> + sigaddset(&blocked, SIGCONT);
> + sigprocmask(SIG_BLOCK, &blocked, &old);
>   setbuffer(stdout, NULL, 0);
>   nb_setup(&children[i]);
>   children[i].status = getpid();
> - pause();
> + sigsuspend(&old);
>   fn(&children[i]);
>   _exit(0);
>   }

With the modified dbench, I couldn't reproduce the problem so far.
I will let it run through the night (just to be sure). 

For now, we can treat it as a tool/App issue :)
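
For reference, the block-then-sigsuspend() pattern from the patch above can be
tried in a single process. This is only a sketch under assumptions: it uses a
caller-chosen signal with a handler installed (SIGUSR1 in the example call)
rather than dbench's SIGCONT, raise() stands in for the parent sending the
signal inside the race window, and wait_for_start()/on_start() are invented
names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t start_seen;

static void on_start(int signo)
{
	(void)signo;
	start_seen = 1;
}

/* Returns 1 once the start signal has been handled, -1 on setup error. */
int wait_for_start(int signo)
{
	struct sigaction sa;
	sigset_t blocked, old, wait_mask;

	start_seen = 0;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_start;
	sigemptyset(&sa.sa_mask);
	if (sigaction(signo, &sa, NULL) != 0)
		return -1;

	/* Block the signal first so an early arrival stays pending. */
	sigemptyset(&blocked);
	sigaddset(&blocked, signo);
	if (sigprocmask(SIG_BLOCK, &blocked, &old) != 0)
		return -1;

	/* Stand-in for the parent signalling before the child waits. */
	raise(signo);

	/* Atomically unblock and wait; the pending signal is delivered here. */
	wait_mask = old;
	sigdelset(&wait_mask, signo);
	while (!start_seen)
		sigsuspend(&wait_mask);

	sigprocmask(SIG_SETMASK, &old, NULL);
	return 1;
}
```

wait_for_start(SIGUSR1) returns even though the signal was raised before the
wait began. With a plain pause() and no blocking, the handler would already
have run by the time pause() was entered, and the call would sleep until some
other signal arrived.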

Thanks,
Badari


Re: 2.6.23-rc6: hanging ext3 dbench tests

2007-09-24 Thread Badari Pulavarty

On Mon, 2007-09-24 at 13:04 -0700, Linus Torvalds wrote:
> 
> On Mon, 24 Sep 2007, Badari Pulavarty wrote:
> > 
> > Whats happening on my machine is ..
> > 
> > dbench forks of 4 children and sends them a signal to start the work.
> > 3 out of 4 children gets the signal and does the work. One of the child
> > never gets the signal so, it waits forever in pause(). So, parent waits
> > for a longtime to kill it.
> 
> Since this *seems* to have nothing to do with the filesystem, and since it 
> *seems* to have been introduced between -rc3 and -rc4, I did
> 
>   gitk v2.6.23-rc3..v2.6.23-rc4 -- kernel/

I was wrong. I managed to reproduce it on 2.6.23-rc3, but it took a long
time. I never reproduced it on 2.6.22, though; I ran the test for a day.

> 
> to see what has changed. One of the commits was signal-related, and that 
> one doesn't look like it could possibly matter.
> 
> The rest were scheduler-related, which doesn't surprise me. In fact, even 
> before I looked, my reaction to your bug report was "That sounds like an 
> application race condition".
> 
> Applications shouldn't use "pause()" for waiting for a signal. It's a 
> fundamentally racy interface - the signal could have happened just 
> *before* calling pause. So it's almost always a bug to use pause(), and 
> any users should be fixed to use "sigsuspend()" instead, which can 
> atomically (and correctly) pause for a signal while the process has masked 
> it outside of the system call.
> 
> Now, I took a look at the dbench sources, and I have to say that the race 
> looks *very* unlikely (there's quite a small window in which it does
> 
>   children[i].status = getpid();
>   ** race window here **
>   pause();
> 
> and it would require *just* the right timing so that the parent doesn't 
> end up doing the "sleep(1)" (which would make the window even less likely 
> to be hit), but there does seem to be a race condition there. And it 
> *could* be that you just happen to hit it on your hw setup.
> 
> So before you do anything else, does this patch (TOTALLY UNTESTED! DONE 
> ENTIRELY LOOKING AT THE SOURCE! IT MAY RAPE ALL YOUR PETS, AND CALL YOU 
> BAD NAMES!) make any difference?
> 
> (patch against unmodified dbench-2.0)

I am testing the updated version of dbench now. Normally, it takes
30min-1hour to reproduce the problem (when I do infinite "dbench 4").
I will post the results soon.

Thanks,
Badari


Re: 2.6.23-rc6: hanging ext3 dbench tests

2007-09-24 Thread Badari Pulavarty

Hi Andy,

I managed to reproduce the dbench problem. (I'm not sure whether it's the
same thing, but the symptoms are the same.) My problem has nothing to do
with ext3; I can reproduce it on ext2 and jfs as well.

Here is what's happening on my machine:

dbench forks off 4 children and sends them a signal to start the work.
3 out of the 4 children get the signal and do the work. One child never
gets the signal, so it waits forever in pause(). The parent then waits
a long time before killing it.

BTW, I was trying to find out when this problem started showing up.
So far, I have tracked it to 2.6.23-rc4 (2.6.23-rc3 doesn't seem
to have this problem). I am going to bisect to find out which
patch caused this.

I am using dbench-2.0 which consistently reproduces the problem on
my x86-64 box. Did you find anything new with your setup ?

Thanks,
Badari




Re: [PATCH] ext4: FLEX_BG Kernel support v2.

2007-09-21 Thread Badari Pulavarty

On Fri, 2007-09-21 at 09:06 -0500, Jose R. Santos wrote:
> From: Jose R. Santos <[EMAIL PROTECTED]>
> 
> ext4: FLEX_BG Kernel support v2.
> 

> @@ -702,13 +702,15 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
>  #define EXT4_FEATURE_INCOMPAT_META_BG    0x0010
>  #define EXT4_FEATURE_INCOMPAT_EXTENTS    0x0040 /* extents support */
>  #define EXT4_FEATURE_INCOMPAT_64BIT      0x0080
> +#define EXT4_FEATURE_INCOMPAT_FLEX_BG    0x0200

Any reason why 0x100 is skipped ?

Thanks,
Badari


Re: 2.6.23-rc6-mm1 panic (memory controller issue ?)

2007-09-18 Thread Badari Pulavarty

On Tue, 2007-09-18 at 15:21 -0700, Badari Pulavarty wrote:
> Hi Balbir,
> 
> I get following panic from SLUB, while doing simple fsx tests.
> I haven't used any container/memory controller stuff except 
> that I configured them in :(
> 
> Looks like slub doesn't like one of the flags passed in ?
> 
> Known issue ? Ideas ?
> 

I think I found the issue. I am still running tests to
verify. Does this sound correct?

Thanks,
Badari

Need to strip __GFP_HIGHMEM flag while passing to mem_container_cache_charge().
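
The masking itself is ordinary bit arithmetic. A small user-space sketch,
with made-up flag values (the SKETCH_* constants and sketch_charge_mask() are
illustrative names, not the kernel's definitions), shows that clearing one
flag leaves the rest of the mask untouched:

```c
#include <assert.h>

typedef unsigned int sketch_gfp_t;

/* Illustrative bit values only; not taken from the kernel headers. */
#define SKETCH_GFP_HIGHMEM	0x02u
#define SKETCH_GFP_WAIT		0x10u
#define SKETCH_GFP_IO		0x40u

/* Mask handed to the charge path: caller's mask minus the highmem bit. */
sketch_gfp_t sketch_charge_mask(sketch_gfp_t gfp_mask)
{
	return gfp_mask & ~SKETCH_GFP_HIGHMEM;
}
```

A mask that never had the bit set passes through unchanged, so the extra
masking costs nothing for callers that were already safe.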

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
 mm/filemap.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc6/mm/filemap.c
===
--- linux-2.6.23-rc6.orig/mm/filemap.c  2007-09-18 12:43:54.0 -0700
+++ linux-2.6.23-rc6/mm/filemap.c   2007-09-18 19:14:44.0 -0700
@@ -441,7 +441,8 @@ int filemap_write_and_wait_range(struct 
 int add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t offset, gfp_t gfp_mask)
 {
-   int error = mem_container_cache_charge(page, current->mm, gfp_mask);
+   int error = mem_container_cache_charge(page, current->mm,
+   gfp_mask & ~__GFP_HIGHMEM);
if (error)
goto out;
 



2.6.23-rc6-mm1 panic (memory controller issue ?)

2007-09-18 Thread Badari Pulavarty

Hi Balbir,

I get the following panic from SLUB while running simple fsx tests.
I haven't used any container/memory controller features; I have
only configured them in :(

Looks like slub doesn't like one of the flags passed in ?

Known issue ? Ideas ?

Thanks,
Badari

CONFIG_CONTAINERS=y
CONFIG_CONTAINER_DEBUG=y
CONFIG_CONTAINER_NS=y
CONFIG_CONTAINER_CPUACCT=y
CONFIG_CONTAINER_MEM_CONT=y
CONFIG_ACPI_CONTAINER=m


elm3b29 login: [ cut here ]
kernel BUG at mm/slub.c:1093!
invalid opcode:  [1] SMP
last sysfs file: /power/state
CPU 3
Modules linked in:
Pid: 3885, comm: fsx-linux Not tainted 2.6.23-rc6-mm1 #2
RIP: 0010:[]  [] new_slab+0x238/0x260
RSP: 0018:81010140faf8  EFLAGS: 00010202
RAX: 0305 RBX:  RCX: 
RDX:  RSI: 001280d2 RDI: 806f3240
RBP: 81010140fb28 R08: 0040 R09: 
R10: 000160c9 R11: 0002 R12: 8101c00146c0
R13: 806f3240 R14: 806f3240 R15: 
FS:  7f7668f546d0() GS:8101c0729400() knlGS:55749930
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f7668f6c000 CR3: 0001821c1000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process fsx-linux (pid: 3885, threadinfo 81010140e000, task 81010158aca0)
last branch before last exception/interrupt
 from  [] new_slab+0x1c/0x260
 to  [] new_slab+0x238/0x260
Stack:  8101c0729290  8101c00146c0
8101df8618c0
 806f3240  81010140fb78 8029a620
 001200d2 8029eb10 001280d20002 
Call Trace:
 [] __slab_alloc+0x1a0/0x450
 [] mem_container_charge+0x90/0x2a0
 [] kmem_cache_alloc+0x7b/0xa0
 [] mem_container_charge+0x90/0x2a0
 [] __alloc_pages+0x6e/0x360
 [] mem_container_cache_charge+0x2b/0x40
 [] add_to_page_cache+0x3e/0x120
 [] add_to_page_cache_lru+0x19/0x40
 [] find_or_create_page+0x5c/0xa0
 [] ext3_truncate+0x342/0x990
 [] mem_container_charge+0x42/0x2a0
 [] unlock_page+0x2d/0x40
 [] __do_fault+0x10f/0x3f0
 [] __dec_zone_page_state+0x25/0x30
 [] page_remove_rmap+0x46/0x140
 [] vmtruncate+0xb0/0x110
 [] inode_setattr+0x30/0x180
 [] ext3_setattr+0x12c/0x240
 [] notify_change+0x380/0x3e0
 [] do_truncate+0x63/0x90
 [] generic_file_llseek+0x61/0xc0
 [] sys_ftruncate+0xd6/0x120
 [] system_call+0x7e/0x83


Code: 0f 0b eb fe 66 66 66 90 41 8b 4d 14 ba 00 10 00 00 be 5a 00
RIP  [] new_slab+0x238/0x260
 RSP 



Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Badari Pulavarty

On Mon, 2007-09-17 at 12:29 -0700, Mingming Cao wrote:
> On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
> > jbd/jbd2: Replace slab allocations with page cache allocations
> > 
> > From: Christoph Lameter <[EMAIL PROTECTED]>
> > 
> > JBD should not pass slab pages down to the block layer.
> > Use page allocator pages instead. This will also prepare
> > JBD for the large blocksize patchset.
> > 
> 
> Currently memory allocation for committed_data(and frozen_buffer) for
> bufferhead is done through jbd slab management, as Christoph Hellwig
> pointed out that this is broken as jbd should not pass slab pages down
> to IO layer. and suggested to use get_free_pages() directly.
> 
> The problem with this patch, as Andreas Dilger pointed today in ext4
> interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
> 1/3-1/2 page space. 
> 
> What was the originally intention to set up slabs for committed_data(and
> frozen_buffer) in JBD? Why not using kmalloc?
> 
> Mingming

Looks good. Small suggestion is to get rid of all kmalloc() usages and
consistently use jbd_kmalloc() or jbd2_kmalloc().

Thanks,
Badari


Re: [PATCH] add check do_direct_IO() return val

2007-07-31 Thread Badari Pulavarty

On Tue, 2007-07-31 at 15:34 -0700, Andrew Morton wrote:
> On Tue, 31 Jul 2007 15:25:11 -0700
> Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > +   dio->map_bh.b_state = 0;
> 
> ho hum, thanks.
> 
> We zero out so many fields in there now that a kzalloc() might yield
> a net gain.  0.01% in an unnamed benchmark!

Yep. I think it's worth doing a kzalloc(). I wanted to understand the
*actual* problem to make sure we are addressing it, rather than
having kzalloc() make it disappear.

Thanks,
Badari


Re: [PATCH] add check do_direct_IO() return val

2007-07-31 Thread Badari Pulavarty

On Tue, 2007-07-31 at 12:35 +0800, Joe Jin wrote:
> > Hmm.. in this config file, whats causing DIO to panic ? Which test actually
> > passing faulty buffer ?
> > 
> 
> By my testing, just defined job3 and job10 will also get the panic, but if
> only have one of them, panic will not appear. the faulty buffer maybe passed
> by mmap.

Okay. Here is the fix for the problem.

Here is what's happening in this case:

A faulty user buffer caused -EFAULT to be returned from do_direct_IO().
We then go into dio_zero_block() to see if we need to zero out sections of
the block (the IO size < blocksize case). It checks whether the buffer is
newly allocated by calling buffer_new(bh). But map_bh is NOT initialized and
never went through the get_block() code. So it's possible to pass the check
and end up submitting a page wrongly, causing the oops.

The fix is to initialize the buffer state.
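
The failure mode can be modelled in user space. Everything in this sketch is
a simplified stand-in: the sketch_* structures, the bit chosen for the "new"
flag, and the 0xff fill that plays the role of stale memory are assumptions,
not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BH_NEW	(1UL << 0)	/* hypothetical "buffer is new" bit */

struct sketch_bh {
	unsigned long b_state;
	void *b_private;
};

struct sketch_dio {
	struct sketch_bh map_bh;
};

static int sketch_buffer_new(const struct sketch_bh *bh)
{
	return (bh->b_state & SKETCH_BH_NEW) != 0;
}

/* Setup as in the worker: b_private is always set, b_state only if asked. */
static void sketch_dio_setup(struct sketch_dio *dio, int clear_state)
{
	dio->map_bh.b_private = NULL;
	if (clear_state)
		dio->map_bh.b_state = 0;
}

/* Returns 1 if the zeroing path would act on a buffer nobody mapped. */
int sketch_zeroing_would_run(int clear_state)
{
	struct sketch_dio dio;

	memset(&dio, 0xff, sizeof(dio));	/* stale contents */
	sketch_dio_setup(&dio, clear_state);
	return sketch_buffer_new(&dio.map_bh);
}
```

sketch_zeroing_would_run(0) reports that the "new buffer" check passes on
stale state, while sketch_zeroing_would_run(1) does not, which is the
difference the one-line patch makes.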

Thanks,
Badari

Need to initialize map_bh.b_state to zero. Otherwise, in the case of
a faulty user buffer, it's possible to go into dio_zero_block()
and submit a page by mistake, since it checks for buffer_new().

http://marc.info/?l=linux-kernel&m=118551339032528&w=2

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 fs/direct-io.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.23-rc1/fs/direct-io.c
===
--- linux-2.6.23-rc1.orig/fs/direct-io.c    2007-07-22 13:41:00.0 -0700
+++ linux-2.6.23-rc1/fs/direct-io.c 2007-07-31 15:13:44.0 -0700
@@ -974,6 +974,7 @@ direct_io_worker(int rw, struct kiocb *i
dio->get_block = get_block;
dio->end_io = end_io;
dio->map_bh.b_private = NULL;
+   dio->map_bh.b_state = 0;
dio->final_block_in_bio = -1;
dio->next_block_for_io = -1;


Re: [PATCH] add check do_direct_IO() return val

2007-07-30 Thread Badari Pulavarty

On Mon, 2007-07-30 at 16:38 -0700, Zach Brown wrote:
> On Jul 30, 2007, at 2:58 PM, Badari Pulavarty wrote:
> 
> > On Mon, 2007-07-30 at 14:45 -0700, Zach Brown wrote:
> >>> I am also taking a look at it right now.
> >>
> >> Are we having a race to write a little test app that reproduces the
> >> problem? :)
> >
> > Nope. Feel free to write the test case.
> 
> Well, I'm having a heck of a time getting this to fail.  It looks  
> possible, though.  Joe, were you guys able to narrow it down to a  
> reproducible test case?  Do you have any oops output messages from  
> the crashes?

Here is what I got earlier..

Thanks,
Badari


Hi all,
Add some backgrounds:

When doing fio test on kernel 2.6.22,  we got oops,
--
BUG: unable to handle kernel paging request at virtual address 23c070bf
 printing eip:
c04a07fd
*pdpt = 1ff88001
*pde = 
Oops:  [#1]
SMP
Modules linked in: netconsole autofs4 hidp nfs lockd nfs_acl rfcomm l2cap
bluetooth sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
/@ iscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath dm_mod 
video /
sbs button battery ac ipv6 parport_pc lp parport i2c_piix4 i2c_core 
cfi_probe
gen_probe floppy scb2_flash sg mtdcore chipreg tg3 e1000 serio_raw ide_cd
/@ cdrom aic7xxx scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd 
ohci_hcd /
uhci_hcd
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010293   (2.6.22 #2)
EIP is at bio_get_nr_vecs+0x0/0x30
eax: 23c07063   ebx: 0003   ecx:    edx: 
esi: de5cef74   edi: f54a9600   ebp:    esp: de5ceca8
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process fio (pid: 17820, ti=de5ce000 task=de6570e0 task.ti=de5ce000)
Stack: c04a1c9d   0009 f54a9600 de5cef74 
f54a9600
   c04a1f43  c04a2b46 c0460466 c2c5baa0 c0812500 c0462c0a
0001
   0001 df4b90d4 de5ceee4 0011 0001 0009 0009

Call Trace:
 [] dio_new_bio+0x82/0xfe
 [] dio_send_cur_page+0x4a/0x92
 [] __blockdev_direct_IO+0xa09/0xc83
 [] __pagevec_free+0x14/0x1a
 [] release_pages+0x137/0x13f
 [] journal_start+0xaf/0xdd [jbd]
 [] ext3_direct_IO+0xfd/0x190 [ext3]
 [] ext3_get_block+0x0/0xd0 [ext3]
 [] generic_file_direct_IO+0xe5/0x116
 [] generic_file_direct_write+0x5c/0x137
 [] __generic_file_aio_write_nolock+0x37b/0x4df
 [] generic_file_aio_write+0x55/0xb3
 [] ext3_file_write+0x24/0x8f [ext3]
 [] do_sync_write+0xc7/0x10a
 [] check_kill_permission+0xec/0xf5
 [] autoremove_wake_function+0x0/0x35
 [] do_sync_write+0x0/0x10a
 [] vfs_write+0xa8/0x154
/@  [] sys_pwrite64+0x48/0x5f/
 [] syscall_call+0x7/0xb
 [] xfrm_replay_timer_handler+0x3e/0x44
 ===
Code: 89 c5 c7 44 24 14 f4 ff ff ff 74 d2 e9 b3 fe ff ff 83 7c 24 34 00 
0f 84
0b ff ff ff e9 51 ff ff ff 83 c4 20 89 e8 5b 5e 5f 5d c3 <8b> 40 5c 8b 
48 38
8b 81 20 01 00 00 0f b7 91 2a 01 00 00 0f b7
EIP: [] bio_get_nr_vecs+0x0/0x30 SS:ESP 0068:de5ceca8

---

jobfile is
---
/@ [global]/
/@ bs=8k/
/@ iodepth=1024/
/@ iodepth_batch=60/
/@ randrepeat=1/
/@ size=1m/
/@ directory=/home/oracle/
/@ numjobs=20/
/@ [job1]/
/@ ioengine=sync/
/@ bs=1k/
/@ direct=1/
/@ rw=randread/
/@ filename=file1:file2/
/@ [job2]/
/@ ioengine=libaio/
/@ rw=randwrite/
/@ direct=1/
/@ filename=file1:file2/
/@ [job3]/
/@ bs=1k/
/@ ioengine=posixaio/
/@ rw=randwrite/
/@ direct=1/
/@ filename=file1:file2/
/@ [job4]/
/@ ioengine=splice/
/@ direct=1/
/@ rw=randwrite/
/@ filename=file1:file2/
/@ [job5]/
/@ bs=1k/
/@ ioengine=sync/
/@ rw=randread/
/@ filename=file1:file2/
/@ [job7]/
/@ ioengine=libaio/
/@ rw=randwrite/
/@ filename=file1:file2/
/@ [job8]/
/@ ioengine=posixaio/
/@ rw=randwrite/
/@ filename=file1:file2/
/@ [job9]/
/@ ioengine=splice/
/@ rw=randwrite/
/@ filename=file1:file2/
/@ [job10]/
/@ ioengine=mmap/
/@ rw=randwrite/
/@ bs=1k/
/@ filename=file1:file2/
/@ [job11]/
/@ ioengine=mmap/
/@ rw=randwrite/
/@ direct=1/
/@ filename=file1:file2/
---
ignore the @ please.


With Joe's patch, the oops seems to be solved.
So please review that patch to see if there is any problem with it.

thanks,
wengang.




Re: [PATCH] add check do_direct_IO() return val

2007-07-30 Thread Badari Pulavarty

On Mon, 2007-07-30 at 16:38 -0700, Zach Brown wrote:
> On Jul 30, 2007, at 2:58 PM, Badari Pulavarty wrote:
> 
> > On Mon, 2007-07-30 at 14:45 -0700, Zach Brown wrote:
> >>> I am also taking a look at it right now.
> >>
> >> Are we having a race to write a little test app that reproduces the
> >> problem? :)
> >
> > Nope. Feel free to write the test case.
> 
> Well, I'm having a heck of a time getting this to fail.  It looks  
> possible, though.  Joe, were you guys able to narrow it down to a  
> reproducible test case?  Do you have any oops output messages from  
> the crashes?
> 
> It looks like it takes a very particular set of circumstances to  
> actually crash after relying on an uninitialized map_bh.  (see the  
> blkfactor, buffer_new(), and this_chunk_blocks tests in dio_zero_block 
> ()).


Looking at the crash

CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010293   (2.6.22 #2)
EIP is at bio_get_nr_vecs+0x0/0x30
eax: 23c07063   ebx: 0003   ecx:    edx: 
esi: de5cef74   edi: f54a9600   ebp:    esp: de5ceca8
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process fio (pid: 17820, ti=de5ce000 task=de6570e0 task.ti=de5ce000)
Stack: c04a1c9d   0009 f54a9600 de5cef74 
f54a9600
   c04a1f43  c04a2b46 c0460466 c2c5baa0 c0812500 c0462c0a
0001
   0001 df4b90d4 de5ceee4 0011 0001 0009 0009

Call Trace:
 [] dio_new_bio+0x82/0xfe
 [] dio_send_cur_page+0x4a/0x92
 [] __blockdev_direct_IO+0xa09/0xc83
 [] __pagevec_free+0x14/0x1a
 [] release_pages+0x137/0x13f
 [] journal_start+0xaf/0xdd [jbd]
 [] ext3_direct_IO+0xfd/0x190 [ext3]
..

I am not sure I really understand what's happening here. It looks
like dio->map_bh.b_dev is junk. But looking at the code, we shouldn't
be in dio_send_cur_page() unless we have a valid dio->cur_page
(which means that we added a page to submit, which also means the
previous get_block() succeeded, which means that we should have a
valid map_bh). What am I missing here?


	if (dio->cur_page) {
		ret2 = dio_send_cur_page(dio);
		if (ret == 0)
			ret = ret2;
		page_cache_release(dio->cur_page);
		dio->cur_page = NULL;
	}

> 
> > I am just looking at the code
> > to see what needs to be done.
> 
> It looks like the unconditional dio_cleanup() and dio_zero_block()  
> calls outside the nseg loop are relying on state which might not have  
> been built up.  _zero_block() tests map_bh's flags without them being  
> set.  _cleanup could, in some crazy world, get confused if we managed  
> to get here with a 0 nr_segs because dio->head and ->tail wouldn't be  
> initialized.
> 
> So we could initialize some more fields at the start of  
> direct_io_worker for the benefit of these cleanup calls.  Or we could  
> conditionally call them based on some other indicator of progress.   
> Neither really thrills me.
> 
> And I don't have a test case to verify changes with.  Meh.
> 
> How do you feel about initializing the dio with kzalloc() and only  
> initializing the fields that we rely on being non-zero, and  
> commenting the hell out of it?

Yeah. kzalloc() may be the right way to go. But I would like to understand
what exactly is happening here.
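
The zero-then-override idea Zach suggests can be sketched in userspace as follows. This is only an illustration: calloc() stands in for kzalloc(), and the struct is a hypothetical cut-down model with a few fields named after the ones mentioned in this thread, not the real struct dio.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced model of the per-request state. */
struct demo_dio {
	void *cur_page;            /* must start out NULL */
	unsigned long map_state;   /* flags tested by cleanup paths */
	int head, tail;            /* bookkeeping read by cleanup */
	long final_block_in_bio;   /* "none yet" is -1, not 0 */
	long next_block_for_io;    /* "none yet" is -1, not 0 */
};

/*
 * Allocate zeroed memory, then set only the fields whose initial
 * value is not zero. Every other field is already in a safe state,
 * so cleanup code that runs early never reads junk.
 * (Assumes all-bits-zero is a null pointer, as on mainstream ABIs.)
 */
static struct demo_dio *demo_dio_alloc(void)
{
	struct demo_dio *dio = calloc(1, sizeof(*dio));

	if (!dio)
		return NULL;
	dio->final_block_in_bio = -1;
	dio->next_block_for_io = -1;
	assert(dio->cur_page == NULL && dio->map_state == 0);
	return dio;
}
```

The trade-off is the one raised in the thread: zeroing hides which fields the code actually depends on, so the non-zero assignments are worth commenting.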

Thanks,
Badari


Re: [PATCH] add check do_direct_IO() return val

2007-07-30 Thread Badari Pulavarty

On Mon, 2007-07-30 at 14:45 -0700, Zach Brown wrote:
> > I am also taking a look at it right now.
> 
> Are we having a race to write a little test app that reproduces the  
> problem? :)

Nope. Feel free to write the test case. I am just looking at the code
to see what needs to be done.

Thanks,
Badari


Re: [PATCH] add check do_direct_IO() return val

2007-07-30 Thread Badari Pulavarty

On Mon, 2007-07-30 at 13:53 -0700, Andrew Morton wrote:
> On Sat, 28 Jul 2007 11:47:19 +0800
> Joe Jin <[EMAIL PROTECTED]> wrote:
> 
> > > I tested Andrew's patch and panic was gone but got few ENOTBLK.
> > > So I tried with Joe's patch , both panic and ENOTBLK are gone now.
> > > But in Joe's patch if (ret == -ENOTBLK && (rw & WRITE)), dio_cleanup(dio)
> > > was not getting called because of break. So I moved dio_cleanup just 
> > > after if (ret).
> > 
> > Guru, actually, break from the loop with ENOTBLK will call dio_cleanup
> > at leater, if call it too early, that means will put_page(), maybe cause
> > other panic.
> > 
> 
> fyi, I dropped the earlier patch and now we have nothing.  Please let's get 
> all
> this sorted out in time for 2.6.23.  Which is still many weeks away so there 
> is
> plenty of time to prepare something which was carefully reviewed and 
> well-tested,
> thanks.

I am also taking a look at it right now. Unfortunately, I don't think
the fix is that simple, since we need to return success in case of a
partial write.

Thanks,
Badari


Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Badari Pulavarty

On Fri, 2007-07-20 at 14:29 +1000, Nick Piggin wrote:
> Andrew Morton wrote:
> > On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> 
> > wrote:
> > 
> > 
> >>>>+ }
> >>>>+
> >>>>+ offset += ret;
> >>>>+ retval += ret;
> >>>>+ len -= ret;
> >>>>+ index += offset >> HPAGE_SHIFT;
> >>>>+ offset &= ~HPAGE_MASK;
> >>>>+
> >>>>+ page_cache_release(page);
> >>>>+ if (ret == nr && len)
> >>>>+ continue;
> >>>>+ goto out;
> >>>>+ }
> >>>>+out:
> >>>>+ return retval;
> >>>>+}
> >>>
> >>>This code doesn't have all the ghastly tricks which we deploy to handle
> >>>concurrent truncate.
> >>
> >>Do I need to ? Baaahh!!  I don't want to deal with them. 
> > 
> > 
> > Nick, can you think of any serious consequences of a read/truncate race in
> > there?  I can't..
> 
> As it doesn't allow writes, then I _think_ it should be OK. If you
> ever did want to add write(2) support, then you would have transient
> zeroes problems.

I have no plans to add write() support, unless there is a real reason
for doing so.

> 
> But why not just hold i_mutex around the whole thing just to be safe?

Yeah. I can do that, just to be safe for the future.

Thanks,
Badari


Re: [PATCH] hugetlbfs read() support

2007-07-19 Thread Badari Pulavarty

On Wed, 2007-07-18 at 22:19 -0700, Andrew Morton wrote:
> On Fri, 13 Jul 2007 18:23:33 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > Hi Andrew,
> > 
> > Here is the patch to support read() for hugetlbfs, needed to get
> > oprofile working on executables backed by largepages. 
> > 
> > If you plan to consider Christoph Lameter's pagecache cleanup patches,
> > I will re-write this. Otherwise, please consider this for -mm.
> > 
> > Thanks,
> > Badari
> > 
> > Support for reading from hugetlbfs files. libhugetlbfs lets application
> > text/data to be placed in large pages. When we do that, oprofile doesn't
> > work - since libbfd tries to read from it.
> > 
> > This code is very similar to what do_generic_mapping_read() does, but
> > I can't use it since it has PAGE_CACHE_SIZE assumptions.
> > 
> > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> > Acked-by: William Irwin <[EMAIL PROTECTED]>
> > Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
> > 
> >  fs/hugetlbfs/inode.c |  113 
> > +++
> >  1 file changed, 113 insertions(+)
> > 
> > Index: linux-2.6.22/fs/hugetlbfs/inode.c
> > ===
> > --- linux-2.6.22.orig/fs/hugetlbfs/inode.c  2007-07-08 16:32:17.0 
> > -0700
> > +++ linux-2.6.22/fs/hugetlbfs/inode.c   2007-07-13 19:24:36.0 
> > -0700
> > @@ -156,6 +156,118 @@ full_search:
> >  }
> >  #endif
> >  
> > +static int
> > +hugetlbfs_read_actor(struct page *page, unsigned long offset,
> > +   char __user *buf, unsigned long count,
> > +   unsigned long size)
> > +{
> > +   char *kaddr;
> > +   unsigned long left, copied = 0;
> > +   int i, chunksize;
> > +
> > +   if (size > count)
> > +   size = count;
> > +
> > +   /* Find which 4k chunk and offset with in that chunk */
> > +   i = offset >> PAGE_CACHE_SHIFT;
> > +   offset = offset & ~PAGE_CACHE_MASK;
> > +
> > +   while (size) {
> > +   chunksize = PAGE_CACHE_SIZE;
> > +   if (offset)
> > +   chunksize -= offset;
> > +   if (chunksize > size)
> > +   chunksize = size;
> > +   kaddr = kmap(&page[i]);
> > +   left = __copy_to_user(buf, kaddr + offset, chunksize);
> > +   kunmap(&page[i]);
> > +   if (left) {
> > +   copied += (chunksize - left);
> > +   break;
> > +   }
> > +   offset = 0;
> > +   size -= chunksize;
> > +   buf += chunksize;
> > +   copied += chunksize;
> > +   i++;
> > +   }
> > +   return copied ? copied : -EFAULT;
> > +}
> 
> This returns -EFAULT when asked to read zero bytes.  The caller prevents
> that, but it's a little bit ugly.  Livable with.

I can fix that, but I didn't want to come here if length == 0 - so
took a shortcut.
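
For reference, the chunk walk in the actor can be modelled in userspace so that a zero-length request returns 0 rather than an error. This is a sketch under assumptions: memcpy() stands in for __copy_to_user() and cannot fault, so the partial-copy path is not modelled, and the 4 KiB page size and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((size_t)1 << DEMO_PAGE_SHIFT)

/*
 * Copy `size` bytes starting at byte `offset` of a region made of
 * consecutive 4 KiB pages (pages[i] points at page i) into dst.
 * Returns the number of bytes copied; a zero-length request is 0.
 */
static size_t demo_chunked_copy(char *const pages[], size_t offset,
				char *dst, size_t size)
{
	size_t i = offset >> DEMO_PAGE_SHIFT;  /* which small page */
	size_t copied = 0;

	assert(pages != NULL && dst != NULL);
	offset &= DEMO_PAGE_SIZE - 1;          /* offset inside that page */

	while (size) {
		/* Never cross a page boundary in a single copy. */
		size_t chunk = DEMO_PAGE_SIZE - offset;

		if (chunk > size)
			chunk = size;
		memcpy(dst + copied, pages[i] + offset, chunk);
		offset = 0;                    /* later pages start at 0 */
		size -= chunk;
		copied += chunk;
		i++;
	}
	return copied;
}
```

A kernel version would still need a separate way to report a fault on the very first byte, which is what the `copied ? copied : -EFAULT` expression folds into one return value.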

> 
> > +/*
> > + * Support for read() - Find the page attached to f_mapping and copy out 
> > the
> > + * data. Its *very* similar to do_generic_mapping_read(), we can't use that
> > + * since it has PAGE_CACHE_SIZE assumptions.
> > + */
> > +ssize_t
> > +hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t 
> > *ppos)
> > +{
> > +   struct address_space *mapping = filp->f_mapping;
> > +   struct inode *inode = mapping->host;
> > +   unsigned long index = *ppos >> HPAGE_SHIFT;
> > +   unsigned long end_index;
> > +   loff_t isize;
> > +   unsigned long offset;
> > +   ssize_t retval = 0;
> > +
> > +   /* validate length */
> > +   if (len == 0)
> > +   goto out;
> > +
> > +   isize = i_size_read(inode);
> > +   if (!isize)
> > +   goto out;
> > +
> > +   offset = *ppos & ~HPAGE_MASK;
> > +   end_index = (isize - 1) >> HPAGE_SHIFT;
> > +   for (;;) {
> > +   struct page *page;
> > +   int nr, ret;
> > +
> > +   /* nr is the maximum number of bytes to copy from this page */
> > +   nr = HPAGE_SIZE;
> > +   if (index >= end_index) {
> > +   if (index > end_index)
> > +   goto out;
> > +   nr = ((isize - 1) & ~HPAGE_MASK) + 1;
> > +

Re: [PATCH] ext2 statfs improvement for block and inode free count

2007-07-19 Thread Badari Pulavarty

On Wed, 2007-07-18 at 20:18 -0700, Andrew Morton wrote:
> On Fri, 13 Jul 2007 18:36:54 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > More statfs() improvements for ext2. ext2 already maintains
> > percpu counters for free blocks and inodes. Derive free
> > block count and inode count by summing up percpu counters,
> > instead of counting up all the groups in the filesystem
> > each time.
> > 
> 
> hm, another speedup patch with no measurements which demonstrate its
> benefit.

In my setups (4 & 8-way), I didn't measure any significant performance
improvements (in any reasonable workload). I see some decent
improvements on cooked-up (1 million stats) tests :(
> 
> > 
> > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> > Acked-by: Andreas Dilger <[EMAIL PROTECTED]>
> > 
> >  fs/ext2/super.c |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > Index: linux-2.6.22/fs/ext2/super.c
> > ===
> > --- linux-2.6.22.orig/fs/ext2/super.c   2007-07-13 20:06:38.0 
> > -0700
> > +++ linux-2.6.22/fs/ext2/super.c2007-07-13 20:06:51.0 -0700
> > @@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * 
> > buf->f_type = EXT2_SUPER_MAGIC;
> > buf->f_bsize = sb->s_blocksize;
> > buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead;
> > -   buf->f_bfree = ext2_count_free_blocks(sb);
> > +   buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter);
> > buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count);
> > if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count))
> > buf->f_bavail = 0;
> > buf->f_files = le32_to_cpu(es->s_inodes_count);
> > -   buf->f_ffree = ext2_count_free_inodes(sb);
> > +   buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter);
> > buf->f_namelen = EXT2_NAME_LEN;
> > fsid = le64_to_cpup((void *)es->s_uuid) ^
> >le64_to_cpup((void *)es->s_uuid + sizeof(u64));
> > 
> 
> Well there's a tradeoff here.  At large CPU counts, percpu_counter_sum()
> becomes quite expensive - it takes a global lock and then goes off fishing
> in every CPU's percpu_alloced memory.
> 
> So there is some value of (num_online_cpus / sb->s_groups_count) at which
> this change becomes a loss.  Where does that value lie?

Yes. I debated for a long time whether I should submit this or not, for
this very reason. The old code wasn't holding any locks. I don't have any
high CPU count machine (>8-way) with me. I will request time on one.
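
The cost Andrew describes follows from how a batched per-CPU counter is typically organized. Below is a single-threaded userspace model, not the kernel's percpu_counter implementation: the names, the fixed CPU count and the batch parameter are assumptions, and the lock the kernel takes is only noted in comments.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4

struct demo_percpu_counter {
	long count;                 /* shared, approximate total */
	long local[DEMO_NR_CPUS];   /* per-CPU deltas not yet folded in */
};

/* Cheap update: touches only this CPU's slot until a batch fills. */
static void demo_counter_add(struct demo_percpu_counter *c, int cpu,
			     long amount, long batch)
{
	long v;

	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	v = c->local[cpu] + amount;
	if (v >= batch || v <= -batch) {
		c->count += v;      /* a real counter would lock here */
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* O(1) read: may lag behind by up to batch per CPU. */
static long demo_counter_read_approx(const struct demo_percpu_counter *c)
{
	return c->count;
}

/* Exact read: visits every CPU's slot, so cost grows with CPU count. */
static long demo_counter_sum(const struct demo_percpu_counter *c)
{
	long sum = c->count;        /* a real counter would lock here */
	int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}
```

The exact sum scales with the number of CPUs, while the old group scan scales with the number of block groups, which is why the break-even point depends on the ratio of the two.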

> 
> Bear in mind that the global lock in percpu_counter_sum() will tilt the
> scales quite a bit.

Noticed that too. I added a WARN_ON() to flag cases where the percpu sum
doesn't match the computed sum. I saw a few stack traces in a 24-hour fsx run.

Thanks,
Badari


[PATCH] ext2 statfs improvement for block and inode free count

2007-07-13 Thread Badari Pulavarty

Andrew,

Can you include it in -mm ? 

BTW, this patch is against mainline, won't apply cleanly to -mm, due to
other statfs() improvements.

Thanks,
Badari

More statfs() improvements for ext2. ext2 already maintains
percpu counters for free blocks and inodes. Derive free
block count and inode count by summing up percpu counters,
instead of counting up all the groups in the filesystem
each time.


Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
Acked-by: Andreas Dilger <[EMAIL PROTECTED]>

 fs/ext2/super.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.22/fs/ext2/super.c
===
--- linux-2.6.22.orig/fs/ext2/super.c	2007-07-13 20:06:38.0 -0700
+++ linux-2.6.22/fs/ext2/super.c	2007-07-13 20:06:51.0 -0700
@@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * 
 	buf->f_type = EXT2_SUPER_MAGIC;
 	buf->f_bsize = sb->s_blocksize;
 	buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead;
-	buf->f_bfree = ext2_count_free_blocks(sb);
+	buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter);
 	buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count);
 	if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count))
 		buf->f_bavail = 0;
 	buf->f_files = le32_to_cpu(es->s_inodes_count);
-	buf->f_ffree = ext2_count_free_inodes(sb);
+	buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter);
 	buf->f_namelen = EXT2_NAME_LEN;
 	fsid = le64_to_cpup((void *)es->s_uuid) ^
 	       le64_to_cpup((void *)es->s_uuid + sizeof(u64));



[PATCH] hugetlbfs read() support

2007-07-13 Thread Badari Pulavarty

Hi Andrew,

Here is the patch to support read() for hugetlbfs, needed to get
oprofile working on executables backed by largepages. 

If you plan to consider Christoph Lameter's pagecache cleanup patches,
I will re-write this. Otherwise, please consider this for -mm.

Thanks,
Badari

Support for reading from hugetlbfs files. libhugetlbfs lets application
text/data be placed in large pages. When we do that, oprofile doesn't
work, since libbfd tries to read from the file.

This code is very similar to what do_generic_mapping_read() does, but
I can't use it since it has PAGE_CACHE_SIZE assumptions.
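
The position arithmetic that has to use the huge page size instead of PAGE_CACHE_SIZE can be checked in userspace. The sketch below mirrors the index/offset/nr computation of the read loop in this patch; the 2 MiB huge page size and all names are assumptions for illustration, since the real HPAGE_SHIFT depends on the architecture.

```c
#include <assert.h>

#define DEMO_HPAGE_SHIFT 21                        /* assumed: 2 MiB */
#define DEMO_HPAGE_SIZE  (1ULL << DEMO_HPAGE_SHIFT)

/*
 * For a file of isize bytes, return how many bytes can be read from
 * the huge page containing byte position pos, starting at pos.
 * Returns 0 at or past end of file.
 */
static unsigned long long demo_bytes_in_page(unsigned long long isize,
					     unsigned long long pos)
{
	unsigned long long index, offset, end_index, nr;

	if (isize == 0)
		return 0;

	index = pos >> DEMO_HPAGE_SHIFT;         /* which huge page */
	offset = pos & (DEMO_HPAGE_SIZE - 1);    /* byte within it */
	end_index = (isize - 1) >> DEMO_HPAGE_SHIFT;

	nr = DEMO_HPAGE_SIZE;                    /* full page by default */
	if (index >= end_index) {
		if (index > end_index)
			return 0;                /* page lies beyond EOF */
		/* Last page: only the bytes up to EOF are valid. */
		nr = ((isize - 1) & (DEMO_HPAGE_SIZE - 1)) + 1;
		if (nr <= offset)
			return 0;
	}
	assert(offset < nr);
	return nr - offset;
}
```

For a 3 MiB file this yields a full 2 MiB from position 0 and a short tail from the second huge page.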

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
Acked-by: William Irwin <[EMAIL PROTECTED]>
Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

 fs/hugetlbfs/inode.c |  113 +++
 1 file changed, 113 insertions(+)

Index: linux-2.6.22/fs/hugetlbfs/inode.c
===
--- linux-2.6.22.orig/fs/hugetlbfs/inode.c	2007-07-08 16:32:17.0 -0700
+++ linux-2.6.22/fs/hugetlbfs/inode.c   2007-07-13 19:24:36.0 -0700
@@ -156,6 +156,118 @@ full_search:
 }
 #endif
 
+static int
+hugetlbfs_read_actor(struct page *page, unsigned long offset,
+			char __user *buf, unsigned long count,
+			unsigned long size)
+{
+	char *kaddr;
+	unsigned long left, copied = 0;
+	int i, chunksize;
+
+	if (size > count)
+		size = count;
+
+	/* Find which 4k chunk and offset with in that chunk */
+	i = offset >> PAGE_CACHE_SHIFT;
+	offset = offset & ~PAGE_CACHE_MASK;
+
+	while (size) {
+		chunksize = PAGE_CACHE_SIZE;
+		if (offset)
+			chunksize -= offset;
+		if (chunksize > size)
+			chunksize = size;
+		kaddr = kmap(&page[i]);
+		left = __copy_to_user(buf, kaddr + offset, chunksize);
+		kunmap(&page[i]);
+		if (left) {
+			copied += (chunksize - left);
+			break;
+		}
+		offset = 0;
+		size -= chunksize;
+		buf += chunksize;
+		copied += chunksize;
+		i++;
+	}
+	return copied ? copied : -EFAULT;
+}
+
+/*
+ * Support for read() - Find the page attached to f_mapping and copy out the
+ * data. Its *very* similar to do_generic_mapping_read(), we can't use that
+ * since it has PAGE_CACHE_SIZE assumptions.
+ */
+ssize_t
+hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
+{
+	struct address_space *mapping = filp->f_mapping;
+	struct inode *inode = mapping->host;
+	unsigned long index = *ppos >> HPAGE_SHIFT;
+	unsigned long end_index;
+	loff_t isize;
+	unsigned long offset;
+	ssize_t retval = 0;
+
+	/* validate length */
+	if (len == 0)
+		goto out;
+
+	isize = i_size_read(inode);
+	if (!isize)
+		goto out;
+
+	offset = *ppos & ~HPAGE_MASK;
+	end_index = (isize - 1) >> HPAGE_SHIFT;
+	for (;;) {
+		struct page *page;
+		int nr, ret;
+
+		/* nr is the maximum number of bytes to copy from this page */
+		nr = HPAGE_SIZE;
+		if (index >= end_index) {
+			if (index > end_index)
+				goto out;
+			nr = ((isize - 1) & ~HPAGE_MASK) + 1;
+			if (nr <= offset) {
+				goto out;
+			}
+		}
+		nr = nr - offset;
+
+		/* Find the page */
+		page = find_get_page(mapping, index);
+		if (unlikely(page == NULL)) {
+			/*
+			 * We can't find the page in the cache - bail out ?
+			 */
+			goto out;
+		}
+		/*
+		 * Ok, we have the page, copy it to user space buffer.
+		 */
+		ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
+		if (ret < 0) {
+			retval = retval ? : ret;
+			goto out;
+		}
+
+		offset += ret;
+		retval += ret;
+		len -= ret;
+		index += offset >> HPAGE_SHIFT;
+		offset &= ~HPAGE_MASK;
+
+		page_cache_release(page);
+		if (ret == nr && len)
+			continue;
+		goto out;
+	}
+out:
+	return retval;
+}
+
 /*
  * Read a page. Again trivial. If it didn't already exist
  * in the page cache, it is zero-filled.
@@ -560,6 +

Re: [PATCH 0/16] Pid namespaces

2007-07-10 Thread Badari Pulavarty

On Tue, 2007-07-10 at 15:30 +0400, Pavel Emelianov wrote:
> Cedric Le Goater wrote:
> > Badari Pulavarty wrote:
> >> On Fri, 2007-07-06 at 12:01 +0400, Pavel Emelianov wrote:
> >>> This is "submition for inclusion" of hierarchical, not kconfig
> >>> configurable, zero overheaded ;) pid namespaces.
> >> Not able to boot my ppc64 machine with the patchset :(
> > 
> > I can't boot either on a x86_64 but I don't even have logs to send :(
> 
> Neither can I. And I cannot boot clean 2.6.22-rc6-mm1 either :(
> Does someone already know what the reason is?

2.6.22-rc6-mm1 boots fine on my x86-64 machine.

With your patches plus the fix (in fs/proc/root.c), I am able to
boot my x86-64 machine as well.

Thanks,
Badari


Re: [PATCH 0/16] Pid namespaces

2007-07-10 Thread Badari Pulavarty

On Tue, 2007-07-10 at 17:06 +0400, Pavel Emelianov wrote:
> > Not able to boot my ppc64 machine with the patchset :(
> > 
> > Thanks,
> > Badari
> 
> That's the hunk lost during the split:
> 
> --- ./fs/proc/root.c.procfix  2007-07-10 13:52:08.0 +0400
> +++ ./fs/proc/root.c  2007-07-10 15:23:20.0 +0400
> @@ -111,7 +111,7 @@ void __init proc_root_init(void)
>   err = register_filesystem(&proc_fs_type);
>   if (err)
>   return;
> - proc_mnt = kern_mount(&proc_fs_type);
> + proc_mnt = kern_mount_data(&proc_fs_type, &init_pid_ns);
>   err = PTR_ERR(proc_mnt);
>   if (IS_ERR(proc_mnt)) {
>   unregister_filesystem(&proc_fs_type);
> 
> 
> With this machine should boot fine.

Yes. My ppc64 box booted fine with this patch.
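
One plausible reading of the earlier oops (kref_get called from proc_set_super at boot) is that the superblock setup took a reference on a namespace pointer that the plain kern_mount() path never supplied. The userspace model below illustrates that data flow only; every name in it is hypothetical, and the NULL guard is added for the demonstration rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a refcounted namespace and a superblock. */
struct demo_ns {
	int refs;
};

struct demo_sb {
	struct demo_ns *ns;
};

/*
 * Models a set_super callback: takes a reference on the namespace
 * passed as opaque mount data. Returns 0 on success, -1 if no
 * namespace was supplied (the guard exists only in this sketch).
 */
static int demo_set_super(struct demo_sb *sb, void *data)
{
	struct demo_ns *ns = data;

	assert(sb != NULL);
	if (!ns)
		return -1;
	ns->refs++;
	sb->ns = ns;
	return 0;
}

/* Models a mount call that forwards caller-supplied data. */
static int demo_mount_data(struct demo_sb *sb, void *data)
{
	return demo_set_super(sb, data);
}

/* Models a mount call with no data argument: callback sees NULL. */
static int demo_mount(struct demo_sb *sb)
{
	return demo_mount_data(sb, NULL);
}
```

In this model the data-less mount fails cleanly, whereas an unguarded callback would dereference NULL, which matches the shape of the reported trace.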

Thanks,
Badari


Re: [PATCH 0/16] Pid namespaces

2007-07-09 Thread Badari Pulavarty

On Mon, 2007-07-09 at 22:06 +0200, Cedric Le Goater wrote:
> Badari Pulavarty wrote:
> > On Fri, 2007-07-06 at 12:01 +0400, Pavel Emelianov wrote:
> >> This is "submition for inclusion" of hierarchical, not kconfig
> >> configurable, zero overheaded ;) pid namespaces.
> > 
> > Not able to boot my ppc64 machine with the patchset :(
> 
> I can't boot either on a x86_64 but I don't even have logs to send :(

Yes. It blew up way early in the boot on my x86_64, so nothing came
up on the console to capture (blank screen) :(

Thanks,
Badari


Re: [PATCH 0/16] Pid namespaces

2007-07-09 Thread Badari Pulavarty

On Fri, 2007-07-06 at 12:01 +0400, Pavel Emelianov wrote:
> This is "submition for inclusion" of hierarchical, not kconfig
> configurable, zero overheaded ;) pid namespaces.

Not able to boot my ppc64 machine with the patchset :(

Thanks,
Badari

Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc0247ce0
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c0247ce0 LR: c0107bf4 CTR: c0107bd0
REGS: c05fb920 TRAP: 0300   Not tainted  (2.6.22-rc6-mm1)
MSR: 80009032   CR: 2448  XER: 2005
DAR: , DSISR: 4000
TASK = c0514650[0] 'swapper' THREAD: c05f8000 CPU: 0
GPR00: c00bd42c c05fbba0 c05f8f18 
GPR04:  c0645190 cd025000 cd024cb8
GPR08: cd024d10  cd024cf0 
GPR12: 247f c0514d80  c044e438
GPR16: 41c0 c044ce50  
GPR20: c04f9fd0 020f9fd0  c05bc370
GPR24: c05bc2f8 c0539738 0200 
GPR28: cd024c00  c054a358 cd024c00
NIP [c0247ce0] .kref_get+0x0/0x28
LR [c0107bf4] .proc_set_super+0x24/0x54
Call Trace:
[c05fbba0] [c05fbc30] 0xc05fbc30 (unreliable)
[c05fbc30] [c00bd42c] .sget+0x34c/0x470
[c05fbd00] [c0107da4] .proc_get_sb+0xa0/0x18c
[c05fbdb0] [c00bdc84] .vfs_kern_mount+0x80/0xe8
[c05fbe50] [c04ea1e4] .proc_root_init+0x4c/0x158
[c05fbed0] [c04cb9ec] .start_kernel+0x3c8/0x404
[c05fbf90] [c0008524] .start_here_common+0x54/0x130
Instruction dump:
6000 e8440008 7c0903a6 4e800421 e8410028 3801 38210080 7c030378
e8010010 ebc1fff0 7c0803a6 4e800020 <8003> 7c34 5400d97e 0b00
Kernel panic - not syncing: Attempted to kill the idle task!





Re: RFC: CONFIG_PAGE_SHIFT (aka software PAGE_SIZE)

2007-07-06 Thread Badari Pulavarty

On Sat, 2007-07-07 at 00:26 +0200, Andrea Arcangeli wrote:
..
> The following simple bench seems to run fine on one real hardware and
> on kvm (a friend of mine failed so far to run it on his hardware
> though, so perhaps some driver triggers some remaining bugs) when
> booted as init=/tmp/bench-static after “cp -a /dev/hda /tmp/”.

Hmm.. I didn't have any luck booting my machine with the patchset 
(with 8k pagesize) :(

It fails to find the partition table on my hard drive.

Thanks,
Badari

AMD8111: IDE controller at PCI slot :00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: :00:07.1 (rev 03) UDMA133 controller
ide0: BM-DMA at 0x1020-0x1027, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0x1028-0x102f, BIOS settings: hdc:DMA, hdd:pio
hda: IC35L080AVVA07-0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: TOSHIBA DVD-ROM SD-M1612, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 160836480 sectors (82348 MB) w/1863KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
 hda: unknown partition table <<<
hdc: ATAPI 48X DVD-ROM drive, 512kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard as /class/input/input0
input: PC Speaker as /class/input/input1
input: PS/2 Generic Mouse as /class/input/input2
TCP cubic registered
NET: Registered protocol family 1
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
VFS: Cannot open root device "hda2" or unknown-block(3,2)
Please append a correct "root=" boot option; here are the available partitions:
0300   80418240 hda driver: ide-disk
1600    4194302 hdc driver: ide-cdrom
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(3,2)


Re: RFC: CONFIG_PAGE_SHIFT (aka software PAGE_SIZE)

2007-07-06 Thread Badari Pulavarty

On Sat, 2007-07-07 at 00:26 +0200, Andrea Arcangeli wrote:
> Hello,
> 
..
> 
> If you want to help/look here the patch:
> 
>   
> http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.22-rc7/hard-page-size
> 

Very interesting patch set. I would really like to see support for it.
I would like to play with it, so please keep the patchset up to date.

Here is a small nit fix.

Thanks,
Badari

 mm/migrate.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.22-rc7/mm/migrate.c
===
--- linux-2.6.22-rc7.orig/mm/migrate.c  2007-07-01 12:54:24.0 -0700
+++ linux-2.6.22-rc7/mm/migrate.c   2007-07-06 19:58:43.0 -0700
@@ -169,7 +169,7 @@ static void remove_migration_pte(struct 
goto out;
 
get_page(new);
-   pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
+   pte = pte_mkold(mk_pte(new, addr, vma->vm_page_prot));
if (is_write_migration_entry(entry))
pte = pte_mkwrite(pte);
set_pte_at(mm, addr, ptep, pte);




Re: 2.6.22-rc6-mm1

2007-07-06 Thread Badari Pulavarty

On Thu, 2007-06-28 at 03:43 -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/
> 

fs/xfs/linux-2.6/xfs_ioctl32.c: In function ‘xfs_ioc_bulkstat_compat’:
fs/xfs/linux-2.6/xfs_ioctl32.c:334: error: ‘xfs_inumbers_fmt_compat’ undeclared (first use in this function)
fs/xfs/linux-2.6/xfs_ioctl32.c:334: error: (Each undeclared identifier is reported only once
fs/xfs/linux-2.6/xfs_ioctl32.c:334: error: for each function it appears in.)
make[2]: *** [fs/xfs/linux-2.6/xfs_ioctl32.o] Error 1





Re: [PATCH] dio: remove bogus refcounting BUG_ON

2007-07-05 Thread Badari Pulavarty

On Thu, 2007-07-05 at 10:11 -0700, Zach Brown wrote:
> > the BUG_ON(). But unfortunately, our perf. team is able reproduce the
> > problem.
> 
> What are they doing to reproduce it?  How much setup does it take?

Huge OLTP run :(

> 
> > Debug indicated that, the ret2 == 1 :(
> 
> That could be consistent with the theory that we're racing with the  
> dio struct being freed and reused before it's tested in the BUG_ON()  
> condition.  Suparna's suggestion to sample dio->is_async before  
> releasing the refcount and using that in the BUG_ON condition is a  
> good one.

I will ask them to try that.

Thanks,
Badari
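
A minimal userspace sketch of the suggestion quoted above: copy the flag out while the reference is still held, drop the reference, and evaluate the check only against the sampled copy. `struct toy_dio`, `toy_dio_put()` and `toy_sync_leak()` are hypothetical names made up for illustration; the real struct dio and its locking in fs/direct-io.c are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dio; only the two fields discussed. */
struct toy_dio {
	int refcount;
	bool is_async;
};

/*
 * Drop one reference and report how many remain.
 * is_async is copied out BEFORE the decrement, while our own reference
 * still pins the object. (The kernel code does the decrement under
 * dio->bio_lock; locking is left out of this single-threaded sketch.)
 */
static int toy_dio_put(struct toy_dio *dio, bool *was_async)
{
	assert(dio != NULL && dio->refcount > 0);
	*was_async = dio->is_async;
	return --dio->refcount;
}

/*
 * The condition the BUG_ON() tested, expressed on the sampled value so
 * that nothing dereferences the object after the reference is dropped.
 */
static bool toy_sync_leak(bool was_async, int remaining)
{
	return !was_async && remaining != 0;
}
```

With a synchronous object holding two references, the first put leaves one behind and `toy_sync_leak()` reports it, which is the "ret2 == 1" situation described in this thread; after the second put the check is quiet.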


Re: [PATCH] dio: remove bogus refcounting BUG_ON

2007-07-04 Thread Badari Pulavarty

On Tue, 2007-07-03 at 15:28 -0700, Zach Brown wrote:
> Linus, Andrew, please apply the bug fix patch at the end of this reply
> for .22.
> 
> > >>One of our perf. team ran into this while doing some runs.
> > >>I didn't see anything obvious - it looks like we converted
> > >>async IO to synchronous one. I didn't spend much time digging
> > >>around.
> 
> OK, I think this BUG_ON() is just broken.  I wasn't able to find any
> obvious bugs from reading the code which would cause the BUG_ON() to
> fire.  If it's reproducible I'd love to hear what the recipe is.
> 
> I did notice that this BUG_ON() is evaluating dio after having dropped
> its ref :/.  So it's not completely absurd to fear that it's a race
> with the dio's memory being reused, but that'd be a pretty tight race.
> 
> Let's remove this stupid BUG_ON and see if that test box still has
> trouble.  It might just hit the valid BUG_ON a few lines down, but this
> unsafe BUG_ON needs to go.

I went through the code multiple times and can't find how we could trigger
the BUG_ON(). Unfortunately, our perf. team is able to reproduce the
problem. Debugging indicated that ret2 == 1 :(

Not sure how that can happen. Ideas?

Thanks,
Badari

> 
> ---
> 
> dio: remove bogus refcounting BUG_ON
> 
> Badari Pulavarty reported a case of this BUG_ON triggering during
> testing.  It's completely bogus and should be removed.
> 
> It's trying to notice if we left references to the dio hanging around in
> the sync case.  They should have been dropped as IO completed while this
> path was in dio_await_completion().  This condition will also be
> checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few
> lines lower.  So to start this BUG_ON() is redundant.
> 
> More fatally, it's dereferencing dio-> after having dropped its
> reference.  It's only safe to dereference the dio after releasing the
> lock if the final reference was just dropped.  Another CPU might free
> the dio in bio completion and reuse the memory after this path drops the
> dio lock but before the BUG_ON() is evaluated.
> 
> This patch passed aio+dio regression unit tests and aio-stress on ext3.
> 
> Signed-off-by: Zach Brown <[EMAIL PROTECTED]>
> Cc: Badari Pulavarty <[EMAIL PROTECTED]>
> 
> diff -r 509ce354ae1b fs/direct-io.c
> --- a/fs/direct-io.c  Sun Jul 01 22:00:49 2007 +
> +++ b/fs/direct-io.c  Tue Jul 03 14:56:41 2007 -0700
> @@ -1106,7 +1106,7 @@ direct_io_worker(int rw, struct kiocb *i
>   spin_lock_irqsave(&dio->bio_lock, flags);
>   ret2 = --dio->refcount;
>   spin_unlock_irqrestore(&dio->bio_lock, flags);
> - BUG_ON(!dio->is_async && ret2 != 0);
> +
>   if (ret2 == 0) {
>   ret = dio_complete(dio, offset, ret);
>   kfree(dio);


Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support

2007-07-02 Thread Badari Pulavarty

On Thu, 2007-06-14 at 13:06 -0700, Andrew Morton wrote:
> On Thu, 14 Jun 2007 12:38:39 -0700
> [EMAIL PROTECTED] wrote:
> 
> > This patchset cleans up the page cache handling by replacing
> > open coded shifts and adds through inline function calls.
> 

Some of us (crazy) people are trying to support read() for hugetlbfs
in order to get oprofile to work on large-page-backed executables set up
by libhugetlbfs.

Currently, I can't use any of the generic support. I have this ugly patch
to get oprofile to work. Christoph's cleanups would allow me to set a
per-mapping page size and get this to work without any hacks.

Thanks,
Badari

 fs/hugetlbfs/inode.c |  117 +++
 1 file changed, 117 insertions(+)

Index: linux/fs/hugetlbfs/inode.c
===
--- linux.orig/fs/hugetlbfs/inode.c 2007-05-18 04:16:27.0 -0700
+++ linux/fs/hugetlbfs/inode.c  2007-06-22 10:46:09.0 -0700
@@ -160,6 +160,122 @@ full_search:
 #endif
 
 /*
+ * Support for read()
+ */
+static int
+hugetlbfs_read_actor(struct page *page, unsigned long offset,
+   char __user *buf, unsigned long count,
+   unsigned long size)
+{
+   char *kaddr;
+   unsigned long to_copy;
+   int i, chunksize;
+
+   if (size > count)
+   size = count;
+
+   /* Find which 4k chunk and offset with in that chunk */
+   i = offset >> PAGE_CACHE_SHIFT;
+   offset = offset & ~PAGE_CACHE_MASK;
+   to_copy = size;
+
+   while (to_copy) {
+   chunksize = PAGE_CACHE_SIZE;
+   if (offset)
+   chunksize -= offset;
+   if (chunksize > to_copy)
+   chunksize = to_copy;
+
+#if 0
+printk("Copying i=%d page: %p offset %d chunk %d\n", i, &page[i], offset, chunksize);
+#endif
+   kaddr = kmap(&page[i]);
+   memcpy(buf, kaddr + offset, chunksize);
+   kunmap(&page[i]);
+   offset = 0;
+   to_copy -= chunksize;
+   buf += chunksize;
+   i++;
+   }
+   return size;
+}
+
+
+ssize_t
+hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
+{
+   struct address_space *mapping = filp->f_mapping;
+   struct inode *inode = mapping->host;
+   unsigned long index = *ppos >> HPAGE_SHIFT;
+   unsigned long end_index;
+   loff_t isize;
+   unsigned long offset;
+   ssize_t retval = 0;
+
+   /* validate user buffer and len */
+   if (len == 0)
+   goto out;
+
+   isize = i_size_read(inode);
+   if (!isize)
+   goto out;
+
+   offset = *ppos & ~HPAGE_MASK;
+   end_index = (isize - 1) >> HPAGE_SHIFT;
+   for (;;) {
+   struct page *page;
+   unsigned long nr, ret;
+
+   /* nr is the maximum number of bytes to copy from this page */
+   nr = HPAGE_SIZE;
+   if (index >= end_index) {
+   if (index > end_index)
+   goto out;
+   nr = ((isize - 1) & ~HPAGE_MASK) + 1;
+   if (nr <= offset) {
+   goto out;
+   }
+   }
+   nr = nr - offset;
+
+   /* Find the page */
+   page = find_get_page(mapping, index);
+   if (unlikely(page == NULL)) {
+   /*
+* We can't find the page in the cache - bail out
+* TODO - should we zero out the user buffer ?
+*/
+   goto out;
+   }
+#if 0
+printk("Found page %p at index %d offset %d nr %d\n", page, index, offset, nr);
+#endif
+
+   /*
+* Ok, we have the page, so now we can copy it to user space...
+*/
+   ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
+   if (ret < 0) {
+   retval = retval ? : ret;
+   goto out;
+   }
+
+   offset += ret;
+   retval += ret;
+   len -= ret;
+   index += offset >> HPAGE_SHIFT;
+   offset &= ~HPAGE_MASK;
+
+   page_cache_release(page);
+   if (ret == nr && len)
+   continue;
+   goto out;
+   }
+out:
+   return retval;
+}
+
+/*
  * Read a page. Again trivial. If it didn't already exist
  * in the page cache, it is zero-filled.
  */
@@ -565,6 +681,7 @@ static void init_once(void *foo, kmem_ca
 }
 
 struct file_operations hugetlbfs_file_operations = {
+   .read   = hugetlbfs_read,
.mmap   = hugetlbfs_file_mmap,
.fsync  = simple_sync_file,
.get_unmapped_area  = hugetlb_get_unmapped_area,
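
The chunking loop in hugetlbfs_read_actor() above can be tried out in userspace. The following is a minimal sketch, assuming a 4k base page and a source buffer laid out as consecutive base pages; `toy_chunked_copy()` and the `TOY_` constants are hypothetical names, and a plain memcpy() stands in for the copy into the user buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  ((size_t)1 << TOY_PAGE_SHIFT)

/*
 * Copy `size` bytes starting at byte `offset` of a large region made of
 * consecutive base pages, one base page at a time. Only the first chunk
 * can start in the middle of a page; every later chunk starts at 0.
 * The number of chunks used is stored in *nchunks. Returns `size`.
 */
static size_t toy_chunked_copy(char *dst, const char *src_pages,
			       size_t offset, size_t size, size_t *nchunks)
{
	size_t page = offset >> TOY_PAGE_SHIFT;		/* which base page */
	size_t in_page = offset & (TOY_PAGE_SIZE - 1);	/* offset inside it */
	size_t left = size;
	size_t chunks = 0;

	assert(dst != NULL && src_pages != NULL && nchunks != NULL);
	while (left) {
		size_t chunk = TOY_PAGE_SIZE - in_page;

		if (chunk > left)
			chunk = left;
		memcpy(dst, src_pages + page * TOY_PAGE_SIZE + in_page, chunk);
		dst += chunk;
		left -= chunk;
		in_page = 0;
		page++;
		chunks++;
	}
	*nchunks = chunks;
	return size;
}
```

Copying 5000 bytes from offset 4000 takes three chunks in this sketch: 96 bytes to finish the first page, one full page of 4096 bytes, and the remaining 808 bytes.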



Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-13 Thread Badari Pulavarty

On Wed, 2007-06-13 at 13:43 +0200, Nick Piggin wrote:
..
>  
> > 5) ext3_write_end:
> > Before the write_begin/write_end patch set we had the following locking
> > order:
> > stop_journal(handle);
> > unlock_page(page);
> > But now the order is the opposite:
> > unlock_page(page);
> > stop_journal(handle);
> > Can we get a race condition now? I'm not sure whether it is an actual
> > problem; maybe somebody can describe this.
> 
> Can we just change it to the original order? That would seem to be
> safest unless one of the ext3 devs explicitly acks it.

It would be nice to go back to the original order, but it's not that
simple with the current structure of the code. With Nick's patches,
unlock_page() happens in generic_write_end(), and journal_stop()
needs to happen after generic_write_end(). :(

Mingming, can you take a look at the current and proposed order?
I ran into a bunch of races when I tried to change the order for
->writepages() support earlier :(

Thanks,
Badari
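
The structural constraint described above can be shown with a toy trace. This is only a sketch: the `toy_` functions are hypothetical stand-ins that record the order of the two steps, not the ext3 or generic write_end code.

```c
#include <assert.h>

enum toy_step { TOY_UNLOCK_PAGE, TOY_JOURNAL_STOP };

struct toy_trace {
	enum toy_step steps[4];
	int count;
};

static void toy_record(struct toy_trace *t, enum toy_step s)
{
	assert(t->count < 4);
	t->steps[t->count++] = s;
}

/* Stand-in for the generic helper: the page unlock happens inside it. */
static void toy_generic_write_end(struct toy_trace *t)
{
	toy_record(t, TOY_UNLOCK_PAGE);
}

static void toy_journal_stop(struct toy_trace *t)
{
	toy_record(t, TOY_JOURNAL_STOP);
}

/*
 * A filesystem write_end that wraps the generic helper. Because the
 * unlock is buried in the helper and the journal stop has to follow the
 * helper, the recorded order is always unlock first, stop second.
 */
static void toy_fs_write_end(struct toy_trace *t)
{
	toy_generic_write_end(t);
	toy_journal_stop(t);
}
```

Getting the old order back in this model would mean pulling the unlock out of the helper, which is the restructuring the message refers to.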


Re: [PATCH] shm: Fix the filename of hugetlb sysv shared memory

2007-06-11 Thread Badari Pulavarty

On Mon, 2007-06-11 at 11:11 -0700, Andrew Morton wrote:
> On Fri, 08 Jun 2007 17:43:34 -0600
> [EMAIL PROTECTED] (Eric W. Biederman) wrote:
> 
> > Some user space tools need to identify SYSV shared memory when
> > examining /proc//maps.  To do so they look for a block device
> > with major zero, a dentry named SYSV, and having the minor of
> > the internal sysv shared memory kernel mount.
> > 
> > To help these tools and to make it easier for people just browsing
> > /proc//maps this patch modifies hugetlb sysv shared memory to
> > use the SYSV dentry naming convention.
> > 
> > User space tools will still have to be aware that hugetlb sysv
> > shared memory lives on a different internal kernel mount and so
> > has a different block device minor number from the rest of sysv
> > shared memory.
> 
> So..  I am sitting here believing that this patch and Badari's
> restore-shmid-as-inode-to-fix-proc-pid-maps-abi-breakage.patch are both
> needed in 2.6.22 and that they will fix all these issues up.
> 
> If that is untrue, someone please let us know..

Andrew,

My restore-shmid-as-inode-to-fix-proc-pid-maps-abi-breakage.patch is
definitely needed for 2.6.22 to fix the ABI issue.

Eric's patch goes further and provides the same naming convention for
hugetlbfs-backed shm segments (which we never did in the past). So
it's not absolutely needed for 2.6.22. You can queue it up for the next
release, unless Albert really wants to extend the proc-ps utils for
hugetlbfs segments too.

But it's a very simple patch, so you might as well push this too.

Thanks,
Badari


Re: [PATCH -mm] fix create_new_namespaces() return value

2007-06-11 Thread Badari Pulavarty




Cedric Le Goater wrote:

> The following patch modifies create_new_namespaces() to also use the
> errors returned by the copy_*_ns routines and not to systematically
> return ENOMEM.

In my initial version, I did the same. It doesn't work :(

The copy_*_ns() routines don't return any errors. All they return is NULL
in case of a failure, and with the exception of copy_mnt_ns() there are no
other failure cases. So there is no way to find out from
create_new_namespaces() why a copy_*_ns() routine failed.
If you really, really want to do this, change all the copy_*_ns() routines
to return meaningful errors instead of NULL.
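
To make the distinction concrete, here is a small userspace imitation of the error-pointer idiom that the quoted patch relies on: a routine that returns an encoded errno lets its caller report the specific failure, while a bare NULL return carries no reason. All `toy_` names are hypothetical; the helpers only mimic the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() and are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int toy_is_err(const void *ptr)
{
	/* The top TOY_MAX_ERRNO addresses are reserved for error codes. */
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_ns {
	int id;
};

/* Hypothetical copy routine: returns a new object or an encoded errno. */
static struct toy_ns *toy_copy_ns(int simulate_errno)
{
	struct toy_ns *ns;

	if (simulate_errno)
		return toy_err_ptr(-simulate_errno);
	ns = malloc(sizeof(*ns));
	if (!ns)
		return toy_err_ptr(-ENOMEM);
	ns->id = 1;
	return ns;
}

/* The caller passes on the specific error instead of a blanket -ENOMEM. */
static long toy_create(int simulate_errno, struct toy_ns **out)
{
	struct toy_ns *ns = toy_copy_ns(simulate_errno);

	assert(out != NULL);
	if (toy_is_err(ns))
		return toy_ptr_err(ns);
	*out = ns;
	return 0;
}
```

If `toy_copy_ns()` returned NULL on failure instead, `toy_create()` could only guess at the cause, which is the situation described in the reply above.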




> Signed-off-by: Cedric Le Goater <[EMAIL PROTECTED]>
> Cc: Serge E. Hallyn <[EMAIL PROTECTED]>
> Cc: Badari Pulavarty <[EMAIL PROTECTED]>
> Cc: Pavel Emelianov <[EMAIL PROTECTED]>
> Cc: Herbert Poetzl <[EMAIL PROTECTED]>
> Cc: Eric W. Biederman <[EMAIL PROTECTED]>
> ---
> kernel/nsproxy.c |   23 +--
> 1 file changed, 17 insertions(+), 6 deletions(-)
>
> Index: 2.6.22-rc4-mm2/kernel/nsproxy.c
> ===
> --- 2.6.22-rc4-mm2.orig/kernel/nsproxy.c
> +++ 2.6.22-rc4-mm2/kernel/nsproxy.c
> @@ -58,30 +58,41 @@ static struct nsproxy *create_new_namesp
> struct fs_struct *new_fs)
> {
> struct nsproxy *new_nsp;
> +   int err;
>
> new_nsp = clone_nsproxy(tsk->nsproxy);
> if (!new_nsp)
> return ERR_PTR(-ENOMEM);
>
> new_nsp->mnt_ns = copy_mnt_ns(flags, tsk->nsproxy->mnt_ns, new_fs);
> -   if (IS_ERR(new_nsp->mnt_ns))
> +   if (IS_ERR(new_nsp->mnt_ns)) {
> +   err = PTR_ERR(new_nsp->mnt_ns);
> goto out_ns;
> +   }
>
> new_nsp->uts_ns = copy_utsname(flags, tsk->nsproxy->uts_ns);
> -   if (IS_ERR(new_nsp->uts_ns))
> +   if (IS_ERR(new_nsp->uts_ns)) {
> +   err = PTR_ERR(new_nsp->uts_ns);
> goto out_uts;
> +   }
>
> new_nsp->ipc_ns = copy_ipcs(flags, tsk->nsproxy->ipc_ns);
> -   if (IS_ERR(new_nsp->ipc_ns))
> +   if (IS_ERR(new_nsp->ipc_ns)) {
> +   err = PTR_ERR(new_nsp->ipc_ns);
> goto out_ipc;
> +   }
>
> new_nsp->pid_ns = copy_pid_ns(flags, tsk->nsproxy->pid_ns);
> -   if (IS_ERR(new_nsp->pid_ns))
> +   if (IS_ERR(new_nsp->pid_ns)) {
> +   err = PTR_ERR(new_nsp->pid_ns);
> goto out_pid;
> +   }
>
> new_nsp->user_ns = copy_user_ns(flags, tsk->nsproxy->user_ns);
> -   if (IS_ERR(new_nsp->user_ns))
> +   if (IS_ERR(new_nsp->user_ns)) {
> +   err = PTR_ERR(new_nsp->user_ns);


Hmm.. copy_user_ns() ? I don't see this in rc4-mm2.

Thanks,
Badari



Re: [PATCH] shm: Fix the filename of hugetlb sysv shared memory

2007-06-08 Thread Badari Pulavarty




Andrew Morton wrote:

> On Fri, 08 Jun 2007 17:43:34 -0600
> [EMAIL PROTECTED] (Eric W. Biederman) wrote:
>
> > Some user space tools need to identify SYSV shared memory when
> > examining /proc//maps.  To do so they look for a block device
> > with major zero, a dentry named SYSV, and having the minor of
> > the internal sysv shared memory kernel mount.
> >
> > To help these tools and to make it easier for people just browsing
> > /proc//maps this patch modifies hugetlb sysv shared memory to
> > use the SYSV dentry naming convention.
> >
> > User space tools will still have to be aware that hugetlb sysv
> > shared memory lives on a different internal kernel mount and so
> > has a different block device minor number from the rest of sysv
> > shared memory.
>
> I assume this fix is preferred over Badari's?  If so, why?

No. You still need my patch to fix the current breakage.

This patch makes hugetlbfs also use the same naming convention as regular
shmem for its name. This is not absolutely needed; it's a nice-to-have.
Currently, user space tools can't depend on the filename alone, since it's
not unique (it is based on the key).

Thanks,
Badari







[PATCH] Restore shmid as inode# to fix /proc/pid/maps ABI breakage

2007-06-08 Thread Badari Pulavarty

Andrew,

Can you include this in -mm ?

Thanks,
Badari

shmid used to be stored as the inode# for shared memory segments. Some of
the proc-ps tools use this from /proc/pid/maps.  Recent cleanups
to newseg() changed it.  This patch sets the inode number back to the
shared memory id to fix the breakage.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>

Index: linux-2.6.22-rc4/ipc/shm.c
===
--- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-08 15:17:20.0 -0700
+++ linux-2.6.22-rc4/ipc/shm.c  2007-06-08 15:19:38.0 -0700
@@ -397,6 +397,11 @@ static int newseg (struct ipc_namespace 
 	shp->shm_nattch = 0;
 	shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
 	shp->shm_file = file;
+	/*
+	 * shmid gets reported as "inode#" in /proc/pid/maps.
+	 * proc-ps tools use this. Changing this will break them.
+	 */
+	file->f_dentry->d_inode->i_ino = shp->id;
 
 	ns->shm_tot += numpages;
 	shm_unlock(shp);
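
For readers who want to see the user-space side of this dependency, here is a minimal sketch of how a tool might pull the inode field out of a /proc/pid/maps line and compare it with a shmid taken from "ipcs -m". The function names are hypothetical and this is not how any particular procps tool is implemented; the sample line used in the note below is made up.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Pull the inode field out of one /proc/<pid>/maps line.
 * Field order: address range, perms, offset, dev (major:minor), inode, path.
 * Returns 0 on success, -1 if the line does not parse.
 */
static int toy_maps_inode(const char *line, unsigned long *inode)
{
	unsigned long start, end, offset;
	unsigned int major, minor;
	char perms[5];

	assert(line != NULL && inode != NULL);
	if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu",
		   &start, &end, perms, &offset, &major, &minor, inode) != 7)
		return -1;
	return 0;
}

/* True when the mapping's inode number equals the given shmid. */
static int toy_maps_line_matches_shmid(const char *line, int shmid)
{
	unsigned long inode;

	if (toy_maps_inode(line, &inode) != 0)
		return 0;
	return inode == (unsigned long)shmid;
}
```

Given an invented line such as "40000000-40001000 rw-s 00000000 00:09 32768 /SYSV00000000 (deleted)", the sketch matches shmid 32768 and nothing else. That match only works if the kernel stores the shmid as the inode number, which is what the patch above restores.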



Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-08 Thread Badari Pulavarty




Eric W. Biederman wrote:

> At this point given that we actually have a small user space dependency
> and the fact that after I have reviewed the code it looks harmless to
> change the inode number of those inodes, in both cases they are just
> anonymous inodes generated with new_inode, and anything that we wrap
> is likely to be equally so.
>
> So it looks to me like we need to do three things:
> - Fix the inode number

Okay. It's already done.

> - Fix the name on the hugetlbfs dentry to hold the key

I don't see the need to do this for hugetlbfs inodes. Currently, they
don't base their name on the "key", and basing it on the "key" is kind of
useless anyway (it's not unique).

> - Add a big fat comment that user space programs depend on this
>   behavior of both the dentry name and the inode number.

I don't think user space can depend on the dentry name. It can only
depend on the inode# to match the shmid (since the key is not unique,
esp. for key=0x).

BTW, I agree that shmid is not unique even without namespaces, as it's
based on seq# and we wrap seq#.

Thanks,
Badari





Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-07 Thread Badari Pulavarty




Serge E. Hallyn wrote:

> Quoting Serge E. Hallyn ([EMAIL PROTECTED]):
> > Quoting Badari Pulavarty ([EMAIL PROTECTED]):
> > > On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
> > > > Quoting Badari Pulavarty ([EMAIL PROTECTED]):
> > > > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > > > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > > > > Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > > > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > > > > > > >
> > > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > > > > >
> > > > > > > > It is not at all nice.
> > > > > > > >
> > > > > > > > 1. it's incompatible ABI breakage
> > > > > > > > 2. where will you put the key then, in the inode? :-)
> > > > > > >
> > > > > > > Nope. Currently "key" is part of the name (but its not unique).
> > > > > > >
> > > > > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > > > > is not some random sandbox to be playing in.
> > > > > > > >
> > > > > > > > Before you go messing with it, note that the device number
> > > > > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > > > > That's how one knows that /SYSV is not just
> > > > > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > > > > (and no you can't fix that now; it's way too late)
> > > > > > > >
> > > > > > > > Next time you feel like breaking an ABI, mind putting
> > > > > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > > > > >
> > > > > > > I am not breaking ABI. Its already broken in the current
> > > > > > > mainline. I am trying to fix it by putting back the ino#
> > > > > > > as shmid. Eric had a suggestion that, instead of depending
> > > > > > > on the inode# to be shmid, we could embed shmid into name
> > > > > > > (instead of "key" which is currently not unique).
> > > > > > >
> > > > > > > > BTW, I suspect this kind of thing also breaks:
> > > > > > > > a. fuser, lsof, and other resource usage display tools
> > > > > > > > b. various obscure emulators (similar to valgrind)
> > > > > > >
> > > > > > > If you strongly feel that "old" behaviour needs to be retained,
> > > > > >
> > > > > > yup, we should put it back.  The change was, afaik, accidental.
> > > > > >
> > > > > > > here is the patch I originally suggested.
> > > > > >
> > > > > > Confused.  Will this one-liner fix all the userspace breakage to which
> > > > > > Albert refers?
> > > > >
> > > > > Yes. Albert, please correct me if I am wrong.
> > > >
> > > > It will, but could lead to two different inodes with the same i_ino,
> > > > right?
> > >
> > > Only if we generate same ID in two different namespaces. Is it currently
> > > possible ?
> >
> > Should be nothing stopping it.
>
> (just to be more certain, a quick test showed I can get id 0 for
> different keys, and different ids for the same key 0xff, in different
> ipc namespaces)

Funny. I played with it and decided that it can happen :)

Thanks,
Badari




Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-07 Thread Badari Pulavarty




Serge E. Hallyn wrote:

> Quoting Badari Pulavarty ([EMAIL PROTECTED]):
> > On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
> > > Quoting Badari Pulavarty ([EMAIL PROTECTED]):
> > > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > > > Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > > > >
> > > > > > > It is not at all nice.
> > > > > > >
> > > > > > > 1. it's incompatible ABI breakage
> > > > > > > 2. where will you put the key then, in the inode? :-)
> > > > > >
> > > > > > Nope. Currently "key" is part of the name (but its not unique).
> > > > > >
> > > > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > > > is not some random sandbox to be playing in.
> > > > > > >
> > > > > > > Before you go messing with it, note that the device number
> > > > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > > > That's how one knows that /SYSV is not just
> > > > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > > > (and no you can't fix that now; it's way too late)
> > > > > > >
> > > > > > > Next time you feel like breaking an ABI, mind putting
> > > > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > > > >
> > > > > > I am not breaking ABI. Its already broken in the current
> > > > > > mainline. I am trying to fix it by putting back the ino#
> > > > > > as shmid. Eric had a suggestion that, instead of depending
> > > > > > on the inode# to be shmid, we could embed shmid into name
> > > > > > (instead of "key" which is currently not unique).
> > > > > >
> > > > > > > BTW, I suspect this kind of thing also breaks:
> > > > > > > a. fuser, lsof, and other resource usage display tools
> > > > > > > b. various obscure emulators (similar to valgrind)
> > > > > >
> > > > > > If you strongly feel that "old" behaviour needs to be retained,
> > > > >
> > > > > yup, we should put it back.  The change was, afaik, accidental.
> > > > >
> > > > > > here is the patch I originally suggested.
> > > > >
> > > > > Confused.  Will this one-liner fix all the userspace breakage to which
> > > > > Albert refers?
> > > >
> > > > Yes. Albert, please correct me if I am wrong.
> > >
> > > It will, but could lead to two different inodes with the same i_ino,
> > > right?
> >
> > Only if we generate same ID in two different namespaces. Is it currently
> > possible ?
>
> Should be nothing stopping it.
>
> But like I say we never find the inode based on i_ino, and don't hash
> the inode, so it might be ok.

Correct. We might end up with the same shmid, which means the same inode#
shows up in /proc/pid/maps. If we don't unshare the pid namespace, or if we
look from the parent namespace, we will end up seeing the same shmid/inode#
in different /proc/pid/maps files, even though the segments are different.
But I guess it's okay.


Thanks,
Badari




Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-07 Thread Badari Pulavarty

On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
> Quoting Badari Pulavarty ([EMAIL PROTECTED]):
> > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > > 
> > > > > It is not at all nice.
> > > > > 
> > > > > 1. it's incompatible ABI breakage
> > > > > 2. where will you put the key then, in the inode? :-)
> > > > 
> > > > Nope. Currently "key" is part of the name (but its not unique).
> > > > 
> > > > > 
> > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > is not some random sandbox to be playing in.
> > > > > 
> > > > > Before you go messing with it, note that the device number
> > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > That's how one knows that /SYSV is not just
> > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > (and no you can't fix that now; it's way too late)
> > > > > 
> > > > > Next time you feel like breaking an ABI, mind putting
> > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > > 
> > > > I am not breaking ABI. Its already broken in the current
> > > > mainline. I am trying to fix it by putting back the ino#
> > > > as shmid. Eric had a suggestion that, instead of depending
> > > > on the inode# to be shmid, we could embed shmid into name
> > > > (instead of "key" which is currently not unique).
> > > > 
> > > > > BTW, I suspect this kind of thing also breaks:
> > > > > a. fuser, lsof, and other resource usage display tools
> > > > > b. various obscure emulators (similar to valgrind)
> > > > 
> > > > If you strongly feel that "old" behaviour needs to be retained, 
> > > 
> > > yup, we should put it back.  The change was, afaik, accidental.
> > > 
> > > > here is the patch I originally suggested.
> > > 
> > > Confused.  Will this one-liner fix all the userspace breakage to which
> > > Albert refers?
> > 
> > Yes. Albert, please correct me if I am wrong.
> 
> It will, but could lead to two different inodes with the same i_ino,
> right?

Only if we generate same ID in two different namespaces. Is it currently
possible ? 

Thanks,
Badari



Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-07 Thread Badari Pulavarty

On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> On Thu, 07 Jun 2007 10:06:37 -0700
> Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > > 
> > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > of name instead of forcing to be as inode number. It should be
> > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > 
> > > It is not at all nice.
> > > 
> > > 1. it's incompatible ABI breakage
> > > 2. where will you put the key then, in the inode? :-)
> > 
> > Nope. Currently "key" is part of the name (but its not unique).
> > 
> > > 
> > > Changing to "SYSVID%d" is no good either. Look, people
> > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > is not some random sandbox to be playing in.
> > > 
> > > Before you go messing with it, note that the device number
> > > also matters. (it's per-boot dynamic, but that's OK)
> > > That's how one knows that /SYSV is not just
> > > a regular file; sadly these didn't get a non-/ prefix.
> > > (and no you can't fix that now; it's way too late)
> > > 
> > > Next time you feel like breaking an ABI, mind putting
> > > "LET'S BREAK AN ABI!" in the subject of your email?
> > 
> > I am not breaking ABI. Its already broken in the current
> > mainline. I am trying to fix it by putting back the ino#
> > as shmid. Eric had a suggestion that, instead of depending
> > on the inode# to be shmid, we could embed shmid into name
> > (instead of "key" which is currently not unique).
> > 
> > > BTW, I suspect this kind of thing also breaks:
> > > a. fuser, lsof, and other resource usage display tools
> > > b. various obscure emulators (similar to valgrind)
> > 
> > If you strongly feel that "old" behaviour needs to be retained, 
> 
> yup, we should put it back.  The change was, afaik, accidental.
> 
> > here is the patch I originally suggested.
> 
> Confused.  Will this one-liner fix all the userspace breakage to which
> Albert refers?

Yes. Albert, please correct me if I am wrong.

Thanks,
Badari


> > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared 
> > memory (shmid). It was useful in debugging, but its changed recently. 
> > This patch sets inode number to shared memory id to match /proc/pid/maps.
> > 
> > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> > 
> > Index: linux-2.6.22-rc4/ipc/shm.c
> > ===
> > --- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.0 -0700
> > +++ linux-2.6.22-rc4/ipc/shm.c  2007-06-06 08:23:57.0 -0700
> > @@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace 
> > shp->shm_nattch = 0;
> > shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
> > shp->shm_file = file;
> > +   file->f_dentry->d_inode->i_ino = shp->id;
> >  
> > ns->shm_tot += numpages;
> > shm_unlock(shp);
> > 
> > 


Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-07 Thread Badari Pulavarty

On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> On 6/7/07, Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > BTW, I agree with Eric that its would be nice to use shmid as part
> > of name instead of forcing to be as inode number. It should be
> > possible for pmap to workout shmid from "key" or name. Isn't it ?
> 
> It is not at all nice.
> 
> 1. it's incompatible ABI breakage
> 2. where will you put the key then, in the inode? :-)

Nope. Currently "key" is part of the name (but its not unique).

> 
> Changing to "SYSVID%d" is no good either. Look, people
> are ***parsing*** this stuff in /proc. The /proc filesystem
> is not some random sandbox to be playing in.
> 
> Before you go messing with it, note that the device number
> also matters. (it's per-boot dynamic, but that's OK)
> That's how one knows that /SYSV is not just
> a regular file; sadly these didn't get a non-/ prefix.
> (and no you can't fix that now; it's way too late)
> 
> Next time you feel like breaking an ABI, mind putting
> "LET'S BREAK AN ABI!" in the subject of your email?

I am not breaking the ABI. It's already broken in current
mainline. I am trying to fix it by putting back the ino#
as shmid. Eric suggested that, instead of depending
on the inode# to be the shmid, we could embed the shmid into the name
(instead of the "key", which is currently not unique).

> BTW, I suspect this kind of thing also breaks:
> a. fuser, lsof, and other resource usage display tools
> b. various obscure emulators (similar to valgrind)

If you strongly feel that "old" behaviour needs to be retained, 
here is the patch I originally suggested.

Thanks,
Badari

"ino#" in /proc/pid/maps used to match the "ipcs -m" output for shared
memory (shmid). It was useful in debugging, but it changed recently.
This patch sets the inode number to the shared memory id so that
/proc/pid/maps matches again.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>

Index: linux-2.6.22-rc4/ipc/shm.c
===
--- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.0 -0700
+++ linux-2.6.22-rc4/ipc/shm.c  2007-06-06 08:23:57.0 -0700
@@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace 
 	shp->shm_nattch = 0;
 	shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
 	shp->shm_file = file;
+	file->f_dentry->d_inode->i_ino = shp->id;
 
 	ns->shm_tot += numpages;
 	shm_unlock(shp);
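
A minimal sketch of how the id produced by shm_buildid() relates to the slot index and the sequence number. It assumes the 2.6-era scheme id = SEQ_MULTIPLIER * seq + index, with SEQ_MULTIPLIER equal to IPCMNI (32768); that constant and the helper names build_id/split_id are assumptions for illustration and are not taken from the patch. The sample ids are the ones that ipcs reports in this thread.

```python
# Assumed value of IPCMNI (and thus SEQ_MULTIPLIER) in 2.6-era kernels.
SEQ_MULTIPLIER = 32768


def build_id(index, seq):
    """Compose a SysV IPC id from a slot index and a sequence number."""
    return SEQ_MULTIPLIER * seq + index


def split_id(ipc_id):
    """Split an id back into (seq, index) under the same assumption."""
    return divmod(ipc_id, SEQ_MULTIPLIER)


# shmid 884737 from the ipcs listing decodes to sequence 27, slot 1.
print(split_id(884737))  # (27, 1)
```

Under this assumption, forcing i_ino to shp->id makes the inode column of /proc/pid/maps carry both the slot and the sequence number, which is why it lines up with the shmid column of ipcs.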



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-07 Thread Badari Pulavarty

On Thu, 2007-06-07 at 00:53 -0400, Albert Cahalan wrote:
> On 6/6/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[EMAIL PROTECTED]> 
> > wrote:
> > > Eric W. Biederman writes:
> > > > Badari Pulavarty <[EMAIL PROTECTED]> writes:
> > >
> > > > >> Your recent cleanup to the shm code, namely
> > > > >>
> > > > >> [PATCH] shm: make sysv ipc shared memory use stacked files
> > > > >>
> > > > >> took away one of the debugging features for shm segments.
> > > > >> Originally, shmids were forced to be the inode numbers, and
> > > > >> they showed up in /proc/pid/maps for every process that mapped
> > > > >> the shared memory segment (vma listing). That made it easy
> > > > >> to find out which processes had mapped a given segment. Your
> > > > >> patchset took away the inode# setting, so we can no longer
> > > > >> match shm segments to /proc/pid/maps easily. (It was
> > > > >> really useful in tracking down a customer problem recently.)
> > > > >> Was this done deliberately? Is anything wrong with setting it back?
> > > >
> > > > Theoretically it makes the stacked file concept more brittle,
> > > > because it means the lower layers can't care about their inode
> > > > number.
> > > >
> > > > We do need something to tie these things together.
> > > >
> > > > So I suspect what makes most sense is to simply rename the
> > > > dentry SYSVID
> > >
> > > > Please stop breaking things in /proc. The pmap command relies
> > > > on the old behavior.
> >
> > What effect did this change have upon the pmap command?  Details, please.
> >
> > > It's time to revert.
> >
> > Probably true, but we'd need to understand what the impact was.
> 
> Very simply, pmap reports the shmid.
> 
> albert 0 ~$ pmap `pidof X` | egrep -2 shmid
> 3005  16384K rw-s-  /dev/fb0
> 3105152K rw---[ anon ]
> 31076000384K rw-s-[ shmid=0x3f428000 ]
> 310d6000384K rw-s-[ shmid=0x3f430001 ]
> 31136000384K rw-s-[ shmid=0x3f438002 ]
> 31196000384K rw-s-[ shmid=0x3f440003 ]
> 311f6000384K rw-s-[ shmid=0x3f448004 ]
> 31256000384K rw-s-[ shmid=0x3f450005 ]
> 312b6000384K rw-s-[ shmid=0x3f460006 ]
> 31316000384K rw-s-[ shmid=0x3f870007 ]
> 31491000140K r  /usr/share/fonts/type1/gsfonts/n021003l.pfb
> 3150e000   9496K rw---[ anon ]

pmap seems to get the shmid from the "ino#" field of /proc/pid/maps.
It's already broken in current mainline.
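
A minimal sketch of how a pmap-like tool could read the shmid out of the inode column, assuming the old behaviour where the inode number of a SysV shm mapping equals its shmid. The function name is hypothetical and this is not pmap's actual code; the field order (address, perms, offset, dev, inode, pathname) is the standard /proc/pid/maps layout, and the sample line is modelled on the grep output in this thread.

```python
def shmid_from_maps_line(line):
    """Return the inode (taken as the shmid) of a /SYSV mapping, else None."""
    # address perms offset dev inode [pathname]; the pathname may hold spaces.
    fields = line.split(None, 5)
    if len(fields) < 6:
        return None  # anonymous mapping, no pathname column
    path = fields[5].strip()
    if not path.startswith("/SYSV"):
        return None  # ordinary file or special region such as [heap]
    return int(fields[4])


sample = ("40004724000-40006724000 rw-s 00000000 00:08 851968 "
          "/SYSV083d0d74 (deleted)")
print(shmid_from_maps_line(sample))  # 851968
```

The sketch only looks at the pathname prefix. As noted elsewhere in the thread, the device number also matters for telling a real SysV segment from a regular file that happens to be named /SYSV..., so a real tool would check that field as well.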

But the breakage is not due to the namespaces or container effort :(
It's due to Eric's noble effort to clean up the shm code,
take out the hacks to handle hugetlbfs and make the code
more streamlined and readable.

If we really want the old behaviour, we need my one-line
patch to force the shmid as the inode# :(

BTW, I agree with Eric that it would be nice to use shmid as part
of the name instead of forcing it to be the inode number. It should
be possible for pmap to work out the shmid from the "key" or name,
shouldn't it?

Andrew/Linus, it's up to you to decide whether it's worth breaking.
Here is the patch to base the dentry name on shmid, so we don't
need to use ino# to identify the shmid.

Thanks,
Badari

Instead of basing dentry name on the shm "key", base it on
"shmid" - so it shows up clearly in /proc/pid/maps. Earlier
we were forcing ino# to match shmid.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
Index: linux-2.6.22-rc4/ipc/shm.c
===
--- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.0 -0700
+++ linux-2.6.22-rc4/ipc/shm.c  2007-06-06 13:43:36.0 -0700
@@ -364,6 +364,14 @@ static int newseg (struct ipc_namespace 
return error;
}
 
+   error = -ENOSPC;
+   id = shm_addid(ns, shp);
+   if(id == -1)
+   goto no_id;
+
+   /* Build an id, so we can use it for filename */
+   shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
+
if (shmflg & SHM_HUGETLB) {
/* hugetlb_zero_setup takes care of mlock user accounting */
file = hugetlb_zero_setup(size);
@@ -377,34 +385,28 @@ static int newseg (struct ipc_namespace 
if  ((shmflg & SHM_NORESERVE) &&
sysctl_overcommit_memory != OVERCOMMIT_NEVER)
acctflag = 0;
-   sprintf (name, "SYSV%08x", key);
+   sprintf (name, "SYSVID%d", shp-&

Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-06 Thread Badari Pulavarty

On Wed, 2007-06-06 at 11:02 -0600, Eric W. Biederman wrote:
> Badari Pulavarty <[EMAIL PROTECTED]> writes:
> 
> > Hi Eric,
> >
> > Your recent cleanup to the shm code, namely
> >
> > [PATCH] shm: make sysv ipc shared memory use stacked files
> >
> > took away one of the debugging features for shm segments.
> > Originally, shmids were forced to be the inode numbers, and
> > they showed up in /proc/pid/maps for every process that mapped
> > the shared memory segment (vma listing). That made it easy
> > to find out which processes had mapped a given segment. Your
> > patchset took away the inode# setting, so we can no longer
> > match shm segments to /proc/pid/maps easily. (It was
> > really useful in tracking down a customer problem recently.)
> > Was this done deliberately? Is anything wrong with setting it back?
> >
> > Comments ?
> >
> > Thanks,
> > Badari
> >
> > Without patch:
> > --
> >
> > # ipcs -m
> >
> > ------ Shared Memory Segments --------
> > key        shmid      owner      perms      bytes      nattch     status
> > 0x00000000 884737     db2inst1   767        33554432   13
> >
> > # grep 884737 /proc/*/maps
> > #
> >
> > With patch:
> > ---
> >
> > # ipcs -m
> >
> > ------ Shared Memory Segments --------
> > key        shmid      owner      perms      bytes      nattch     status
> > 0x00000000 884737     db2inst1   767        33554432   13
> >
> > # grep 884737 /proc/*/maps
> > /proc/0/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/1/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/2/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/3/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/4/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/5/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/6/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/7/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/8/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/11121/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/11122/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/11124/maps:4000389c000-4000589c000 rw-s  00:08 884737
> > /SYSV (deleted)
> > /proc/11575/maps:40006724000-40008724000 rw-s  00:08 884737
> > /SYSV (deleted)
> >
> >
> >
> > Here is the patch.
> >
> > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared 
> > memory (shmid). It was useful in debugging, but it changed recently.
> > This patch sets the inode number to the shared memory id, so that
> > /proc/pid/maps matches "ipcs -m" again.
> 
> Theoretically it makes the stacked file concept more brittle, because
> it means the lower layers can't care about their inode number.
> 
> We do need something to tie these things together.
> 
> So I suspect what makes most sense is to simply rename the dentry
> SYSVID

Yep. Currently, we use part of "key" as the dentry name. For example,

# ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x083d0d74 851968     db2inst1   767        33554432   13

# grep 83d0d74 /proc/*/maps
/proc/0/maps:40004724000-40006724000 rw-s  00:08 851968  
/SYSV083d0d74 (deleted)
/proc/1/maps:40004724000-40006724000 rw-s  00:08 851968  
/SYSV083d0d74 (deleted)
/proc/2/maps:40004724000-40006724000 rw-s  00:08 851968  
/SYSV083d0d74 (deleted)
/proc/3/maps:40004724000-40006724000 rw-s  00:08 851968  
/SYSV083d0d74 (deleted)
..

The issue is with the ones with key = 0x00000000, like the following:

# ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 884737     db2inst1   767        33554432   13
0x00000000 950275     db2fenc1   701        23052288   13

There is no unique way to identify them easily :(

I guess, as you suggested, we can change the dentry name to use the shmid
instead of part of the "key" to make it unique. I think I can
work out a patch for this.
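
A minimal sketch of why key-based names collide while shmid-based names do not. The "SYSV%08x" and "SYSVID%d" formats are the ones discussed in this thread; the helper names are hypothetical, and the (key, shmid) pairs are taken from the ipcs listing above, assuming both segments were created with key 0.

```python
def name_from_key(key):
    """Dentry name derived from the key (current behaviour per the thread)."""
    return "SYSV%08x" % key


def name_from_shmid(shmid):
    """Dentry name derived from the shmid (the proposed alternative)."""
    return "SYSVID%d" % shmid


# (key, shmid) pairs for the two segments that share key 0.
segments = [(0x0, 884737), (0x0, 950275)]

by_key = {name_from_key(key) for key, _ in segments}
by_id = {name_from_shmid(shmid) for _, shmid in segments}

# One distinct key-based name, two distinct shmid-based names.
print(len(by_key), len(by_id))  # 1 2
```

With key-based names both segments show up under the same name in /proc/pid/maps, so only the inode column (or a shmid-based name) can tell them apart.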

[RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-06 Thread Badari Pulavarty

Hi Eric,

Your recent cleanup to the shm code, namely

[PATCH] shm: make sysv ipc shared memory use stacked files

took away one of the debugging features for shm segments.
Originally, shmids were forced to be the inode numbers, and
they showed up in /proc/pid/maps for every process that mapped
the shared memory segment (vma listing). That made it easy
to find out which processes had mapped a given segment. Your
patchset took away the inode# setting, so we can no longer
match shm segments to /proc/pid/maps easily. (It was
really useful in tracking down a customer problem recently.)
Was this done deliberately? Is anything wrong with setting it back?

Comments ?

Thanks,
Badari

Without patch:
--

# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 884737     db2inst1   767        33554432   13

# grep 884737 /proc/*/maps
#

With patch:
---

# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 884737     db2inst1   767        33554432   13

# grep 884737 /proc/*/maps
/proc/0/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/1/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/2/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/3/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/4/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/5/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/6/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/7/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/8/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/11121/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/11122/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/11124/maps:4000389c000-4000589c000 rw-s  00:08 884737 
/SYSV (deleted)
/proc/11575/maps:40006724000-40008724000 rw-s  00:08 884737 
/SYSV (deleted)



Here is the patch.

"ino#" in /proc/pid/maps used to match "ipcs -m" output for shared 
memory (shmid). It was useful in debugging, but it changed recently.
This patch sets the inode number to the shared memory id, so that
/proc/pid/maps matches "ipcs -m" again.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>

Index: linux-2.6.22-rc4/ipc/shm.c
===
--- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.0 -0700
+++ linux-2.6.22-rc4/ipc/shm.c  2007-06-06 08:23:57.0 -0700
@@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace 
 	shp->shm_nattch = 0;
 	shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
 	shp->shm_file = file;
+	file->f_dentry->d_inode->i_ino = shp->id;
 
 	ns->shm_tot += numpages;
 	shm_unlock(shp);



Re: Linux 2.6.22-rc4

2007-06-05 Thread Badari Pulavarty

On Mon, 2007-06-04 at 20:50 -0700, Linus Torvalds wrote:
> So -rc4 is out there now, hopefully shrinking the regression list further. 
> 

Nothing serious, compile warnings ..

mm/sparse.c:244: warning: `__kmalloc_section_memmap' defined but not used
mm/sparse.c:274: warning: `__kfree_section_memmap' defined but not used

Here is the patch.

Thanks,
Badari

__kmalloc_section_memmap(), vaddr_in_vmalloc_area() and
 __kfree_section_memmap() are used only for MEMORY_HOTPLUG.
Moved them under CONFIG_MEMORY_HOTPLUG.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>

Index: linux-2.6.22-rc4/mm/sparse.c
===
--- linux-2.6.22-rc4.orig/mm/sparse.c   2007-06-04 17:57:25.0 -0700
+++ linux-2.6.22-rc4/mm/sparse.c2007-06-05 13:56:29.0 -0700
@@ -240,6 +240,27 @@ static struct page __init *sparse_early_
 	return NULL;
 }
 
+/*
+ * Allocate the accumulated non-linear sections, allocate a mem_map
+ * for each and record the physical to section mapping.
+ */
+void __init sparse_init(void)
+{
+	unsigned long pnum;
+	struct page *map;
+
+	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
+		if (!valid_section_nr(pnum))
+			continue;
+
+		map = sparse_early_mem_map_alloc(pnum);
+		if (!map)
+			continue;
+		sparse_init_one_section(__nr_to_section(pnum), pnum, map);
+	}
+}
+
+#ifdef CONFIG_MEMORY_HOTPLUG
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 {
 	struct page *page, *ret;
@@ -280,27 +301,6 @@ static void __kfree_section_memmap(struc
 }
 
 /*
- * Allocate the accumulated non-linear sections, allocate a mem_map
- * for each and record the physical to section mapping.
- */
-void __init sparse_init(void)
-{
-	unsigned long pnum;
-	struct page *map;
-
-	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
-		if (!valid_section_nr(pnum))
-			continue;
-
-		map = sparse_early_mem_map_alloc(pnum);
-		if (!map)
-			continue;
-		sparse_init_one_section(__nr_to_section(pnum), pnum, map);
-	}
-}
-
-#ifdef CONFIG_MEMORY_HOTPLUG
-/*
  * returns the number of sections whose mem_maps were properly
  * set.  If this is <=0, then that means that the passed-in
  * map was not consumed and must be freed.





Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-24 Thread Badari Pulavarty

On Thu, 2007-05-24 at 14:05 +0200, Jens Axboe wrote:
> On Thu, May 24 2007, Jens Axboe wrote:
> > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > SMP NR_CPUS=32 NUMA pSeries
> > > Modules linked in: qla2xxx scsi_transport_fc
> > > NIP: c00414a0 LR: c004162c CTR: 0001
> > > REGS: c47bb130 TRAP: 0300   Not tainted  (2.6.22-rc1)
> > > MSR: 80001032   CR: 2822  XER: 0008
> > > DAR: , DSISR: 4000
> > > TASK = c47a6aa0[0] 'swapper' THREAD: c47b8000 CPU: 7
> > > GPR00: 0080 c47bb3b0 c0692358 c47a6aa0
> > > GPR04:  0070  c05ac0b8
> > > GPR08: e4b4 0001  0280
> > > GPR12: 0020 c05a3e80  07a8dd70
> > > GPR16:    c47b8000
> > > GPR20: 3b9aca00 c47a6c50 0001 
> > > GPR24: d0909048 0001dee6d30c0d30 c3b7dd80 c47a6aa0
> > > GPR28: 0001079027ca c47a6aa0 c05b7cb0 c0472c9c
> > > NIP [c00414a0] .dequeue_task+0x0/0x9c
> > > LR [c004162c] .deactivate_task+0x40/0x60
> > > Call Trace:
> > > [c47bb3b0] [c004bccc] .printk+0x38/0x48 (unreliable)
> > > [c47bb440] [c0471704] .schedule+0x1fc/0x8dc
> > > [c47bb540] [c0472c9c] .schedule_timeout+0xa8/0xe8
> > > [c47bb610] [c0057260] .msleep+0x20/0x38
> > > [c47bb690] [c003f5ec] .eeh_dn_check_failure+0x114/0x268
> > > [c47bb740] [c003fc64] .eeh_check_failure+0xec/0x114
> > > [c47bb7c0] [d086190c] .qla2300_fw_dump+0x1130/0x1c00 
> > > [qla2xxx]
> > > [c47bb8a0] [d0858d50] .qla2300_intr_handler+0x1e8/0x60c 
> > > [qla2xxx]
> > > [c47bb950] [c0078368] .handle_IRQ_event+0x70/0xe4
> > > [c47bb9f0] [c007a7e0] .handle_fasteoi_irq+0x11c/0x1d0
> > > [c47bba90] [c000c178] .do_IRQ+0x90/0xec
> > > [c47bbb10] [c0004790] hardware_interrupt_entry+0x18/0x1c
> > 
> > Not good. The qla changes are non-trivial (that hardware has a really
> > funky sg setup), so I may have botched a part of it. I'll review the
> > qla changes and get back to you.
> 
> Does this help?
> 
> diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c

Yes. It does. For now, I have no more complaints :)

Thanks,
Badari




Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-22 Thread Badari Pulavarty

On Mon, 2007-05-21 at 08:35 +0200, Jens Axboe wrote:
> On Mon, May 21 2007, Jens Axboe wrote:
> > On Fri, May 18 2007, Badari Pulavarty wrote:
> > > On Fri, 2007-05-18 at 09:35 +0200, Jens Axboe wrote:
> > > > On Thu, May 17 2007, Badari Pulavarty wrote:
> > > > > On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> > > > > > On Wed, May 16 2007, Badari Pulavarty wrote:
> > > > > > > On Tue, 2007-05-15 at 19:50 +0200, Jens Axboe wrote:
> > > > > > > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > > > > > > On Tue, 2007-05-15 at 19:20 +0200, Jens Axboe wrote:
> > > > > > > > > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > > > > > > > > On Fri, 2007-05-11 at 15:51 +0200, Jens Axboe wrote:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > 
> > > > > > > > > > > > Updated version of the patch - this time I'll just 
> > > > > > > > > > > > attach the patch
> > > > > > > > > > > > file...
> > > > > > > > > > > 
> > > > > > > > > > > Missing scatterlist.h inclusions..
> > > > > > > > > > > 
> > > > > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c: In function ‘sym_scatter’:
> > > > > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: warning: implicit declaration of function ‘for_each_sg’
> > > > > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: error: expected ‘;’ before ‘{’ token
> > > > > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:375: warning: unused variable ‘tp’
> > > > > > > > > > > > make[3]: *** [drivers/scsi/sym53c8xx_2/sym_glue.o] Error 1
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c: In function ‘qla24xx_build_scsi_iocbs’:
> > > > > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: warning: implicit declaration of function ‘for_each_sg’
> > > > > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: error: expected ‘;’ before ‘{’ token
> > > > > > > > > > 
> > > > > > > > > > Thanks, will fix those. What arch? I tested it here.
> > > > > > > > > 
> > > > > > > > > I am playing with them on ppc64.
> > > > > > > > 
> > > > > > > > Ah ok, you need the updated patch series for ppc64 support. 
> > > > > > > > Builds fine
> > > > > > > > here on ppc64. See the #sglist branch of the block repo:
> > > > > > > > 
> > > > > > > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > > > > > > 
> > > > > > > > I can mail you an updated patch, if you want.
> > > > > > > 
> > > > > > > 
> > > > > > > Here is the whole panic stack..
> > > > > > 
> > > > > > Thanks will fix that up, the IDE part is totally untested. Can you 
> > > > > > try
> > > > > > and backout this patch and see if it boots?
> > > > > 
> > > > > I increased max_segments to 1024 on my qla2200 attached disks and
> > > > > simple "dd" (direct read) resulted in following:
> > > > > 
> > > > > elm3b29:/sys/block/sdd/queue # echo 1024 > max_segments
> > > > > elm3b29:/sys/block/sdd/queue # cat max_hw_sectors_kb > max_sectors_kb
> > > > > elm3b29:/mnt # dd iflag=direct if=.

Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-22 Thread Badari Pulavarty

On Mon, 2007-05-21 at 08:14 +0200, Jens Axboe wrote:
> On Fri, May 18 2007, Badari Pulavarty wrote:
> > On Fri, 2007-05-18 at 09:35 +0200, Jens Axboe wrote:
> > > On Thu, May 17 2007, Badari Pulavarty wrote:
> > > > On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> > > > > On Wed, May 16 2007, Badari Pulavarty wrote:
> > > > > > On Tue, 2007-05-15 at 19:50 +0200, Jens Axboe wrote:
> > > > > > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > > > > > On Tue, 2007-05-15 at 19:20 +0200, Jens Axboe wrote:
> > > > > > > > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > > > > > > > On Fri, 2007-05-11 at 15:51 +0200, Jens Axboe wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > Updated version of the patch - this time I'll just attach 
> > > > > > > > > > > the patch
> > > > > > > > > > > file...
> > > > > > > > > > 
> > > > > > > > > > Missing scatterlist.h inclusions..
> > > > > > > > > > 
> > > > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c: In function ‘sym_scatter’:
> > > > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: warning: implicit declaration of function ‘for_each_sg’
> > > > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: error: expected ‘;’ before ‘{’ token
> > > > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:375: warning: unused variable ‘tp’
> > > > > > > > > > > make[3]: *** [drivers/scsi/sym53c8xx_2/sym_glue.o] Error 1
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c: In function ‘qla24xx_build_scsi_iocbs’:
> > > > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: warning: implicit declaration of function ‘for_each_sg’
> > > > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: error: expected ‘;’ before ‘{’ token
> > > > > > > > > 
> > > > > > > > > Thanks, will fix those. What arch? I tested it here.
> > > > > > > > 
> > > > > > > > I am playing with them on ppc64.
> > > > > > > 
> > > > > > > Ah ok, you need the updated patch series for ppc64 support. 
> > > > > > > Builds fine
> > > > > > > here on ppc64. See the #sglist branch of the block repo:
> > > > > > > 
> > > > > > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > > > > > 
> > > > > > > I can mail you an updated patch, if you want.
> > > > > > 
> > > > > > 
> > > > > > Here is the whole panic stack..
> > > > > 
> > > > > Thanks will fix that up, the IDE part is totally untested. Can you try
> > > > > and backout this patch and see if it boots?
> > > > 
> > > > I increased max_segments to 1024 on my qla2200 attached disks and
> > > > simple "dd" (direct read) resulted in following:
> > > > 
> > > > elm3b29:/sys/block/sdd/queue # echo 1024 > max_segments
> > > > elm3b29:/sys/block/sdd/queue # cat max_hw_sectors_kb > max_sectors_kb
> > > > elm3b29:/mnt # dd iflag=direct if=./z of=/dev/null bs=512M
> > > > 
> > > > Unable to handle kernel paging request at 1008 RIP:
> > > >  [] __rmqueue+0x6f/0x120
> > > 
> > > Auch, that's a bug. I don't think the oom path has been tested yet,
> > > perhaps this is hitting it.
> > > 
> > > Can you try with this debug patch, plus enable the sla

Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-18 Thread Badari Pulavarty

On Fri, 2007-05-18 at 09:35 +0200, Jens Axboe wrote:
> On Thu, May 17 2007, Badari Pulavarty wrote:
> > On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> > > On Wed, May 16 2007, Badari Pulavarty wrote:
> > > > On Tue, 2007-05-15 at 19:50 +0200, Jens Axboe wrote:
> > > > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > > > On Tue, 2007-05-15 at 19:20 +0200, Jens Axboe wrote:
> > > > > > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > > > > > On Fri, 2007-05-11 at 15:51 +0200, Jens Axboe wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > Updated version of the patch - this time I'll just attach the 
> > > > > > > > > patch
> > > > > > > > > file...
> > > > > > > > 
> > > > > > > > Missing scatterlist.h inclusions..
> > > > > > > > 
> > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c: In function ‘sym_scatter’:
> > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: warning: implicit declaration of function ‘for_each_sg’
> > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: error: expected ‘;’ before ‘{’ token
> > > > > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:375: warning: unused variable ‘tp’
> > > > > > > > > make[3]: *** [drivers/scsi/sym53c8xx_2/sym_glue.o] Error 1
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c: In function ‘qla24xx_build_scsi_iocbs’:
> > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: warning: implicit declaration of function ‘for_each_sg’
> > > > > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: error: expected ‘;’ before ‘{’ token
> > > > > > > 
> > > > > > > Thanks, will fix those. What arch? I tested it here.
> > > > > > 
> > > > > > I am playing with them on ppc64.
> > > > > 
> > > > > Ah ok, you need the updated patch series for ppc64 support. Builds 
> > > > > fine
> > > > > here on ppc64. See the #sglist branch of the block repo:
> > > > > 
> > > > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > > > 
> > > > > I can mail you an updated patch, if you want.
> > > > 
> > > > 
> > > > Here is the whole panic stack..
> > > 
> > > Thanks will fix that up, the IDE part is totally untested. Can you try
> > > and backout this patch and see if it boots?
> > 
> > I increased max_segments to 1024 on my qla2200 attached disks and
> > simple "dd" (direct read) resulted in following:
> > 
> > elm3b29:/sys/block/sdd/queue # echo 1024 > max_segments
> > elm3b29:/sys/block/sdd/queue # cat max_hw_sectors_kb > max_sectors_kb
> > elm3b29:/mnt # dd iflag=direct if=./z of=/dev/null bs=512M
> > 
> > Unable to handle kernel paging request at 1008 RIP:
> >  [] __rmqueue+0x6f/0x120
> 
> Auch, that's a bug. I don't think the oom path has been tested yet,
> perhaps this is hitting it.
> 
> Can you try with this debug patch, plus enable the slab debugging
> helpers (like poisoning)?
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 7456992..a479d1e 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -793,6 +793,7 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd 
> *cmd, gfp_t gfp_mask)
>   return ret;
>  enomem:
>   if (ret) {
> + printk(KERN_ERR "scsi: failed to allocate sg table\n");
>   /*
>* Free entries chained off ret. Since we were trying to
>* allocate another sglist, we know that all entries are of
> 

Not much help. I get all kinds of weird panics. This time I got the
following (with the above debug patch):

general protection fault:  [1] SMP
CPU 1
Mo

Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-18 Thread Badari Pulavarty

On Fri, 2007-05-18 at 19:03 +0200, Jens Axboe wrote:
> On Fri, May 18 2007, Badari Pulavarty wrote:
> > On Fri, 2007-05-18 at 09:33 +0200, Jens Axboe wrote:
> > > On Thu, May 17 2007, Badari Pulavarty wrote:
> > > > On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> > > > .. 
> > > > > > > 
> > > > > > > Ah ok, you need the updated patch series for ppc64 support. 
> > > > > > > Builds fine
> > > > > > > here on ppc64. See the #sglist branch of the block repo:
> > > > > > > 
> > > > > > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > > > > > 
> > > > > > > I can mail you an updated patch, if you want.
> > > > > > 
> > > > > > 
> > > > > > Here is the whole panic stack..
> > > > > 
> > > > > Thanks will fix that up, the IDE part is totally untested. Can you try
> > > > > and backout this patch and see if it boots?
> > > > 
> > > > Yes. It boots fine with following backed out.
> > > > 
> > > > Looking at the code, hwif_init() in ide_probe.c is doing
> > > >
> > > > 	hwif->sg_table = kmalloc(sizeof(struct scatterlist)*hwif->sg_max_nents,
> > > > 				 GFP_KERNEL);
> > > >
> > > > blk_rq_map_sg() is looking for the chaining info and walking past the
> > > > end of the allocation.
> > > 
> > > Hmm, looks ok, I'm guessing it's just missing a memset (or just turn it
> > > into a kzalloc())?
> > > 
> > 
> > Even with backing out all the ide changes, I get this on boot
> > once in a while.
> 
> Yep, I think the ide changes are fine as such, the problem is the
> missing memset/kzalloc. Can you try that?

kzalloc() made it better. I haven't seen ide panics anymore. I will try
it again after applying ide patches.

Thanks
Badari



Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-18 Thread Badari Pulavarty

On Fri, 2007-05-18 at 09:33 +0200, Jens Axboe wrote:
> On Thu, May 17 2007, Badari Pulavarty wrote:
> > On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> > .. 
> > > > > 
> > > > > Ah ok, you need the updated patch series for ppc64 support. Builds 
> > > > > fine
> > > > > here on ppc64. See the #sglist branch of the block repo:
> > > > > 
> > > > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > > > 
> > > > > I can mail you an updated patch, if you want.
> > > > 
> > > > 
> > > > Here is the whole panic stack..
> > > 
> > > Thanks will fix that up, the IDE part is totally untested. Can you try
> > > and backout this patch and see if it boots?
> > 
> > Yes. It boots fine with following backed out.
> > 
Looking at the code, hwif_init() in ide_probe.c is doing

	hwif->sg_table = kmalloc(sizeof(struct scatterlist)*hwif->sg_max_nents,
				 GFP_KERNEL);

blk_rq_map_sg() is looking for the chaining info and walking past the
end of the allocation.
> 
> Hmm, looks ok, I'm guessing it's just missing a memset (or just turn it
> into a kzalloc())?
> 

Even with backing out all the ide changes, I get this on boot
once in a while.

Thanks,
Badari

ReiserFS: hda2: checking transaction log (hda2)
Unable to handle kernel paging request at 005e5e66 RIP:
 [] blk_rq_map_sg+0x71/0x1b0
PGD 0
Oops:  [1] SMP
CPU 3
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.22-rc1-sg #7
RIP: 0010:[]  [] blk_rq_map_sg
+0x71/0x1b0
RSP: :8101a024fcc8  EFLAGS: 00010287
RAX: 0001df33e000 RBX: 8101df2b5f70 RCX: 00019f352000
RDX:  RSI: 8101df228300 RDI: 001df33e
RBP: 8101a024fd28 R08: 04e2 R09: 
R10: 007f R11: 0001 R12: 005e5e46
R13: 1000 R14:  R15: 8101df2b5f60
FS:  () GS:8101c021f300()
knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 005e5e66 CR3: 00201000 CR4: 06e0
Process swapper (pid: 1, threadinfo 8101a0238000, task
810180238000)
Stack:  0003 810179c58000 00019f352000
810179c562c0
 8101df228e80 00170082 01ff81010001 8101df3207a8
 8078a500 810179c56000 8078a500 8101df3207a8
Call Trace:
   [] ide_map_sg+0x42/0xd0
 [] ide_build_sglist+0x2a/0x90
 [] ide_build_dmatable+0x2f/0x180
 [] ide_dma_setup+0x44/0xe0
 [] ide_do_rw_disk+0x349/0x510
 [] ide_do_request+0x622/0xb40
 [] ide_end_request+0x9d/0x160
 [] ide_dma_intr+0x0/0xd0
 [] ide_dma_intr+0x0/0xd0
 [] ide_intr+0x23f/0x250
 [] handle_IRQ_event+0x35/0x70
 [] handle_edge_irq+0xcc/0x150
 [] do_IRQ+0x80/0x100
 [] ret_from_intr+0x0/0xa
   [] kmem_cache_alloc+0x40/0x70
 [] mempool_alloc_slab+0x11/0x20
 [] mempool_alloc+0x42/0x110
 [] generic_make_request+0x198/0x240
 [] bio_alloc_bioset+0x2e/0x120
 [] bio_alloc+0x10/0x20
 [] submit_bh+0x6b/0x140
 [] ll_rw_block+0xd0/0xe0
 [] journal_read+0xb5e/0xec0
 [] zone_statistics+0x61/0xa0
 [] get_page_from_freelist+0x3c8/0x510
 [] __alloc_pages+0x6e/0x330
 [] alloc_page_interleave+0x8d/0xa0
 [] alloc_pages_current+0x86/0x90
 [] get_zeroed_page+0x20/0x40
 [] __pte_alloc_kernel+0x64/0x80
 [] map_vm_area+0x1dc/0x2e0
 [] __vmalloc_area_node+0x157/0x1a0
 [] journal_init+0x819/0x990
 [] __vmalloc_area_node+0x157/0x1a0
 [] __vmalloc_node+0x6f/0x80
 [] __vmalloc+0xe/0x10
 [] reiserfs_fill_super+0x2ba/0xc20
 [] vsnprintf+0x2e7/0x680
 [] snprintf+0x59/0x60
 [] __down_write_nested+0x17/0xc0
 [] strlcpy+0x4f/0x70
 [] test_bdev_super+0x0/0x20
 [] get_sb_bdev+0x13c/0x170
 [] reiserfs_fill_super+0x0/0xc20
 [] get_super_block+0x13/0x20
 [] vfs_kern_mount+0xd8/0x160
 [] do_kern_mount+0x4e/0x100
 [] do_mount+0x4e2/0x790
 [] __d_lookup+0x9c/0x130
 [] do_lookup+0x84/0x200
 [] do_lookup+0x84/0x200
 [] dput+0x24/0x140
 [] __link_path_walk+0x469/0xec0
 [] zone_statistics+0x7d/0xa0
 [] __alloc_pages+0x6e/0x330
 [] alloc_page_interleave+0x8d/0xa0
 [] alloc_pages_current+0x86/0x90
 [] __get_free_pages+0x1b/0x40
 [] copy_mount_options+0x52/0x180
 [] sys_mount+0x94/0xf0
 [] do_mount_root+0x21/0xa0
 [] mount_block_root+0x90/0x220
 [] sys_rmdir+0x11/0x20
 [] mount_root+0xe6/0xf0
 [] prepare_namespace+0xad/0x160
 [] kernel_init+0x23a/0x330
 [] child_rip+0xa/0x12
 [] kernel_init+0x0/0x330
 [] child_rip+0x0/0x12


Code: 49 8b 44 24 20 49 8d 4c 24 20 48 89 c2 48 83 e2 fe a8 01 48
RIP  [] blk_rq_map_sg+0x71/0x1b0
 RSP 
CR2: 005e5e66



Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-17 Thread Badari Pulavarty

On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
> On Wed, May 16 2007, Badari Pulavarty wrote:
> > On Tue, 2007-05-15 at 19:50 +0200, Jens Axboe wrote:
> > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > On Tue, 2007-05-15 at 19:20 +0200, Jens Axboe wrote:
> > > > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > > > On Fri, 2007-05-11 at 15:51 +0200, Jens Axboe wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Updated version of the patch - this time I'll just attach the patch
> > > > > > > file...
> > > > > > 
> > > > > > Missing scatterlist.h inclusions..
> > > > > > 
> > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c: In function ‘sym_scatter’:
> > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: warning: implicit declaration
> > > > > > of function ‘for_each_sg’
> > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: error: expected ‘;’ before ‘{’
> > > > > > token
> > > > > > drivers/scsi/sym53c8xx_2/sym_glue.c:375: warning: unused variable ‘tp’
> > > > > > make[3]: *** [drivers/scsi/sym53c8xx_2/sym_glue.o] Error 1
> > > > > > 
> > > > > > 
> > > > > > drivers/scsi/qla2xxx/qla_iocb.c: In function ‘qla24xx_build_scsi_iocbs’:
> > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: warning: implicit declaration of
> > > > > > function ‘for_each_sg’
> > > > > > drivers/scsi/qla2xxx/qla_iocb.c:678: error: expected ‘;’ before ‘{’
> > > > > > token
> > > > > 
> > > > > Thanks, will fix those. What arch? I tested it here.
> > > > 
> > > > I am playing with them on ppc64.
> > > 
> > > Ah ok, you need the updated patch series for ppc64 support. Builds fine
> > > here on ppc64. See the #sglist branch of the block repo:
> > > 
> > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > 
> > > I can mail you an updated patch, if you want.
> > 
> > 
> > Here is the whole panic stack..
> 
> Thanks will fix that up, the IDE part is totally untested. Can you try
> and backout this patch and see if it boots?

I increased max_segments to 1024 on my qla2200 attached disks and
a simple "dd" (direct read) resulted in the following:

elm3b29:/sys/block/sdd/queue # echo 1024 > max_segments
elm3b29:/sys/block/sdd/queue # cat max_hw_sectors_kb > max_sectors_kb
elm3b29:/mnt # dd iflag=direct if=./z of=/dev/null bs=512M

Unable to handle kernel paging request at 1008 RIP:
 [] __rmqueue+0x6f/0x120
PGD 100921067 PUD 1057f9067 PMD 0
Oops: 0002 [1] SMP
CPU 0
Modules linked in: jfs hfs vfat fat sg sd_mod qla2xxx firmware_class
scsi_transport_fc scsi_mod ipv6 thermal processor fan button battery ac
dm_mod floppy parport_pc lp parport
Pid: 4329, comm: dd Tainted: G   M   2.6.22-rc1 #4
RIP: 0010:[]  [] __rmqueue+0x6f/0x120
RSP: 0018:8101bdcab948  EFLAGS: 00010093
RAX: 81017e644148 RBX: 0002 RCX: 1000
RDX: 81011c80 RSI:  RDI: 81011a00
RBP: 8101bdcab968 R08: 81017a801142 R09: 
R10: 00078e7a R11: 0002 R12: 81011c80
R13:  R14: 81017e644120 R15: 81011a00
FS:  2b53c78cef20() GS:8063b000()
knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 1008 CR3: 000102707000 CR4: 06e0
Process dd (pid: 4329, threadinfo 8101bdcaa000, task
8101bf4ff440)
Stack:  81017a801110 0012 0012
81017a801100
 8101bdcaba08 8025f47d 0001 81012e68
 00445f7ebe38 81012e60 000280d2 81012e68
Call Trace:
 [] get_page_from_freelist+0x31d/0x510
 [] __alloc_pages+0x6e/0x330
 [] alloc_page_vma+0x4a/0xa0
 [] __handle_mm_fault+0x9b7/0xba0
 [] follow_page+0x1b6/0x250
 [] get_user_pages+0x10c/0x3d0
 [] bio_add_page+0x2e/0x30
 [] dio_get_page+0xbb/0x1b0
 [] __blockdev_direct_IO+0x478/0xc20
 [] :jfs:jfs_direct_IO+0x50/0x60
 [] :jfs:jfs_get_block+0x0/0x230
 [] generic_file_direct_IO+0x73/0x150
 [] generic_file_aio_read+0x131/0x170
 [] page_add_new_anon_rmap+0x10/0x20
 [] __handle_mm_fault+0xa67/0xba0
 [] do_sync_read+0xf1/0x130
 [] up_read+0x9/0x10
 [] autoremove_wake_function+0x0/0x40
 [] __up_wri

Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-17 Thread Badari Pulavarty

On Thu, 2007-05-17 at 08:27 +0200, Jens Axboe wrote:
.. 
> > > 
> > > Ah ok, you need the updated patch series for ppc64 support. Builds fine
> > > here on ppc64. See the #sglist branch of the block repo:
> > > 
> > > git://git.kernel.dk/data/git/linux-2.6-block.git
> > > 
> > > I can mail you an updated patch, if you want.
> > 
> > 
> > Here is the whole panic stack..
> 
> Thanks will fix that up, the IDE part is totally untested. Can you try
> and backout this patch and see if it boots?

Yes. It boots fine with the following backed out.

Looking at the code ide_probe.c: hwif_init() is doing

hwif->sg_table = kmalloc(sizeof(struct scatterlist)*hwif->sg_max_nents,
 GFP_KERNEL);

blk_rq_map_sg() is looking for the chaining info and going past the end of
the allocation.
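
A minimal userspace sketch of the failure mode, not the kernel code. It
assumes, as a simplification, that the low bit of a pointer-sized field marks
an entry as a chain pointer. All toy_* names are hypothetical. The point is
that the walker trusts that bit, so a table from a non-zeroing allocator can
carry a stale "chain" bit and send the walk off into garbage; allocating the
table zeroed avoids that.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for struct scatterlist; NOT the kernel definition. */
struct toy_sg {
	uintptr_t link;		/* bit 0 set: entry is a chain pointer */
	unsigned int length;	/* payload length of a normal entry */
};

#define TOY_SG_CHAIN ((uintptr_t)1)

/* Allocate a zeroed table so no entry starts with a stale chain bit. */
static struct toy_sg *toy_sg_alloc(size_t nents)
{
	return calloc(nents, sizeof(struct toy_sg));
}

/* Step to the next entry, following a chain pointer if one is present. */
static struct toy_sg *toy_sg_next(struct toy_sg *sg)
{
	sg++;
	if (sg->link & TOY_SG_CHAIN)
		sg = (struct toy_sg *)(sg->link & ~TOY_SG_CHAIN);
	return sg;
}

/* Sum the lengths of nents data entries, crossing chained tables. */
static unsigned int toy_sg_total(struct toy_sg *sg, size_t nents)
{
	unsigned int total = 0;

	while (nents--) {
		total += sg->length;
		if (nents)
			sg = toy_sg_next(sg);
	}
	return total;
}
```

With a zeroed table the only chain entries are the ones set up explicitly,
which is the same effect the memset/kzalloc suggestion would have on
hwif->sg_table.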


Thanks,
Badari




Re: select(0, ..) is valid ?

2007-05-16 Thread Badari Pulavarty

On Wed, 2007-05-16 at 10:37 -0500, Anton Blanchard wrote:
> Hi Hugh,
> 
> > It's interesting that compat_core_sys_select() shows this kmalloc(0)
> > failure but core_sys_select() does not.  That's because core_sys_select()
> > avoids kmalloc by using a buffer on the stack for small allocations (and
> > 0 sure is small).  Shouldn't compat_core_sys_select() do just the same?
> > Or is SLUB going to be so efficient that doing so is a waste of time?
> 
> Nice catch, the original optimisation from Andi is:
> 
> http://git.kernel.org/git-new/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=70674f95c0a2ea694d5c39f4e514f538a09be36f
> 
> And I think it makes sense for the compat code to do it too.
> 
> Anton

Here it is ..

Should I do one for poll() also ?

Thanks,
Badari

Optimize select by using stack space for small fd sets.
core_sys_select() already has this optimization. This is
for the compat version.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 fs/compat.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

Index: linux-2.6.22-rc1/fs/compat.c
===
--- linux-2.6.22-rc1.orig/fs/compat.c	2007-05-12 18:45:56.0 -0700
+++ linux-2.6.22-rc1/fs/compat.c	2007-05-16 17:50:39.0 -0700
@@ -1544,9 +1544,10 @@ int compat_core_sys_select(int n, compat
 	compat_ulong_t __user *outp, compat_ulong_t __user *exp, s64 *timeout)
 {
 	fd_set_bits fds;
-	char *bits;
+	void *bits;
 	int size, max_fds, ret = -EINVAL;
 	struct fdtable *fdt;
+	long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
 
 	if (n < 0)
 		goto out_nofds;
@@ -1564,11 +1565,14 @@ int compat_core_sys_select(int n, compat
 	 * since we used fdset we need to allocate memory in units of
 	 * long-words.
 	 */
-	ret = -ENOMEM;
 	size = FDS_BYTES(n);
-	bits = kmalloc(6 * size, GFP_KERNEL);
-	if (!bits)
-		goto out_nofds;
+	bits = stack_fds;
+	if (size > sizeof(stack_fds) / 6) {
+		bits = kmalloc(6 * size, GFP_KERNEL);
+		ret = -ENOMEM;
+		if (!bits)
+			goto out_nofds;
+	}
 	fds.in  = (unsigned long *)  bits;
 	fds.out = (unsigned long *) (bits +   size);
 	fds.ex  = (unsigned long *) (bits + 2*size);
@@ -1600,7 +1604,8 @@ int compat_core_sys_select(int n, compat
 	    compat_set_fd_set(n, exp, fds.res_ex))
 		ret = -EFAULT;
 out:
-	kfree(bits);
+	if (bits != stack_fds)
+		kfree(bits);
 out_nofds:
 	return ret;
 }
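
For anyone who wants to try the pattern outside the kernel, here is a small
userspace sketch of the same idea. It is an illustration only: TOY_STACK_ALLOC
and toy_with_six_bitmaps() are hypothetical names, malloc()/free() stand in
for kmalloc()/kfree(), and the 256-byte size is an assumption, not the real
value of SELECT_STACK_ALLOC.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_STACK_ALLOC 256	/* assumed size, stand-in for SELECT_STACK_ALLOC */

/*
 * Needs room for six bitmaps of `size` bytes each. Small requests are
 * served from an on-stack buffer; only large ones hit the allocator.
 * *used_heap reports which path was taken. Returns 0 on success, -1 if
 * the heap allocation fails.
 */
static int toy_with_six_bitmaps(size_t size, int *used_heap)
{
	long stack_buf[TOY_STACK_ALLOC / sizeof(long)];
	void *bits = stack_buf;

	*used_heap = 0;
	if (size > sizeof(stack_buf) / 6) {
		bits = malloc(6 * size);
		if (!bits)
			return -1;
		*used_heap = 1;
	}

	/* A real caller would carve bits into in/out/ex and result sets. */
	memset(bits, 0, 6 * size);

	/* Only free what actually came from the allocator. */
	if (bits != stack_buf)
		free(bits);
	return 0;
}
```

A size of 0 takes the stack path, so the zero-length allocation never
happens at all.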




Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-16 Thread Badari Pulavarty

On Tue, 2007-05-15 at 19:50 +0200, Jens Axboe wrote:
> On Tue, May 15 2007, Badari Pulavarty wrote:
> > On Tue, 2007-05-15 at 19:20 +0200, Jens Axboe wrote:
> > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > On Fri, 2007-05-11 at 15:51 +0200, Jens Axboe wrote:
> > > > > Hi,
> > > > > 
> > > > > Updated version of the patch - this time I'll just attach the patch
> > > > > file...
> > > > 
> > > > Missing scatterlist.h inclusions..
> > > > 
> > > > drivers/scsi/sym53c8xx_2/sym_glue.c: In function ‘sym_scatter’:
> > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: warning: implicit declaration
> > > > of function ‘for_each_sg’
> > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: error: expected ‘;’ before ‘{’
> > > > token
> > > > drivers/scsi/sym53c8xx_2/sym_glue.c:375: warning: unused variable ‘tp’
> > > > make[3]: *** [drivers/scsi/sym53c8xx_2/sym_glue.o] Error 1
> > > > 
> > > > 
> > > > drivers/scsi/qla2xxx/qla_iocb.c: In function ‘qla24xx_build_scsi_iocbs’:
> > > > drivers/scsi/qla2xxx/qla_iocb.c:678: warning: implicit declaration of
> > > > function ‘for_each_sg’
> > > > drivers/scsi/qla2xxx/qla_iocb.c:678: error: expected ‘;’ before ‘{’
> > > > token
> > > 
> > > Thanks, will fix those. What arch? I tested it here.
> > 
> > I am playing with them on ppc64.
> 
> Ah ok, you need the updated patch series for ppc64 support. Builds fine
> here on ppc64. See the #sglist branch of the block repo:
> 
> git://git.kernel.dk/data/git/linux-2.6-block.git
> 
> I can mail you an updated patch, if you want.


Here is the whole panic stack..

VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 356k freed
Unable to handle kernel paging request at 464b7948 RIP:
 [] blk_rq_map_sg+0x71/0x1b0
PGD 1df350067 PUD 0
Oops:  [1] SMP
CPU 3
Modules linked in:
Pid: 1, comm: init Not tainted 2.6.22-rc1 #2
RIP: 0010:[]  [] blk_rq_map_sg+0x71/0x1b0
RSP: :8101a02390e8  EFLAGS: 00010206
RAX: 0001df36a000 RBX: 8101df2efce0 RCX: 0001df446000
RDX:  RSI: 8101df2eb780 RDI: 001df36a
RBP: 8101a0239148 R08: 04e2 R09: 
R10: 8101df2eb780 R11: 0001 R12: 464b7928
R13: 1000 R14: 000e R15: 8101df2efcd0
FS:  () GS:8101c0223300()
knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 464b7948 CR3: 00017a397000 CR4: 06e0
Process init (pid: 1, threadinfo 8101a0238000, task
81018023a000)
Stack:  0001 810179c58000 0001df446000
810179c56060
 8101df2eb780 0004a02393b8 0101 8101df2b4000
 8078a500 810179c56000 8078a500 8101df2b4000
Call Trace:
 [] ide_map_sg+0x42/0xd0
 [] ide_build_sglist+0x2a/0x90
 [] ide_build_dmatable+0x2f/0x1a0
 [] ide_dma_setup+0x44/0xe0
 [] ide_do_rw_disk+0x349/0x510
 [] ide_do_request+0x622/0xb40
 [] lock_timer_base+0x36/0x70
 [] del_timer+0x6b/0x70
 [] do_ide_request+0x1d/0x20
 [] __generic_unplug_device+0x25/0x30
 [] blk_start_queueing+0x25/0x30
 [] cfq_insert_request+0x36b/0x380
 [] elv_insert+0x130/0x1a0
 [] __elv_add_request+0x68/0xc0
 [] __make_request+0xd3/0x590
 [] generic_make_request+0x198/0x240
 [] bio_alloc_bioset+0xa9/0x120
 [] submit_bio+0x62/0xe0
 [] mpage_bio_submit+0x22/0x30
 [] do_mpage_readpage+0x49d/0x590
 [] __inc_zone_page_state+0x2a/0x30
 [] mpage_readpages+0x88/0x160
 [] reiserfs_get_block+0x0/0x1250
 [] reiserfs_get_block+0x0/0x1250
 [] reiserfs_readpages+0x1a/0x20
 [] __do_page_cache_readahead+0x1af/0x2c0
 [] __alloc_pages+0x6e/0x330
 [] do_page_cache_readahead+0x59/0x80
 [] filemap_nopage+0x239/0x2f0
 [] __handle_mm_fault+0x1d0/0xba0
 [] do_page_fault+0x1dc/0x950
 [] __alloc_pages+0x6e/0x330
 [] vma_prio_tree_insert+0x2d/0x50
 [] vma_link+0xb2/0x140
 [] __vma_link_rb+0x2b/0x30
 [] error_exit+0x0/0x84
 [] __clear_user+0x1a/0x40
 [] clear_user+0x2b/0x40
 [] padzero+0x21/0x30
 [] load_elf_binary+0xbbf/0x1ec0
 [] __alloc_pages+0x6e/0x330
 [] __alloc_pages+0x6e/0x330
 [] alloc_pages_current+0x5a/0x90
 [] copy_strings+0x122/0x220
 [] search_binary_handler+0xaf/0x210
 [] do_execve+0x25f/0x290
 [] strncpy_from_user+0x3a/0x50
 [] sys_execve+0x46/0xb0
 [] kernel_execve+0x64/0xd0
 [] run_init_process+0x1e/0x20
 [] init_post+0x9f/0xf0
 [] kernel_init+0x23f/0x330
 [] child_rip+0xa/0x12
 [] kernel_init+0x0/0x330
 [] child_rip+0x0/0x12


Code: 49 8b 44 24 20 49 8d 4c 24 20 48 89 c2 48 83 e2 fe a8 01 48
RIP  [] blk_rq_map_sg+0x71/0x1b0
 RSP 
CR2: 464b7948
Kernel panic - not syncing: Attempted to kill init!

Thanks,
Badari


Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-16 Thread Badari Pulavarty

On Tue, 2007-05-15 at 19:50 +0200, Jens Axboe wrote:
> On Tue, May 15 2007, Badari Pulavarty wrote:
> > On Tue, 2007-05-15 at 19:20 +0200, Jens Axboe wrote:
> > > On Tue, May 15 2007, Badari Pulavarty wrote:
> > > > On Fri, 2007-05-11 at 15:51 +0200, Jens Axboe wrote:
> > > > > Hi,
> > > > > 
> > > > > Updated version of the patch - this time I'll just attach the patch
> > > > > file...
> > > > 
> > > > Missing scatterlist.h inclusions..
> > > > 
> > > > drivers/scsi/sym53c8xx_2/sym_glue.c: In function ‘sym_scatter’:
> > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: warning: implicit declaration
> > > > of function ‘for_each_sg’
> > > > drivers/scsi/sym53c8xx_2/sym_glue.c:385: error: expected ‘;’ before ‘{’
> > > > token
> > > > drivers/scsi/sym53c8xx_2/sym_glue.c:375: warning: unused variable ‘tp’
> > > > make[3]: *** [drivers/scsi/sym53c8xx_2/sym_glue.o] Error 1
> > > > 
> > > > 
> > > > drivers/scsi/qla2xxx/qla_iocb.c: In function ‘qla24xx_build_scsi_iocbs’:
> > > > drivers/scsi/qla2xxx/qla_iocb.c:678: warning: implicit declaration of
> > > > function ‘for_each_sg’
> > > > drivers/scsi/qla2xxx/qla_iocb.c:678: error: expected ‘;’ before ‘{’
> > > > token
> > > 
> > > Thanks, will fix those. What arch? I tested it here.
> > 
> > I am playing with them on ppc64.
> 
> Ah ok, you need the updated patch series for ppc64 support. Builds fine
> here on ppc64. See the #sglist branch of the block repo:
> 
> git://git.kernel.dk/data/git/linux-2.6-block.git
> 
> I can mail you an updated patch, if you want.
> 

Panicked my amd64 box on boot :(

Unable to handle kernel NULL pointer dereference at 001e RIP:
 [] blk_rq_map_sg+0x71/0x1b0
PGD 0
Oops:  [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.22-rc1 #2
RIP: 0010:[]  [] blk_rq_map_sg+0x71/0x1b0
RSP: :810180239330  EFLAGS: 00010287
RAX: 000179d0 RBX: 8101bf204320 RCX: 1000
RDX: 810179c62000 RSI: 8101df507780 RDI: 00179d00



Re: select(0, ..) is valid ?

2007-05-15 Thread Badari Pulavarty

On Tue, 2007-05-15 at 10:44 -0700, Andrew Morton wrote:
> On Tue, 15 May 2007 10:29:18 -0700
> Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > Is select(0, ..) a valid operation?
> 
> Probably - it becomes an elaborate way of doing a sleep.  Whatever - we
> used to permit it without error, so we should continue to do so.

Okay.

> 
> > I see that there is no check to prevent this or return
> > success early, without doing any work. Do we need one ?
> > 
> > slub code is complaining that we are doing kmalloc(0).
> > 
> > [ cut here ]
> > Badness at include/linux/slub_def.h:88
> > Call Trace:
> > [c001e4eb7640] [c000e650] .show_stack+0x68/0x1b0 (unreliable)
> > [c001e4eb76e0] [c029b854] .report_bug+0x94/0xe8
> > [c001e4eb7770] [c00219f0] .program_check_exception+0x12c/0x568
> > [c001e4eb77f0] [c0004a84] program_check_common+0x104/0x180
> > --- Exception: 700 at .get_slab+0x4c/0x234
> > LR = .__kmalloc+0x24/0xc4
> > [c001e4eb7ae0] [c001e4eb7b80] 0xc001e4eb7b80 (unreliable)
> > [c001e4eb7b80] [c00a7ff0] .__kmalloc+0x24/0xc4
> > [c001e4eb7c10] [c00ea720] .compat_core_sys_select+0x90/0x240
> > [c001e4eb7d00] [c00ec3a4] .compat_sys_select+0xb0/0x190
> > [c001e4eb7dc0] [c0014944] .ppc32_select+0x14/0x28
> > [c001e4eb7e30] [c000872c] syscall_exit+0x0/0x40
> >
> 
> I _think_ we can just do
> 
> --- a/fs/compat.c~a
> +++ a/fs/compat.c
> @@ -1566,9 +1566,13 @@ int compat_core_sys_select(int n, compat
>*/
>   ret = -ENOMEM;
>   size = FDS_BYTES(n);
> - bits = kmalloc(6 * size, GFP_KERNEL);
> - if (!bits)
> - goto out_nofds;
> + if (likely(size)) {
> + bits = kmalloc(6 * size, GFP_KERNEL);
> + if (!bits)
> + goto out_nofds;
> + } else {
> + bits = NULL;
> + }
>   fds.in  = (unsigned long *)  bits;
>   fds.out = (unsigned long *) (bits +   size);
>   fds.ex  = (unsigned long *) (bits + 2*size);
> _


Yes. This is what I did earlier, but then I was wondering if I
could skip the whole operation and bail out early (if n == 0). 
I guess not.

> I mean, if that oopses then I'd be very interested in finding out why.
> 
> But I'm starting to suspect that it would be better to permit kmalloc(0) in
> slub.  It depends on how many more of these things need fixing.
> 
> otoh, a kmalloc(0) could be a sign of some buggy/inefficient/weird code, so
> there's some value in forcing us to go look at all the callsites.

So far, I haven't found any others. Let's leave the check.

Thanks,
Badari


Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-15 Thread Badari Pulavarty

On Tue, 2007-05-15 at 19:20 +0200, Jens Axboe wrote:
> On Tue, May 15 2007, Badari Pulavarty wrote:
> > On Fri, 2007-05-11 at 15:51 +0200, Jens Axboe wrote:
> > > Hi,
> > > 
> > > Updated version of the patch - this time I'll just attach the patch
> > > file...
> > 
> > Missing scatterlist.h inclusions..
> > 
> > drivers/scsi/sym53c8xx_2/sym_glue.c: In function ‘sym_scatter’:
> > drivers/scsi/sym53c8xx_2/sym_glue.c:385: warning: implicit declaration
> > of function ‘for_each_sg’
> > drivers/scsi/sym53c8xx_2/sym_glue.c:385: error: expected ‘;’ before ‘{’
> > token
> > drivers/scsi/sym53c8xx_2/sym_glue.c:375: warning: unused variable ‘tp’
> > make[3]: *** [drivers/scsi/sym53c8xx_2/sym_glue.o] Error 1
> > 
> > 
> > drivers/scsi/qla2xxx/qla_iocb.c: In function ‘qla24xx_build_scsi_iocbs’:
> > drivers/scsi/qla2xxx/qla_iocb.c:678: warning: implicit declaration of
> > function ‘for_each_sg’
> > drivers/scsi/qla2xxx/qla_iocb.c:678: error: expected ‘;’ before ‘{’
> > token
> 
> Thanks, will fix those. What arch? I tested it here.

I am playing with them on ppc64.

Thanks,
Badari



select(0, ..) is valid ?

2007-05-15 Thread Badari Pulavarty

Hi,

Is select(0, ..) a valid operation?

I see that there is no check to prevent this or return
success early, without doing any work. Do we need one ?

slub code is complaining that we are doing kmalloc(0).
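
To spell out why n == 0 ends up as a zero-byte allocation, here is a small
worked sketch of the size arithmetic. It is modeled on the FDS_LONGS /
FDS_BYTES style of rounding (round the fd count up to whole longs, then
convert to bytes); the toy_* helpers are hypothetical names for illustration,
not the kernel macros themselves.

```c
#include <stddef.h>

#define TOY_BITS_PER_LONG (8 * sizeof(long))

/* Number of longs needed to hold nr fd bits, rounded up. */
static size_t toy_fds_longs(size_t nr)
{
	return (nr + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG;
}

/* Same quantity expressed in bytes. */
static size_t toy_fds_bytes(size_t nr)
{
	return toy_fds_longs(nr) * sizeof(long);
}

/* select needs six bitmaps: in/out/ex plus the three result sets. */
static size_t toy_select_alloc_size(size_t n)
{
	return 6 * toy_fds_bytes(n);
}
```

With n = 0 the rounding yields zero longs, so the request is 6 * 0 = 0 bytes,
while any n >= 1 asks for at least six longs.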

Thanks,
Badari

[ cut here ]
Badness at include/linux/slub_def.h:88
Call Trace:
[c001e4eb7640] [c000e650] .show_stack+0x68/0x1b0 (unreliable)
[c001e4eb76e0] [c029b854] .report_bug+0x94/0xe8
[c001e4eb7770] [c00219f0] .program_check_exception+0x12c/0x568
[c001e4eb77f0] [c0004a84] program_check_common+0x104/0x180
--- Exception: 700 at .get_slab+0x4c/0x234
LR = .__kmalloc+0x24/0xc4
[c001e4eb7ae0] [c001e4eb7b80] 0xc001e4eb7b80 (unreliable)
[c001e4eb7b80] [c00a7ff0] .__kmalloc+0x24/0xc4
[c001e4eb7c10] [c00ea720] .compat_core_sys_select+0x90/0x240
[c001e4eb7d00] [c00ec3a4] .compat_sys_select+0xb0/0x190
[c001e4eb7dc0] [c0014944] .ppc32_select+0x14/0x28
[c001e4eb7e30] [c000872c] syscall_exit+0x0/0x40



Re: [PATCH] Chaining sg lists for big IO commands v5

2007-05-15 Thread Badari Pulavarty

On Fri, 2007-05-11 at 15:51 +0200, Jens Axboe wrote:
> Hi,
> 
> Updated version of the patch - this time I'll just attach the patch
> file...

Missing scatterlist.h inclusions..

drivers/scsi/sym53c8xx_2/sym_glue.c: In function ‘sym_scatter’:
drivers/scsi/sym53c8xx_2/sym_glue.c:385: warning: implicit declaration
of function ‘for_each_sg’
drivers/scsi/sym53c8xx_2/sym_glue.c:385: error: expected ‘;’ before ‘{’
token
drivers/scsi/sym53c8xx_2/sym_glue.c:375: warning: unused variable ‘tp’
make[3]: *** [drivers/scsi/sym53c8xx_2/sym_glue.o] Error 1


drivers/scsi/qla2xxx/qla_iocb.c: In function ‘qla24xx_build_scsi_iocbs’:
drivers/scsi/qla2xxx/qla_iocb.c:678: warning: implicit declaration of
function ‘for_each_sg’
drivers/scsi/qla2xxx/qla_iocb.c:678: error: expected ‘;’ before ‘{’
token



Thanks,
Badari


Re: [RFC][PATCH 5/14] Introduce union stack

2007-05-14 Thread Badari Pulavarty

On Mon, 2007-05-14 at 15:10 +0530, Bharata B Rao wrote:
> From: Jan Blunck <[EMAIL PROTECTED]>
> Subject: Introduce union stack.
> 
> Adds union stack infrastructure to the dentry structure and provides
> locking routines to walk the union stack.
...

> --- /dev/null
> +++ b/include/linux/dcache_union.h
> @@ -0,0 +1,248 @@
> +/*
> + * VFS based union mount for Linux
> + *
> + * Copyright © 2004-2007 IBM Corporation
> + *   Author(s): Jan Blunck ([EMAIL PROTECTED])
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the Free
> + * Software Foundation; either version 2 of the License, or (at your option)
> + * any later version.
> + *
> + */
> +#ifndef __LINUX_DCACHE_UNION_H
> +#define __LINUX_DCACHE_UNION_H
> +#ifdef __KERNEL__
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_UNION_MOUNT
> +
> +/*
> + * This is the union info object, that describes general information about 
> this
> + * union directory
> + *
> + * u_mutex protects the union stack against modification. You can reach it
> + * through the d_union field in struct dentry. Hold it when you are walking
> + * or modifing the union stack !
> + */
> +struct union_info {
> + atomic_t u_count;
> + struct mutex u_mutex;
> +};
> +
> +/* allocate/de-allocate */
> +extern struct union_info *union_alloc(void);
> +extern struct union_info *union_get(struct union_info *);
> +extern void union_put(struct union_info *);
> +
> +/*
> + * These are the functions for locking a dentry's union. When one
> + * want to acquire a denties union lock, use:
> + *
> + * - union_lock() when you can sleep,
> + * - union_lock_spinlock() when you are holding a spinlock (that
> + *   you CAN savely give up and reacquire again)
> + * - union_lock_readlock() when you are holding a readlock (that
> + *   you CAN savely give up and reacquire again)
> + *
> + * Otherwise get the union lock early before you enter your
> + * "no sleeping here" code.
> + *
> + * NOTES: union_info structure is reference counted using u_count member.
> + * union_get() and union_put() which get and put references on union_info
> + * should be done under union_info's u_mutex. Since the last union_put() 
> frees
> + * the union_info structure itself it can't obviously be done under u_mutex.
> + * union_release() should be used in such cases (Eg. dput(), umount()) where
> + * union_info is disassociated from the dentries, and it becomes safe
> + * to free the union_info.
> + */
> +static inline void __union_lock(struct union_info *uinfo)
> +{
> + BUG_ON(!atomic_read(&uinfo->u_count));
> + mutex_lock(&uinfo->u_mutex);
> +}
> +
> +static inline void union_lock(struct dentry *dentry)
> +{
> + if (unlikely(dentry && dentry->d_union)) {
> + struct union_info *ui = dentry->d_union;
> +
> + UM_DEBUG_LOCK("\"%s\" locking %p (count=%d)\n",
> +   dentry->d_name.name, ui,
> +   atomic_read(&ui->u_count));
> + __union_lock(dentry->d_union);
> + }
> +}
> +
> +static inline void __union_unlock(struct union_info *uinfo)
> +{
> + BUG_ON(!atomic_read(&uinfo->u_count));
> + mutex_unlock(&uinfo->u_mutex);
> +}
> +
> +static inline void union_unlock(struct dentry *dentry)
> +{
> + if (unlikely(dentry && dentry->d_union)) {
> + struct union_info *ui = dentry->d_union;
> +
> + UM_DEBUG_LOCK("\"%s\" unlocking %p (count=%d)\n",
> +   dentry->d_name.name, ui,
> +   atomic_read(&ui->u_count));
> + __union_unlock(dentry->d_union);
> + }
> +}
> +
> +static inline void union_alloc_dentry(struct dentry *dentry)
> +{
> + spin_lock(&dentry->d_lock);
> + if (!dentry->d_union) {
> + dentry->d_union = union_alloc();
> + spin_unlock(&dentry->d_lock);
> + } else {
> + spin_unlock(&dentry->d_lock);
> + union_lock(dentry);
> + }
> +}
> +
> +static inline struct union_info *union_lock_and_get(struct dentry *dentry)
> +{
> + union_lock(dentry);
> + return union_get(dentry->d_union);
> +}
> +
> +/* Shouldn't be called with last reference to union_info */
> +static inline void union_put_and_unlock(struct union_info *uinfo)
> +{
> + union_put(uinfo);
> + __union_unlock(&uinfo->u_mutex);
   ^^^

It should be

__union_unlock(uinfo);
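
A stripped-down userspace sketch of the type issue, under stated assumptions:
the toy_* names are hypothetical, and a plain int flag stands in for the
mutex. The unlock helper takes the whole info structure (it needs the count
for its sanity check), so the caller has to pass the structure pointer, not
the address of the lock inside it.

```c
#include <assert.h>

/* Toy stand-in for struct union_info; an int flag replaces the mutex. */
struct toy_union_info {
	int count;	/* reference count */
	int locked;	/* 1 while "held" */
};

/* Takes the info structure because it checks the count before unlocking. */
static void toy_union_unlock(struct toy_union_info *uinfo)
{
	assert(uinfo->count > 0);
	uinfo->locked = 0;
}

/* Must not be called with the last reference, as in the quoted comment. */
static void toy_union_put_and_unlock(struct toy_union_info *uinfo)
{
	uinfo->count--;
	toy_union_unlock(uinfo);	/* pass uinfo, not &uinfo->locked */
}
```

Passing &uinfo->locked here would be an incompatible pointer type, which is
the same kind of mismatch as in the quoted hunk.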

Thanks,
Badari




Re: [RFC][PATCH 5/14] Introduce union stack

2007-05-14 Thread Badari Pulavarty

On Mon, 2007-05-14 at 15:10 +0530, Bharata B Rao wrote:
> From: Jan Blunck <[EMAIL PROTECTED]>
> Subject: Introduce union stack.
> 
> Adds union stack infrastructure to the dentry structure and provides
> locking routines to walk the union stack.
> 
> Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
> Signed-off-by: Bharata B Rao <[EMAIL PROTECTED]>
...

> +/*
> + * This is a *I can't get no sleep* helper which is called when we try
> + * to access the struct fs_struct *fs field of a struct task_struct.
> + *
> + * Yes, this is possibly starving but we have to change root, altroot
> + * or pwd in the frequency of this while loop. Don't think that this
> + * happens really often ;)
> + *
> + * This is called while holding the rwlock_t fs->lock
> + *
> + * TODO: Unlocking side of union_lock_fs() needs 3 union_unlock()s.
> + * May be introduce union_unlock_fs().
> + *
> + * FIXME: This routine is used when the caller wants to dget one or
> + * more of fs->[root, altroot, pwd]. When the caller doesn't want to
> + * dget _all_ of these, it is strictly not necessary to get union_locks
> + * on all of these. Check.
> + */
> +static inline void union_lock_fs(struct fs_struct *fs)
> +{
> + int locked;
> +
> + while (fs) {
> + locked = union_trylock(fs->root);
> + if (!locked)
> + goto loop1;
> + locked = union_trylock(fs->altroot);
> + if (!locked)
> + goto loop2;
> + locked = union_trylock(fs->pwd);
> + if (!locked)
> + goto loop3;
> + break;
> + loop3:
> + union_unlock(fs->altroot);
> + loop2:
> + union_unlock(fs->root);
> + loop1:
> + read_unlock(&fs->lock);
> + UM_DEBUG_LOCK("Failed to get all semaphores in fs_struct!\n");
> + cpu_relax();
> + read_lock(&fs->lock);
> + continue;

Nit.. why "continue" ?

> + }
> + BUG_ON(!fs);

What's the use of BUG_ON() here? A check at the top of the function would
be more useful.
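
For reference, the all-or-nothing part of that loop can be written without
the goto ladder. This is a minimal single-threaded sketch with hypothetical
toy_* names, where an int flag stands in for the real lock; it only shows
the "take every lock or release what was taken" step, not the
read_unlock/cpu_relax/read_lock retry around it.

```c
#include <stddef.h>

/* Toy lock: an int flag standing in for the union mutex. */
struct toy_lock {
	int held;
};

/* Returns 1 if the lock was free and is now held, 0 otherwise. */
static int toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return 0;
	l->held = 1;
	return 1;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = 0;
}

/*
 * Try to take all n locks in order. On the first failure, release the
 * locks already taken (in reverse order) and return 0 so the caller can
 * back off and retry. Returns 1 with every lock held on success.
 */
static int toy_trylock_all(struct toy_lock **locks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!toy_trylock(locks[i])) {
			while (i--)
				toy_unlock(locks[i]);
			return 0;
		}
	}
	return 1;
}
```

A caller would validate its argument once up front and then loop on
toy_trylock_all() until it returns 1, which also makes a trailing
"continue" unnecessary.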

Thanks,
Badari


Re: [RFC][PATCH 13/14] ext3 whiteout support

2007-05-14 Thread Badari Pulavarty

On Mon, 2007-05-14 at 15:14 +0530, Bharata B Rao wrote:
> From: Bharata B Rao <[EMAIL PROTECTED]>
> Subject: ext3 whiteout support
> 
> Introduce whiteout support for ext3.
> 
> Signed-off-by: Bharata B Rao <[EMAIL PROTECTED]>
> Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
> ---
>  fs/ext3/dir.c   |2 -
>  fs/ext3/namei.c |   62 
> 
>  fs/ext3/super.c |   11 +++-
>  include/linux/ext3_fs.h |5 +++
>  4 files changed, 72 insertions(+), 8 deletions(-)
> 
> --- a/fs/ext3/dir.c
> +++ b/fs/ext3/dir.c
> @@ -29,7 +29,7 @@
>  #include 
>  
>  static unsigned char ext3_filetype_table[] = {
> - DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
> + DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK, 
> DT_WHT
>  };
>  
>  static int ext3_readdir(struct file *, void *, filldir_t);
> --- a/fs/ext3/namei.c
> +++ b/fs/ext3/namei.c
> @@ -1071,6 +1071,7 @@ static unsigned char ext3_type_by_mode[S
>   [S_IFIFO >> S_SHIFT]= EXT3_FT_FIFO,
>   [S_IFSOCK >> S_SHIFT]   = EXT3_FT_SOCK,
>   [S_IFLNK >> S_SHIFT]= EXT3_FT_SYMLINK,
> + [S_IFWHT >> S_SHIFT]= EXT3_FT_WHT,
>  };
>  
>  static inline void ext3_set_de_type(struct super_block *sb,
> @@ -1786,7 +1787,7 @@ out_stop:
>  /*
>   * routine to check that the specified directory is empty (for rmdir)
>   */
> -static int empty_dir (struct inode * inode)
> +static int empty_dir (handle_t *handle, struct inode * inode)

Is there a reason for passing the handle? Why couldn't you get it from
journal_current_handle() if it is needed to delete the whiteout?

>  {
>   unsigned long offset;
>   struct buffer_head * bh;
> @@ -1848,8 +1849,28 @@ static int empty_dir (struct inode * ino
>   continue;
>   }
>   if (le32_to_cpu(de->inode)) {
> - brelse (bh);
> - return 0;
> + /* If this is a whiteout, remove it */
> + if (de->file_type == EXT3_FT_WHT) {
> + unsigned long ino = le32_to_cpu(de->inode);
> + struct inode *tmp_inode = iget(inode->i_sb, 
> ino);
> + if (!tmp_inode) {
> + brelse (bh);
> + return 0;
> + }
> +
> + if (ext3_delete_entry(handle, inode, de, bh)) {
> + iput(tmp_inode);
> + brelse (bh);
> + return 0;
> + }
> +
> + tmp_inode->i_ctime = inode->i_ctime;
> + tmp_inode->i_nlink--;
> + iput(tmp_inode);
> + } else {
> + brelse (bh);
> + return 0;
> + }
>   }
>   offset += le16_to_cpu(de->rec_len);
>   de = (struct ext3_dir_entry_2 *)
> @@ -2031,7 +2052,7 @@ static int ext3_rmdir (struct inode * di
>   goto end_rmdir;
>  
>   retval = -ENOTEMPTY;
> - if (!empty_dir (inode))
> + if (!empty_dir (handle, inode))
>   goto end_rmdir;
>  
>   retval = ext3_delete_entry(handle, dir, de, bh);
> @@ -2060,6 +2081,36 @@ end_rmdir:
>   return retval;
>  }
>  
> +static int ext3_whiteout(struct inode *dir, struct dentry *dentry)
> +{
> + struct inode * inode;
> + int err, retries = 0;
> + handle_t *handle;
> +
> +retry:
> + handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
> + EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3 +
> + 2*EXT3_QUOTA_INIT_BLOCKS(dir->i_sb));
> + if (IS_ERR(handle))
> + return PTR_ERR(handle);
> +
> + if (IS_DIRSYNC(dir))
> + handle->h_sync = 1;
> +
> + inode = ext3_new_inode (handle, dir, S_IFWHT | S_IRUGO);
> + err = PTR_ERR(inode);
> + if (IS_ERR(inode))
> + goto out_stop;

Don't you need to call init_special_inode() here?
Or is this handled somewhere else?

> +
> + err = ext3_add_nondir(handle, dentry, inode);
> +
> +out_stop:
> + ext3_journal_stop(handle);
> + if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
> + goto retry;
> + return err;
> +}
> +

Thanks,
Badari


Re: [00/17] Large Blocksize Support V3

2007-04-25 Thread Badari Pulavarty

On Tue, 2007-04-24 at 15:21 -0700, [EMAIL PROTECTED] wrote:
> V2->V3

Hmm.. It broke ext2 :(

V2 worked fine with the small fix I sent you earlier.
But on V3, I can't run fsx. I see random data showing up.
I will debug it when I get a chance.

Thanks,
Badari

READ BAD DATA: offset = 0x5092a4, size = 0x5093a0, fname = (null)
OFFSET  GOOD    BAD     RANGE
0x77f466d0  0x77e1f7c0  0x  0x  6ef
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  6f1
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  6f1
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  6f3
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  6f3
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  6f5
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x776f6e6b  0x6b636568  0x  0x  84f
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  851
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  851
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  853
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  853
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  855
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x776f6e6b  0x6b636568  0x  0x  857
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  859
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  859
operation# (mod 256) for the bad data may be 5284080
0x2820236e  0x77e1f7c0  0x  0x  85b
operation# (mod 256) for the bad data may be 5284080
LOG DUMP (49149 total operations):
0(2012505808 mod 256): WRITE 0x77f466d0 thru 0x77e1f7c0  (0x0 bytes) HOL***
0(2012505808 mod 256): WRITE 0x77f466d0 thru 0x77e1f7c0  (0x0 bytes)
0(2012505808 mod 256): READ  0x77f466d0 thru 0x77e1f7c0  (0x0 bytes)
0(2012505808 mod 256): READ  0x77f466d0 thru 0x77e1f7c0  (0x0 bytes)******

