Re: [ewg] Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-16 Thread Alexander Schmidt
On Tue, 16 Jun 2009 09:10:39 -0700
Roland Dreier  wrote:

> 
>  > Yeah, the notifier code remains untouched as we still do not allow dynamic
>  > memory operations _while_ our module is loaded. The patch allows the 
> driver to
>  > cope with DMEM operations that happened before the module was loaded, which
>  > might result in a non-contiguous memory layout. When the driver registers
>  > its global memory region in the system, the memory layout must be 
> considered.
>  > 
>  > We chose the term "toleration" instead of "support" to illustrate this.
> 
> I see.  So things just silently broke in some cases when the driver was
> loaded after operations you didn't tolerate?
> 
> Anyway, thanks for the explanation.

Well, things did not break silently. The registration of the MR failed with
an error code which was reported to userspace.

Will you push the patch for .31 or .32?

Thanks,
Alex
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [ewg] Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-16 Thread Roland Dreier

 > Yeah, the notifier code remains untouched as we still do not allow dynamic
 > memory operations _while_ our module is loaded. The patch allows the driver 
 > to
 > cope with DMEM operations that happened before the module was loaded, which
 > might result in a non-contiguous memory layout. When the driver registers
 > its global memory region in the system, the memory layout must be considered.
 > 
 > We chose the term "toleration" instead of "support" to illustrate this.

I see.  So things just silently broke in some cases when the driver was
loaded after operations you didn't tolerate?

Anyway, thanks for the explanation.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-16 Thread Alexander Schmidt
Hi Roland,

thank you for taking a look at the code!

On Fri, 12 Jun 2009 21:50:58 -0700
Roland Dreier  wrote:

> OK, one major issue with this patch and a few minor nits.
> 
> First, the major issue is that I don't see anything in the patch that
> changes the code in ehca_mem_notifier() in ehca_main.c:
> 
>   case MEM_GOING_ONLINE:
>   case MEM_GOING_OFFLINE:
>   /* only ok if no hca is attached to the lpar */
>   spin_lock_irqsave(&shca_list_lock, flags);
>   if (list_empty(&shca_list)) {
>   spin_unlock_irqrestore(&shca_list_lock, flags);
>   return NOTIFY_OK;
>   } else {
>   spin_unlock_irqrestore(&shca_list_lock, flags);
>   if (printk_timed_ratelimit(&ehca_dmem_warn_time,
>  30 * 1000))
>   ehca_gen_err("DMEM operations are not allowed"
>"in conjunction with eHCA");
>   return NOTIFY_BAD;
>   }
> 
> But your patch description says:
> 
>  > This patch implements toleration of dynamic memory operations
> 
> But it seems you're still going to hit the same NOTIFY_BAD case above
> after your patch.  So something doesn't compute for me.  Could you
> explain more?

Yeah, the notifier code remains untouched as we still do not allow dynamic
memory operations _while_ our module is loaded. The patch allows the driver to
cope with DMEM operations that happened before the module was loaded, which
might result in a non-contiguous memory layout. When the driver registers
its global memory region in the system, the memory layout must be considered.

We chose the term "toleration" instead of "support" to illustrate this.

I'll put some more details into the changelog, incorporate the other comments
and send out a second version of the patch.

Thanks,
Alex

> 
> Second, a nit:
> 
>  > +#define EHCA_REG_MR 0
>  > +#define EHCA_REG_BUSMAP_MR (~0)
> 
> and you pass these as the reg_busmap parm in:
> 
>  >  int ehca_reg_mr(struct ehca_shca *shca,
>  >struct ehca_mr *e_mr,
>  >u64 *iova_start,
>  > @@ -991,7 +1031,8 @@
>  >struct ehca_pd *e_pd,
>  >struct ehca_mr_pginfo *pginfo,
>  >u32 *lkey, /*OUT*/
>  > -  u32 *rkey) /*OUT*/
>  > +  u32 *rkey, /*OUT*/
>  > +  int reg_busmap)
> 
> and test it as:
> 
>  > +  if (reg_busmap)
>  > +  ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
>  > +  else
>  > +  ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);
> 
> So the ~0 for true looks a bit odd.  One option would be to make
> reg_busmap a bool, since that's how you're using it, but then you lose
> the nice self-documenting macro where you call things.
> 
> So I think it would be cleaner to do something like
> 
> enum ehca_reg_type {
>   EHCA_REG_MR,
>   EHCA_REG_BUSMAP_MR
> };
> 
> and make the "int reg_busmap" parameter into "enum ehca_reg_type reg_type"
> and have the code become
> 
> + if (reg_type == EHCA_REG_BUSMAP_MR)
> + ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
> + else if (reg_type == EHCA_REG_MR)
> + ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);
> + else
> + ret = -EINVAL
> 
> or something like that.
> 
>  > +struct ib_dma_mapping_ops ehca_dma_mapping_ops = {
>  > +  .mapping_error = ehca_dma_mapping_error,
>  > +  .map_single = ehca_dma_map_single,
>  > +  .unmap_single = ehca_dma_unmap_single,
>  > +  .map_page = ehca_dma_map_page,
>  > +  .unmap_page = ehca_dma_unmap_page,
>  > +  .map_sg = ehca_dma_map_sg,
>  > +  .unmap_sg = ehca_dma_unmap_sg,
>  > +  .dma_address = ehca_dma_address,
>  > +  .dma_len = ehca_dma_len,
>  > +  .sync_single_for_cpu = ehca_dma_sync_single_for_cpu,
>  > +  .sync_single_for_device = ehca_dma_sync_single_for_device,
>  > +  .alloc_coherent = ehca_dma_alloc_coherent,
>  > +  .free_coherent = ehca_dma_free_coherent,
>  > +};
> 
> I always think structures like this are easier to read if you align the
> '=' signs.  But no big deal.
> 
>  > +  ret = ehca_create_busmap();
>  > +  if (ret) {
>  > +  ehca_gen_err("Cannot create busmap.");
>  > +  goto module_init2;
>  > +  }
>  > +
>  >ret = ibmebus_register_driver(&ehca_driver);
>  >if (ret) {
>  >ehca_gen_err("Cannot register eHCA device driver");
>  >ret = -EINVAL;
>  > -  goto module_init2;
>  > +  goto module_init3;
>  >}
>  >  
>  >ret = register_memory_notifier(&ehca_mem_nb);
>  >if (ret) {
>  >ehca_gen_err("Failed registering memory add/remove notifier");
>  > -  goto module_init3;
>  > +  goto module_init4;
> 
> Having to renumber unrelated things is when something changes is why I
> don't like this style of error path labels.  But I think it's well and
> truly too late to fix that in e

Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-12 Thread Roland Dreier
OK, one major issue with this patch and a few minor nits.

First, the major issue is that I don't see anything in the patch that
changes the code in ehca_mem_notifier() in ehca_main.c:

case MEM_GOING_ONLINE:
case MEM_GOING_OFFLINE:
/* only ok if no hca is attached to the lpar */
spin_lock_irqsave(&shca_list_lock, flags);
if (list_empty(&shca_list)) {
spin_unlock_irqrestore(&shca_list_lock, flags);
return NOTIFY_OK;
} else {
spin_unlock_irqrestore(&shca_list_lock, flags);
if (printk_timed_ratelimit(&ehca_dmem_warn_time,
   30 * 1000))
ehca_gen_err("DMEM operations are not allowed"
 "in conjunction with eHCA");
return NOTIFY_BAD;
}

But your patch description says:

 > This patch implements toleration of dynamic memory operations

But it seems you're still going to hit the same NOTIFY_BAD case above
after your patch.  So something doesn't compute for me.  Could you
explain more?

Second, a nit:

 > +#define EHCA_REG_MR 0
 > +#define EHCA_REG_BUSMAP_MR (~0)

and you pass these as the reg_busmap parm in:

 >  int ehca_reg_mr(struct ehca_shca *shca,
 >  struct ehca_mr *e_mr,
 >  u64 *iova_start,
 > @@ -991,7 +1031,8 @@
 >  struct ehca_pd *e_pd,
 >  struct ehca_mr_pginfo *pginfo,
 >  u32 *lkey, /*OUT*/
 > -u32 *rkey) /*OUT*/
 > +u32 *rkey, /*OUT*/
 > +int reg_busmap)

and test it as:

 > +if (reg_busmap)
 > +ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
 > +else
 > +ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);

So the ~0 for true looks a bit odd.  One option would be to make
reg_busmap a bool, since that's how you're using it, but then you lose
the nice self-documenting macro where you call things.

So I think it would be cleaner to do something like

enum ehca_reg_type {
EHCA_REG_MR,
EHCA_REG_BUSMAP_MR
};

and make the "int reg_busmap" parameter into "enum ehca_reg_type reg_type"
and have the code become

+   if (reg_type == EHCA_REG_BUSMAP_MR)
+   ret = ehca_reg_bmap_mr_rpages(shca, e_mr, pginfo);
+   else if (reg_type == EHCA_REG_MR)
+   ret = ehca_reg_mr_rpages(shca, e_mr, pginfo);
+   else
+   ret = -EINVAL

or something like that.

 > +struct ib_dma_mapping_ops ehca_dma_mapping_ops = {
 > +.mapping_error = ehca_dma_mapping_error,
 > +.map_single = ehca_dma_map_single,
 > +.unmap_single = ehca_dma_unmap_single,
 > +.map_page = ehca_dma_map_page,
 > +.unmap_page = ehca_dma_unmap_page,
 > +.map_sg = ehca_dma_map_sg,
 > +.unmap_sg = ehca_dma_unmap_sg,
 > +.dma_address = ehca_dma_address,
 > +.dma_len = ehca_dma_len,
 > +.sync_single_for_cpu = ehca_dma_sync_single_for_cpu,
 > +.sync_single_for_device = ehca_dma_sync_single_for_device,
 > +.alloc_coherent = ehca_dma_alloc_coherent,
 > +.free_coherent = ehca_dma_free_coherent,
 > +};

I always think structures like this are easier to read if you align the
'=' signs.  But no big deal.

 > +ret = ehca_create_busmap();
 > +if (ret) {
 > +ehca_gen_err("Cannot create busmap.");
 > +goto module_init2;
 > +}
 > +
 >  ret = ibmebus_register_driver(&ehca_driver);
 >  if (ret) {
 >  ehca_gen_err("Cannot register eHCA device driver");
 >  ret = -EINVAL;
 > -goto module_init2;
 > +goto module_init3;
 >  }
 >  
 >  ret = register_memory_notifier(&ehca_mem_nb);
 >  if (ret) {
 >  ehca_gen_err("Failed registering memory add/remove notifier");
 > -goto module_init3;
 > +goto module_init4;

Having to renumber unrelated things is when something changes is why I
don't like this style of error path labels.  But I think it's well and
truly too late to fix that in ehca.

 - R.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-10 Thread Hannes Hering
Hi Michael,

On Wednesday 10 June 2009 02:02:36 Michael Ellerman wrote:
> For those of us who haven't read the HEA spec lately, can you give us
> some more detail on that? :)
first of all, please note that this patch is actually for the ehca infiniband
driver. The ehca driver uses an internal memory region, which is supposed to
contain all physical memory. A memory region maps a virtually contiguous
adapter address space to the physical or better absolute address space. The
limitation is that the memory region cannot map non-contiguous virtual adapter
address space. However, on ppc64 machines there is a feature to dynamically add
or remove memory to logical partitions during runtime. These operations scatter
the absolute memory so that the kernel memory has a non-contiguous layout. This
layout cannot be represented in a memory region. The purpose of this code is to
detect the memory layout so that the memory region can be made up of the
existing memory chunks. It also translates the kernel addresses to the memory
region address, which is needed for interaction with the HCA.
> How does it interact with kexec/kdump?
We never tested the ehca driver with kexec/kdump. This patch also doesn't
improve anything in this context.

> phys_to_abs() ? As below, or does it come from somewhere else?
You're right, actually that isn't needed on System p. On the other hand I
needed to choose an address type, which is the base of all mapping. The
"busmap" holds all addresses as absolute addresses. The absolute addresses can
then be converted in any other type (virt, phys). 

Regards

Hannes

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-09 Thread Michael Ellerman
On Tue, 2009-06-09 at 15:59 +0200, Hannes Hering wrote:
> This patch implements toleration of dynamic memory operations and 16 GB
> gigantic pages. On module load the driver walks through available system
> memory, checks for available memory ranges and then registers the kernel
> internal memory region accordingly. The translation of address ranges is
> implemented via a 3-level busmap.

Hi Hannes,

For those of us who haven't read the HEA spec lately, can you give us
some more detail on that? :)

How does it interact with kexec/kdump?

> +static int ehca_update_busmap(unsigned long pfn, unsigned long nr_pages)
> +{
> + unsigned long i, start_section, end_section;
> + int top, dir, idx;
> +
> + if (!nr_pages)
> + return 0;
> +
> + if (!ehca_bmap) {
> + ehca_bmap = kmalloc(sizeof(struct ehca_bmap), GFP_KERNEL);
> + if (!ehca_bmap)
> + return -ENOMEM;
> + /* Set map block to 0xFF according to EHCA_INVAL_ADDR */
> + memset(ehca_bmap, 0xFF, EHCA_TOP_MAP_SIZE);
> + }
> +
> + start_section = phys_to_abs(pfn * PAGE_SIZE) / EHCA_SECTSIZE;
> + end_section = phys_to_abs((pfn + nr_pages) * PAGE_SIZE) / EHCA_SECTSIZE;


phys_to_abs() ? As below, or does it come from somewhere else?

 arch/powerpc/include/asm/abs_addr.h:
 47 static inline unsigned long phys_to_abs(unsigned long pa)   
 48 {
 49 unsigned long chunk;
 50 
 51 /* This is a no-op on non-iSeries */
 52 if (!firmware_has_feature(FW_FEATURE_ISERIES))
 53 return pa;
 54 
 55 chunk = addr_to_chunk(pa);
 56 
 57 if (chunk < mschunks_map.num_chunks)
 58 chunk = mschunks_map.mapping[chunk];
 59 
 60 return chunk_to_addr(chunk) + (pa & MSCHUNKS_OFFSET_MASK);
 61 }


cheers


signature.asc
Description: This is a digitally signed message part
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2.6.31] ehca: Tolerate dynamic memory operations and huge pages

2009-06-09 Thread Hannes Hering
This patch implements toleration of dynamic memory operations and 16 GB
gigantic pages. On module load the driver walks through available system
memory, checks for available memory ranges and then registers the kernel
internal memory region accordingly. The translation of address ranges is
implemented via a 3-level busmap.

Signed-off-by: Hannes Hering 

---
This patch is built and tested against infiniband.git. Please apply for 2.6.31.

Regards

Hannes

Index: infiniband/drivers/infiniband/hw/ehca/ehca_main.c
===
--- infiniband.orig/drivers/infiniband/hw/ehca/ehca_main.c  2009-06-09 
14:20:37.0 +0200
+++ infiniband/drivers/infiniband/hw/ehca/ehca_main.c   2009-06-09 
14:20:47.0 +0200
@@ -52,7 +52,7 @@
 #include "ehca_tools.h"
 #include "hcp_if.h"
 
-#define HCAD_VERSION "0026"
+#define HCAD_VERSION "0027"
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch ");
@@ -506,6 +506,7 @@
shca->ib_device.detach_mcast= ehca_detach_mcast;
shca->ib_device.process_mad = ehca_process_mad;
shca->ib_device.mmap= ehca_mmap;
+   shca->ib_device.dma_ops = &ehca_dma_mapping_ops;
 
if (EHCA_BMASK_GET(HCA_CAP_SRQ, shca->hca_cap)) {
shca->ib_device.uverbs_cmd_mask |=
@@ -1028,17 +1029,23 @@
goto module_init1;
}
 
+   ret = ehca_create_busmap();
+   if (ret) {
+   ehca_gen_err("Cannot create busmap.");
+   goto module_init2;
+   }
+
ret = ibmebus_register_driver(&ehca_driver);
if (ret) {
ehca_gen_err("Cannot register eHCA device driver");
ret = -EINVAL;
-   goto module_init2;
+   goto module_init3;
}
 
ret = register_memory_notifier(&ehca_mem_nb);
if (ret) {
ehca_gen_err("Failed registering memory add/remove notifier");
-   goto module_init3;
+   goto module_init4;
}
 
if (ehca_poll_all_eqs != 1) {
@@ -1053,9 +1060,12 @@
 
return 0;
 
-module_init3:
+module_init4:
ibmebus_unregister_driver(&ehca_driver);
 
+module_init3:
+   ehca_destroy_busmap();
+
 module_init2:
ehca_destroy_slab_caches();
 
@@ -1073,6 +1083,8 @@
 
unregister_memory_notifier(&ehca_mem_nb);
 
+   ehca_destroy_busmap();
+
ehca_destroy_slab_caches();
 
ehca_destroy_comp_pool();
Index: infiniband/drivers/infiniband/hw/ehca/ehca_mrmw.c
===
--- infiniband.orig/drivers/infiniband/hw/ehca/ehca_mrmw.c  2009-06-09 
14:20:37.0 +0200
+++ infiniband/drivers/infiniband/hw/ehca/ehca_mrmw.c   2009-06-09 
14:20:47.0 +0200
@@ -53,6 +53,39 @@
 /* max number of rpages (per hcall register_rpages) */
 #define MAX_RPAGES 512
 
+/* DMEM toleration management */
+#define EHCA_SECTSHIFTSECTION_SIZE_BITS
+#define EHCA_SECTSIZE  (1UL << EHCA_SECTSHIFT)
+#define EHCA_HUGEPAGESHIFT 34
+#define EHCA_HUGEPAGE_SIZE (1UL << EHCA_HUGEPAGESHIFT)
+#define EHCA_HUGEPAGE_PFN_MASK ((EHCA_HUGEPAGE_SIZE - 1) >> PAGE_SHIFT)
+#define EHCA_INVAL_ADDR0xULL
+#define EHCA_DIR_INDEX_SHIFT 13   /* 8k Entries in 64k block */
+#define EHCA_TOP_INDEX_SHIFT (EHCA_DIR_INDEX_SHIFT * 2)
+#define EHCA_MAP_ENTRIES (1 << EHCA_DIR_INDEX_SHIFT)
+#define EHCA_TOP_MAP_SIZE (0x1)   /* currently fixed map size 
*/
+#define EHCA_DIR_MAP_SIZE (0x1)
+#define EHCA_ENT_MAP_SIZE (0x1)
+#define EHCA_INDEX_MASK (EHCA_MAP_ENTRIES - 1)
+#define EHCA_REG_MR 0
+#define EHCA_REG_BUSMAP_MR (~0)
+
+static unsigned long ehca_mr_len;
+/*
+ * Memory map data structures
+ */
+struct ehca_dir_bmap {
+   u64 ent[EHCA_MAP_ENTRIES];
+};
+struct ehca_top_bmap {
+   struct ehca_dir_bmap *dir[EHCA_MAP_ENTRIES];
+};
+struct ehca_bmap {
+   struct ehca_top_bmap *top[EHCA_MAP_ENTRIES];
+};
+
+static struct ehca_bmap *ehca_bmap;
+
 static struct kmem_cache *mr_cache;
 static struct kmem_cache *mw_cache;
 
@@ -68,6 +101,8 @@
 #define EHCA_MR_PGSHIFT1M  20
 #define EHCA_MR_PGSHIFT16M 24
 
+static u64 ehca_map_vaddr(void *caddr);
+
 static u32 ehca_encode_hwpage_size(u32 pgsize)
 {
int log = ilog2(pgsize);
@@ -135,7 +170,8 @@
goto get_dma_mr_exit0;
}
 
-   ret = ehca_reg_maxmr(shca, e_maxmr, (u64 *)KERNELBASE,
+   ret = ehca_reg_maxmr(shca, e_maxmr,
+(void *)ehca_map_vaddr((void *)KERNELBASE),
 mr_access_flags, e_pd,
 &e_maxmr->ib.ib_mr.lkey,
 &e_maxmr->ib.ib_mr.rkey);
@@ -251,7 +287,7 @@
 
ret = ehca_reg_mr(shca, e_mr, iova_start, size, mr_access_flags,
  e_pd,