Re: [RFC PATCH 4/8 v2] IB/odp/hmm: prepare for HMM code path.

2015-08-13 Thread Jerome Glisse
On Thu, Aug 13, 2015 at 02:13:35PM -0600, Jason Gunthorpe wrote:
> On Thu, Aug 13, 2015 at 03:20:49PM -0400, Jérôme Glisse wrote:
>   
> > +#if IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM)
> > +#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM */
> 
> Yuk, what is wrong with
> 
>  #if !IS_ENABLED(...)
> 
> ?

Just that later patches add code between the #if and #else, and that
originally it was one bigger patch that added the #if code and the #else
at the same time. That is why this patch looks like this.
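
To make the shape concrete, this is roughly what the code looks like once
the later patches fill in the empty branch (the comments are placeholders,
not the actual patch content):

#if IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM)
	/* HMM-backed ODP code path, filled in by the follow-up patches. */
#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM */
	/* Existing (non-HMM) ODP code path stays here. */
#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM */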

> 
> > -#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> > +#if IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)
> > +#if IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM)
> > +#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM */
> 
> Double yuk
> 
>  #if !(IS_ENABLED(..) && IS_ENABLED(..))
> 
> ?

Same reason as above.


> And the #ifdefs suck, as many as possible should be normal if
> statements, and one should think carefully if we really need to remove
> fields from structures..

My patch only adds #if; I am not responsible for the previous code
that used #ifdef. I was told to convert to #if and that's what
I am doing.

Regarding fields, yes, this is intentional. ODP is an infrastructure
that is private to InfiniBand and thus needs more fields inside the ib
structs, while HMM is intended to be a common infrastructure, not only
for IB devices but for other kinds of devices too.
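
To illustrate the split (a sketch only; apart from ib_mirror, which this
series adds, the field names below are hypothetical and not the exact
ib_umem_odp layout):

struct ib_umem_odp_sketch {
#if IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM)
	/* HMM path: state lives in the shared per-process mirror. */
	struct ib_mirror *ib_mirror;
#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM */
	/* Classic ODP path: per-umem bookkeeping private to InfiniBand. */
	struct page **page_list;
	dma_addr_t *dma_list;
#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM */
};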

Cheers,
Jérôme


Re: [PATCH 6/8] IB/mlx5/hmm: add mlx5 HMM device initialization and callback v3.

2015-07-21 Thread Jerome Glisse
> On 17/07/2015 22:01, Jérôme Glisse wrote:
> > @@ -151,10 +151,11 @@ int ib_umem_odp_get(struct ib_ucontext *context,
> > struct ib_umem *umem)
> > context->ib_mirror = ib_mirror_ref(ib_mirror);
> > }
> > mutex_unlock(&ib_device->hmm_mutex);
> > -   umem->odp_data.ib_mirror = ib_mirror;
> > +   umem->odp_data->ib_mirror = ib_mirror;
> >  
> > down_write(&ib_mirror->umem_rwsem);
> > -   rbt_ib_umem_insert(&umem->odp_data->interval_tree, &mirror->umem_tree);
> > +   rbt_ib_umem_insert(&umem->odp_data->interval_tree,
> > +  &ib_mirror->umem_tree);
> > up_write(&ib_mirror->umem_rwsem);
> >  
> > mmput(mm);
> > @@ -163,7 +164,7 @@ int ib_umem_odp_get(struct ib_ucontext *context, struct
> > ib_umem *umem)
> >  
> >  void ib_umem_odp_release(struct ib_umem *umem)
> >  {
> > -   struct ib_mirror *ib_mirror = umem->odp_data;
> > +   struct ib_mirror *ib_mirror = umem->odp_data->ib_mirror;
> >  
> > /*
> >  * Ensure that no more pages are mapped in the umem.
> 
> It doesn't look like this code would have compiled before this patch,
> and as far as I can see the previous patch removed the #error line.
> Could you make sure all of the patches build correctly? You could use
> tools/testing/ktest for instance.
> 

This is a rebase error. But the #error is there for a purpose: HMM would
not work midway through the series, so if anyone is bisecting through that
point I would rather error out at compile time.
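
The intent is something along these lines (assumed form, not the exact
hunk from the series):

#if IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING_HMM)
/* Fail the build rather than let a bisect land on a kernel where the
 * ODP-over-HMM conversion is only partially applied. */
#error "ODP over HMM is not functional at this point in the series"
#endif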

Jérôme


Re: [PATCH 05/36] HMM: introduce heterogeneous memory management v3.

2015-06-15 Thread Jerome Glisse
On Thu, Jun 11, 2015 at 03:26:46PM -0700, Mark Hairgrove wrote:
> On Thu, 11 Jun 2015, Jerome Glisse wrote:
> > On Wed, Jun 10, 2015 at 06:15:08PM -0700, Mark Hairgrove wrote:

[...]
> > Ok, I see the race you are afraid of, and really it is an unlikely one.
> > __mutex_unlock_common_slowpath() takes a spinlock right after allowing
> > others to take the mutex. In your scenario there is no contention on
> > that spinlock, so it is taken right away, and as there is no one on the
> > mutex wait list it goes directly to unlocking the spinlock and
> > returning. You can ignore the debug functions: if debugging is enabled
> > then mutex_lock() would also need to take the spinlock, and thus the
> > two threads would be properly synchronized through mutex.wait_lock.
> > 
> > So basically, while CPU1 is doing:
> > spin_lock(mutex.wait_lock)
> > if (!list_empty(mutex.wait_list)) {
> >   // wait_list is empty so branch not taken
> > }
> > spin_unlock(mutex.wait_lock)
> > 
> > CPU2 would have to test the mirror list, mutex_unlock() and return
> > before the spin_unlock() of CPU1. This is a tight race. I can add a
> > synchronize_rcu() to device_unregister() after the mutex_unlock() so
> > that we also add a grace period before the device is potentially freed,
> > which should make that race even more unlikely.
> > 
> > Moreover, for something really bad to happen the freed memory would
> > have to be reallocated right away by some other thread, which really
> > sounds unlikely unless CPU1 is the slowest of all :)
> > 
> > Cheers,
> > Jérôme
> > 
> 
> But CPU1 could get preempted between the atomic_set and the 
> spin_lock_mutex, and then it doesn't matter whether or not a grace period 
> has elapsed before CPU2 proceeds.
> 
> Making race conditions less likely just makes them harder to pinpoint when 
> they inevitably appear in the wild. I don't think it makes sense to spend 
> any effort in making a race condition less likely, and that thread I 
> referenced (https://lkml.org/lkml/2013/12/2/997) is fairly strong evidence 
> that fixing this race actually matters. So, I think this race condition 
> really needs to be fixed.
> 
> One fix is for hmm_mirror_unregister to wait for hmm_notifier_release 
> completion between hmm_mirror_kill and hmm_mirror_unref. It can do this by 
> calling synchronize_srcu() on the mmu_notifier's srcu. This has the 
> benefit that the driver is guaranteed not to get the "mm is dead" callback 
> after hmm_mirror_unregister returns.
> 
> In fact, are there any callbacks on the mirror that can arrive after 
> hmm_mirror_unregister? If so, how will hmm_device_unregister solve them?
> 
> From a general standpoint, hmm_device_unregister must perform some kind of 
> synchronization to be sure that all mirrors are completely released and 
> done and no new callbacks will trigger. Since that has to be true, can't 
> that synchronization be moved into hmm_mirror_unregister instead?
> 
> If that happens there's no need for a "mirror can be freed" ->release 
> callback at all because the driver is guaranteed that a mirror is done 
> after hmm_mirror_unregister.

Well, there is no need for 2 callbacks (release|stop, free), just one:
the release|stop one is needed. I kind of went halfway on this last week.
I will probably rework that a little to keep just one callback and rely
on the driver to call hmm_mirror_unregister().
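
In other words, something like the sketch below: the ops struct keeps only
the release/stop callback (as in the posted patch), and freeing the struct
that embeds the mirror is left to the driver after it has called
hmm_mirror_unregister().

struct hmm_device_ops {
	/* Stop using the address space: called when the mm dies, from
	 * hmm_mirror_unregister(), or after a device error. Freeing the
	 * driver struct that embeds the mirror stays with the driver. */
	void (*release)(struct hmm_mirror *mirror);
};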

Cheers,
Jérôme


Re: [PATCH 05/36] HMM: introduce heterogeneous memory management v3.

2015-06-11 Thread Jerome Glisse
On Wed, Jun 10, 2015 at 06:15:08PM -0700, Mark Hairgrove wrote:

[...]
> > There is no race here; the mirror struct will only be freed once, as again
> > the list is a synchronization point. Whoever removes the mirror from the
> > list is responsible for dropping the list reference.
> > 
> > In the fixed code the only thing that will happen twice is the ->release()
> > callback. Even that can be worked around to guarantee it is called only once.
> > 
> > Anyway, I do not see any race here.
> > 
> 
> The mirror lifetime is fine. The problem I see is with the device lifetime 
> on a multi-core system. Imagine this sequence:
> 
> - On CPU1 the mm associated with the mirror is going down
> - On CPU2 the driver unregisters the mirror then the device
> 
> When this happens, the last device mutex_unlock on CPU1 is the only thing 
> preventing the free of the device in CPU2. That doesn't work, as described 
> in this thread: https://lkml.org/lkml/2013/12/2/997
> 
> Here's the full sequence again with mutex_unlock split apart. Hopefully 
> this shows the device_unregister problem more clearly:
> 
> CPU1 (mm release)   CPU2 (driver)
> --  --
> hmm_notifier_release
>   down_write(&hmm->rwsem);
>   hlist_del_init(&mirror->mlist);
>   up_write(&hmm->rwsem);
> 
>   // CPU1 thread is preempted or 
>   // something
> hmm_mirror_unregister
>   hmm_mirror_kill
> down_write(&hmm->rwsem);
> // mirror removed by CPU1 already
> // so hlist_unhashed returns 1
> up_write(&hmm->rwsem);
> 
>   hmm_mirror_unref(&mirror);
>   // Mirror ref now 1
> 
>   // CPU2 thread is preempted or
>   // something
> // CPU1 thread is scheduled
> 
> hmm_mirror_unref(&mirror);
>   // Mirror ref now 0, cleanup
>   hmm_mirror_destroy(&mirror)
> mutex_lock(&device->mutex);
> list_del_init(&mirror->dlist);
> device->ops->release(mirror);
>   kfree(mirror);
>   // CPU2 thread is scheduled, now
>   // both CPU1 and CPU2 are running
> 
> hmm_device_unregister
>   mutex_lock(&device->mutex);
> mutex_optimistic_spin()
> mutex_unlock(&device->mutex);
>   [...]
>   __mutex_unlock_common_slowpath
> // CPU2 releases lock
> atomic_set(&lock->count, 1);
>   // Spinning CPU2 acquires now-
>   // free lock
>   // mutex_lock returns
>   // Device list empty
>   mutex_unlock(&device->mutex);
>   return 0;
> kfree(hmm_device);
> // CPU1 still accessing 
> // hmm_device->mutex in 
> //__mutex_unlock_common_slowpath

Ok, I see the race you are afraid of, and really it is an unlikely one.
__mutex_unlock_common_slowpath() takes a spinlock right after allowing
others to take the mutex. In your scenario there is no contention on
that spinlock, so it is taken right away, and as there is no one on the
mutex wait list it goes directly to unlocking the spinlock and
returning. You can ignore the debug functions: if debugging is enabled
then mutex_lock() would also need to take the spinlock, and thus the
two threads would be properly synchronized through mutex.wait_lock.

So basically, while CPU1 is doing:
spin_lock(mutex.wait_lock)
if (!list_empty(mutex.wait_list)) {
  // wait_list is empty so branch not taken
}
spin_unlock(mutex.wait_lock)

CPU2 would have to test the mirror list, mutex_unlock() and return
before the spin_unlock() of CPU1. This is a tight race. I can add a
synchronize_rcu() to device_unregister() after the mutex_unlock() so
that we also add a grace period before the device is potentially freed,
which should make that race even more unlikely.

Moreover, for something really bad to happen the freed memory would
have to be reallocated right away by some other thread, which really
sounds unlikely unless CPU1 is the slowest of all :)
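
Concretely, the change I have in mind is just this, on top of the
hmm_device_unregister() already posted earlier in the thread:

int hmm_device_unregister(struct hmm_device *device)
{
	mutex_lock(&device->mutex);
	if (!list_empty(&device->mirrors)) {
		mutex_unlock(&device->mutex);
		return -EBUSY;
	}
	mutex_unlock(&device->mutex);
	/* Proposed addition: a grace period so a CPU that is still inside
	 * __mutex_unlock_common_slowpath() when we acquire the now-free
	 * mutex gets time to finish before the caller frees the device. */
	synchronize_rcu();
	return 0;
}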

Cheers,
Jérôme


Re: [PATCH 05/36] HMM: introduce heterogeneous memory management v3.

2015-06-10 Thread Jerome Glisse
On Tue, Jun 09, 2015 at 08:33:12PM -0700, Mark Hairgrove wrote:
> 
> 
> On Tue, 9 Jun 2015, Jerome Glisse wrote:
> 
> > On Mon, Jun 08, 2015 at 06:54:29PM -0700, Mark Hairgrove wrote:
> > > Can you clarify how that's different from mmu_notifiers? Those are also
> > > embedded into a driver-owned struct.
> > 
> > For HMM you want to be able to kill a mirror from HMM, and you might have a
> > kernel thread call into HMM with a mirror (outside any device file lifetime).
> > The mirror is not only used at register and unregister time; there are a lot
> > more things you can call using the HMM mirror struct.
> > 
> > So the HMM mirror lifetime is more complex as a result: it cannot simply be
> > freed from the mmu_notifier_release callback or at some random point. It
> > needs to be refcounted.
> 
> Sure, there are driver -> HMM calls like hmm_mirror_fault that 
> mmu_notifiers don't have, but I don't understand why that fundamentally 
> makes HMM mirror lifetimes more complex. Decoupling hmm_mirror lifetime 
> from mm lifetime adds complexity too.

Driver->HMM calls can happen from a random kernel thread, thus you need to
guarantee that the hmm_mirror cannot go away. Moreover, you can have the
CPU MM call into HMM outside of the mmu_notifier. Basically you can get to
HMM code through many different code paths, unlike any of the current
mmu_notifier users.

So refcounting is necessary, as otherwise the device driver might decide
to unregister and free the mirror while some other kernel thread is
about to dereference the exact same mirror. Synchronization with the mmu
notifier SRCU will not be enough in the case of a page fault on remote
memory, for instance. There are other cases too.
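
As a sketch of what the refcount buys a driver (the helper semantics here
are my assumption, patterned on the ref/unref calls in the series):

static int example_device_fault(struct hmm_mirror *mirror, unsigned long addr)
{
	/* Pin the mirror for the duration of the call so a concurrent
	 * hmm_mirror_unregister() from the driver cannot free it under us.
	 * (Returning NULL for a dying mirror is my assumption about
	 * hmm_mirror_ref(), not a documented guarantee.) */
	mirror = hmm_mirror_ref(mirror);
	if (!mirror)
		return -ENODEV;

	/* ... call into HMM here, e.g. the remote fault path ... */

	hmm_mirror_unref(&mirror);
	return 0;
}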

> 
> > The mmu_notifier code assumes that the mmu_notifier struct is
> > embedded inside a struct whose lifetime is properly synchronized with the
> > mm. For the HMM mirror this does not sound like a good idea, as there are
> > too many ways to get it wrong.
> 
> What kind of synchronization with the mm are you referring to here? 
> Clients of mmu_notifiers don't have to do anything as far as I know. 
> They're guaranteed that the mm won't go away because each registered 
> notifier bumps mm_count.

So for all current users, afaict, (kvm, xen, intel, radeon) are tied to a
file, (sgi gru) is tied to a vma, and (iommu) is tied to the mm. Which means
this is all properly synchronized with the lifetime of the mm (ignoring the
fork case).

The struct that is tied to the mmu_notifier can, for all of them, be
accessed only through one code path (an ioctl for most of them).

> 
> > So the idea of the HMM mirror is that it can outlast the mm lifetime but the
> > HMM struct cannot. You have hmm_mirror <~> hmm <-> mm, and the mirror can be
> > "unlinked" and have a different lifetime from the hmm, which itself has the
> > same lifetime as the mm.
> 
> Per the earlier discussion hmm_mirror_destroy is missing a call to 
> hmm_unref. If that's added back I don't understand how the mirror can 
> persist past the hmm struct. The mirror can be unlinked from hmm's list, 
> yes, but that doesn't mean that hmm/mm can be torn down. The hmm/mm 
> structs will stick around until hmm_destroy since that does the 
> mmu_notifier_unregister. hmm_destroy can't be called until the last 
> hmm_mirror_destroy.
> 
> Doesn't that mean that hmm/mm are guaranteed to be allocated until the 
> last hmm_mirror_unregister? That sounds like a good guarantee to make.

Like I said, just ignore the current code; it is utterly broken in so many
ways when it comes to lifetime. I screwed that part up badly when reworking
the patchset; I was focusing on other parts.

I fixed that in my tree. I am waiting for more review on the other parts,
as the lifetime thing is easy to rework/fix anyway.

http://cgit.freedesktop.org/~glisse/linux/log/?h=hmm

[...]
> > > > > If so, I think there's a race here in the case of mm teardown 
> > > > > happening
> > > > > concurrently with hmm_mirror_unregister.
> > > > >
> > > > > [...]
> > > > >
> > > > > Do you agree that this sequence can happen, or am I missing something
> > > > > which prevents it?
> > > >
> > > > Can't happen, because the child has mm->hmm = NULL, ie there is only one
> > > > hmm per mm and the hmm is tied to only one mm. It is the responsibility
> > > > of the device driver to make sure the same applies to any private
> > > > reference to the hmm mirror struct, ie the hmm_mirror should never be
> > > > tied to a private file struct.
> > > 
> > > It's useful for the driver to have some association between files and
> > > mirrors. If the file is closed prior to process e

Re: [PATCH 05/36] HMM: introduce heterogeneous memory management v3.

2015-06-09 Thread Jerome Glisse
On Mon, Jun 08, 2015 at 06:54:29PM -0700, Mark Hairgrove wrote:
> On Mon, 8 Jun 2015, Jerome Glisse wrote:
> > On Mon, Jun 08, 2015 at 12:40:18PM -0700, Mark Hairgrove wrote:
> > > On Thu, 21 May 2015, j.gli...@gmail.com wrote:
> > > > From: Jérôme Glisse 
> > > >
> > > > This patch only introduce core HMM functions for registering a new
> > > > mirror and stopping a mirror as well as HMM device registering and
> > > > unregistering.
> > > >
> > > > [...]
> > > >
> > > > +/* struct hmm_device_operations - HMM device operation callback
> > > > + */
> > > > +struct hmm_device_ops {
> > > > +   /* release() - mirror must stop using the address space.
> > > > +*
> > > > +* @mirror: The mirror that link process address space with the 
> > > > device.
> > > > +*
> > > > +* When this is call, device driver must kill all device thread 
> > > > using
> > > > +* this mirror. Also, this callback is the last thing call by 
> > > > HMM and
> > > > +* HMM will not access the mirror struct after this call (ie no 
> > > > more
> > > > +* dereference of it so it is safe for the device driver to 
> > > > free it).
> > > > +* It is call either from :
> > > > +*   - mm dying (all process using this mm exiting).
> > > > +*   - hmm_mirror_unregister() (if no other thread holds a 
> > > > reference)
> > > > +*   - outcome of some device error reported by any of the 
> > > > device
> > > > +* callback against that mirror.
> > > > +*/
> > > > +   void (*release)(struct hmm_mirror *mirror);
> > > > +};
> > >
> > > The comment that ->release is called when the mm dies doesn't match the
> > > implementation. ->release is only called when the mirror is destroyed, and
> > > that can only happen after the mirror has been unregistered. This may not
> > > happen until after the mm dies.
> > >
> > > Is the intent for the driver to get the callback when the mm goes down?
> > > That seems beneficial so the driver can kill whatever's happening on the
> > > device. Otherwise the device may continue operating in a dead address
> > > space until the driver's file gets closed and it unregisters the mirror.
> >
> > This was the intent before merging free & release. I guess I need to
> > reinstate the free versus release callback. Sadly the lifetime for HMM
> > is more complex than for mmu_notifier, as we intend the mirror struct to
> > be embedded into a driver private struct.
> 
> Can you clarify how that's different from mmu_notifiers? Those are also
> embedded into a driver-owned struct.

For HMM you want to be able to kill a mirror from HMM, and you might have a
kernel thread call into HMM with a mirror (outside any device file lifetime).
The mirror is not only used at register and unregister time; there are a lot
more things you can call using the HMM mirror struct.

So the HMM mirror lifetime is more complex as a result: it cannot simply be
freed from the mmu_notifier_release callback or at some random point. It
needs to be refcounted. The mmu_notifier code assumes that the mmu_notifier
struct is embedded inside a struct whose lifetime is properly synchronized
with the mm. For the HMM mirror this does not sound like a good idea, as
there are too many ways to get it wrong.

So the idea of the HMM mirror is that it can outlast the mm lifetime but the
HMM struct cannot. You have hmm_mirror <~> hmm <-> mm, and the mirror can be
"unlinked" and have a different lifetime from the hmm, which itself has the
same lifetime as the mm.

> Is the goal to allow calling hmm_mirror_unregister from within the "mm is
> dying" HMM callback? I don't know whether that's really necessary as long
> as there's some association between the driver files and the mirrors.

No, this is not a goal, and I actually forbid that.

> 
> > > If so, I think there's a race here in the case of mm teardown happening
> > > concurrently with hmm_mirror_unregister.
> > >
> > > [...]
> > >
> > > Do you agree that this sequence can happen, or am I missing something
> > > which prevents it?
> >
> > Can't happen, because the child has mm->hmm = NULL, ie only one hmm per mm
> > and the hmm is tied to only one mm. It is the responsibility of the device
> > driver to make sure the same applies to

Re: [PATCH 05/36] HMM: introduce heterogeneous memory management v3.

2015-06-08 Thread Jerome Glisse
On Mon, Jun 08, 2015 at 12:40:18PM -0700, Mark Hairgrove wrote:
> 
> 
> On Thu, 21 May 2015, j.gli...@gmail.com wrote:
> 
> > From: Jérôme Glisse 
> >
> > This patch only introduce core HMM functions for registering a new
> > mirror and stopping a mirror as well as HMM device registering and
> > unregistering.
> >
> > [...]
> >
> > +/* struct hmm_device_operations - HMM device operation callback
> > + */
> > +struct hmm_device_ops {
> > +   /* release() - mirror must stop using the address space.
> > +*
> > +* @mirror: The mirror that link process address space with the device.
> > +*
> > +* When this is call, device driver must kill all device thread using
> > +* this mirror. Also, this callback is the last thing call by HMM and
> > +* HMM will not access the mirror struct after this call (ie no more
> > +* dereference of it so it is safe for the device driver to free it).
> > +* It is call either from :
> > +*   - mm dying (all process using this mm exiting).
> > +*   - hmm_mirror_unregister() (if no other thread holds a reference)
> > +*   - outcome of some device error reported by any of the device
> > +* callback against that mirror.
> > +*/
> > +   void (*release)(struct hmm_mirror *mirror);
> > +};
> 
> The comment that ->release is called when the mm dies doesn't match the
> implementation. ->release is only called when the mirror is destroyed, and
> that can only happen after the mirror has been unregistered. This may not
> happen until after the mm dies.
> 
> Is the intent for the driver to get the callback when the mm goes down?
> That seems beneficial so the driver can kill whatever's happening on the
> device. Otherwise the device may continue operating in a dead address
> space until the driver's file gets closed and it unregisters the mirror.

This was the intent before merging free & release. I guess I need to
reinstate the free versus release callback. Sadly the lifetime for HMM
is more complex than for mmu_notifier, as we intend the mirror struct to
be embedded into a driver private struct.
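
A rough sketch of the split being considered (hedged: this is the
direction, not a final interface):

struct hmm_device_ops {
	/* The mm is going away: stop all device access through this mirror,
	 * even though the driver may still hold references to it. */
	void (*release)(struct hmm_mirror *mirror);
	/* HMM is done with the mirror struct: the driver may now free the
	 * structure that embeds it. */
	void (*free)(struct hmm_mirror *mirror);
};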

> 
> > +static void hmm_mirror_destroy(struct kref *kref)
> > +{
> > +   struct hmm_device *device;
> > +   struct hmm_mirror *mirror;
> > +   struct hmm *hmm;
> > +
> > +   mirror = container_of(kref, struct hmm_mirror, kref);
> > +   device = mirror->device;
> > +   hmm = mirror->hmm;
> > +
> > +   mutex_lock(&device->mutex);
> > +   list_del_init(&mirror->dlist);
> > +   device->ops->release(mirror);
> > +   mutex_unlock(&device->mutex);
> > +}
> 
> The hmm variable is unused. It also probably isn't safe to access at this
> point.

The hmm_unref(hmm); call was lost somewhere, probably in a rebase, and it
is safe to access hmm here.
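
For clarity, the destroy path with the lost call restored would look
roughly like this (exactly where the hmm_unref() lands relative to the
device release callback is my assumption):

static void hmm_mirror_destroy(struct kref *kref)
{
	struct hmm_device *device;
	struct hmm_mirror *mirror;
	struct hmm *hmm;

	mirror = container_of(kref, struct hmm_mirror, kref);
	device = mirror->device;
	hmm = mirror->hmm;

	mutex_lock(&device->mutex);
	list_del_init(&mirror->dlist);
	device->ops->release(mirror);
	mutex_unlock(&device->mutex);

	/* Drop the reference on the hmm taken when the mirror registered;
	 * this is the call that went missing in the rebase. */
	hmm_unref(hmm);
}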

> 
> 
> > +static void hmm_mirror_kill(struct hmm_mirror *mirror)
> > +{
> > +   down_write(&mirror->hmm->rwsem);
> > +   if (!hlist_unhashed(&mirror->mlist)) {
> > +   hlist_del_init(&mirror->mlist);
> > +   up_write(&mirror->hmm->rwsem);
> > +
> > +   hmm_mirror_unref(&mirror);
> > +   } else
> > +   up_write(&mirror->hmm->rwsem);
> > +}
> 
> Shouldn't this call hmm_unref? hmm_mirror_register calls hmm_ref but
> there's no corresponding hmm_unref when the mirror goes away. As a result
> the hmm struct gets leaked and thus so does the entire mm since
> mmu_notifier_unregister is never called.
> 
> It might also be a good idea to set mirror->hmm = NULL here to prevent
> accidental use in say hmm_mirror_destroy.

No, hmm_mirror_destroy() must be the one doing the hmm_unref(hmm).

> 
> 
> > +/* hmm_device_unregister() - unregister a device with HMM.
> > + *
> > + * @device: The hmm_device struct.
> > + * Returns: 0 on success or -EBUSY otherwise.
> > + *
> > + * Call when device driver want to unregister itself with HMM. This will 
> > check
> > + * that there is no any active mirror and returns -EBUSY if so.
> > + */
> > +int hmm_device_unregister(struct hmm_device *device)
> > +{
> > +   mutex_lock(&device->mutex);
> > +   if (!list_empty(&device->mirrors)) {
> > +   mutex_unlock(&device->mutex);
> > +   return -EBUSY;
> > +   }
> > +   mutex_unlock(&device->mutex);
> > +   return 0;
> > +}
> 
> I assume that the intention is for the caller to spin on
> hmm_device_unregister until -EBUSY is no longer returned?
> 
> If so, I think there's a race here in the case of mm teardown happening
> concurrently with hmm_mirror_unregister. This can happen if the parent
> process was forked and exits while the child closes the file, or if the
> file is passed to another process and closed last there while the original
> process exits.
> 
> The upshot is that the hmm_device may still be referenced by another
> thread even after hmm_device_unregister returns 0.
> 
> The below sequence shows how this might happen. Coming into this, the
> mirror's ref count is 2:
> 
> Thread A (file close)   Thread B (process exit)
> --  --
>   

Re: [PATCH 05/36] HMM: introduce heterogeneous memory management v3.

2015-05-27 Thread Jerome Glisse
On Wed, May 27, 2015 at 11:20:05AM +0530, Aneesh Kumar K.V wrote:
> j.gli...@gmail.com writes:

Noted your grammar fixes.

> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 52ffb86..189e48f 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -653,3 +653,18 @@ config DEFERRED_STRUCT_PAGE_INIT
> >   when kswapd starts. This has a potential performance impact on
> >   processes running early in the lifetime of the systemm until kswapd
> >   finishes the initialisation.
> > +
> > +if STAGING
> > +config HMM
> > +   bool "Enable heterogeneous memory management (HMM)"
> > +   depends on MMU
> > +   select MMU_NOTIFIER
> > +   select GENERIC_PAGE_TABLE
> 
> What is GENERIC_PAGE_TABLE ?

Leftover from when patch 0006 was a separate feature that was introduced
before this patch. I failed to remove that chunk. Just ignore it.

Cheers,
Jérôme


Re: [PATCH v1 for-next 06/16] IB/core: Implement support for MMU notifiers regarding on demand paging regions

2014-09-11 Thread Jerome Glisse
On Thu, Sep 11, 2014 at 11:32:56AM -0400, Jerome Glisse wrote:
> On Thu, Sep 11, 2014 at 12:19:01PM +, Shachar Raindel wrote:
> > 
> > 
> > > -Original Message-
> > > From: Jerome Glisse [mailto:j.gli...@gmail.com]
> > > Sent: Wednesday, September 10, 2014 11:15 PM
> > > To: Shachar Raindel
> > > Cc: Haggai Eran; linux-rdma@vger.kernel.org; Sagi Grimberg
> > > Subject: Re: [PATCH v1 for-next 06/16] IB/core: Implement support for
> > > MMU notifiers regarding on demand paging regions
> > > 
> > > On Wed, Sep 10, 2014 at 09:00:36AM +0000, Shachar Raindel wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Jerome Glisse [mailto:j.gli...@gmail.com]
> > > > > Sent: Tuesday, September 09, 2014 6:37 PM
> > > > > To: Shachar Raindel
> > > > > Cc: 1404377069-20585-1-git-send-email-hagg...@mellanox.com; Haggai
> > > Eran;
> > > > > linux-rdma@vger.kernel.org; Jerome Glisse; Sagi Grimberg
> > > > > Subject: Re: [PATCH v1 for-next 06/16] IB/core: Implement support
> > > for
> > > > > MMU notifiers regarding on demand paging regions
> > > > >
> > > > > On Sun, Sep 07, 2014 at 02:35:59PM +, Shachar Raindel wrote:
> > > > > > Hi,
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Jerome Glisse [mailto:j.gli...@gmail.com]
> > > > > > > Sent: Thursday, September 04, 2014 11:25 PM
> > > > > > > To: Haggai Eran; linux-rdma@vger.kernel.org
> > > > > > > Cc: Shachar Raindel; Sagi Grimberg
> > > > > > > Subject: Re: [PATCH v1 for-next 06/16] IB/core: Implement
> > > support
> > > > > for
> > > > > > > MMU notifiers regarding on demand paging regions
> > > > > > >
> > 
> > 
> > 
> > > > > >
> > > > > > Sadly, taking mmap_sem in read-only mode does not prevent all
> > > possible
> > > > > invalidations from happening.
> > > > > > For example, a call to madvise requesting MADVISE_DONTNEED will
> > > lock
> > > > > the mmap_sem for reading only, allowing a notifier to run in
> > > parallel to
> > > > > the MR registration As a result, the following sequence of events
> > > could
> > > > > happen:
> > > > > >
> > > > > > Thread 1:   |   Thread 2
> > > > > > +-
> > > > > > madvise |
> > > > > > down_read(mmap_sem) |
> > > > > > notifier_start  |
> > > > > > |   down_read(mmap_sem)
> > > > > > |   register_mr
> > > > > > notifier_end|
> > > > > > reduce_mr_notifiers_count   |
> > > > > >
> > > > > > The end result of this sequence is an mr with running notifiers
> > > count
> > > > > of -1, which is bad.
> > > > > > The current workaround is to avoid decreasing the notifiers count
> > > if
> > > > > it is zero, which can cause other issues.
> > > > > > The proper fix would be to prevent notifiers from running in
> > > parallel
> > > > > to registration. For this, taking mmap_sem in write mode might be
> > > > > sufficient, but we are not sure about this.
> > > > > > We will be happy to hear additional input on this subject, to make
> > > > > sure we got it covered properly.
> > > > >
> > > > > So in HMM i solve this by having a struct allocated in the start
> > > range
> > > > > callback
> > > > > and the end range callback just ignore things when it can not find
> > > the
> > > > > matching
> > > > > struct.
> > > >
> > > > This kind of mechanism sounds like it has a bigger risk for
> > > deadlocking
> > > > the system, causing an OOM kill without a real need or significantly
> > > > slowing down the system.
> > > > If you are doing non-atomic memory allocations, you can deadlock the
> > > > system by requesting memory in the swapper

Re: [PATCH v1 for-next 06/16] IB/core: Implement support for MMU notifiers regarding on demand paging regions

2014-09-11 Thread Jerome Glisse
On Thu, Sep 11, 2014 at 12:19:01PM +, Shachar Raindel wrote:
> 
> 
> > -Original Message-
> > From: Jerome Glisse [mailto:j.gli...@gmail.com]
> > Sent: Wednesday, September 10, 2014 11:15 PM
> > To: Shachar Raindel
> > Cc: Haggai Eran; linux-rdma@vger.kernel.org; Sagi Grimberg
> > Subject: Re: [PATCH v1 for-next 06/16] IB/core: Implement support for
> > MMU notifiers regarding on demand paging regions
> > 
> > On Wed, Sep 10, 2014 at 09:00:36AM +, Shachar Raindel wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Jerome Glisse [mailto:j.gli...@gmail.com]
> > > > Sent: Tuesday, September 09, 2014 6:37 PM
> > > > To: Shachar Raindel
> > > > Cc: 1404377069-20585-1-git-send-email-hagg...@mellanox.com; Haggai
> > Eran;
> > > > linux-rdma@vger.kernel.org; Jerome Glisse; Sagi Grimberg
> > > > Subject: Re: [PATCH v1 for-next 06/16] IB/core: Implement support
> > for
> > > > MMU notifiers regarding on demand paging regions
> > > >
> > > > On Sun, Sep 07, 2014 at 02:35:59PM +, Shachar Raindel wrote:
> > > > > Hi,
> > > > >
> > > > > > -Original Message-
> > > > > > From: Jerome Glisse [mailto:j.gli...@gmail.com]
> > > > > > Sent: Thursday, September 04, 2014 11:25 PM
> > > > > > To: Haggai Eran; linux-rdma@vger.kernel.org
> > > > > > Cc: Shachar Raindel; Sagi Grimberg
> > > > > > Subject: Re: [PATCH v1 for-next 06/16] IB/core: Implement
> > support
> > > > for
> > > > > > MMU notifiers regarding on demand paging regions
> > > > > >
> 
> 
> 
> > > > >
> > > > > Sadly, taking mmap_sem in read-only mode does not prevent all
> > possible
> > > > invalidations from happening.
> > > > > For example, a call to madvise requesting MADVISE_DONTNEED will
> > lock
> > > > the mmap_sem for reading only, allowing a notifier to run in
> > parallel to
> > > > the MR registration As a result, the following sequence of events
> > could
> > > > happen:
> > > > >
> > > > > Thread 1:   |   Thread 2
> > > > > +-
> > > > > madvise |
> > > > > down_read(mmap_sem) |
> > > > > notifier_start  |
> > > > > |   down_read(mmap_sem)
> > > > > |   register_mr
> > > > > notifier_end|
> > > > > reduce_mr_notifiers_count   |
> > > > >
> > > > > The end result of this sequence is an mr with running notifiers
> > count
> > > > of -1, which is bad.
> > > > > The current workaround is to avoid decreasing the notifiers count
> > if
> > > > it is zero, which can cause other issues.
> > > > > The proper fix would be to prevent notifiers from running in
> > parallel
> > > > to registration. For this, taking mmap_sem in write mode might be
> > > > sufficient, but we are not sure about this.
> > > > > We will be happy to hear additional input on this subject, to make
> > > > sure we got it covered properly.
> > > >
> > > > So in HMM i solve this by having a struct allocated in the start
> > range
> > > > callback
> > > > and the end range callback just ignore things when it can not find
> > the
> > > > matching
> > > > struct.
> > >
> > > This kind of mechanism sounds like it has a bigger risk for
> > deadlocking
> > > the system, causing an OOM kill without a real need or significantly
> > > slowing down the system.
> > > If you are doing non-atomic memory allocations, you can deadlock the
> > > system by requesting memory in the swapper flow.
> > > Even if you are doing atomic memory allocations, you need to handle
> > the
> > > case of failing allocation, the solution to which is unclear to me.
> > > If you are using a pre-allocated pool, what are you doing when you run
> > > out of available entries in the pool? If you are blocking until some
> > > entries free up, what guarantees you that this will not cause a
> > deadlock?
> > 
> > So i am using a fixed pool and when it r

Re: [PATCH v1 for-next 06/16] IB/core: Implement support for MMU notifiers regarding on demand paging regions

2014-09-10 Thread Jerome Glisse
On Wed, Sep 10, 2014 at 09:00:36AM +, Shachar Raindel wrote:
> 
> 
> > -Original Message-
> > From: Jerome Glisse [mailto:j.gli...@gmail.com]
> > Sent: Tuesday, September 09, 2014 6:37 PM
> > To: Shachar Raindel
> > Cc: 1404377069-20585-1-git-send-email-hagg...@mellanox.com; Haggai Eran;
> > linux-rdma@vger.kernel.org; Jerome Glisse; Sagi Grimberg
> > Subject: Re: [PATCH v1 for-next 06/16] IB/core: Implement support for
> > MMU notifiers regarding on demand paging regions
> > 
> > On Sun, Sep 07, 2014 at 02:35:59PM +, Shachar Raindel wrote:
> > > Hi,
> > >
> > > > -Original Message-
> > > > From: Jerome Glisse [mailto:j.gli...@gmail.com]
> > > > Sent: Thursday, September 04, 2014 11:25 PM
> > > > To: Haggai Eran; linux-rdma@vger.kernel.org
> > > > Cc: Shachar Raindel; Sagi Grimberg
> > > > Subject: Re: [PATCH v1 for-next 06/16] IB/core: Implement support
> > for
> > > > MMU notifiers regarding on demand paging regions
> > > >
> > > > > * Add an interval tree implementation for ODP umems. Create an
> > > > interval tree
> > > > >   for each ucontext (including a count of the number of ODP MRs in
> > > > this
> > > > >   context, mutex, etc.), and register ODP umems in the interval
> > tree.
> > > > > * Add MMU notifiers handling functions, using the interval tree to
> > > > notify only
> > > > >   the relevant umems and underlying MRs.
> > > > > * Register to receive MMU notifier events from the MM subsystem
> > upon
> > > > ODP MR
> > > > >   registration (and unregister accordingly).
> > > > > * Add a completion object to synchronize the destruction of ODP
> > umems.
> > > > > * Add mechanism to abort page faults when there's a concurrent
> > > > invalidation.
> > > > >
> > > > > The way we synchronize between concurrent invalidations and page
> > > > faults is by
> > > > > keeping a counter of currently running invalidations, and a
> > sequence
> > > > number
> > > > > that is incremented whenever an invalidation is caught. The page
> > fault
> > > > code
> > > > > checks the counter and also verifies that the sequence number
> > hasn't
> > > > > progressed before it updates the umem's page tables. This is
> > similar
> > > > to what
> > > > > the kvm module does.
> > > > >
> > > > > There's currently a rare race in the code when registering a umem
> > in
> > > > the
> > > > > middle of an ongoing notifier. The proper fix is to either
> > serialize
> > > > the
> > > > > insertion to our umem tree with mm_lock_all or use a ucontext wide
> > > > running
> > > > > notifiers count for retries decision. Either is ugly and can lead
> > to
> > > > some sort
> > > > > of starvation. The current workaround is ugly as well - now the
> > user
> > > > can end
> > > > > up with mapped addresses that are not in the user's address space
> > > > (although it
> > > > > is highly unlikely).
> > > >
> > > > I have been trying to wrap my head around this comment. I am totaly
> > > > unfamiliar
> > > > with RDMA code, but from quick look at it when registering umem you
> > take
> > > > the
> > > > mmap_sem in read mode so any munmap from userspace would be
> > serialize.
> > > > Really
> > > > the worst that can happen is that a umem pointing to a mmaped file
> > that
> > > > is
> > > > concurently truncated but even then the address is still valid, but
> > it
> > > > should
> > > > result in a SIGBUS which here is obviously harder to report (again
> > dunno
> > > > how
> > > > RDMA works).
> > > >
> > > > So am i missing something ?
> > > >
> > >
> > > Sadly, taking mmap_sem in read-only mode does not prevent all possible
> > invalidations from happening.
> > > For example, a call to madvise requesting MADVISE_DONTNEED will lock
> > the mmap_sem for reading only, allowing a notifier to run in parallel to
> > the MR registration As a result, the following sequence of events could
> > happen:
> > >
> > &

Re: [PATCH v1 for-next 06/16] IB/core: Implement support for MMU notifiers regarding on demand paging regions

2014-09-09 Thread Jerome Glisse
On Sun, Sep 07, 2014 at 02:35:59PM +, Shachar Raindel wrote:
> Hi,
> 
> > -Original Message-
> > From: Jerome Glisse [mailto:j.gli...@gmail.com]
> > Sent: Thursday, September 04, 2014 11:25 PM
> > To: Haggai Eran; linux-rdma@vger.kernel.org
> > Cc: Shachar Raindel; Sagi Grimberg
> > Subject: Re: [PATCH v1 for-next 06/16] IB/core: Implement support for
> > MMU notifiers regarding on demand paging regions
> > 
> > > * Add an interval tree implementation for ODP umems. Create an
> > interval tree
> > >   for each ucontext (including a count of the number of ODP MRs in
> > this
> > >   context, mutex, etc.), and register ODP umems in the interval tree.
> > > * Add MMU notifiers handling functions, using the interval tree to
> > notify only
> > >   the relevant umems and underlying MRs.
> > > * Register to receive MMU notifier events from the MM subsystem upon
> > ODP MR
> > >   registration (and unregister accordingly).
> > > * Add a completion object to synchronize the destruction of ODP umems.
> > > * Add mechanism to abort page faults when there's a concurrent
> > invalidation.
> > >
> > > The way we synchronize between concurrent invalidations and page
> > faults is by
> > > keeping a counter of currently running invalidations, and a sequence
> > number
> > > that is incremented whenever an invalidation is caught. The page fault
> > code
> > > checks the counter and also verifies that the sequence number hasn't
> > > progressed before it updates the umem's page tables. This is similar
> > to what
> > > the kvm module does.
> > >
> > > There's currently a rare race in the code when registering a umem in
> > the
> > > middle of an ongoing notifier. The proper fix is to either serialize
> > the
> > > insertion to our umem tree with mm_lock_all or use a ucontext wide
> > running
> > > notifiers count for retries decision. Either is ugly and can lead to
> > some sort
> > > of starvation. The current workaround is ugly as well - now the user
> > can end
> > > up with mapped addresses that are not in the user's address space
> > (although it
> > > is highly unlikely).
> > 
> > I have been trying to wrap my head around this comment. I am totaly
> > unfamiliar
> > with RDMA code, but from quick look at it when registering umem you take
> > the
> > mmap_sem in read mode so any munmap from userspace would be serialize.
> > Really
> > the worst that can happen is that a umem pointing to a mmaped file that
> > is
> > concurently truncated but even then the address is still valid, but it
> > should
> > result in a SIGBUS which here is obviously harder to report (again dunno
> > how
> > RDMA works).
> > 
> > So am i missing something ?
> > 
> 
> Sadly, taking mmap_sem in read-only mode does not prevent all possible 
> invalidations from happening.
> For example, a call to madvise requesting MADVISE_DONTNEED will lock the
> mmap_sem for reading only, allowing a notifier to run in parallel to the MR
> registration. As a result, the following sequence of events could happen:
> 
> Thread 1:   |   Thread 2
> +-
> madvise |
> down_read(mmap_sem) |
> notifier_start  |
> |   down_read(mmap_sem)
> |   register_mr
> notifier_end|
> reduce_mr_notifiers_count   |
> 
> The end result of this sequence is an mr with running notifiers count of -1, 
> which is bad.
> The current workaround is to avoid decreasing the notifiers count if it is 
> zero, which can cause other issues.
> The proper fix would be to prevent notifiers from running in parallel to 
> registration. For this, taking mmap_sem in write mode might be sufficient, 
> but we are not sure about this.
> We will be happy to hear additional input on this subject, to make sure we 
> got it covered properly.

So in HMM I solve this by having a struct allocated in the start range
callback, and the end range callback just ignores things when it cannot
find the matching struct.
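
A minimal sketch of that scheme, with illustrative names and locking (this
is not the HMM API itself):

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct example_mirror {
	spinlock_t lock;
	struct list_head invalidations;	/* list of struct invalidate_rec */
};

struct invalidate_rec {
	struct list_head list;
	unsigned long start, end;
};

static void mirror_invalidate_start(struct example_mirror *m,
				    unsigned long start, unsigned long end)
{
	/* A fixed, pre-allocated pool can stand in for kmalloc() here so
	 * the callback never allocates from the reclaim path. */
	struct invalidate_rec *rec = kmalloc(sizeof(*rec), GFP_ATOMIC);

	if (!rec)
		return;
	rec->start = start;
	rec->end = end;
	spin_lock(&m->lock);
	list_add(&rec->list, &m->invalidations);
	spin_unlock(&m->lock);
}

static void mirror_invalidate_end(struct example_mirror *m,
				  unsigned long start, unsigned long end)
{
	struct invalidate_rec *rec;

	spin_lock(&m->lock);
	list_for_each_entry(rec, &m->invalidations, list) {
		if (rec->start == start && rec->end == end) {
			list_del(&rec->list);
			kfree(rec);
			break;
		}
	}
	/* No matching record means this end raced with a registration done
	 * in the middle of the invalidation: ignoring it is the safe path. */
	spin_unlock(&m->lock);
}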

That being said, when registering the mmu_notifier you need 2 things. First
you need a pin on the mm (either the mm is current, ie current->mm, or you
took a reference on it). Second you need to take the mmap semaphore in
write mode so that no concurrent mmap/munmap/madvise can happen. By doing
that you protect yourself from co

Re: [PATCH v1 for-next 06/16] IB/core: Implement support for MMU notifiers regarding on demand paging regions

2014-09-04 Thread Jerome Glisse
> * Add an interval tree implementation for ODP umems. Create an interval tree
>   for each ucontext (including a count of the number of ODP MRs in this
>   context, mutex, etc.), and register ODP umems in the interval tree.
> * Add MMU notifiers handling functions, using the interval tree to notify only
>   the relevant umems and underlying MRs.
> * Register to receive MMU notifier events from the MM subsystem upon ODP MR
>   registration (and unregister accordingly).
> * Add a completion object to synchronize the destruction of ODP umems.
> * Add mechanism to abort page faults when there's a concurrent invalidation.
> 
> The way we synchronize between concurrent invalidations and page faults is by
> keeping a counter of currently running invalidations, and a sequence number
> that is incremented whenever an invalidation is caught. The page fault code
> checks the counter and also verifies that the sequence number hasn't
> progressed before it updates the umem's page tables. This is similar to what
> the kvm module does.
> 
> There's currently a rare race in the code when registering a umem in the
> middle of an ongoing notifier. The proper fix is to either serialize the
> insertion to our umem tree with mm_lock_all or use a ucontext wide running
> notifiers count for retries decision. Either is ugly and can lead to some sort
> of starvation. The current workaround is ugly as well - now the user can end
> up with mapped addresses that are not in the user's address space (although it
> is highly unlikely).

I have been trying to wrap my head around this comment. I am totally
unfamiliar with the RDMA code, but from a quick look at it, when registering
a umem you take the mmap_sem in read mode, so any munmap from userspace
would be serialized. Really the worst that can happen is a umem pointing to
a mmaped file that is concurrently truncated, but even then the address is
still valid; it should result in a SIGBUS, which here is obviously harder to
report (again, dunno how RDMA works).

So am I missing something?

> 
> Signed-off-by: Sagi Grimberg 
> Signed-off-by: Shachar Raindel 
> Signed-off-by: Haggai Eran 
> Signed-off-by: Yuval Dagan 
> ---
>  drivers/infiniband/Kconfig|   1 +
>  drivers/infiniband/core/Makefile  |   2 +-
>  drivers/infiniband/core/umem.c|   2 +-
>  drivers/infiniband/core/umem_odp.c| 337 
> +-
>  drivers/infiniband/core/umem_rbtree.c |  94 ++
>  drivers/infiniband/core/uverbs_cmd.c  |  16 ++
>  include/rdma/ib_umem_odp.h|  56 ++
>  include/rdma/ib_verbs.h   |  16 ++
>  8 files changed, 512 insertions(+), 12 deletions(-)
>  create mode 100644 drivers/infiniband/core/umem_rbtree.c
> 
> diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
> index 089a2c2..b899531 100644
> --- a/drivers/infiniband/Kconfig
> +++ b/drivers/infiniband/Kconfig
> @@ -41,6 +41,7 @@ config INFINIBAND_USER_MEM
>  config INFINIBAND_ON_DEMAND_PAGING
>   bool "InfiniBand on-demand paging support"
>   depends on INFINIBAND_USER_MEM
> + select MMU_NOTIFIER
>   default y
>   ---help---
> On demand paging support for the InfiniBand subsystem.
> diff --git a/drivers/infiniband/core/Makefile 
> b/drivers/infiniband/core/Makefile
> index c58f791..acf7367 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -11,7 +11,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o 
> ib_ucm.o \
>  ib_core-y := packer.o ud_header.o verbs.o sysfs.o \
>   device.o fmr_pool.o cache.o netlink.o
>  ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
> -ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
> +ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
>  
>  ib_mad-y :=  mad.o smi.o agent.o mad_rmpp.o
>  
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index e9798e0..014977f 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -72,7 +72,7 @@ static void __ib_umem_release(struct ib_device *dev, struct 
> ib_umem *umem, int d
>   * ib_umem_get - Pin and DMA map userspace memory.
>   *
>   * If access flags indicate ODP memory, avoid pinning. Instead, stores
> - * the mm for future page fault handling.
> + * the mm for future page fault handling in conjuction with MMU notifiers.
>   *
>   * @context: userspace context to pin memory for
>   * @addr: userspace virtual address to start at
> diff --git a/drivers/infiniband/core/umem_odp.c 
> b/drivers/infiniband/core/umem_odp.c
> index 0c90ce50..c048269 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -41,26 +41,204 @@
>  #include 
>  #include 
>  
> +void ib_umem_notifier_start_account(struct ib_umem *item)
> +{
> + int notifiers_count;
> + mutex_lock(&item->odp_data->umem_mutex);
> + /*
> +  * Av

Re: [PATCH v1 for-next 00/16] On demand paging

2014-09-04 Thread Jerome Glisse
On Wed, Sep 03, 2014 at 02:15:51PM -0700, Roland Dreier wrote:
> > I would like to note that we at Los Alamos National Laboratory are very
> > interested in this functionality and it would be great if it gets accepted.
> 
> Have you done any review or testing of these changes?  If so can you
> share the results?

So, jumping in here. I am working on a similar issue, ie a subsystem that
allows a device to mirror a process address space (or part of it) using its
own MMU and page tables.

While I am aiming at providing a generic API, it is not yet fully cooked,
thus I think having a driver implement its own code in the meantime is
something we have to live with. I share the frustration of Sagi and Haggai
about how hard it is to get any kind of review.

I have not yet reviewed this code, but it's nearing the top of my todo list,
even though I am not an authoritative figure in either mm or rdma.

For anyone interested in taking a peek at HMM (the subsystem I am working on
to provide the same functionality and more):

http://marc.info/?l=linux-mm&m=140933942705466&w=4

Cheers,
Jérôme

> 
>  - R.