Re: [PATCH v2 2/2] x86/mm/pat, drivers/media/ivtv: move pat warn and replace WARN() with pr_warn()

2015-07-07 Thread Ingo Molnar

* Luis R. Rodriguez mcg...@do-not-panic.com wrote:

 On Mon, Jul 6, 2015 at 5:44 PM, Luis R. Rodriguez mcg...@suse.com wrote:
  If we really wanted to we could consider arch_phys_wc_add()
 
 I mean adding a __arch_phys_wc_add() which does not check for pat_enabled().
 
  and
  deal with that this will not check for pat_enabled() and forces MTRR...
  I think Andy Luto won't like that very much though ? I at least don't
  like it since we did all this work to finally leave only 1 piece of
  code with direct MTRR access... Seems a bit sad. Since ipath will
  be removed we'd have only ivtv driver using this API, I am not sure if
  its worth it.
 
 Since ipath is going away soon we'd just have one driver with the icky
 #ifdef code. I'd rather see that and then require semantics / grammer
 rules to require ioremap_wc() when used with arch_phys_wc_add(). I don't 
 think 
 ivtv is worth to consider breaking the semantics and requirements.

Ok, let's keep your original approach with the warning then.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] x86/mm/pat, drivers/media/ivtv: move pat warn and replace WARN() with pr_warn()

2015-06-29 Thread Ingo Molnar

* Andy Walls a...@silverblocksystems.net wrote:

 On Fri, 2015-06-26 at 10:45 +0200, Ingo Molnar wrote:
  * Luis R. Rodriguez mcg...@suse.com wrote:
  
   On Thu, Jun 25, 2015 at 08:51:47AM +0200, Ingo Molnar wrote:

* Luis R. Rodriguez mcg...@do-not-panic.com wrote:

 From: Luis R. Rodriguez mcg...@suse.com
 
 On built-in kernels this warning will always splat as this is part
 of the module init. Fix that by shifting the PAT requirement check
 out under the code that does the quasi-probe for the device. This
 device driver relies on an existing driver to find its own devices,
 it looks for that device driver and its own found devices, then
 uses driver_for_each_device() to try to see if it can probe each of
 those devices as a frambuffer device with ivtvfb_init_card(). We
 tuck the PAT requiremenet check then on the ivtvfb_init_card()
 call making the check at least require an ivtv device present
 before complaining.
 
 Reported-by: Fengguang Wu fengguang...@intel.com [0-day test robot]
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com
 ---
  drivers/media/pci/ivtv/ivtvfb.c | 15 +--
  1 file changed, 9 insertions(+), 6 deletions(-)
 
 diff --git a/drivers/media/pci/ivtv/ivtvfb.c 
 b/drivers/media/pci/ivtv/ivtvfb.c
 index 4cb365d..8b95eef 100644
 --- a/drivers/media/pci/ivtv/ivtvfb.c
 +++ b/drivers/media/pci/ivtv/ivtvfb.c
 @@ -38,6 +38,8 @@
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  
 02111-1307  USA
   */
  
 +#define pr_fmt(fmt) KBUILD_MODNAME :  fmt
 +
  #include linux/module.h
  #include linux/kernel.h
  #include linux/fb.h
 @@ -1171,6 +1173,13 @@ static int ivtvfb_init_card(struct ivtv *itv)
  {
   int rc;
  
 +#ifdef CONFIG_X86_64
 + if (pat_enabled()) {
 + pr_warn(ivtvfb needs PAT disabled, boot with nopat 
 kernel parameter\n);
 + return -ENODEV;
 + }
 +#endif
 +
   if (itv-osd_info) {
   IVTVFB_ERR(Card %d already initialised\n, 
 ivtvfb_card_id);
   return -EBUSY;

Same argument as for ipath: why not make arch_phys_wc_add() fail on PAT 
and 
return -1, and check it in arch_phys_wc_del()?
   
   The arch_phys_wc_add() is a no-op for PAT systems but for PAT to work we 
   need 
   not only need to add this in where we replace the MTRR call but we also 
   need to 
   convert ioremap_nocache() calls to ioremap_wc() but only if things were 
   split up 
   already.
  
 
 Hi Ingo,
 
  We don't need to do that: for such legacy drivers we can fall back to UC 
  just 
  fine, and inform the user that by booting with 'nopat' the old behavior 
  will be 
  back...
 
 This is really a user experience decision.
 
 IMO anyone who is still using ivtvfb and an old conventional PCI PVR-350 to 
 render, at SDTV resolution, an X Desktop display or video playback on a 
 television screen, isn't going to give a hoot about modern things like PAT.  
 The 
 user will simply want the framebuffer performance they are accustomed to 
 having 
 with their system.  UC will probably yield unsatisfactory performance for an 
 ivtvfb framebuffer.
 
 With that in mind, I would think it better to obviously and clearly disable 
 the 
 ivtvfb framebuffer module with PAT enabled, so the user will check the log 
 and 
 read the steps needed to obtain acceptable performance.
 
 Maybe that's me just wanting to head off the poor ivtvfb performance with 
 latest kernel bug reports.
 
 Whatever the decision, my stock response to bug reports related to this will 
 always be What do the logs say?.

So what if that frame buffer is their only (working) frame buffer? A slow 
framebuffer is still much better at giving people logs to look at than a 
non-working one.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] x86/mm/pat, drivers/media/ivtv: move pat warn and replace WARN() with pr_warn()

2015-06-26 Thread Ingo Molnar

* Luis R. Rodriguez mcg...@suse.com wrote:

 On Thu, Jun 25, 2015 at 08:51:47AM +0200, Ingo Molnar wrote:
  
  * Luis R. Rodriguez mcg...@do-not-panic.com wrote:
  
   From: Luis R. Rodriguez mcg...@suse.com
   
   On built-in kernels this warning will always splat as this is part
   of the module init. Fix that by shifting the PAT requirement check
   out under the code that does the quasi-probe for the device. This
   device driver relies on an existing driver to find its own devices,
   it looks for that device driver and its own found devices, then
   uses driver_for_each_device() to try to see if it can probe each of
   those devices as a frambuffer device with ivtvfb_init_card(). We
   tuck the PAT requiremenet check then on the ivtvfb_init_card()
   call making the check at least require an ivtv device present
   before complaining.
   
   Reported-by: Fengguang Wu fengguang...@intel.com [0-day test robot]
   Signed-off-by: Luis R. Rodriguez mcg...@suse.com
   ---
drivers/media/pci/ivtv/ivtvfb.c | 15 +--
1 file changed, 9 insertions(+), 6 deletions(-)
   
   diff --git a/drivers/media/pci/ivtv/ivtvfb.c 
   b/drivers/media/pci/ivtv/ivtvfb.c
   index 4cb365d..8b95eef 100644
   --- a/drivers/media/pci/ivtv/ivtvfb.c
   +++ b/drivers/media/pci/ivtv/ivtvfb.c
   @@ -38,6 +38,8 @@
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 
USA
 */

   +#define pr_fmt(fmt) KBUILD_MODNAME :  fmt
   +
#include linux/module.h
#include linux/kernel.h
#include linux/fb.h
   @@ -1171,6 +1173,13 @@ static int ivtvfb_init_card(struct ivtv *itv)
{
 int rc;

   +#ifdef CONFIG_X86_64
   + if (pat_enabled()) {
   + pr_warn(ivtvfb needs PAT disabled, boot with nopat kernel 
   parameter\n);
   + return -ENODEV;
   + }
   +#endif
   +
 if (itv-osd_info) {
 IVTVFB_ERR(Card %d already initialised\n, ivtvfb_card_id);
 return -EBUSY;
  
  Same argument as for ipath: why not make arch_phys_wc_add() fail on PAT and 
  return -1, and check it in arch_phys_wc_del()?
 
 The arch_phys_wc_add() is a no-op for PAT systems but for PAT to work we need 
 not only need to add this in where we replace the MTRR call but we also need 
 to 
 convert ioremap_nocache() calls to ioremap_wc() but only if things were split 
 up 
 already.

We don't need to do that: for such legacy drivers we can fall back to UC just 
fine, and inform the user that by booting with 'nopat' the old behavior will be 
back...

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] x86/mm/pat, drivers/infiniband/ipath: replace WARN() with pr_warn()

2015-06-26 Thread Ingo Molnar

* Luis R. Rodriguez mcg...@suse.com wrote:

 On Thu, Jun 25, 2015 at 08:49:22AM +0200, Ingo Molnar wrote:
  
  * Luis R. Rodriguez mcg...@do-not-panic.com wrote:
  
   From: Luis R. Rodriguez mcg...@suse.com
   
   WARN() may confuse users, fix that. ipath_init_one() is part the
   device's probe so this would only be triggered if a corresponding
   device was found.
   
   Signed-off-by: Luis R. Rodriguez mcg...@suse.com
   ---
drivers/infiniband/hw/ipath/ipath_driver.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
   
   diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c 
   b/drivers/infiniband/hw/ipath/ipath_driver.c
   index 2d7e503..871dbe5 100644
   --- a/drivers/infiniband/hw/ipath/ipath_driver.c
   +++ b/drivers/infiniband/hw/ipath/ipath_driver.c
   @@ -31,6 +31,8 @@
 * SOFTWARE.
 */

   +#define pr_fmt(fmt) KBUILD_MODNAME :  fmt
   +
#include linux/sched.h
#include linux/spinlock.h
#include linux/idr.h
   @@ -399,8 +401,8 @@ static int ipath_init_one(struct pci_dev *pdev, const 
   struct pci_device_id *ent)
 u32 bar0 = 0, bar1 = 0;

#ifdef CONFIG_X86_64
   - if (WARN(pat_enabled(),
   -  ipath needs PAT disabled, boot with nopat kernel 
   parameter\n)) {
   + if (pat_enabled()) {
   + pr_warn(ipath needs PAT disabled, boot with nopat kernel 
   parameter\n);
 ret = -ENODEV;
 goto bail;
 }
  
  So driver init will always fail with this on modern kernels.
 
 Nope, I double checked this, ipath_init_one() is the PCI probe routine,
 not the module init call. It should probably be renamed.
 
  Btw., on a second thought, ipath uses MTRRs to enable WC:
  
  ret = ipath_enable_wc(dd);
  if (ret)
  ret = 0;
  
  Note how it ignores any failures - the driver still works even if WC was 
  not 
  enabled.
 
 Ah, well WC strategy requires a split of the MMIO registers and the desired
 WC area, right now they are combined for some type of ipath devices. There
 are two things to consider when thinking about whether or not we want to
 do the work required to do the split:

But ... why doing the 'split'?

With my suggested approach the driver will behave in two ways:

  - if booted with 'nopat' it will behave as always and have the WC MTRR 
entries 
added

  - if booted with a modern kernel without 'nopat' then instead of getting WC 
MTRR 
entries it will not get them - we'll fall back to UC. No 'split' or any 
other 
change is needed to the driver AFAICS: it might be slower, but it will 
still 
be functional. It will _not_ get PAT WC mappings - it will fall back to UC 
- 
which is still much better for any potential user than not working at all.

Same suggestion for the other affected driver.

what am I missing?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] x86/mm/pat, drivers/media/ivtv: move pat warn and replace WARN() with pr_warn()

2015-06-25 Thread Ingo Molnar

* Luis R. Rodriguez mcg...@do-not-panic.com wrote:

 From: Luis R. Rodriguez mcg...@suse.com
 
 On built-in kernels this warning will always splat as this is part
 of the module init. Fix that by shifting the PAT requirement check
 out under the code that does the quasi-probe for the device. This
 device driver relies on an existing driver to find its own devices,
 it looks for that device driver and its own found devices, then
 uses driver_for_each_device() to try to see if it can probe each of
 those devices as a frambuffer device with ivtvfb_init_card(). We
 tuck the PAT requiremenet check then on the ivtvfb_init_card()
 call making the check at least require an ivtv device present
 before complaining.
 
 Reported-by: Fengguang Wu fengguang...@intel.com [0-day test robot]
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com
 ---
  drivers/media/pci/ivtv/ivtvfb.c | 15 +--
  1 file changed, 9 insertions(+), 6 deletions(-)
 
 diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
 index 4cb365d..8b95eef 100644
 --- a/drivers/media/pci/ivtv/ivtvfb.c
 +++ b/drivers/media/pci/ivtv/ivtvfb.c
 @@ -38,6 +38,8 @@
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
   */
  
 +#define pr_fmt(fmt) KBUILD_MODNAME :  fmt
 +
  #include linux/module.h
  #include linux/kernel.h
  #include linux/fb.h
 @@ -1171,6 +1173,13 @@ static int ivtvfb_init_card(struct ivtv *itv)
  {
   int rc;
  
 +#ifdef CONFIG_X86_64
 + if (pat_enabled()) {
 + pr_warn(ivtvfb needs PAT disabled, boot with nopat kernel 
 parameter\n);
 + return -ENODEV;
 + }
 +#endif
 +
   if (itv-osd_info) {
   IVTVFB_ERR(Card %d already initialised\n, ivtvfb_card_id);
   return -EBUSY;

Same argument as for ipath: why not make arch_phys_wc_add() fail on PAT and 
return 
-1, and check it in arch_phys_wc_del()?

That way we don't do anything drastic, the remaining few drivers still keep 
working (albeit suboptimally - can be worked around with the 'nopat' boot 
option) 
- yet we've reduced the use of MTRRs drastically.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] x86/mm/pat, drivers/infiniband/ipath: replace WARN() with pr_warn()

2015-06-25 Thread Ingo Molnar

* Luis R. Rodriguez mcg...@do-not-panic.com wrote:

 From: Luis R. Rodriguez mcg...@suse.com
 
 WARN() may confuse users, fix that. ipath_init_one() is part the
 device's probe so this would only be triggered if a corresponding
 device was found.
 
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com
 ---
  drivers/infiniband/hw/ipath/ipath_driver.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c 
 b/drivers/infiniband/hw/ipath/ipath_driver.c
 index 2d7e503..871dbe5 100644
 --- a/drivers/infiniband/hw/ipath/ipath_driver.c
 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c
 @@ -31,6 +31,8 @@
   * SOFTWARE.
   */
  
 +#define pr_fmt(fmt) KBUILD_MODNAME :  fmt
 +
  #include linux/sched.h
  #include linux/spinlock.h
  #include linux/idr.h
 @@ -399,8 +401,8 @@ static int ipath_init_one(struct pci_dev *pdev, const 
 struct pci_device_id *ent)
   u32 bar0 = 0, bar1 = 0;
  
  #ifdef CONFIG_X86_64
 - if (WARN(pat_enabled(),
 -  ipath needs PAT disabled, boot with nopat kernel 
 parameter\n)) {
 + if (pat_enabled()) {
 + pr_warn(ipath needs PAT disabled, boot with nopat kernel 
 parameter\n);
   ret = -ENODEV;
   goto bail;
   }

So driver init will always fail with this on modern kernels.

Btw., on a second thought, ipath uses MTRRs to enable WC:

ret = ipath_enable_wc(dd);
if (ret)
ret = 0;

Note how it ignores any failures - the driver still works even if WC was not 
enabled.

So why don't you simply extend ipath_enable_wc() to generate the warning 
message 
and return -EINVAL - which still keeps the driver working on modern kernels?

Just inform the user about 'nopat' if he wants WC for this driver.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] x86/mm/pat, drivers/media/ivtv: replace WARN() with pr_warn()

2015-06-23 Thread Ingo Molnar

* Luis R. Rodriguez mcg...@do-not-panic.com wrote:

 From: Luis R. Rodriguez mcg...@suse.com
 
 On built-in kernels this will always splat. Fix that.
 
 Reported-by: Fengguang Wu fengguang...@intel.com [0-day test robot]
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com
 ---
  drivers/media/pci/ivtv/ivtvfb.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
 index 4cb365d..6f0c364 100644
 --- a/drivers/media/pci/ivtv/ivtvfb.c
 +++ b/drivers/media/pci/ivtv/ivtvfb.c
 @@ -38,6 +38,8 @@
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
   */
  
 +#define pr_fmt(fmt) KBUILD_MODNAME :  fmt
 +
  #include linux/module.h
  #include linux/kernel.h
  #include linux/fb.h
 @@ -1266,8 +1268,8 @@ static int __init ivtvfb_init(void)
   int err;
  
  #ifdef CONFIG_X86_64
 - if (WARN(pat_enabled(),
 -  ivtvfb needs PAT disabled, boot with nopat kernel 
 parameter\n)) {
 + if (pat_enabled()) {
 + pr_warn(ivtvfb needs PAT disabled, boot with nopat kernel 
 parameter\n);
   return -ENODEV;
   }

So why should a built-in kernel bzImage with this driver enabled but the driver 
not present print this warning?

Why not only print in a code path where we know the hardware is present? 

allyesconfig bootups are noisy enough as-is ...

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] x86/mm/pat, drivers/infiniband/ipath: replace WARN() with pr_warn()

2015-06-23 Thread Ingo Molnar

* Luis R. Rodriguez mcg...@do-not-panic.com wrote:

 From: Luis R. Rodriguez mcg...@suse.com
 
 On built-in kernels this will always splat. Fix that.
 
 Reported-by: Fengguang Wu fengguang...@intel.com [0-day test robot]
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com
 ---
  drivers/infiniband/hw/ipath/ipath_driver.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c 
 b/drivers/infiniband/hw/ipath/ipath_driver.c
 index 2d7e503..871dbe5 100644
 --- a/drivers/infiniband/hw/ipath/ipath_driver.c
 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c
 @@ -31,6 +31,8 @@
   * SOFTWARE.
   */
  
 +#define pr_fmt(fmt) KBUILD_MODNAME :  fmt
 +
  #include linux/sched.h
  #include linux/spinlock.h
  #include linux/idr.h
 @@ -399,8 +401,8 @@ static int ipath_init_one(struct pci_dev *pdev, const 
 struct pci_device_id *ent)
   u32 bar0 = 0, bar1 = 0;
  
  #ifdef CONFIG_X86_64
 - if (WARN(pat_enabled(),
 -  ipath needs PAT disabled, boot with nopat kernel 
 parameter\n)) {
 + if (pat_enabled()) {
 + pr_warn(ipath needs PAT disabled, boot with nopat kernel 
 parameter\n);
   ret = -ENODEV;
   goto bail;
   }

Same observation as for the other patch: please only warn if the hardware is 
present and the driver tries to activate. No need to annoy others.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mm: Revert pinned_vm braindamage

2013-06-20 Thread Ingo Molnar

* Thomas Gleixner t...@linutronix.de wrote:

 On Thu, 13 Jun 2013, Andrew Morton wrote:
  Let's try to get this wrapped up?
  
  On Thu, 6 Jun 2013 14:43:51 +0200 Peter Zijlstra pet...@infradead.org 
  wrote:
  
   
   Patch bc3e53f682 (mm: distinguish between mlocked and pinned pages)
   broke RLIMIT_MEMLOCK.
  
  I rather like what bc3e53f682 did, actually.  RLIMIT_MEMLOCK limits the
  amount of memory you can mlock().  Nice and simple.
  
  This pinning thing which infiniband/perf are doing is conceptually
  different and if we care at all, perhaps we should be looking at adding
  RLIMIT_PINNED.
 
 Actually PINNED is just a stronger version of MEMLOCK. PINNED and 
 MEMLOCK are both preventing the page from being paged out. PINNED adds 
 the constraint of preventing minor faults as well.
 
 So I think the really important tuning knob is the limitation of pages 
 which cannot be paged out. And this is what RLIMIT_MEMLOCK is about.
 
 Now if you want to add RLIMIT_PINNED as well, then it only limits the 
 number of pages which cannot create minor faults, but that does not 
 affect the limitation of total pages which cannot be paged out.

Agreed.

( Furthermore, the RLIMIT_MEMLOCK semantics change actively broke code so
  this is not academic and it would be nice to progress with it. )

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mm: Revert pinned_vm braindamage

2013-06-20 Thread Ingo Molnar

* Christoph Lameter c...@gentwo.org wrote:

 On Mon, 17 Jun 2013, Peter Zijlstra wrote:
 
  They did no such thing; being one of those who wrote such code. I
  expressly used RLIMIT_MEMLOCK for its the one limit userspace has to
  limit pages that are exempt from paging.
 
 Dont remember reviewing that. Assumptions were wrong in that patch then.
 
   Pinned pages are exempted by the kernel. A device driver or some other
   kernel process (reclaim, page migration, io etc) increase the page count.
   There is currently no consistent accounting for pinned pages. The
   vm_pinned counter was introduced to allow the largest pinners to track
   what they did.
 
  No, not the largest, user space controlled pinnners. The thing that
  makes all the difference is the _USER_ control.
 
 The pinning *cannot* be done from user space. Here it is the IB subsystem
 that is doing it.

Peter clearly pointed it out that in the perf case it's user-space that 
initiates the pinned memory mapping which is resource-controlled via 
RLIMIT_MEMLOCK - and this was implemented that way before your commit 
broke the code.

You seem to be hell bent on defining 'memory pinning' only as the thing 
done via the mlock*() system calls, but that is a nonsensical distinction 
that actively and incorrectly ignores other system calls that can and do 
pin memory legitimately.

If some other system call results in mapping pinned memory that is at 
least as restrictively pinned as an mlock()-ed vma (the perf syscall is 
such) then it's entirely proper design to be resource controlled under 
RLIMIT_MEMLOCK as well. In fact this worked so before your commit broke 
it.

   mlockall does not require CAP_IPC_LOCK. Never had an issue.
 
  MCL_FUTURE does absolutely require CAP_IPC_LOCK, MCL_CURRENT requires 
  a huge (as opposed to the default 64k) RLIMIT or CAP_IPC_LOCK.
 
  There's no argument there, look at the code.
 
 I am sorry but we have been mlockall() for years now without the issues 
 that you are bringing up. AFAICT mlockall does not require MCL_FUTURE.

You only have to read the mlockall() code to see that Peter's claim is 
correct:

mm/mlock.c:

SYSCALL_DEFINE1(mlockall, int, flags)
{
unsigned long lock_limit;
int ret = -EINVAL;

if (!flags || (flags  ~(MCL_CURRENT | MCL_FUTURE)))
goto out;

ret = -EPERM;
if (!can_do_mlock())
goto out;
...


int can_do_mlock(void)
{
if (capable(CAP_IPC_LOCK))
return 1;
if (rlimit(RLIMIT_MEMLOCK) != 0)
return 1;
return 0;
}
EXPORT_SYMBOL(can_do_mlock);

Q.E.D.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK

2013-05-31 Thread Ingo Molnar

* KOSAKI Motohiro kosaki.motoh...@gmail.com wrote:

  I'm unhappy you guys uses offensive word so much. Please cool down all
  you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's
  patch doesn't have cleaner semantics.
 
  Erm, this feature _regressed_ after the patch. All other concerns are
  secondary. What's so difficult to understand about that?
 
 Because it is not new commit at all. Christoph's patch was introduced
 10 releases before.

That's indeed sad - resource limits are not typically tested by apps. The 
lack of Cc:s to people affected also helped hide the effects of the 
change.

 $ git describe bc3e53f682
 v3.1-7235-gbc3e53f
 
 If we just revert it, we may get another and opposite regression report. 
 I'm worried about it. Moreover, I don't think discussion better fix is 
 too difficult for us.

Sure, I agree that a fix is generally better.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK

2013-05-30 Thread Ingo Molnar

* KOSAKI Motohiro kosaki.motoh...@gmail.com wrote:

 Hi
 
 I'm unhappy you guys uses offensive word so much. Please cool down all 
 you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's 
 patch doesn't have cleaner semantics.

Erm, this feature _regressed_ after the patch. All other concerns are 
secondary. What's so difficult to understand about that?

 And PeterZ proposed make new cleaner one rather than revert. No need to 
 hassle.

Sorry, that's not how we deal with regressions upstream...

I've been pretty polite in fact, compared to say Linus - someone should 
put a politically correct summary of:

   https://lkml.org/lkml/2013/2/22/456
   https://lkml.org/lkml/2013/3/26/1113

into Documentation/, people keep forgetting about it.

I'm sorry if being direct and to the point offends you [no, not really], 
this discussion got _way_ off the real track it should be on: that this is 
a _regression_.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[regression] Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK

2013-05-29 Thread Ingo Molnar

* Christoph Lameter c...@linux.com wrote:

 On Mon, 27 May 2013, Peter Zijlstra wrote:
 
  Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
  had a single resource counter. After your patch RLIMIT_MEMLOCK is
  applied separately to both -- more or less.
 
 Before the patch the count was doubled since a single page was counted 
 twice: Once because it was mlocked (marked with PG_mlock) and then again 
 because it was also pinned (the refcount was increased). Two different 
 things.

Christoph, why are you *STILL* arguing??

You caused a *regression* in a userspace ABI plain and simple, and a 
security relevant one. Furtermore you modified kernel/events/core.c yet 
you never even Cc:-ed the parties involved ...

All your excuses, obfuscation and attempts to redefine the universe to 
your liking won't change reality: it worked before, it does not now. Take 
responsibility for your action for christ's sake and move forward towards 
a resolution , okay?

When can we expect a fix from you for the breakage you caused? Or at least 
a word that acknowledges that you broke a user ABI carelessly?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-10-12 Thread Ingo Molnar

* Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

 On Wed, Sep 30, 2009 at 11:44:56AM +0200, Ingo Molnar wrote:
OK.  It would be nice to tie into something more general, but I 
think I agree -- perf counters are missing the filtering and the no 
lost events that ummunotify does have. [...]
  
  Performance events filtering is being worked on and now with the 
  proper non-DoS limit you've added you can lose events too, dont you? 
  So it's all a question of how much buffering to add - and with perf 
  events too you can buffer arbitrary large amount of events.
 
 No, the ummunotify does not loose events, that is the fundamental 
 difference between it and all tracing schemes.
 
 Every call to ibv_reg_mr is paired with a call to ummunotify to create 
 a matching watcher. Both calls allocate some kernel memory, if one 
 fails the entire operation fails and userspace can do whatever it does 
 on memory allocation failure.

We already support signal notification for perf events, and we also 
support two modi of perf ring-buffer overflow notification. Adding a 
third one that sends a signal when events are lost would be in line with 
that.

This would allow you to have the OOM semantics of requesting a SIGBUS - 
or user-space could do other things: like print a warning in the app or 
ignore the event overflow.

Which are all interesting things to do. (If you do that you might want 
to add that to 'perf top' or 'perf record' as well.)

 After that point the scheme is perfectly lossless.

Well if it can OOM it's not lossless, obviously. You just define event 
loss to be equivalent to Destruction of the universe. ;-)

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ofa-general] Re: [GIT PULL] please pull ummunotify

2009-10-12 Thread Ingo Molnar

* Jason Gunthorpe jguntho...@obsidianresearch.com wrote:

 On Mon, Oct 12, 2009 at 08:19:44PM +0200, Ingo Molnar wrote:
 
   After that point the scheme is perfectly lossless.
  
  Well if it can OOM it's not lossless, obviously. You just define 
  event loss to be equivalent to Destruction of the universe. ;-)
 
 It can't OOM once the ummunotify registration is done - when an event 
 occurs it doesn't allocate any memory and it doesn't loose events.

Well, it has built-in event loss via the UMMUNOTIFY_FLAG_HINT mechanism: 
any double events on the same range will cause an imprecise event to be 
recorded and cause the loss of information.

Is that loss of information more acceptable than the loss of information 
via the loss of events?

It might be more acceptable because the flag-hint mechanism can at most 
cause over-flushing - while with perf events we might miss to invalidate 
a range altogether.

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html