Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-05-03 Thread Michal Hocko
On Sat 28-04-18 19:10:47, Matthew Wilcox wrote:
> Another way we could approach this is to get rid of ZONE_DMA. Make GFP_DMA
> a flag which doesn't map to a zone. Rather, it redirects to a separate
> allocator. At boot, we hand all memory under 16MB to the DMA allocator. The
> DMA allocator can have a shrinker which just hands back all the memory once
> we're under memory pressure (if it's never had an allocation).

Yeah, that was exactly the plan with the CMA allocator... We wouldn't
need the shrinker because who cares about 16MB which is not usable
anyway.
-- 
Michal Hocko
SUSE Labs


Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-04-27 Thread Michal Hocko
On Fri 27-04-18 11:07:07, Cristopher Lameter wrote:
> On Fri, 27 Apr 2018, Michal Hocko wrote:
> 
> > On Thu 26-04-18 22:35:56, Christoph Hellwig wrote:
> > > On Thu, Apr 26, 2018 at 09:54:06PM +, Luis R. Rodriguez wrote:
> > > > In practice if you don't have a floppy device on x86, you don't need 
> > > > ZONE_DMA,
> > >
> > > I call BS on that, and you actually explain later why it it BS due
> > > to some drivers using it more explicitly.  But even more importantly
> > > we have plenty driver using it through dma_alloc_* and a small DMA
> > > mask, and they are in use - we actually had a 4.16 regression due to
> > > them.
> >
> > Well, but do we need a zone for that purpose? The idea was to actually
> > replace the zone by a CMA pool (at least on x86). With the current
> > implementation of the CMA we would move the range [0-16M] pfn range into
> > zone_movable so it can be used and we would get rid of all of the
> > overhead each zone brings (a bit in page flags, kmalloc caches and who
> > knows what else)
> 
> Well it looks like what we are using it for is to force allocation from
> low physical memory if we fail to obtain proper memory through a normal
> channel.  The use of ZONE_DMA is only there for emergency purposes.
> I think we could subsitute ZONE_DMA32 on x87 without a problem.
> 
> Which means that ZONE_DMA has no purpose anymore.

We still need to make sure the low 16MB is available on request. And
that is what CMA can help with. We do not really seem to need the whole
zone infreastructure for that.

> Can we make ZONE_DMA on x86 refer to the low 32 bit physical addresses
> instead and remove ZONE_DMA32?

Why that would be an advantage? If anything I would rename ZONE_DMA32 to
ZONE_ADDR32 or something like that.

> That would actually improve the fallback because you have more memory for
> the old devices.

I do not really understand how is that related to removal ZONE_DMA. We
are really talking about the lowest 16MB...

-- 
Michal Hocko
SUSE Labs


Re: [LSF/MM TOPIC NOTES] x86 ZONE_DMA love

2018-04-27 Thread Michal Hocko
On Thu 26-04-18 22:35:56, Christoph Hellwig wrote:
> On Thu, Apr 26, 2018 at 09:54:06PM +, Luis R. Rodriguez wrote:
> > In practice if you don't have a floppy device on x86, you don't need 
> > ZONE_DMA,
> 
> I call BS on that, and you actually explain later why it it BS due
> to some drivers using it more explicitly.  But even more importantly
> we have plenty driver using it through dma_alloc_* and a small DMA
> mask, and they are in use - we actually had a 4.16 regression due to
> them.

Well, but do we need a zone for that purpose? The idea was to actually
replace the zone by a CMA pool (at least on x86). With the current
implementation of the CMA we would move the range [0-16M] pfn range into 
zone_movable so it can be used and we would get rid of all of the
overhead each zone brings (a bit in page flags, kmalloc caches and who
knows what else)
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 4/4] mtd: nand: nandsim: convert to memalloc_noreclaim_*()

2017-04-06 Thread Michal Hocko
On Thu 06-04-17 09:33:44, Adrian Hunter wrote:
> On 05/04/17 14:39, Vlastimil Babka wrote:
> > On 04/05/2017 01:36 PM, Richard Weinberger wrote:
> >> Michal,
> >>
> >> Am 05.04.2017 um 13:31 schrieb Michal Hocko:
> >>> On Wed 05-04-17 09:47:00, Vlastimil Babka wrote:
> >>>> Nandsim has own functions set_memalloc() and clear_memalloc() for robust
> >>>> setting and clearing of PF_MEMALLOC. Replace them by the new generic 
> >>>> helpers.
> >>>> No functional change.
> >>>
> >>> This one smells like an abuser. Why the hell should read/write path
> >>> touch memory reserves at all!
> >>
> >> Could be. Let's ask Adrian, AFAIK he wrote that code.
> >> Adrian, can you please clarify why nandsim needs to play with PF_MEMALLOC?
> > 
> > I was thinking about it and concluded that since the simulator can be
> > used as a block device where reclaimed pages go to, writing the data out
> > is a memalloc operation. Then reading can be called as part of r-m-w
> > cycle, so reading as well.
> 
> IIRC it was to avoid getting stuck with nandsim waiting on memory reclaim
> and memory reclaim waiting on nandsim.

I've got lost in the indirection. Could you describe how would reclaim
get stuck waiting on these paths please?

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 4/4] mtd: nand: nandsim: convert to memalloc_noreclaim_*()

2017-04-05 Thread Michal Hocko
On Wed 05-04-17 13:39:16, Vlastimil Babka wrote:
> On 04/05/2017 01:36 PM, Richard Weinberger wrote:
> > Michal,
> > 
> > Am 05.04.2017 um 13:31 schrieb Michal Hocko:
> >> On Wed 05-04-17 09:47:00, Vlastimil Babka wrote:
> >>> Nandsim has own functions set_memalloc() and clear_memalloc() for robust
> >>> setting and clearing of PF_MEMALLOC. Replace them by the new generic 
> >>> helpers.
> >>> No functional change.
> >>
> >> This one smells like an abuser. Why the hell should read/write path
> >> touch memory reserves at all!
> > 
> > Could be. Let's ask Adrian, AFAIK he wrote that code.
> > Adrian, can you please clarify why nandsim needs to play with PF_MEMALLOC?
> 
> I was thinking about it and concluded that since the simulator can be
> used as a block device where reclaimed pages go to, writing the data out
> is a memalloc operation. Then reading can be called as part of r-m-w
> cycle, so reading as well. But it would be great if somebody more
> knowledgeable confirmed this.

then this deserves a big fat comment explaining all the details,
including how the complete depletion of reserves is prevented.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 4/4] mtd: nand: nandsim: convert to memalloc_noreclaim_*()

2017-04-05 Thread Michal Hocko
On Wed 05-04-17 09:47:00, Vlastimil Babka wrote:
> Nandsim has own functions set_memalloc() and clear_memalloc() for robust
> setting and clearing of PF_MEMALLOC. Replace them by the new generic helpers.
> No functional change.

This one smells like an abuser. Why the hell should read/write path
touch memory reserves at all!

> 
> Signed-off-by: Vlastimil Babka <vba...@suse.cz>
> Cc: Boris Brezillon <boris.brezil...@free-electrons.com>
> Cc: Richard Weinberger <rich...@nod.at>
> ---
>  drivers/mtd/nand/nandsim.c | 29 +
>  1 file changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
> index cef818f535ed..03a0d057bf2f 100644
> --- a/drivers/mtd/nand/nandsim.c
> +++ b/drivers/mtd/nand/nandsim.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1368,31 +1369,18 @@ static int get_pages(struct nandsim *ns, struct file 
> *file, size_t count, loff_t
>   return 0;
>  }
>  
> -static int set_memalloc(void)
> -{
> - if (current->flags & PF_MEMALLOC)
> - return 0;
> - current->flags |= PF_MEMALLOC;
> - return 1;
> -}
> -
> -static void clear_memalloc(int memalloc)
> -{
> - if (memalloc)
> - current->flags &= ~PF_MEMALLOC;
> -}
> -
>  static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, 
> size_t count, loff_t pos)
>  {
>   ssize_t tx;
> - int err, memalloc;
> + int err;
> + unsigned int noreclaim_flag;
>  
>   err = get_pages(ns, file, count, pos);
>   if (err)
>   return err;
> - memalloc = set_memalloc();
> + noreclaim_flag = memalloc_noreclaim_save();
>   tx = kernel_read(file, pos, buf, count);
> - clear_memalloc(memalloc);
> + memalloc_noreclaim_restore(noreclaim_flag);
>   put_pages(ns);
>   return tx;
>  }
> @@ -1400,14 +1388,15 @@ static ssize_t read_file(struct nandsim *ns, struct 
> file *file, void *buf, size_
>  static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, 
> size_t count, loff_t pos)
>  {
>   ssize_t tx;
> - int err, memalloc;
> + int err;
> + unsigned int noreclaim_flag;
>  
>   err = get_pages(ns, file, count, pos);
>   if (err)
>   return err;
> - memalloc = set_memalloc();
> + noreclaim_flag = memalloc_noreclaim_save();
>   tx = kernel_write(file, buf, count, pos);
> - clear_memalloc(memalloc);
> + memalloc_noreclaim_restore(noreclaim_flag);
>   put_pages(ns);
>   return tx;
>  }
> -- 
> 2.12.2

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 2/4] mm: introduce memalloc_noreclaim_{save,restore}

2017-04-05 Thread Michal Hocko
On Wed 05-04-17 09:46:58, Vlastimil Babka wrote:
> The previous patch has shown that simply setting and clearing PF_MEMALLOC in
> current->flags can result in wrongly clearing a pre-existing PF_MEMALLOC flag
> and potentially lead to recursive reclaim. Let's introduce helpers that 
> support
> proper nesting by saving the previous stat of the flag, similar to the 
> existing
> memalloc_noio_* and memalloc_nofs_* helpers. Convert existing setting/clearing
> of PF_MEMALLOC within mm to the new helpers.
> 
> There are no known issues with the converted code, but the change makes it 
> more
> robust.
> 
> Suggested-by: Michal Hocko <mho...@suse.com>
> Signed-off-by: Vlastimil Babka <vba...@suse.cz>

One could argue that tsk_restore_flags() could be extended to provide
tsk_set_flags() and use it for all allocation related PF flags. I do not
have a strong opinion on that but explicit API sounds a bit better to me
because is easier to follow (at least for me). If others think that
generic API would be better then I won't have any objections. Anyway
this looks good to me.
Acked-by: Michal Hocko <mho...@suse.com>

> ---
>  include/linux/sched/mm.h | 12 
>  mm/page_alloc.c  | 11 ++-
>  mm/vmscan.c  | 17 +++--
>  3 files changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 9daabe138c99..2b24a6974847 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -191,4 +191,16 @@ static inline void memalloc_nofs_restore(unsigned int 
> flags)
>   current->flags = (current->flags & ~PF_MEMALLOC_NOFS) | flags;
>  }
>  
> +static inline unsigned int memalloc_noreclaim_save(void)
> +{
> + unsigned int flags = current->flags & PF_MEMALLOC;
> + current->flags |= PF_MEMALLOC;
> + return flags;
> +}
> +
> +static inline void memalloc_noreclaim_restore(unsigned int flags)
> +{
> + current->flags = (current->flags & ~PF_MEMALLOC) | flags;
> +}
> +
>  #endif /* _LINUX_SCHED_MM_H */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b84e6ffbe756..037e32dccd7a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3288,15 +3288,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned 
> int order,
>   enum compact_priority prio, enum compact_result *compact_result)
>  {
>   struct page *page;
> - unsigned int noreclaim_flag = current->flags & PF_MEMALLOC;
> + unsigned int noreclaim_flag;
>  
>   if (!order)
>   return NULL;
>  
> - current->flags |= PF_MEMALLOC;
> + noreclaim_flag = memalloc_noreclaim_save();
>   *compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
>   prio);
> - current->flags = (current->flags & ~PF_MEMALLOC) | noreclaim_flag;
> + memalloc_noreclaim_restore(noreclaim_flag);
>  
>   if (*compact_result <= COMPACT_INACTIVE)
>   return NULL;
> @@ -3443,12 +3443,13 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
>  {
>   struct reclaim_state reclaim_state;
>   int progress;
> + unsigned int noreclaim_flag;
>  
>   cond_resched();
>  
>   /* We now go into synchronous reclaim */
>   cpuset_memory_pressure_bump();
> - current->flags |= PF_MEMALLOC;
> + noreclaim_flag = memalloc_noreclaim_save();
>   lockdep_set_current_reclaim_state(gfp_mask);
>   reclaim_state.reclaimed_slab = 0;
>   current->reclaim_state = _state;
> @@ -3458,7 +3459,7 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
>  
>   current->reclaim_state = NULL;
>   lockdep_clear_current_reclaim_state();
> - current->flags &= ~PF_MEMALLOC;
> + memalloc_noreclaim_restore(noreclaim_flag);
>  
>   cond_resched();
>  
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 58615bb27f2f..ff63b91a0f48 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2992,6 +2992,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct 
> mem_cgroup *memcg,
>   struct zonelist *zonelist;
>   unsigned long nr_reclaimed;
>   int nid;
> + unsigned int noreclaim_flag;
>   struct scan_control sc = {
>   .nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
>   .gfp_mask = (current_gfp_context(gfp_mask) & GFP_RECLAIM_MASK) |
> @@ -3018,9 +3019,9 @@ unsigned long try_to_free_mem_cgroup_pages(struct 
> mem_cgroup *memcg,
>   sc.gfp_mask,
>   sc.reclaim

Re: [PATCH 3/4] treewide: convert PF_MEMALLOC manipulations to new helpers

2017-04-05 Thread Michal Hocko
On Wed 05-04-17 09:46:59, Vlastimil Babka wrote:
> We now have memalloc_noreclaim_{save,restore} helpers for robust setting and
> clearing of PF_MEMALLOC. Let's convert the code which was using the generic
> tsk_restore_flags(). No functional change.

It would be really great to revisit why those places outside of the mm
proper really need this flag. I know this is a painful exercise but I
wouldn't be surprised if there were abusers there.

> Signed-off-by: Vlastimil Babka <vba...@suse.cz>
> Cc: Josef Bacik <jba...@fb.com>
> Cc: Lee Duncan <ldun...@suse.com>
> Cc: Chris Leech <cle...@redhat.com>
> Cc: "David S. Miller" <da...@davemloft.net>
> Cc: Eric Dumazet <eduma...@google.com>

Acked-by: Michal Hocko <mho...@suse.com>

> ---
>  drivers/block/nbd.c  | 7 ---
>  drivers/scsi/iscsi_tcp.c | 7 ---
>  net/core/dev.c   | 7 ---
>  net/core/sock.c  | 7 ---
>  4 files changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 03ae72985c79..929fc548c7fb 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -210,7 +211,7 @@ static int sock_xmit(struct nbd_device *nbd, int index, 
> int send,
>   struct socket *sock = nbd->socks[index]->sock;
>   int result;
>   struct msghdr msg;
> - unsigned long pflags = current->flags;
> + unsigned int noreclaim_flag;
>  
>   if (unlikely(!sock)) {
>   dev_err_ratelimited(disk_to_dev(nbd->disk),
> @@ -221,7 +222,7 @@ static int sock_xmit(struct nbd_device *nbd, int index, 
> int send,
>  
>   msg.msg_iter = *iter;
>  
> - current->flags |= PF_MEMALLOC;
> + noreclaim_flag = memalloc_noreclaim_save();
>   do {
>   sock->sk->sk_allocation = GFP_NOIO | __GFP_MEMALLOC;
>   msg.msg_name = NULL;
> @@ -244,7 +245,7 @@ static int sock_xmit(struct nbd_device *nbd, int index, 
> int send,
>   *sent += result;
>   } while (msg_data_left());
>  
> - tsk_restore_flags(current, pflags, PF_MEMALLOC);
> + memalloc_noreclaim_restore(noreclaim_flag);
>  
>   return result;
>  }
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index 4228aba1f654..4842fc0e809d 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -371,10 +372,10 @@ static inline int iscsi_sw_tcp_xmit_qlen(struct 
> iscsi_conn *conn)
>  static int iscsi_sw_tcp_pdu_xmit(struct iscsi_task *task)
>  {
>   struct iscsi_conn *conn = task->conn;
> - unsigned long pflags = current->flags;
> + unsigned int noreclaim_flag;
>   int rc = 0;
>  
> - current->flags |= PF_MEMALLOC;
> + noreclaim_flag = memalloc_noreclaim_save();
>  
>   while (iscsi_sw_tcp_xmit_qlen(conn)) {
>   rc = iscsi_sw_tcp_xmit(conn);
> @@ -387,7 +388,7 @@ static int iscsi_sw_tcp_pdu_xmit(struct iscsi_task *task)
>   rc = 0;
>   }
>  
> - tsk_restore_flags(current, pflags, PF_MEMALLOC);
> + memalloc_noreclaim_restore(noreclaim_flag);
>   return rc;
>  }
>  
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fde8b3f7136b..e0705a126b24 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -81,6 +81,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -4227,7 +4228,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>   int ret;
>  
>   if (sk_memalloc_socks() && skb_pfmemalloc(skb)) {
> - unsigned long pflags = current->flags;
> + unsigned int noreclaim_flag;
>  
>   /*
>* PFMEMALLOC skbs are special, they should
> @@ -4238,9 +4239,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>* Use PF_MEMALLOC as this saves us from propagating the 
> allocation
>* context down to all allocation sites.
>*/
> - current->flags |= PF_MEMALLOC;
> + noreclaim_flag = memalloc_noreclaim_save();
>   ret = __netif_receive_skb_core(skb, true);
> - tsk_restore_flags(current, pflags, PF_MEMALLOC);
> + memalloc_noreclaim_restore(noreclaim_flag);
>   } else
>   ret = __netif_receive_skb_core(skb, false);
>  
> diff --git a/net/core/sock.c b/n

Re: [PATCH 1/4] mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC

2017-04-05 Thread Michal Hocko
On Wed 05-04-17 09:46:57, Vlastimil Babka wrote:
> The function __alloc_pages_direct_compact() sets PF_MEMALLOC to prevent
> deadlock during page migration by lock_page() (see the comment in
> __unmap_and_move()). Then it unconditionally clears the flag, which can clear 
> a
> pre-existing PF_MEMALLOC flag and result in recursive reclaim. This was not a
> problem until commit a8161d1ed609 ("mm, page_alloc: restructure direct
> compaction handling in slowpath"), because direct compation was called only
> after direct reclaim, which was skipped when PF_MEMALLOC flag was set.
> 
> Even now it's only a theoretical issue, as the new callsite of
> __alloc_pages_direct_compact() is reached only for costly orders and when
> gfp_pfmemalloc_allowed() is true, which means either __GFP_NOMEMALLOC is in
> gfp_flags or in_interrupt() is true. There is no such known context, but let's
> play it safe and make __alloc_pages_direct_compact() robust for cases where
> PF_MEMALLOC is already set.
> 
> Fixes: a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling 
> in slowpath")
> Reported-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
> Signed-off-by: Vlastimil Babka <vba...@suse.cz>
> Cc: <sta...@vger.kernel.org>

Acked-by: Michal Hocko <mho...@suse.com>

> ---
>  mm/page_alloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3589f8be53be..b84e6ffbe756 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3288,6 +3288,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned 
> int order,
>   enum compact_priority prio, enum compact_result *compact_result)
>  {
>   struct page *page;
> + unsigned int noreclaim_flag = current->flags & PF_MEMALLOC;
>  
>   if (!order)
>   return NULL;
> @@ -3295,7 +3296,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned 
> int order,
>   current->flags |= PF_MEMALLOC;
>   *compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
>   prio);
> - current->flags &= ~PF_MEMALLOC;
> + current->flags = (current->flags & ~PF_MEMALLOC) | noreclaim_flag;
>  
>   if (*compact_result <= COMPACT_INACTIVE)
>   return NULL;
> -- 
> 2.12.2

-- 
Michal Hocko
SUSE Labs


Re: LSF/MM 2017: Call for Proposals

2016-12-08 Thread Michal Hocko
On Thu 08-12-16 07:30:43, James Bottomley wrote:
> On Thu, 2016-12-08 at 13:26 +0100, Michal Hocko wrote:
> > On Wed 07-12-16 06:57:06, James Bottomley wrote:
> > [...]
> > > Just on this point, since there seems to be a lot of confusion: lsf
> > > -pc
> > > is the list for contacting the programme committee, so you cannot
> > > subscribe to it.
> > > 
> > > There is no -discuss equivalent, like kernel summit has, because we
> > > expect you to cc the relevant existing mailing list and have the
> > > discussion there instead rather than expecting people to subscribe
> > > to a
> > > new list.
> > 
> > There used to be l...@lists.linux-foundation.org. Is it not active
> > anymore?
> 
> Not until we get the list of invitees sorted out.  And then it's only
> for stuff actually to do with lsf (like logistics, bof or session
> requests or other meetups).  Any technical stuff should still be cc'd
> to the relevant mailing list.

OK, I tend to forget about that. Thanks for the clarification!
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: LSF/MM 2017: Call for Proposals

2016-12-08 Thread Michal Hocko
On Wed 07-12-16 06:57:06, James Bottomley wrote:
[...]
> Just on this point, since there seems to be a lot of confusion: lsf-pc
> is the list for contacting the programme committee, so you cannot
> subscribe to it.
> 
> There is no -discuss equivalent, like kernel summit has, because we
> expect you to cc the relevant existing mailing list and have the
> discussion there instead rather than expecting people to subscribe to a
> new list.

There used to be l...@lists.linux-foundation.org. Is it not active
anymore?

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags

2016-10-20 Thread Michal Hocko
On Wed 19-10-16 10:23:55, Dave Hansen wrote:
> On 10/19/2016 10:01 AM, Michal Hocko wrote:
> > The question I had earlier was whether this has to be an explicit FOLL
> > flag used by g-u-p users or we can just use it internally when mm !=
> > current->mm
> 
> The reason I chose not to do that was that deferred work gets run under
> a basically random 'current'.  If we just use 'mm != current->mm', then
> the deferred work will sometimes have pkeys enforced and sometimes not,
> basically randomly.

OK, I see (async_pf_execute and ksm ). It makes more sense to me. Thanks
for the clarification.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags

2016-10-19 Thread Michal Hocko
On Wed 19-10-16 09:49:43, Dave Hansen wrote:
> On 10/19/2016 02:07 AM, Michal Hocko wrote:
> > On Wed 19-10-16 09:58:15, Lorenzo Stoakes wrote:
> >> On Tue, Oct 18, 2016 at 05:30:50PM +0200, Michal Hocko wrote:
> >>> I am wondering whether we can go further. E.g. it is not really clear to
> >>> me whether we need an explicit FOLL_REMOTE when we can in fact check
> >>> mm != current->mm and imply that. Maybe there are some contexts which
> >>> wouldn't work, I haven't checked.
> >>
> >> This flag is set even when /proc/self/mem is used. I've not looked deeply 
> >> into
> >> this flag but perhaps accessing your own memory this way can be considered
> >> 'remote' since you're not accessing it directly. On the other hand, 
> >> perhaps this
> >> is just mistaken in this case?
> > 
> > My understanding of the flag is quite limited as well. All I know it is
> > related to protection keys and it is needed to bypass protection check.
> > See arch_vma_access_permitted. See also 1b2ee1266ea6 ("mm/core: Do not
> > enforce PKEY permissions on remote mm access").
> 
> Yeah, we need the flag to tell us when PKEYs should be applied or not.
> The current task's PKRU (pkey rights register) should really only be
> used to impact access to the task's memory, but has no bearing on how a
> given task should access remote memory.

The question I had earlier was whether this has to be an explicit FOLL
flag used by g-u-p users or we can just use it internally when mm !=
current->mm

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags

2016-10-19 Thread Michal Hocko
On Wed 19-10-16 09:58:15, Lorenzo Stoakes wrote:
> On Tue, Oct 18, 2016 at 05:30:50PM +0200, Michal Hocko wrote:
> > I am wondering whether we can go further. E.g. it is not really clear to
> > me whether we need an explicit FOLL_REMOTE when we can in fact check
> > mm != current->mm and imply that. Maybe there are some contexts which
> > wouldn't work, I haven't checked.
> 
> This flag is set even when /proc/self/mem is used. I've not looked deeply into
> this flag but perhaps accessing your own memory this way can be considered
> 'remote' since you're not accessing it directly. On the other hand, perhaps 
> this
> is just mistaken in this case?

My understanding of the flag is quite limited as well. All I know it is
related to protection keys and it is needed to bypass protection check.
See arch_vma_access_permitted. See also 1b2ee1266ea6 ("mm/core: Do not
enforce PKEY permissions on remote mm access").

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/10] mm: replace __access_remote_vm() write parameter with gup_flags

2016-10-19 Thread Michal Hocko
On Wed 19-10-16 10:06:46, Lorenzo Stoakes wrote:
> On Wed, Oct 19, 2016 at 10:52:05AM +0200, Michal Hocko wrote:
> > yes this is the desirable and expected behavior.
> >
> > > wonder if this is desirable behaviour or whether this ought to be limited 
> > > to
> > > ptrace system calls. Regardless, by making the flag more visible it makes 
> > > it
> > > easier to see that this is happening.
> >
> > mem_open already enforces PTRACE_MODE_ATTACH
> 
> Ah I missed this, that makes a lot of sense, thanks!
> 
> I still wonder whether other invocations of access_remote_vm() in 
> fs/proc/base.c
> (the principle caller of this function) need FOLL_FORCE, for example the 
> various
> calls that simply read data from other processes, so I think the point stands
> about keeping this explicit.

I do agree. Making them explicit will help to clean them up later,
should there be a need.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/10] mm: replace __access_remote_vm() write parameter with gup_flags

2016-10-19 Thread Michal Hocko
On Wed 19-10-16 09:59:03, Jan Kara wrote:
> On Thu 13-10-16 01:20:18, Lorenzo Stoakes wrote:
> > This patch removes the write parameter from __access_remote_vm() and 
> > replaces it
> > with a gup_flags parameter as use of this function previously _implied_
> > FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
> > 
> > We make this explicit as use of FOLL_FORCE can result in surprising 
> > behaviour
> > (and hence bugs) within the mm subsystem.
> > 
> > Signed-off-by: Lorenzo Stoakes <lstoa...@gmail.com>
> 
> So I'm not convinced this (and the following two patches) is actually
> helping much. By grepping for FOLL_FORCE we will easily see that any caller
> of access_remote_vm() gets that semantics and can thus continue search

I am really wondering. Is there anything inherent that would require
FOLL_FORCE for access_remote_vm? I mean FOLL_FORCE is a really
non-trivial thing. It doesn't obey vma permissions so we should really
minimize its usage. Do all of those users really need FOLL_FORCE?

Anyway I would rather see the flag explicit and used at more places than
hidden behind a helper function.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/10] mm: replace __access_remote_vm() write parameter with gup_flags

2016-10-19 Thread Michal Hocko
On Wed 19-10-16 09:40:45, Lorenzo Stoakes wrote:
> On Wed, Oct 19, 2016 at 10:13:52AM +0200, Michal Hocko wrote:
> > On Wed 19-10-16 09:59:03, Jan Kara wrote:
> > > On Thu 13-10-16 01:20:18, Lorenzo Stoakes wrote:
> > > > This patch removes the write parameter from __access_remote_vm() and 
> > > > replaces it
> > > > with a gup_flags parameter as use of this function previously _implied_
> > > > FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
> > > >
> > > > We make this explicit as use of FOLL_FORCE can result in surprising 
> > > > behaviour
> > > > (and hence bugs) within the mm subsystem.
> > > >
> > > > Signed-off-by: Lorenzo Stoakes <lstoa...@gmail.com>
> > >
> > > So I'm not convinced this (and the following two patches) is actually
> > > helping much. By grepping for FOLL_FORCE we will easily see that any 
> > > caller
> > > of access_remote_vm() gets that semantics and can thus continue search
> >
> > I am really wondering. Is there anything inherent that would require
> > FOLL_FORCE for access_remote_vm? I mean FOLL_FORCE is a really
> > non-trivial thing. It doesn't obey vma permissions so we should really
> > minimize its usage. Do all of those users really need FOLL_FORCE?
> 
> I wonder about this also, for example by accessing /proc/self/mem you trigger
> access_remote_vm() and consequently get_user_pages_remote() meaning FOLL_FORCE
> is implied and you can use /proc/self/mem to override any VMA permissions. I

yes this is the desirable and expected behavior. 

> wonder if this is desirable behaviour or whether this ought to be limited to
> ptrace system calls. Regardless, by making the flag more visible it makes it
> easier to see that this is happening.

mem_open already enforces PTRACE_MODE_ATTACH

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/10] mm: adjust get_user_pages* functions to explicitly pass FOLL_* flags

2016-10-18 Thread Michal Hocko
On Thu 13-10-16 01:20:10, Lorenzo Stoakes wrote:
> This patch series adjusts functions in the get_user_pages* family such that
> desired FOLL_* flags are passed as an argument rather than implied by flags.
> 
> The purpose of this change is to make the use of FOLL_FORCE explicit so it is
> easier to grep for and clearer to callers that this flag is being used. The 
> use
> of FOLL_FORCE is an issue as it overrides missing VM_READ/VM_WRITE flags for 
> the
> VMA whose pages we are reading from/writing to, which can result in surprising
> behaviour.
> 
> The patch series came out of the discussion around commit 38e0885, which
> addressed a BUG_ON() being triggered when a page was faulted in with PROT_NONE
> set but having been overridden by FOLL_FORCE. do_numa_page() was run on the
> assumption the page _must_ be one marked for NUMA node migration as an actual
> PROT_NONE page would have been dealt with prior to this code path, however
> FOLL_FORCE introduced a situation where this assumption did not hold.
> 
> See https://marc.info/?l=linux-mm=147585445805166 for the patch proposal.

I like this cleanup. Tracking FOLL_FORCE users was always a nightmare
and the flag behavior is really subtle so we should better be explicit
about it. I haven't gone through each patch separately but rather
applied the whole series and checked the resulting diff. This all seems
OK to me and feel free to add
Acked-by: Michal Hocko <mho...@suse.com>

I am wondering whether we can go further. E.g. it is not really clear to
me whether we need an explicit FOLL_REMOTE when we can in fact check
mm != current->mm and imply that. Maybe there are some contexts which
wouldn't work, I haven't checked.

Then I am also wondering about FOLL_TOUCH behavior.
__get_user_pages_unlocked has only few callers which used to be
get_user_pages_unlocked before 1e9877902dc7e ("mm/gup: Introduce
get_user_pages_remote()"). To me a dropped FOLL_TOUCH seems
unintentional. Now that get_user_pages_unlocked has gup_flags argument I
guess we might want to get rid of the __g-u-p-u version altogether, no?

__get_user_pages is quite low level and imho shouldn't be exported. It's
only user - kvm - should rather pull those two functions to gup instead
and export them. There is nothing really KVM specific in them.

I also cannot say I would be entirely thrilled about get_user_pages_locked,
we only have one user which can simply do lock g-u-p unlock AFAICS.

I guess there is more work in that area and I do not want to impose all
that work on you, but I couldn't resist once I saw you playing in that
area ;) Definitely a good start!
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: fix scsi_error_handler vs. scsi_host_dev_release race

2015-08-28 Thread Michal Hocko
On Fri 28-08-15 07:56:13, James Bottomley wrote:
[...]
  diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
  index 6457a8a0db9c..2c0a817d5dbe 100644
  --- a/drivers/scsi/scsi_error.c
  +++ b/drivers/scsi/scsi_error.c
  @@ -2169,8 +2169,11 @@ int scsi_error_handler(void *data)
   * We never actually get interrupted because kthread_run
   * disables signal delivery for the created thread.
   */
  -   while (!kthread_should_stop()) {
  +   while (true) {
 
 Comment here, I think, please to avoid any other erroneous tidying
 attempts.  How about
 
 /* 
  * The sequence in kthread_stop() sets the stop flag first then 
  * wakes the process.  To avoid missed wakeups, the task should always
  * be in a non running state before the stop flag is checked
  */
 
 Otherwise this looks fine.

I do not have objections to the added comment.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html