Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer
On Mon, 28 Sep 2015 11:28:15 -0500 (CDT)
Christoph Lameter  wrote:

> On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:
> 
> > > Do you really need separate parameters for freelist_head? If you just want
> > > to deal with one object pass it as freelist_head and set cnt = 1?
> >
> > Yes, I need it.  We need to know both the head and tail of the list to
> > splice it.
> 
> Ok so this is to avoid having to scan the list to its end?

True.

> x is the end
> of the list and freelist_head the beginning. That is weird.

Yes, it is a bit weird... the bulk free of freelists comes out as a
second-class citizen.

Okay, I'll try to change the slab_free() and __slab_free() calls to
take a "head" and "tail", and let tail be NULL on single object free
to allow the compiler to do constant propagation (thus keeping the
existing fastpath unaffected).  The same code should be generated,
but we get a more intuitive API.
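A minimal sketch of the intended shape (names here are illustrative, not
the real slab_free() signature): because tail is the compile-time
constant NULL at the single-object call site, the compiler can
constant-propagate it and eliminate the bulk branch entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the proposed head/tail API; returns the
 * number of objects freed so the two paths can be told apart. */
static inline int do_slab_free(void *head, void *tail, int cnt)
{
	if (tail)
		return cnt;	/* bulk path: splice [head..tail], cnt objects */
	return 1;		/* single-object fastpath */
}

/* Single-object free: tail == NULL is a compile-time constant here,
 * so constant propagation drops the bulk branch above, keeping the
 * generated fastpath code unchanged. */
static inline int free_single(void *object)
{
	return do_slab_free(object, NULL, 1);
}
```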

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer

On Mon, 28 Sep 2015 11:30:00 -0500 (CDT) Christoph Lameter  
wrote:

> On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:
> 
> > Not knowing SLUB as well as you, it took me several hours to realize
> > init_object() didn't overwrite the freepointer in the object.  Thus, I
> > think these comments make the reader aware of not-so-obvious
> > side-effects of SLAB_POISON and SLAB_RED_ZONE.
> 
> From the source:
> 
> /*
>  * Object layout:
>  *
>  * object address
>  *  Bytes of the object to be managed.
>  *  If the freepointer may overlay the object then the free
>  *  pointer is the first word of the object.
>  *
>  *  Poisoning uses 0x6b (POISON_FREE) and the last byte is
>  *  0xa5 (POISON_END)
>  *
>  * object + s->object_size
>  *  Padding to reach word boundary. This is also used for Redzoning.
>  *  Padding is extended by another word if Redzoning is enabled and
>  *  object_size == inuse.
>  *
>  *  We fill with 0xbb (RED_INACTIVE) for inactive objects and with
>  *  0xcc (RED_ACTIVE) for objects in use.
>  *
>  * object + s->inuse
>  *  Meta data starts here.
>  *
>  *  A. Free pointer (if we cannot overwrite object on free)
>  *  B. Tracking data for SLAB_STORE_USER
>  *  C. Padding to reach required alignment boundary or at minimum
>  *  one word if debugging is on to be able to detect writes
>  *  before the word boundary.

Okay, I will remove the comment.

The best doc on SLUB and SLAB layout comes from your slides titled:
 "Slab allocators in the Linux Kernel: SLAB, SLOB, SLUB"

Let's add a link to the slides here:
 http://events.linuxfoundation.org/sites/events/files/slides/slaballocators.pdf

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-28 Thread Christoph Lameter
On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:

> Not knowing SLUB as well as you, it took me several hours to realize
> init_object() didn't overwrite the freepointer in the object.  Thus, I
> think these comments make the reader aware of not-so-obvious
> side-effects of SLAB_POISON and SLAB_RED_ZONE.

From the source:

/*
 * Object layout:
 *
 * object address
 *  Bytes of the object to be managed.
 *  If the freepointer may overlay the object then the free
 *  pointer is the first word of the object.
 *
 *  Poisoning uses 0x6b (POISON_FREE) and the last byte is
 *  0xa5 (POISON_END)
 *
 * object + s->object_size
 *  Padding to reach word boundary. This is also used for Redzoning.
 *  Padding is extended by another word if Redzoning is enabled and
 *  object_size == inuse.
 *
 *  We fill with 0xbb (RED_INACTIVE) for inactive objects and with
 *  0xcc (RED_ACTIVE) for objects in use.
 *
 * object + s->inuse
 *  Meta data starts here.
 *
 *  A. Free pointer (if we cannot overwrite object on free)
 *  B. Tracking data for SLAB_STORE_USER
 *  C. Padding to reach required alignment boundary or at minimum
 *  one word if debugging is on to be able to detect writes
 *  before the word boundary.
 *
 *  Padding is done using 0x5a (POISON_INUSE)
 *
 * object + s->size
 *  Nothing is used beyond s->size.
 *
 * If slabcaches are merged then the object_size and inuse boundaries are
 * mostly ignored. And therefore no slab options that rely on these boundaries
 * may be used with merged slabcaches.
 */
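A toy recomputation of that layout, with made-up numbers rather than the
kernel's real sizing logic, and ignoring red zoning and SLAB_STORE_USER
tracking: on an assumed 64-bit machine (8-byte word), padding rounds
object_size up to s->inuse, and when the object cannot be overwritten on
free the free pointer is placed at s->offset == s->inuse, extending
s->size by one word.

```c
#include <assert.h>

#define WORD 8UL	/* assumed 64-bit machine */

static unsigned long align_up(unsigned long n)
{
	return (n + WORD - 1) & ~(WORD - 1);
}

struct toy_layout {
	unsigned long object_size;	/* payload bytes */
	unsigned long inuse;		/* payload + word-align padding */
	unsigned long offset;		/* where the free pointer lives */
	unsigned long size;		/* total per-object footprint */
};

/* no_overwrite mirrors "if we cannot overwrite object on free",
 * e.g. when poisoning is enabled. */
static struct toy_layout lay_out(unsigned long object_size, int no_overwrite)
{
	struct toy_layout l;

	l.object_size = object_size;
	l.inuse = align_up(object_size);
	if (no_overwrite) {
		l.offset = l.inuse;	/* free pointer placed after the object */
		l.size = l.inuse + WORD;
	} else {
		l.offset = 0;		/* free pointer overlays the object */
		l.size = l.inuse;
	}
	return l;
}
```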



Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-28 Thread Christoph Lameter
On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:

> > Do you really need separate parameters for freelist_head? If you just want
> > to deal with one object pass it as freelist_head and set cnt = 1?
>
> Yes, I need it.  We need to know both the head and tail of the list to
> splice it.

Ok so this is to avoid having to scan the list to its end? x is the end
of the list and freelist_head the beginning. That is weird.


Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-28 Thread Jesper Dangaard Brouer
On Mon, 28 Sep 2015 10:16:49 -0500 (CDT)
Christoph Lameter  wrote:

> On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:
> 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 1cf98d89546d..13b5f53e4840 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -675,11 +675,18 @@ static void init_object(struct kmem_cache *s, void 
> > *object, u8 val)
> >  {
> > u8 *p = object;
> >
> > +   /* Freepointer not overwritten as SLAB_POISON moved it after object */
> > if (s->flags & __OBJECT_POISON) {
> > memset(p, POISON_FREE, s->object_size - 1);
> > p[s->object_size - 1] = POISON_END;
> > }
> >
> > +   /*
> > +    * If both SLAB_RED_ZONE and SLAB_POISON are enabled, the
> > +    * freepointer is still safe: s->offset then equals s->inuse,
> > +    * and the redzone written below only covers the area between
> > +    * s->object_size and s->inuse.
> > +    */
> > if (s->flags & SLAB_RED_ZONE)
> > memset(p + s->object_size, val, s->inuse - s->object_size);
> >  }
> 
> Are these comments really adding something? This is basic metadata
> handling for SLUB that is commented on elsewhere.

Not knowing SLUB as well as you, it took me several hours to realize
init_object() didn't overwrite the freepointer in the object.  Thus, I
think these comments make the reader aware of not-so-obvious
side-effects of SLAB_POISON and SLAB_RED_ZONE.


> > @@ -2584,9 +2646,14 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
> >   * So we still attempt to reduce cache line usage. Just take the slab
> >   * lock and free the item. If there is no additional partial page
> >   * handling required then we can return immediately.
> > + *
> > + * Bulk free of a freelist with several objects (all belonging to the
> > + * same page) is possible by specifying the freelist_head ptr and the
> > + * object as tail ptr, plus the object count (cnt).
> >   */
> >  static void __slab_free(struct kmem_cache *s, struct page *page,
> > -   void *x, unsigned long addr)
> > +   void *x, unsigned long addr,
> > +   void *freelist_head, int cnt)
> 
> Do you really need separate parameters for freelist_head? If you just want
> to deal with one object pass it as freelist_head and set cnt = 1?

Yes, I need it.  We need to know both the head and tail of the list to
splice it.

See:

> @@ -2612,7 +2681,7 @@ static void __slab_free(struct kmem_cache *s, struct 
> page *page,
>   prior = page->freelist;
>   counters = page->counters;
>   set_freepointer(s, object, prior);
   ^^ 
Here we update the tail ptr (object) to point to "prior" (page->freelist).

>   new.counters = counters;
>   was_frozen = new.frozen;
> - new.inuse--;
> + new.inuse -= cnt;
>   if ((!new.inuse || !prior) && !was_frozen) {
>  
>   if (kmem_cache_has_cpu_partial(s) && !prior) {
> @@ -2643,7 +2712,7 @@ static void __slab_free(struct kmem_cache *s, struct 
> page *page,
>  
>   } while (!cmpxchg_double_slab(s, page,
>   prior, counters,
> - object, new.counters,
> + new_freelist, new.counters,
>   "__slab_free"));

Here we update page->freelist ("prior") to point to the head. Thus,
splicing the list.
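The two updates above can be modeled in userspace.  This is a toy model,
not kernel code: the free pointer is stored at offset 0 of each free
object (SLUB uses s->offset), and a plain store stands in for the
cmpxchg_double_slab() loop.

```c
#include <assert.h>
#include <stddef.h>

/* Each free object stores the pointer to the next free object in its
 * first word (offset 0 here; s->offset in real SLUB). */
static void *get_freepointer(void *object)
{
	return *(void **)object;
}

static void set_freepointer(void *object, void *fp)
{
	*(void **)object = fp;
}

/* Splice a detached list [head..tail] onto the page freelist:
 * 1. point the tail at the old freelist ("prior"),
 * 2. make head the new page->freelist.
 * In __slab_free() step 2 is the cmpxchg_double_slab() update. */
static void splice_freelist(void **page_freelist, void *head, void *tail)
{
	void *prior = *page_freelist;

	set_freepointer(tail, prior);
	*page_freelist = head;
}
```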

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-28 Thread Christoph Lameter
On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote:

> diff --git a/mm/slub.c b/mm/slub.c
> index 1cf98d89546d..13b5f53e4840 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -675,11 +675,18 @@ static void init_object(struct kmem_cache *s, void 
> *object, u8 val)
>  {
>   u8 *p = object;
>
> + /* Freepointer not overwritten as SLAB_POISON moved it after object */
>   if (s->flags & __OBJECT_POISON) {
>   memset(p, POISON_FREE, s->object_size - 1);
>   p[s->object_size - 1] = POISON_END;
>   }
>
> + /*
> +  * If both SLAB_RED_ZONE and SLAB_POISON are enabled, the
> +  * freepointer is still safe: s->offset then equals s->inuse,
> +  * and the redzone written below only covers the area between
> +  * s->object_size and s->inuse.
> +  */
>   if (s->flags & SLAB_RED_ZONE)
>   memset(p + s->object_size, val, s->inuse - s->object_size);
>  }

Are these comments really adding something? This is basic metadata
handling for SLUB that is commented on elsewhere.

> @@ -2584,9 +2646,14 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
>   * So we still attempt to reduce cache line usage. Just take the slab
>   * lock and free the item. If there is no additional partial page
>   * handling required then we can return immediately.
> + *
> + * Bulk free of a freelist with several objects (all belonging to the
> + * same page) is possible by specifying the freelist_head ptr and the
> + * object as tail ptr, plus the object count (cnt).
>   */
>  static void __slab_free(struct kmem_cache *s, struct page *page,
> - void *x, unsigned long addr)
> + void *x, unsigned long addr,
> + void *freelist_head, int cnt)

Do you really need separate parameters for freelist_head? If you just want
to deal with one object pass it as freelist_head and set cnt = 1?