Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-21 Thread Konrad Rzeszutek Wilk
On Fri, Sep 21, 2012 at 05:12:52PM +0100, Mel Gorman wrote:
> On Tue, Sep 04, 2012 at 04:34:46PM -0500, Seth Jennings wrote:
> > zcache is the remaining piece of code required to support in-kernel
> > memory compression.  The other two features, cleancache and frontswap,
> > have been promoted to mainline in 3.0 and 3.5 respectively.  This
> > patchset promotes zcache from the staging tree to mainline.
> > 
> 
> This is a very rough review of the code simply because I was asked to
> look at it. I'm barely aware of the history and I'm not a user of this
> code myself so take all of this with a grain of salt.

Ah fresh set of eyes! Yeey!
> 
> Very broadly speaking my initial reaction before I reviewed anything was
> that *some* sort of usable backend for cleancache or frontswap should exist
> at this point. My understanding is that Xen is the primary user of both
> those frontends and ramster, while interesting, is not something that a
> typical user will benefit from.

Right, the majority of users do not use virtualization. Thought embedded
wise .. well, there are a lot of Android users - thought I am not 100%
sure they are using it right now (I recall seeing changelogs for the clones
of Android mentioning zcache).
> 
> That said, I worry that this has bounced around a lot and as Dan (the
> original author) has a rewrite. I'm wary of spending too much time on this
> at all. Is Dan's new code going to replace this or what? It'd be nice to
> find a definitive answer on that.

The idea is to take parts of zcache2 as seperate patches and stick it
in the code you just reviewed (those that make sense as part of unstaging).
The end result will be that zcache1 == zcache2 in functionality. Right
now we are assembling a list of TODOs for zcache that should be done as part
of 'unstaging'.

> 
> Anyway, here goes

.. and your responses will fill the TODO with many extra line-items.

Its going to take a bit of time to mull over your questions, so it will
take me some time. Also Dan will probably beat me in providing the answers.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-21 Thread Seth Jennings
On 09/21/2012 01:02 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 21, 2012 at 05:12:52PM +0100, Mel Gorman wrote:
>> On Tue, Sep 04, 2012 at 04:34:46PM -0500, Seth Jennings wrote:
>>> zcache is the remaining piece of code required to support in-kernel
>>> memory compression.  The other two features, cleancache and frontswap,
>>> have been promoted to mainline in 3.0 and 3.5 respectively.  This
>>> patchset promotes zcache from the staging tree to mainline.
>>>
>>
>> This is a very rough review of the code simply because I was asked to
>> look at it. I'm barely aware of the history and I'm not a user of this
>> code myself so take all of this with a grain of salt.
> 
> Ah fresh set of eyes! Yeey!

Agreed! Thanks so much!

>>
>> Very broadly speaking my initial reaction before I reviewed anything was
>> that *some* sort of usable backend for cleancache or frontswap should exist
>> at this point. My understanding is that Xen is the primary user of both
>> those frontends and ramster, while interesting, is not something that a
>> typical user will benefit from.
> 
> Right, the majority of users do not use virtualization. Thought embedded
> wise .. well, there are a lot of Android users - thought I am not 100%
> sure they are using it right now (I recall seeing changelogs for the clones
> of Android mentioning zcache).
>>
>> That said, I worry that this has bounced around a lot and as Dan (the
>> original author) has a rewrite. I'm wary of spending too much time on this
>> at all. Is Dan's new code going to replace this or what? It'd be nice to
>> find a definitive answer on that.
> 
> The idea is to take parts of zcache2 as seperate patches and stick it
> in the code you just reviewed (those that make sense as part of unstaging).

I agree with this.  Only the changes from zcache2 (Dan's
rewrite) that are necessary for promotion should be
considered right now.  Afaict, none of the concerns raised
in these comments are addressed by the changes in zcache2.

> The end result will be that zcache1 == zcache2 in functionality. Right
> now we are assembling a list of TODOs for zcache that should be done as part
> of 'unstaging'.
> 
>>
>> Anyway, here goes
> 
> .. and your responses will fill the TODO with many extra line-items.

Great, thanks Konrad.

> 
> Its going to take a bit of time to mull over your questions, so it will
> take me some time.

Same here. I'll respond asap. Thanks again, Mel!

--
Seth

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-21 Thread Dan Magenheimer
Hi Mel --

Wow!  An incredibly wonderfully detailed response!  Thank you very
much for taking the time to read through all of zcache!

Your comments run the gamut from nit and code style, to design,
architecture and broad naming.  Until the choice-of-codebase issue
is resolved, I'll avoid the nits and codestyle comments and respond
to the higher level strategic and design questions.  Since a couple
of your questions are repeated and the specific code which provoked
your question is not isolated, I hope it is OK if I answer those
first out-of-context from your original comments in the code.
(This should also make this easier to read and to extract optimal
meaning, for you and for posterity.)

> That said, I worry that this has bounced around a lot and as Dan (the
> original author) has a rewrite. I'm wary of spending too much time on this
> at all. Is Dan's new code going to replace this or what? It'd be nice to
> find a definitive answer on that.

Replacing this code was my intent, but that was blocked.  IMHO zcache2
is _much_ better than the "demo version" of zcache (aka zcache1).
Hopefully a middle ground can be reached.  I've proposed one privately
offlist.

Seth, please feel free to augment or correct anything below, or
respond to anything I haven't commented on.

> Anyway, here goes

Repeated comments answered first out-of-context:

1) The interrupt context for zcache (and any tmem backend) is imposed
   by the frontend callers.  Cleancache_put [see naming comment below]
   is always called with interrupts disabled.  Cleancache_flush is
   sometimes called with interrupts disabled and sometimes not.
   Cleancache_get is never called in an atomic context.  (I think)
   frontswap_get/put/flush are never called in an atomic context but
   sometimes with the swap_lock held. Because it is dangerous (true?)
   for code to sometimes/not be called in atomic context, much of the
   code in zcache and tmem is forced into atomic context.  BUT Andrea
   observed that there are situations where asynchronicity would be
   preferable and, it turns out that cleancache_get and frontswap_get
   are never called in atomic context.  Zcache2/ramster takes advantage of
   that, and a future KVM backend may want to do so as well.  However,
   the interrupt/atomicity model and assumptions certainly does deserve
   better documentation.

2) The naming of the core tmem functions (put, get, flush) has been
   discussed endlessly, everyone has a different opinion, and the
   current state is a mess: cleancache, frontswap, and the various
   backends are horribly inconsistent.   IMHO, the use of "put"
   and "get" for reference counting is a historical accident, and
   the tmem ABI names were chosen well before I understood the historical
   precedence and the potential for confusion by kernel developers.
   So I don't have a good answer... I'd prefer the ABI-documented
   names, but if they are unacceptable, at least we need to agree
   on a consistent set of names and fix all references in all
   the various tmem parts (and possibly Xen and the kernel<->Xen
   ABI as well).

The rest of my comments/replies are in context.

> > +/*
> > + * A tmem host implementation must use this function to register
> > + * callbacks for a page-accessible memory (PAM) implementation
> > + */
> > +static struct tmem_pamops tmem_pamops;
> > +
> > +void tmem_register_pamops(struct tmem_pamops *m)
> > +{
> > +   tmem_pamops = *m;
> > +}
> > +
> 
> This implies that this can only host one client  at a time. I suppose
> that's ok to start with but is there ever an expectation that zcache +
> something else would be enabled at the same time?

There was some thought that zcache and Xen (or KVM) might somehow "chain"
the implementations.
 
> > +/*
> > + * A tmem_obj contains a radix-tree-like tree in which the intermediate
> > + * nodes are called tmem_objnodes.  (The kernel lib/radix-tree.c 
> > implementation
> > + * is very specialized and tuned for specific uses and is not particularly
> > + * suited for use from this code, though some code from the core 
> > algorithms has
> 
> This is a bit vague. It asserts that lib/radix-tree is unsuitable but
> not why. I skipped over most of the implementation to be honest.

IIRC, lib/radix-tree is highly tuned for mm's needs.  Things like
tagging and rcu weren't a good fit for tmem, and new things like calling
a different allocator needed to be added.  In the long run it might
be possible for the lib version to serve both needs, but the impediment
and aggravation of merging all necessary changes into lib seemed a high price
to pay for a hundred lines of code implementing a variation of a widely
documented tree algorithm.

> > + * These "tmem core" operations are implemented in the following functions.
> 
> More nits. As this defines a boundary between two major components it
> probably should have its own Documentation/ entry and the APIs should have
> kernel doc comments.

Agreed.

> > + * a corner case: Wha

Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-21 Thread Seth Jennings
On 09/21/2012 11:12 AM, Mel Gorman wrote:
> That said, my initial feeling still stands. I think that this needs to move
> out of staging because it's in limbo where it is but Andrew may disagree
> because of the reservations. If my reservations are accurate then they
> should at least be *clearly* documented with a note saying that using
> this in production is ill-advised for now. If zcache is activated via the
> kernel parameter, it should print a big dirty warning that the feature is
> still experiemental and leave that warning there until all the issues are
> addressed. Right now I'm not convinced this is production ready but that
> the  issues could be fixed incrementally.

Thank you _so_ much for the review!  Your comments have
provided one of the few glimpses I've had into any other
thoughts on the code save Dan and my own.

I'm in the process of going through the comments you provided.

I am _very_ glad to hear you believe that zcache should be
promoted out of the staging limbo where it currently
resides.  I am fine with providing a warning against use in
production environments until we can address everyone's
concerns.

Once zcache is promoted, I think it will give the code more
opportunity to be used/improved/extended in an incremental
and stable way.

--
Seth

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-21 Thread Dan Magenheimer
> From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> 
> On 09/21/2012 01:02 PM, Konrad Rzeszutek Wilk wrote:
> > On Fri, Sep 21, 2012 at 05:12:52PM +0100, Mel Gorman wrote:
> >> On Tue, Sep 04, 2012 at 04:34:46PM -0500, Seth Jennings wrote:
> >>> zcache is the remaining piece of code required to support in-kernel
> >>> memory compression.  The other two features, cleancache and frontswap,
> >>> have been promoted to mainline in 3.0 and 3.5 respectively.  This
> >>> patchset promotes zcache from the staging tree to mainline.
> 
> >>
> >> Very broadly speaking my initial reaction before I reviewed anything was
> >> that *some* sort of usable backend for cleancache or frontswap should exist
> >> at this point. My understanding is that Xen is the primary user of both
> >> those frontends and ramster, while interesting, is not something that a
> >> typical user will benefit from.
> >
> > Right, the majority of users do not use virtualization. Thought embedded
> > wise .. well, there are a lot of Android users - thought I am not 100%
> > sure they are using it right now (I recall seeing changelogs for the clones
> > of Android mentioning zcache).
> >>
> >> That said, I worry that this has bounced around a lot and as Dan (the
> >> original author) has a rewrite. I'm wary of spending too much time on this
> >> at all. Is Dan's new code going to replace this or what? It'd be nice to
> >> find a definitive answer on that.
> >
> > The idea is to take parts of zcache2 as seperate patches and stick it
> > in the code you just reviewed (those that make sense as part of unstaging).
> 
> I agree with this.  Only the changes from zcache2 (Dan's
> rewrite) that are necessary for promotion should be
> considered right now.  Afaict, none of the concerns raised
> in these comments are addressed by the changes in zcache2.

While I may agree with the proposed end result, this proposal
is a _very_ long way away from a solution.  To me, it sounds like
a "split the baby in half" proposal (cf. wisdom of Solomon)
which may sound reasonable to some but, in the end, everyone loses.

I have proposed a reasonable compromise offlist to Seth, but
it appears that it has been silently rejected; I guess it is
now time to take the proposal public.  I apologize in advance
for my characteristic bluntness...

So let's consider two proposals and the pros and cons of them,
before we waste any further mm developer time.  (Fortunately,
most of Mel's insightful comments apply to both versions, though
he did identify some of the design issues that led to zcache2!)

The two proposals:
A) Recreate all the work done for zcache2 as a proper sequence of
   independent patches and apply them to zcache1. (Seth/Konrad)
B) Add zsmalloc back in to zcache2 as an alternative allocator
   for frontswap pages. (Dan)

Pros for (A):
1. It better preserves the history of the handful of (non-zsmalloc)
   commits in the original zcache code.
2. Seth[1] can incrementally learn the new designs by reading
   normal kernel patches.
3. For kernel purists, it is the _right_ way dammit (and Dan
   should be shot for redesigning code non-incrementally, even
   if it was in staging, etc.)
4. Seth believes that zcache will be promoted out of staging sooner
   because, except for a few nits, it is ready today.

Cons for (A):
1. Nobody has signed up to do the work, including testing.  It
   took the author (and sole expert on all the components
   except zsmalloc) between two and three months essentially
   fulltime to move zcache1->zcache2.  So forward progress on
   zcache will likely be essentially frozen until at least the
   end of 2012, possibly a lot longer.
2. The end result (if we reach one) is almost certainly a
   _third_ implementation of zcache: "zcache 1.5".  So
   we may not be leveraging much of the history/testing
   from zcache1 anyway!
3. Many of the zcache2 changes are closely interwoven so
   a sequence of patches may not be much more incrementally
   readable than zcache2.
4. The merge with ramster will likely be very low priority
   so the fork between the two will continue.
5. Dan believes that, if zcache1 does indeed get promoted with
   few or none of the zcache2 redesigns, zcache will never
   get properly finished.

Pros for (B):
1. Many of the design issues/constraints of zcache are resolved
   in code that has already been tested approximately as well
   as the original. All of the redesign (zcache1->zcache2) has
   been extensively discussed on-list; only the code itself is
   "non-incremental".
2. Both allocators (which AFAIK is the only technical area
   of controversy) will

Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-21 Thread Mel Gorman
On Fri, Sep 21, 2012 at 12:14:39PM -0700, Dan Magenheimer wrote:
> Hi Mel --
> 
> Wow!  An incredibly wonderfully detailed response!  Thank you very
> much for taking the time to read through all of zcache!
> 

My pleasure.

> Your comments run the gamut from nit and code style, to design,
> architecture and broad naming.  Until the choice-of-codebase issue
> is resolved, I'll avoid the nits and codestyle comments and respond
> to the higher level strategic and design questions. 

That's fair enough. FWIW, I would never consider the nits to be
blockers. If all the complaints I had were nits then there would be no
real issue to merging it to the core.

> Since a couple
> of your questions are repeated and the specific code which provoked
> your question is not isolated, I hope it is OK if I answer those
> first out-of-context from your original comments in the code.
> (This should also make this easier to read and to extract optimal
> meaning, for you and for posterity.)

Sure. I recognise that I was repeating myself at parts.

> > That said, I worry that this has bounced around a lot and as Dan (the
> > original author) has a rewrite. I'm wary of spending too much time on this
> > at all. Is Dan's new code going to replace this or what? It'd be nice to
> > find a definitive answer on that.
> 
> Replacing this code was my intent, but that was blocked.  IMHO zcache2
> is _much_ better than the "demo version" of zcache (aka zcache1).
> Hopefully a middle ground can be reached.  I've proposed one privately
> offlist.
> 

Ok. Unfortunately I cannot help resolve that issue but I'll mention it
again later.

> Seth, please feel free to augment or correct anything below, or
> respond to anything I haven't commented on.
> 
> > Anyway, here goes
> 
> Repeated comments answered first out-of-context:
> 
> 1) The interrupt context for zcache (and any tmem backend) is imposed
>by the frontend callers.  Cleancache_put [see naming comment below]
>is always called with interrupts disabled. 

Ok, I sortof see. It's always called within the irq-safe mapping tree_lock
and that infects the lower layers in a sense. It still feels like a layering
violation and minimally I would expect this is propagated down by making
locks like the hb->lock IRQ-safe and document the locking accordingly.

> Cleancache_flush is
>sometimes called with interrupts disabled and sometimes not.
>Cleancache_get is never called in an atomic context.  (I think)
>frontswap_get/put/flush are never called in an atomic context but
>sometimes with the swap_lock held. Because it is dangerous (true?)
>for code to sometimes/not be called in atomic context, much of the
>code in zcache and tmem is forced into atomic context. 

FWIW, if it can be called from a context with IRQs disabled then it must
be consistent throughout or it's unsafe. At the very least lockdep will
throw a fit if it is inconsistent.

> BUT Andrea
>observed that there are situations where asynchronicity would be
>preferable and, it turns out that cleancache_get and frontswap_get
>are never called in atomic context.  Zcache2/ramster takes advantage of
>that, and a future KVM backend may want to do so as well.  However,
>the interrupt/atomicity model and assumptions certainly does deserve
>better documentation.
> 

Minimally, move the locking to use the irq-safe with spin_lock_irqsave
rather than the current arrangement of calling local_irq_save() in
places. That alone would make it a bit easier to follow.

> 2) The naming of the core tmem functions (put, get, flush) has been
>discussed endlessly, everyone has a different opinion, and the
>current state is a mess: cleancache, frontswap, and the various
>backends are horribly inconsistent.   IMHO, the use of "put"
>and "get" for reference counting is a historical accident, and
>the tmem ABI names were chosen well before I understood the historical
>precedence and the potential for confusion by kernel developers.
>So I don't have a good answer... I'd prefer the ABI-documented
>names, but if they are unacceptable, at least we need to agree
>on a consistent set of names and fix all references in all
>the various tmem parts (and possibly Xen and the kernel<->Xen
>ABI as well).
> 

Ok, I see. Well, it's unfortunate but I'm not going to throw the toys out
of the pram over it either. Changing the names at this stage might just
confuse the people who are already familiar with the code. I'm the newbie
here so the confusion about terminology is my problem.

> The rest of my comments/replies are in context.
> 
> > > +/*
> > > + * A tmem host implementation must use this function to register
> > > + * callbacks for a page-accessible memory (PAM) implementation
> > > + */
> > > +static struct tmem_pamops tmem_pamops;
> > > +
> > > +void tmem_register_pamops(struct tmem_pamops *m)
> > > +{
> > > + tmem_pamops = *m;
> > > +}
> > > +
> > 
> > This implies that this c

Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-21 Thread Mel Gorman
On Fri, Sep 21, 2012 at 01:35:15PM -0700, Dan Magenheimer wrote:
> > From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> > Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> > 
> > On 09/21/2012 01:02 PM, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Sep 21, 2012 at 05:12:52PM +0100, Mel Gorman wrote:
> > >> On Tue, Sep 04, 2012 at 04:34:46PM -0500, Seth Jennings wrote:
> > >>> zcache is the remaining piece of code required to support in-kernel
> > >>> memory compression.  The other two features, cleancache and frontswap,
> > >>> have been promoted to mainline in 3.0 and 3.5 respectively.  This
> > >>> patchset promotes zcache from the staging tree to mainline.
> > 
> > >>
> > >> Very broadly speaking my initial reaction before I reviewed anything was
> > >> that *some* sort of usable backend for cleancache or frontswap should 
> > >> exist
> > >> at this point. My understanding is that Xen is the primary user of both
> > >> those frontends and ramster, while interesting, is not something that a
> > >> typical user will benefit from.
> > >
> > > Right, the majority of users do not use virtualization. Thought embedded
> > > wise .. well, there are a lot of Android users - thought I am not 100%
> > > sure they are using it right now (I recall seeing changelogs for the 
> > > clones
> > > of Android mentioning zcache).
> > >>
> > >> That said, I worry that this has bounced around a lot and as Dan (the
> > >> original author) has a rewrite. I'm wary of spending too much time on 
> > >> this
> > >> at all. Is Dan's new code going to replace this or what? It'd be nice to
> > >> find a definitive answer on that.
> > >
> > > The idea is to take parts of zcache2 as seperate patches and stick it
> > > in the code you just reviewed (those that make sense as part of 
> > > unstaging).
> > 
> > I agree with this.  Only the changes from zcache2 (Dan's
> > rewrite) that are necessary for promotion should be
> > considered right now.  Afaict, none of the concerns raised
> > in these comments are addressed by the changes in zcache2.
> 
> While I may agree with the proposed end result, this proposal
> is a _very_ long way away from a solution.  To me, it sounds like
> a "split the baby in half" proposal (cf. wisdom of Solomon)
> which may sound reasonable to some but, in the end, everyone loses.
> 

I tend to agree but this really is an unhappy situation that should be
resolved in the coming weeks instead of months if it's going to move
forward.

> I have proposed a reasonable compromise offlist to Seth, but
> it appears that it has been silently rejected; I guess it is
> now time to take the proposal public.  I apologize in advance
> for my characteristic bluntness...
> 

Meh, I'm ok with blunt.

> So let's consider two proposals and the pros and cons of them,
> before we waste any further mm developer time.  (Fortunately,
> most of Mel's insightful comments apply to both versions, though
> he did identify some of the design issues that led to zcache2!)
> 
> The two proposals:
> A) Recreate all the work done for zcache2 as a proper sequence of
>independent patches and apply them to zcache1. (Seth/Konrad)
> B) Add zsmalloc back in to zcache2 as an alternative allocator
>for frontswap pages. (Dan)

Throwing it out there but 

C) Merge both, but freeze zcache1 except for critical fixes. Only allow
   future work on zcache2. Document limitations of zcache1 and
   workarounds until zcache2 is fully production ready.

> 
> Pros for (A):
> 1. It better preserves the history of the handful of (non-zsmalloc)
>commits in the original zcache code.

Marginal benefit.

> 2. Seth[1] can incrementally learn the new designs by reading
>normal kernel patches.

Which would be nice but that is not exactly compelling.

> 3. For kernel purists, it is the _right_ way dammit (and Dan
>should be shot for redesigning code non-incrementally, even
>if it was in staging, etc.)

Yes, but there are historical examples of ditching something completely
too. USB has been ditched a few times. Andrea shot a large chunk of the
VM out the window in 2.6.10. jbd vs jbd2 is still there.

> 4. Seth believes that zcache will be promoted out of staging sooner
>because, except for a few nits, it is ready today.
> 

I wouldn't call them minor but it's probably better understood by more
people. It's why I'd be sortof ok with promoting zcache1 as long as
the limitations wer

Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-22 Thread Sasha Levin
On 09/21/2012 09:14 PM, Dan Magenheimer wrote:
>>> +#define MAX_CLIENTS 16
>> > 
>> > Seems a bit arbitrary. Why 16?
> Sasha Levin posted a patch to fix this but it was tied in to
> the proposed KVM implementation, so was never merged.
> 

My patch changed the max pools per client, not the maximum amount of clients.
That patch has already found it's way in.

(MAX_CLIENTS does look like an arbitrary number though).


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-22 Thread Sasha Levin
On 09/22/2012 03:31 PM, Sasha Levin wrote:
> On 09/21/2012 09:14 PM, Dan Magenheimer wrote:
 +#define MAX_CLIENTS 16

 Seems a bit arbitrary. Why 16?
>> Sasha Levin posted a patch to fix this but it was tied in to
>> the proposed KVM implementation, so was never merged.
>>
> 
> My patch changed the max pools per client, not the maximum amount of clients.
> That patch has already found it's way in.
> 
> (MAX_CLIENTS does look like an arbitrary number though).

btw, while we're on the subject of KVM, the implementation of tmem/kvm was
blocked due to insufficient performance caused by the lack of multi-page
ops/batching.

Are there any plans to make it better in the future?


Thanks,
Sasha

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-22 Thread Dan Magenheimer
> From: Mel Gorman [mailto:mgor...@suse.de]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> 
> On Fri, Sep 21, 2012 at 01:35:15PM -0700, Dan Magenheimer wrote:
> > > From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> > > Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> > The two proposals:
> > A) Recreate all the work done for zcache2 as a proper sequence of
> >independent patches and apply them to zcache1. (Seth/Konrad)
> > B) Add zsmalloc back in to zcache2 as an alternative allocator
> >for frontswap pages. (Dan)
> 
> Throwing it out there but 
> 
> C) Merge both, but freeze zcache1 except for critical fixes. Only allow
>future work on zcache2. Document limitations of zcache1 and
>workarounds until zcache2 is fully production ready.

Hi Mel (with request for Seth below) --

(C) may be the politically-expedient solution but, personally,
I think it is a bit insane and I suspect that any mm developer
who were to deeply review both codebases side-by-side would come to
the same conclusion.  The cost in developer/maintainer time,
and the confusion presented to the user/distro base if both
are promoted/merged would be way too high, and IMHO completely
unwarranted.  Let me try to explain...

I use the terms "zcache1" and "zcache2" only to clarify which
codebase, not because they are dramatically different. I estimate
that 85%-90% of the code in zcache1 and zcache2 is identical, not
counting the allocator or comments/whitespace/janitorial!

Zcache2 _is_ zcache1 with some good stuff added and with zsmalloc
dropped.  I think after careful study, there would be wide agreement
among mm developers that the stuff added is all moving in the direction
of making zcache "production-ready".  IMHO, zcache1 has _never_
been production-ready, and zcache2 is merely a big step in the right
direction.

(Quick logistical aside: zcache2 is in staging-next and linux-next,
currently housed under the drivers/staging/ramster directory...
with !CONFIG_RAMSTER, ramster _is_ zcache2.)

Seth (and IBM) seems to have a bee in his bonnet that the existing
zcache1 code _must_ be promoted _soon_ with as little change as possible.
Other than the fact that he didn't like my patching approach [1],
the only technical objection Seth has raised to zcache2 is that he
thinks zsmalloc is the best choice of allocator [2] for his limited
benchmarking [3].

I've offered to put zsmalloc back in to zcache2 as an optional
(even default) allocator, but that doesn't seem to be good enough
for Seth.  Any other technical objections to zcache2, or explanation
for his urgent desire to promote zcache1, Seth (and IBM) is keeping
close to his vest, which I find to be a bit disingenuous.

So, I'd like to challenge Seth with a simple question:

If zcache2 offers zsmalloc as an alternative (even default) allocator,
what remaining _technical_ objections do you (Seth) have to merging
zcache2 _instead_ of zcache1?

If Mel agrees that your objections are worth the costs of bifurcating
zcache and will still endorse merging both into core mm, I agree to move
forward with Mel's alternative (C) (and will then repost
https://lkml.org/lkml/2012/7/31/573).

Personally, I would _really_ like to get back to writing code to make
zcacheN more suitable for production so would really like to see this
resolved!

Dan

[1] Monolithic, because GregKH seemed to be unwilling to take further
patches to zcache before it was promoted, and because I thought
a number of things had to be fixed before I would feel comfortable
presenting zcache to be reviewed by mm developers
[2] Note, zsmalloc is used in zcache1 only for frontswap pages...
zbud is used in both zcache1 and zcache2 for cleancache pages.
[3] I've never seen any benchmark results posted for zcache other
than some variation of kernbench.  IMHO that's an issue all in itself.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-23 Thread James Bottomley
On Sat, 2012-09-22 at 02:07 +0100, Mel Gorman wrote:
> > The two proposals:
> > A) Recreate all the work done for zcache2 as a proper sequence of
> >independent patches and apply them to zcache1. (Seth/Konrad)
> > B) Add zsmalloc back in to zcache2 as an alternative allocator
> >for frontswap pages. (Dan)
> 
> Throwing it out there but 
> 
> C) Merge both, but freeze zcache1 except for critical fixes. Only
> allow
>future work on zcache2. Document limitations of zcache1 and
>workarounds until zcache2 is fully production ready.
> 
Actually, there is a fourth option, which is the one we'd have usually
used when staging wasn't around:  Throw the old code out as a successful
prototype which showed the author how to do it better (i.e. flush it
from staging) and start again from the new code which has all the
benefits learned from the old code.

Staging isn't supposed to be some magical set of history that we have to
adhere to no matter what (unlike the rest of the tree). It's supposed to
be an accelerator to get stuff into the kernel and not become a
hindrance to it.

There also seem to be a couple of process issues here that could do with
sorting:  Firstly that rewrites on better reflection, while not common,
are also not unusual so we need a mechanism for coping with them.  This
is actually a serious process problem: everyone becomes so attached to
the code they helped clean up that they're hugely unwilling to
countenance a rewrite which would in their (probably correct) opinion
have the cleanups start from ground zero again. Secondly, we've got a
set of use cases and add ons which grew up around code in staging that
act as a bit of a barrier to ABI/API evolution, even as they help to
demonstrate the problems.

I think the first process issue really crystallises the problem we're
having in staging:  we need to get the design approximately right before
we start on the code cleanups.  What I think this means is that we start
on the list where the people who understand the design issues reside
then, when they're happy with the design, we can begin cleaning it up
afterwards if necessary.  I don't think this is hard and fast: there is,
of course, code so bad that even the experts can't penetrate it to see
the design without having their eyes bleed but we should at least always
try to begin with design.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-24 Thread Mel Gorman
On Sat, Sep 22, 2012 at 02:18:44PM -0700, Dan Magenheimer wrote:
> > From: Mel Gorman [mailto:mgor...@suse.de]
> > Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> > 
> > On Fri, Sep 21, 2012 at 01:35:15PM -0700, Dan Magenheimer wrote:
> > > > From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> > > > Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> > > The two proposals:
> > > A) Recreate all the work done for zcache2 as a proper sequence of
> > >independent patches and apply them to zcache1. (Seth/Konrad)
> > > B) Add zsmalloc back in to zcache2 as an alternative allocator
> > >for frontswap pages. (Dan)
> > 
> > Throwing it out there but 
> > 
> > C) Merge both, but freeze zcache1 except for critical fixes. Only allow
> >future work on zcache2. Document limitations of zcache1 and
> >workarounds until zcache2 is fully production ready.
> 
> Hi Mel (with request for Seth below) --
> 
> (C) may be the politically-expedient solution but, personally,
> I think it is a bit insane and I suspect that any mm developer
> who were to deeply review both codebases side-by-side would come to
> the same conclusion. 

I have not read zcache2 and maybe it is the case that no one in their
right mind would use zcache1 if zcache2 was available but the discussion
keeps going in circles.

> The cost in developer/maintainer time,
> and the confusion presented to the user/distro base if both
> are promoted/merged would be way too high, and IMHO completely
> unwarranted.  Let me try to explain...
> 

What would the impact be if zcache2 and zcache1 were mutually exclusive
in Kconfig and the naming was as follows?

CONFIG_ZCACHE_DEPRECATED(zcache1)
CONFIG_ZCACHE   (zcache2)

That would make it absolutely clear to distributions which one they should
be enabling and also make it clear that all future development happen
on zcache2.

I know it looks insane to promote something that is instantly deprecated
but none of the other alternatives seem to be gaining traction either.
This would at least allow the people who are currently heavily behind
zcache1 to continue supporting it and applying critical fixes until they
move to zcache2.

> I use the terms "zcache1" and "zcache2" only to clarify which
> codebase, not because they are dramatically different. I estimate
> that 85%-90% of the code in zcache1 and zcache2 is identical, not
> counting the allocator or comments/whitespace/janitorial!
> 

If 85-90% of the code is identicial then they really should be sharing
the code rather than making copies. That will result in some monolithic
patches but it's unavoidable. I expect it would end up looking like

Patch 1 promote zcache1
Patch 2 promote zcache2
Patch 3 move shared code for zcache1,zcache2 to common files

If the shared code is really shared and not copied it may reduce some of
the friction between the camps.

> Zcache2 _is_ zcache1 with some good stuff added and with zsmalloc
> dropped.  I think after careful study, there would be wide agreement
> among mm developers that the stuff added is all moving in the direction
> of making zcache "production-ready".  IMHO, zcache1 has _never_
> been production-ready, and zcache2 is merely a big step in the right
> direction.
> 

zcache1 does appear to have a few snarls that would make me wary of having
to support it. I don't know if zcache2 suffers the same problems or not
as I have not read it.

> (Quick logistical aside: zcache2 is in staging-next and linux-next,
> currently housed under the drivers/staging/ramster directory...
> with !CONFIG_RAMSTER, ramster _is_ zcache2.)
> 

Unfortunately, I'm not going to get the chance to review it in the
short-term. However, if zcache1 and zcache2 shared code in common files
it would at least reduce the amount of new code I have to read :)

> Seth (and IBM) seems to have a bee in his bonnet that the existing
> zcache1 code _must_ be promoted _soon_ with as little change as possible.
> Other than the fact that he didn't like my patching approach [1],
> the only technical objection Seth has raised to zcache2 is that he
> thinks zsmalloc is the best choice of allocator [2] for his limited
> benchmarking [3].
> 

FWIW, I would fear that kernbench is not that interesting a benchmark for
something like zcache. From an MM perspective, I would be wary that the
data compresses too well and fits too neatly in the different buckets and
make zsmalloc appear to behave much better than it would for a more general
workload.  Of greater concern is that the allocations for zcache would be
too short lived to measure if external fragmentation was a real problem
or not. This is pure guesswork 

Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-24 Thread Seth Jennings
On 09/21/2012 03:35 PM, Dan Magenheimer wrote:
>> From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
>> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
>>
>> On 09/21/2012 01:02 PM, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Sep 21, 2012 at 05:12:52PM +0100, Mel Gorman wrote:
>>>> On Tue, Sep 04, 2012 at 04:34:46PM -0500, Seth Jennings wrote:
>>>>> zcache is the remaining piece of code required to support in-kernel
>>>>> memory compression.  The other two features, cleancache and frontswap,
>>>>> have been promoted to mainline in 3.0 and 3.5 respectively.  This
>>>>> patchset promotes zcache from the staging tree to mainline.
>>
>>>>
>>>> Very broadly speaking my initial reaction before I reviewed anything was
>>>> that *some* sort of usable backend for cleancache or frontswap should exist
>>>> at this point. My understanding is that Xen is the primary user of both
>>>> those frontends and ramster, while interesting, is not something that a
>>>> typical user will benefit from.
>>>
>>> Right, the majority of users do not use virtualization. Thought embedded
>>> wise .. well, there are a lot of Android users - thought I am not 100%
>>> sure they are using it right now (I recall seeing changelogs for the clones
>>> of Android mentioning zcache).
>>>>
>>>> That said, I worry that this has bounced around a lot and as Dan (the
>>>> original author) has a rewrite. I'm wary of spending too much time on this
>>>> at all. Is Dan's new code going to replace this or what? It'd be nice to
>>>> find a definitive answer on that.
>>>
>>> The idea is to take parts of zcache2 as seperate patches and stick it
>>> in the code you just reviewed (those that make sense as part of unstaging).
>>
>> I agree with this.  Only the changes from zcache2 (Dan's
>> rewrite) that are necessary for promotion should be
>> considered right now.  Afaict, none of the concerns raised
>> in these comments are addressed by the changes in zcache2.
> 
> While I may agree with the proposed end result, this proposal
> is a _very_ long way away from a solution.  To me, it sounds like
> a "split the baby in half" proposal (cf. wisdom of Solomon)
> which may sound reasonable to some but, in the end, everyone loses.
> 
> I have proposed a reasonable compromise offlist to Seth, but
> it appears that it has been silently rejected; I guess it is
> now time to take the proposal public. I apologize in advance
> for my characteristic bluntness...
> 
> So let's consider two proposals and the pros and cons of them,
> before we waste any further mm developer time.  (Fortunately,
> most of Mel's insightful comments apply to both versions, though
> he did identify some of the design issues that led to zcache2!)
> 
> The two proposals:
> A) Recreate all the work done for zcache2 as a proper sequence of
>independent patches and apply them to zcache1. (Seth/Konrad)
> B) Add zsmalloc back in to zcache2 as an alternative allocator
>for frontswap pages. (Dan)
> 
> Pros for (A):
> 1. It better preserves the history of the handful of (non-zsmalloc)
>commits in the original zcache code.
> 2. Seth[1] can incrementally learn the new designs by reading
>normal kernel patches.

It's not a matter of breaking the patches up so that I can
understand them.  I understand them just fine as indicated
by my responses to the attempt to overwrite zcache/remove
zsmalloc:

https://lkml.org/lkml/2012/8/14/347
https://lkml.org/lkml/2012/8/17/498

zcache2 also crashes on PPC64, which uses 64k pages, because
a 4k maximum page size is hard coded into the new zbudpage
struct.

The point is to discuss and adopt each change on it's own
merits instead of this "take a 10k line patch or leave it"
approach.

> 3. For kernel purists, it is the _right_ way dammit (and Dan
>should be shot for redesigning code non-incrementally, even
>if it was in staging, etc.)

Dan says "dammit" to add a comic element to this point,
however, it is a valid point (minus the firing squad).

Lets be clear about what zcache2 is.  It is not a rewrite in
the way most people think: a refactored codebase the caries
out the same functional set as an original codebase.  It is
an _overwrite_ to accommodate an entirely new set of
functionally whose code doubles the size of the origin
codebase and regresses performance on the original
functionality.

> 4. Seth believes that zcache will be promoted out of staging sooner
>because, except for a few nits, it is ready today.
> 
> Cons

RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-24 Thread Dan Magenheimer
> From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache

Once again, you have completely ignored a reasonable
compromise proposal.  Why?

> According to Greg's staging-next, ramster adds 6000 lines of
> new code to zcache.
>   :
> functionally whose code doubles the size of the origin

Indeed, and the 6K lines is all in the ramster-specific directory.
I am not asking that ramster be promoted, only that the small
handful of hooks that enable ramster should exist in zcache
(and tmem) if/when zcache is promoted.  And zcache1+zsmalloc
does not have that.
 
> Lets be clear about what zcache2 is.  It is not a rewrite in
> the way most people think: a refactored codebase the caries
> out the same functional set as an original codebase.  It is
> an _overwrite_ to accommodate an entirely new set of
> functionally whose code doubles the size of the origin
> codebase and regresses performance on the original
> functionality.

There were some design deficiencies necessary to support a
range of workloads (other than just kernbench) and that
required some redesign.  Those have been clearly documented
in the post of zcache2 and discussed in other threads.  Other
than janitorial work (much of which was proposed by other people).
zcache2 is actually _less_ of  rewrite than most people think.

By "performance regression", you mean it doesn't use zsmalloc
because zbud has to make more conservative assumptions than
"works really well on kernbench".  Mel identified his preference
for conservative assumptions.  The compromise I have
proposed will give you back zsmalloc for your use kernbench
use case.  Why is that not good enough?

Overwrite was simply a mechanism to avoid a patch post that
nobody (other than you) would be able to read.  Anyone
can do a diff. Focusing on the patch mechanism is a red herring.

> > 4. Seth believes that zcache will be promoted out of staging sooner
> >because, except for a few nits, it is ready today.
> >
> > Cons for (A):
> > 1. Nobody has signed up to do the work, including testing.  It
> >took the author (and sole expert on all the components
> >except zsmalloc) between two and three months essentially
> >fulltime to move zcache1->zcache2.  So forward progress on
> >zcache will likely be essentially frozen until at least the
> >end of 2012, possibly a lot longer.
> 
> This is not true.  I have agreed to do the work necessary to
> make zcache1 acceptable for mainline, which can include
> merging changes from zcache2 if people agree it is a blocker.
>  :
> What is "properly finished"?

In the compromise I have proposed, the work is already done.

You have claimed that that work is not necessary, because it
doesn't help zsmalloc or kernbench.  You have refused to
adapt zsmalloc to meet the needs I have described.  Further
(and sorry to be so horribly blunt in public but, by claiming
you are going to do the work, you are asking for it), you have
NOT designed or written any significant code in the kernel,
just patched and bugfixed and tested and run kernbench on
zcache.  (Zsmalloc, which you have championed, was written
by Nitin and adapted by you.)

And you've continued with (IMHO) disingenuous behavior.
While I understand all too well why that may be necessary
when working for a big company, it makes it very hard to
identify an acceptable compromise.

So, no I don't really trust that you have either the intent
or ability to do the redesigns that I feel (and echoed by
Andrea and Mel) are necessary for zcache to be more than
toy "demo" code.

> The continuous degradation of zcache as "demo" and the

I call it demo code because I wrote it as a demo to
show that in-kernel compression could be a user of
cleancache and frontswap.

I'm not criticizing your code or anyone else's,
I am criticizing MY OWN code.  I had no illusion
that zcache (aka zcache1) was ready for promotion.
It sucked in a number of ways.  MM developers with
real experience in the complexity of managing memory,
Mel and Andrea, without digging very hard, identified
those same ways it sucks.  I'm trying to fix those.
Are you?

> assertion that zcache2 is the "solid codebase" is tedious.
> zcache is actually being worked on by others and has been in
> staging for years.  By definition, _it_ is the more
> hardended codebase.

Please be more specific (and I don't mean a meaningless count
of patches).  Other than your replacement of xvmalloc with
zsmalloc and a bug fix or three, can you point to anything
that was more than cleanup?  Can you point to any broad
workload testing?  And for those two Android distros that have
included zcache (despite the fact that anything in staging
taints the kernel), can you demonstrat

RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-24 Thread Dan Magenheimer
> From: James Bottomley [mailto:james.bottom...@hansenpartnership.com]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache

> On Sat, 2012-09-22 at 02:07 +0100, Mel Gorman wrote:
> > > The two proposals:
> > > A) Recreate all the work done for zcache2 as a proper sequence of
> > >independent patches and apply them to zcache1. (Seth/Konrad)
> > > B) Add zsmalloc back in to zcache2 as an alternative allocator
> > >for frontswap pages. (Dan)
> >
> > Throwing it out there but 
> >
> > C) Merge both, but freeze zcache1 except for critical fixes. Only
> > allow
> >future work on zcache2. Document limitations of zcache1 and
> >workarounds until zcache2 is fully production ready.
> >
> Actually, there is a fourth option, which is the one we'd have usually
> used when staging wasn't around:  Throw the old code out as a successful
> prototype which showed the author how to do it better (i.e. flush it
> from staging) and start again from the new code which has all the
> benefits learned from the old code.
> 
> Staging isn't supposed to be some magical set of history that we have to
> adhere to no matter what (unlike the rest of the tree). It's supposed to
> be an accelerator to get stuff into the kernel and not become a
> hindrance to it.
> 
> There also seem to be a couple of process issues here that could do with
> sorting:  Firstly that rewrites on better reflection, while not common,
> are also not unusual so we need a mechanism for coping with them.  This
> is actually a serious process problem: everyone becomes so attached to
> the code they helped clean up that they're hugely unwilling to
> countenance a rewrite which would in their (probably correct) opinion
> have the cleanups start from ground zero again. Secondly, we've got a
> set of use cases and add ons which grew up around code in staging that
> act as a bit of a barrier to ABI/API evolution, even as they help to
> demonstrate the problems.
> 
> I think the first process issue really crystallises the problem we're
> having in staging:  we need to get the design approximately right before
> we start on the code cleanups.  What I think this means is that we start
> on the list where the people who understand the design issues reside
> then, when they're happy with the design, we can begin cleaning it up
> afterwards if necessary.  I don't think this is hard and fast: there is,
> of course, code so bad that even the experts can't penetrate it to see
> the design without having their eyes bleed but we should at least always
> try to begin with design.


Hi James --

I think you've hit the nail on the head, generalizing this interminable
debate into a process problem that needs to be solved more generally.
Thanks for your insight!

Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-24 Thread Dan Magenheimer
> From: Mel Gorman [mailto:mgor...@suse.de]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> 
> On Sat, Sep 22, 2012 at 02:18:44PM -0700, Dan Magenheimer wrote:
> > > From: Mel Gorman [mailto:mgor...@suse.de]
> > > Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> > >
> > > On Fri, Sep 21, 2012 at 01:35:15PM -0700, Dan Magenheimer wrote:
> > > > > From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> > > > > Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> > > > The two proposals:
> > > > A) Recreate all the work done for zcache2 as a proper sequence of
> > > >independent patches and apply them to zcache1. (Seth/Konrad)
> > > > B) Add zsmalloc back in to zcache2 as an alternative allocator
> > > >for frontswap pages. (Dan)
> > >
> > > Throwing it out there but 
> > >
> > > C) Merge both, but freeze zcache1 except for critical fixes. Only allow
> > >future work on zcache2. Document limitations of zcache1 and
> > >workarounds until zcache2 is fully production ready.
> >
> What would the impact be if zcache2 and zcache1 were mutually exclusive
> in Kconfig and the naming was as follows?
> 
> CONFIG_ZCACHE_DEPRECATED  (zcache1)
> CONFIG_ZCACHE (zcache2)
> 
> That would make it absolutely clear to distributions which one they should
> be enabling and also make it clear that all future development happen
> on zcache2.
> 
> I know it looks insane to promote something that is instantly deprecated
> but none of the other alternatives seem to be gaining traction either.
> This would at least allow the people who are currently heavily behind
> zcache1 to continue supporting it and applying critical fixes until they
> move to zcache2.

Just wondering... how, in your opinion, is this different from
leaving zcache1 (or even both) in staging?  "Tainting" occurs
either way, it's just a matter of whether or not there is a message
logged by the kernel that it is officially tainted, right?

However, it _is_ another attempt at compromise and, if this
is the only solution that allows the debate to end, and it
is agreed on by whatever maintainer is committed to pull
both (be it you, or Andrew, or Konrad, or Linux), I would
agree to your "C-prime" proposal.
 
> > I use the terms "zcache1" and "zcache2" only to clarify which
> > codebase, not because they are dramatically different. I estimate
> > that 85%-90% of the code in zcache1 and zcache2 is identical, not
> > counting the allocator or comments/whitespace/janitorial!
> 
> If 85-90% of the code is identicial then they really should be sharing
> the code rather than making copies. That will result in some monolithic
> patches but it's unavoidable. I expect it would end up looking like
> 
> Patch 1   promote zcache1
> Patch 2   promote zcache2
> Patch 3   move shared code for zcache1,zcache2 to common files
> 
> If the shared code is really shared and not copied it may reduce some of
> the friction between the camps.

This part I would object to... at least I would object to signing
up to do Patch 3 myself.  Seems like a lot of busywork if zcache1
is truly deprecated.

> zcache1 does appear to have a few snarls that would make me wary of having
> to support it. I don't know if zcache2 suffers the same problems or not
> as I have not read it.
> 
> Unfortunately, I'm not going to get the chance to review [zcache2] in the
> short-term. However, if zcache1 and zcache2 shared code in common files
> it would at least reduce the amount of new code I have to read :)

Understood, which re-emphasizes my point about how the presence
of both reduces the (to date, very limited) MM developer time available
for either.

> > Seth (and IBM) seems to have a bee in his bonnet that the existing
> > zcache1 code _must_ be promoted _soon_ with as little change as possible.
> > Other than the fact that he didn't like my patching approach [1],
> > the only technical objection Seth has raised to zcache2 is that he
> > thinks zsmalloc is the best choice of allocator [2] for his limited
> > benchmarking [3].
> 
> FWIW, I would fear that kernbench is not that interesting a benchmark for
> something like zcache. From an MM perspective, I would be wary that the
> data compresses too well and fits too neatly in the different buckets and
> make zsmalloc appear to behave much better than it would for a more general
> workload.  Of greater concern is that the allocations for zcache would be
> too short lived to measure if external fragmentation was a real p

Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-25 Thread Mel Gorman
On Mon, Sep 24, 2012 at 01:36:48PM -0700, Dan Magenheimer wrote:
> > From: Mel Gorman [mailto:mgor...@suse.de]
> > Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> > 
> > On Sat, Sep 22, 2012 at 02:18:44PM -0700, Dan Magenheimer wrote:
> > > > From: Mel Gorman [mailto:mgor...@suse.de]
> > > > Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> > > >
> > > > On Fri, Sep 21, 2012 at 01:35:15PM -0700, Dan Magenheimer wrote:
> > > > > > From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> > > > > > Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> > > > > The two proposals:
> > > > > A) Recreate all the work done for zcache2 as a proper sequence of
> > > > >independent patches and apply them to zcache1. (Seth/Konrad)
> > > > > B) Add zsmalloc back in to zcache2 as an alternative allocator
> > > > >for frontswap pages. (Dan)
> > > >
> > > > Throwing it out there but 
> > > >
> > > > C) Merge both, but freeze zcache1 except for critical fixes. Only allow
> > > >future work on zcache2. Document limitations of zcache1 and
> > > >workarounds until zcache2 is fully production ready.
> > >
> > What would the impact be if zcache2 and zcache1 were mutually exclusive
> > in Kconfig and the naming was as follows?
> > 
> > CONFIG_ZCACHE_DEPRECATED(zcache1)
> > CONFIG_ZCACHE   (zcache2)
> > 
> > That would make it absolutely clear to distributions which one they should
> > be enabling and also make it clear that all future development happen
> > on zcache2.
> > 
> > I know it looks insane to promote something that is instantly deprecated
> > but none of the other alternatives seem to be gaining traction either.
> > This would at least allow the people who are currently heavily behind
> > zcache1 to continue supporting it and applying critical fixes until they
> > move to zcache2.
> 
> Just wondering... how, in your opinion, is this different from
> leaving zcache1 (or even both) in staging? 

Because leaving it in staging implies it is not supported. What I'm
suggesting is that zcache1 be promoted but marked deprecated. Seth and the
embedded people that use it should continue to support it as it currently
stands and fix any critical bugs that are reported but avoid writing new
features for it. The limitations of it should be documented.

> "Tainting" occurs
> either way, it's just a matter of whether or not there is a message
> logged by the kernel that it is officially tainted, right?
> 

Using a deprecated interface does not necessarily taint the kernel.

> However, it _is_ another attempt at compromise and, if this
> is the only solution that allows the debate to end, and it
> is agreed on by whatever maintainer is committed to pull
> both (be it you, or Andrew, or Konrad, or Linux), I would
> agree to your "C-prime" proposal.
>  

And bear in mind that I do not any sort of say in what happens
ultimately. I'm just suggesting alternatives here that may potentially
keep everyone happy (or at least stop it going in circles).

> > > I use the terms "zcache1" and "zcache2" only to clarify which
> > > codebase, not because they are dramatically different. I estimate
> > > that 85%-90% of the code in zcache1 and zcache2 is identical, not
> > > counting the allocator or comments/whitespace/janitorial!
> > 
> > If 85-90% of the code is identicial then they really should be sharing
> > the code rather than making copies. That will result in some monolithic
> > patches but it's unavoidable. I expect it would end up looking like
> > 
> > Patch 1 promote zcache1
> > Patch 2 promote zcache2
> > Patch 3 move shared code for zcache1,zcache2 to common files
> > 
> > If the shared code is really shared and not copied it may reduce some of
> > the friction between the camps.
> 
> This part I would object to... at least I would object to signing
> up to do Patch 3 myself.  Seems like a lot of busywork if zcache1
> is truly deprecated.
> 

It'd help the path to truly deprecating it.

1. Fixes in common code only have to be applied once. This avoids a
   situation where zcache1 gets a fix and zcache2 misses it and vice-versa.
   In a related note it makes it a bit more obvious is a new feature is
   attempted to be merged to zcache1

2. It forces the zcache2 and zcache1 people to keep more or less in sync
   with each other and limit API breakage between co

Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-25 Thread James Bottomley
On Mon, 2012-09-24 at 12:25 -0500, Seth Jennings wrote:
> In summary, I really don't understand the objection to
> promoting zcache and integrating zcache2 improvements and
> features incrementally.  It seems very natural and
> straightforward to me.  Rewrites can even happen in
> mainline, as James pointed out.  Adoption in mainline just
> provides a more stable environment for more people to use
> and contribute to zcache.

This is slightly disingenuous.  Acceptance into mainline commits us to
the interface.  Promotion from staging with simultaneous deprecation
seems like a reasonable (if inelegant) compromise, but the problem is
it's not necessarily a workable solution: as long as we have users of
the interface in mainline, we can't really deprecate stuff however many
feature deprecation files we fill in (I've had a deprecated SCSI ioctl
set that's been deprecated for ten years and counting).  What worries me
looking at this fight is that since there's a use case for the old
interface it will never really get removed.

Conversely, rewrites do tend to vastly increase the acceptance cycle
mainly because of reviewer fatigue (and reviews are our most precious
commodity in the kernel).  I'm saying rewrites should be possible in
staging because it was always possible on plain patch submissions; I'm
not saying they're desirable.  Every time I've seen a rewrite done, it
has added ~6mo-1yr to the acceptance cycle.  I sense that the fatigue
factor with transcendent memory is particularly high, so we're probably
looking at the outside edge of the estimate, so the author needs
seriously to consider if the rewrite is worth this.

Oh, and while this spat goes on, the stalemate is basically assured and
external goodwill eroding.  So, for god's sake find a mutually
acceptable compromise, because we're not going to find one for you.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-25 Thread Dan Magenheimer
> From: Sasha Levin [mailto:levinsasha...@gmail.com]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache

Sorry for delayed response!
 
> On 09/22/2012 03:31 PM, Sasha Levin wrote:
> > On 09/21/2012 09:14 PM, Dan Magenheimer wrote:
> >>>> +#define MAX_CLIENTS 16
> >>>>
> >>>> Seems a bit arbitrary. Why 16?
> >> Sasha Levin posted a patch to fix this but it was tied in to
> >> the proposed KVM implementation, so was never merged.
> >>
> >
> > My patch changed the max pools per client, not the maximum amount of 
> > clients.
> > That patch has already found it's way in.
> >
> > (MAX_CLIENTS does look like an arbitrary number though).
> 
> btw, while we're on the subject of KVM, the implementation of tmem/kvm was
> blocked due to insufficient performance caused by the lack of multi-page
> ops/batching.

Hmmm... I recall that was an unproven assertion.  The tmem/kvm
implementation was not exposed to any wide range of workloads
IIRC?  Also, the WasActive patch is intended to reduce the problem
that multi-guest high volume reads would provoke, so any testing
without that patch may be moot.
 
> Are there any plans to make it better in the future?

If it indeed proves to be a problem, the ramster-merged zcache
(aka zcache2) should be capable of managing a "split" zcache
implementation, i.e. zcache executing in the guest and "overflowing"
page cache pages to the zcache in the host, which should at least
ameliorate most of Avi's concern.  I personally have no plans
to implement that, but would be willing to assist if others
attempt to implement it.

The other main concern expressed by the KVM community, by
Andrea, was zcache's lack of ability to "overflow" frontswap
pages in the host to a real swap device.  The foundation
for that was one of the objectives of the zcache2 redesign;
I am working on a "yet-to-be-posted" patch built on top of zcache2
that will require some insight and review from MM experts.

Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-06 Thread Dan Magenheimer
In response to this RFC for zcache promotion, I've been asked to summarize
the concerns and objections which led me to NACK the previous zcache
promotion request.  While I see great potential in zcache, I think some
significant design challenges exist, many of which are already resolved in
the new codebase ("zcache2").  These design issues include:

A) Andrea Arcangeli pointed out and, after some deep thinking, I came
   to agree that zcache _must_ have some "backdoor exit" for frontswap
   pages [2], else bad things will eventually happen in many workloads.
   This requires some kind of reaper of frontswap'ed zpages[1] which "evicts"
   the data to the actual swap disk.  This reaper must ensure it can reclaim
   _full_ pageframes (not just zpages) or it has little value.  Further the
   reaper should determine which pageframes to reap based on an LRU-ish
   (not random) approach.

B) Zsmalloc has potentially far superior density vs zbud because zsmalloc can
   pack more zpages into each pageframe and allows for zpages that cross 
pageframe
   boundaries.  But, (i) this is very data dependent... the average compression
   for LZO is about 2x.  The frontswap'ed pages in the kernel compile benchmark
   compress to about 4x, which is impressive but probably not representative of
   a wide range of zpages and workloads.  And (ii) there are many historical
   discussions going back to Knuth and mainframes about tight packing of data...
   high density has some advantages but also brings many disadvantages related 
to
   fragmentation and compaction.  Zbud is much less aggressive (max two zpages
   per pageframe) but has a similar density on average data, without the
   disadvantages of high density.

   So zsmalloc may blow zbud away on a kernel compile benchmark but, if both 
were
   runners, zsmalloc is a sprinter and zbud is a marathoner.  Perhaps the best
   solution is to offer both?

   Further, back to (A), reaping is much easier with zbud because (i) zsmalloc
   is currently unable to deal with pointers to zpages from tmem data structures
   which may be dereferenced concurrently, (ii) because there may be many more 
such
   pointers, and (iii) because zpages stored by zsmalloc may cross pageframe 
boundaries.
   The locking issues that arise with zsmalloc for reaping even a single 
pageframe
   are complex; though they might eventually be solved with zsmalloc, this is
   likely a very big project.

C) Zcache uses zbud(v1) for cleancache pages and includes a shrinker which
   reclaims pairs of zpages to release whole pageframes, but there is
   no attempt to shrink/reclaim cleanache pageframes in LRU order.
   It would also be nice if single-cleancache-pageframe reclaim could
   be implemented.

D) Ramster is built on top of zcache, but required a handful of changes
   (on the order of 100 lines).  Due to various circumstances, ramster was
   submitted as a fork of zcache with the intent to unfork as soon as
   possible.  The proposal to promote the older zcache perpetuates that fork,
   requiring fixes in multiple places, whereas the new codebase supports
   ramster and provides clearly defined boundaries between the two.

The new codebase (zcache) just submitted as part of drivers/staging/ramster
resolves these problems (though (A) is admittedly still a work in progress).
Before other key mm maintainers read and comment on zcache, I think
it would be most wise to move to a codebase which resolves the known design
problems or, at least to thoroughly discuss and debunk the design issues
described above.  OR... it may be possible to identify and pursue some
compromise plan.  In any case, I believe the promotion proposal is premature.

Unfortunately, I will again be away from email for a few days, but
will be happy to respond after I return if clarification or more detailed
discussion is needed.

Dan

Footnotes:
[1] zpage is shorthand for a compressed PAGE_SIZE-sized page.
[2] frontswap, since it uses the tmem architecture, has always had a "frontdoor
bouncer"... any frontswap page can be rejected by zcache for any reason,
such as if there is no non-emergency pageframes available or if any 
individual
page (or long sequence of pages) compresses poorly
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-07 Thread Konrad Rzeszutek Wilk
> significant design challenges exist, many of which are already resolved in
> the new codebase ("zcache2").  These design issues include:
.. snip..
> Before other key mm maintainers read and comment on zcache, I think
> it would be most wise to move to a codebase which resolves the known design
> problems or, at least to thoroughly discuss and debunk the design issues
> described above.  OR... it may be possible to identify and pursue some
> compromise plan.  In any case, I believe the promotion proposal is premature.

Thank you for the feedback!

I took your comments and pasted them in this patch.

Seth, Robert, Minchan, Nitin, can you guys provide some comments pls,
so we can put them as a TODO pls or modify the patch below.

Oh, I think I forgot Andrew's comment which was:

 - Explain which workloads this benefits and provide some benchmark data.
   This should help in narrowing down in which case we know zcache works
   well and in which it does not.

My TODO's were:

 - Figure out (this could be - and perhaps should be in frontswap) a
   determination whether this swap is quite fast and the CPU is slow
   (or taxed quite heavily now), so as to not slow the currently executing
   workloads.
 - Work out automatic benchmarks in three categories: database (I am going to 
use
   swing for that), compile (that one is easy), and firefox tab browsers
   overloading.


>From bd85d5fa0cc231f2779f3209ee62b755caf3aa9b Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk 
Date: Fri, 7 Sep 2012 10:21:01 -0400
Subject: [PATCH] zsmalloc/zcache: TODO list.

Adding in comments by Dan.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/staging/zcache/TODO   |   21 +
 drivers/staging/zsmalloc/TODO |   17 +
 2 files changed, 38 insertions(+), 0 deletions(-)
 create mode 100644 drivers/staging/zcache/TODO
 create mode 100644 drivers/staging/zsmalloc/TODO

diff --git a/drivers/staging/zcache/TODO b/drivers/staging/zcache/TODO
new file mode 100644
index 000..bf19a01
--- /dev/null
+++ b/drivers/staging/zcache/TODO
@@ -0,0 +1,21 @@
+
+A) Andrea Arcangeli pointed out and, after some deep thinking, I came
+   to agree that zcache _must_ have some "backdoor exit" for frontswap
+   pages [2], else bad things will eventually happen in many workloads.
+   This requires some kind of reaper of frontswap'ed zpages[1] which "evicts"
+   the data to the actual swap disk.  This reaper must ensure it can reclaim
+   _full_ pageframes (not just zpages) or it has little value.  Further the
+   reaper should determine which pageframes to reap based on an LRU-ish
+   (not random) approach.
+
+B) Zcache uses zbud(v1) for cleancache pages and includes a shrinker which
+   reclaims pairs of zpages to release whole pageframes, but there is
+   no attempt to shrink/reclaim cleanache pageframes in LRU order.
+   It would also be nice if single-cleancache-pageframe reclaim could
+   be implemented.
+
+C) Offer a mechanism to select whether zbud or zsmalloc should be used.
+   This should be for either cleancache or frontswap pages. Meaning there
+   are four choices: cleancache and frontswap using zbud; cleancache and
+   frontswap using zsmalloc; cleancache using zsmalloc, frontswap using zbud;
+   cleacache using zbud, and frontswap using zsmalloc.
diff --git a/drivers/staging/zsmalloc/TODO b/drivers/staging/zsmalloc/TODO
new file mode 100644
index 000..b1debad
--- /dev/null
+++ b/drivers/staging/zsmalloc/TODO
@@ -0,0 +1,17 @@
+
+A) Zsmalloc has potentially far superior density vs zbud because zsmalloc can
+   pack more zpages into each pageframe and allows for zpages that cross 
pageframe
+   boundaries.  But, (i) this is very data dependent... the average compression
+   for LZO is about 2x.  The frontswap'ed pages in the kernel compile benchmark
+   compress to about 4x, which is impressive but probably not representative of
+   a wide range of zpages and workloads.  And (ii) there are many historical
+   discussions going back to Knuth and mainframes about tight packing of 
data...
+   high density has some advantages but also brings many disadvantages related 
to
+   fragmentation and compaction.  Zbud is much less aggressive (max two zpages
+   per pageframe) but has a similar density on average data, without the
+   disadvantages of high density.
+
+   So zsmalloc may blow zbud away on a kernel compile benchmark but, if both 
were
+   runners, zsmalloc is a sprinter and zbud is a marathoner.  Perhaps the best
+   solution is to offer both?
+
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-07 Thread Seth Jennings
On 09/06/2012 03:37 PM, Dan Magenheimer wrote:
> In response to this RFC for zcache promotion, I've been asked to summarize
> the concerns and objections which led me to NACK the previous zcache
> promotion request.  While I see great potential in zcache, I think some
> significant design challenges exist, many of which are already resolved in
> the new codebase ("zcache2").  These design issues include:
> 
> A) Andrea Arcangeli pointed out and, after some deep thinking, I came
>to agree that zcache _must_ have some "backdoor exit" for frontswap
>pages [2], else bad things will eventually happen in many workloads.
>This requires some kind of reaper of frontswap'ed zpages[1] which "evicts"
>the data to the actual swap disk.  This reaper must ensure it can reclaim
>_full_ pageframes (not just zpages) or it has little value.  Further the
>reaper should determine which pageframes to reap based on an LRU-ish
>(not random) approach.

This is a limitation of the design, I admit.  However, in
the case that frontswap/zcache is able to capture all pages
submitted to it and there is no overflow to the swap device,
it doesn't make a difference.

In the case that zcache is not able to allocate memory for
the persistent compressed memory pool (frontswap's pool) or
in the case the memory pool is as large as it is allowed to
be, this makes a difference, since it will overflow more
recently used pages into the swap device.

Keep in mind though that the "difference" is that frontswap
may not offer as much benefit, not that frontswap will
degrade performance relative to the case with only the swap
device.

This is a feature-add that keeps coming up so I'll add it to
the TODO.

I am interested to know from the mm maintainers, would the
absence of this feature be an obstacle for promotion or not?
 The reason I ask is it would be pretty complex and invasive
to implement.

> B) Zsmalloc has potentially far superior density vs zbud because zsmalloc can
>pack more zpages into each pageframe and allows for zpages that cross 
> pageframe
>boundaries.  But, (i) this is very data dependent... the average 
> compression
>for LZO is about 2x.  The frontswap'ed pages in the kernel compile 
> benchmark
>compress to about 4x, which is impressive but probably not representative 
> of
>a wide range of zpages and workloads.

"the average compression for LZO is about 2x". "...probably
not representative of a wide range of zpages and workloads".
 Evidence?

>And (ii) there are many historical
>discussions going back to Knuth and mainframes about tight packing of 
> data...
>high density has some advantages but also brings many disadvantages 
> related to
>fragmentation and compaction.  Zbud is much less aggressive (max two zpages
>per pageframe) but has a similar density on average data, without the
>disadvantages of high density.

What is "average data"?  The context seems to define it in
terms of the desired outcome, i.e. 50% LZO compressibility
with little zbud fragmentation.

>So zsmalloc may blow zbud away on a kernel compile benchmark but, if both 
> were
>runners, zsmalloc is a sprinter and zbud is a marathoner.  Perhaps the best
>solution is to offer both?

Since frontswap pages are not reclaimable, density matters a
lot and reclaimability doesn't matter at all.  In what case,
would zbud work better that zsmalloc in this code?

> C) Zcache uses zbud(v1) for cleancache pages and includes a shrinker which
>reclaims pairs of zpages to release whole pageframes, but there is
>no attempt to shrink/reclaim cleanache pageframes in LRU order.
>It would also be nice if single-cleancache-pageframe reclaim could
>be implemented.

zbud does try to reclaim pages in an LRU-ish order.

There are three lists: the unused list, the unbuddied list,
and the buddied list.  The reclaim is done in density order
first (unused -> unbuddied -> buddied) to maximize the
number of compressed pages zbud can keep around.  But each
list is in LRU-ish order since new zpages are added at the
tail and reclaim starts from the head.  I say LRU-ish order
because the zpages can move between the unbuddied and
buddied lists as single buddies are added or removed which
causes them to lose their LRU order in the lists.  So it's
not purely LRU, but it's not random either.

Not sure what you mean by "single-cleancache-pageframe
reclaim".  Is that zbud_evict_pages(1)?

> D) Ramster is built on top of zcache, but required a handful of changes
>(on the order of 100 lines).  Due to various circumstances, ramster was
>submitted as a fork of zcache with the intent to unfork as soon as
>possible.  The proposal to promote the older zcache perpetuates that fork,

It doesn't perpetuate the fork.  It encourages incremental
change to zcache to accommodate new features, namely
Ramster, as opposed to a unilateral rewrite of zcache.

>requiring fixes in multiple places, whereas the 

Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-08 Thread Nitin Gupta

On 09/07/2012 07:37 AM, Konrad Rzeszutek Wilk wrote:

significant design challenges exist, many of which are already resolved in
the new codebase ("zcache2").  These design issues include:

.. snip..

Before other key mm maintainers read and comment on zcache, I think
it would be most wise to move to a codebase which resolves the known design
problems or, at least to thoroughly discuss and debunk the design issues
described above.  OR... it may be possible to identify and pursue some
compromise plan.  In any case, I believe the promotion proposal is premature.


Thank you for the feedback!

I took your comments and pasted them in this patch.

Seth, Robert, Minchan, Nitin, can you guys provide some comments pls,
so we can put them as a TODO pls or modify the patch below.

Oh, I think I forgot Andrew's comment which was:

  - Explain which workloads this benefits and provide some benchmark data.
This should help in narrowing down in which case we know zcache works
well and in which it does not.

My TODO's were:

  - Figure out (this could be - and perhaps should be in frontswap) a
determination whether this swap is quite fast and the CPU is slow
(or taxed quite heavily now), so as to not slow the currently executing
workloads.
  - Work out automatic benchmarks in three categories: database (I am going to 
use
swing for that), compile (that one is easy), and firefox tab browsers
overloading.


 From bd85d5fa0cc231f2779f3209ee62b755caf3aa9b Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk 
Date: Fri, 7 Sep 2012 10:21:01 -0400
Subject: [PATCH] zsmalloc/zcache: TODO list.

Adding in comments by Dan.

Signed-off-by: Konrad Rzeszutek Wilk 
---
  drivers/staging/zcache/TODO   |   21 +
  drivers/staging/zsmalloc/TODO |   17 +
  2 files changed, 38 insertions(+), 0 deletions(-)
  create mode 100644 drivers/staging/zcache/TODO
  create mode 100644 drivers/staging/zsmalloc/TODO

diff --git a/drivers/staging/zcache/TODO b/drivers/staging/zcache/TODO
new file mode 100644
index 000..bf19a01
--- /dev/null
+++ b/drivers/staging/zcache/TODO
@@ -0,0 +1,21 @@
+
+A) Andrea Arcangeli pointed out and, after some deep thinking, I came
+   to agree that zcache _must_ have some "backdoor exit" for frontswap
+   pages [2], else bad things will eventually happen in many workloads.
+   This requires some kind of reaper of frontswap'ed zpages[1] which "evicts"
+   the data to the actual swap disk.  This reaper must ensure it can reclaim
+   _full_ pageframes (not just zpages) or it has little value.  Further the
+   reaper should determine which pageframes to reap based on an LRU-ish
+   (not random) approach.
+
+B) Zcache uses zbud(v1) for cleancache pages and includes a shrinker which
+   reclaims pairs of zpages to release whole pageframes, but there is
+   no attempt to shrink/reclaim cleanache pageframes in LRU order.
+   It would also be nice if single-cleancache-pageframe reclaim could
+   be implemented.
+
+C) Offer a mechanism to select whether zbud or zsmalloc should be used.
+   This should be for either cleancache or frontswap pages. Meaning there
+   are four choices: cleancache and frontswap using zbud; cleancache and
+   frontswap using zsmalloc; cleancache using zsmalloc, frontswap using zbud;
+   cleacache using zbud, and frontswap using zsmalloc.
diff --git a/drivers/staging/zsmalloc/TODO b/drivers/staging/zsmalloc/TODO
new file mode 100644
index 000..b1debad
--- /dev/null
+++ b/drivers/staging/zsmalloc/TODO
@@ -0,0 +1,17 @@
+
+A) Zsmalloc has potentially far superior density vs zbud because zsmalloc can
+   pack more zpages into each pageframe and allows for zpages that cross 
pageframe
+   boundaries.  But, (i) this is very data dependent... the average compression
+   for LZO is about 2x.  The frontswap'ed pages in the kernel compile benchmark
+   compress to about 4x, which is impressive but probably not representative of
+   a wide range of zpages and workloads.  And (ii) there are many historical
+   discussions going back to Knuth and mainframes about tight packing of 
data...
+   high density has some advantages but also brings many disadvantages related 
to
+   fragmentation and compaction.  Zbud is much less aggressive (max two zpages
+   per pageframe) but has a similar density on average data, without the
+   disadvantages of high density.
+
+   So zsmalloc may blow zbud away on a kernel compile benchmark but, if both 
were
+   runners, zsmalloc is a sprinter and zbud is a marathoner.  Perhaps the best
+   solution is to offer both?
+



The problem is that zbud performs well only when a (compressed) page is 
either PAGE_SIZE/2 - e or PAGE_SIZE - e, where e is small. So, even if 
the average compression ratio is 2x (which is hard to believe), a 
majority of sizes can actually end up in PAGE_SIZE/2 + e bucket and zbud 
will still give bad performance.  For instance, consider these histograms:


# Created tar of /usr/lib (2GB) on a f

RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-17 Thread Dan Magenheimer
> From: Nitin Gupta [mailto:ngu...@vflare.org]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> 
> The problem is that zbud performs well only when a (compressed) page is
> either PAGE_SIZE/2 - e or PAGE_SIZE - e, where e is small. So, even if
> the average compression ratio is 2x (which is hard to believe), a
> majority of sizes can actually end up in PAGE_SIZE/2 + e bucket and zbud
> will still give bad performance.  For instance, consider these histograms:

Whoa whoa whoa.  This is very wrong.  Zbud handles compressed pages
of any range that fits in a pageframe (same, almost, as zsmalloc).
Unless there is some horrible bug you found...

Zbud _does_ require the _distribution_ of zsize to be roughly
centered around PAGE_SIZE/2 (or less).  Is that what you meant?
If so, the following numbers you posted don't make sense to me.
Could you be more explicit on what the numbers mean?

Also, as you know, unlike zram, the architecture of tmem/frontswap
allows zcache to reject any page, so if the distribution of zsize
exceeds PAGE_SIZE/2, some pages can be rejected (and thus passed
through to swap).  This safety valve already exists in zcache (and zcache2)
to avoid situations where zpages would otherwise significantly
exceed half of total pageframes allocated.  IMHO this is a
better policy than accepting a large number of poorly-compressed pages,
i.e. if every data page compresses down from 4096 bytes to 4032
bytes, zsmalloc stores them all (thus using very nearly one pageframe
per zpage), whereas zbud avoids the anomalous page sequence altogether.
 
> # Created tar of /usr/lib (2GB) on a fairly loaded Linux system and
> compressed page-by-page using LZO:
> 
> # first two fields: bin start, end.  Third field: compressed size
> 32 286 7644
> :
> 3842 4096 3482
> 
> The only (approx) sweetspots for zbud are 1810-2064 and 3842-4096 which
> covers only a small fraction of pages.
> 
> # same page-by-page compression for 220MB ISO from project Gutenberg:
> 32 286 70
> :
> 3842 4096 804
> 
> Again very few pages in zbud favoring bins.
> 
> So, we really need zsmalloc style allocator which handles sizes all over
> the spectrum. But yes, compaction remains far easier to implement on zbud.

So it remains to be seen if a third choice exists (which might be either
an enhanced zbud or an enhanced zsmalloc), right?

Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-17 Thread Nitin Gupta
On Mon, Sep 17, 2012 at 1:42 PM, Dan Magenheimer
 wrote:
>> From: Nitin Gupta [mailto:ngu...@vflare.org]
>> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
>>
>> The problem is that zbud performs well only when a (compressed) page is
>> either PAGE_SIZE/2 - e or PAGE_SIZE - e, where e is small. So, even if
>> the average compression ratio is 2x (which is hard to believe), a
>> majority of sizes can actually end up in PAGE_SIZE/2 + e bucket and zbud
>> will still give bad performance.  For instance, consider these histograms:
>
> Whoa whoa whoa.  This is very wrong.  Zbud handles compressed pages
> of any range that fits in a pageframe (same, almost, as zsmalloc).
> Unless there is some horrible bug you found...
>
> Zbud _does_ require the _distribution_ of zsize to be roughly
> centered around PAGE_SIZE/2 (or less).  Is that what you meant?

Yes, I meant this only: though zbud can handle any size, it isn't
efficient for any size not centered around PAGESIZE/2.

> If so, the following numbers you posted don't make sense to me.
> Could you be more explicit on what the numbers mean?
>

This is a histogram of the compressed sizes when files were
compressed in 4K chunks. The first number is the lower limit of
bin size, second number of upper limit and third number is the
number of pages that fall in that bin.

> Also, as you know, unlike zram, the architecture of tmem/frontswap
> allows zcache to reject any page, so if the distribution of zsize
> exceeds PAGE_SIZE/2, some pages can be rejected (and thus passed
> through to swap).  This safety valve already exists in zcache (and zcache2)
> to avoid situations where zpages would otherwise significantly
> exceed half of total pageframes allocated.  IMHO this is a
> better policy than accepting a large number of poorly-compressed pages,

Long time back zram had the ability of forwarding poorly compressed
pages to a backing swap device but that was removed to cleanup the
code and help with upstream promotion.  Once zram goes out of staging,
I will try getting that functionality back if there is enough demand.


> i.e. if every data page compresses down from 4096 bytes to 4032
> bytes, zsmalloc stores them all (thus using very nearly one pageframe
> per zpage), whereas zbud avoids the anomalous page sequence altogether.
>

This ability to letting pages go to physical device is not really
highlighting anything
of zbud vs zsmalloc.  That ability is really zram vs frontswap stuff
which is a different
thing.


>> # Created tar of /usr/lib (2GB) on a fairly loaded Linux system and
>> compressed page-by-page using LZO:
>>
>> # first two fields: bin start, end.  Third field: compressed size
>> 32 286 7644
>> :
>> 3842 4096 3482
>>
>> The only (approx) sweetspots for zbud are 1810-2064 and 3842-4096 which
>> covers only a small fraction of pages.
>>
>> # same page-by-page compression for 220MB ISO from project Gutenberg:
>> 32 286 70
>> :
>> 3842 4096 804
>>
>> Again very few pages in zbud favoring bins.
>>
>> So, we really need zsmalloc style allocator which handles sizes all over
>> the spectrum. But yes, compaction remains far easier to implement on zbud.
>
> So it remains to be seen if a third choice exists (which might be either
> an enhanced zbud or an enhanced zsmalloc), right?
>

Yes, definitely. At least for non-ephemeral pages (zram), zsmalloc seems to be
a better choice even without compaction. As for zcache, I don't understand its
codebase anyways so not sure how exactly compaction would interact with it,
so I think zcache should stay with zbud.

Thanks,
Nitin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-11-02 Thread Konrad Rzeszutek Wilk
On Fri, Oct 26, 2012 at 04:45:14PM -0500, Seth Jennings wrote:
> On 10/02/2012 01:17 PM, Dan Magenheimer wrote:
> > If so,  and move forward?  What do you see as next steps?
> 
> I've been reviewing the changes between zcache and zcache2 and getting
> a feel for the scope and direction of those changes.
> 
> - Getting the community engaged to review zcache1 at ~2300SLOC was
>   difficult.
> - Adding RAMSter has meant adding RAMSter-specific code broadly across
>   zcache and increases the size of code to review to ~7600SLOC.

One can ignore the drivers/staging/ramster/ramster* directory.

> - The changes have blurred zcache's internal layering and increased
>   complexity beyond what a simple SLOC metric can reflect.

Not sure I see a problem.
> - Getting the community engaged in reviewing zcache2 will be difficult
>   and will require an exceptional amount of effort for maintainer and
>   reviewer.

Exceptional? I think if we start trimming the code down and moving it
around - and moving the 'ramster' specific calls to header files to
not be compiled - that should make it easier to read.

I mean the goal of any review is to address all of the concern you saw
when you were looking over the code. You probably have a page of
questions you asked yourself - and in all likehood the other reviewers
would ask the same questions. So if you address them - either by
giving comments or making the code easier to read - that would do it.

> 
> It is difficult for me to know when it could be ready for mainline and
> production use.  While zcache2 isn't getting broad code reviews yet,
> how do suggest managing that complexity to make the code maintainable
> and get it reviewed?

There are Mel's feedback that is also applicable to zcache2.

Thanks for looking at the code!
> 
> Seth
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org";> em...@kvack.org 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-10-04 Thread Seth Jennings
On 10/02/2012 01:17 PM, Dan Magenheimer wrote:
> If so,  and move forward?  What do you see as next steps?

I'll need to get up to speed on the new codebase before I can answer
this.  I should be able to answer by early next week.

Seth

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-10-26 Thread Seth Jennings
On 10/02/2012 01:17 PM, Dan Magenheimer wrote:
> If so,  and move forward?  What do you see as next steps?

I've been reviewing the changes between zcache and zcache2 and getting
a feel for the scope and direction of those changes.

- Getting the community engaged to review zcache1 at ~2300SLOC was
  difficult.
- Adding RAMSter has meant adding RAMSter-specific code broadly across
  zcache and increases the size of code to review to ~7600SLOC.
- The changes have blurred zcache's internal layering and increased
  complexity beyond what a simple SLOC metric can reflect.
- Getting the community engaged in reviewing zcache2 will be difficult
  and will require an exceptional amount of effort for maintainer and
  reviewer.

It is difficult for me to know when it could be ready for mainline and
production use.  While zcache2 isn't getting broad code reviews yet,
how do suggest managing that complexity to make the code maintainable
and get it reviewed?

Seth

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-09-27 Thread Seth Jennings
On 09/24/2012 02:17 PM, Dan Magenheimer wrote:
>> From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
>> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> 
> Once again, you have completely ignored a reasonable
> compromise proposal.  Why?

We have users who are interested in zcache and we had hoped for a path
that didn't introduce an additional 6-12 month delay.  I am talking
with our team to determine a compromise that resolves this, but also
gets this feature into the hands of users that they can work with.
I'll be away from email until next week, but I wanted to get something
out to the mailing list before I left.  I need a couple days to give a
more definite answer.

Seth


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-27 Thread Dan Magenheimer
> From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> 
> On 09/24/2012 02:17 PM, Dan Magenheimer wrote:
> >> From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> >> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> >
> > Once again, you have completely ignored a reasonable
> > compromise proposal.  Why?
> 
> We have users who are interested in zcache and we had hoped for a path
> that didn't introduce an additional 6-12 month delay.  I am talking
> with our team to determine a compromise that resolves this, but also
> gets this feature into the hands of users that they can work with.
> I'll be away from email until next week, but I wanted to get something
> out to the mailing list before I left.  I need a couple days to give a
> more definite answer.

Hi Seth --

James Bottomley's estimate of the additional 6-12 month
addition to the acceptance cycle was (quote) "every time I've
seen a rewrite done".  Especially with zsmalloc available
as an option in zcache2 (see separately-posted patch),
zcache2 is _really_ _not_ a rewrite, certainly not for
frontswap-centric workloads, which is I think where your
efforts have always been focused (and, I assume, your
future users).  I suspect if you walk through the code
paths in zcache2+zsmalloc, you'll find they are nearly
identical to zcache1, other than some very minor cleanups,
and some changes where Mel gave some feedback which would
need to be cleaned up in zcache1 before promotion anyway
(and happen to already have been cleaned up in zcache2).
The more invasive design changes are all on the zbud paths.

Of course, I'm of the opinion that neither zcache1 nor
zcache2 would be likely to be promoted for at least another
cycle or two, so if you go with zcache2+zsmalloc as the compromise
and it still takes six months for promotion, I hope you don't
blame that on the "rewrite". ;-)

Anyway, looking forward (hopefully) to working with you on
a good compromise.  It would be nice to get back to coding
and working together on a single path forward for zcache
as there is a lot of work to do!

Have a great weekend!

Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mm: add support for zsmalloc and zcache

2012-10-02 Thread Seth Jennings
On 09/27/2012 05:07 PM, Dan Magenheimer wrote:
> Of course, I'm of the opinion that neither zcache1 nor
> zcache2 would be likely to be promoted for at least another
> cycle or two, so if you go with zcache2+zsmalloc as the compromise
> and it still takes six months for promotion, I hope you don't
> blame that on the "rewrite". ;-)
> 
> Anyway, looking forward (hopefully) to working with you on
> a good compromise.  It would be nice to get back to coding
> and working together on a single path forward for zcache
> as there is a lot of work to do!

We want to see zcache moving forward so that it can get out of staging
and into the hands of end users.  From the direction the discussion
has taken, replacing zcache with the new code appears to be the right
compromise for the situation.  Moving to the new zcache code resets
the clock so I would like to know that we're all on the same track...

1- Promotion must be the top priority, focus needs to be on making the
code production ready rather than adding more features.

2- The code is in the community and development must be done in
public, no further large private rewrites.

3- Benchmarks need to be agreed on, Mel has suggested some of the
MMTests. We need a way to talk about performance so we can make
comparisions, avoid regressions, and talk about promotion criteria.
They should be something any developer can run.

4- Let's investigate breaking ramster out of zcache so that zcache
remains a separately testable building block; Konrad was looking at
this I believe.  RAMSTer adds another functional mode for zcache and
adds to the difficulty of validating patches.  Not every developer
has a cluster of machines to validate RAMSter.

Seth

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] mm: add support for zsmalloc and zcache

2012-10-02 Thread Dan Magenheimer
> From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache
> 
> On 09/27/2012 05:07 PM, Dan Magenheimer wrote:
> > Of course, I'm of the opinion that neither zcache1 nor
> > zcache2 would be likely to be promoted for at least another
> > cycle or two, so if you go with zcache2+zsmalloc as the compromise
> > and it still takes six months for promotion, I hope you don't
> > blame that on the "rewrite". ;-)
> >
> > Anyway, looking forward (hopefully) to working with you on
> > a good compromise.  It would be nice to get back to coding
> > and working together on a single path forward for zcache
> > as there is a lot of work to do!
> 
> We want to see zcache moving forward so that it can get out of staging
> and into the hands of end users.  From the direction the discussion
> has taken, replacing zcache with the new code appears to be the right
> compromise for the situation.  Moving to the new zcache code resets
> the clock so I would like to know that we're all on the same track...
> 
> 1- Promotion must be the top priority, focus needs to be on making the
> code production ready rather than adding more features.

Agreed.

> 2- The code is in the community and development must be done in
> public, no further large private rewrites.

Agreed.

> 3- Benchmarks need to be agreed on, Mel has suggested some of the
> MMTests. We need a way to talk about performance so we can make
> comparisions, avoid regressions, and talk about promotion criteria.
> They should be something any developer can run.

Agreed.

> 4- Let's investigate breaking ramster out of zcache so that zcache
> remains a separately testable building block; Konrad was looking at
> this I believe.  RAMSTer adds another functional mode for zcache and
> adds to the difficulty of validating patches.  Not every developer
> has a cluster of machines to validate RAMSter.

In zcache2 (which is now in Linus' 3.7-rc0 tree in the ramster directory),
ramster is already broken out.  It can be disabled either at compile-time
(simply by not specifying CONFIG_RAMSTER) or at run-time (by using
"zcache" as the kernel boot parameter instead of "ramster").

So... also agreed.  RAMster will not be allowed to get in the
way of promotion or performance as long as any reasonable attempt
is made to avoid breaking the existing hooks to RAMster.
(This only because I expect future functionality to also
use these hooks so would like to avoid breaking them, if possible.)

Does this last clarification work for you, Seth?

If so,  and move forward?  What do you see as next steps?

Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC/PATCH] zcache2 on PPC64 (Was: [RFC] mm: add support for zsmalloc and zcache)

2012-09-25 Thread Dan Magenheimer
Attached patch applies to staging-next and I _think_ should
fix the reported problem where zbud in zcache2 does not
work on a PPC64 with PAGE_SIZE!=12.  I do not have a machine
to test this so testing by others would be appreciated.

Ideally there should also be a BUILD_BUG_ON to ensure
PAGE_SHIFT * 2 + 2 doesn't exceed BITS_PER_LONG, but
let's see if this fixes the problem first.

Apologies if there are line breaks... I can't send this from
a linux mailer right now.  If it is broken, let me know,
and I will re-post tomorrow... though it should be easy
to apply manually for test purposes.

Signed-off-by: Dan Magenheimer 

diff --git a/drivers/staging/ramster/zbud.c b/drivers/staging/ramster/zbud.c
index a7c4361..6921af3 100644
--- a/drivers/staging/ramster/zbud.c
+++ b/drivers/staging/ramster/zbud.c
@@ -103,8 +103,8 @@ struct zbudpage {
struct {
unsigned long space_for_flags;
struct {
-   unsigned zbud0_size:12;
-   unsigned zbud1_size:12;
+   unsigned zbud0_size:PAGE_SHIFT;
+   unsigned zbud1_size:PAGE_SHIFT;
unsigned unevictable:2;
};
struct list_head budlist;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH] zcache2 on PPC64 (Was: [RFC] mm: add support for zsmalloc and zcache)

2012-09-28 Thread Mel Gorman
On Tue, Sep 25, 2012 at 04:31:01PM -0700, Dan Magenheimer wrote:
> Attached patch applies to staging-next and I _think_ should
> fix the reported problem where zbud in zcache2 does not
> work on a PPC64 with PAGE_SIZE!=12.  I do not have a machine
> to test this so testing by others would be appreciated.
> 

Seth, can you verify?

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH] zcache2 on PPC64 (Was: [RFC] mm: add support for zsmalloc and zcache)

2012-10-02 Thread Seth Jennings
On 09/28/2012 08:31 AM, Mel Gorman wrote:
> On Tue, Sep 25, 2012 at 04:31:01PM -0700, Dan Magenheimer wrote:
>> Attached patch applies to staging-next and I _think_ should
>> fix the reported problem where zbud in zcache2 does not
>> work on a PPC64 with PAGE_SIZE!=12.  I do not have a machine
>> to test this so testing by others would be appreciated.
>>
> 
> Seth, can you verify?

Yes, this patch does prevent the crash on PPC64.

Seth

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC/PATCH] zsmalloc added back to zcache2 (Was: [RFC] mm: add support for zsmalloc and zcache)

2012-09-25 Thread Dan Magenheimer
Attached patch applies to staging-next and adds zsmalloc
support, optionally at compile-time and run-time, back into
zcache (aka zcache2).  It is only lightly tested and does
not provide some of the debug info from old zcache (aka zcache1)
because it needs to be converted from sysfs to debugfs.
I'll leave that as an exercise for someone else as I'm
not sure if any of those debug fields are critical to
anyone's needs and some of the datatypes are not supported
by debugfs.

Apologies if there are line breaks... I can't send this from
a linux mailer right now.  If it is broken, let me know,
and I will re-post tomorrow.

Signed-off-by: Dan Magenheimer 

diff --git a/drivers/staging/ramster/Kconfig b/drivers/staging/ramster/Kconfig
index 843c541..28403cc 100644
--- a/drivers/staging/ramster/Kconfig
+++ b/drivers/staging/ramster/Kconfig
@@ -15,6 +15,17 @@ config ZCACHE2
  again in the future.  Until then, zcache2 is a single-node
  version of ramster.
 
+config ZCACHE_ZSMALLOC
+   bool "Allow use of zsmalloc allocator for compression of swap pages"
+   depends on ZSMALLOC=y
+   default n
+   help
+ Zsmalloc is a much more efficient allocator for compresssed
+ pages but currently has some design deficiencies in that it
+ does not support reclaim nor compaction.  Select this if
+ you are certain your workload will fit or has mostly short
+ running processes.
+
 config RAMSTER
bool "Cross-machine RAM capacity sharing, aka peer-to-peer tmem"
depends on CONFIGFS_FS=y && SYSFS=y && !HIGHMEM && ZCACHE2=y
diff --git a/drivers/staging/ramster/zcache-main.c 
b/drivers/staging/ramster/zcache-main.c
index a09dd5c..9a4d780 100644
--- a/drivers/staging/ramster/zcache-main.c
+++ b/drivers/staging/ramster/zcache-main.c
@@ -26,6 +26,12 @@
 #include 
 #include 
 #include "tmem.h"
+#ifdef CONFIG_ZCACHE_ZSMALLOC
+#include "../zsmalloc/zsmalloc.h"
+static int zsmalloc_enabled;
+#else
+#define zsmalloc_enabled 0
+#endif
 #include "zcache.h"
 #include "zbud.h"
 #include "ramster.h"
@@ -182,6 +188,35 @@ static unsigned long zcache_last_inactive_anon_pageframes;
 static unsigned long zcache_eph_nonactive_puts_ignored;
 static unsigned long zcache_pers_nonactive_puts_ignored;
 
+#ifdef CONFIG_ZCACHE_ZSMALLOC
+#define ZS_CHUNK_SHIFT 6
+#define ZS_CHUNK_SIZE  (1 << ZS_CHUNK_SHIFT)
+#define ZS_CHUNK_MASK  (~(ZS_CHUNK_SIZE-1))
+#define ZS_NCHUNKS (((PAGE_SIZE - sizeof(struct tmem_handle)) & \
+   ZS_CHUNK_MASK) >> ZS_CHUNK_SHIFT)
+#define ZS_MAX_CHUNK   (ZS_NCHUNKS-1)
+
+/* total number of persistent pages may not exceed this percentage */
+static unsigned int zv_page_count_policy_percent = 75;
+/*
+ * byte count defining poor compression; pages with greater zsize will be
+ * rejected
+ */
+static unsigned int zv_max_zsize = (PAGE_SIZE / 8) * 7;
+/*
+ * byte count defining poor *mean* compression; pages with greater zsize
+ * will be rejected until sufficient better-compressed pages are accepted
+ * driving the mean below this threshold
+ */
+static unsigned int zv_max_mean_zsize = (PAGE_SIZE / 8) * 5;
+
+static atomic_t zv_curr_dist_counts[ZS_NCHUNKS];
+static atomic_t zv_cumul_dist_counts[ZS_NCHUNKS];
+static atomic_t zcache_curr_pers_pampd_count = ATOMIC_INIT(0);
+static unsigned long zcache_curr_pers_pampd_count_max;
+
+#endif
+
 #ifdef CONFIG_DEBUG_FS
 #include 
 #definezdfsdebugfs_create_size_t
@@ -370,6 +405,13 @@ int zcache_new_client(uint16_t cli_id)
if (cli->allocated)
goto out;
cli->allocated = 1;
+#ifdef CONFIG_ZCACHE_ZSMALLOC
+   if (zsmalloc_enabled) {
+   cli->zspool = zs_create_pool("zcache", ZCACHE_GFP_MASK);
+   if (cli->zspool == NULL)
+   goto out;
+   }
+#endif
ret = 0;
 out:
return ret;
@@ -632,6 +674,105 @@ out:
return pampd;
 }
 
+#ifdef CONFIG_ZCACHE_ZSMALLOC
+struct zv_hdr {
+   uint32_t pool_id;
+   struct tmem_oid oid;
+   uint32_t index;
+   size_t size;
+};
+
+static unsigned long zv_create(struct zcache_client *cli, uint32_t pool_id,
+   struct tmem_oid *oid, uint32_t index,
+   struct page *page)
+{
+   struct zv_hdr *zv;
+   int chunks;
+   unsigned long curr_pers_pampd_count, total_zsize, zv_mean_zsize;
+   unsigned long handle = 0;
+   void *cdata;
+   unsigned clen;
+
+   curr_pers_pampd_count = atomic_read(&zcache_curr_pers_pampd_count);
+   if (curr_pers_pampd_count >
+   (zv_page_count_policy_percent * totalram_pages) / 100)
+   goto out;
+   zcache_compress(page, &cdata, &clen);
+   /* reject if compression is too poor */
+   if (clen > zv_max_zsize) {
+   zcache_compress_poor++;
+   goto out;
+   }
+   /* reject if mean compression is too poor */
+   if ((clen > zv_max_mean_zsize) && (curr_pers_pa