On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstr...@ieee.org> wrote:
> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2hor...@gmail.com> wrote:
>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minc...@kernel.org> wrote:
>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minc...@kernel.org> wrote:
>>>> > Hello David,
>>>> >
>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minc...@kernel.org> wrote:
>>>> >> > Since zram has no control feature to limit memory usage,
>>>> >> > it makes hard to manage system memrory.
>>>> >> >
>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>>> >> > a limit so that zram could fail allocation once it reaches
>>>> >> > the limit.
>>>> >> >
>>>> >> > In addition, user could change the limit in runtime so that
>>>> >> > he could manage the memory more dynamically.
>>>> >> >
>>>> >> - Default is no limit so it doesn't break old behavior.
>>>> >> + Initial state is no limit so it doesn't break old behavior.
>>>> >>
>>>> >> I understand your previous post now.
>>>> >>
>>>> >> I was saying that setting to either a null value or garbage
>>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>>>> >> removes the limit.
>>>> >>
>>>> >> I think this is "surprise" behaviour and rather the null case should
>>>> >> return  -EINVAL
>>>> >> The test below should be "good enough" though not catching all garbage.
>>>> >
>>>> > Thanks for suggesting but as I said, it should be fixed in memparse 
>>>> > itself,
>>>> > not caller if it is really problem so I don't want to touch it in this
>>>> > patchset. It's not critical for adding the feature.
>>>> >
>>>>
>>>> I've looked into the memparse function more since we talked.
>>>> I do believe a wrapper function around it for the typical use by sysfs 
>>>> would
>>>> be very valuable.
>>>
>>> Agree.
>>>
>>>> However, there is nothing wrong with memparse itself that needs to be 
>>>> fixed.
>>>>
>>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>>> It provides everything that a caller needs to manage the token that it
>>>> processes.
>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>>
>>> Maybe strict_memparse would be better to protect such things so you
>>> could find several places to clean it up.
>>>
>>>>
>>>> The fact that other callers don't check the return pointer value to
>>>> see if only a null
>>>> string was processed, is not its fault.
>>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>>> functions use it in a given manner does not means that is correct -
>>>> nor that it is
>>>> incorrect for that "knob". Some attributes could be just as valid with
>>>> null zeros.
>>>>
>>>> And you are correct, to disambiguate the zero is not required for the
>>>> limit feature.
>>>> Your original patch which disallowed zero was full feature for mem_limit.
>>>> It is the requested non-crucial feature to allow zero to reestablish
>>>> the initial state
>>>>  that benefits from distinguishing an explicit zero from a "default zero'
>>>>  when garbage is written.
>>>>
>>>> The final argument is that if we release this feature as is the 
>>>> undocumented
>>>>  functionality could be relied upon, and when later fixed: user space 
>>>> breaks.
>>>
>>> I don't get it. Why does it break userspace?
>>> The sysfs-block-zram says "0" means disable the limit.
>>> If someone writes *garabge* but work as if disabling the limit,
>>> it's not a right thing and he already broke although it worked
>>> so it would be not a problem if we fix later.
>>> (ie, we don't need to take care of broken userspace)
>>> Am I missing your point?
>>>
>>
>> Perhaps you are missing my point, perhaps ignoring or dismissing.
>>
>> Basically, if a facility works in a useful way, even if it was designed for
>> different usage, that becomes the "accepted" interface/usage.
>> The developer may not have intended that usage or may even considered
>> it wrong and a broken usage, but it is what it is and people become
>>  reliant on that behaviour.
>>
>> Case in point is memparse itself.
>>
>> The developer intentionally sets the return pointer because that is the
>> only value that can be validated for correct performance.
>> The return value allows -ve so the standard error message passing is not 
>> valid.
>> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> The developer could consider that absurd and fundamentally broken.
>> But to the user it is a valid situation, because (perhaps) it can't be
>> bothered to handle error cases.
>>
>> So, who is to blame.
>> You say memparse, that it is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>>  And I say  mem_limit_store is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>
> I think we should look at what the rest of the kernel does as far as
> checking memparse results.  It appears to be a mix of some code
> checking memparse while others don't.  The most common way to check
> appears to be to verify that memparse actually parsed at least 1
> character, e.g.:
>   oldp = p;
>   mem_size = memparse(p, &p);
>   if (p == oldp)
>     return -EINVAL;
>
> although other places where 0 isn't valid can simply check for that:
>   mem_size = memparse(p, &p);
>   /* don't remove all of memory when handling "mem={invalid}" param */
>   if (mem_size == 0)
>     return -EINVAL;
>
> or even the other memparse use in zram_drv.c:
>   disksize = memparse(buf, NULL);
>   if (!disksize)
>     return -EINVAL;
>
>
> And there seem to be other places where (maybe?) there's no checking
> at all.  However, it also seems like many cases of memparse usage are
> looking for a non-zero value, and therefore they can either
> immediately check for zero/invalid or (possibly) later code has checks
> to avoid using any zero value.  In this case though, 0 is a valid
> value.  So, while I agree that if a user passes an invalid (i.e.
> non-numeric) value it's clearly user error, it might be closer to the
> apparent (although unwritten AFAICT) memparse usage api to check the
> result for validity; in our case a simple check if at least 1 char was
> parsed is all that's needed, e.g.:
>
> {
>   u64 limit;
>   char *tmp = buf;
>   struct zram *zram = dev_to_zram(dev);
>
>   limit = memparse(buf, &tmp);
>   if (buf == tmp) /* no chars parsed, invalid input */
>     return -EINVAL;
>   down_write(&zram->init_lock);


Thank you Dan, for this clear, unoffensive and I believe compelling analysis.

I have much to learn.

> ...
>
>
> Separate from this patch, it would also help if the lib/cmdline.c
> memparse doc was at least updated to clarify when the result should be
> checked for validity (e.g. always, or at least when the result is 0)
> and how best to do that (e.g. if 0 is an invalid value, just check if
> the result is 0; if 0 is a possible valid value, check if any chars
> were parsed).
>
>

I'd argue that the code is not the place for this usage recommendation.
But rather an expansion of the support doc for sysfs
on how to use such parsing/validation routines.

I agree with Minchan that these helper functions could be improved
for specific use by sysfs.
 And I will pursue this. (and maybe the documentation?)


>>
>> The difference is that memparse cannot stop being abused
>> (C allows the NULL argument and extensive tricks are required to address 
>> that)
>> however, we can readily fix mem_limit_store and ensure
>> 1) no regression when the interface IS fixed and
>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>
>>
>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>> don't insisting on
>>>>  an explicit zero we have the API wrong.
>>>>
>>>> I don't think you disagreed, just that the burden to get it correct
>>>> lay elsewhere.
>>>>
>>>> If that is the case it doesn't really matter, we cannot release this
>>>> interface until
>>>>  it is corrected wherever it must be.
>>>>
>>>> And my zero check was a poor hack.
>>>>
>>>> I should have explicitly checked the returned pointer value.
>>>>
>>>> I will send that proposed revision, and hopefully you will consider it
>>>> for inclusion.
>>>>
>>>>
>>>>
>>>>
>>>> >>
>>>> >> >
>>>> >> > Signed-off-by: Minchan Kim <minc...@kernel.org>
>>>> >> > ---
>>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>>> >> >  drivers/block/zram/zram_drv.c              | 41 
>>>> >> > ++++++++++++++++++++++++++++++
>>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>>> >> >
>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram 
>>>> >> > b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > index 70ec992514d0..b8c779d64968 100644
>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > @@ -119,3 +119,13 @@ Description:
>>>> >> >                 efficiency can be calculated using compr_data_size 
>>>> >> > and this
>>>> >> >                 statistic.
>>>> >> >                 Unit: bytes
>>>> >> > +
>>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>>> >> > +Date:          August 2014
>>>> >> > +Contact:       Minchan Kim <minc...@kernel.org>
>>>> >> > +Description:
>>>> >> > +               The mem_limit file is read/write and specifies the 
>>>> >> > amount
>>>> >> > +               of memory to be able to consume memory to store store
>>>> >> > +               compressed data. The limit could be changed in run 
>>>> >> > time
>>>> >> > -               and "0" is default which means disable the limit.
>>>> >> > +               and "0" means disable the limit. No limit is the 
>>>> >> > initial state.
>>>> >>
>>>> >> there should be no default in the API.
>>>> >
>>>> > Thanks.
>>>> >
>>>> >>
>>>> >> > +               Unit: bytes
>>>> >> > diff --git a/Documentation/blockdev/zram.txt 
>>>> >> > b/Documentation/blockdev/zram.txt
>>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>>> >> > --- a/Documentation/blockdev/zram.txt
>>>> >> > +++ b/Documentation/blockdev/zram.txt
>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater 
>>>> >> > than twice the size of memory
>>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 
>>>> >> > 0.1% of the
>>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>>> >> >
>>>> >> > -5) Activate:
>>>> >> > +5) Set memory limit: Optional
>>>> >> > +       Set memory limit by writing the value to sysfs node 
>>>> >> > 'mem_limit'.
>>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>>> >> > +       In addition, you could change the value in runtime.
>>>> >> > +       Examples:
>>>> >> > +           # limit /dev/zram0 with 50MB memory
>>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # Using mem suffixes
>>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # To disable memory limit
>>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +6) Activate:
>>>> >> >         mkswap /dev/zram0
>>>> >> >         swapon /dev/zram0
>>>> >> >
>>>> >> >         mkfs.ext4 /dev/zram1
>>>> >> >         mount /dev/zram1 /tmp
>>>> >> >
>>>> >> > -6) Stats:
>>>> >> > +7) Stats:
>>>> >> >         Per-device statistics are exported as various nodes under
>>>> >> >         /sys/block/zram<id>/
>>>> >> >                 disksize
>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram 
>>>> >> > is wasteful.
>>>> >> >                 compr_data_size
>>>> >> >                 mem_used_total
>>>> >> >
>>>> >> > -7) Deactivate:
>>>> >> > +8) Deactivate:
>>>> >> >         swapoff /dev/zram0
>>>> >> >         umount /dev/zram1
>>>> >> >
>>>> >> > -8) Reset:
>>>> >> > +9) Reset:
>>>> >> >         Write any positive value to 'reset' sysfs node
>>>> >> >         echo 1 > /sys/block/zram0/reset
>>>> >> >         echo 1 > /sys/block/zram1/reset
>>>> >> > diff --git a/drivers/block/zram/zram_drv.c 
>>>> >> > b/drivers/block/zram/zram_drv.c
>>>> >> > index f0b8b30a7128..370c355eb127 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.c
>>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct 
>>>> >> > device *dev,
>>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>>> >> >  }
>>>> >> >
>>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>>> >> > +               struct device_attribute *attr, char *buf)
>>>> >> > +{
>>>> >> > +       u64 val;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       down_read(&zram->init_lock);
>>>> >> > +       val = zram->limit_pages;
>>>> >> > +       up_read(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>>> >> > +}
>>>> >> > +
>>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>>> >> > +               struct device_attribute *attr, const char *buf, 
>>>> >> > size_t len)
>>>> >> > +{
>>>> >> > +       u64 limit;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       limit = memparse(buf, NULL);
>>>> >>
>>>> >>             if (limit = 0 && buf != "0")
>>>> >>                   return  -EINVAL
>>>> >>
>>>> >> > +       down_write(&zram->init_lock);
>>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>>> >> > +       up_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return len;
>>>> >> > +}
>>>> >> > +
>>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>>> >> >                 struct device_attribute *attr, const char *buf, 
>>>> >> > size_t len)
>>>> >> >  {
>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, 
>>>> >> > struct bio_vec *bvec, u32 index,
>>>> >> >                 ret = -ENOMEM;
>>>> >> >                 goto out;
>>>> >> >         }
>>>> >> > +
>>>> >> > +       if (zram->limit_pages &&
>>>> >> > +               zs_get_total_pages(meta->mem_pool) > 
>>>> >> > zram->limit_pages) {
>>>> >> > +               zs_free(meta->mem_pool, handle);
>>>> >> > +               ret = -ENOMEM;
>>>> >> > +               goto out;
>>>> >> > +       }
>>>> >> > +
>>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>> >> >
>>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, 
>>>> >> > bool reset_capacity)
>>>> >> >         struct zram_meta *meta;
>>>> >> >
>>>> >> >         down_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       zram->limit_pages = 0;
>>>> >> > +
>>>> >> >         if (!init_done(zram)) {
>>>> >> >                 up_write(&zram->init_lock);
>>>> >> >                 return;
>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, 
>>>> >> > initstate_show, NULL);
>>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, 
>>>> >> > NULL);
>>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, 
>>>> >> > NULL);
>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>>> >> > +               mem_limit_store);
>>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>>> >> >         &dev_attr_orig_data_size.attr,
>>>> >> >         &dev_attr_compr_data_size.attr,
>>>> >> >         &dev_attr_mem_used_total.attr,
>>>> >> > +       &dev_attr_mem_limit.attr,
>>>> >> >         &dev_attr_max_comp_streams.attr,
>>>> >> >         &dev_attr_comp_algorithm.attr,
>>>> >> >         NULL,
>>>> >> > diff --git a/drivers/block/zram/zram_drv.h 
>>>> >> > b/drivers/block/zram/zram_drv.h
>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.h
>>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>>> >> >         u64 disksize;   /* bytes */
>>>> >> >         int max_comp_streams;
>>>> >> >         struct zram_stats stats;
>>>> >> > +       /*
>>>> >> > +        * the number of pages zram can consume for storing 
>>>> >> > compressed data
>>>> >> > +        */
>>>> >> > +       unsigned long limit_pages;
>>>> >> > +
>>>> >> >         char compressor[10];
>>>> >> >  };
>>>> >> >  #endif
>>>> >> > --
>>>> >> > 2.0.0
>>>> >> >
>>>> >>
>>>> >> --
>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> >> the body to majord...@kvack.org.  For more info on Linux MM,
>>>> >> see: http://www.linux-mm.org/ .
>>>> >> Don't email: <a href=mailto:"d...@kvack.org";> em...@kvack.org </a>
>>>> >
>>>> > --
>>>> > Kind regards,
>>>> > Minchan Kim
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majord...@kvack.org.  For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Don't email: <a href=mailto:"d...@kvack.org";> em...@kvack.org </a>
>>>
>>> --
>>> Kind regards,
>>> Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to