Re: A few ideas of engine framework

KaiGai Kohei Wed, 07 Apr 2010 19:15:58 -0700

(2010/04/08 9:36), Trond Norbye wrote:
> 
> On 4. apr. 2010, at 17.37, KaiGai Kohei wrote:
>>
>>>> $ git diff origin/reworks_1 origin/reworks_2
>>>> ->   It adds item_get_nkey() and item_get_ndata() engine APIs, to inject
>>>>     security attribute as a part of values by intermediation modules
>>>>     (such as bucket or selinux).
>>>>
>>>
>>> What is the primary motivation for doing this? I don't see why backends
>>> would "dynamically change" these values for an item. The reason I added
>>> a function to get the key and data was because one could imagine that
>>> they could be stored on different locations (or memory mapped data areas)...
>>> CAS is called through the api to allow the cas to be optional in the backend
>>> if you don't want to waste 8 bytes per item... From what I've seen earlier
>>> you chose to store your security information as a textual string after the
>>> key? you could still do that but then let nkey contain the number of bytes
>>> in the key, and keep the other information somewhere else..
>>
>> What I try to do is to store the security information as a part of the value
>> for the secondary modules, rather than just after keys.
>> In this approach, the secondary module doesn't need special treatment for
>> items with security information, because it just stores the given value
>> as is.
>>
>> As someone pointed out before, I don't think it is a good idea to handle
>> the security information specially and independently from existing keys and
>> values, because it is unclear whether the secondary modules pay attention to
>> these security properties.
>> If the secondary module sees the security information as just a part of the
>> value, it will be handled correctly. If not, it is just a bug in the
>> secondary one.
>> So, I want to modify the value when it is delivered from the primary module
>> to the secondary one, and want to split up the item when it is delivered from
>> the primary module to the memcached core.
>>
>> The existing get_item_data() allows the primary module to modify the pointer
>> of the data field, although that might be a new usage of it, but we cannot
>> fix up the length of the data field right now.
>> The purpose of get_item_ndata() is to let the primary module inject its
>> security information transparently to both the core memcached and the
>> secondary modules.
>>
>> User
>> | ^
>> | | {key = "abcd", value = "foovarbaz"}
>> v |
>> Memcached
>> | ^
>> | | {key = "abcd", nkey=4, value = "foovarbaz", nbytes=9}
>> v |
>> Primary engine module
>> | ^
>> | | {key = "abcd", nkey=4, value = "secret\0foovarbaz", nbytes=16}
>> v |                                 ^^^^^^^^ ... transparently injected
>> Secondary engine module<---->  [its item storage]
>>
> 
> I'm still having a hard time to see how this will work in practice... How
> would you request the object from the underlying engine? The key would be
> longer and contain data you don't know about? or are you saying that engines
> have to call the function to internally determine the length of their keys
> for lookup?


In the allocate() method, the selinux module calls the secondary allocate()
with a modified length that leaves room for both the security attribute and
the original value.
Then, in the set() method, the selinux module copies the original value into
its local buffer, and delivers the combination of its security attribute and
the original value to the secondary engine.

It does not touch the key and nkey.
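
To make the allocate()/set() flow above concrete, here is a minimal C sketch.
It is only an illustration: sketch_item, secondary_allocate() and
labeled_store() are hypothetical names, not part of the memcached engine API,
and it assumes the security attribute is a NUL-terminated string placed in
front of the value, as in the diagram quoted above. The two steps are folded
into one function for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an item held by the secondary engine. */
struct sketch_item {
    size_t nbytes;  /* length of data[] as the secondary engine sees it */
    char  *data;    /* opaque value bytes; the key is stored elsewhere */
};

/* Stand-in for the secondary engine's allocate(): reserves nbytes. */
static struct sketch_item *secondary_allocate(size_t nbytes)
{
    struct sketch_item *it = malloc(sizeof(*it));
    if (it == NULL)
        return NULL;
    it->data = malloc(nbytes);
    if (it->data == NULL) {
        free(it);
        return NULL;
    }
    it->nbytes = nbytes;
    return it;
}

/*
 * What the intermediating module does: ask for room for "label\0" plus
 * the original value, then hand the combined bytes to the secondary
 * engine. The key and nkey are not involved at all.
 */
static struct sketch_item *labeled_store(const char *label,
                                         const void *value, size_t nvalue)
{
    size_t nlabel = strlen(label) + 1;  /* include the terminating NUL */
    struct sketch_item *it = secondary_allocate(nlabel + nvalue);
    if (it == NULL)
        return NULL;
    memcpy(it->data, label, nlabel);           /* injected attribute */
    memcpy(it->data + nlabel, value, nvalue);  /* original value */
    return it;
}
```

With the label "secret" and the 9-byte value from the diagram, the secondary
engine ends up holding 16 bytes, and it never needs to know that the first 7
of them are a security attribute.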

I'm not sure what you mean by the function to internally determine the length
of their keys.
However, we may need an "offset" argument to tell the secondary (or third,
...) engine module how many bytes at the head of the data field are used by
the upper engine modules. Of course, if we have a single engine in the stack,
the offset is always zero.


When we request an item, get() of selinux will be called. Then, it calls
get() of the secondary engine with the same key. If the item is found, it
checks permissions between the client and the item. The item is returned only
if access is allowed.
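
The get() path above could be sketched as follows. Everything here is an
assumption made for illustration: stored_item, secondary_get(), permitted()
and selinux_get() are hypothetical names, the secondary engine is faked with a
one-entry table, and the permission check is reduced to plain label equality,
whereas the real module would ask the SELinux policy for a decision.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical item as stored by the secondary engine. */
struct stored_item {
    const char *key;
    const char *data;   /* "label\0value", see the diagram above */
    size_t      nbytes;
};

/* Fake storage of the secondary engine, for the sketch only. */
static const struct stored_item table[] = {
    { "abcd", "secret\0foovarbaz", 16 },
};

/* Stand-in for the secondary engine's get(): lookup by the unmodified key. */
static const struct stored_item *secondary_get(const char *key, size_t nkey)
{
    size_t i;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strlen(table[i].key) == nkey &&
            memcmp(table[i].key, key, nkey) == 0)
            return &table[i];
    }
    return NULL;
}

/* Placeholder policy: allow only when the labels are identical. */
static int permitted(const char *client_label, const struct stored_item *it)
{
    /* it->data starts with the NUL-terminated label, so strcmp stops there */
    return strcmp(client_label, it->data) == 0;
}

/* get() of the intermediating module: same key, then a permission check. */
static const struct stored_item *selinux_get(const char *client_label,
                                             const char *key, size_t nkey)
{
    const struct stored_item *it = secondary_get(key, nkey);
    if (it == NULL)
        return NULL;            /* not found in the secondary engine */
    if (!permitted(client_label, it))
        return NULL;            /* found, but the client may not see it */
    return it;
}
```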

When the memcached core references the data of the item, it calls
get_item_data() and get_item_ndata(). On get_item_data(), the selinux module
calls the secondary one, and increments the returned pointer by the length of
the security attribute.
On get_item_ndata(), the selinux module also calls the secondary one, and
decrements the returned length by the length of the security attribute.
As a result, the memcached core sees a pointer just past the end of the
security attribute, and a length that excludes the security attribute.
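
A minimal sketch of these two accessors, under the same assumptions as the
diagram (a NUL-terminated label in front of the value). The names
stored_item, secondary_get_item_data(), secondary_get_item_ndata() and
label_length() are hypothetical, and the signatures are simplified compared
with the real engine interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical item as stored by the secondary engine. */
struct stored_item {
    const char *data;   /* "label\0value" */
    size_t      nbytes; /* total length, label included */
};

/* Stand-ins for the secondary engine's accessors. */
static const char *secondary_get_item_data(const struct stored_item *it)
{
    return it->data;
}

static size_t secondary_get_item_ndata(const struct stored_item *it)
{
    return it->nbytes;
}

/* Length of the injected attribute, including its terminating NUL. */
static size_t label_length(const struct stored_item *it)
{
    const char *raw = secondary_get_item_data(it);
    const char *nul = memchr(raw, '\0', secondary_get_item_ndata(it));
    assert(nul != NULL);    /* every stored item must carry a label */
    return (size_t)(nul - raw) + 1;
}

/* The core sees a pointer just past the security attribute... */
static const char *selinux_get_item_data(const struct stored_item *it)
{
    return secondary_get_item_data(it) + label_length(it);
}

/* ...and a length that excludes it. */
static size_t selinux_get_item_ndata(const struct stored_item *it)
{
    return secondary_get_item_ndata(it) - label_length(it);
}
```

For the item in the diagram, the secondary engine reports 16 bytes, while the
memcached core is told 9 bytes starting at "foovarbaz".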

This idea enables an intermediating module (such as the selinux engine) to
inject its metadata transparently to both the memcached core and the secondary
modules.

> Personally I don't think it is a good idea to store the security attribute on
> a per item basis. you are going to waste a _lot_ of memory (we made CAS
> optional so that users could save 8 bytes per item, this is going to be
> more). From a memory usage perspective I guess it's better to do something
> like Dustins bucket engine, and store items with the same label in a separate
> container. The drawback with this is that you have to search all the
> containers the connection dominates to find the object. In theory this could
> be a _lot_ of containers, but I would guess that in most setups it would most
> likely be a handful...

First, the system administrator needs to understand that there is a trade-off
between better security and better performance or resource usage. The upcoming
selinux engine module will be installed only if they consider it an acceptable
trade-off.

The approach with multiple containers will not work well, because we cannot
estimate how many containers are necessary at start-up time.
Perhaps we would have to initialize a new secondary container at run-time,
when a client connects with a security attribute for which no container has
been prepared.
If so, we have no way to cap the total memory consumption, even though the
user specified a limit using -m.

In addition, there is no guarantee that traffic is evenly distributed across
the security attributes of the clients, so some containers can end up as dead
space.

>>>> $ git diff origin/reworks_3 origin/reworks_4
>>>> ->   It replaces settings.engine.v1->xxx(...) invocations by wrapper
>>>>     functions.
>>>>
>>>
>>> The source code does not follow the same coding standard as the rest
>>> of memcached...
>>
>> Does the coding standard mean tab, indent, case arc and others?
>> If so, I'll fix up them according to rest of memcached.
>>
> 
> Look at how the rest of the source code is formatted..
> 
>> Or, are you saying the wrapper functions are not coding standard in the
>> memcached, so unnecessary?
> 
> But I don't see how it increase the readability of the source ;-)
> I would guess that when we are going to support more interface protocols
> we need to modify more than just the invocation of the function (new
> conditions etc). but who knows..

Hmm. Which code is more beautiful is a subjective matter. :-)
These wrapper functions do not affect functionality, so I won't push them
strongly.

Thanks,
-- 
KaiGai Kohei <kai...@ak.jp.nec.com>

