On 09/12/2013 07:39 AM, thierry bordaz wrote:
On 09/10/2013 04:35 PM, Ludwig Krispenz wrote:
On 09/10/2013 04:29 PM, Rich Megginson wrote:
On 09/10/2013 01:47 AM, Ludwig Krispenz wrote:
On 09/09/2013 07:19 PM, Rich Megginson wrote:
On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:
On 09/07/2013 05:02 AM, David Boreham wrote:
On 9/6/2013 8:49 PM, Nathan Kinder wrote:
This is a good idea, and it is something that we discussed
briefly off-list. The only downside is that we need to change
the index format to keep a count of ids for each key.
Implementing this isn't a big problem, but it does mean that
the existing indexes need to be updated to populate the count
based off of the contents (as you mention above).
I don't think you need to do this (I certainly wasn't advocating
doing so). The "statistics" state is much the same as that
proposed in Rich's design. In fact you could probably just use
that same information. My idea is more about where and how you
use the information. All you need is something associated with
each index that says "not much point looking here if you're
after something specific, move along, look somewhere else
instead". This is much the same information as "don't use a high
scan limit here".
In the short term, we are looking for a way to be able to
improve performance for specific search filters that are not
possible to modify on the client side (for whatever reason)
while leaving the index file format exactly as it is. I still
feel that there is potentially great value in keeping a count
of ids per key so we can optimize things on the server side
automatically without the need for complex index configuration
on the administrator's part. I think we should consider this
for an additional future enhancement.
I'm saying the same thing. Keeping a cardinality count per key
is way more than I'm proposing, and I'm not sure how useful that
would be anyway, unless you want to do OLAP in the DS ;)
we have the cardinality of the key in old-idl and this makes some
searches where parts of the filter are allids fast.
I'm late in the discussion, but I think Rich's proposal is very
promising to address all the problems related to allids in new-idl.
We could then eventually rework filter ordering based on these
configurations. Right now we only have a filter ordering based on
index type and try to postpone "<=" or similar filters, as they are
known to be costly, but this could be more elaborate.
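The reordering idea could be sketched roughly like this; the cost ranks below are illustrative assumptions for the sketch, not the server's actual heuristics:

```python
# Hedged sketch: evaluate the cheapest / most selective filter
# components first, postponing expensive ones such as "<=" or ">=".
# The rank values are assumptions made up for illustration.
COST_RANK = {'eq': 0, 'pres': 1, 'sub': 2, 'ge': 3, 'le': 3}

def order_components(components):
    """components: list of (attribute, matchtype) tuples.
    Returns them sorted cheapest-first; unknown types go last."""
    return sorted(components, key=lambda c: COST_RANK.get(c[1], 4))
```

For example, an AND filter containing an equality, a substring, and a "<=" component would be evaluated in that order rather than as written.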
An alternative would be to have some kind of index lookup
caching. In the example in ticket 47474 the filter is
(&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOfNames)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))
and probably only the "c3sUserID=xxxxx" part will change. If we
cache the result for the (&(|(objectClass=... part, then even
though it is expensive, the lookup would be done only once.
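The caching idea could be sketched as below; `evaluate_component` is a hypothetical stand-in for the server's index lookup, and real code would also have to invalidate the cache on updates:

```python
# Hedged sketch: memoize the candidate ID list for an expensive,
# rarely-changing filter component (e.g. the large objectClass OR
# from ticket 47474) so it is computed only once and reused while
# only the cheap "c3sUserID=xxxxx" part varies between searches.
_component_cache = {}

def cached_idlist(component, evaluate_component):
    """component: normalized string form of the filter component.
    evaluate_component: callable doing the actual index lookup."""
    if component not in _component_cache:
        _component_cache[component] = evaluate_component(component)
    return _component_cache[component]
```

The final candidate set would then be the intersection of the cached ID list with the ID list from the cheap, changing component.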
Thanks everyone for the comments. I have added Noriko's suggestion:
http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
David, Ludwig: Does the current design address your concerns,
and/or provide the necessary first step for further refinements?
yes, the topic of filter reordering or caching could be looked at
independently.
Just one concern about the syntax:
nsIndexIDListScanLimit:
maxsize[:indextype][:flag[,flag...]][:value[,value...]]
since everything is optional, how do you decide whether in
nsIndexIDListScanLimit: 6:eq:AND the "AND" is a value or a flag?
And since it defines limits for specific keys, could the attribute
name reflect this, e.g. nsIndexKeyIDListScanLimit or
nsIndexKeyScanLimit or ... ?
Thanks, yes, it is ambiguous.
I think it may have to use keyword=value, so something like this:
nsIndexIDListScanLimit: limit=NNN [type=eq[,sub]] [flags=AND[,OR]]
[values=val[,val...]]
That should be easy to parse for both humans and machines.
For values, we will have to figure out a way to handle escapes
(e.g. if a value contains a comma or an escape character). I was
thinking of using LDAP escapes (e.g. \, or \032).
they should be treated as in filters and normalized; in the config it
should be the string representation according to the attribute type
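A keyword=value form like the one proposed above could be parsed along these lines; `parse_scan_limit` is a hypothetical name, and the backslash-comma escaping is an assumption based on the escape discussion, not the shipped implementation:

```python
# Hedged sketch: parse a keyword=value nsIndexIDListScanLimit string
# such as 'limit=4 type=eq flags=AND values=inetOrgPerson'.
# Commas inside a value are assumed to be escaped as '\,'.
import re

def parse_scan_limit(text):
    result = {}
    for token in text.split():
        key, _, val = token.partition('=')
        if key == 'limit':
            result['limit'] = int(val)
        elif key in ('type', 'flags'):
            result[key] = val.split(',')
        elif key == 'values':
            # split on commas not preceded by a backslash, then
            # unescape '\,' back to a literal comma
            parts = re.split(r'(?<!\\),', val)
            result['values'] = [p.replace('\\,', ',') for p in parts]
        else:
            raise ValueError('unknown keyword: %s' % key)
    return result
```

Because every field is introduced by a keyword, the "is AND a value or a flag?" ambiguity of the positional colon-separated form goes away.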
Hi,
I was wondering whether this configuration attribute, currently at
the index level, could also be applied at the bind-entry level.
It could be - it would be more difficult to do - you would have to have
the nsIndexIDListScanLimit attribute specified in the user entry, and it
would have to specify the attribute type e.g.
dn: uid=admin,....
nsIndexIDListScanLimit: limit=xxxx attr=objectclass type=eq
value=inetOrgPerson
Or perhaps a new attribute - nsIndexIDListScanLimit should not be
operational for use in nsIndex, but should be operational for use in a
user entry.
If an application typically binds as a given entry, it could use its
own limits, stored for example in an operational attribute in the
bound entry itself.
Yes, and we already do this for other limits.
That way, two applications using the same filter component could
each have their own ID list size.
Anyway if it makes sense it could be added later.
Yes, thanks.
best regards
thierry
--
389-devel mailing list
389-devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-devel