Re: Search engine performances...

2016-08-11 Thread Alex Karasulu

> On Aug 11, 2016, at 4:20 PM, Howard Chu  wrote:
> 
> Emmanuel Lécharny wrote:
>> On 11/08/16 at 18:05, Alex Karasulu wrote:
>>> The other substring filter expressions without prefixes or suffixes would 
>>> require a prohibitive full scan of the index: i.e. ‘*klm*’. Pointless to do 
>>> and would clear cache memory with the churn. So your 10%-of-total-size 
>>> thingy, if configurable by the administrator, makes sense as a best guess 
>>> before the optimizer goes to work.
>> 
>> There are other options, like the ones implemented by OpenLDAP, OpenDJ,
>> and OpenDS : build indexes based on parts of the values. For instance, let's
>> say we have entry 1 : 'cn=A value'. We can imagine indexing every 3
>> consecutive letters of this value. The index will then contain things
>> like :
>> 
>> 'A v' -> entry 1
>> ' va' -> entry 1
>> 'val' -> entry 1
>> 'alu' -> entry 1
>> 'lue' -> entry 1
>> 
>> Now, searching for '*lue' will bring back entry 1 immediately, as will
>> searching for 'A v*' or '*val*'.
>> 
>> That comes with a cost : the index will be huge. OpenLDAP and others
>> allow you to tune this index in many ways. Typically, OpenLDAP has the
>> following configuration parameters to tune the index :
>> index_substr_if_minlen, index_substr_if_maxlen,
>> index_substr_any_len, index_substr_any_step
>> 
>> The *_step parameter is set to 2 by default, which shrinks
>> the index by a factor of 2, but will require an average of 1.5 lookups if
>> the entry is present, or 2 lookups if it's not present. With a bigger
>> step the index will be even smaller (so there is a gain in the index
>> size) but will require more lookups.
>> 
>> This also solves the *xyz problem : you don't need an extra reverse index.
>> 
>> Note that it's not a perfect solution either : if the substring in the
>> filter is bigger than the size of the indexed split values, you are
>> more likely to get duplicates (and wrong ones). OTOH, it gives an
>> accurate number of candidates, compared to the heuristic approach we use...
>> 
>> The best solution does not exist. The only way to get an accurate count
>> would be to let the user fine-tune the index (or to have a smart
>> indexer, one that evolves through analysis of the search requests being
>> processed... Not likely to be implemented soon ;-)
> 
> Substring indexing is always a challenge, but I think trigram-based as we've 
> implemented it is as near to optimal as exists. It has an additional weakness 
> in that you can't use the index for shorter terms. E.g. we can't find "ab*" 
> or "a*" with a 3-character substring index.
> 
> The other benefit from this approach is the space tradeoff: we know our index 
> is a constant factor larger than the original string. With string-reverse 
> indexing, your index size is n^2/2…

Hi Howard !!! Hope all is well with you.

You got me curious. What’s the index space consumption order with the
trigram-based approach?

Best,
Alex

Re: Search engine performances...

2016-08-11 Thread Alex Karasulu

> On Aug 11, 2016, at 2:02 PM, Emmanuel Lécharny  wrote:
> 
> On 11/08/16 at 18:05, Alex Karasulu wrote:
>> Hi Em,
>> 
>> The substring filter is a big PITA for sure. As you know, substring 
>> filters with a fixed prefix, like the one you show here with the ‘abc’ prefix, 
>> are the best kind we can get for implementing a realistic scan count: much 
>> better than a ‘*xyz’. Maybe we can use tricks on the index, if it exists, for 
>> these classes of substring expressions. For example, advancing to the 
>> first and last ‘abc’ prefix occurrences to figure out an accurate scan count.
>> 
>> An approach that could be taken with the class of substring expressions with 
>> suffix terms (i.e. ‘*xyz’) is to use a reverse string index in addition to 
>> the current forward string index. This comes with the cost, though, of building 
>> and maintaining the index. However, it would speed up most classes of 
>> substring expressions.  
>> 
>> The other substring filter expressions without prefixes or suffixes would 
>> require a prohibitive full scan of the index: i.e. ‘*klm*’. Pointless to do 
>> and would clear cache memory with the churn. So your 10%-of-total-size 
>> thingy, if configurable by the administrator, makes sense as a best guess 
>> before the optimizer goes to work.
> 
> There are other options, like the ones implemented by OpenLDAP, OpenDJ,
> and OpenDS : build indexes based on parts of the values. For instance, let's
> say we have entry 1 : 'cn=A value'. We can imagine indexing every 3
> consecutive letters of this value. The index will then contain things
> like :
> 
> 'A v' -> entry 1
> ' va' -> entry 1
> 'val' -> entry 1
> 'alu' -> entry 1
> 'lue' -> entry 1
> 
> Now, searching for '*lue' will bring back entry 1 immediately, as will
> searching for 'A v*' or '*val*'.
> 
> That comes with a cost : the index will be huge.

Yeah, I had seen these "create every substring permutation" indices in other 
LDAP servers. I just turned around, said "Oy vey," and walked the other way. 
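
For concreteness, here is a minimal sketch of the n-gram extraction such a
scheme implies (plain Java; this is illustrative code, not taken from any of
these servers):

    import java.util.ArrayList;
    import java.util.List;

    public class SubstringIndexer
    {
        /** Extracts every run of n consecutive characters from a value. */
        static List<String> ngrams( String value, int n )
        {
            List<String> grams = new ArrayList<>();

            for ( int i = 0; i + n <= value.length(); i++ )
            {
                grams.add( value.substring( i, i + n ) );
            }

            return grams;
        }

        public static void main( String[] args )
        {
            // For entry 1 with 'cn=A value', each 3-gram becomes an index
            // key pointing back at entry 1:
            System.out.println( ngrams( "A value", 3 ) );
            // [A v,  va, val, alu, lue]
        }
    }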

> OpenLDAP and others
> allow you to tune this index in many ways. Typically, OpenLDAP has the
> following configuration parameters to tune the index :
> index_substr_if_minlen, index_substr_if_maxlen,
> index_substr_any_len, index_substr_any_step
> 

Yup, lots of management overhead, too many one-offs, and lots of index bloat 
potential IMHO. Plus this can be a long-running task drawing out the life of a 
transactional update if we have to generate each permutation on substring index 
updates.

> The *_step parameter is set to 2 by default, which shrinks
> the index by a factor of 2, but will require an average of 1.5 lookups if
> the entry is present, or 2 lookups if it's not present. With a bigger
> step the index will be even smaller (so there is a gain in the index
> size) but will require more lookups.
> 
> This also solves the *xyz problem : you don't need an extra reverse index.
> 

Yeah, it does, but it also increases complexity. However, if folks want to deal 
with that, my hat's off to them. 

The reverse index, on the other hand, seems simple and easy. It deals with another 
major class of substring expressions: those having suffixes. I have a feeling most 
substring expressions are going to have prefixes and/or suffixes. Even if middle 
terms are present, scan counts on the suffix or prefix terms are enough to get a 
correct scan count. If both are present, the prefix and suffix are like AND'd 
expressions where each term shrinks the search space further.

We can throw our hands up in the air when users decide not to include a prefix 
or suffix, i.e. STAR MIDDLE_TERMS STAR. The server can just take an educated 
guess. This seems like the sanest tradeoff.
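
To illustrate the reverse index trick: a suffix filter ‘*xyz’ turns into a
prefix lookup on an index of reversed values. A minimal sketch (a TreeMap
stands in for the real BTree index; the names are illustrative):

    import java.util.NavigableMap;
    import java.util.TreeMap;

    public class ReverseIndexDemo
    {
        /** Reverses a value so a '*xyz' suffix match becomes a 'zyx*' prefix match. */
        static String reverse( String s )
        {
            return new StringBuilder( s ).reverse().toString();
        }

        public static void main( String[] args )
        {
            // The reverse index stores reversed values mapped to entry ids.
            NavigableMap<String, Long> reverseIndex = new TreeMap<>();
            reverseIndex.put( reverse( "A value" ), 1L );   // "eulav A" -> entry 1

            // Searching for '*lue' becomes a prefix scan for "eul":
            String prefix = reverse( "lue" );
            System.out.println( reverseIndex.subMap( prefix, prefix + '\uffff' ).values() );
        }
    }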

> Note that it's not a perfect solution either : if the substring in the
> filter is bigger than the size of the indexed split values, you are
> more likely to get duplicates (and wrong ones). OTOH, it gives an
> accurate number of candidates, compared to the heuristic approach we use...
> 
> The best solution does not exist. The only way to get an accurate count
> would be to let the user fine-tune the index (or to have a smart
> indexer, one that evolves through analysis of the search requests being
> processed... Not likely to be implemented soon ;-)

Right.

Best,
Alex



Re: Search engine performances...

2016-08-11 Thread Alex Karasulu
Hi Em,

The substring filter is a big PITA for sure. As you know, substring filters 
with a fixed prefix, like the one you show here with the ‘abc’ prefix, are the 
best kind we can get for implementing a realistic scan count: much better than a 
‘*xyz’. Maybe we can use tricks on the index, if it exists, for these classes of 
substring expressions. For example, advancing to the first and last ‘abc’ 
prefix occurrences to figure out an accurate scan count.
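
A minimal sketch of that first/last-occurrence trick (a TreeMap stands in for
the forward index; in a real BTree the count could come from the positions of
the boundary keys rather than from walking the submap):

    import java.util.NavigableMap;
    import java.util.TreeMap;

    public class PrefixScanCount
    {
        /** Counts candidates for an 'abc*' filter without fetching any entries. */
        static int scanCount( NavigableMap<String, Long> index, String prefix )
        {
            // Candidates form one contiguous run of keys starting with the prefix.
            return index.subMap( prefix, true, prefix + '\uffff', false ).size();
        }

        public static void main( String[] args )
        {
            NavigableMap<String, Long> cnIndex = new TreeMap<>();
            cnIndex.put( "abcde", 1L );
            cnIndex.put( "abcxyz", 2L );
            cnIndex.put( "zzz", 3L );

            System.out.println( scanCount( cnIndex, "abc" ) );   // prints 2
        }
    }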

An approach that could be taken with the class of substring expressions with 
suffix terms (i.e. ‘*xyz’) is to use a reverse string index in addition to the 
current forward string index. This comes with the cost, though, of building and 
maintaining the index. However, it would speed up most classes of substring 
expressions.  

The other substring filter expressions without prefixes or suffixes would 
require a prohibitive full scan of the index: i.e. ‘*klm*’. Pointless to do, and 
it would clear cache memory with the churn. So your 10%-of-total-size thingy, if 
configurable by the administrator, makes sense as a best guess before the 
optimizer goes to work.

Best,
Alex

> On Aug 11, 2016, at 7:57 AM, Emmanuel Lécharny  wrote:
> 
> Hi guys,
> 
> 
> I have checked the way we process a search, as we have poor performance
> with searches like (&(cn=abc*)(objectClass=person)) (see DIRSERVER-2062).
> 
> 
> I have found two issues.
> 
> 1) for substring filters, as we have no way to know directly how many
> candidates we will have, we do a blind guess :
> 
>public long greaterThanCount( K key ) throws IOException
>{
>// take a best guess
>return Math.min( count, 10L );
>}
> 
> so if the count is greater than 10, we always pick 10, regardless of the
> index size. The resulting evaluation might very well be totally wrong.
> Let's say we have 11 candidates that match the (objectClass=person)
> filter, but thousands of candidates that match the (cn=abc*) filter : we will
> still use this filter, pulling thousands of entries from the database.
> 
> 
> 2) for an equality filter, where the attributeType is multi-valued, like
> (objectClass=person), we will try to count the candidates. If the values
> are stored in an array, or in a sub-btree, we will read the full
> array/btree and create a set of candidates. That can be costly when we
> have thousands of candidates.
> 
> 
> Actually, the search engine first annotates the filter, then processes it.
> The annotation step will sometimes construct a set of candidates, even if
> we don't use it, just because we need to compute an estimation of what
> would be the best index to use. In the filter I showed before, the
> substring filter will simply return a count, not a set of candidates,
> while the second filter will construct a set of candidates, even though the
> count is immediately available. Moreover, we discard this set when it
> gets bigger than an arbitrary number (100).
> 
> I would rather propose we don't build the candidate set if the number is
> above a threshold (100, for instance), and only return the real count.
> Later on, anyway, we will build the set of candidates if this index is
> selected.
> 
> 
> For the first problem, I would suggest we pick a better approximation
> than a magic number (10) : what about a percentage of the index size ?
> (like 10%). If the index contains 10 000 entries, then the count will be
> 1 000. We can even start to create the set, based on the filter, and stop
> if it gets bigger than a given number (100?). This would provide a
> better value than 'guess-timating' this number...
> 
> 
> wdyt ?
> 
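
For what it's worth, a minimal sketch of the threshold-capped counting
Emmanuel proposes (plain Java; the iterator stands in for a real index
cursor, and the names are hypothetical, not the ApacheDS API):

    import java.util.Iterator;
    import java.util.stream.LongStream;

    public class BoundedCount
    {
        /**
         * Counts candidates, but gives up past the threshold and falls back
         * to a percentage of the index size (the 10% guess from the mail).
         */
        static long boundedCount( Iterator<Long> candidates, long threshold, long indexSize )
        {
            long count = 0;

            while ( candidates.hasNext() )
            {
                candidates.next();

                if ( ++count > threshold )
                {
                    return Math.max( count, indexSize / 10 );
                }
            }

            return count;   // exact count, cheap because the index was small
        }

        public static void main( String[] args )
        {
            Iterator<Long> cursor = LongStream.range( 0, 10_000 ).boxed().iterator();
            System.out.println( boundedCount( cursor, 100, 10_000 ) );   // prints 1000
        }
    }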



Re: VOTE : add Steve Moyer, Chris Harm, Shawn Smith and Alex Haskell as Apache Directory project committers

2016-07-01 Thread Alex Karasulu
It’s exciting to watch the community grow and see all the great things 
Directory is doing. Community growth around SCIM is wonderful. I’ve read up on 
the discussions, but please take my vote with a grain of salt. 

+1

Best,
--Alex

> On Jul 1, 2016, at 11:07 AM, Kiran Ayyagari  wrote:
> 
> +1
> 
> I worked with Steve a while back in 2013, first on an ApacheDS Kerberos- 
> related issue and later on eSCIMo (which we tried 
> to continue under the "Igloo" name until the spec stabilized, but it didn't take 
> off as expected)
> 
> On Fri, Jul 1, 2016 at 7:26 PM, Emmanuel Lécharny wrote:
> Hi guys,
> 
> 
> now that we have accepted the SCIM code contribution, we have to vote in
> the four contributors :
> 
> 
> Alex Askell
> 
> Chris Harm
> 
> Shawn Smith
> 
> Steve Moyer
> 
> 
> I personally met Steve 2 years ago (time flies !!!), and Shawn McKinney
> has worked with them a lot recently (Shawn, you can probably confirm
> and bring some more information). I do think they will be good new
> committers, and they for sure understand what Open Source is.
> 
> 
> So :
> 
> 
> [ ] +1 : vote them as committers
> 
> [ ] +/- 0 : no opinion
> 
> 
> [ ] -1 : No, don't vote them in.
> 
> 
> Thanks a lot !
> 
> 
> 
> Kiran Ayyagari
> http://keydap.com 



Re: Travel to Vancouver and Bay Area

2016-05-09 Thread Alex Karasulu
I'm here if anyone from Directory wants to connect. Chilling by the
escalator for the next couple of hours on the 2nd floor.

Cheers,
Alex

On Mon, May 9, 2016 at 9:24 AM, Zheng, Kai  wrote:

> Sounds great Colm! I guess we'd be pretty busy on Wednesday, maybe have
> the meetup on Thursday?
>
> Kai
>
> -Original Message-
> From: Colm O hEigeartaigh [mailto:cohei...@apache.org]
> Sent: Monday, May 09, 2016 2:37 AM
> To: Apache Directory Developers List 
> Cc: ke...@directory.apache.org
> Subject: Re: Travel to Vancouver and Bay Area
>
> Hi Kai,
>
> I'll be at ApacheCon as well from Wednesday to Friday, it sounds like we
> have enough people for an Apache Directory meetup ;-)
>
> Colm.
>
>
>
> On Fri, May 6, 2016 at 11:12 PM, Zheng, Kai  wrote:
>
> > Thanks!! A big pity you won't be there but I guess we could eventually
> > be able to meet elsewhere in future!
> >
> > -Original Message-
> > From: Emmanuel Lécharny [mailto:elecha...@gmail.com]
> > Sent: Saturday, May 07, 2016 6:07 AM
> > To: ke...@directory.apache.org
> > Subject: Re: Travel to Vancouver and Bay Area
> >
> > On 06/05/16 23:41, Zheng, Kai wrote:
> > > Hi Shawn, it's great we'll be able to have a meet. Yes, the whole
> > > next
> > week I'll be hanging there.
> >
> > Ra... I wish I could have gone :/
> >
> > Enjoy the trip, and have some nice meeting with Shawn and Lucas ! All
> > my best to all of you, guys !
> >
> >
>
>
> --
> Colm O hEigeartaigh
>
> Talend Community Coder
> http://coders.talend.com
>



-- 
Best Regards,
-- Alex


Re: Fortress API to become an Apache project

2014-06-27 Thread Alex Karasulu
On Fri, Jun 27, 2014 at 1:38 PM, Emmanuel Lécharny 
wrote:

> Hi guys,
>
> for months now, I've been working with Shawn McKinney on Fortress
> (http://www.openldap.org/fortress/ and http://iamfortress.org/overview).
> This week, he asked me what would be the best way for this project
> to become an Apache project.
>
>
Great news! Good work on that.


> At this point, we have a few options available :
> - go through incubation
> - accept the project under ADS umbrella, but as a side project, like
> Studio, Mavibot or Escimo
> - ask some other umbrella project if they are interested (Shiro, Syncope ?)
>
> I do think there is some potential for this project (which is an API and
> a web application) to become an interesting part of Apache Directory, as
> it uses either OpenLDAP or ApacheDS as a backend, and also uses the
> Apache LDAP API on the client side. It is also a good replacement for
> the dormant Triplesec project.
>
>
+1


> At this point, I would be pleased to get your take on such a move.
>
> wdyt ?
>
> PS : Fortress is already using AL 2.0, and Shawn is the main contributor
> and code owner, so no problem with code attribution or copyright ownership.
>

I think there are 2 ways here:

1. IP Clearance
2. Incubation

#1 is more for things that would be add-ons used by a project here at
Directory, but this sounds more like a standalone application that happens
to use ApacheDS and OpenLDAP as a backing store.
#2 ... well, no explanation necessary.

With #2 Directory can be the sponsoring PMC.

Good luck with it.

Best,
Alex


Re: Unit tests for ldap api classes

2014-05-04 Thread Alex Karasulu
Placement at a project level in the proposed layout (under clients) is not
a factor in cyclic dependency considerations for Maven. Only modules are
considered by Maven when calculating cycles. So foreseeably you can place
this ldap-client-test module in the correct logical position under the
clients project without imposing cyclic dependencies.

Another option might be to create a separate project for just tests that
must pull in dependencies from everywhere to explicitly try to avoid
cycles. However that might be a maintenance overhead.

Separately, Stefan makes a good point about cleaning up some of the
unnecessary dependencies before doing this reorganization: it might make it
easier to do.



On Sun, May 4, 2014 at 2:33 AM, Emmanuel Lécharny wrote:

> On 5/3/14 11:50 PM, Stefan Seelmann wrote:
> > On 05/03/2014 11:30 PM, Emmanuel Lécharny wrote:
> >> On 5/3/14 10:20 PM, Stefan Seelmann wrote:
> >>> On 05/03/2014 10:08 PM, Emmanuel Lécharny wrote:
>  On 5/3/14 10:03 PM, Stefan Seelmann wrote:
> > Hi Emmanuel,
> >
> > yes, I think your proposal makes sense.
> >
> > However I still see some circular dependencies that needs to be
> solved.
>  Like ?
> >>> ldap-client integration tests (if also moved to the client project)
> need
> >>> apacheds.
> >>>
> >>> apacheds integration tests need ldap-client
> >> api-ldap-client-api, which is from shared.
> >>
> >> I don't think we have a cyclic dependency here...
> > If
> > - shared/api-ldap-client-api is moved to clients/trunk/ldap and
> > - apacheds/ldap-client-test is moved to clients/trunk/ldap too
> >
> > then we have, because apacheds modules depend on api-ldap-client-api and
> > ldap-client-test depends on apacheds modules. Ok, only on project level,
> > not on module level.
>
> Ok, but why should we move api-ldap-client-api to clients/trunk/ldap ?
>
> What I had in mind was just to move apacheds/ldap-client-test to
> client/trunk/ldap. In this case, we should not have any cyclic dependency.
>
>
> It's a bit late here, so I may miss something...
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


Re: [VOTE] Release Apache Mavibot 1.0.0-M3

2013-12-11 Thread Alex Karasulu
+1

Been reading up on all the testing and the progress: really this is some
great work guys!


On Wed, Dec 11, 2013 at 11:03 AM, Emmanuel Lécharny wrote:

> Hi !
>
> This is the third release of Apache Mavibot, the MVCC BTree in Java !
>
> Some big refactoring in this milestone, as many of the classes and
> interfaces are now common to both the managed and in-memory btrees.
> A replace method has been added, the cache is now shared with the
> btree subtrees, and we don't create a sub-btree for each value when
> the BTree allows duplicate values, which leads to better performance.
>
> Most importantly, we no longer deserialize the whole page when it
> is read from disk, we just deserialize the needed keys and values.
> This single change boosts the performance by an order of magnitude.
>
> The cursors have been refactored, and some tests have been added.
>
> ApacheDS has already been tested with Mavibot 1.0.0-M3-SNAPSHOT.
>
> Many other bugs have been fixed.
>
> So let's vote now !
>
>
> The revision :
>
> http://svn.apache.org/r1550054
>
>
> The SVN tag:
> http://svn.apache.org/repos/asf/directory/mavibot/tags/1.0.0-M3/
>
> The source and binary distribution packages:
> http://people.apache.org/~elecharny/
>
> The staging repository:
> https://repository.apache.org/content/repositories/orgapachedirectory-041/
>
>
> Please cast your votes:
> [ ] +1 Release Mavibot 1.0.0-M3
> [ ] 0 abstain
> [ ] -1 Do not release Mavibot 1.0.0-M3
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


Re: Mavibot vs JDBM results

2013-11-11 Thread Alex Karasulu
On Mon, Nov 11, 2013 at 10:46 PM, Howard Chu  wrote:

> Alex Karasulu wrote:
>
>> Now, this is an approach where we used plain Keys (ie, Keys can have
>> various sizes), which is not really efficient, as we may have to
>> allocate more pages than necessary to store nodes and leaves.
>> OpenLDAP uses another approach, which is smarter : they use the hash
>> value of each key to retrieve the element. Obviously, this leads to
>> comparing the keys when we reach the leaf, as we may have more than one
>> key with the same hash value, and it also destroys the ordering (one
>> can't compare two hash values as the ordering will be different), but
>> in most cases it's not really a big deal.
>> The main advantage of such an approach is that suddenly, Nodes have a
>> fixed size (a hash can be stored as an int, and the references to a
>> page are longs), so in a fixed page size, we can store a fixed number
>> of elements. Assuming that a node needs at least 28 bytes to store its
>> header and PageIO, in a 512 byte page we can store (512 - 28) /
>> ((nbValues+1) x (8+8) + nbKeys x 4) elements, so 16 keys (64 bytes)
>> and 17 values (272 bytes). We have 148 bytes remaining in this case.
>> Atm, we store 16 elements per node, which requires many physical pages,
>> ie, many disk accesses.
>>
>> This is something that is worth investigating in the near future.
>>
>>
>> Sounds like we need a minimal perfect order preserving hash function.
>>
>
> These are a pain to try to use for this purpose. All of the perfect order
> preserving hashes I've found only work because you generate the hash
> function based on knowing the entire set of data in advance. When you add
> new records you need to create a new hash function, and thus the hash
> values of every key change.
>
> I.e., these are only useful for static data sets.
>

That's lame ... no silver bullets I guess.

-- 
Best Regards,
-- Alex


Re: Mavibot vs JDBM results

2013-11-11 Thread Alex Karasulu
Hi Emmanuel,

On Mon, Nov 11, 2013 at 4:09 PM, Emmanuel Lecharny wrote:

> On Mon, Nov 11, 2013 at 2:31 PM, Alex Karasulu 
> wrote:
> >
> > On Mon, Nov 11, 2013 at 11:01 AM, Emmanuel Lécharny  >
> > wrote:
> >>
> >> Hi,
> >>
> >> now that the mavibot partition is working, here a quick bench I ran this
> >> morning :
> >>
> >> Addition of 10 000 entries, then 400 000 random searches :
> >>
> >> JDBM
> >> 10 000 : Add = 74/s, Search : 4 729/s
> >>
> >> Mavibot
> >> 10 000 : Add = 120/s, Search : 11 056/s
> >>
> >> As we can see, Mavibot is 2,1 x faster for additions, and 2,33 x faster
> >> for searches... Will run a test with 100 000 entries.
> >
> >
> > Does this benchmark involve the entire network ADS stack?
>
> Yes, except the network layer (which is irrelevant in this test). Also
> note that they are run on my poor laptop.
>
>
That's really cool. If I remember correctly you have not bought a new Mac
in a while. Now you have to keep this machine as the reference
configuration for all historic metrics comparisons :).


> The last results I get are the following  :
>
>
> JDBM (1Gb)
> 1000 : Add = 56/s, Search = 14 845/s
> Mavibot (1Gb)
> 1000 : Add = 111/s, Search = 17 586/s
>
> JDBM (1Gb)
> 10 000 : Add = 57/s, Search = 4 729/s
> Mavibot (1Gb)
> 10 000 : Add = 120/s, Search = 11 056/s
>
> JDBM (2Gb)
> 5 : Add = 51/s, Search = 3 515/s
> Mavibot (2Gb)
> 5 : Add = 134/s, Search = 10335/s
>
>
Impressive! These are by far the best numbers we've seen in all of ADS
history.


>
> Note that if we hit the disk (ie, the cache and memory are not big
> enough), then performance drops immediately :
>
>
> JDBM (2Gb)
> 10 : Add = 44/s, Search = 2 957/s
> Mavibot (2Gb)
> 10 : Add = 100/s, Search = 3 308/s
>
>
> This is even more visible for Mavibot than for JDBM, most certainly
> due to the cache we are using in Mavibot (Ehcache), which is probably
> overkill compared to the very basic but efficient LRU used by JDBM.
>
>
Yeah, JDBM's cache was uber simple; perhaps a similar KISS cache may be right
for Mavibot, but tunable to various common access scenarios, or even
one that is adaptable.


>
> Suffice it to say that, given enough memory, Mavibot in its current state (i.e.
> we are still using locks all over, as the revisions are not yet
> implemented) is already more than 2x faster for additions and 3x
> faster for searches...
>
> This is not the end of the story though. There are many possible
> optimizations in Mavibot :
> - first of all, remove the locks that block concurrent access in
> searches (but that requires the handling of revisions in Mavibot,
> which is just a matter of implementing the free page collection)
> - second, we are doing way too many findPos (probably two times more
> than needed). We can get rid of this.
>
>
Looking forward to seeing those stats when these changes take place. I'd
love to see us at least come close to the C-based servers out there.


>
> Now, this is an approach where we used plain Keys (ie, Keys can have
> various sizes), which is not really efficient, as we may have to
> allocate more pages than necessary to store nodes and leaves.
> OpenLDAP uses another approach, which is smarter : they use the hash
> value of each key to retrieve the element. Obviously, this leads to
> comparing the keys when we reach the leaf, as we may have more than one
> key with the same hash value, and it also destroys the ordering (one
> can't compare two hash values as the ordering will be different), but
> in most cases it's not really a big deal.
> The main advantage of such an approach is that suddenly, Nodes have a
> fixed size (a hash can be stored as an int, and the references to a
> page are longs), so in a fixed page size, we can store a fixed number
> of elements. Assuming that a node needs at least 28 bytes to store its
> header and PageIO, in a 512 byte page we can store (512 - 28) /
> ((nbValues+1) x (8+8) + nbKeys x 4) elements, so 16 keys (64 bytes)
> and 17 values (272 bytes). We have 148 bytes remaining in this case.
> Atm, we store 16 elements per node, which requires many physical pages,
> ie, many disk accesses.
>
> This is something that is worth investigating in the near future.
>
>
Sounds like we need a minimal perfect order preserving hash function.
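
To make the arithmetic in Emmanuel's sizing example concrete, a small sketch
(the byte sizes are the assumptions stated in the mail, not actual Mavibot
constants):

    public class NodeCapacity
    {
        public static void main( String[] args )
        {
            final int pageSize   = 512;
            final int headerSize = 28;   // node header + PageIO overhead
            final int hashSize   = 4;    // a key stored as an int hash
            final int refSize    = 16;   // a child page reference: two longs

            int usable = pageSize - headerSize;                          // 484
            int nbKeys = 16;                                             // implies 17 child references
            int used   = ( nbKeys + 1 ) * refSize + nbKeys * hashSize;   // 272 + 64 = 336

            System.out.println( "remaining: " + ( usable - used ) );     // remaining: 148
        }
    }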

-- 
Best Regards,
-- Alex


Re: Mavibot vs JDBM results

2013-11-11 Thread Alex Karasulu
On Mon, Nov 11, 2013 at 11:01 AM, Emmanuel Lécharny wrote:

> Hi,
>
> now that the mavibot partition is working, here a quick bench I ran this
> morning :
>
> Addition of 10 000 entries, then 400 000 random searches :
>
> JDBM
> 10 000 : Add = 74/s, Search : 4 729/s
>
> Mavibot
> 10 000 : Add = 120/s, Search : 11 056/s
>
> As we can see, Mavibot is 2,1 x faster for additions, and 2,33 x faster
> for searches... Will run a test with 100 000 entries.


Does this benchmark involve the entire network ADS stack?

This is definitely something to be very proud of. Kudos!

-- 
Best Regards,
-- Alex


Re: [VOTE] Make Mavibot a Directory subproject

2013-08-01 Thread Alex Karasulu
On Thu, Aug 1, 2013 at 3:48 PM, Pierre-Arnaud Marcelot wrote:

> [X] +1 : make Mavibot a Directory Subproject
>
> This is going to make a huge difference for ApacheDS.
>
> Regards,
> Pierre-Arnaud
>
> PS: Should the Labs ML be in copy of the vote?
>
>
>
You probably want just the summary sent to them to prevent the noise.

Regards,
Alex


Re: [VOTE] Make Mavibot a Directory subproject

2013-08-01 Thread Alex Karasulu
+1

Great job guys this is a true advance!


On Thu, Aug 1, 2013 at 2:29 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> last year, I started an experiment in l...@apache.org, the Mavibot MVCC
> Btree. It has reached a first step : we can now use it as a backend for
> ApacheDS, thanks to Kiran (who joined me on the Mavibot effort, making
> it stable enough to be used in ApacheDS).
>
> This morning, I conducted some performance tests, comparing JDBM and
> Mavibot on a 10K entries base :
>
> CoreSession
> ===
>
> Mavibot 10K entries
> ---
> Add : 80s, 125/s
> Search : 7392/s
>
> Jdbm 10k entries
> 
> Add : 167s, 59/s
> Search : 654/s
>
> NetworkSession
> ===
>
> Mavibot 10K entries
> ---
> Add : 85s, 117/s
> Search : 1802/s
>
> Jdbm 10k entries
> 
> Add : 176s, 56/s
> Search : 506/s
>
> As we can see, either on CoreSession or through the network, the gain
> over JDBM is really clear. It's ten times faster.
>
> So let's vote to integrate Mavibot as a Directory Subproject, in order to
> be able to release it and use it as our base backend in the near future.
> It's not feature complete yet, but the core BTree already works.
>
> So let's vote :
> [ ] +1 : make Mavibot a Directory Subproject
> [ ] +/-0 : I don't mind
> [ ] -1 : No, Mavibot should not become a Directory subproject
>
> Thanks !
>
> and of course, my +1
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


HDAP - Big Data LDAP Sounds Heavy

2013-07-09 Thread Alex Karasulu
http://www.marketwatch.com/story/radiant-logic-introduces-hdap-the-worlds-first-super-scalable-ldap-directory-driven-by-big-data-and-search-technology-2013-07-08

-- 
Best Regards,
-- Alex


Re: [OTHER] proposal for creating a new sub project

2013-07-08 Thread Alex Karasulu
On Sun, Jul 7, 2013 at 11:10 AM, Kiran Ayyagari wrote:

> thanks everyone for the support, the new project is named "escimo"
> (probably can be stylized as eSCIMo ;)
>
>
I really like the ring to it ... like Eskimos. I hope the name does not exist
already, but just in case we should follow the same podling name search
guidelines recommended by the Incubator, just to be on the safe side [0].


[0] - http://incubator.apache.org/guides/names.html

-- 
Best Regards,
-- Alex


Re: [OTHER] proposal for creating a new sub project

2013-07-06 Thread Alex Karasulu
On Sun, Jul 7, 2013 at 1:57 AM, Emmanuel Lécharny wrote:

> On 7/6/13 9:28 PM, Shawn McKinney wrote:
> > On 07/06/2013 02:19 PM, Kiran Ayyagari wrote:
> >> I would like to work on implementing the SCIM[1] v2 protocol [2]
> >> and want to make it a sub project of ApacheDS.
> >>
> >> Please let me know your thoughts.
> >>
> >
> > Kiran,
> >
> > I am new to this list and not yet affiliated with ApacheDS but would
> > like to discuss joining you/others in a SCIM project.
>
> Kiran, +1.
>
> Shawn, you are very welcome.
>
>
Ditto! I was not even aware of SCIM's existence. This would be a good
addition to ApacheDS.

-- 
Best Regards,
-- Alex


Re: Operation timing in logs

2013-05-16 Thread Alex Karasulu
Hmm this is neat and could be pretty damn useful.

What about making this happen using a request control? Meaning, if an
operation is issued with the TRACE request control, then turn on the
production and output of such trace information to the logs. The control is
a simple marker control. This would reduce the amount of noise and logging
overhead if you're interested in specific operations.

What would be uber bad-assery would be to collect this information and
report it not only in the logs but to return it in a buffer with a response
control. This makes sense to inspect production system issues that are
misbehaving without a shutdown.

Another idea, in lieu of or in addition to the above, would be to enable some parameters in
the configuration that could hot enable/disable this feature.

In both cases, you would not have to shut down the server and would still
have fine-grained control over the quantity of logs produced with trace
information.

Just some thoughts.
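
A minimal sketch of the marker control idea (the OID and the method shapes
are assumptions for illustration; the actual Apache LDAP API control types
may differ):

    public class TraceControl
    {
        // A made-up OID for illustration only.
        public static final String OID = "1.3.6.1.4.1.18060.0.9.999";

        private boolean critical = false;

        // A marker control carries no value: its mere presence on a request
        // would tell the server to emit timing/trace output for that operation.
        public String getOid()
        {
            return OID;
        }

        public boolean isCritical()
        {
            return critical;
        }

        public void setCritical( boolean critical )
        {
            this.critical = critical;
        }
    }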



On Thu, May 16, 2013 at 2:28 PM, Pierre-Arnaud Marcelot 
wrote:

> Hi Emmanuel,
>
> That's an interesting option.
>
> Regards,
> Pierre-Arnaud
>
> On 8 May 2013, at 12:48, Emmanuel Lécharny wrote:
>
> > Hi guys,
> >
> > FYI, I have added some timing for each operation. This is not activated
> > by default you have to modify the log4j.properties to get it :
> >
> > log4j.logger.org.apache.directory.server.OPERATION_TIME=DEBUG
> >
> > will produces such logs :
> >
> > [12:41:28] DEBUG [org.apache.directory.server.OPERATION_TIME] - Add
> > operation took 28404000 ns
> > [12:41:28] DEBUG [org.apache.directory.server.OPERATION_TIME] - Lookup
> > operation took 50 ns
> > [12:41:28] DEBUG [org.apache.directory.server.OPERATION_TIME] - Lookup
> > operation took 278000 ns
> > [12:41:28] DEBUG [org.apache.directory.server.OPERATION_TIME] - Lookup
> > operation took 59 ns
> > [12:41:28] DEBUG [org.apache.directory.server.OPERATION_TIME] - Bind
> > operation took 782 ns
> > [12:41:28] DEBUG [org.apache.directory.server.OPERATION_TIME] - Unbind
> > operation took 81000 ns
> > [12:41:28] DEBUG [org.apache.directory.server.OPERATION_TIME] - Lookup
> > operation took 305000 ns
> > [12:41:28] DEBUG [org.apache.directory.server.OPERATION_TIME] - Bind
> > operation took 94000 ns
> > [12:41:28] DEBUG [org.apache.directory.server.OPERATION_TIME] - Unbind
> > operation took 22000 ns
> > ...
> >
> > This could be useful for gathering the operations and telling how much time
> > it took to process each operation.
> >
> > I'm considering offering more information, like the specific operation,
> > but this would be much more verbose.
> >
> > I will commit that soon.
> >
> > --
> > Regards,
> > Cordialement,
> > Emmanuel Lécharny
> > www.iktek.com
> >
>
>


-- 
Best Regards,
-- Alex


Re: kinit failed on - Integrity check on decrypted field failed

2013-04-11 Thread Alex Karasulu

When I saw this email I thought to myself: what a great how-to this would
be for the Kerberos documentation on our site. Hadoop is set up nicely to
work with Kerberos, and thanks to the efforts of the team here we have
something pure Java that we can use with Hadoop for security.



On Thu, Apr 11, 2013 at 1:48 AM, Wu, James C.  wrote:

> Hi,
>
> Thanks a lot for your help. I have also verified that ApacheDS works with
> Hadoop, with a trust relationship set up between an ApacheDS Kerberos
> service and an MIT Kerberos service.
>
> Regards,
>
> james
>
> -Original Message-
> From: Emmanuel Lécharny [mailto:elecha...@gmail.com]
> Sent: Wednesday, April 10, 2013 3:36 PM
> To: Apache Directory Developers List
> Subject: Re: kinit failed on - Integrity check on decrypted field failed
>
> On 4/10/13 8:10 PM, Wu, James C. wrote:
> > Hi,
> >
> > I re-installed apacheds 2.0.0-M11, wiped out all the existing
> stuff, and used all default settings. The kinit does work.
> >
> > So I guess my problem is a config error, because in my actual config I
> use a different realm, not EXAMPLE.COM.
> >
> > I am going to compare the configs to find out what mistake I made
> when changing the realm. I will update in this thread.
>
> Cool !!!
>
> I'm happy that you got it working. Kerberos is not very kind, and
> understanding why it's not working can be a real nightmare. Sadly, due to
> the very nature of the exchanged data, which is encoded most of the time,
> plus the fact that it's not safe to provide too much information when
> authentication fails, it's difficult to know what can be wrong in the conf.
>
> FYI, we have built a new version which should contain some bug fixes : you
> can get ApacheDS 2.0.0-RC1 here: http://people.apache.org/~elecharny/
>
> FYI, this release will not be public, as we detected some more issues that
> need to be fixed, but still, it can be worthwhile to try it.
>
> Thanks for your patience !
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


Re: Toward 2.0...

2013-03-21 Thread Alex Karasulu
On Thu, Mar 21, 2013 at 2:09 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> it's now been 2 years that we've been working on 2.0, adding milestone after
> milestone. 1.0 was released nearly 6 years ago, and 1.5 was just
> an intermediate version.
>
> Basically, we can say that we have been working on 2.0 since April 2007...
>
> I think it's about time to get a 2.0 out now. The big bugs we had in
> JDBM have been fixed, replication is pretty much working, and the Kerberos
> server is also delivering tickets,


The sentence below makes perfect sense.


> I don't really know what we can add
> in a milestone that would make 2.0 very different than what we have.
>
> Of course, we have some important things to work on :
> - Mavibot as a replacement for JDBM : this is an ongoing effort, but in
> any case, it's just a new Partition. It can be added in a 2.1
> - MINA 3 switch : again, this is a work in progress. It will bring
> better performance on the network side, but nothing that can't wait for
> a 2.1
> - Triggers/SP : not critical atm
> - Replication : we currently support MMR with Syncrepl; we would like to
> add delta-syncrepl, but this is not urgent (2.1)
>
> Everything else is just new features or improvements that can wait for
> minor releases.
>
>
True.


> Otherwise, we have a bunch of pending bugs that need to be reviewed and
> fixed, and most important, we need a better documentation.
>
> All in all, I think we are pretty much ready, and we can get a 2.0 done
> by end of april or mid may.
>
> thoughts ?
>

In complete agreement.

-- 
Best Regards,
-- Alex


Re: Kerberos-client module added

2013-02-07 Thread Alex Karasulu
On Thu, Feb 7, 2013 at 1:00 PM, Emmanuel Lécharny wrote:

> On 2/7/13 11:25 AM, Alex Karasulu wrote:
> > Why not just make the Kerberos Client its own client (API) project just
> > like the LDAP API is?
> This is something we will most certainly do later. ATM, I don't have
> time to make it a completely separate project, and it's really in its
> infancy.
>
>
Right that makes perfect sense.


> The problem I had yesterday was to have it building in an environment that is
> fully up to date, instead of having to fight with package
> inconsistencies and jar dependencies. It took me five minutes instead
> of a couple of hours to have it all working as a separate project.
>
> Now, if somebody has time for that, I would be more than pleased to have
> it as a separate project...
>
> >
> > It might be really useful to people who are stuck using the
> > God-awful Krb5LoginModule because of the lack of a better Kerberos
> > library to do the same.
> Spot on. I even want to have it used in Studio to allow users to play
> with it through a GUI instead of using pkinit...
>
>
>
That would be really nice to have.

-- 
Best Regards,
-- Alex


Re: Kerberos-client module added

2013-02-07 Thread Alex Karasulu
Why not just make the Kerberos Client its own client (API) project just
like the LDAP API is?

It might be really useful to people who are stuck using the
God-awful Krb5LoginModule because of the lack of a better Kerberos library to
do the same.


On Thu, Feb 7, 2013 at 11:10 AM, Emmanuel Lécharny wrote:

> Hi guys,
>
> I added the kerberos-client module as an ApacheDS sub-module. It has
> been fixed by Kiran, and works just fine.
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


Re: Chaining ApacheDS to Active Directory

2012-08-17 Thread Alex Karasulu
On Fri, Aug 17, 2012 at 3:30 PM, torcaz99  wrote:

>
> Hello:
>
> I'm using ApacheDS as a repository. But I want to chain some searches to
> Active Directory.
> Example:
> Quote:
> - domain.com --> Entities stored in ApacheDS.
> - ad.red --> Entities stored in Active Directory.
> What I'm trying is to use ApacheDS as a front-end that resolves queries for
> domain.com and ad.red (chaining).
>

Right now this capability is not something readily present in ApacheDS. You
would have to create a partition that delegates the search to MS AD and
returns the results to the client conducting the search against ApacheDS.
The partition exposes the MS AD entries via ApacheDS.
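
To give a feel for the delegation, a minimal sketch using plain JNDI (this is
not an ApacheDS partition; the AD host, bind DN, and password are
placeholders):

    import java.util.Hashtable;
    import javax.naming.NamingEnumeration;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import javax.naming.directory.SearchResult;

    public class AdDelegateDemo
    {
        public static void main( String[] args ) throws Exception
        {
            Hashtable<String, String> env = new Hashtable<>();
            env.put( "java.naming.factory.initial", "com.sun.jndi.ldap.LdapCtxFactory" );
            env.put( "java.naming.provider.url", "ldap://ad.red:389" );     // placeholder host
            env.put( "java.naming.security.principal", "user@ad.red" );     // placeholder bind
            env.put( "java.naming.security.credentials", "secret" );        // placeholder password

            InitialDirContext ctx = new InitialDirContext( env );
            SearchControls sc = new SearchControls();
            sc.setSearchScope( SearchControls.SUBTREE_SCOPE );

            // A chaining partition would run a search like this on behalf of
            // the ApacheDS client and stream the results back as its own entries.
            NamingEnumeration<SearchResult> results =
                ctx.search( "dc=ad,dc=red", "(objectClass=user)", sc );

            while ( results.hasMore() )
            {
                System.out.println( results.next().getNameInNamespace() );
            }

            ctx.close();
        }
    }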


> - Can anyone tell me how ApacheDS can be configured for this (no need for
> secure
> connections with ldaps)?
> And a second question:
> - Can anyone tell me how to pass authentication from ApacheDS to Active
> Directory using user/password and/or Kerberos?
>
>
There was some work done in the past for password delegation. A fellow
named Antoine worked on this a while back:


http://mail-archives.apache.org/mod_mbox/directory-dev/201012.mbox/%3c4d01171b.1070...@gmx.de%3E


> An example would be appreciated.
>
> Thanks
> --
> View this message in context:
> http://old.nabble.com/Chaing-ApacheDS-to-Active-Directory-tp34311387p34311387.html
> Sent from the Apache Directory Project mailing list archive at Nabble.com.
>
>


-- 
Best Regards,
-- Alex


[ApacheDS] [OSGi] Tests on schema blows chunks in OSGi Branch

2012-07-24 Thread Alex Karasulu
I'm seeing the following in the OSGi branch:

testLoadAllEnabled(org.apache.directory.shared.ldap.schemaloader.SchemaManagerEnableDisableLoadTest)
 Time elapsed: 1.042 sec  <<< FAILURE!
java.lang.AssertionError: expected:<14> but was:<12>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at
org.apache.directory.shared.ldap.schemaloader.SchemaManagerEnableDisableLoadTest.testLoadAllEnabled(SchemaManagerEnableDisableLoadTest.java:146)

There are more errors, but I'm thinking this is a direct result of the OSGi
code injecting extra schemas. Now schema information is no longer static,
which this test is expecting, right?

-- 
Best Regards,
-- Alex


Re: Question about Replication of Config Partition and Schema Partition

2012-07-17 Thread Alex Karasulu
On Tue, Jul 17, 2012 at 10:58 AM, Emmanuel Lécharny wrote:

>
> On 7/17/12 12:28 AM, Alex Karasulu wrote:
>
>> On Mon, Jul 16, 2012 at 6:50 PM, Emmanuel Lécharny wrote:
>>
>>
>>> I was pretty much thinking that we could store that information in a
>>> plain text file, but that would be a bit of overkill, when we can store
>>> it in the DIT too. Maybe storing that information into the ou=config
>>> entry could be the right thing to do, assuming that ou=config is not
>>> replicated (we will only replicate what's under ou=config, ie, its
>>> children)
>>>
>>>
>> Please please please let's not fuck with this. This is the worst idea I've
>> heard of yet. We don't need another one-off here.
>>
> That was just a suggestion, but I do agree this is more a hack than
> anything else.


Thank you for seeing this. This would create a nightmare for us in other
areas. This is why I sort of freaked.


> Plus, after having checked the ou=config file, I don't even think it's
> necessary.
>
>
Coolio.


> I totally buy the fact that implementing partial replication would solve
> the issue.
>
>
Yeah, I think this will help us a great deal. I think we need fractional and
partial replication. We will still want to replicate some entries but not
all of their attributes; this is where fractional replication comes in
handy.
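
For illustration, a minimal sketch of the attribute filtering that fractional
replication implies (a plain map stands in for the real Entry type; the names
are illustrative):

    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class FractionalFilter
    {
        /** Drops the excluded attributes from an entry before it is replicated. */
        static Map<String, List<String>> filter( Map<String, List<String>> entry,
            Set<String> excluded )
        {
            return entry.entrySet().stream()
                .filter( e -> !excluded.contains( e.getKey().toLowerCase() ) )
                .collect( Collectors.toMap( Map.Entry::getKey, Map.Entry::getValue ) );
        }

        public static void main( String[] args )
        {
            Map<String, List<String>> entry = Map.of(
                "cn", List.of( "A value" ),
                "userPassword", List.of( "{SSHA}..." ) );

            // Replicate the entry, but never its passwords.
            System.out.println( filter( entry, Set.of( "userpassword" ) ) );
        }
    }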


> The ou=config DIT starts with :
>
> version: 1
> dn: ou=config
> ou: config
> objectclass: top
> objectclass: organizationalUnit
>
> dn: ads-directoryServiceId=default,ou=config
> objectclass: top
> objectclass: ads-directoryService
> ads-directoryserviceid: default
> ads-dsreplicaid: 1
> ...
>
> As we can see, each configuration is specific to a service, here
> "default". If we correctly name the instances so that there is no possible
> confusion between them, then we should be safe even if we replicate
> everything.
>
>
Da, Da, Da!


> The thing we have to solve is the instance name : how does the server
> get its instance name ?
>
>
I don't have an answer for this just yet but I am sure we can figure
something out. In addition to instance name we can also perhaps have an
instance UUID to disambiguate collisions.


> I must admit that, even if I worked on those things in the past, it's not
> really fresh in my mind...
>
>
Think about where I am ;). Right now I'm completely driving off intuition.

Thanks for the very awesome logical response to my minor freak out.

-- 
Best Regards,
-- Alex


Re: Question about Replication of Config Partition and Schema Partition

2012-07-16 Thread Alex Karasulu
On Mon, Jul 16, 2012 at 6:50 PM, Emmanuel Lécharny wrote:

>
>> - But I'm not against replicating configuration completely. Someone
>> would want to simply clone the server in its entirety. (This is not
>> touched by the RFC as I see it.) Because our OSGi distribution is
>> Karaf based, we can use Karaf Cellar here. Cellar is used to keep
>> multiple Karaf instances synched in terms of bundles, features, and
>> configuration. (We can't use its configuration aspects because we're
>> keeping our component configurations in ApacheDS itself.) I quickly
>> looked through its functionality and code. It's incomplete but
>> promising. Most importantly it provides an API to initiate Karaf
>> instance replication. (Again, not complete.) So when the replication
>> system is revisited we can make use of Cellar.
>> - I looked at Apache ZooKeeper a bit, and we should definitely
>> leverage it in our LDAP replication system. It is simply a distributed
>> data and notification system for clusters. It has strength in node
>> management in clusters and is good for implementing
>> infrastructure-related parts of a replication system.
>>
>> So I see no threat to OSGi from LDAP replication. However I believe
>> ou=config should be excluded from replication in any scenario.
>>
>>> Not sure.
>>>
>> Not completely, as I say: they must be controlled through another
>> replication system. I say they shouldn't be managed by LDAP
>> replication because, as I said, 2 servers having the same
>> configuration don't have to have the same runtime behaviour, because
>> of their different bundle configurations.
>>
> Two servers can't have the same configuration, otherwise we have a problem !
>
> There are very few pieces of information that are not to be replicated, because
> they belong to the local server (the three I mentioned : IP, port, and
> name). That means we need to find a better way to store that information.
> And if we store them in the DIT, then this part of the DIT must *not* be
> replicated.
>
>
Partial and fractional replication will come in handy here: we say
this attribute or this entry does not get replicated, and it doesn't. That
simple.


> If this is what you were thinking of, then yes, I'm on the same page. Maybe
> we should put that information somewhere other than the ou=config
> subtree (root DSE ?). We could specifically address those points.
>
>
That's not something I'm feeling good about. We're going to change how we
laid out the organization of configuration just for replication?
Replication needs to advance to take these considerations into account
instead of us injecting one-offs to handle these replication scenarios.


> I was pretty much thinking that we could store that information in a
> plain text file, but that would be a bit of overkill, when we can store
> it in the DIT too. Maybe storing that information into the ou=config
> entry could be the right thing to do, assuming that ou=config is not
> replicated (we will only replicate what's under ou=config, ie, its children)
>
>
Please please please let's not fuck with this. This is the worst idea I've
heard of yet. We don't need another one-off here.


>
>
>>
>> I believe if we're gonna provide some way to replicate some ApacheDS
>> instance's runtime layout, we should not do it by LDAP replication.
>> Because :
>>
>> - It is not just the config partition that manages the server anymore.
>> - We just can't be sure that 2 servers having the same configuration
>> will have the same runtime behavior, because they might have different
>> compositions of bundles.
>> - Replicating ou=config is hell of a job because of site-specific
>> configuration parameters.
>>
>>> Unless the server can detect which configuration applies to itself.
>>>
>>> Bottom line, there is a very small set of information a server needs to
>>> keep local :
>>> - its IP address
>>> - its port
>>> - its name
>>>
>>> The name will be used to identify the configuration that applies to the
>>> local server.
>>>
>> I don't believe simply keeping those 3 pieces of information local will help make
>> replicating configuration information across multiple nodes consistent.
>> When thinking in terms of a whole system, there will always be some more
>> local configuration points other than IP/port/hostname IMO.
>>
> See above for the suggestion I made about using the ou=config entry but not
> replicating it. We could even extend the ObjectClass used to store the 3
> pieces of information to store more, if needed. ATM, though, this is all
> we need locally. Everything else can be safely replicated, assuming
> that the code that reads and handles the configuration knows that it must be
> combined with the local configuration.
>
>
>
> --
> Regards

Re: Question about Replication of Config Partition and Schema Partition

2012-07-11 Thread Alex Karasulu
On Wed, Jul 11, 2012 at 12:51 PM, Göktürk Gezer wrote:

>
> On Jul 11, 2012 12:13 AM, "Alex Karasulu"  wrote:
> >
> >
> >
> > On Tue, Jul 10, 2012 at 1:54 PM, Göktürk Gezer 
> wrote:
> >>
> >> Hi Everyone,
> >>
> >> I would like to know if the config and schema partitions of one server
> node can be candidates for replication? If replication of some ApacheDS
> instance will also clone its config and schema partitions, we have a little
> problem because of the randomly assigned OIDs of component factories. So
> let's say ApacheDS1 and ApacheDS2 both have the same set of interceptors;
> because of the nature of OSGi those can be introduced in a different order on
> two different server nodes, which results in the schema partitions having
> different OID assignments for the same components across the 2 server nodes.
> >>
> >
> > Right now I would not worry about replication. You can solve this
> problem later. Just focus on your part functioning properly. This will be a
> problem we will need to solve anyway. Plus replication is not really there
> in a dependable way.
> >
> This kind of thing has caused me to reimplement the OSGi layer several times. I
> have some strange feelings, and this will be one of them. Because of the things
> we discussed with Emmanuel, it may not be so easy later. Just to be sure
> that I won't rewrite completely again in the near future, I'll review some
> replication scenarios, and study some mechanisms which people use to
> implement replication these days, like Apache ZooKeeper and Cellar.


That's wise. Also to design for flexibility just in case.


> I'll then share my findings on this thread.
>
>
Thanks.


> >>
> >> This is something we've postponed to discuss later,
> >
> >
> > Exactly what I started writing above.
> >
> >>
> >> it's not a concern for a single server, but in a replication scenario I'd
> like to know how this affects consistency between distinct server nodes.
> I'm not so sure what is being replicated in our implementation, some
> partition or entire server with all of its runtime components and
> configuration?
> >>
> >
> > Eventually the replication mechanism will need to support partial and
> fractional replication. It's way oversimplified and does not have the
> control structures for us to properly configure it in a fine grained
> manner.
> >
> ZooKeeper based replication can help us out here with some new replication
> mechanism with lots of control. I'll give it a look.
>
> > This will need to change of course but I'd really like to review
> replication after we finish with the Txn stuff because then it will make
> internal replication handling much clearer for us then.
> >
> Sure, it is primary. What I am trying to get is some early picture of the
> replication system's effect on the configuration partition, so that I can make
> sure this and OSGi do not clash. These concerns may not look so
> important, but they are. They are some nasty side effects of keeping
> configuration in a partition rather than an externally managed store.
>
>
Cool, what you say is very reasonable. My personal approach is to make it
really flexible in case stuff like this does pop up later. The game never
ends, it's all about being able to react to new complications without
having to change everything you wrote.


> > Does this make sense?
> >
> > --
> > Best Regards,
> > -- Alex
> >
> Regards,
> G
>
>


-- 
Best Regards,
-- Alex


Re: Question about Replication of Config Partition and Schema Partition

2012-07-10 Thread Alex Karasulu
On Tue, Jul 10, 2012 at 1:54 PM, Göktürk Gezer wrote:

> Hi Everyone,
>
> I would like to know if the config and schema partitions of one server node
> can be candidates for replication? If replication of some ApacheDS instance
> will also clone its config and schema partitions, we have a little problem
> because of the randomly assigned OIDs of component factories. So let's say
> ApacheDS1 and ApacheDS2 both have the same set of interceptors; because of the
> nature of OSGi those can be introduced in a different order on two different
> server nodes, which results in the schema partitions having
> different OID assignments for the same components across the 2 server nodes.
>
>
Right now I would not worry about replication. You can solve this problem
later. Just focus on your part functioning properly. This will be a problem
we will need to solve anyway. Plus replication is not really there in a
dependable way.


> This is something we've postponed to discuss later,
>

Exactly what I started writing above.


> it's not a concern for a single server, but in a replication scenario I'd like
> to know how this affects consistency between distinct server nodes. I'm not
> so sure what is being replicated in our implementation, some partition or
> entire server with all of its runtime components and configuration?
>
>
Eventually the replication mechanism will need to support partial and
fractional replication. It's way oversimplified and does not have the
control structures for us to properly configure it in a fine grained
manner.

This will need to change of course but I'd really like to review
replication after we finish with the Txn stuff because then it will make
internal replication handling much clearer for us then.

Does this make sense?

-- 
Best Regards,
-- Alex


Re: Txn discussion

2012-06-10 Thread Alex Karasulu
On Sat, Jun 9, 2012 at 6:22 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> independently from the ongoing work on the txn layer, I'd like to start a
> thread of discussion about the path we selected, and the other possible
> options.
>
> Feel free to express your opinion here. I'll create a few items I'd like
> to see debated.
>
> 1) Introduction
>
> We badly need to have a consistent system. The fact is that the current
> trunk - and I guess this is true for all the releases we have done so far -
> suffers from some serious issues when multiple modifications are done during
> searches. The reason is that we depend on a BTree implementation that
> exposes a data structure directly reading the pages containing the data,
> expecting those pages to remain unchanged in the long run. Obviously, when
> we browse more than one entry, we are likely to see a modification changing
> the data...
>
> 2) txn layer
>
> There are a few way to get this problem solved :
> - we can have a MVCC backend, and a protection against concurrent
> modifications. Any read will always succeed, as each read will use a
> revision and only one.
> - we can also read the results fast and store them somewhere, blocking the
> modification until the read is finished.
> - or we can keep a copy of the modified elements within the original
> elements, until the seraches that use those elements are finished.
>
> (there are probably some other solutions, but I don't know them)
>
> AFAICT, the transaction branch is implementing the third solution, keeping
> the copy of modified elements in memory, so that they can be sent back to
> the user.
>
> None of those solution are free of drawbacks.
>
>
Right now we're adding the foundations, so of course there will be issues
initially. There are several techniques we can use to mitigate
the problems.


> I think that the first approach, even if it implies we force a
> serialization of the writes, is the best solution. The rationale, AFAICT, is
> that we don't have to deal with the way the backend keeps versions of
> elements; this is not our business. Plus, keeping the writes serialized
> guarantees that we won't compromise the backend.
>
>
As Selcuk already pointed out, you will need the same machinery to do this
below inside the partition. It will lead to the same problems.


> At this point, I'd like we discuss all those options, whatever we are
> currently working on.
>
> 3) cross-partition vs single partition protection
>
> Atm, we are working on a cross partition system. That means we protect all
> the partitions at the same time : moving an entry from one partition to
> another one will be done completely, or reverted.
>
> I'm not sure we need such a feature. I don't see what it brings, and even
> if it brings some advantages, I'm not sure we need such a feature now.
>
>
I'm in complete disagreement. There are several reasons why we need to do
this across partitions:

* First, keeping partitions simple: handling these semantics inside each
partition would make new partitions far too difficult to implement
* Aliases working across partitions
* Implementing views and being able to have editable views
* Centrally rooted partition
* Nestable partitions
* ACID across partitions
* Better means to integrate with the HBase partition
* Better cache management
* Better means to handle snapshotting and rollback
* Clear transaction boundaries even when changes span partitions, which
makes replication easier to handle.

Say goodbye to a lot of these benefits if we do not do this.

> Not having to add a txn layer above the partitions is way easier to
> implement.
>
>
Probably easier, but not that much easier. We will need the same machinery
if this is to work at the partition level, and the machinery will have to be
implemented separately for each partition.


> Here, too, I'd like us to discuss our options, and the pros and cons of
> using a txn layer on top of single partitions instead of
> multiple partitions.
>
>
>
I'm completely against this move as I think it will cause us more problems
than the ones we can fully solve right now. We just need patience.

If Emmanuel you don't have time to deal with this painful merge, perhaps
Selcuk and I can handle doing the merge?


>
> ok, this is probably enough elements we have to discuss. You turn :)
>
>
I understand there are hairy issues. However, realize that this is an
incomplete state and that we do have ways to handle all the
problems. Selcuk provided some excellent solutions in this thread.

To back out now would be a massive mistake. It would also curtail the
growth and progress of the server in the ways described in our application
document. This single decision here would be one of the worst we've ever
made if we decide to back out at this stage.

FYI I'm going to be on the road for the next 48-72 hours. Will still try to
respond to this thread.


-- 
Best Regards,
-- Alex


Re: Txn branch merge : headsup

2012-06-09 Thread Alex Karasulu
On Fri, Jun 8, 2012 at 8:58 PM, Emmanuel Lécharny wrote:

> Le 6/8/12 6:50 PM, Selcuk AYA a écrit :
>
>> Hi Emmanuel,
>> if you can give me information on how I can access your ongoing work,
>> I can help you with the AVL changes and the change to move xdbm to core and
>> core-api. Just let me know how and when I can start playing around
>> with it.
>>
>
> I wish I could :/
>
> All the changes I have made so far are in my workspace, which is a
> checkout of apacheds/trunk... If I commit that, trunk will be doomed. I
> need a couple of hours to find the best way to get all my files moved to a
> branch, while still keeping all the conflicts.
>

You can shelve your work in SVN like so:

http://markphip.blogspot.com/2007/01/shelves-in-subversion.html
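
In short, the trick described there (a sketch; the shelf branch URL below is made up):

    # Create a private branch from trunk, server-side, without touching the
    # working copy:
    svn copy https://svn.apache.org/repos/asf/directory/apacheds/trunk \
             https://svn.apache.org/repos/asf/directory/apacheds/branches/elecharny-shelf \
             -m "Create a shelf branch for work in progress"

    # Switch the modified working copy to the new branch; local edits (and
    # conflicts) are preserved by svn switch:
    svn switch https://svn.apache.org/repos/asf/directory/apacheds/branches/elecharny-shelf

    # Commit the shelved work to the branch, leaving trunk untouched:
    svn commit -m "Shelve in-progress merge work"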


-- 
Best Regards,
-- Alex


[jira] [Commented] (DIRAPI-89) EntryCursorImpl loops forever in next() when using AD Server with referrals

2012-06-07 Thread Alex Karasulu (JIRA)

[ 
https://issues.apache.org/jira/browse/DIRAPI-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291445#comment-13291445
 ] 

Alex Karasulu commented on DIRAPI-89:
-

AD Server = Active Directory Server, or Apache Directory Server?

> EntryCursorImpl loops forever in next() when using AD Server with referrals
> ---
>
> Key: DIRAPI-89
> URL: https://issues.apache.org/jira/browse/DIRAPI-89
> Project: Directory Client API
>  Issue Type: Bug
>Affects Versions: 1.0.0-M12
> Environment: Mac
>Reporter: Dave Briccetti
>  Labels: ActiveDirectory, cursor
>
> Search for a user that doesn’t exist on AD. Cursor provides four referrals, 
> but no SearchResultDone.
> val searchArg = 
> "(&(objectclass=%s)(samaccountname=%s))".format(s.objectClass, user)
> val cursor = connection.search(s.baseDn, searchArg, SearchScope.SUBTREE, "*")
> next() hangs in this loop ending on line 102:
> while ( !( response instanceof SearchResultDone ) );
> Using this code instead causes a SearchResultDone to appear after three 
> referrals:
> val searchRequest = new SearchRequestImpl().setBase(new Dn(s.baseDn))
>   .setFilter(searchArg).setScope(SearchScope.SUBTREE).addControl(new 
> ManageDsaITImpl())
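
For reference, the same workaround rendered in plain Java against the LDAP API (a sketch that assumes the connection, baseDn and searchArg values from the snippet above):

    // Sketch: adding the ManageDsaIT control makes the server return the
    // referral entries as ordinary entries and terminate the search with a
    // proper SearchResultDone.
    SearchRequest req = new SearchRequestImpl();
    req.setBase( new Dn( baseDn ) );
    req.setFilter( searchArg );
    req.setScope( SearchScope.SUBTREE );
    req.addControl( new ManageDsaITImpl() );
    SearchCursor cursor = connection.search( req );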





Re: Problem of JDBM being hard(impossible) to reconfigure once it is initialized

2012-05-23 Thread Alex Karasulu
On Wed, May 23, 2012 at 1:36 AM, Emmanuel Lécharny  wrote:
> Le 5/23/12 12:01 AM, Selcuk AYA a écrit :
>
>> I do not know the OSGi jargon but I believe, at the end, changing
>> these should reduce to something like this:
>>
>> 1) quiesce the necessary ldap operations: The simplest thing to do
>> would be to block all operations and quiesce all the outstanding
>> operations. In some cases, it would be enough to block only searches.
>> There might be some cases where you do not need any blocking at all. I
>> do not
>
>
> I'm not comfortable with the idea of quiescing the ldap operations. The
> server is supposed to work 24x7, and if we do add some way to modify the
> backend config on the fly, then the user should not notice that.

+1

I agree with Emmanuel here. Such changes should not interrupt the
operation of the server.

> See my comments below.
>
>>
>> 2) do the configuration: In the end I am assuming the configuration is
>> delegated to the component itself to do the reconfig.
>>
>> 3) unblock the operations.
>>
>>
>> For index add:
>>
>> * queisce any ldap operation that might modify the data. Then notify
>> the partition to add the index. Partition will scan the master table
>> and build the index and swap its indexed attributes. Then unlock the
>> operations.
>
> Adding an index should be done while the server is still able to process
> requests. The thing is that the newly added server should not be available
> until it has been created.

I guess you mean the newly added index here above.

> The real problem is that we won't be able to
> parse the master table fast enough to have all the subsequent modifications
> into it, so this is a two-step operation :
> - do a full scan search on the master table, and create the additional
> index. We should be safe here, because we will use the current version of
> the master table when we start the search.
> - then for every modification done after the beginning of step 1, apply
> those changes to the index. If we have some more changes since we started
> this step, then do it again.
>
> Now, we can use this index.

OK let me see if I understand. To add an index we:

1). Note the version (V0) when the request to build the index arrives.
Start a full master table scan at version V0 to build the new index.

2). Once the index has been built for this version acquire an
exclusive write lock to prevent writes.

3a). If no changes occurred while building the index (meaning we're
still at V0), then we enable the index and release the exclusive write
lock.

3b). If changes have occurred while building the index, then we update
the index with the changes made between V0 and V1. Then we enable the
index and release the exclusive write lock.
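
Roughly, in code (every type below is a hypothetical stand-in, not an actual ApacheDS interface):

    interface Entry {}
    interface Change {}

    interface NewIndex
    {
        void add( Entry entry );
        void apply( Change change );
    }

    interface VersionedPartition
    {
        long getCurrentVersion();
        Iterable<Entry> scanMasterTableAt( long version );
        Iterable<Change> changesBetween( long fromVersion, long toVersion );
        void acquireExclusiveWriteLock();
        void releaseExclusiveWriteLock();
        void enableIndex( NewIndex index );
    }

    class OnlineIndexBuilder
    {
        // 1) note V0 and build the index from a scan of the V0 snapshot,
        //    without blocking writers; 2) take the exclusive write lock;
        //    3a/3b) catch up on any V0..V1 changes, publish, release the lock.
        void addIndexOnline( VersionedPartition partition, NewIndex index ) throws Exception
        {
            long v0 = partition.getCurrentVersion();

            for ( Entry entry : partition.scanMasterTableAt( v0 ) )
            {
                index.add( entry );
            }

            partition.acquireExclusiveWriteLock();
            try
            {
                long v1 = partition.getCurrentVersion();

                if ( v1 != v0 )
                {
                    for ( Change change : partition.changesBetween( v0, v1 ) )
                    {
                        index.apply( change );
                    }
                }

                partition.enableIndex( index );
            }
            finally
            {
                partition.releaseExclusiveWriteLock();
            }
        }
    }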

>>
>> For index delete:
>> * block all operations, and remove the index from the indexes list.
>> Maybe commit this change in the partition config. Unlock changes. Then
>> continue deleting the index file.
>
>
> Here, it's simpler. The operations using this index will continue to use it
> until they are done. New operations won't be allowed to use it. Once the
> number of requests using this index is 0, then the index can be deleted. I
> don't think we need to block any operation here.

We just need an index state. Mark the index as deleted or offline so
that, while we are waiting for the other operations to finish using it,
any new search that comes in simply does not use the index. So we need
some index state information.
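
Something like this would be enough (a sketch built on a simple synchronized state holder; none of these names are actual ApacheDS types):

    // An index is only handed to new searches while ONLINE; once marked
    // DELETED, its files are removed when the last in-flight user releases it.
    class ManagedIndex
    {
        enum State { ONLINE, DELETED }

        private State state = State.ONLINE;
        private int users;

        synchronized boolean tryAcquire()
        {
            if ( state != State.ONLINE )
            {
                return false;          // new searches skip this index
            }
            users++;
            return true;
        }

        synchronized void release()
        {
            if ( --users == 0 && state == State.DELETED )
            {
                deleteFiles();         // last user is done, safe to delete
            }
        }

        synchronized void markDeleted()
        {
            state = State.DELETED;
            if ( users == 0 )
            {
                deleteFiles();
            }
        }

        private void deleteFiles()
        {
            // remove the index db files from disk here
        }
    }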

>>
>>
>> For suffixdn change:
>>
>> * I believe this is a rename operation. Again, block all operations and
>> do the rename. Then unblock.
>
> changing the suffix DN is just a matter of modifying the configuration, and
> the DnNode structure. The problem is that we rebuild all the entries' DNs
> using the partition suffixDn, so we have to be careful when doing so. The
> requests being processed when the rename occurs should be fulfilled with the
> initial suffixDn.

+1

>>
>> Working directory change:
>> * Block all operations and let the partition change its config dir. For
>> jdbm, this would be a copy of its working directory.
>
>
> Here, this is an administrative task. Not sure we want to do that on a
> running server...

What other option is there?

You should not have to stop and re-start the server just to rename a
partition suffix. We simply need to mark the partition (as we did for
indices) as offline, waiting until all in-progress operations on that
partition complete. New requests should not be processed for it; maybe
we can return an unwillingToPerform result response.

Then bring the partition online under the new suffix after doing the copy
and making all the needed changes. This should be a really fast
operation because the file system operation is a rename. The other
suffix configuration changes are point changes. It should not take
long regardless of the size of the partition.
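
Sketched in code, the whole sequence is short (the partition API below is hypothetical; only java.nio.file.Files.move() is a real call):

    import java.nio.file.Files;
    import java.nio.file.Path;

    interface OfflinablePartition
    {
        void setOffline();          // new requests get unwillingToPerform
        void awaitQuiescence();     // blocks until in-flight operations drain
        Path getPartitionPath();
        void setPartitionPath( Path path );
        void setSuffix( String suffixDn );
        void setOnline();
    }

    class SuffixRenamer
    {
        void renameSuffix( OfflinablePartition partition, String newSuffix, Path newPath )
            throws Exception
        {
            partition.setOffline();                              // stop accepting new requests
            partition.awaitQuiescence();                         // drain what's in flight

            Files.move( partition.getPartitionPath(), newPath ); // cheap: a filesystem rename
            partition.setPartitionPath( newPath );
            partition.setSuffix( newSuffix );                    // point configuration change

            partition.setOnline();                               // back in business
        }
    }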

-- 
Best Regards,
-- Alex


Re: Problem of JDBM being hard(impossible) to reconfigure once it is initialized

2012-05-20 Thread Alex Karasulu
On Wed, May 16, 2012 at 6:58 AM, Göktürk Gezer  wrote:
> Hi Everyone,
>
> As I told you in the OSGi branch update, JDBM is very much immutable at
> runtime. Almost every setter calls the checkInitialized() method first, which
> throws an exception when it's already initialized.

This may be unavoidable but I am not qualified to make this call. At
the present moment I am not familiar enough with the internals to
understand how each setting would impact a reconfiguration. It may be
trivial with some settings, cache settings for example, by changing
just a few things in JDBM.

In the worst case, if reinitialization with re-configuration must take
place, then the proper management can be handled. For example, changing
the file/directory path for the record manager can occur by moving the
actual files after the shutdown, before the re-configuration, and then
restarting the record manager service with the new setting for the db
file. Just giving this as one example. The other settings might require
less management overhead.

> We have to change that behavior in as many aspects of it as we can.
> Otherwise there is no point in going into OSGi. However, I can't always be
> sure what my changes might lead to at runtime. I need some serious help here,
> especially from those actively working on JDBM; otherwise I'll jump into a
> trial-and-error cycle which will probably last long.
>
> Here is the current configuration points for JdbmPartition:

OK, on the JDBM partition, unlike jdbm internals proper, I do have some
suggestions. Some settings are not that difficult to deal with on
re-configuration events.

These should be easier to handle:

> cacheSize(int)
> optimizerEnabled(boolean)
> syncOnWrite(boolean)

These below I suspect will impose many more management issues to
properly reconfigure on the fly due to file movements and the creation
of new indices (deletion of old indices).

> indexedAttributes(List)
> partitionPath(File)
> suffixDn(Dn)
>
> Currently none of them is reconfigurable, once the partition is initialized.

Right.

> However I need to know which among these are actually reconfigurable,
> and which are not really reconfigurable, and why.
>
> Every reconfiguration will invoke some setter method at the desired
> configuration point. So I guess we can make these reconfigurations work
> based on whether the component is initialized or not, rather than throwing
> an exception blindly.

Of course we want re-configuration to be hot because that's really
cool. Sometimes you just don't have the option of shutting down the
server. Then there are software updates to the actual component, and
that adds yet another dimension of problems to hot reconfiguration. In
the case of the JDBM partition, some changes that alter the structure
will make the jdbm db files no longer compatible. So hot deploys of new
versions are not easy, or may require long-running changes.

Another option is to handle some of these long-running reconfiguration
or software component (bundle) update events not hot (live) while the
server is up and running, but after a restart. However, I do not
recommend this. There are easy ways to mitigate all these problems.

> For your information, as we have lifecycle control over the
> components created from within ApacheDS, we can implement some fancy stuff.
> Like re-instantiating a component when one of its immutable properties is
> changed, or stopping it and removing it from the DirectoryService when one
> of its immutable properties is changed, to prevent accepting operations
> while reconfiguring. These might be implemented in case there is "no
> possibility!" to handle reconfigurations on a live, initialized
> JdbmPartition reference.

This is the difference between two different modes of reconfiguration:

(1) An "in-place" re-configuration event recovery by a reconfigured
component while it continues to operate.
(2) A component instance swap-out, where the reconfiguration causes a
new instance of the component to be constructed and configured with
the new settings while on standby. Until the standby is started and
operational, the old component continues to operate. Then the old one
is shut down after the swap-out.

Overall the #2 approach is much better in my opinion. Why? Because you
can never predict the outcome of in-place modifications to settings.
Changing the component directly may not be something the bundle
developer accounted for, right?

Of course there are the events propagated by the OSGi container when
such a change does occur. I imagine these events are fired and then the
actual getter/setter configuration method is invoked in the framework.
Is this correct to presume?

I think with option #2 we can be more certain that reconfigurations
are better accounted for by bundle developers.
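
A sketch of what approach #2 boils down to (Component, ComponentFactory and the settings object are made-up names for illustration):

    import java.util.concurrent.atomic.AtomicReference;

    interface Component
    {
        void start() throws Exception;
        void stop() throws Exception;
    }

    interface ComponentFactory<C extends Component>
    {
        C create( Object newSettings );
    }

    class SwapOutReconfigurer
    {
        // Build and start a standby with the new settings, atomically swap it
        // in as the live instance, then retire the old one.
        static <C extends Component> void reconfigure(
                AtomicReference<C> live, Object newSettings, ComponentFactory<C> factory )
                throws Exception
        {
            C standby = factory.create( newSettings ); // fresh instance, new config
            standby.start();                           // old instance still serving
            C old = live.getAndSet( standby );         // swap in the standby
            old.stop();                                // shut down after swap-out
        }
    }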

-- 
Best Regards,
-- Alex


Re: [jira] [Created] (DIR-285) life issues

2012-05-20 Thread Alex Karasulu
Looks like I need some help from Dr. Jadeja with my love life. Nice!

On Sun, May 20, 2012 at 8:53 PM, musa jadeja (JIRA) <
dev@directory.apache.org> wrote:

> musa jadeja <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=drjadeja> created
> DIR-285 <https://issues.apache.org/jira/browse/DIR-285>
> *life issues* <https://issues.apache.org/jira/browse/DIR-285>
> *Issue Type:* Improvement   *Assignee:* Alex
> Karasulu <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=akarasulu>
> *Created:*  20/May/12 17:53   *Description:*  Strong love spells
> +27764071887
> How many times have you:
> .Been in love with a man who didn't love you back?
> .Thought your relationship was perfect, and then it fell apart?
> .Been scared because you didn't know how to fix your crumbling
> relationship or marriage?
> Wished you could be smarter about dating?
> Do you have to be Beautiful to Win a Man's Love?
> You don't know it yet, but what's been missing is the foundation for a
> rock-solid relationship. Without a foundation, you're just sitting on sand
> and the first wave that comes along will wash away everything, no matter
> how solid you thought it was.
> .I was married for 29 years. I thought I had a great marriage. Then, he
> decided we should have an open marriage. Can you imagine? I didn't want to
> lose my marriage that I valued so much, but there was just no way could I
> be okay with what he was asking…. So here's what I did. Instead of licking
> my wounds, I went into action… I Used Dr. Jadeja love spell we are now back
> again… Becky Sanders, Australia.
> Call Dr Jadeja on +27764071887
> Skype, drjadeja99
> Yahoo messenger, drjadeja99
> drjadej...@yahoo.com
> lostlove-psychic.webs.com
>   *Environment:*  Strong love spells +27764071887   *Project:*  
> Directory<https://issues.apache.org/jira/browse/DIR>
> *Labels:*  security   *Priority:* Major   *Reporter:*  musa
> jadeja<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=drjadeja>
> *Original Estimate:*  72h   *Remaining Estimate:*  72h   This message
> is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators<https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa>
> .
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>



-- 
Best Regards,
-- Alex


Re: Txn branch merge

2012-05-15 Thread Alex Karasulu
Good luck. Thanks for the heads up Emm.

On Tue, May 15, 2012 at 12:21 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> just FYI, I'm currently trying to merge the txn branch into trunk. Not
> sure Il'll be able to do it completely today, but I'll try.
>
> Thanks !
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


[VOTE] Closing "Release of ApacheDS (2.0.0-M7)" (was [VOTE] Release of ApacheDS (2.0.0-M7))

2012-05-14 Thread Alex Karasulu
This vote has closed: 72 hours are up. Emmanuel is dealing with some family
related matters and Pierre is on vacation so I'm closing it on Emmanuel's
behalf.

+1 binding votes
-
Alex Karasulu
Pierre A. Marcelot
Felix Knecht
Kiran Ayyagari

Thanks Emmanuel for your hard work. This release was not the easiest one
due to the various issues that popped up.

On Fri, May 11, 2012 at 1:12 PM, Emmanuel Lécharny wrote:

> Hi,
>
> I'd like to propose a new milestone release (2.0.0-M7) for Apache
> Directory Server.
>
> Since the last milestone we have fixed some serious issues in the index
> handling.
>
> Here are the associated links :
>
> ApacheDS 2.0.0-M7
> -
> - SVN tag r1336754: https://svn.apache.org/repos/asf/directory/apacheds/tags/2.0.0-M7
> - Nexus repository: https://repository.apache.org/content/repositories/orgapachedirectory-076/
> - Distribution packages: http://people.apache.org/~elecharny
>
> Here are the release notes for all these sub-projects:
>
> ApacheDS 2.0.0-M7
> =
> Bugs :
> --
> - [DIRSERVER-1093] - the ResourceRecordEncoder and QuestionRecordEncoder
> have bug for empty domainName:(
> - [DIRSERVER-1697] - Creation of new syntax fails due to ERR_277 Attribute
> m-obsolete not declared in objectClasses of entry
> - [DIRSERVER-1698] - Search on entries with multiple AVA in RDN does not
> work correctly if the initial RDN order is not used
> - [DIRSERVER-1702] - Adding an index through annotation does not work
> - [DIRSERVER-1712] - If the index are created using their alias, they are
> deleted immediately
>
> Improvements :
> -
> - [DIRSERVER-1711] - Index initialization is taking way too much time
> - [DIRSERVER-1713] - Error on console with first start of clean system
>
> Let's vote now:
> [ ] +1 | Release Apache Directory Server 2.0.0-M7
> [ ] +/-0 | Abstain
> [ ] -1 | Do NOT release Apache Directory Server 2.0.0-M7
>
> Thanks !
>



-- 
Best Regards,
-- Alex


Re: [VOTE] Release of ApacheDS (2.0.0-M7)

2012-05-11 Thread Alex Karasulu
On Fri, May 11, 2012 at 1:12 PM, Emmanuel Lécharny

[SNIP]

Let's vote now:
> [X] +1 | Release Apache Directory Server 2.0.0-M7
>

-- 
Best Regards,
-- Alex


Re: Release troubles and failing tests

2012-05-10 Thread Alex Karasulu
I am in agreement with Selcuk's analysis. I did not anticipate just how nasty
the inconsistency handling would get.

On Thu, May 10, 2012 at 8:18 PM, Selcuk AYA  wrote:

> On Thu, May 10, 2012 at 5:51 AM, Emmanuel Lécharny 
> wrote:
> > Le 5/10/12 9:58 AM, Emmanuel Lécharny a écrit :
> >
> >> Le 5/10/12 7:57 AM, Selcuk AYA a écrit :
> >>>
> >>> The problem seems to be caused by the test
> >>> testPagedSearchWrongCookie(). This tests failure in paged search by
> >>> sending a bad cookie. After failing, it relies on ctx.close() to
> >>> clean up the session. Cleanup of the session will close all the cursors
> >>> related to paged searches through the session.
> >>>
> >>> It seems that somehow ctx.close does not result in an unbind message
> >>> at the server side from time to time. I do not know what causes this but
> >>> this leaves a cursor open (specifically a NoDups cursor on the rdn index).
> >>> Eventually as changes happen to the Rdn index, we run out of freeable
> >>> cache headers. After ignoring this test, pagedsearchit and searchit
> >>> pass fine together. It would be good to understand why the arrival of
> >>> the unbind message is a hit and miss case in this test.
> >>
> >>
> >> It's absolutely strange... Neither an UnbindRequest nor an AbandonRequest
> >> is sent by JNDI when closing the context, which is a huge bug.
> >>
> >> I have checked the other tests, and an Unbind request is always sent when
> >> we close the context, except when we get an UnwillingToPerform exception.
> >> It seems like the context is in a state where it considers that no unbind
> >> should be sent after an exception. Although I can do a lookup (and get back
> >> the correct response from the server after this exception), the connection
> >> is still borked :/
> >>
> >> I'll try to rewrite the test using our API to see if it works better, and
> >> investigate with some Sun guys to see if there is an issue in JNDI.
> >>
> >>
> >>
> > Ok, we have had a long discussion with Alex about this problem...
> >
> > The thing is that even for a standard PagedSearch, where everything goes
> > fine (ie, when the client is done, he has correctly closed the connection,
> > which sends an UnbindRequest, which closes the cursor etc), we may have
> > dozens of open cursors for some extended period of time.
> >
> > At some point, we may have an exhausted cache, with no way to evict any
> > elements from it, leading to a server freeze.
> >
> > Not something we can accept from an LDAP server...
> >
> > A suggestion would be to add some parameter in the OperationContext
> > telling the underlying layer that a search is done outside of any
> > transaction. When we fetch an ID from an index, and we try to get the
> > associated Entry from the master table, if we get an error because the ID
> > does not exist anymore, then we should just ignore the error, and continue
> > the search.
> >
> > But we still want to be sure that in some cases, inside the server, we
> > still can have transactions over some searches.
> >
> > Thoughts ?
> >
>
> I don't think having a non-transactional search is a good idea. I agree
> there is a problem with non-closed cursors but I don't think this is
> the right way to solve it. We currently do not have transactions for
> the search, but a cursor over the jdbm B-tree gets a snapshot view.
> This snapshot view is not only for getting a snapshot view of the data
> but also of the structure itself. If you do not have this (and on top of
> this if you don't have txns):
>
>  - you will have to deal with inconsistencies in the Btree data structure
>  - you might get data as NULL from the Btree and you might have to
> deal with it. Or you might have to deal with cases like you counted 10
> children but you actually end up with 9 children while doing a DFS
> search over your data structure. This might look easy but I think it is
> not.
>  - you might get not only stale data but complete garbage. This
> garbage might confuse the code completely (for example if the garbage
> you read was supposed to be a Btree redirect).
>
> Code from the ldap protocol handlers down to search is written in a way
> assuming cursors get consistent data. I don't think it is impossible to
> write code expecting all kinds of inconsistencies, but it is very
> difficult and the code will be brittle.
>
>
> As for the paged search, one way to deal with it would be to read all
> the data from the cursors at the beginning of the paged search and
> close the cursor. This would be similar to a normal search. If we get
> worried about the memory consumption of this, the entries to be returned
> could be spilled over to temp files. You might say this might lead to
> temp files that are never reclaimed, but if there are not many of them
> then it's no big deal. Users are supposed to deal with cleaning up their
> contexts. Not doing so is similar to opening file handles or socket
> connections and never closing them. Such things are bound to create
> problems.
>
>
> >
> >
> > --
> > Regards,
> > Cordialement

Re: Implementing Kerberos on top of LDAP extended operations - contd.

2012-05-06 Thread Alex Karasulu
On Sun, May 6, 2012 at 8:56 PM, Aleksander Adamowski
 wrote:
> Hi!
>
> Resurrecting the old thread about integrating Kerberos with LDAP (
> http://thread.gmane.org/gmane.comp.apache.incubator.directory.devel/24181
> ), I'd like to share my recent progress in pursuing this idea.
>
> As I wrote in my blog ( http://olo.org.pl/dr/krbldap_thesis ), as a
> subject of my master's thesis, I've made a proof of concept
> implementation that demonstrates the idea in a working form. I've also
> given a nice short name to the resulting combined protocol - KrbLDAP.

Nice work. I went through your thesis as well.

> The thesis (available at
> https://olo.org.pl/files/masters_thesis/Praca_Magisterska-Aleksander_Adamowski-A_new_secure_authentication_concept.pdf
> ) presents the rationale behind my proposal and describes a proof of
> concept implementation (whose code I've made available on Github:
> https://github.com/aadamowski ). More information in my aforementioned
> blog post.
>
> During work on this, as a side effect, I've discovered several
> interoperability issues between MIT libkrb5 client and Apache DS's KDC
> implementation.

I looked at your workarounds for some of the issues. It's obvious from
your knowledge, and from how you solved the padata issue, that you're more
than competent with our code base as well as with the LDAP & Kerberos
protocols. I highly advise contributing to the project to make your
KrbLDAP protocol more accessible here at Apache Directory.

> While several issues still remain, some of them have already been
> addressed in the process (without it I wouldn't even be able to
> progress beyond initial message in the Kerberos exchange), e.g.:
> http://thread.gmane.org/gmane.comp.apache.incubator.directory.devel/35632/focus=35687
>
> I suppose that once the interoperability between MIT krb5 and Apache
> DS gets better, my proof of concept test will result in successful
> Kerberos ticket obtainment over KrbLDAP without any needed
> modifications in its code.
>
> Waiting anxiously for your feedback and constructive criticism,
> --
> Best Regards,
>   Aleksander Adamowski
>   http://olo.org.pl



-- 
Best Regards,
-- Alex


Re: Code review here at Apache

2012-05-05 Thread Alex Karasulu
On Sat, May 5, 2012 at 9:45 PM, Emmanuel Lécharny  wrote:
> Le 5/5/12 8:20 PM, Howard Chu a écrit :
>
>> Alex Karasulu wrote:
>>>
>>> On Sat, May 5, 2012 at 11:06 AM, Emmanuel Lécharny 
>>> wrote:
>>>>
>>>>
>>>> Le 5/5/12 9:07 AM, Alex Karasulu a écrit :
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I was surprised to have not known about the existence of this nice tool
>>>>> here for code reviews.
>>>>
>>>>
>>>> You obviously knew that this tool existed, but you forgot
>>>> about it, as you commented on some piece of code we were discussing 6
>>>> months ago...
>>>>
>>>> https://reviews.apache.org/r/14/
>>>>
>>>>
>>>
>>> I cannot believe this. I used it, forgot about it, then thought I
>>> rediscovered it. I must have Alzheimer's disease setting in. I'm
>>> slightly worried now :-).
>>
>>
>> Memory is the second thing to go. I've forgotten what the first was... ;)
>
>
> Do you remember Alzheimer's first name ?
>
> This is how it begins...

Come on guys stop making fun of me. You're hurting my feelings hehehe.

-- 
Best Regards,
-- Alex


Re: Release troubles and failing tests

2012-05-05 Thread Alex Karasulu
On Sat, May 5, 2012 at 1:15 PM, Emmanuel Lécharny  wrote:
> Hi guys,
>
> I'm trying to get the ADS 2.0.0-M7 release done today, but I'm having a hard
> time... Apart from a bug in the maven release plugin (mvn release:rollback
> does not remove the tags, which leads to some very tricky issues with the rat
> plugin...), we still have some random failures in server-integ and
> client-api-tests.
>
> Those failures are well known : timeout in the LRUCache, most certainly due
> to some cursors not being closed.

One way we can approach debugging these unclosed cursors is to use
bytecode weaving, just for testing purposes (not for production or
normal development use).

If we use something like AspectJ to add logging statements to cursor
operations, and to the regions of the code where those operations occur,
we can run your tests and then do the accounting on the log output. We
should be able to find mismatched opened/closed cursors this way.
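
For instance, an accounting aspect along these lines (annotation-style AspectJ with load-time weaving; the pointcut package/type patterns are guesses, not the exact cursor class names):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    import org.aspectj.lang.JoinPoint;
    import org.aspectj.lang.annotation.After;
    import org.aspectj.lang.annotation.Aspect;

    @Aspect
    public class CursorAccounting
    {
        private static final ConcurrentHashMap<String, AtomicLong> OPENED =
            new ConcurrentHashMap<String, AtomicLong>();
        private static final ConcurrentHashMap<String, AtomicLong> CLOSED =
            new ConcurrentHashMap<String, AtomicLong>();

        // Count every cursor construction...
        @After( "execution(org.apache.directory..*Cursor.new(..))" )
        public void opened( JoinPoint jp )
        {
            count( OPENED, jp );
        }

        // ...and every close(); a per-type mismatch at the end of a test run
        // points straight at the leaking cursor type.
        @After( "execution(* org.apache.directory..*Cursor.close(..))" )
        public void closed( JoinPoint jp )
        {
            count( CLOSED, jp );
        }

        private static void count( ConcurrentHashMap<String, AtomicLong> map, JoinPoint jp )
        {
            String type = jp.getSignature().getDeclaringTypeName();
            AtomicLong counter = map.get( type );
            if ( counter == null )
            {
                AtomicLong fresh = new AtomicLong();
                AtomicLong prev = map.putIfAbsent( type, fresh );
                counter = ( prev != null ) ? prev : fresh;
            }
            counter.incrementAndGet();
        }

        // Call at the end of a test run; anything opened more than closed leaked.
        public static void dump()
        {
            for ( Map.Entry<String, AtomicLong> e : OPENED.entrySet() )
            {
                AtomicLong closed = CLOSED.get( e.getKey() );
                long c = ( closed == null ) ? 0 : closed.get();
                if ( c < e.getValue().get() )
                {
                    System.err.println( "Cursor leak? " + e.getKey()
                            + " opened=" + e.getValue().get() + " closed=" + c );
                }
            }
        }
    }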

-- 
Best Regards,
-- Alex


Re: Code review here at Apache

2012-05-05 Thread Alex Karasulu
On Sat, May 5, 2012 at 11:06 AM, Emmanuel Lécharny  wrote:
>
> Le 5/5/12 9:07 AM, Alex Karasulu a écrit :
>
>> Hi guys,
>>
>> I was surprised to have not known about the existence of this nice tool
>> here for code reviews.
>
> You obviously knew that this tool existed, but you forgot about 
> it, as you commented on some piece of code we were discussing 6 months 
> ago...
>
> https://reviews.apache.org/r/14/
>
>

I cannot believe this. I used it, forgot about it, then thought I
rediscovered it. I must have Alzheimer's disease setting in. I'm
slightly worried now :-).

Sorry for the noise.

--
Best Regards,
-- Alex


Re: Replace the previous [VOTE] Release of Apache Directory LDAP API/Shared (1.0.0-M12)

2012-05-05 Thread Alex Karasulu
+1


On Sat, May 5, 2012 at 2:33 AM, Emmanuel Lécharny wrote:

> Hi,
>
> sorry for the duplicate message, but I didn't provide the right SVN tag
>
>
You could have just appended this update to the original VOTE thread. That
would have been fine for me.


> I'd like to propose a new milestone release (1.0.0-M12) for Apache LDAP
> API/Shared. This milestone is needed as it's used by the ApacheDS-2.0.0-M7
> release (which will be cut in a few hours)
>
> We have discovered and fixed 6 bugs and improvements since the last
> release, none of them being serious.
>
> Here are the links for the LDAP API project:
>
> Apache Directory LDAP API/Shared 1.0.0-M12
> --
> - SVN tag: https://svn.apache.org/repos/asf/directory/shared/tags/1.0.0-M12 ,
> revision 1334186
> - Nexus repository: https://repository.apache.org/content/repositories/orgapachedirectory-038/
> - Distribution packages: 
> http://people.apache.org/~elecharny/
>
> I'm going to upload documentation associated with all these projects
> during the week-end.
>
> Here are the release notes for all these sub-projects:
>
> Apache Directory LDAP API 1.0.0-M12
> ---
>
> Bugs :
> --
> - [DIRAPI-82] Loading class file as stream under Windows bugged
> - [DIRAPI-83] - Debug logging causes never ending loop when decoding
> messages
> - [DIRAPI-84] - API fails to preserve the UP name of the attribute in a
> normalized DN
> - [DIRSHARED-53] - Review RequestID processing
> - [DIRSHARED-134] - Missing artifact 
> org.apache.directory.jdbm:apacheds-jdbm2:bundle:2.0.0-M1
> when depending on apacheds-server-jndi
>
>
> Improvements :
> --
> - [DIRSHARED-128] - Update the samba schema with samba version 3 elements
>
>
>
>
> Let's vote now:
> [ ] +1 | Release Apache Directory LDAP API/Shared 1.0.0-M12,
> [ ] +/-0 | Abstain
> [ ] -1 | Do NOT release Apache Directory LDAP API/Shared 1.0.0-M12
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>



-- 
Best Regards,
-- Alex


Code review here at Apache

2012-05-05 Thread Alex Karasulu
Hi guys,

I was surprised to have not known about the existence of this nice tool
here for code reviews.

 https://reviews.apache.org

Before discovering this I was thinking of setting up Gerrit especially if
we think about using Git down the line. This review tool might be something
we can use to collaborate on when having discussions about the code as
well. I wanted to post it here in case others might not be aware of it.

-- 
Best Regards,
-- Alex


Re: Triggers and SPs vs release

2012-05-04 Thread Alex Karasulu
On Fri, May 4, 2012 at 2:14 PM, Emmanuel Lécharny wrote:

> Le 5/4/12 12:20 PM, Alex Karasulu a écrit :
>
>> On Fri, May 4, 2012 at 1:08 PM, Emmanuel Lécharny wrote:
>>
>>  Le 5/4/12 12:02 PM, Alex Karasulu a écrit :
>>>
>>>  On Fri, May 4, 2012 at 11:27 AM, Emmanuel Lécharny wrote:
>>>>
>>>>  Hi guys,
>>>>
>>>>> now that trunks is stable and fast, I try to spend some time to fix the
>>>>> @Ignored Triggers/SP tests.
>>>>>
>>>>> As we moved away from JNDI, it impacted the associated code, and it was
>>>>> never fixed. I think it's about time...
>>>>>
>>>>>
>>>>>  For now I advise ignoring the SP and Trigger code fixes. First because
>>>>>
>>>> the
>>>> MVCC code and transaction subsystem will impact the implementation and
>>>> we
>>>> need to rethink the implementation. After the transaction branch and the
>>>> OSGi branch are merged in to trunk I think it's a good time to consider
>>>> these features again.
>>>>
>>>>  Really, atm, it's just about getting JNDI out of the code.
>>>
>>
>> Well if it's just a matter of getting the tests running yeah it's not a
>> big
>> deal.
>>
>> In terms of the big picture I think all this code needs to be
>> reimplemented. The trigger and SP specifications need to be better
>> defined.
>> Handling chain recursion issues needs to be reconsidered because we've
>> removed the InvocationStack I think or it's not being leveraged.
>> Everything
>> should be gutted IMHO.
>>
> I can't agree more. And you haven't mentionned the AdministrativeModel we
> have to get fixed...
>
>
Right sorry I overlooked that. So this is something we should do all over
from scratch. I think we're all in agreement here.


>
>>
>>  The idea is to have something that works *before* we get the txn code
>>> merged, because then we will have a base to start with.
>>>
>>> Those tests has been @Ignored since 2008 :/
>>>
>>>
>>>
>>>  Yeah that's why I don't think it's worth the time to deal with it. We
>> should just focus on the TxN and OSGi side then reimplement it together.
>>
> There is little I can do regarding those thwo things. My idea was to cut a
> release today or tomorrow, in order to have a stable base for the next
> iteration.
>
>
Right let's get a release out with all these new advances and have our
users enjoy them. Meanwhile we can focus on these other efforts and work
towards getting out subsequent releases with them included.

-- 
Best Regards,
-- Alex


Re: Triggers and SPs vs release

2012-05-04 Thread Alex Karasulu
On Fri, May 4, 2012 at 1:08 PM, Emmanuel Lécharny wrote:

> Le 5/4/12 12:02 PM, Alex Karasulu a écrit :
>
>  On Fri, May 4, 2012 at 11:27 AM, Emmanuel Lécharny wrote:
>>
>>  Hi guys,
>>>
>>> now that trunks is stable and fast, I try to spend some time to fix the
>>> @Ignored Triggers/SP tests.
>>>
>>> As we moved away from JNDI, it impacted the associated code, and it was
>>> never fixed. I think it's about time...
>>>
>>>
>>>  For now I advise ignoring the SP and Trigger code fixes. First because
>> the
>> MVCC code and transaction subsystem will impact the implementation and we
>> need to rethink the implementation. After the transaction branch and the
>> OSGi branch are merged in to trunk I think it's a good time to consider
>> these features again.
>>
> Really, atm, it's just about getting JNDI out of the code.


Well if it's just a matter of getting the tests running yeah it's not a big
deal.

In terms of the big picture I think all this code needs to be
reimplemented. The trigger and SP specifications need to be better defined.
Handling chain recursion issues needs to be reconsidered because we've
removed the InvocationStack I think or it's not being leveraged. Everything
should be gutted IMHO.


> The idea is to have something that works *before* we get the txn code
> merged, because then we will have a base to start with.
>
> Those tests has been @Ignored since 2008 :/
>
>
>
Yeah that's why I don't think it's worth the time to deal with it. We
should just focus on the TxN and OSGi side then reimplement it together.
Like a clean room implementation instead of fixing what's outdated. That's
just my opinion but I don't feel adamantly about it. Just think it's a
wasted effort considering the changes coming soon.


-- 
Best Regards,
-- Alex


Re: Triggers and SPs vs release

2012-05-04 Thread Alex Karasulu
On Fri, May 4, 2012 at 11:27 AM, Emmanuel Lécharny wrote:

> Hi guys,
>
> now that trunks is stable and fast, I try to spend some time to fix the
> @Ignored Triggers/SP tests.
>
> As we moved away from JNDI, it impacted the associated code, and it was
> never fixed. I think it's about time...
>
>
For now I advise ignoring the SP and Trigger code fixes. First because the
MVCC code and transaction subsystem will impact the implementation and we
need to rethink the implementation. After the transaction branch and the
OSGi branch are merged in to trunk I think it's a good time to consider
these features again.

WDYT?

-- 
Best Regards,
-- Alex


Re: Index branch merged into trunk...

2012-05-03 Thread Alex Karasulu
On Wed, May 2, 2012 at 4:44 PM, Emmanuel Lécharny wrote:

> Le 5/2/12 3:21 PM, Alex Karasulu a écrit :
>
>> On Wed, May 2, 2012 at 1:54 PM, Emmanuel Lécharny wrote:
>>
>>  Le 5/2/12 12:33 PM, Alex Karasulu a écrit :
>>>
>>>  On Wed, May 2, 2012 at 12:49 PM, Emmanuel Lécharny wrote:
>>>>
>>>>  Le 5/2/12 9:53 AM, Alex Karasulu a écrit :
>>>>
>>>>>  On Wed, May 2, 2012 at 2:43 AM, Emmanuel Lécharny wrote:
>>>>>>
>>>>>>  Le 5/1/12 3:05 PM, Alex Karasulu a écrit :
>>>>>>
>>>>>>> On Tue, May 1, 2012 at 4:08 AM, Emmanuel Lécharny <elecha...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> o Object scope search (lookup) : 49 880 req/s compared to 23 081 on
>>>>>>>> the
>>>>>>>> previous trunk
>>>>>>>> o One Level scope search (5 entries returned) : 68 715 entries
>>>>>>>> returned
>>>>>>>> per second, compared to 33 120/s
>>>>>>>> o Sub Level scope search (10 entries returned ) : 70 830 entries
>>>>>>>> returned
>>>>>>>> per second, compared to 18 910/s
>>>>>>>>
>>>>>>>>
>>>>>>>> This is great work Emmanuel. Nicely done!
>>>>>>>>
>>>>>>>>  I have some even better results, as of today :
>>>>>>>>
>>>>>>>>  o Object scope search (lookup) : 52 712 req/s compared to 23 081
>>>>>>> on the
>>>>>>> previous trunk
>>>>>>> o One Level scope search (5 entries returned) : 72 635 entries
>>>>>>> returned
>>>>>>> per second, compared to 33 120/s
>>>>>>> o Sub Level scope search (10 entries returned ) : 75 100 entries
>>>>>>> returned
>>>>>>> per second, compared to 18 910/s
>>>>>>>
>>>>>>>
>>>>>>>  This is just sick man! You've more than doubled the performance.
>>>>>>>
>>>>>>>  Some new idea this morning :
>>>>>>
>>>>> atm, we do clone the entries we fetch from the server, then we filter
>>>>> the
>>>>> Attributes and the values, modifying the cloned entries. This leads to
>>>>> useless create of the removed Attributes and Values. We suggested to
>>>>> accumulate the modifications and to apply them at the end, avoiding the
>>>>> cloning of AT which will not be returned.
>>>>>
>>>>> First of all, we can avoid cloning the Values. The Value implementation
>>>>> are immutable classes. This save around 7% of the time.
>>>>>
>>>>> But this is not all we can do : we can simply avoid the accumulation of
>>>>> modifications *and* avoid cloning the entry !
>>>>>
>>>>> The idea is simple : when we get an entry in the cursor we have got
>>>>> back,
>>>>> we create a new empty entry, then we iterate over all the original
>>>>> entry's
>>>>> attributes and values, and for each one of those attributes and values,
>>>>> we
>>>>> check the filters, which will just tell if the Attribute/Value must be
>>>>> ditched or kept. This way, we don't do anything useless, like storing
>>>>> the
>>>>> modification or creating useless Attributes.
>>>>>
>>>>> It will work to the extent we deal with the CollectiveAttributes which
>>>>> must be injected somewhere, before we enter the loop (a
>>>>> connectiveAttribute
>>>>> might perfectly be removed by the ACI filter...). But we can also
>>>>> inject
>>>>> those added collective attributes into the loop of filters.
>>>>>
>>>>> I may miss something, but I do think that this solution is a clear
>>>>> winner,
>>>>> even

Re: Index branch merged into trunk...

2012-05-02 Thread Alex Karasulu
On Wed, May 2, 2012 at 1:54 PM, Emmanuel Lécharny wrote:

> Le 5/2/12 12:33 PM, Alex Karasulu a écrit :
>
>> On Wed, May 2, 2012 at 12:49 PM, Emmanuel Lécharny wrote:
>>
>>  Le 5/2/12 9:53 AM, Alex Karasulu a écrit :
>>>
>>>  On Wed, May 2, 2012 at 2:43 AM, Emmanuel Lécharny*
>>>> ***
>>>>
>>>> wrote:
>>>>
>>>>  Le 5/1/12 3:05 PM, Alex Karasulu a écrit :
>>>>
>>>>>  On Tue, May 1, 2012 at 4:08 AM, Emmanuel Lécharny wrote:
>>>>>>
>>>>>> o Object scope search (lookup) : 49 880 req/s compared to 23 081 on
>>>>>> the
>>>>>> previous trunk
>>>>>> o One Level scope search (5 entries returned) : 68 715 entries
>>>>>> returned
>>>>>> per second, compared to 33 120/s
>>>>>> o Sub Level scope search (10 entries returned ) : 70 830 entries
>>>>>> returned
>>>>>> per second, compared to 18 910/s
>>>>>>
>>>>>>
>>>>>> This is great work Emmanuel. Nicely done!
>>>>>>
>>>>>>  I have some even better results, as of today :
>>>>>>
>>>>> o Object scope search (lookup) : 52 712 req/s compared to 23 081 on the
>>>>> previous trunk
>>>>> o One Level scope search (5 entries returned) : 72 635 entries returned
>>>>> per second, compared to 33 120/s
>>>>> o Sub Level scope search (10 entries returned ) : 75 100 entries
>>>>> returned
>>>>> per second, compared to 18 910/s
>>>>>
>>>>>
>>>>>  This is just sick man! You've more than doubled the performance.
>>>>>
>>>> Some new idea this morning :
>>>
>>> atm, we do clone the entries we fetch from the server, then we filter the
>>> Attributes and the values, modifying the cloned entries. This leads to
>>> useless create of the removed Attributes and Values. We suggested to
>>> accumulate the modifications and to apply them at the end, avoiding the
>>> cloning of AT which will not be returned.
>>>
>>> First of all, we can avoid cloning the Values. The Value implementation
>>> are immutable classes. This save around 7% of the time.
>>>
>>> But this is not all we can do : we can simply avoid the accumulation of
>>> modifications *and* avoid cloning the entry !
>>>
>>> The idea is simple : when we get an entry in the cursor we have got back,
>>> we create a new empty entry, then we iterate over all the original
>>> entry's
>>> attributes and values, and for each one of those attributes and values,
>>> we
>>> check the filters, which will just tell if the Attribute/Value must be
>>> ditched or kept. This way, we don't do anything useless, like storing the
>>> modification or creating useless Attributes.
>>>
>>> It will work to the extent we deal with the CollectiveAttributes which
>>> must be injected somewhere, before we enter the loop (a
>>> connectiveAttribute
>>> might perfectly be removed by the ACI filter...). But we can also inject
>>> those added collective attributes into the loop of filters.
>>>
>>> I may miss something, but I do think that this solution is a clear
>>> winner,
>>> even in term of implementation...
>>>
>>> thoughts ?
>>>
>>>
>>>  We talked about using a wrapper around the entry to encapsulate these
>> matters making it happen automatically behind the scenes. This does not
>> affect the surrounding code.
>>
>> How is this proposal now different? Why would you not use a wrapper?
>>
> Because the wrapper is useless in this case !
>
> The beauty of the solution is that we either create a new entry with all
> the requested ATs and values, according to the filters (if the user embeds
> the server), or, if this is a network request, we directly generate the
> encoded message without having to create the intermediate entry at all !
>
>
I don't know how you'll pull that off, considering that the interceptors
which cause side effects are expecting an entry to alter, or from which to
read information, to do their thang.

This is why I'm a bit confused. Maybe it's a matter of description and
language where I'm failing to understand.


-- 
Best Regards,
-- Alex


Re: Index branch merged into trunk...

2012-05-02 Thread Alex Karasulu
On Wed, May 2, 2012 at 12:49 PM, Emmanuel Lécharny wrote:

> Le 5/2/12 9:53 AM, Alex Karasulu a écrit :
>
>> On Wed, May 2, 2012 at 2:43 AM, Emmanuel Lécharny wrote:
>>
>>  Le 5/1/12 3:05 PM, Alex Karasulu a écrit :
>>>
>>>  On Tue, May 1, 2012 at 4:08 AM, Emmanuel Lécharny*
>>>> ***
>>>>
>>>> wrote:
>>>>
>>>> o Object scope search (lookup) : 49 880 req/s compared to 23 081 on the
>>>> previous trunk
>>>> o One Level scope search (5 entries returned) : 68 715 entries returned
>>>> per second, compared to 33 120/s
>>>> o Sub Level scope search (10 entries returned ) : 70 830 entries
>>>> returned
>>>> per second, compared to 18 910/s
>>>>
>>>>
>>>> This is great work Emmanuel. Nicely done!
>>>>
>>>>  I have some even better results, as of today :
>>>
>>> o Object scope search (lookup) : 52 712 req/s compared to 23 081 on the
>>> previous trunk
>>> o One Level scope search (5 entries returned) : 72 635 entries returned
>>> per second, compared to 33 120/s
>>> o Sub Level scope search (10 entries returned ) : 75 100 entries returned
>>> per second, compared to 18 910/s
>>>
>>>
>>>  This is just sick man! You've more than doubled the performance.
>>
>
> Some new idea this morning :
>
> atm, we do clone the entries we fetch from the server, then we filter the
> Attributes and the values, modifying the cloned entries. This leads to
> useless creation of the removed Attributes and Values. We suggested to
> accumulate the modifications and to apply them at the end, avoiding the
> cloning of ATs which will not be returned.
>
> First of all, we can avoid cloning the Values. The Value implementations
> are immutable classes. This saves around 7% of the time.
>
> But this is not all we can do : we can simply avoid the accumulation of
> modifications *and* avoid cloning the entry !
>
> The idea is simple : when we get an entry in the cursor we have got back,
> we create a new empty entry, then we iterate over all the original entry's
> attributes and values, and for each one of those attributes and values, we
> check the filters, which will just tell if the Attribute/Value must be
> ditched or kept. This way, we don't do anything useless, like storing the
> modification or creating useless Attributes.
>
> It will work to the extent we deal with the CollectiveAttributes which
> must be injected somewhere, before we enter the loop (a collectiveAttribute
> might perfectly well be removed by the ACI filter...). But we can also inject
> those added collective attributes into the loop of filters.
>
> I may miss something, but I do think that this solution is a clear winner,
> even in terms of implementation...
>
> thoughts ?
>
>
We talked about using a wrapper around the entry to encapsulate these
matters making it happen automatically behind the scenes. This does not
affect the surrounding code.

How is this proposal now different? Why would you not use a wrapper?

-- 
Best Regards,
-- Alex


Re: Index branch merged into trunk...

2012-05-02 Thread Alex Karasulu
On Wed, May 2, 2012 at 2:43 AM, Emmanuel Lécharny wrote:

> Le 5/1/12 3:05 PM, Alex Karasulu a écrit :
>
>> On Tue, May 1, 2012 at 4:08 AM, Emmanuel Lécharny wrote:
>>
>> o Object scope search (lookup) : 49 880 req/s compared to 23 081 on the
>> previous trunk
>> o One Level scope search (5 entries returned) : 68 715 entries returned
>> per second, compared to 33 120/s
>> o Sub Level scope search (10 entries returned ) : 70 830 entries returned
>> per second, compared to 18 910/s
>>
>>
>> This is great work Emmanuel. Nicely done!
>>
>
> I have some even better results, as of today :
>
> o Object scope search (lookup) : 52 712 req/s compared to 23 081 on the
> previous trunk
> o One Level scope search (5 entries returned) : 72 635 entries returned
> per second, compared to 33 120/s
> o Sub Level scope search (10 entries returned ) : 75 100 entries returned
> per second, compared to 18 910/s
>
>
This is just sick man! You've more than doubled the performance.

-- 
Best Regards,
-- Alex


Re: Index branch merged into trunk...

2012-05-01 Thread Alex Karasulu
On Tue, May 1, 2012 at 4:08 AM, Emmanuel Lécharny wrote:

> Hi,
>
> just to inform you that the index branch has been merged with no harm
> today. I just had to fix 3 conflicts, and two bugs I introduced in the
> branch before the commit.
>
> The server performance is way better for searches, with a few improvements
> I did over those last 4 days. It was impressive how easy it was to improve
> the speed with little modifications. The global result is that the server is
> now :
> o Object scope search (lookup) : 49 880 req/s compared to 23 081 on the
> previous trunk
> o One Level scope search (5 entries returned) : 68 715 entries returned
> per second, compared to 33 120/s
> o Sub Level scope search (10 entries returned ) : 70 830 entries returned
> per second, compared to 18 910/s
>
>
This is great work Emmanuel. Nicely done!


> There is room for more improvement, but it will be more complex. The area
> that can be improved are :
> o get rid of the extra getSearchControls() call in interceptors. This is
> the easiest fix
> o review the way we handle entry modifications before we return them.
> Currently, we clone the entry, and remove the attributes the user has not
> required. See DIRSERVER-1719 for more explanation on this subject. Note
> that the filtering of attributes represents around 9% of the global CPU time.
> o getting back the ID from a Dn is a very costly operation (19% of the
> global CPU time), and the longer the DN, the longer the operation. For each
> RDN, we have to do a lookup in the RdnIndex. The only solution would be to
> have a Dn -> ID cache somewhere. This would boost the server performance,
> that's for sure.
> o fetching an entry from the backend costs 38% of the global time, out of
> which 29% represents the cost to clone the entry. If we could avoid doing
> this clone (see above), we may have some major performance increase.
> o when evaluating an entry to see if it fits the filter, we use the
> reverseIndex, which is also a costly operation. We should re-evaluate whether
> it wouldn't be better to use the MatchingRules comparator to do that instead
> (reverse lookups account for 4% of the used CPU time)
>
>
I guess we have these in JIRA?


> One interesting result is that the LRUCache.get() operation represents 13%
> of the used time. This is definitely not small. There is probably some
> room for some improvement here, but this is way more complex...
>
> All those numbers have been collected using YourKit on a Lookup test (150
> 000 lookups on one single element have been done)
>
>
I wonder what the over-the-network stats are with a client machine separate
from the server machine. Oh, and with multiple clients. It's too bad we
never got a chance to set up such an environment :( .


>
> There are also some improvements to expect on the Add/Delete/Move
> operation, as we have to delete/add the keys on the RdnIndex. This is
> something Im going to work on tomorrow.
>
>
Cool.


>
> One more thing : the numbers I get when running the server-integ search
> perf are way lower (from 2900 to 5400 per second). This is plainly normal.
> When going through the network, we pay some extra price :
> o the client code eats 57% of all the time it takes to run the test
> o On the server, normalizing the incoming Dn costs 7% of the processing
> time
> o the entries encoding is very expensive
>
> All in all, on the server, unless we test it on a different machine than
> the injectors, all the measurements are pretty much impossible to do. There
> is too much noise...
>
> I'd be interested to conduct largest tests on a multi-core server, with
> lots of memory, and a lot of entries, with external injectors, to see what
> kind of performances we can get...
>
>
Ditto.


>
> In the next few days, I will probably fix some pending bugs. I think we
> can cut a M7 release by the end of this week, and make it available by next
> week.
>
>
That sounds great. Thanks!

-- 
Best Regards,
-- Alex


[jira] [Commented] (DIRSERVER-1719) [Perf] Modify the way we process entries to be returned

2012-04-30 Thread Alex Karasulu (JIRA)

[ 
https://issues.apache.org/jira/browse/DIRSERVER-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264888#comment-13264888
 ] 

Alex Karasulu commented on DIRSERVER-1719:
--

This is a great idea, but we also have to take into account that some 
interceptors, to produce their net effect, have to alter the entry on its way 
out the door. So this can be done, but might have to be done using another, 
more creative mechanism. I recommend using shadowing this way, for example: at 
the bottom you have the copy pulled out from the DIB, and around it you have a 
wrapper. Reads tunnel through and read what's at the bottom from the original 
only if nothing has changed. Changes are stored in the wrapper. The wrapper 
serves as a sort of modified value storage. 

This way what is bubbled up to the network layer is the original copy with the 
wrapper around it. The necessary information is read from it and returned to 
the client without copying while still having the effects of the interceptors 
incorporated into the returned results.

WDYT? 
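
To make the shadowing concrete, here is a toy version (attributes reduced to plain string-keyed values; this is an illustration, not the actual Entry API):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Copy-on-write shadow over the entry fetched from the DIB: reads tunnel
    // through to the original unless an interceptor overrode or removed the
    // attribute; writes only ever touch the overlay, never the original.
    class ShadowedEntry
    {
        private final Map<String, Object> original;                  // pulled from the DIB
        private final Map<String, Object> overlay = new HashMap<String, Object>();
        private final Set<String> removed = new HashSet<String>();

        ShadowedEntry( Map<String, Object> original )
        {
            this.original = original;
        }

        Object get( String attributeId )
        {
            if ( removed.contains( attributeId ) )
            {
                return null;                                         // deleted by an interceptor
            }
            Object modified = overlay.get( attributeId );
            return modified != null ? modified : original.get( attributeId );
        }

        void put( String attributeId, Object value )
        {
            removed.remove( attributeId );
            overlay.put( attributeId, value );                       // the modified value storage
        }

        void remove( String attributeId )
        {
            overlay.remove( attributeId );
            removed.add( attributeId );
        }
    }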

> [Perf] Modify the way we process entries to be returned
> ---
>
> Key: DIRSERVER-1719
> URL: https://issues.apache.org/jira/browse/DIRSERVER-1719
> Project: Directory ApacheDS
>  Issue Type: Improvement
>Affects Versions: 2.0.0-M6
>Reporter: Emmanuel Lecharny
> Fix For: 2.0.0-M8
>
>
> Right now, we clone the entries we will return to the client just after 
> having fetched them from the backend. This is necessary as we will remove and 
> add some attributes and values from those entries, to comply with the user 
> request.
> Another idea would be to compute the attributes (and values) to return, and 
> when done, create a new entry with all those attributes.
> As a user rarely requires all the attributes (including the operational 
> ones), this might save some processing, as in the current system we copy all 
> the attributes, then we remove some of them.
> Even better, when the CoreSession is called from the LdapProtocol layer, we 
> don't have to copy the attributes at all, we just have to write on the socket 
> only the required attributes. This will be even faster than what we currently 
> do.





Re: Question about JDBM key/value replacement

2012-04-27 Thread Alex Karasulu
On Fri, Apr 27, 2012 at 8:11 PM, Selcuk AYA  wrote:

> On Fri, Apr 27, 2012 at 9:46 AM, Emmanuel Lécharny 
> wrote:
> > Le 4/27/12 6:35 PM, Selcuk AYA a écrit :
> >>
> >> On Fri, Apr 27, 2012 at 9:08 AM, Emmanuel Lécharny
> >>  wrote:
> >>>
> >>> We don't really care if that serializes the modifications, because the
> >>> server
> >>>
> >>> does not have to be fast when we inject data, only when we read data.
> At
> >>> least, we could think about a serialization over the operations on one
> >>> index
> >>> like the RdnIndex (as an entry modification may update more than one
> >>> entry
> >>> in this index).
> >>>
> >>> Is that a problem ?
> >>
> >> Depends. What I have been describing in the emails and trying to
> >> implement is an optimistic locking scheme where modifications can go
> >> in parallel unless they conflict. It seems we could just get a big
> >> lock for write txns  rather than dealing with optimistic concurrency
> >> control.
> >
> >
> > Ok, I see.
> >
> > What would be the impact on the txn branch if we fall back to such a
> > system ?
>
> we would remove the conflict detection and txn retry in case of
> conflicts and change how RW txns are handled.
>
> >
> > What also would be the impact on the current code, assuming that we
> update
> > many elements on the RdnIndex, so that the optimistic locking scheme
> keeps
> > working ?
>
> You know this better. If trying to maintain optimistic locking
> adversely affects searches and we are OK with an outstanding RW txn (this
> includes all the operations in the interceptor chain in case of an
> add/delete/modify), then we should get rid of optimistic locking.
>
>
IMO I don't think we should get rid of optimistic locking.
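
For reference, the general scheme we're talking about is something like this
toy sketch (illustrative only, not the actual txn branch code):

import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;

/** Toy optimistic concurrency : writers work in parallel without a lock and
 *  only retry when a conflicting commit bumped the version first. */
public class OptimisticSlot<T>
{
    private volatile T value;
    private final AtomicLong version = new AtomicLong();

    public OptimisticSlot( T initial )
    {
        value = initial;
    }

    public void update( UnaryOperator<T> txn )
    {
        while ( true )
        {
            long seen = version.get();
            T newValue = txn.apply( value );       // do the work lock-free

            synchronized ( this )                  // tiny critical section to commit
            {
                if ( version.get() == seen )       // nobody committed in between
                {
                    value = newValue;
                    version.incrementAndGet();
                    return;
                }
            }
            // conflict detected : loop and retry the transaction
        }
    }
}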

-- 
Best Regards,
-- Alex


Re: Index branch status ...

2012-04-27 Thread Alex Karasulu
On Fri, Apr 27, 2012 at 6:37 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> so this is the end of the week, and I have done some experiments with my
> Index branch (this branch has been created to remove the oneLevel and
> subLevel index, and to use the RdnIndex instead).
>
> First of all, the server is working just fine without those two indexes. I
> still have on-going improvements to get the full advantage of their removal
> (currently, we have some issues with JDBM when we want to update the keys
> to store the number of children/descendants each entry has, but this is
> something we are discussing atm), but baseline, the perfs are pretty much
> on par with trunk.
>
> That includes searches, not only modifications.
>
> Today, I did some profiling with the latest version of Yourkit, and I was
> able to tweak the server here and there to get some speed improvement. Here
> are the results I get :
>
> on Trunk :
>
> OBJECT level search, 19 680 entries returned per second (we only get one
> entry)
> ONE level search, 27 900 entries returned per second (we get five entries)
> SUBTREE level search, 17 870 entries returned per second (we get 10 entries)
>
> Keep in mind that those tests are done with an embedded server, it does
> not go through any network layer. So to speak, it gives the raw capacity of
> the server to deliver entries, no more.
>
> On the branch now :
> OBJECT level search, 33 373 entries returned per second (we only get one
> entry) : this is a 70% improvement !
> ONE level search, 45 695 entries returned per second (we get five
> entries): this is a 63% improvement !
> SUBTREE level search, 35 300 entries returned per second (we get 10
> entries): this is a 97% improvement !!!
>
>
This is awesome. Nice job man!


-- 
Best Regards,
-- Alex


Re: svn commit: r1330754 [1/3] - in /directory/apacheds/branches/index-work: core-constants/src/main/java/org/apache/directory/server/constants/ jdbm-partition/src/test/java/org/apache/directory/serve

2012-04-26 Thread Alex Karasulu
On Thu, Apr 26, 2012 at 1:22 PM,  wrote:

> Author: elecharny
> Date: Thu Apr 26 10:22:13 2012
> New Revision: 1330754
>
> URL: http://svn.apache.org/viewvc?rev=1330754&view=rev
>


> o Changed the IndexEntry interface to reflect the nature of the stored
> elements : namely, a key and an ID. The getValue/setValue have been renamed
> getKey/setKey
>
>
There was a big reason why this was getValue/setValue as opposed to
getKey/setKey. Are you sure this was a good decision?

I must admit I don't immediately remember what that reason was. I too
wanted to change it to getKey/setKey, but I resisted when I had more
information available to me. At this point I don't remember it fully, but I
know it has to do with directionality when using reverse or direct
btrees in the index.
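
If I sketch it from memory (illustrative types only, not the actual
interface), the ambiguity is that the same tuple is served out of both the
forward and the reverse table, so the attribute-value side is the BTree key
in one direction and the BTree value in the other:

/** Illustrative sketch only : the same tuple type is produced from both
 *  tables, so calling the attribute-value side "the key" is only true for
 *  one of the two directions. */
public class IndexEntry<V, ID>
{
    private V value;  // BTree key in the forward table, BTree value in the reverse one
    private ID id;    // BTree value in the forward table, BTree key in the reverse one

    public V getValue() { return value; }            // "value" stays direction-neutral
    public void setValue( V value ) { this.value = value; }

    public ID getId() { return id; }
    public void setId( ID id ) { this.id = id; }
}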

Regards,
Alex


Re: Questions regarding Aliases

2012-04-25 Thread Alex Karasulu
On Wed, Apr 25, 2012 at 2:54 AM, Emmanuel Lécharny wrote:

> Le 4/24/12 8:06 PM, Alex Karasulu a écrit :
>
>> On Tue, Apr 24, 2012 at 5:50 PM, Kiran Ayyagari wrote:
>>
>>> On Tue, Apr 24, 2012 at 7:59 PM, Emmanuel Lécharny wrote:
>>>
>>>> Le 4/24/12 4:19 PM, Kiran Ayyagari a écrit :
>>>>
>>>>> On Tue, Apr 24, 2012 at 4:58 PM, Emmanuel Lécharny wrote:
>>>>>
>>>>>> 3) Current Aliases index
>>>>>>
>>>>>> I didn't have time to check what those indexes are used for, so if
>>>>>> someone can give me a quick heads up, that would save me a few days
>>>>>> of code digging...
>>>>>
>>>>> this is used exactly for solving the above mentioned 'cycle detection'
>>>>> problem
>>>>
>>>> This was what I suspected, but how does it work ?
>>>>
>>> we lookup the target entry's ID in the alias Index, if it is present
>>> we don't allow adding the current entry
>>>
>>
>> Kiran is right. There is some documentation on this available in our
>> site's developer documentation section here:
>>
>> http://directory.apache.org/apacheds/1.5/alias-and-index.html
>>
>> Oooops it looks empty though :(. Instead I have a paper I wrote on Alias
>> Dereferencing here on this page which you can access and download:
>>
>> http://people.apache.org/~akarasulu/
>>
>> Note this paper is ancient. Perhaps even from 2000. However it should be
>> pretty much up to date with perhaps a name change of the indices used.
>>
>
> Thanks a lot ! I don't think we have had a lot of modifications in the
> alias handling these past 5 years.
>
> I'll read this paper tomorrow.
>
> Right now, I'm finishing the SubLevel index removal. So far, I just have
> issues in LdifPartition (JdbmPartition tests are now passing).
>
> What I did is that I have created a DescendantCursor that recursively goes
> down the RdnIndex tree, fetching the children up to the point where they
> have no children. It works well. For LdifPartition, it's slightly different
> as we have to update the underlying AvlIndex (which works) but also update
> the ldif files with the minimum modifications.
>
> Note that it's done in a branch atm.
>
> Once this will be done, and merged back in trunk, I think we will be able
> to cut a release.
>
>
>
Sounds great. Thanks!

-- 
Best Regards,
-- Alex


Re: Questions regarding Aliases

2012-04-24 Thread Alex Karasulu
On Tue, Apr 24, 2012 at 5:50 PM, Kiran Ayyagari wrote:

> On Tue, Apr 24, 2012 at 7:59 PM, Emmanuel Lécharny 
> wrote:
> > Le 4/24/12 4:19 PM, Kiran Ayyagari a écrit :
> >>
> >> On Tue, Apr 24, 2012 at 4:58 PM, Emmanuel Lécharny
> >>  wrote:
> >>>
> >>> 3) Current Aliases index
> >>>
> >>> I didn't have time to check what those indexes are used for, so if
> >>> someone can give me a quick heads up, that would save me a few days of
> >>> code digging...
> >>>
> >> this is used exactly for solving the above mentioned 'cycle detection'
> >> problem
> >
> >
> > This was what I suspected, but how does it work ?
> >
> we lookup the target entry's ID in the alias Index, if it is present
> we don't allow adding the current entry


Kiran is right. There is some documentation on this available in our site's
developer documentation section here:

 http://directory.apache.org/apacheds/1.5/alias-and-index.html

Oooops it looks empty though :(. Instead I have a paper I wrote on Alias
Dereferencing here on this page which you can access and download:

 http://people.apache.org/~akarasulu/

Note this paper is ancient. Perhaps even from 2000. However it should be
pretty much up to date with perhaps a name change of the indices used.

-- 
Best Regards,
-- Alex


Re: [index] OneLevelIndex removal

2012-04-12 Thread Alex Karasulu
On Thu, Apr 12, 2012 at 7:00 PM, Selcuk AYA  wrote:

> On Thu, Apr 12, 2012 at 7:14 AM, Alex Karasulu 
> wrote:
> >
> >
> > On Thu, Apr 12, 2012 at 4:35 PM, Emmanuel Lécharny 
> > wrote:
> >>
> >> Forgot to reply to this mail, which raises interesting points.
> >>
> >> More inside.
> >>
> >> Le 4/11/12 10:38 PM, Alex Karasulu a écrit :
> >>>
> >>> On Wed, Apr 11, 2012 at 4:04 PM, Emmanuel
> >>> Lécharnywrote:
> >>>
> >>>> I think we should add some mechanism in the server to check that
> >>>> automatically, to avoid doing it by hand (there are hundreds of tests
> >>>> to check...). One solution would be to keep track of every cursor
> >>>> construction in a HashMap, and to remove them when the cursor is
> >>>> closed. The remaining cursors are likely not closed.
> >>>
> >>>
> >>> It would be nice to have a Cursor monitor that every opened Cursor
> >>> registers with but this needs to happen automatically. Then when out of
> >>> the
> >>> creation scope the Cursor is expected to be closed and if not this is
> >>> handled automatically. However does creation scope work well since
> >>> sometimes we create Cursors and pass them up?
> >>
> >> We do have a monitor, which is currently used to check that the cursor is
> >> not closed when we try to use it. We certainly can use this monitor for
> >> more than just checking such things.
> >>
> >> Now, the problem is that the scope is not as easy to determine as for a
> >> variable in Java. For instance, if we consider persistent searches, or
> >> paged searches, or even an abandoned search request, the scope is pretty
> >> wide...
> >>
> >> Though we can have a set of rules that help us to close the cursor
> >> automatically :
> >> - if we get an exception during a SearchRequest, then the cursors must be
> >> closed immediately. As soon as we store the cursors into the
> >> SearchContext, this is pretty easy to do
> >> - an AbandonRequest will close the cursor automatically too (getting the
> >> cursor from the abandoned request)
> >> - when we process the SearchResultDone, we can also close the cursor for
> >> the current search request (this works for PagedSearch too)
> >> - for pagedSearch, if the user resets the search by sending 0 as the
> >> expected number of entries to return, then the cursor will be freed
> >> - for persistent searches, as they will be closed by an unbind or an
> >> abandon request, we are fine
> >> - when a client unbinds, then all the pending cursors will be closed.
> >>
> >> All in all, we have everything needed to close the cursors automatically,
> >> assuming we keep all the cursors in the session.
>
> For the server side, I would suggest policing this with a test. When
> cursors open, they can bump a global counter atomically. When they
> close, they can decrement it. We can have a test such that after a
> bunch of operations, this counter at the server side should be zero.
>
>
Great idea!
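
Something as small as this sketch would do it (AbstractCursor here is just a
stand-in for whatever base class our cursors share):

import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the policing test idea. */
public abstract class AbstractCursor<E>
{
    public static final AtomicInteger OPEN_CURSORS = new AtomicInteger();

    protected AbstractCursor()
    {
        OPEN_CURSORS.incrementAndGet();    // bump atomically on open
    }

    public void close()
    {
        // close() must stay idempotent, or a double close skews the counter
        OPEN_CURSORS.decrementAndGet();    // decrement on close
    }
}

// In a test, after a bunch of operations :
// assertEquals( 0, AbstractCursor.OPEN_CURSORS.get() );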

-- 
Best Regards,
-- Alex


Re: [Index] SubLevel removal

2012-04-12 Thread Alex Karasulu
On Thu, Apr 12, 2012 at 4:01 PM, Emmanuel Lécharny wrote:

> Hi guys,
> I'm currently working on the SubLevelIndex removal. This is slightly more
> complex than the OneLevel index, and there are some choices to be made in
> this area.
>
> Basically, the idea is that we will first search the entry which is the
> parent, then start fetching all the descendants from this point.
>
> Considering we have such a tree :
>
> cn=A
>  cn=B
>  cn=C
>cn=D
>  cn=G
>cn=E
>  cn=F
>cn=H
>
> then a search for entries using a SUBTREE scope and a base DN 'cn=C, cn=A'
> will return the following set of entries (the filter is objectClass=*, to
> keep it simple) :
> {
>  cn=C,cn=A
>  cn=D,cn=C,cn=A
>  cn=G,cn=D,cn=C,cn=A
>  cn=E,cn=C,cn=A
> }
>
> Note that the same search with a ONE scope would return cn=D,cn=C,cn=A and
> cn=E,cn=C,cn=A, excluding the cn=C,cn=A entry, which is the base for this
> search).
>
> Now, we have two ways to get those entries :
> - depth-first traversal. We fetch every entry, and if it has some
> children, then we go down one level, and so on recursively, until we have
> exhausted all the descendants. The consequence is that the entries will be
> returned in this order :
>
>  cn=C,cn=A
>  cn=D,cn=C,cn=A
>  cn=G,cn=D,cn=C,cn=A
>  cn=E,cn=C,cn=A
>
> - Breadth-first traversal. This time, we exhaust all the children for the
> current level, and then we go down one level, read all the entries, etc. We
> never go down a level until all the entries at the current level have been
> processed. The entries will be returned in this order :
>
>  cn=C,cn=A
>  cn=D,cn=C,cn=A
>  cn=E,cn=C,cn=A
>  cn=G,cn=D,cn=C,cn=A
>
> The problem with the breadth-first approach is that it puts way more
> pressure on the memory, as we have to keep in memory all the entries that
> have children. With the depth-first approach, we just proceed with a new
> cursor when we have at least one child, so we will have as many cursors
> in memory as the tree's depth (so if we have a 10-level tree, we will keep
> 10 cursors max). OTOH, the order might be a bit strange (even if there is
> no guarantee whatsoever that the server will return the entry in any given
> order).
>
> IMO, the depth-first approach is the one which offers the best balance
> between performance and memory consumption. Also the return order is not
> that critical assuming that we have implemented the Sort control (which is
> not yet part of our implementation).
>
> Any better idea, or any comments ?
>

My initial thought right after reading the email was DFS definitely because
of the following reasons:

(1) As you say order is not important unless a sort control is used and
that's a separate issue
(2) DFS is less memory intensive than BFS since it allows the GC to free up
allocations when stack frames are popped.
(3) Most DITs today are rather flat so the DFS cursors will not grow that
large and the BFS approach can blow out memory when you have millions of
sibling entries under a container.

Then I got the idea of making this configurable so we can play with
parameters but that might not be worth the pain. Thoughts?
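
For illustration, the depth-first variant is basically this sketch
(children() here is a made-up stand-in for a cursor over the RdnIndex):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Consumer;

/** Depth-first descent : we hold at most one child iterator per level, so
 *  memory is bounded by the DIT depth, not by the number of siblings. */
public class DescendantWalker<ID>
{
    /** Made-up stand-in for a children lookup backed by the RdnIndex. */
    public interface Index<I>
    {
        Iterator<I> children( I parentId );
    }

    public void walk( Index<ID> index, ID baseId, Consumer<ID> out )
    {
        Deque<Iterator<ID>> stack = new ArrayDeque<>();

        out.accept( baseId );                       // SUBTREE scope includes the base
        stack.push( index.children( baseId ) );

        while ( !stack.isEmpty() )
        {
            Iterator<ID> current = stack.peek();

            if ( !current.hasNext() )
            {
                stack.pop();                        // this level is exhausted, go back up
                continue;
            }

            ID next = current.next();
            out.accept( next );
            stack.push( index.children( next ) );   // descend before finishing the siblings
        }
    }
}

On the example tree above, this yields exactly the depth-first order cn=C,
cn=D, cn=G, cn=E, holding at most depth-many iterators.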

-- 
Best Regards,
-- Alex


Re: [index] OneLevelIndex removal

2012-04-12 Thread Alex Karasulu
On Thu, Apr 12, 2012 at 4:35 PM, Emmanuel Lécharny wrote:

> Forgot to reply to this mail, which raises interesting points.
>
> More inside.
>
> Le 4/11/12 10:38 PM, Alex Karasulu a écrit :
>
>> On Wed, Apr 11, 2012 at 4:04 PM, Emmanuel Lécharny
>> wrote:
>>
>>> I think we should add some mechanism in the server to check that
>>> automatically, to avoid doing it by hand (there are hundreds of tests to
>>> check...). One solution would be to keep track of every cursor
>>> construction in a HashMap, and to remove them when the cursor is closed.
>>> The remaining cursors are likely not closed.
>>>
>>
>> It would be nice to have a Cursor monitor that every opened Cursor
>> registers with but this needs to happen automatically. Then when out of
>> the
>> creation scope the Cursor is expected to be closed and if not this is
>> handled automatically. However does creation scope work well since
>> sometimes we create Cursors and pass them up?
>>
> We do have a monitor, which is currently used to check that the cursor is
> not closed when we try to use it. We certainly can use this monitor for
> more than just checking such things.
>
> Now, the problem is that the scope is not as easy to determine as for a
> variable in Java. For instance, if we consider persistent searches, or
> paged searches, or even an abandoned search request, the scope is pretty
> wide...
>
> Though we can have a set of rules that help us to close the cursor
> automatically :
> - if we get an exception during a SearchRequest, then the cursors must be
> closed immediately. As soon as we store the cursors into the SearchContext,
> this is pretty easy to do
> - an AbandonRequest will close the cursor automatically too (getting the
> cursor from the abandoned request)
> - when we process the SearchResultDone, we can also close the cursor for
> the current search request (this works for PagedSearch too)
> - for pagedSearch, if the user resets the search by sending 0 as the
> expected number of entries to return, then the cursor will be freed
> - for persistent searches, as they will be closed by an unbind or an abandon
> request, we are fine
> - when a client unbinds, then all the pending cursors will be closed.
>
> All in all, we have everything needed to close the cursors automatically,
> assuming we keep all the cursors into the session.
>
>
These are really great suggestions and make the ideas I tried to express
really tangible. Thanks for that, Emmanuel.

One technical point, we need to make Cursor close() operations idempotent
if they are not already - meaning if we close a second time this should not
cause an exception or change the outcome.
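
Roughly like this sketch (not our actual Cursor interface):

import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch : closing twice must not throw or change the outcome. */
public class SafeCursor
{
    private final AtomicBoolean closed = new AtomicBoolean( false );

    public void close()
    {
        if ( !closed.compareAndSet( false, true ) )
        {
            return;             // already closed : a no-op, not an exception
        }

        // release the underlying resources exactly once here
    }

    public boolean isClosed()
    {
        return closed.get();
    }
}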


> On the client side, this is another issue... As cursors are created by the
> client code, we have no easy way to determine when we should close the
> cursors, except when the connection is closed or an abandon request/unbind
> request is sent. Of course, when the server returns a searchResultDone we
> could also close the cursor. There remain the situations where the client has
> fetched some entries (but not all), and hasn't unbound nor abandoned the
> search.
>
>
I think the aspect for automatic closing of cursors is left to be managed
inside the server even though the API overlaps here.


> In any case, this is less critical as we don't have to deal with the txn
> layer. The client will just blow up with some nasty OOM sooner or
> later... but this is not worse than what we get with NamingEnumeration in
> JNDI, nah ?
>
>
Yup +1


> Have I covered all the server options ? Or did I miss something ?
>
>
>> This sounds like something that can be handled nicely using an aspect
>> oriented solution. Now these things are heavy if you use AspectJ or
>> something like that but other simpler solutions exist to bytecode splice
>> compiled code to automatically handle these things. Maybe our past
>> experiences with Aspects might make us reconsider.
>>
> A bit overkill, IMO?
>
>
I'm feeling the same but thought it should be just put out there. However
we can achieve the same results perhaps with code or using a lighter
mechanism with Proxies via CGLib or something similar. These are just raw
thought dumps, so it's not a "we SHOULD" recommendation. Something to think
about.

-- 
Best Regards,
-- Alex


Re: [Index] OneLevel index removal, performances

2012-04-11 Thread Alex Karasulu
On Wed, Apr 11, 2012 at 5:55 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> Kiran and I conducted some quick performance tests to compare the numbers
> we get with the server with no OneLevel index compared to the trunk before
> the merge. It's quite interesting :
>
> Operation (before/after) per second
>
> Add : 222/264 (me) 649/746 (Kiran)
> Delete : 156/191 (me) 442/544 (Kiran)
> Search : 5215/5214 (me) 19932/20335 (Kiran)
> Move : 308/303 (me)
> Rename : 380/392 (me)
> MoveAndRename : 204/275 (me)
>
>
What exactly do these numbers correspond to? Is this a single operation or
are they averages? Also I see for your machine you have a number/number -
what does this mean?


> My machine is an old Linux computer with an old CPU, Kiran has a quad core
> recent computer.
>
>
This definitely should mean you're disadvantaged; however, there are other
considerations besides just CPU, like disk access speed. If your
machine is much older it's feasible that its disk is slower.

It's best regardless to compare these operations on the same machine.


> The modifications are now around 20% faster (add/delete).


That's awesome but can you run it on the same hardware? That way it might
actually be more.


> The Move operation has been deeply modified and is not optimal in the new
> operation. We can most certainly improve it.
>
> The very nice point is that searches are not slowed down by the removal of
> the index.
>
>
I think we need more tests to confirm this if we're only performing a
single search. There are several parameters to the search and if we're
doing a single entry lookup we might not be seeing the impact fully.


> Let's expect that the subLevelIndex removal will carry some new perf
> improvements !
>
>
I hope so too. This is great work ... I'm just curious about the results.
If you need more test resources I can help out as well.

-- 
Best Regards,
-- Alex


Re: [index] OneLevelIndex removal

2012-04-11 Thread Alex Karasulu
On Wed, Apr 11, 2012 at 4:04 PM, Emmanuel Lécharny wrote:

> Hi guys !
>
> so I completely removed the OneLevelIndex from the server. The branch
> (index) has been successfully merged back into trunk, and I will now work
> on removing the SublevelIndex from the index branch.
>
> In the process, I spent 3 days closing all the cursors that weren't closed
> after having been used. This was *BORING*. In the future, I would really
> appreciate if those who use the cursors double check that they have closed
> them.
>
> To do that, I added some logs in every cursor constructor and every
> close() method, and matched the opens with the closes. Don't ask me if this
> was fun to match them... I created a small program which was able to do
> that for me, but this is not enough to know that a cursor has been created
> but not closed, we also have to know where it has been created.
>
> I think we should add some mechanism in the server to check that
> automatically, to avoid doing it by hand (there are hundreds of tests to
> check...). One solution would be to keep track of every cursor
> construction in a HashMap, and to remove them when the cursor is closed.
> The remaining cursors are likely not closed.


It would be nice to have a Cursor monitor that every opened Cursor
registers with but this needs to happen automatically. Then when out of the
creation scope the Cursor is expected to be closed and if not this is
handled automatically. However does creation scope work well since
sometimes we create Cursors and pass them up?

This sounds like something that can be handled nicely using an aspect
oriented solution. Now these things are heavy if you use AspectJ or
something like that but other simpler solutions exist to bytecode splice
compiled code to automatically handle these things. Maybe our past
experiences with Aspects might make us reconsider.


> The problem is that it gives no clue about where those cursors have been
> created, unless we associate a stackTrace to this information. Really not
> possible in production, but we might add an extended request to activate
> this mode, or a flag in the config.
>
> If anyone has a better idea ?
>
>
An aspect oriented solution might work well here.
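
As a lighter alternative to full aspects, here's a sketch of the tracking
idea (the names and the system property are made up):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker : records where each cursor was created, but only
 *  when a debug flag is set, since capturing stack traces is expensive. */
public class CursorTracker
{
    private static final boolean TRACE_ENABLED = Boolean.getBoolean( "cursor.trace" );
    private static final Map<Object, Throwable> OPEN = new ConcurrentHashMap<>();

    public static void opened( Object cursor )
    {
        if ( TRACE_ENABLED )
        {
            OPEN.put( cursor, new Throwable( "cursor created here" ) ); // captures the stack
        }
    }

    public static void closed( Object cursor )
    {
        if ( TRACE_ENABLED )
        {
            OPEN.remove( cursor );
        }
    }

    public static void dumpLeaks()
    {
        OPEN.values().forEach( Throwable::printStackTrace ); // whatever is left leaked
    }
}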


> Thanks !
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


Re: [index] OneLevel index removal headsup

2012-03-30 Thread Alex Karasulu
On Fri, Mar 30, 2012 at 10:58 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> as I'm going to be offline often this week-end, just a short mail to give
> you some info about what I'm doing and how far I currently am.
>
> The idea is to get rid of the OneLevel index, as I already said in one of
> my previous mails. I had a hard time understanding how to use Cursors,
> especially on top of the RdnIndex, but as of today, I finally got something
> working fine to replace the OneLevel index.
> I'm removing all the calls to this index in the code one by one, checking
> that the server still works as expected. I still have to deal with some
> calls in the LdifPartition, but I'm close.
>
> We can now use the RdnIdx to get a list of children for a given entry, and
> I have designed a dedicated cursor (ChildrenCursor) for that. Right now,
> it's more a hack than anything else (I'm counting the number of children
> while returning them, stopping when I reach the expected number of
> children), but I will use a better solution later (basically, I'll check
> the parent ID of each element I pull from the index). It works fine.
>
>
Cool glad to see this working well.


> At the same time, I'm trying to cleanup a bit the Cursor hierarchy. I was
> able to remove a couple of classes and interfaces that were useless, and
> I'm pretty sure we can go farther. The generics are a bit messy, and we
> often have to mask them to get things working.
>
>
I think it's a big mistake to couple the new feature you're adding with
cleanup and refactoring:

(1) This 'cleanup' effort convolutes evaluating your branch come time to
merge. I'm sure you want people to review your work but if you mix this up
with another aspect, 'the cleanup' effort, then this creates extra
background noise. Just add this feature so we can clearly see what the
feature changes are when it's time to do an evaluation. I know you
naturally will want us to collaborate when you're ready to merge. Keeping
the refactoring down to a minimum while adding the feature will make it
easier and less time consuming for us to review these changes and provide
feedback.

(2) You yourself say you're new to the code in this region of the server.
Learning the code by changing it is not a good practice. If you want to do
cleanups do them at another time, or before you start the feature work, or
just note them, or do so in a separate branch. Either way separate these
two initiatives.

(3) This is a very complex region of code and a lot of thought has gone
into it. I am sure many things can be improved but they should be done with
care. Any one of us can think something is useless when really it might not
be, but we don't see this until later. So there's less risk to this if you
separate the cleanup effort from the removal of the one and sub level
indices.

(4) Changing the code so you feel more comfortable while working in it is
understandable but this can impact others. It's good to cleanup but your
changes might cause problems for others. You should be extra careful
especially because this is complex code, something that has been around for
ages, and you're new to it.

Please don't take this as an attack but as a recommendation from a peer. My
concern is that I'm going to look at this code down the line and see that
many things were changed and those changes made it unfamiliar to me without
much gain. I'm sure you also want me to be able to work with you in that
code base so we can bounce ideas around together. If the code is reworked
without good reasons this makes it that much harder for me and others to
participate. Plus I can no longer give the same amount of time I gave in
the past so I want to help but if this takes very long I cannot do so as
well as I'm asked to ... just trying to balance all this.

> I'm also quite sure that we should abstract more on top of the Table
> implementation : we don't have a generic Browser, and that leads to a
> duplication of cursors (Avl cursors and Jdbm cursors). We most certainly
> can do better.
>
>
Again I wish you could separate these various initiatives so it makes it
very clear in the differentials. Mixing all these objectives in one big
branch merge will make getting others to work together with you more
difficult.


> I will continue up to the point I can completely remove the OneLevel index
> (which is still created and managed), then I'll do the same thing for the
> SubLevelIndex.
>
>
Can't wait to see both indices go. This is a good objective and I'm
thankful someone is taking this on.


> Btw, that could help the txn layer, as there will be two fewer indexes to
> manage...
>
>
+1


> That's pretty much it for this week.  Have fun !
>
>
You too have a good weekend.


-- 
Best Regards,
-- Alex


Re: [index] OneLevel and SubLevel index

2012-03-28 Thread Alex Karasulu
On Thu, Mar 22, 2012 at 8:38 PM, Emmanuel Lécharny wrote:

> Hi,
>
> one last mail...
>
> I will just mention that we already discussed this matter (i.e. should we
> keep the oneLevel and subLevel indexes) last year with Stefan, and he thinks
> that we can get rid of those two guys :
>
> http://mail-archives.apache.org/mod_mbox/directory-dev/201110.mbox/%3CCAPz8h_WfA2Wa2iZpSdwSswKGoFw9aun1---pUvorVpduSLKFgQ@mail.gmail.com%3E
>
> and
>
> http://mail-archives.apache.org/mod_mbox/directory-dev/201110.mbox/%3CCAPz8h_VTUQKRtgaN4QZ+4PX3KJTu6DPPjAw6k9roMXvnD7Ok_q...@mail.gmail.com%3E
>
> No need to rehash them, it's probably time to get the branch merged with
> the trunk?
>
>
We should review this then proceed accordingly.

-- 
Best Regards,
-- Alex


Re: [index] RDN

2012-03-28 Thread Alex Karasulu
On Thu, Mar 22, 2012 at 8:33 PM, Emmanuel Lécharny wrote:

> Hi,
>
> a second mail about index, specifically those related to the manipulation
> of the DN.
>
> RDN index
> -
>
> First of all, let's present the data structure used for the Rdn index.
> It's not exactly an RDN index, as the key is a composite structure : the
> ParentIdAndRDN (it contains the entry ID of its parent in the DIT and the
> entry's RDN). Moreover, it's not exactly an RDN that we store, as for the
> partition root, we store the partition DN (which can contain more than one
> RDN).
>
> One example first. Let's consider that we have a partition
> 'dc=example,dc=com' with one entry 'ou=apache'. The DIT will be like :
>
> dc=example,dc=com[3]
>  |
>  +-- ou=apache[7]
>
> Here, the partition has an ID = 3 and the entry has an id = 7.
>
> Thus, the RDN index will contain the following tuples :
>
> <0, dc=example,dc=com> --> 3   // As the parentID of a partition is the
> root DSE, which ID is 0
> <3, ou=apache> --> 7   // The parentID is 3, and the entry ID is 7
>
> So far, so good. Now, how do we use the index ? And should we have a
> forward and a reverse table, or is only one of those two tables enough ?
>
> First, let's say we want to look up an entry knowing its DN. Here, we
> want to fetch the 'ou=apache,dc=example,dc=com' entry. How do we do that ?

We use the forward table, starting from the right of the DN.
>
> 1) dc=example,dc=com is the right part, and it's the partition. In the
> forward RDN index, we find :
> <0, dc=example,dc=com> --> 3
> when we search for the element which has 0 (rootDSE ID) as an parentID and
> 'dc=example,dc=com' as the current RDN.
>
> 2) Now, we can move forward in the DN, and pick the next RDN
> ('ou=apache'). Combined with the previous parent ID (3), we can search the
> Rdn forward index, searching for the entry ID for <3, ou=apache>. We get
> back 7, which allows us to fetch the entry from the master table.
>
> Baseline, we *need* the forward table in the RDN index. Now, do we need
> the reverse table ?
>
> Let's say we have found an entry in the master table, and that we have to
> return it back to the client. This entry does not have any DN, we have to
> construct it.
>
> This is done using the reverse RDN index, this way :
>
> 1) We fetch the ParentIdAndRdn from the reverse table, for the entry ID.
> In our example, using 7 as the key, we get back <3, ou=apache>. The left
> part of the DN is now known : 'ou=apache'
>
> 2) With the previous ParentIdAndRdn, we can fetch the parent element using
> the parent ID (3). We get back <0, dc=example,dc=com>. We can now
> concatenate the found RDN to the previous one, giving
> 'ou=apache,dc=example,dc=com'.
>
> We are done, we have built back the DN, using the reverse index.
>
> So basically, yes, the two tables (forward and reverse) are mandatory for
> this index.


Agreed.
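
To restate your walk-through as code, here's a toy sketch (made-up types;
the composite ParentIdAndRdn key is faked as a "parentId|rdn" string just to
keep it short):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class RdnIndexSketch
{
    // forward table : <parentId, rdn> -> entryId ; resolve a DN right to left
    static Long resolve( Map<String, Long> forward, String... rdnsRightToLeft )
    {
        long parentId = 0L;                              // 0 = rootDSE
        Long entryId = null;

        for ( String rdn : rdnsRightToLeft )
        {
            entryId = forward.get( parentId + "|" + rdn );

            if ( entryId == null )
            {
                return null;                             // no such entry
            }

            parentId = entryId;
        }

        return entryId;
    }

    // reverse table : entryId -> <parentId, rdn> ; rebuild a fetched entry's DN
    static String buildDn( Map<Long, String> reverse, long entryId )
    {
        Deque<String> rdns = new ArrayDeque<>();
        long current = entryId;

        while ( current != 0L )                          // climb until the rootDSE
        {
            String[] parentAndRdn = reverse.get( current ).split( "\\|", 2 );
            rdns.addLast( parentAndRdn[1] );             // append the RDN on the left side
            current = Long.parseLong( parentAndRdn[0] );
        }

        return String.join( ",", rdns );
    }

    public static void main( String[] args )
    {
        Map<String, Long> forward = new HashMap<>();
        forward.put( "0|dc=example,dc=com", 3L );
        forward.put( "3|ou=apache", 7L );

        Map<Long, String> reverse = new HashMap<>();
        reverse.put( 3L, "0|dc=example,dc=com" );
        reverse.put( 7L, "3|ou=apache" );

        System.out.println( resolve( forward, "dc=example,dc=com", "ou=apache" ) ); // 7
        System.out.println( buildDn( reverse, 7L ) ); // ou=apache,dc=example,dc=com
    }
}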

-- 
Best Regards,
-- Alex


Thinking about others after you (Re: [index] RDN)

2012-03-26 Thread Alex Karasulu
On Thu, Mar 22, 2012 at 8:33 PM, Emmanuel Lécharny wrote:

> Hi,
>
> a second mail about index, specifically those related to the manipulation
> of the DN.
>
> RDN index
> -
>
> First of all, let's present the data structure used for the Rdn index.
> It's not exactly an RDN index, as the key is a composite structure : the
> ParentIdAndRDN (it contains the entry ID of its parent in the DIT and the
> entry's RDN). Moreover, it's not exactly an RDN that we store, as for the
> partition root, we store the partition DN (which can contain more than one
> RDN).
>
>

These changes are some of the largest since the search algorithm
was designed over 10 years ago, and they impact one of the more complex
parts of the server.

I am appalled at this not being documented when the changes were made. I
personally took a lot of effort in making sure the documentation was
pretty, informative and allowed newcomers and those new to this region of
the code to understand how it worked. Those who made these changes
benefited from my thorough documentation. They need to carry on the
tradition and document their changes, so that those who come after them
benefit too rather than suffer.

Why is it that others do not feel this way? I can understand straightforward
areas that document themselves well in code alone, but people need
to consider newcomers who want to get into the more complex code. They
should update these parts of the documentation even if it's not pretty
documentation.

Also it's no excuse to say we need the 2.0 to stabilize otherwise the docs
will change. Over the past ten years very little has changed in our search
algorithm. So this is not a region of code that we f**k with often since
it's so critical to the server's operation.

Committers making changes to "complex" regions of code should think about
others that come after them to maintain their code. This is just good
community citizenship. Those that don't really don't care about the
community. Being polite is not just about correct speech with words like
sir and madam. It's one's actions considering others.


Sorry Emmanuel this was a bit off topic but it's something we all need to
consider. I will try to respond to your comments later. I'm not immediately
available these days but I will not forget to answer these emails even if
it takes me some time.

-- 
Best Regards,
-- Alex


Re: [index] reverse index usage for user attributes

2012-03-26 Thread Alex Karasulu
On Thu, Mar 22, 2012 at 7:20 PM, Emmanuel Lécharny wrote:

> Hi,
>
> Currently, we create a forward and a reverse index for each index
> attribute. The forward index is used when we annotate a filter, searching
> for the number of candidate it filters. The reverse index usage is a bit
> more subtile.
>
>
[Hope you don't mind below I just use concise English to express the
process - not trying to insult your description - apologies if it sounds
this way.]

To state this more concisely, the reverse index facilitates rapid Evaluator
evaluations of non-candidate generating assertions in the filter AST. As
you know, one assertion in the filter (scope assertions included) is
selected as the candidate generating assertion which uses a Cursor to
generate candidates matching it's logic. The other filter assertions in the
AST are used to evaluate whether or not the generated candidates match.

NOTE: the optimizer annotates the AST's nodes with scan counts and this is
used to drive selection of the proper candidate generating assertion in the
AST. This is done using a DFS through the AST to find the lowest scan count
containing leaf node (an assertion). This reduces the search set.

Let's consider a filter like (cn=jo*). As we don't have any clue about the
> value (it will start with 'jo', but that's all we know), we can't use the
> 'cn' forward index.


Yep the scan count annotation used for the cn=jo* assertion will be the
total count of entries in the DIB since the BTree cannot give us a count
figure. So depending on what scope is used it will most likely be the scope
assertion that will drive candidate generation since it will most likely
have a smaller scan count.


> We can't use the 'cn' reverse index either, at least directly. So the
> engine will use the scope to determine a list of candidates : {E1, E2, ...
> En}. Now, instead of fetching all those entries from the master table, what
> we can do is to use the 'cn' reverse table, as we have a list of entry IDs.
>
> For each entry id in {E1, E2, ... En}, we will check if the 'cn' reverse
> table contains some value starting with 'jo'.
>
>
Yep each candidate generated from the scope assertion E1, E2, ... En will
use the ID to look into the reverse BTree and check if the value for that
candidate matches jo*.
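
In code, that evaluation looks roughly like this sketch (hypothetical
structures, with the reverse table modeled as a plain map):

import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch : scope produces candidate IDs, and the reverse 'cn'
 *  table yields each candidate's values without fetching and deserializing
 *  the entry from the master table. */
public class ReverseIndexEvaluation
{
    static void evaluate( Iterator<Long> scopeCandidates,
        Map<Long, List<String>> cnReverse, String prefix )
    {
        while ( scopeCandidates.hasNext() )
        {
            long id = scopeCandidates.next();
            List<String> values = cnReverse.get( id );

            if ( values == null )
            {
                continue;                           // no cn at all : cannot match
            }

            for ( String value : values )
            {
                if ( value.startsWith( prefix ) )   // the cn=jo* test on indexed values
                {
                    System.out.println( "candidate " + id + " matches" );
                    break;
                }
            }
        }
    }
}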


> If the 'cn' index contains the two following tables :
> forward =
>  john --> {E1, E3}
>  joe  --> {E2, E3, E5}
>  ...
>
> reverse =
>  E1 --> john
>  E2 --> joe
>  E3 --> joe, john
>  E5 --> joe
>  ...
>
>
Yep.


> then using the reverse table, we will find that the E1, E2, E3 and E5
> entries match, while all the others don't. No need to fetch the E4, ... En
> entries from the master table.
>
> Now, exploiting this reverse table means we read a btree, which has the
> same cost as reading the Master Table (except that we won't have to
> deserialize the entry).
>
>
IO time agreed.


> What if we don't have a reverse table ?
>
> We will have to deserialize the entries, all of them from the {E1, E2, ...
> En} set filtered by the scope.
>
> Is this better than building a reverse index ?


The only issue with this is that it will churn the entry cache. Meaning
there's some proximity value in a settled cache due to the kinds of queries
that generally occur. Optimally a cache should contain those entries most
often looked up.

A large master table scan will wipe away a settled cache's knowledge. Using
a reverse index instead has more value in this case. It's a delicate
balance.


> Not necessarily. In the case where we have more than one node in the
> filter, for instance (&(cn=jo*)(sn=test)(objectClass=person)), then
> using the reverse index means we will access as many btrees as we have
> indexed attributes in the filter node. Here, if cn, sn and objectClass are
> indexed, we will potentially access 4 btrees (the scope will force us to
> consider using the oneLevel index or the subLevel index).
>
> In the end, it might be more costly, compared to using the entry and
> matching it against the nodes in the filter.
>
>
Interesting point! The scan counts might help us out on a better
optimization for these kinds of cases.

If the search set is constrained (below some configurable threshold i.e.
10-50 entries) and if the filter uses many indices (above some threshold
i.e. 3-4 indices) then it might be better to just pull from the master
table directly without leveraging indices.

This will optimize for speed without blowing out the cache memory. What do
you think?
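
Something like this sketch (the knobs are made-up numbers, purely to
illustrate the heuristic):

/** Sketch of the heuristic : thresholds are hypothetical knobs, not real config. */
public class FetchStrategy
{
    private static final int SMALL_SET_THRESHOLD = 50;    // "constrained" search set
    private static final int MANY_INDICES_THRESHOLD = 3;  // "many" indexed assertions

    static boolean useMasterTableDirectly( long candidateScanCount, int indexedAssertions )
    {
        // few candidates but many per-candidate btree lookups : deserializing
        // the entry once is likely cheaper, without badly churning the cache
        return candidateScanCount <= SMALL_SET_THRESHOLD
            && indexedAssertions >= MANY_INDICES_THRESHOLD;
    }
}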


> When we have many entries already in the cache, thus sparing the cost of a
> deserialization, then accessing more than one BTree might be costly,
> compared to using the entries themselves.
>
>
Agreed but again let me stress protecting the cache memory from a large
master table scan. I think we can take this strategy for the cases
mentioned above.


> An alternative
> --
>
> The problem is that the node in the filter uses a substring : jo*. Our
> indexes are built using the full normalized value. Th

Re: [index] Presence index usage

2012-03-25 Thread Alex Karasulu
On Thu, Mar 22, 2012 at 6:38 PM,  wrote:

> On Thu, Mar 22, 2012 at 05:28:50PM +0100, Emmanuel Lécharny wrote:
> > Le 3/22/12 5:11 PM, h...@symas.com a écrit :
> > >On Thu, Mar 22, 2012 at 04:40:17PM +0100, Emmanuel Lécharny wrote
>
> > >>Now, if we consider the fact that having all the AT stored in the
> > >>index will allow us to know which entries will be impacted if an
> > >>AT is removed from the schema, then it can be a good thing to have a
> > >>complete index with all the AT.
> > >It's an interesting idea, if the admin was going to index it anyway.
> > >Otherwise, IMO you're optimizing for a very infrequent case, which
> > >is self-defeating.
> > Here, it's not about optimization, really.
> >
> > The idea is much more about being able to see if an AT removal from
> > the schema is likely to impact the data, without doing a full scan.
>
> Yes... but "avoiding a full scan" is just a (coarse) optimization of
> the schema change.
>
> > Not sure it's a sane policy though : removing an AT from a
> > production server sounds like a bad idea...
>
> Agreed. And again, even if it's for a valid reason, it will occur once
> in a blue moon. Who cares how long it takes?
>
> If you're really concerned about this scenario, sounds like a refcount
> on the schema elements would be more straightforward.
>

+1 this would be easier to comprehend and maintain in the long run versus
this mechanism which couples the index to ref-count like functionality.
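
A refcount sketch could be as simple as this (hypothetical names):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of the refcount idea : bump per AT on value add, drop on delete,
 *  and a schema deletion is refused (or warned about) while non-zero. */
public class AttributeTypeRefCounter
{
    private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

    public void onValueAdded( String attributeTypeOid )
    {
        counts.computeIfAbsent( attributeTypeOid, k -> new AtomicLong() ).incrementAndGet();
    }

    public void onValueRemoved( String attributeTypeOid )
    {
        AtomicLong count = counts.get( attributeTypeOid );

        if ( count != null )
        {
            count.decrementAndGet();
        }
    }

    public boolean isInUse( String attributeTypeOid )
    {
        AtomicLong count = counts.get( attributeTypeOid );
        return count != null && count.get() > 0;
    }
}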

-- 
Best Regards,
-- Alex


Re: PresenceIndex : why is it updated only for indexed AT ?

2012-03-22 Thread Alex Karasulu
Hi Emmanuel,

On Wed, Mar 21, 2012 at 5:21 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> as I was reviewing the presenceIndex usage this morning, I found that we
> only add entries in this index if the AT is indexed.


This is correct and was explicitly intended for this exact behavior. More
below on why ...


> For instance, if the sn AT is indexed, then we will have some <sn, EntryID> elements in the presence index.
>
> Is there any rationale behind this choice ?
>
>
This decision was made a while back. The rationale was: if an AT is not
indexed, having its own index, then adding entries for that AT on any
system index might create complications that don't necessarily benefit
performance in a consistent fashion. Plus it might actually significantly
harm performance and unnecessarily bloat some system indices. This is an
all or nothing approach. Otherwise behavior will be more difficult to
understand and manage as well.


> Otherwise, that means we will do a full scan on the master table when we
> have a filter like (sn=*) if the sn AT is not indexed...
>
>
Yep and the user should index sn if he's going to use it in search filters.
All indexing is enabled, even on use of system indices, if the AT is indexed;
otherwise we do nothing for that AT. If you want (sn=*) to perform as fast
as (sn=dickens) then index the surname attribute. Also, another premise
is: if you want to use (sn=*) you're likely going to use (sn=foo) too
at some point in your applications. So the all or nothing approach probably
pays off. In the end the all or nothing 0 or 1 approach is much simpler and
easier to understand. You have an index, no deserializing needed in any
filter assertion with the indexed AT. If no index then you're screwed
regardless of the filter assertion operand used.

The alternative of course has not been tested so this theory may not be
reality when tested. It would be nice to test but I think we should do this
in an experimental branch.

-- 
Best Regards,
-- Alex


Re: Index cleanup and improvements

2012-03-20 Thread Alex Karasulu
Hi Emm,

Man I really think you're playing with fire here. At least please work in a
branch in case something goes wrong. These are not simple changes. I don't
think you can presently remove the reverse index but who knows I might be
wrong. I remember you tried this before and decided it was not a good idea
or something.

Plus this is going to affect Selcuk big time on a merge. Can we at least
hold off on this stuff or work a little more incrementally?

Thanks,
Alex

On Tue, Mar 20, 2012 at 7:37 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> today, I removed the GenericIndex class, which was used only to create the
> index from the configuration. I just delegated the creation to the
> XXXPartition classes, using an abstract method (createSystemIndex) in the
> AbstractBtreePartition class. It works well. Now, I'm trying to get rid of
> all the normalization done in the index implementation : we should *never*
> normalize there, all the values should already have been normalized before
> (this will save some extra time).
>
> The next step would probably be to get rid of the reverse index from
> almost all the index but the Rdn index (where it's mandatory). That means
> we will have to reorganize the index content, moving the reverse table to
> the RdnIndex (for both the Avl and Jdbm indexes). I'm not sure I'll work on
> that right now.
>
> Two more things :
> - it could be a good idea to add a system index for AttributeType, it
> could help us to know which entries will be impacted if we remove an AT
> - we should get rid of the oneLevel and subLevel indexes, and use the Rdn
> index instead. Stefan has already successfully removed them for the Hbase
> partition, there is no reason we should not get rid of them for the Jdbm
> and Avl partitions too.
>
> All in all, removal of those 2 indexes, plus removal of the reverse index,
> plus removal of the spurious normalization could give us better
> performance for the modification operations.
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


Re: Question about the Log file

2012-03-16 Thread Alex Karasulu
On Fri, Mar 16, 2012 at 8:07 PM, Selcuk AYA  wrote:

> On Fri, Mar 16, 2012 at 2:26 AM, Emmanuel Lécharny 
> wrote:
> > Hi,
> >
> > AFAICS, the log file contains a buffer which stores UserLogRecords until we
> > write them on disk. This allows the Log system to be fast, and also allows
> > a dedicated thread to flush the data on disk regularly.
> >
> > So far, so good.
> >
> > But I'm wondering if it wouldn't be simpler to use a MemoryMappedFile
> > instead, as the Log file size is fixed. We would then let the underlying
> > OS deal with flushes, not needing a dedicated thread to handle it. Also,
> > MemoryMappedFiles are faster than RandomAccessFiles, as they work on pages,
> > which are managed by the OS (their size depends on the OS tuning). Last but
> > not least, we won't need to dedicate a 4MB buffer to store temporary
> > userRecords, as MemoryMappedFiles don't use the JVM memory.
>
> the current implementation is flexible to work with any underlying
> file system rather than being tied to a single implementation.
> Currently it is a random access file, but a memory mapped file
> implementation should work as well. I am also thinking of using HDFS
> files in the future. A couple of notes about some concerns:
>
> * I don't see 4MB being a big concern. This size can be tuned. It can
> also be made zero if the underlying implementation is good at dealing
> with writes.
> * I don't think a dedicated background thread is a problem either. If we
> want, we can make user threads do the log sync as they log the
> records.
>
> Right now this is very low priority given the things we need to
> implement to get the txns to work.
>
>
I agree that this is lower priority than getting the branch working
correctly. This is an optimization IMO. I'd like to get things working and
then get some metrics on performance. Then we can start looking at
optimizations.

It's certainly worth mentioning, noting and discussing. Maybe we should put
these notes into JIRA so we can force ourselves to get back to asking these
questions. I'm always thinking about what performance gains we can get from
using memory mapped files but never had the chance to try it out. I'd love
to put it to the test when we get the chance after getting a full
implementation completed.
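
For when we get there, the memory-mapped variant is roughly this sketch
(file name and fixed size are made up):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

/** Rough sketch of the memory-mapped alternative : the OS pages the writes
 *  out, so no staging buffer and no dedicated flush thread are needed. */
public class MappedLogSketch
{
    public static void main( String[] args ) throws Exception
    {
        final int LOG_SIZE = 4 * 1024 * 1024;        // the log file size is fixed

        try ( RandomAccessFile raf = new RandomAccessFile( "txn.log", "rw" );
              FileChannel channel = raf.getChannel() )
        {
            MappedByteBuffer log = channel.map( FileChannel.MapMode.READ_WRITE, 0, LOG_SIZE );

            byte[] record = "user log record".getBytes( StandardCharsets.UTF_8 );
            log.putInt( record.length );             // simple length-prefixed record
            log.put( record );

            log.force();                             // optionally force dirty pages to disk
        }
    }
}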

-- 
Best Regards,
-- Alex


Re: svn commit: r1300690 - in /directory/apacheds/branches/apacheds-txns: core-api/src/main/java/org/apache/directory/server/core/api/log/ core-api/src/main/java/org/apache/directory/server/core/api/t

2012-03-14 Thread Alex Karasulu
On Wed, Mar 14, 2012 at 10:41 PM, Kiran Ayyagari wrote:

> Selcuk,
>
>I have seen you asking several times on this list for reverting commits,
>this seems to be a bit derogatory in OSS spirit and team work.
>

It can certainly be misinterpreted this way. I think we just need more
communication about why one may need a revert.

Revert requests are OK. There's nothing wrong with that and any committer
can veto a change but they just need to provide reasons. I think we just
need to help people understand this.

Selcuk I'm sure meant no harm and can provide more reasoning.


>Go ahead and make your changes on top of these if you wish to apply
>your fix, commits need not be reverted for this.
>
>
That's also another option but let's just communicate about whatever
difficulties or problems a commit might introduce. If someone is having
problems as a result of commits let's get those reasons out there on the
list.

I have a feeling the code is starting to move before some of the work can
be finished on it and that might produce discomfort. But we're not mind
readers so we need to communicate this.


> On Thu, Mar 15, 2012 at 1:39 AM, Selcuk AYA  wrote:
> > please revert this commit.
> >
>

-- 
Best Regards,
-- Alex


Re: Txn branch state and a few other things before I go sailing

2012-03-02 Thread Alex Karasulu
Thanks for the detailed report. Hope you have a great vacation.

Best,
Alex

On Sat, Mar 3, 2012 at 2:42 AM, Emmanuel Lécharny wrote:

> Hi guys,
>
> FYI, I'll be MIA for one week, with virtually no internet (yeah ! real
> vacations !).
>
> Today, I spent a part of my time suffering by merging the trunk into the
> txn-branch, in order to ease the reintegration we will do when the branch
> is ready. Needless to say, it was more than painful :/ Anyway,
> it's done, and it builds, passes the tests with flying colors, etc.
>
> I also fixed some nasty hack we introduced in kerberos-test, which were
> failing on ubuntu and windows, with the most recent JVM versions. Nasty,
> nasty, nasty. At least, a comment said the hack was atrocious :)
>
> The JDBM module has been extracted from Apacheds, and we now have an
> isolated project for it, with two versions :
> - the previous jdbm project
> - the version which has been modified by Selcuk
>
> We also bumped up all the dependencies and plugins we could, except the
> bundle-plugin, which makes the build bark. The 'project' module has to be
> released, right now, we are pointing on 27-SNAPSHOT in the other projects.
> Nothing bad, it can stay for a while like this.
>
> That's pretty much it, it was a tough week with a lot of releases. I'd like
> to thank Pierre-Arnaud who mastered those releases, and Selcuk who made the
> txn branch work !
>
> So now, time to jump on a sailing boat, without any computer (but I'll
> bring my samsung Galaxy S2 just in case :)
>
> Have fun, I sure will have some !
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


Re: DIRSERVER-1663 status with txn branch

2012-03-01 Thread Alex Karasulu
On Thu, Mar 1, 2012 at 5:13 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> I just tested the txn branch against the small concurrent access test I
> created last September : it seems that with the fixes Selcuk has injected
> into the code, we don't have any more issues !
>
> This is a fantastic news !
>
>
Yeah nice work guys.


> Now, that raise a question : should we merge immediately the branch into
> the trunk, or should we finish the ongoing work in the branch ?
>
>
It's up to you guys. Might introduce instability but that might make it
easier to solidify rapidly and not have to deal with merging.

-- 
Best Regards,
-- Alex


Re: [VOTE] Release of Apache Directory Studio 2.0.0-M3, Apache Directory LDAP API/Shared (1.0.0-M11), ApacheDS JDBM (2.0.0-M1) and ApacheDS (2.0.0-M6)

2012-02-28 Thread Alex Karasulu
On Mon, Feb 27, 2012 at 10:33 AM, Emmanuel Lécharny wrote:

> [X] +1 | Release Apache Directory Studio 2.0.0-M3 (2.0.0.v20120224), as
> well as Apache Directory LDAP API/Shared 1.0.0-M11, ApacheDS JDBM
> (2.0.0-M1) and ApacheDS 2.0.0-M6
>

Regardless of docgen issues.

+1

-- 
Best Regards,
-- Alex


Re: [NOTICE] Preparing a release for Shared, JDBM, ApacheDS and Studio

2012-02-24 Thread Alex Karasulu
thx pierre - kicking the tires

On Sat, Feb 25, 2012 at 12:50 AM, Pierre-Arnaud Marcelot 
wrote:

> Ok, it's done.
>
> I just sent the release vote.
>
> Thanks,
> Pierre-Arnaud
>
> On 24 févr. 2012, at 18:33, Pierre-Arnaud Marcelot wrote:
>
> > Some status about the release preparation.
> >
> > - Shared 1.0.0-M11--> Done!
> > - JDBM 2.0.0-M1 --> Done!
> > - ApacheDS 2.0.0-M6 --> Done!
> > - Apache Directory Studio 2.0.0-M3 --> Performing the release...
> >
> > Concurrently, I'm also uploading the distributions on people.apache.org.
> >
> > I will inform the ML when the release preparation is over.
> >
> > Regards,
> > Pierre-Arnaud
> >
> >
> > On 24 févr. 2012, at 12:08, Pierre-Arnaud Marcelot wrote:
> >
> >> Hi,
> >>
> >> I'm currently preparing a release of the following projects:
> >> - Shared 1.0.0-M11
> >> - JDBM 2.0.0
> >> - ApacheDS 2.0.0-M6
> >> - Apache Directory Studio 2.0.0-M3
> >>
> >> Please, try to avoid committing too important stuff until I'm done.
> >>
> >> I will keep you informed during the process.
> >>
> >> Regards,
> >> Pierre-Arnaud
> >
>
>


-- 
Best Regards,
-- Alex


Re: JDBM sub-project creation : done

2012-02-23 Thread Alex Karasulu
thx E!

On Thu, Feb 23, 2012 at 12:29 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> so I created a separate project for JDBM. It contains two versions :
> - the one we used before Selcuk modified it to include MVCC into it (jdbm)
> - and Selcuk's version with MVCC (jdbm2)
>
> It's available on 
> https://svn.apache.org/repos/asf/directory/jdbm/
>
> You have two sub-modules, jdbm and jdbm2.
>
> We currently point to jdbm2 in trunks-with-dependencies (I have added an
> external for it), so don't expect the project to be built fast, as the two
> modules will be compiled if you do a full build.
> This will change as soon as we will have cut a release for JDBM (very
> soon), and I'll then remove jdbm from the externals.
>
> Right now, the current version is 2.0.0-SNAPSHOT.
>
> Have fun !
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


Re: [Vote] Make JDBM a side project

2012-02-21 Thread Alex Karasulu
On Tue, Feb 21, 2012 at 4:39 AM, Emmanuel Lécharny wrote:

> Hi,
>
> we forked JDBM a few years ago because we had some modifications to include
> into it in order to be able to use it in Apacheds, as the JDBM community
> wasn't responsive. We made it a sub-module of the apacheds project.
>
> Now, with the heavy changes Selcuk made on JDBM to include a MVCC layer
> into it, we may want to make JDBM a side project. I suggest here that we
> vote to move JDBM into its own project, like shared, studio or apacheds.
> ApacheDS will still heavily depend on it.
>
> So :
>
> [X] +1 : Make JDBM a separate project
>

+1

-- 
Best Regards,
-- Alex


Re: Dn, Rdn and Ava inconstancies

2012-02-20 Thread Alex Karasulu
On Tue, Feb 21, 2012 at 1:07 AM, Emmanuel Lécharny wrote:

> Le 2/20/12 11:58 PM, Alex Karasulu a écrit :
>
>> On Tue, Feb 21, 2012 at 12:49 AM, Emmanuel Lécharny wrote:
>>
>>  Le 2/20/12 11:20 PM, Alex Karasulu a écrit :
>>>
>>>
>>>>> So, the main issue is the way AVA handles values. As soon as we *know*
>>>>> what we should expect when we create an AVA, then suddenly it becomes
>>>>> way easier. Basically, an AVA contains one type and one value. This value
>>>>> can be a String or a byte[], depending on the type. Sadly, if the AVA is
>>>>> not schema aware, we can't tell if the value is binary or String.
>>>>>
>>>>>
>>>>>  Sounds like it needs not to be schema aware but binary attribute
>>>>> aware:
>>>>>
>>>> a
>>>> subset of the schema. This is the first level of correctness.
>>>>
>>> That's not enough. We need to normalize the values inside the server, and
>>> that means we have full access to the schema.
>>>
>>>
>>>  I meant NOT in the server but in the client. This is the minimum
>> requirement for the client.
>>
>>
>>  The next level depends on whether or not we have the full schema
>>>> available
>>>> to properly normalize the value.
>>>>
>>> yep, exactly. Just having the AVA being aware of the type of the value is
>>> not enough.
>>>
>>>
>>>  This is the minimum we need on the server side.
>>
>>  SNIP ...
>>
>>>>> Basically, we will have two forms for an AVA :
>>>>> - a User Provided form (the standard form)
>>>>> - a Normalized form which will differ depending on the fact that the AVA
>>>>> is schema aware, or not.
>>>>
>>>> Note that whether schema aware or UN-aware AVAs will still need to be
>>>> binary type aware.
>>>>
>>>>  Well, not necessarily. When you inject binary values into an AVA, it's
>>> generally done through the parsing of a DN or an RDN. In this case, the
>>> value will be encoded using an hexstring (#ABCD...).
>>>
>>> Now, if we don't have a full schema available, we can still manage to
>>> determine if the AVA contains a binary or a String value, as soon as its
>>> AT is declared as binary or HR in the default schema.
>>>
>>> FYI, the current default SchemaManager contains a Map of non HR attributes
>>> (we have added all the known binary attributes from the RFC, and it's
>>> extensible). This was mandatory in Studio to fix some bad issues we had
>>> with certificates these last weeks.
>>>
>>>
>> If not already the case, we should make it so users can add to this their
>> own user defined binary attributes programmatically for the sake of the
>> client API.
>>
>> SNIP ...
>>
>> Is it worth removing the white space variance which we can do with or
>> without a schema? You don't need schema to do this right? I'm thinking it
>> may under certain situations prevent some problems due to case variance
>> on clients not loading a schema.
>>
>>>>  This is an extremely complex problem. Inside the server, String values
>>> must go through a PrepareString process which includes the handling of
>>> insignificant spaces (see RFC 4518, Appendix B and par. 2.6.1), and the
>>> normalization of CN will remove all the leading, duplicate and trailing
>>> spaces. How we can keep some spaces the user wants to keep is a problem.
>>>
>>>
>>>  OK no worries. BTW I was not thinking of this on the server side. I was
>> thinking of using this on the client side in the absence of schema. It
>> would be a watered down version of the prep string function.
>>
>
> On the client side, we use whatever SchemaManager the user wants to use :
> - none
> - a default schema : it will have a list of all the binary AT, this list
> is configurable
> - a schema loaded from an LDIF file
> - a schema loaded from a remote server.
>
> We have all our bases covered.
>
>
Lovely!

-- 
Best Regards,
-- Alex


Re: Dn, Rdn and Ava inconsistencies

2012-02-20 Thread Alex Karasulu
On Tue, Feb 21, 2012 at 12:49 AM, Emmanuel Lécharny wrote:

> On 2/20/12 11:20 PM, Alex Karasulu wrote:
>
>
>>  So, the main issue is the way AVA handles values. As soon as we *know*
>>> what we should expect when we create an AVA, then suddenly it becomes way
>>> easier. Basically, an AVA contains one type and one value. This value can
>>> be a String or a byte[], depending on the type. Sadly, if the AVA is not
>>> schema aware, we can't tell if the value is binary or String.
>>>
>>>
>>>  Sounds like it needs not to be schema aware but binary attribute aware:
>> a
>> subset of the schema. This is the first level of correctness.
>>
> That's not enough. We need to normalize the values inside the server, and
> that means we have full access to the schema.
>
>
I meant NOT in the server but in the client. This is the minimum
requirement for the client.


>
>> The next level depends on whether or not we have the full schema available
>> to properly normalize the value.
>>
> yep, exactly. Just having the AVA being aware of the type of the value is
> not enough.
>
>
This is the minimum we need on the server side.

 SNIP ...

Basically, we will have two forms for an AVA :
>>> - a User Provided form (the standard form)
>>> - a Normalized form which will differ depending on the fact that the AVA
>>> is schema aware, or not.
>>>
>>>
>>>  Note that whether schema aware or UN-aware AVAs will still need to be
>> binary type aware.
>>
> Well, not necessarily. When you inject binary values into an AVA, it's
> generally done through the parsing of a DN or an RDN. In this case, the
> value will be encoded using a hexstring (#ABCD...).
>
> Now, if we don't have a full schema available, we can still manage to
> determine if the AVA contains a binary or a String value, as long as its
> AT is declared as binary or HR in the default schema.
>
> FYI, the current default SchemaManager contains a Map of non HR attributes
> (we have added all the known binary attributes from the RFC, and it's
> extensible). This was mandatory in Studio to fix some bad issues we had
> with certificates in recent weeks.
>
>
If not already the case, we should make it so users can add to this their
own user defined binary attributes programmatically for the sake of the
client API.
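
As an illustration only, here is a minimal sketch of what such a
client-side registry could look like. The class and method names are
hypothetical, not the actual API:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical client-side registry of binary (non human readable)
 * attribute types: it ships with a few RFC-known defaults and lets API
 * users register their own user-defined binary attributes at runtime.
 */
public final class BinaryAttributeRegistry
{
    private static final Set<String> BINARY_ATTRIBUTES = ConcurrentHashMap.newKeySet();

    static
    {
        // A few of the well-known binary attribute types from the RFCs.
        BINARY_ATTRIBUTES.add( "usercertificate" );
        BINARY_ATTRIBUTES.add( "cacertificate" );
        BINARY_ATTRIBUTES.add( "jpegphoto" );
    }

    /** Lets a client declare its own user-defined binary attribute. */
    public static void addBinaryAttribute( String attributeType )
    {
        BINARY_ATTRIBUTES.add( attributeType.toLowerCase() );
    }

    /** True if values of this attribute type should be held as byte[]. */
    public static boolean isBinary( String attributeType )
    {
        return BINARY_ATTRIBUTES.contains( attributeType.toLowerCase() );
    }
}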

SNIP ...

Is it worth removing the white space variance which we can do with or
>> without a schema? You don't need schema to do this right? I'm thinking it
>> may under certain situations prevent some problems due to case variance on
>> clients not loading a schema.
>>
> This is an extremely complex problem. Inside the server, String values
> must go through a PrepareString process which includes the handling of
> insignificant spaces (see RFC 4518, Appendix B and par. 2.6.1), and the
> normalization of CN will remove all the leading, duplicate and trailing
> spaces. How we can keep some spaces the user wants to keep is a problem.
>
>
OK no worries. BTW I was not thinking of this on the server side. I was
thinking of using this on the client side in the absence of schema. It
would be a watered down version of the prep string function.
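
For illustration, a minimal sketch of such a watered-down prep function,
handling only the leading/trailing/duplicate space part and none of the
case folding or Unicode mapping of the full RFC 4518 process:

public final class ClientStringPrep
{
    /**
     * Watered-down stand-in for the server-side PrepareString: trims
     * leading and trailing spaces and collapses inner runs of spaces.
     * E.g. prepSpaces( "  John   Doe " ) returns "John Doe".
     */
    public static String prepSpaces( String value )
    {
        if ( value == null )
        {
            return null;
        }

        return value.trim().replaceAll( " {2,}", " " );
    }
}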

-- 
Best Regards,
-- Alex


Re: Dn, Rdn and Ava inconsistencies

2012-02-20 Thread Alex Karasulu
On Mon, Feb 20, 2012 at 11:59 PM, Emmanuel Lécharny wrote:

> On 2/20/12 7:36 PM, Alex Karasulu wrote:
>
>> On Mon, Feb 20, 2012 at 6:33 PM, Emmanuel Lécharny wrote:
>>
>>  Hi guys,
>>>
>>> over the last few days, we have had to fight with some issues in the way we
>>> handle DNs and their components :
>>> - creating entries with an RDN containing the same AT twice is not
>>> allowed by the spec
>>> - searching for an entry whose RDN is cn=Doe+gn=John does not work when
>>> searching for gn=John+cn=Doe
>>> - renaming an entry onto itself when we want to uppercase an RDN is not
>>> possible when it should be.
>>>
>>> Digging a bit into the code, I found that many cases weren't handled
>>> correctly, and the API is not consistent. We also have issues with
>>> escaped characters.
>>>
>>> For instance, if we consider the Ava class, there are some methods that
>>> need to be renamed :
>>> o getUpName() should be renamed to getName() as Dn.getName() and
>>> Rdn.getName() are used
>>> o getUpType() should be renamed to getType() to be consistent with the
>>> previous rename
>>> o getUpValue() should also be renamed to getValue() for the very same
>>> reason.
>>>
>>> Now, when it comes to what the methods produce, here is a table showing
>>> the expected values :
>>>
>>> If the AVA is not schema aware :
>>>
>>>    getNormName()     "ou=exemple \+ rdn\C3\A4\ "
>>>    getNormType()     "ou"
>>>    getNormValue()    "exemple + rdnä "
>>>    getUpName()       "OU = Exemple \+ Rdn\C3\A4\ "
>>>    getUpType()       "OU"
>>>    getUpValue()      "Exemple + Rdnä "
>>>    normalize()       "ou=exemple \+ rdn\C3\A4\ "
>>>    toString()        "OU = Exemple \+ Rdn\C3\A4\ "
>>>
>>> and if the AVA is schema aware :
>>>
>>>    getNormName()     "2.5.4.11=example \+ rdn\C3\A4\ "
>>>    getNormType()     "2.5.4.11"
>>>    getNormValue()    "exemple + rdnä "
>>>    getUpName()       "OU = Exemple \+ Rdn\C3\A4\ "
>>>    getUpType()       "OU"
>>>    getUpValue()      "Exemple + Rdnä "
>>>    normalize()       "2.5.4.11=example \+ rdn\C3\A4\ "
>>>    toString()        "OU = Exemple \+ Rdn\C3\A4\ "
>>>
>>> Currently, this is not what we get :
>>>
>>>Ava.getNormName() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '
>>>Ava.getUpValue() returns 'Exemple \+ Rdn\C3\A4\ '
>>>Ava.normalize() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '
>>>
>>> The normalize() method seems useless.
>>>
>>>
>>> For RDN, we also have some method renaming to anticipate :
>>> o getUpType() should be renamed getType()
>>> o getUpValue() should be renamed getValue()
>>> o getValue(String) should be removed, we can grab the value using the
>>> getAva( String ) instead
>>>
>>> Same, for the expected values and the values we get :
>>>
>>>    getName()         "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>>    getNormName()     "ou=exemple \+ rdn\C3\A4\ "
>>>    getNormType()     "ou"
>>>    getNormValue()    "exemple + rdnä "
>>>    getUpType()       "OU"
>>>    getUpValue()      "Exemple \+ Rdn\C3\A4\ "
>>>    getValue(String)  "Exemple \+ Rdn\C3\A4\ " and "TEST"
>>>    toString()        "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>>
>>> and if the RDN is schema aware :
>>>
>>>    getName()         "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>>    getNormName()     "2.5.4.11=example \+ rdn\C3\A4\ "
>>>    getNormType()     "2.5.4.3"
>>>    getNormValue()    "exemple + rdnä "
>>>    getUpType()       "OU"
>>>    getUpValue()      "Exemple \+ Rdn\C3\A4\ "
>>>    getValue(String)  "Exemple \+ Rdn\C3\A4\ " and "TEST"
>>>    toString()        "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>>>
>>> This is what we get :
>>>
>>> Rdn.getNormName() returns 'ou=Exemple \+ Rdnä\ +cn=TEST'
>>> Rdn.getNormValue() returns 'Exemple + Rdnä '
>>> Rdn.getUpValue() returns ' Exemple \+ Rdn\C3\A4\ '
>>> Rdn.getValue( 'ou' ) returns 'Exemple + Rdnä '
>>> Rdn.getValue( 'test' ) returns ''
>>>
>>> Etc...
>>>
>>>

Re: Dn, Rdn and Ava inconsistencies

2012-02-20 Thread Alex Karasulu
On Mon, Feb 20, 2012 at 6:33 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> over the last few days, we have had to fight with some issues in the way we
> handle DNs and their components :
> - creating entries with an RDN containing the same AT twice is not
> allowed by the spec
> - searching for an entry whose RDN is cn=Doe+gn=John does not work when
> searching for gn=John+cn=Doe
> - renaming an entry onto itself when we want to uppercase an RDN is not
> possible when it should be.
>
> Digging a bit into the code, I found that many cases weren't handled
> correctly, and the API is not consistent. We also have issues with
> escaped characters.
>
> For instance, if we consider the Ava class, there are some methods that
> need to be renamed :
> o getUpName() should be renamed to getName() as Dn.getName() and
> Rdn.getName() are used
> o getUpType() should be renamed to getType() to be consistent with the
> previous rename
> o getUpValue() should also be renamed to getValue() for the very same
> reason.
>
> Now, when it comes to what the methods produce, here is a table showing
> the expected values :
>
> If the AVA is not schema aware :
>
>    getNormName()     "ou=exemple \+ rdn\C3\A4\ "
>    getNormType()     "ou"
>    getNormValue()    "exemple + rdnä "
>    getUpName()       "OU = Exemple \+ Rdn\C3\A4\ "
>    getUpType()       "OU"
>    getUpValue()      "Exemple + Rdnä "
>    normalize()       "ou=exemple \+ rdn\C3\A4\ "
>    toString()        "OU = Exemple \+ Rdn\C3\A4\ "
>
> and if the AVA is schema aware :
>
>    getNormName()     "2.5.4.11=example \+ rdn\C3\A4\ "
>    getNormType()     "2.5.4.11"
>    getNormValue()    "exemple + rdnä "
>    getUpName()       "OU = Exemple \+ Rdn\C3\A4\ "
>    getUpType()       "OU"
>    getUpValue()      "Exemple + Rdnä "
>    normalize()       "2.5.4.11=example \+ rdn\C3\A4\ "
>    toString()        "OU = Exemple \+ Rdn\C3\A4\ "
>
> Currently, this is not what we get :
>
>Ava.getNormName() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '
>Ava.getUpValue() returns 'Exemple \+ Rdn\C3\A4\ '
>Ava.normalize() returns 'ou=Exemple \\\+ Rdn\\C3\\A4\\\ '
>
> The normalize() method seems useless.
>
>
> For RDN, we also have some method renaming to anticipate :
> o getUpType() should be renamed getType()
> o getUpValue() should be renamed getValue()
> o getValue(String) should be removed, we can grab the value using the
> getAva( String ) instead
>
> Same, for the expected values and the values we get :
>
>    getName()         "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>    getNormName()     "ou=exemple \+ rdn\C3\A4\ "
>    getNormType()     "ou"
>    getNormValue()    "exemple + rdnä "
>    getUpType()       "OU"
>    getUpValue()      "Exemple \+ Rdn\C3\A4\ "
>    getValue(String)  "Exemple \+ Rdn\C3\A4\ " and "TEST"
>    toString()        "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>
> and if the RDN is schema aware :
>
>    getName()         "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>    getNormName()     "2.5.4.11=example \+ rdn\C3\A4\ "
>    getNormType()     "2.5.4.3"
>    getNormValue()    "exemple + rdnä "
>    getUpType()       "OU"
>    getUpValue()      "Exemple \+ Rdn\C3\A4\ "
>    getValue(String)  "Exemple \+ Rdn\C3\A4\ " and "TEST"
>    toString()        "OU = Exemple \+ Rdn\C3\A4\ +cn=  TEST"
>
> This is what we get :
>
> Rdn.getNormName() returns 'ou=Exemple \+ Rdnä\ +cn=TEST'
> Rdn.getNormValue() returns 'Exemple + Rdnä '
> Rdn.getUpValue() returns ' Exemple \+ Rdn\C3\A4\ '
> Rdn.getValue( 'ou' ) returns 'Exemple + Rdnä '
> Rdn.getValue( 'test' ) returns ''
>
> Etc...
>
> I have not yet coded the tests for the schema aware AVA and RDN, but be
> sure we will get more inconsistencies. I still have to write down the same
> analysis for Dn, but this is the same story.
>
>
> We really need to fix those inconsistencies, otherwise we will have endless
> issues. This is not the first time we are dealing with them, but so far, we
> never had to face them for real, and we just tried our best to shoot down the
> errors as they appeared. I think it's time to play medieval on the code !
>

This makes a lot of sense. As things matured in the project we started
seeing more and more of the corner cases that we need to account for. As
you say, we handled this incrementally as we encountered various situations.

Over time this strains the original design of this area of the library.
Naturally you cannot account for everything, and over time various choices
become obsolete as you patch and patch and patch the code.

Now, after seeing so many of the corner cases and how the design may not
support them cleanly and efficiently, then sure, re-architect it now that
we have the tests, the history and the knowledge.

--
My 2 cents,
-- Alex


Re: We may create a new sub-project for JDBM

2012-02-20 Thread Alex Karasulu
On Mon, Feb 20, 2012 at 4:24 PM, Emmanuel Lécharny wrote:

> On 2/20/12 3:13 PM, Pierre-Arnaud Marcelot wrote:
>
>  Hi Emmanuel,
>>
>> It can make sense.
>>
>> Do you want to move it up one level alongside 'shared', 'apacheds', 'studio',
>> etc. ?
>>
> Yes. To me, it should be a completely separate project.


I wish jdbm had a live community in db.apache.org :/.

Anyhooo, do we want this to be a TLP sub-project, on par release-wise
with shared, studio and apacheds? If so then let's launch a VOTE thread to
make it officially a separate sub-project of Directory.

-- 
Best Regards,
-- Alex


Re: When was the last time we tried RAP for Studio?

2012-02-16 Thread Alex Karasulu
On Thu, Feb 16, 2012 at 10:20 AM, Pierre-Arnaud Marcelot 
wrote:

> On Feb 16, 2012, at 09:13, Alex Karasulu wrote:
>
> Hi Pierre,
>
> On Thu, Feb 16, 2012 at 10:01 AM, Pierre-Arnaud Marcelot 
> wrote:
>
>> Hi Alex,
>>
>> The last time I tried it was in August 2010.
>>
>> By stripping down a few features here and there, I was able to run the
>> LDAP Browser as a web app.
>>
>> AFAIR, the features I removed were mostly very high level features like
>> code proposals, field completion, etc. All these things were not available
>> in the RAP runtime at the time.
>>
>> Here are a few screenshots from the last experiment:
>> http://people.apache.org/~pamarcelot/Studio_in_RAP/
>>
>>
> No matter how many times I see this it's like the first time. You know
> when you see a jet plane pass close overhead, no matter how many times you
> see it you're like WOW!
>
>
> Yeah, the work accomplished by the RAP team is pretty slick!
>
>> We could probably build a smaller version of Studio quite easily in a matter
>> of a few days (without the features that are not available in RAP).
>> The difficulty would be to find a way to have both versions (RCP and RAP)
>> rely on the same shared code base, without having a different branch for each
>> version (which would force us to port any change on one branch to the other).
>> This would require an initial heavy refactoring.
>>
>>
> I guess there's no way to conditionally enable/disable these features
> based on the target environment we intend to deploy to?
>
>
> Unfortunately no…
>
> This would cause some compilation (and then execution) issues because some
> classes/fields/methods simply don't exist in the RAP version of the
> runtime.
> Studio wouldn't even compile or start in the app server… :(
>

Oh well - guess we're going to have to wait a little longer. Thanks for the
feedback!

-- 
Best Regards,
-- Alex


Re: When was the last time we tried RAP for Studio?

2012-02-16 Thread Alex Karasulu
Hi Pierre,

On Thu, Feb 16, 2012 at 10:01 AM, Pierre-Arnaud Marcelot 
wrote:

> Hi Alex,
>
> The last time I tried it was in August 2010.
>
> By stripping down a few features here and there, I was able to run the
> LDAP Browser as a web app.
>
> AFAIR, the features I removed were mostly very high level features like
> code proposals, field completion, etc. All these things were not available
> in the RAP runtime at the time.
>
> Here are a few screenshots from the last experiment:
> http://people.apache.org/~pamarcelot/Studio_in_RAP/
>
>
No matter how many times I see this it's like the first time. You know when
you see a jet plane pass close overhead, no matter how many times you see
it you're like WOW!


> We could probably build a smaller version of Studio quite easily in a matter
> of a few days (without the features that are not available in RAP).
> The difficulty would be to find a way to have both versions (RCP and RAP)
> rely on the same shared code base, without having a different branch for each
> version (which would force us to port any change on one branch to the other).
> This would require an initial heavy refactoring.
>
>
I guess there's no way to conditionally enable/disable these features based
on the target environment we intend to deploy to?

-- 
Best Regards,
-- Alex


When was the last time we tried RAP for Studio?

2012-02-15 Thread Alex Karasulu
Hi all,

Just wondering, we had experimented in the past with RAP, have we tried
again recently to see if the conversion to a web application works better?

-- 
Best Regards,
-- Alex


Re: Renaming an entry with a case insensitive RDN : how to handle it ?

2012-02-15 Thread Alex Karasulu
On Wed, Feb 15, 2012 at 3:26 PM, Emmanuel Lécharny wrote:

> Hi guys,
>
> let's suppose we have an entry like :
>
> dn: cn=john doe, ou=system
> objectclass: person
> cn: john doe
> sn: john doe
>
> Let's now suppose that we want to camel-case the cn to have an entry like :
>
> dn: cn=John Doe, ou=system
> objectclass: person
> cn: John Doe
> sn: john doe
>
> Currently ADS does not support such a modification : it considers that
> it's a modification of an entry on itself, and it's not allowed. (cn is case
> insensitive, so basically, it's really a modification on itself).
>
>
You're saying this is two ops (modify + modifyDn) that are not allowable in one
shot?


> Now, from the user PoV, this is a bit painful, because even if cn is case
> insensitive, the user wants to see the DN as he provided it (after the
> rename, he may expect dn: cn=John Doe, ou=system).
>
> So
>
> Q1 : should we allow such a rename ? (it will modify the RDN *and* the
> attribute)
>
>
So OK I see now, you just did a moddn on the entry changing the DN to use
a camel humped cn value as the last name component when it was all lower
case.

Let's take the base case to understand this a little better. Suppose we
did the moddn and it was a totally different CN such as 'foo bar' in your
example. In this case what does the protocol state? I think in this case
the cn: foo bar attribute value pair is automatically added to the entry
right?

Going back into our problem. If the difference in the new name is a case
change on a case insensitive name component attributeType then we should
preserve the case supplied by the user. In this case I would suggest
replacing the cn=john doe with cn=John Doe.
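
To make the suggestion concrete, here is a minimal JNDI sketch of the
rename in question (host, port and credentials are placeholders). Under
the proposal, the entry would end up with cn: John Doe:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class CaseOnlyRename
{
    public static void main( String[] args ) throws NamingException
    {
        Hashtable<String, String> env = new Hashtable<>();
        env.put( Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory" );
        env.put( Context.PROVIDER_URL, "ldap://localhost:10389" ); // placeholder
        env.put( Context.SECURITY_PRINCIPAL, "uid=admin,ou=system" ); // placeholder
        env.put( Context.SECURITY_CREDENTIALS, "secret" ); // placeholder

        DirContext ctx = new InitialDirContext( env );

        // A case-only change on a case-insensitive RDN: under the proposal
        // above this should succeed and replace cn: john doe with cn: John Doe.
        ctx.rename( "cn=john doe,ou=system", "cn=John Doe,ou=system" );

        ctx.close();
    }
}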


> Q2 : if we modify the cn only, should the RDN be modified too ?
> (currently, ADS does modify the CN, but not the RDN)
>
> wdyt ?
>
>
See above.

-- 
Best Regards,
-- Alex


Re: Syncrepl Master-Master Replication: ApacheDs 2.0.0-M5

2012-02-15 Thread Alex Karasulu
ROTFL

On Wed, Feb 15, 2012 at 3:16 AM, Emmanuel Lécharny wrote:

> On 2/14/12 8:15 PM, Arul Madavadiyan wrote:
>
>  Hi
>>
>> I am trying to use ApacheDs 2.0.0-M5. I am planning to have Syncrepl
>> multi-master replication using ApacheDs. I wonder if ApacheDS supports
>> multi master replication.
>>
>> If so, do you have any documentation about it or an example to prove it
>> out. Please point me there. It looks like the documentation is hard to
>> find. I would really appreciate your help.
>>
>> Thanks
>> Arul
>>
>> This message and the information contained herein is proprietary and
>> confidential and subject to the Amdocs policy statement,
>> you may review at 
>> http://www.amdocs.com/email_disclaimer.asp
>>
>>  Are you already trying to replicate your mail to all the possible
> mailing lists ???
>
>
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex


Re: svn commit: r1241334 [1/2] - in /directory/shared/trunk/ldap: client/api/src/main/java/org/apache/directory/ldap/client/api/ codec/core/src/main/java/org/apache/directory/shared/ldap/codec/api/ mo

2012-02-07 Thread Alex Karasulu
On Tue, Feb 7, 2012 at 4:51 AM,  wrote:

> Author: elecharny
>
> o Modified the X-HUMAN-READABLE extension to X-NOT-HUMAN-READABLE, to use
> the same extension as OpenLDAP (this will increase the compatibility)
>

Nice move!

-- 
Best Regards,
-- Alex


Re: Binary attributes handling in the API

2012-02-06 Thread Alex Karasulu
On Mon, Feb 6, 2012 at 5:58 PM, Emmanuel Lécharny wrote:

> On 2/6/12 4:11 PM, Pierre-Arnaud Marcelot wrote:
>
>> I believe we need this list only in the case where no schema is loaded on
>> the connection, right?
>>
>
> Sadly, no. For servers that don't expose the information about the non
> human readable Syntax, we have to provide it. We use an X-IS-HUMAN-READABLE
> extension in our LdapSyntax elements in ApacheDS; other servers don't.
>
>
>> Or, do you want to also use this list in addition to an already loaded
>> schema?
>>
>
> Yes, if needed (ie, if we can't build it from the schemas we load)
>
>
Can't we just keep this information in a constants file and make sure the
constants are accessible to the API code? Users should be allowed to
override and add more settings at runtime so that more attributes can be tweaked.

-- 
Best Regards,
-- Alex


Re: Renaming the NetworkSchemaLoader

2012-02-05 Thread Alex Karasulu
On Sun, Feb 5, 2012 at 8:35 PM, Emmanuel Lecharny wrote:

> Hi,
>
> I'd like to rename the NetworkSchemaLoader to something like
> AdsSchemaLoader, as it's really dedicated to our own server (it reads the
> LDIF files from the schema partition in ou=schema).
>
> Now that the SsseSchemaLoader is working, we can access the schema from
> ADS in two ways :
> - reading it from cn=schema (SsseSchemaLoader)
> - reading it from ou=schema (AdsSchemaLoader)
>
>
Horrible descriptor name: can't we do better than these? I have no idea
what an SsseSchemaLoader is and I'm intimately familiar with LDAP.


> For all the other servers, the way to go is to use the SsseSchemaLoader.
>
> It may also be a good idea to rename the SsseSchemaLoader to something
> more user friendly, like ServerSchemaLoader, to reflect what it does :
> loading the schemas from a remote server.
>
>
Remote to me means over the wire, a.k.a. over the network. So I'd use
AdsNetworkSchemaLoader instead of just AdsSL because it's more descriptive.
This Ssse thing has my head spinning.



> Wdyt ?
>
> Note : We have now 6 implementations of the SchemaLoader interface :
> - JarLdifSchemaLoader, loading the schemas from a jar containing our (ADS)
> schemas
> - LdifSchemaLoader, loading the schemas from a hierarchy of LDIF files
> (still in ADS format)
> - SingleLdifSchemaLoader, loading the schema from one big ldif file (ADS
> format)
> - SchemaEditorSchemaLoader, loading the schemas from files in XML or
> OpenLDAP format (used by Studio)
> - SsseSchemaLoader, loading the schema from a connected LDAP server, using
> the rootDSE subschemaSubentry attribute as a starting point
>

Oooh, but how do we say this without spelling it all out, by picking a nice
name for this SchemaLoader? Maybe not so easy. Maybe ...
StandardNetworkSchemaLoader, or DefaultNetworkSchemaLoader ... the idea is
using the standard LDAP mechanism of looking up the subschema subentry.
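
For reference, a minimal JNDI sketch of that standard mechanism, which is
roughly what such a loader has to do (the server address is a placeholder):

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class SubschemaDump
{
    public static void main( String[] args ) throws NamingException
    {
        Hashtable<String, String> env = new Hashtable<>();
        env.put( Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory" );
        env.put( Context.PROVIDER_URL, "ldap://localhost:10389" ); // placeholder

        DirContext ctx = new InitialDirContext( env );

        // 1) The rootDSE tells us where the subschema subentry lives.
        Attributes rootDse = ctx.getAttributes( "", new String[] { "subschemaSubentry" } );
        String subschemaDn = ( String ) rootDse.get( "subschemaSubentry" ).get();

        // 2) Read the schema definitions published under that entry.
        Attributes schema = ctx.getAttributes( subschemaDn,
            new String[] { "attributeTypes", "objectClasses", "ldapSyntaxes", "matchingRules" } );

        System.out.println( schema.get( "attributeTypes" ) );

        ctx.close();
    }
}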



> - NetworkSchemaLoader, loading the schemas from ou=config in ADS.
>
>
I'd call this AdsNetworkSchemaLoader.

-- 
Best Regards,
-- Alex


Re: svn commit: r1239907 - in /directory/shared/trunk/ldap/model/src: main/java/org/apache/directory/shared/ldap/model/name/Rdn.java test/java/org/apache/directory/shared/ldap/model/name/RdnTest.java

2012-02-04 Thread Alex Karasulu
On Sat, Feb 4, 2012 at 10:13 AM, Emmanuel Lecharny wrote:

> On 2/3/12 11:09 PM, Alex Karasulu wrote:
>
>> On Fri, Feb 3, 2012 at 12:59 AM,  wrote:
>>
>>  Author: elecharny
>>> Date: Thu Feb  2 22:59:08 2012
>>> New Revision: 1239907
>>>
>>> URL: 
>>> http://svn.apache.org/viewvc?rev=1239907&view=rev
>>> Log:
>>> Fix DIRAPI-76 : new Rdn( "A=a,B=b" ) now throws an LdapInvalidDnException
>>>
>>>
>>>  Should the exception not be ... LdapInvalidNameComponent (we can create
>> one
>> if it does not exist).
>>
>> Reason I say this is that the whole issue with the non-intuitive
>> constructor was that the API user was thinking the argument can be a
>> multi-component relative distinguished name or a DN.
>> LdapInvalidDnException
>> might not fit here and it might make the user think they have to use a DN
>> rather than a single name component.
>>
>> WDYT?
>>
>>  Rahhh... Not such an easy move. In many many places, we are expecting a
> LdapInvalidDnException. Rdn is considered as a Dn with one single Rdn in
> most of the code.
>
> Question : would it be worth the effort to change every part of the code when
> we can simply improve the message contained in the exception ?
>
>
Never thought it would be this bloody hard. Leave it as is then and just
improve the message contained in the exception.  This is my 2 cents.


-- 
Best Regards,
-- Alex


Re: svn commit: r1239586 - in /directory/apacheds/trunk: all/pom.xml jdbm-partition/pom.xml protocol-ldap/pom.xml

2012-02-03 Thread Alex Karasulu
On Sat, Feb 4, 2012 at 2:44 AM, Emmanuel Lecharny wrote:

> On 2/3/12 10:51 PM, Alex Karasulu wrote:
>
>> This is a really bold move here Emmanuel. The txn branch is not even alpha
>> and a serious change that will affect the server. I thought this was
>> something we would slowly start to transition into the main branch of
>> development.
>>
>> I don't know if it should require a vote but maybe we should talk about
>> this a little bit no?
>>
>> Point to the modified version of JDBM.
>>
>
> Sure. Let me explain why I did that move, and why it's not critical.
>
> Having a MVCC backend could allow us to solve the problem we have with
> concurrent modifications and searches. We don't necessarily need to have
> the full in-memory MVCC Selcuk is working on in its branch in order to
> benefit from part of what he already have done : if we protect the
> modifications in the jdbm-partition against concurrent access to the
> backend, then searches and modifications could probably safely be executed
> concurrently.
>
> I need to test this part, and I don't want to do that in a branch, because
> it's too much a pain to merge it back while we are fixing many other issues
> in the server.
>
> Hopefully, this move only impacts three poms, and reverting back to jdbm
> is just a matter of pointing back to the previous version : just a breeze.
>
> I should have told the list about this change before doing it, my bad.
> Sadly, I made a mistake and had to commit the modifications in the poms
> because I broke the trunk this morning with a partial commit. This is why
> we now point to jdbm2. This can easily be fixed, and we can safely revert
> to jdbm.
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>
Thanks for clarifying and giving a thorough explanation. I was just
surprised to see the move and wanted your thoughts.

-- 
Best Regards,
-- Alex


Re: [ApacheDS] [MVCC] Reason for adding jdbm2 to trunk?

2012-02-03 Thread Alex Karasulu
On Sat, Feb 4, 2012 at 2:37 AM, Emmanuel Lecharny wrote:

> On 2/3/12 10:48 PM, Alex Karasulu wrote:
>
>> Hi Emmanuel,
>>
>> Just curious why you decided to add the stuff in the TXN branch to the
>> trunk.
>>
>>  Just wanted to check if we can benefit from the work done on this part
> in trunk. At least, trunk is still building with no error. Plus jdbm is
> still around, we can switch back if we want.
>
>
No worries you clarified all for me in your other email thread.


-- 
Thanx,
-- Alex


Re: svn commit: r1239907 - in /directory/shared/trunk/ldap/model/src: main/java/org/apache/directory/shared/ldap/model/name/Rdn.java test/java/org/apache/directory/shared/ldap/model/name/RdnTest.java

2012-02-03 Thread Alex Karasulu
On Sat, Feb 4, 2012 at 2:47 AM, Emmanuel Lecharny wrote:

> On 2/3/12 11:09 PM, Alex Karasulu wrote:
>
>> On Fri, Feb 3, 2012 at 12:59 AM,  wrote:
>>
>>  Author: elecharny
>>> Date: Thu Feb  2 22:59:08 2012
>>> New Revision: 1239907
>>>
>>> URL: 
>>> http://svn.apache.org/viewvc?rev=1239907&view=rev
>>> Log:
>>> Fix DIRAPI-76 : new Rdn( "A=a,B=b" ) now throws an LdapInvalidDnException
>>>
>>>
>>>  Should the exception not be ... LdapInvalidNameComponent (we can create
>> one
>> if it does not exist).
>>
> Or LdapInvalidRdnException. Yes.
>
>
Sounds good too.


>
>> Reason I say this is that the whole issue with the non-intuitive
>> constructor was that the API user was thinking the argument can be a
>> multi-component relative distinguished name or a DN.
>> LdapInvalidDnException
>> might not fit here and it might make the user think they have to use a DN
>> rather than a single name component.
>>
>> WDYT?
>>
> I totally agree. The LdapInvalidDnException was picked to have a quick fix
> for this issue. I was overloaded with many other issues related to the
> change made in the Rdn constructor fix :
> - the DSML parser was no longer working (a bug in the DSML xml files)
> - some questions raised about the ParentIdAndRdn need to be double checked (do
> we support multiple AVAs in a NamingContext, or not)
>
>
Totally understandable. I posted this just in case it was not noticed.

-- 
Best Regards,
-- Alex


Re: Some thoughts about the SchemaObjects

2012-02-03 Thread Alex Karasulu
On Thu, Feb 2, 2012 at 10:10 PM, Emmanuel Lecharny wrote:

> For the former issues, which have been raised when we started to try to
> extend the API to allow a user to add new Schema elements locally, we think
> that we must modify the current data structure for schema objects.
>
> Here are a few brain dump and some examples :
>
> First, the SchemaManager will only manage immutable SchemaObjects (ie,
> here, AttributeType), as there is no reason to allow someone to pull a
> SchemaObject from the SchemaManager and to modify it on the fly. That would
> be destructive for the SchemaManager user, as it may impact other users.
>
>
Right you don't want to mess with the in memory structure (graph of schema
objects) that is managed by the schema manager directly.


> Now, for Studio, being able to pull an AT from the SM, modify this AT and
> inject it back to the SM is useful.
>

Yes, for the schema editor; we've discussed this a couple of times.


>
> We then discussed about Mutable and Immutable schema objects, and how they
> can help us solving this issue.
>
> If a user wants to modify an existing SchemaObject pulled from the
> SchemaManager, he must first make it mutable :
>
> AttributeType attributeType = schemaManager.getAttributeType(
> "2.5.4.11" );
> MutableAttributeType mat = new MutableAttributeType( attributeType );
>
> In this case, the resulting instance is a copy of the initial immutable
> object.
>
>
Will the mutable track the differences (the deltas) from the original
schema object being wrapped?


> In order to be able to implement such a proposal, the following hierarchy
> could be enough :
>
>
>       (SchemaObject) <--- (MutableSchemaObject)
>             o                       ^
>             |                       |
>   {AbstractSchemaObject}            |
>             ^                       |
>             |                       |
>      [AttributeType] <-- [MutableAttributeType]
>
>
>
> where (III) are interfaces, {AAA} are abstract classes and [CCC] are
> normal classes.
>
> The base implementation is :
>
> o (SchemaObject) expose all the SO getters.
> o (MutableSchemaObject) interface expose the SO setters.
> o {AbstractSchemaObject} implements the SO getters
> o [AttributeType] implements the AttributeType getters
> o [MutableAttributeType] implements the AttributeType setters
>
>
With you here.


> (see an example at the end of this mail)
>
> With those classes and interface, it's possible to hide the setters for a
> user manipulating an AT he got from the SchemaManager, but this user has
> the possibility to modify this AT by wrapping it into a new MutableAT.
>
> In order to create new SchemaObject, a user can :
>
> 1) create a MutableSchemaObject, and get its immutable copy :
>
> MutableAttributeType mutableAT = new MutableAttributeType();
> mutableAT.setXXX( yyy );
> ...
> AttributeType attributeType = new AttributeType( mutableAT );
>
> 2) create a new AttributeType using the RFC notation :
>
> AttributeType attributeType = new AttributeType( "( 2.5.4.58 NAME 
> 'attributeCertificateAttribute'
> DESC 'attribute certificate use ;binary' SYNTAX
> 1.3.6.1.4.1.1466.115.121.1.8 )" );
>
> In any case, everything stored in the SchemaManager must be immutable.
>
>
>
SNIP


> Thoughts ?
>
>
I would like to share a view I have in my head about all the in memory
schema data structures we have. Just a quick review as some points/facts
first:

(1) We have schema objects that directly reference other schema objects
resulting a graph of schema objects.

(2) The design of the schema object model lets the containment hierarchy
naturally walk the graph. For example, looking at the MAY list of an
ObjectClass will reference actual AttributeType objects in the graph
connected to the ObjectClass. Further walking the AT object to see its
Syntax and MatchingRules does the same.

(3) Registry objects serve as map structures for rapidly indexing into
pools of schema objects by type based on alias names and their OID.

NOTE:
Contained objects like a Syntax referenced by an AttributeType should not
be directly referenced. Instead the Syntax's OID should be kept in the
AttributeType and an accessor like getSyntax() should use a lookup via the
Syntax Registry. This is important both from an OSGi standpoint and for
easily making changes to this grand data structure atomic,
consistent and isolated.
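
A minimal sketch of that indirection, with hypothetical names: the
AttributeType keeps only the Syntax OID and resolves it through the
registry on each access:

/** Stub for the Syntax schema object, just enough for the sketch. */
class LdapSyntax
{
    final String oid;

    LdapSyntax( String oid )
    {
        this.oid = oid;
    }
}

/** Hypothetical registry indexing Syntax objects by OID or alias. */
interface SyntaxRegistry
{
    LdapSyntax lookup( String oid );
}

class AttributeType
{
    // Keep the OID, not a direct object reference: a schema change (or
    // an OSGi bundle swap) is then picked up on the next lookup.
    private final String syntaxOid;
    private final SyntaxRegistry registry;

    AttributeType( String syntaxOid, SyntaxRegistry registry )
    {
        this.syntaxOid = syntaxOid;
        this.registry = registry;
    }

    /** Resolves the Syntax lazily through the registry. */
    LdapSyntax getSyntax()
    {
        return registry.lookup( syntaxOid );
    }
}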



I like the mutator wrappers introduced in this mail thread above, and I
think they're key to implementing a proper change algorithm. I also like the
idea of them serving to just store deltas and track changes from the original
immutable objects that they directly reference. This will probably make the
schema editor code a lot easier to implement.

I see a set of mutators being collected/tracked as a group, then applied in
an atomic batch to the main data structure after a validation test to
determine if the batch leaves the schema in a consistent state.
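
A rough sketch of that batching idea, with hypothetical names (the
SchemaStore methods are assumptions, not a real API), reusing the
AttributeType / MutableAttributeType pair from Emmanuel's example quoted
in full further down in the archive. Validation runs over every tracked
delta before anything goes live:

import java.util.ArrayList;
import java.util.List;

/** Hypothetical batch of schema changes applied atomically. */
class SchemaChangeBatch
{
    /** Stub exception, just to keep the sketch self-contained. */
    static class SchemaViolationException extends Exception {}

    /** Hypothetical target: validate a delta, then swap in an immutable copy. */
    interface SchemaStore
    {
        void validate( MutableAttributeType delta ) throws SchemaViolationException;
        void swapIn( AttributeType immutable );
    }

    private final List<MutableAttributeType> deltas = new ArrayList<>();

    void track( MutableAttributeType delta )
    {
        deltas.add( delta );
    }

    /** All-or-nothing: validate every delta first, only then publish. */
    synchronized void apply( SchemaStore store ) throws SchemaViolationException
    {
        for ( MutableAttributeType delta : deltas )
        {
            store.validate( delta );  // consistency check, no side effects
        }

        for ( MutableAttributeType delta : deltas )
        {
            store.swapIn( new AttributeType( delta ) );  // immutable copy goes live
        }
    }
}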

Re: svn commit: r1239907 - in /directory/shared/trunk/ldap/model/src: main/java/org/apache/directory/shared/ldap/model/name/Rdn.java test/java/org/apache/directory/shared/ldap/model/name/RdnTest.java

2012-02-03 Thread Alex Karasulu
On Fri, Feb 3, 2012 at 12:59 AM,  wrote:

> Author: elecharny
> Date: Thu Feb  2 22:59:08 2012
> New Revision: 1239907
>
> URL: http://svn.apache.org/viewvc?rev=1239907&view=rev
> Log:
> Fix DIRAPI-76 : new Rdn( "A=a,B=b" ) now throws an LdapInvalidDnException
>
>
Should the exception not be ... LdapInvalidNameComponent (we can create one
if it does not exist).

Reason I say this is that the whole issue with the non-intuitive
constructor was that the API user was thinking the argument can be a
multi-component relative distinguished name or a DN. LdapInvalidDnException
might not fit here and it might make the user think they have to use a DN
rather than a single name component.
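
For reference, a small sketch of the behavior in question. The Rdn package
comes from the commit path above; the exception's package is my assumption
about the shared trunk layout of that era:

import org.apache.directory.shared.ldap.model.exception.LdapInvalidDnException;
import org.apache.directory.shared.ldap.model.name.Rdn;

public class RdnCtorDemo
{
    public static void main( String[] args ) throws LdapInvalidDnException
    {
        // A single name component parses fine.
        Rdn rdn = new Rdn( "cn=Doe" );
        System.out.println( rdn.getName() );

        // Two components make a DN, not an RDN: per the DIRAPI-76 fix
        // above this is rejected, currently with LdapInvalidDnException.
        try
        {
            new Rdn( "A=a,B=b" );
        }
        catch ( LdapInvalidDnException e )
        {
            System.out.println( "rejected as expected: " + e.getMessage() );
        }
    }
}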

WDYT?

-- 
Best Regards,
-- Alex


Re: svn commit: r1239586 - in /directory/apacheds/trunk: all/pom.xml jdbm-partition/pom.xml protocol-ldap/pom.xml

2012-02-03 Thread Alex Karasulu
This is a really bold move here Emmanuel. The txn branch is not even alpha
and a serious change that will affect the server. I thought this was
something we would slowly start to transition into the main branch of
development.

I don't know if it should require a vote but maybe we should talk about
this a little bit no?

Point to the modified version of JDBM.
>
> Modified: directory/apacheds/trunk/all/pom.xml
> URL:
> http://svn.apache.org/viewvc/directory/apacheds/trunk/all/pom.xml?rev=1239586&r1=1239585&r2=1239586&view=diff
>
> ==
> --- directory/apacheds/trunk/all/pom.xml (original)
> +++ directory/apacheds/trunk/all/pom.xml Thu Feb  2 12:44:01 2012
> @@ -73,7 +73,7 @@
>
> 
>   ${project.groupId}
> -  apacheds-jdbm
> +  apacheds-jdbm2
> 
>
>

-- 
Best Regards,
-- Alex


[ApacheDS] [MVCC] Reason for adding jdbm2 to trunk?

2012-02-03 Thread Alex Karasulu
Hi Emmanuel,

Just curious why you decided to add the stuff in the TXN branch to the
trunk.

-- 
Best Regards,
-- Alex


Re: Some thoughts about the SchemaObjects

2012-02-03 Thread Alex Karasulu
OK I will respond to this ... just having access and time issues for the past 2
days ... not ignored.

Cheers,
Alex

On Thu, Feb 2, 2012 at 10:10 PM, Emmanuel Lecharny wrote:

> Hi guys,
>
> today, we had a long discussion with Pierre-Arnaud about AT, schema and
> the pathetic state of the planet. The last problem, we can't solve...
>
> For the former issues, which have been raised when we started to try to
> extend the API to allow a user to add new Schema elements locally, we think
> that we must modify the current data structure for schema objects.
>
> Here are a few brain dump and some examples :
>
> First, the SchemaManager will only manage immutable SchemaObjects (ie,
> here, AttributeType), as there is no reason to allow someone to pull a
> SchemaObject from the SchemaManager and to modify it on the fly. That would
> be destructive for the SchemaManager user, as it may impact other users.
>
> Now, for Studio, being able to pull an AT from the SM, modify this AT and
> inject it back to the SM is useful.
>
> We then discussed about Mutable and Immutable schema objects, and how they
> can help us solving this issue.
>
> If a user wants to modify an existing SchemaObject pulled from the
> SchemaManager, he must first make it mutable :
>
> AttributeType attributeType = schemaManager.getAttributeType(
> "2.5.4.11" );
> MutableAttributeType mat = new MutableAttributeType( attributeType );
>
> In this case, the resulting instance is a copy of the initial immutable
> object.
>
> In order to be able to implement such a proposal, the following hierarchy
> could be enough :
>
>
>       (SchemaObject) <--- (MutableSchemaObject)
>             o                       ^
>             |                       |
>   {AbstractSchemaObject}            |
>             ^                       |
>             |                       |
>      [AttributeType] <-- [MutableAttributeType]
>
>
>
> where (III) are interfaces, {AAA} are abstract classes and [CCC] are
> normal classes.
>
> The base implementation is :
>
> o (SchemaObject) expose all the SO getters.
> o (MutableSchemaObject) interface expose the SO setters.
> o {AbstractSchemaObject} implements the SO getters
> o [AttributeType] implements the AttributeType getters
> o [MutableAttributeType] implements the AttributeType setters
>
> (see an example at the end of this mail)
>
> With those classes and interface, it's possible to hide the setters for a
> user manipulating an AT he got from the SchemaManager, but this user has
> the possibility to modify this AT by wrapping it into a new MutableAT.
>
> In order to create new SchemaObject, a user can :
>
> 1) create a MutableSchemaObject, and get its immutable copy :
>
> MutableAttributeType mutableAT = new MutableAttributeType();
> mutableAT.setXXX( yyy );
> ...
> AttributeType attributeType = new AttributeType( mutableAT );
>
> 2) create a new AttributeType using the RFC notation :
>
> AttributeType attributeType = new AttributeType( "( 2.5.4.58 NAME 
> 'attributeCertificateAttribute'
> DESC 'attribute certificate use ;binary' SYNTAX
> 1.3.6.1.4.1.1466.115.121.1.8 )" );
>
> In any case, everything stored in the SchemaManager must be immutable.
>
>
> public interface SchemaObject
> {
>     int getValue();
> }
>
> public abstract class AbstractSchemaObject implements SchemaObject
> {
>     // Protected to hide it from the user but modifiable by a MutableAT instance
>     protected int value;
>
>     protected AbstractSchemaObject( int value )
>     {
>         this.value = value;
>     }
>
>     public int getValue()
>     {
>         return value;
>     }
> }
>
> public class AttributeType extends AbstractSchemaObject
> {
>     // This is protected to be modifiable by a MutableAT instance
>     protected String descr;
>
>     public AttributeType()
>     {
>         super( 1 );
>         descr = "Test";
>     }
>
>     // A constructor creating an immutable AT from a MutableAT
>     public AttributeType( MutableAttributeType attributeType )
>     {
>         super( attributeType.getValue() );
>
>         descr = attributeType.getDescr();
>     }
>
>     public String getDescr()
>     {
>         return descr;
>     }
> }
>
>
> public interface MutableSchemaObject
> {
>     void setValue( int value );
> }
>
>
> public class MutableAttributeType extends AttributeType implements
> MutableSchemaObject
> {
>     public MutableAttributeType()
>     {
>     }
>
>     // A constructor creating a MutableAT from an AT
>     public MutableAttributeType( AttributeType attributeType )
>     {
>         super();
>
>         value = attributeType.getValue();
>         descr = attributeType.getDescr();
>     }
>
>     public void setValue( int value )
>     {
>         this.value = value;
>     }
>
>     public void setDescr( String descr )
>     {
>         this.descr = descr;
>     }
> }
>
>
> Thoughts ?
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Best Regards,
-- Alex

