Le 27/01/15 23:07, Harris, Christopher P a écrit :
> Hi, Emmanuel.
>
> "Can you tell us how you do that ? Ie, are you using a plain new connection
> for each thread you spawn ?"
> Sure. I can tell you how I am implementing a multi-threaded approach to read
> all of LDAP/AD into memory. I'll do the next best thing...paste my code at
> the end of my response.
>
>
> "In any case, the TimeOut is the default LDapConnection timeout (30 seconds)
> :"
> Yes, I noticed mention of the default timeout in your User Guide.
>
>
> "You have to set the LdapConnectionConfig timeout for all the created
> connections to use it. there is a setTimeout() method for that which has been
> added in 1.0.0-M28."
> When visiting your site while seeking to explore connection pool options, I
> noticed that you recently released M28 and fixed DIRAPI-217 and decided to
> update my pom.xml to M28 and test out the PoolableLdapConnectionFactory.
> Great job, btw. Keep up the good work!
>
> Oh, and your example needs to be updated to use
> DefaultPoolableLdapConnectionFactory instead of PoolableLdapConnectionFactory.
>
>
> "config.setTimeOut( whatever fits you );"
> Very good to know. Thank you!
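For reference, putting the two together — the config timeout plus the pooled
factory — looks roughly like this. It is only a sketch: the host, port, bind DN,
credentials and timeout value are all placeholders, not something from your setup.

```java
import org.apache.directory.ldap.client.api.DefaultPoolableLdapConnectionFactory;
import org.apache.directory.ldap.client.api.LdapConnection;
import org.apache.directory.ldap.client.api.LdapConnectionConfig;
import org.apache.directory.ldap.client.api.LdapConnectionPool;

public class PooledSetup
{
    public static void main( String[] args ) throws Exception
    {
        LdapConnectionConfig config = new LdapConnectionConfig();
        config.setLdapHost( "ldap.example.com" ); // placeholder host
        config.setLdapPort( 389 );
        config.setName( "cn=admin,dc=example,dc=com" ); // placeholder bind DN
        config.setCredentials( "secret" );
        config.setTimeout( 300000L ); // applies to every connection the pool creates (M28+)

        // The factory to use since M28
        DefaultPoolableLdapConnectionFactory factory = new DefaultPoolableLdapConnectionFactory( config );
        LdapConnectionPool pool = new LdapConnectionPool( factory );

        LdapConnection connection = pool.getConnection();

        try
        {
            // ... do your searches here ...
        }
        finally
        {
            pool.releaseConnection( connection );
        }
    }
}
```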
>
>
> "It is the right way."
> Sweeeeeeet!
>
>
> "Side note : you may face various problems when pulling everything from an AD
> server. Typically, the AD config might not let you pull more than
> 1000 entries, as there is a hard limit you need to change on AD if you want
> to get more entries.
>
> Otherwise, the approach - ie, using multiple threads - might seems good, but
> the benefit is limited. Pulling entries from the server is fast, you should
> be able to get tens of thousands per second with one single thread. I'm not
> sure how AD support concurrent searches anyway. Last, not least, it's likely
> that AD does not allow more than a certain number of concurrent threads to
> run, which might lead to contention at some point."
>
> Ah, this is why I wanted to reach out to you guys. You guys know this kind
> of in-depth information about LDAP and AD. So, I may adapt my code to a
> single-thread then. I can live with that. I need to pull about 40k-60k
> entries, so tens of thousands of entries per second works for me. I may need
> to run the code by you then if I go with a single-threaded approach and need
> to check if I'm going about it in the most efficient manner.
The problem with the multi-threaded approach is that you *have* to know which
entry has children, because the server won't give you that information. So you
will end up doing a search for every single entry you get at one level,
with scope ONE_LEVEL, and most of the time you will just get the entry
itself. That would more than double the time it takes to grab everything.
>
>
>
> And now time for some code...
>
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.TimeUnit;
> import java.util.logging.Level;
> import java.util.logging.Logger;
>
> import org.apache.commons.pool.impl.GenericObjectPool;
> import org.apache.directory.api.ldap.model.cursor.CursorException;
> import org.apache.directory.api.ldap.model.cursor.SearchCursor;
> import org.apache.directory.api.ldap.model.entry.Entry;
> import org.apache.directory.api.ldap.model.exception.LdapException;
> import org.apache.directory.api.ldap.model.message.Response;
> import org.apache.directory.api.ldap.model.message.SearchRequest;
> import org.apache.directory.api.ldap.model.message.SearchRequestImpl;
> import org.apache.directory.api.ldap.model.message.SearchResultEntry;
> import org.apache.directory.api.ldap.model.message.SearchScope;
> import org.apache.directory.api.ldap.model.name.Dn;
> import org.apache.directory.ldap.client.api.DefaultLdapConnectionFactory;
> import org.apache.directory.ldap.client.api.LdapConnection;
> import org.apache.directory.ldap.client.api.LdapConnectionConfig;
> import org.apache.directory.ldap.client.api.LdapConnectionPool;
> import org.apache.directory.ldap.client.api.LdapNetworkConnection;
> import org.apache.directory.ldap.client.api.DefaultPoolableLdapConnectionFactory;
> import org.apache.directory.ldap.client.api.ValidatingPoolableLdapConnectionFactory;
> import org.apache.directory.ldap.client.api.SearchCursorImpl;
> import org.apache.directory.ldap.client.template.EntryMapper;
> import org.apache.directory.ldap.client.template.LdapConnectionTemplate;
>
> /**
> * @author Chris Harris
> *
> */
> public class LdapClient {
>
> public LdapClient() {
>
> }
>
> public Person searchLdapForCeo() {
> return this.searchLdapUsingHybridApproach(ceoQuery);
> }
>
> public Map<String, Person> buildLdapMap() {
>     SearchCursor cursor = new SearchCursorImpl(null, 300000, TimeUnit.SECONDS);
>     LdapConnection connection = new LdapNetworkConnection(host, port);
>     connection.setTimeOut(300000);
>     Entry entry = null;
>
>     try {
>         connection.bind(dn, pwd);
>
>         LdapClient.recursivelyGetLdapDirectReports(connection, cursor, entry, ceoQuery);
>         System.out.println("Finished all Ldap Map Builder threads...");
>     } catch (LdapException ex) {
>         Logger.getLogger(LdapClient.class.getName()).log(Level.SEVERE, null, ex);
>     } catch (CursorException ex) {
>         Logger.getLogger(LdapClient.class.getName()).log(Level.SEVERE, null, ex);
>     } finally {
>         cursor.close();
>         try {
>             connection.close();
>         } catch (IOException ex) {
>             Logger.getLogger(LdapClient.class.getName()).log(Level.SEVERE, null, ex);
>         }
>     }
>
>     return concurrentPersonMap;
> }
>
> private static Person recursivelyGetLdapDirectReports(LdapConnection connection,
>         SearchCursor cursor, Entry entry, String query) throws CursorException {
>     Person p = null;
>     EntryMapper<Person> em = Person.getEntryMapper();
>
>     try {
>         SearchRequest sr = new SearchRequestImpl();
>         sr.setBase(new Dn(searchBase));
>         StringBuilder sb = new StringBuilder(query);
>         sr.setFilter(sb.toString());
>         sr.setScope( SearchScope.SUBTREE );
Ahhhhh !!!! STOP !!!
Ok, no need to go any further in your code.
You are doing a SUBTREE search on *every single entry* you are pulling
from the base. Each of those searches makes the server walk the whole
subtree under that entry, so with 40 000 entries every entry gets scanned
and transferred once per ancestor instead of just once. No wonder you get
timeouts... Imagine you have such a tree :
root
  A1
    B1
      C1
      C2
    B2
      C3
      C4
  A2
    B3
      C5
      C6
    B4
      C7
      C8
The search on root will pull A1, A2, B1, B2, B3, B4, C1..C8 (14 entries
-> 14 searches).
Then the search on A1 will pull B1, C1, C2, B2, C3, C4 (6 entries -> 6
searches).
Then the search on A2 will pull B3, C5, C6, B4, C7, C8 (6 entries -> 6
searches).
Then the search on B1 will pull C1, C2 (2 entries -> 2 searches; the same
for B2, B3 and B4, so 4 × 2 = 8 searches).
...
At the end, you have done 1 + 14 + 12 + 8 = 35 searches, when you have
only 15 entries...
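You can check that count without any directory at all. Here is a quick
stand-alone sketch (plain Java, no LDAP involved) that models the 15-entry
example tree above, under the assumption that each per-entry SUBTREE search
returns that entry's descendants:

```java
import java.util.ArrayList;
import java.util.List;

class SearchCostSketch {
    // Minimal stand-in for a directory entry; only the tree shape matters here.
    static class Node {
        final List<Node> children = new ArrayList<>();
    }

    // Builds the 15-entry example tree: a root, 2 children (A1, A2),
    // 4 grandchildren (B1..B4) and 8 great-grandchildren (C1..C8).
    static Node buildExampleTree( int depth, int fanout ) {
        Node node = new Node();
        if ( depth > 0 ) {
            for ( int i = 0; i < fanout; i++ ) {
                node.children.add( buildExampleTree( depth - 1, fanout ) );
            }
        }
        return node;
    }

    // Number of entries in the subtree rooted at node, including node itself.
    static int size( Node node ) {
        int count = 1;
        for ( Node child : node.children ) {
            count += size( child );
        }
        return count;
    }

    // One SUBTREE search per entry, each returning that entry's descendants:
    // the total number of entries transferred is the sum of (subtree size - 1)
    // over all entries, instead of one transfer per entry.
    static int entriesTransferredPerEntrySubtree( Node node ) {
        int total = size( node ) - 1;
        for ( Node child : node.children ) {
            total += entriesTransferredPerEntrySubtree( child );
        }
        return total;
    }

    public static void main( String[] args ) {
        Node root = buildExampleTree( 3, 2 );
        // 15 entries, but 34 of them transferred (14 + 12 + 8), i.e. 35
        // searches counting the initial one on the root.
        System.out.println( size( root ) + " entries, "
            + entriesTransferredPerEntrySubtree( root ) + " transferred" );
    }
}
```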
If you want to see what your algorithm is doing, just do the same search
using SearchScope.ONE_LEVEL instead. You will then do roughly O(40 000)
searches, which is already far fewer than what you are doing now.
But anyway, doing a single search on the root with a SUBTREE scope will be
way faster still, because you will do only one search in total.
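Concretely, the single-search version looks like the sketch below. The host,
credentials, base DN and filter are placeholders, and you obviously need a
live server for it to do anything:

```java
import org.apache.directory.api.ldap.model.cursor.SearchCursor;
import org.apache.directory.api.ldap.model.entry.Entry;
import org.apache.directory.api.ldap.model.message.Response;
import org.apache.directory.api.ldap.model.message.SearchRequest;
import org.apache.directory.api.ldap.model.message.SearchRequestImpl;
import org.apache.directory.api.ldap.model.message.SearchResultEntry;
import org.apache.directory.api.ldap.model.message.SearchScope;
import org.apache.directory.api.ldap.model.name.Dn;
import org.apache.directory.ldap.client.api.LdapConnection;
import org.apache.directory.ldap.client.api.LdapNetworkConnection;

public class SingleSubtreeSearch
{
    public static void main( String[] args ) throws Exception
    {
        LdapConnection connection = new LdapNetworkConnection( "ldap.example.com", 389 );

        try
        {
            connection.setTimeOut( 300000L );
            connection.bind( "cn=admin,dc=example,dc=com", "secret" );

            SearchRequest sr = new SearchRequestImpl();
            sr.setBase( new Dn( "dc=example,dc=com" ) );
            sr.setFilter( "(objectClass=person)" );
            sr.setScope( SearchScope.SUBTREE ); // one search pulls the whole tree

            SearchCursor cursor = connection.search( sr );

            while ( cursor.next() )
            {
                Response response = cursor.get();

                if ( response instanceof SearchResultEntry )
                {
                    Entry entry = ( ( SearchResultEntry ) response ).getEntry();
                    // Build your Person map from each entry here
                }
            }

            cursor.close();
        }
        finally
        {
            connection.close();
        }
    }
}
```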