extending SolrIndexSearcher
Hi, I am looking at extending the source code for SolrIndexSearcher for my own purposes. Basically, I am trying to replace the use of Lucene's IndexSearcher with a ParallelMultiSearcher so that a query can search both locally available indexes and remote indexes available only via RMI. The ParallelMultiSearcher is instantiated with both local and remote Searchable references. The local Searchables are simply IndexSearcher instances tied to local disk (separate indexes), while the remote Searchables are made reachable via RMI.

In essence, where it used to be:

    IndexSearcher searcher = new IndexSearcher(reader);

it is now (not the actual code, but similar):

    Searchable[] searchables = new Searchable[3];
    // Local searchables:
    for (int i = 0; i < 2; i++) {
        searchables[i] = new IndexSearcher("/disk" + i + "/index");
    }
    // RMI searchable: throws exception during search..
    searchables[2] = (Searchable) Naming.lookup("//remote_host:1099/remote_svc");
    ParallelMultiSearcher searcher = new ParallelMultiSearcher(searchables);

When I build the source and use it (the short story: by replacing the relevant class file(s) within the solr.war used by the example Jetty implementation), it starts up just fine. If I comment out the RMI searchable line, submission of a search query to Jetty/Solr works just fine, and it is able to search any number of indexes.
However, with the RMI searchable uncommented, I get an exception (here's the end of it):

    May 9, 2006 1:38:07 AM org.apache.solr.core.SolrException log
    SEVERE: java.rmi.MarshalException: error marshalling arguments; nested exception is:
        java.io.NotSerializableException: org.apache.lucene.search.MultiSearcher$1
        at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:122)
        at org.apache.lucene.search.RemoteSearchable_Stub.search(Unknown Source)
        at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:248)
        at org.apache.lucene.search.Searcher.search(Searcher.java:116)
        at org.apache.lucene.search.Searcher.search(Searcher.java:95)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:794)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:712)
        at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:605)
        at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:106)

So it looks like something needs to be serializable to get this to work. Wondering if anyone has any ideas to get around this problem.

tia,
Koji
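For what it's worth, the NotSerializableException names MultiSearcher$1, i.e. an anonymous inner class, and RMI must serialize every method argument it sends over the wire. A small self-contained probe (no Lucene or Solr classes; the class names here are invented for illustration) shows why an anonymous class fails where a Serializable static nested class succeeds:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationProbe {

    // Stands in for MultiSearcher$1: an anonymous class that does not
    // implement Serializable cannot be marshalled by RMI.
    static Runnable anonymous() {
        return new Runnable() { public void run() {} };
    }

    // A static nested class that implements Serializable marshals fine.
    static class SerializableTask implements Serializable, Runnable {
        private static final long serialVersionUID = 1L;
        public void run() {}
    }

    /** Returns true if obj survives Java serialization (what RMI does to arguments). */
    static boolean isSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (IOException e) {
            // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSerializable(anonymous()));            // false
        System.out.println(isSerializable(new SerializableTask())); // true
    }
}
```

This is only a diagnostic sketch; it doesn't fix the problem, but it confirms that the failure is in the argument Lucene passes across the RMI boundary, not in your wiring.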
Re: Java heap space
FYI, I have just committed the a

On 5/8/06, Bill Au [EMAIL PROTECTED] wrote:

I was able to produce an OutOfMemoryError using Yonik's python script with Jetty 6. I was not able to do so with Jetty 5.1.11RC0, the latest stable version. So that's the version of Jetty to which I will downgrade the Solr example app.

Bill

On 5/5/06, Erik Hatcher [EMAIL PROTECTED] wrote:

Along these lines, locally I've been using the latest stable version of Jetty and it has worked fine, but I did see an out-of-memory exception the other day and have not seen it since, so I'm not sure what caused it. Moving to Tomcat, as long as we can configure it to be as lightweight as possible, is quite fine with me as well.

Erik

On May 5, 2006, at 12:12 PM, Bill Au wrote:

There seem to be a fair number of folks using Jetty with the example app as opposed to using Solr with their own appserver. So I think it is best to use a stable version of Jetty instead of the beta. If no one objects, I can go ahead and take care of this.

Bill

On 5/4/06, Yonik Seeley [EMAIL PROTECTED] wrote:

I verified that Tomcat 5.5.17 doesn't experience this problem.

-Yonik

On 5/4/06, Yonik Seeley [EMAIL PROTECTED] wrote:

On 5/3/06, Yonik Seeley [EMAIL PROTECTED] wrote:

I just tried sending in 100,000 deletes and it didn't cause a problem: the memory grew from 22M to 30M. Random thought: perhaps it has something to do with how you are sending your requests?

Yep, I was able to reproduce a memory problem with Jetty on Linux when using non-persistent connections (closed after each request). The same 100,000 deletes blew up the JVM to a 1GB heap. So this looks like it could be a Jetty problem (shame on me for using a beta). I'm still not quite sure what changed in Solr that could make it appear in later versions and not in earlier ones, though... the version of Jetty is the same.
Re: Java heap space
Sorry, hit the wrong key before... FYI, I have just committed all the changes related to the Jetty downgrade into SVN. Let me know if you notice any problems.

Bill

On 5/9/06, Bill Au [EMAIL PROTECTED] wrote:

FYI, I have just committed the a
Re: extending SolrIndexSearcher
I tried it with just Lucene + RMI, and that works just fine. It's actually based on the Lucene in Action e-book topic on how to use ParallelMultiSearcher (chap. 5). The relevant code snippet follows:

    /*
     * Search server:
     * This is the code fragment for the search server, which enters
     * a wait loop to accept requests on port 1099.
     * This server implementation is run on 2+ separate boxes: one is
     * the master while the rest are slaves. The master is the main
     * entry point, which searches its own local indexes and sends
     * requests to each slave; each slave only searches its own local
     * indexes and reports results back to the master.
     */
    // private Vector<Searchable> _searchables;
    // private Vector<String> _localDirs;
    // ...

    // add local dirs as searchables..
    for (int i = 0; i < _localDirs.size(); i++) {
        System.out.println("local searchable: " + _localDirs.get(i) + "..");
        _searchables.add(new IndexSearcher(_localDirs.get(i)));
    }

    // add remote nodes (slaves) as searchables..
    // note: only the master does this; each slave only looks at its local indexes..
    if (_remoteNodes != null) {
        Collection nodes = _remoteNodes.values();
        Iterator it = nodes.iterator();
        String node = "";
        while (it.hasNext()) {
            node = (String) it.next();
            try {
                // remote nodes (slaves) are also reachable via port 1099
                _searchables.add((Searchable) Naming.lookup("//" + node + ":1099/" + _DEFAULT_SVC_NAME_));
                System.out.println("remote searchable: " + node + "..");
            } catch (java.rmi.ConnectException e) {
                System.err.println("ERROR: unable to connect to node=" + node + "...");
            }
        }
    }

    // just some glue to prepare the list of searchables for the ParallelMultiSearcher constructor..
    Searchable[] sch = new Searchable[_searchables.size()];
    for (int i = 0; i < _searchables.size(); i++) {
        sch[i] = _searchables.get(i);
    }

    // start up server..
    System.setSecurityManager(new RMISecurityManager());
    LocateRegistry.createRegistry(_port);
    Searcher parallelSearcher = new ParallelMultiSearcher(sch);
    RemoteSearchable parallelImpl = new RemoteSearchable(parallelSearcher);
    Naming.rebind("//" + _nodeID + ":" + _port + "/" + _DEFAULT_SVC_NAME_, parallelImpl);
    System.out.println("SearchServer started (nodeID=" + _nodeID
        + ", port=" + _port
        + ", role=" + ((_remoteNodes != null) ? "master" : "slave")
        + ", # searchables=" + _searchables.size() + ")...");
    // enters wait state, ready to accept requests on port 1099...

    /*
     * Search client:
     * This basically does an RMI naming lookup to get a reference to
     * the master node on port 1099, then sends a search query..
     */
    TermQuery query = new TermQuery(new Term("body", "word"));
    MultiSearcher searcher = new MultiSearcher(new Searchable[] { _lookupRemote(_DEFAULT_SVC_NAME_) });
    Hits hits = searcher.search(query);
    Document doc = null;
    for (int i = 0; i < hits.length(); i++) {
        doc = hits.doc(i);
        // able to get hit info here...
    }
    // ...

    private Searchable _lookupRemote(String svcName) throws Exception {
        return (Searchable) Naming.lookup("//" + _host + ":" + _port + "/" + svcName);
    }

From both of the above pieces of code, I am able to start a server on box1 (master) and another server on box2 (slave), then invoke a client that queries box1, which gets results from searching the indexes on both box1 and box2. With this working, that's when I tried to incorporate ParallelMultiSearcher into Solr's SolrIndexSearcher, since I saw that that is the place where Lucene's IndexSearcher is used. I replaced it with a ParallelMultiSearcher initialized similarly to the client code I mentioned above. From that, it seems like Solr itself needs to marshall and unmarshall the searcher instance SolrIndexSearcher holds, and because the ParallelMultiSearcher is initialized with RMI stubs, it fails to proceed with such marshall/unmarshall internal actions.
As mentioned in the first email, if I use ParallelMultiSearcher to look only at local indexes (no RMI stub), Solr works just fine. So I'm wondering if there is a way to use SolrIndexSearcher to search both local and remote indexes, even if not through the RMI solution Lucene's e-book suggests via its ParallelMultiSearcher class.

tia,
Koji

On 5/9/06, Chris Hostetter [EMAIL PROTECTED] wrote:

I don't really know a lot about RMI, but as I understand it, serialization is a core necessity -- if the arguments you want to pass to your remote method aren't serializable, then RMI can't pass those arguments across the wire. That said: it's not clear to me from the pseudocode/stacktrace you included *what* isn't serializable ... is it a Solr class or a core Lucene class? If it's a Lucene class, you may want to start by making a small proof-of-concept RMI app that just uses the Lucene core classes; once that works, then try your changes in Solr.

: Date: Tue, 9 May 2006 02:32:45 -0700
: From: Koji Miyamoto [EMAIL PROTECTED]
: Reply-To: solr-user@lucene.apache.org
: To:
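Along the lines of the proof-of-concept Hoss suggests, the RMI plumbing itself can be sanity-checked with no Lucene at all. Here is a minimal in-process sketch mirroring the rebind/lookup flow of the master/slave code above (the service name "search_svc", port 2099, and the SearchService interface are all made up for illustration; Registry.rebind/lookup are used in place of the Naming URL form):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RegistrySketch {

    // String and String[] are Serializable, so this method marshals cleanly.
    public interface SearchService extends Remote {
        String[] search(String term) throws RemoteException;
    }

    static class LocalSearchService implements SearchService {
        public String[] search(String term) {
            // pretend these are hits from a local index
            return new String[] { "doc-1:" + term, "doc-2:" + term };
        }
    }

    public static void main(String[] args) throws Exception {
        // "Master" side: publish the service, as Naming.rebind("//host:1099/svc", impl) would.
        Registry registry = LocateRegistry.createRegistry(2099);
        SearchService stub =
            (SearchService) UnicastRemoteObject.exportObject(new LocalSearchService(), 0);
        registry.rebind("search_svc", stub);

        // "Client" side: look the service up and call it, as Naming.lookup() would.
        SearchService remote =
            (SearchService) LocateRegistry.getRegistry(2099).lookup("search_svc");
        for (String hit : remote.search("word")) {
            System.out.println(hit);
        }
    }
}
```

Because every argument and return value here is Serializable, the call succeeds; the failure in the thread above comes from a non-serializable argument, not from the rebind/lookup plumbing.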
Re: extending SolrIndexSearcher
: IndexSearcher. I replaced it with ParallelMultiSearcher, where it is
: initialized similar to the client code I mentioned above.
:
: From that, it seems like Solr itself needs to marshall and unmarshall the
: searcher instance SolrIndexSearcher holds, and because the
: ParallelMultiSearcher is initialized with RMI stubs, it fails to proceed
: with such marshall/unmarshall internal actions. As mentioned in the first
: email, if I use ParallelMultiSearcher to only look at local indexes (no RMI
: stub), Solr works just fine. So I'm wondering if there is a way to use
: SolrIndexSearcher to search both local and remote indexes, even if not
: through the RMI solution Lucene's ebook has suggested via its
: ParallelMultiSearcher class.

As I said, I don't really know a lot about RMI, but I don't think the client code is expected to marshall/unmarshall things -- rather, the objects you want to pass to remote methods (or receive back from remote methods) need to be serializable. Do you know what objects you got serialization exceptions from? (You didn't include any real source -- just pseudocode -- so it's not possible to use the line numbers in your stack trace to look at the code, because we don't know exactly what you changed.)

-Hoss
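To Hoss's question: the stack trace at the top of the thread names org.apache.lucene.search.MultiSearcher$1, an anonymous Lucene class being passed as a search() argument. The failure mode can be reproduced without Lucene by handing an exported RMI stub a non-serializable argument (every name below is an invented stand-in, not a real Lucene or Solr class):

```java
import java.rmi.MarshalException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

public class MarshalDemo {

    // A remote service whose method takes an arbitrary argument.
    public interface Sink extends Remote {
        void accept(Object payload) throws RemoteException;
    }

    static class SinkImpl implements Sink {
        public void accept(Object payload) { /* no-op */ }
    }

    // Deliberately NOT Serializable -- a stand-in for the anonymous
    // class (MultiSearcher$1) named in the stack trace.
    static class OpaqueCallback { }

    /**
     * Exports the service, calls its stub with a non-serializable
     * argument, and returns the class of the exception RMI throws
     * (null if the call unexpectedly succeeds).
     */
    static Class<?> probe() throws Exception {
        SinkImpl impl = new SinkImpl();
        Sink stub = (Sink) UnicastRemoteObject.exportObject(impl, 0);
        try {
            // RMI must marshal the argument here, and fails:
            stub.accept(new OpaqueCallback());
            return null;
        } catch (MarshalException e) {
            // nested cause is java.io.NotSerializableException
            return e.getClass();
        } finally {
            UnicastRemoteObject.unexportObject(impl, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probe());
    }
}
```

Since the non-serializable object is created inside Lucene's MultiSearcher rather than in your code, fixing it would mean changing what gets sent over the wire, which is why using RemoteSearchable on the remote side (so only queries and results cross the boundary) is the pattern the e-book code follows.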