extending SolrIndexSearcher

2006-05-09 Thread Koji Miyamoto

Hi,

I am looking at extending the source code for SolrIndexSearcher for my own
purposes.  Basically, I am trying to replace the use of Lucene's
IndexSearcher with a ParallelMultiSearcher version so that I can have a
query search both locally available indexes as well as remote indexes
available only via RMI.  This ParallelMultiSearcher is instantiated to
consist of both local and remote Searchable references.  The local
Searchables are simply IndexSearcher instances tied to local disk (separate
indexes), while the remote Searchables are made reachable via RMI.

In essence, where it used to be:

 IndexSearcher searcher = new IndexSearcher(reader);

it is now: (not the actual code but similar)

 Searchable[] searchables = new Searchable[3];
 for (int i=0; i<2; i++) {
   // Local searchable:
   searchables[i] = new IndexSearcher("/disk" + i + "/index");
 }

 // RMI searchable:  throws exception during search..
 searchables[2] = (Searchable) Naming.lookup("//remote_host:1099/remote_svc");

 ParallelMultiSearcher searcher = new ParallelMultiSearcher(searchables);

When I build the source and use it (the short story: by replacing the
relevant class file(s) within solr.war used by the example Jetty
implementation), it starts up just fine.  If I comment out the RMI
searchable line, submission of a search query to Jetty/Solr works just fine,
and it is able to search any number of indexes.  However, with the RMI
searchable line uncommented, I get an exception thrown (here's the ending of
it):

May 9, 2006 1:38:07 AM org.apache.solr.core.SolrException log
SEVERE: java.rmi.MarshalException: error marshalling arguments; nested exception is:
   java.io.NotSerializableException: org.apache.lucene.search.MultiSearcher$1
   at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:122)
   at org.apache.lucene.search.RemoteSearchable_Stub.search(Unknown Source)
   at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:248)
   at org.apache.lucene.search.Searcher.search(Searcher.java:116)
   at org.apache.lucene.search.Searcher.search(Searcher.java:95)
   at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:794)
   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:712)
   at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:605)
   at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:106)

So it looks like something needs to be serializable to get this to work.
Wondering if anyone has any ideas for getting around this problem.
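For reference, the MultiSearcher$1 named in the trace appears to be an
anonymous inner class defined inside Lucene's MultiSearcher.  Anonymous
classes are not serializable unless they implement Serializable, which is
what RMI trips over when marshalling.  A small standalone sketch of that
failure mode, using a hypothetical Collector interface rather than Lucene's
actual classes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class MarshalDemo {
    // Hypothetical stand-in for a Lucene-style callback interface.
    public interface Collector { void collect(int doc, float score); }

    // Attempts to serialize the given object the way RMI marshalling
    // would, and reports the failure if it cannot.
    public static String trySerialize(Object o) {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(o);
            return "ok";
        } catch (NotSerializableException e) {
            return "NotSerializableException: " + e.getMessage();
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // An anonymous inner class, like MultiSearcher$1, does not
        // implement Serializable, so serialization (and RMI) rejects it.
        Collector anonymous = new Collector() {
            public void collect(int doc, float score) { /* no-op */ }
        };
        System.out.println(trySerialize(anonymous));
    }
}
```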

tia,
Koji


Re: Java heap space

2006-05-09 Thread Bill Au

FYI, I have just committed the a

On 5/8/06, Bill Au [EMAIL PROTECTED] wrote:


I was able to produce an OutOfMemoryError using Yonik's python script with
Jetty 6.
I was not able to do so with Jetty 5.1.11RC0, the latest stable version.
So that's the version of Jetty to which I will downgrade the Solr example
app.

Bill


On 5/5/06, Erik Hatcher [EMAIL PROTECTED] wrote:

 Along these lines, locally I've been using the latest stable version
 of Jetty and it has worked fine.  I did see an out-of-memory
 exception the other day, but have not seen it since, so I'm not sure
 what caused it.

 Moving to Tomcat, as long as we can configure it to be as lightweight
 as possible, is quite fine to me as well.

 Erik


 On May 5, 2006, at 12:12 PM, Bill Au wrote:

  There seems to be a fair number of folks using Jetty with the example
  app as opposed to using Solr with their own appserver.  So I think it
  is best to use a stable version of Jetty instead of the beta.  If no
  one objects, I can go ahead and take care of this.
 
  Bill
 
  On 5/4/06, Yonik Seeley [EMAIL PROTECTED] wrote:
 
  I verified that Tomcat 5.5.17 doesn't experience this problem.
 
  -Yonik
 
  On 5/4/06, Yonik Seeley [EMAIL PROTECTED] wrote:
   On 5/3/06, Yonik Seeley  [EMAIL PROTECTED] wrote:
    I just tried sending in 100,000 deletes and it didn't cause a problem:
    the memory grew from 22M to 30M.

    Random thought: perhaps it has something to do with how you are
    sending your requests?
  
   Yep, I was able to reproduce a memory problem w/ Jetty on Linux when
   using non-persistent connections (closed after each request).  The
   same 100,000 deletes blew up the JVM to 1GB heap.

   So this looks like it could be a Jetty problem (shame on me for using
   a beta).  I'm still not quite sure what changed in Solr that could
   make it appear in later versions and not in earlier ones, though...
   the version of Jetty is the same.
 





Re: Java heap space

2006-05-09 Thread Bill Au

Sorry, hit the wrong key before...

FYI, I have just committed all the changes related to the Jetty downgrade
into SVN.
Let me know if you notice any problems.

Bill




Re: extending SolrIndexSearcher

2006-05-09 Thread Koji Miyamoto

I tried it with just Lucene + RMI, and that works just fine.  It's actually
based on the Lucene In Action e-book topic on how to use
ParallelMultiSearcher (chap.5).  The relevant code snippet follows:

/*
* search server:
* This is the code fragment for the search server, which enters
* a wait-loop to accept requests on port 1099.
* This server implementation is run on 2+ separate boxes: one
* is a master while the rest are slaves.  The master is the
* main entry point; it searches its own local indexes and sends
* requests to each slave, which searches only its own local
* indexes and reports results back to the master.
*/

 // private Vector<Searchable> _searchables;
 // private Vector<String> _localDirs;
 // ...

 // add local dirs as searchables..
 for (int i=0; i<_localDirs.size(); i++) {
    System.out.println("local searchable: " + _localDirs.get(i) + " ..");
    _searchables.add(new IndexSearcher(_localDirs.get(i)));
 }

 // add remote nodes (slaves) as searchables..
 // note: only the master does this; each slave looks only at its own
 // local indexes..
 if (_remoteNodes != null) {
    Collection nodes = _remoteNodes.values();
    Iterator it = nodes.iterator();
    String node = "";
    while (it.hasNext()) {
       node = (String) it.next();
       try {
          // remote nodes (slaves) also reachable via port 1099
          _searchables.add((Searchable) Naming.lookup("//" + node + ":1099/" + _DEFAULT_SVC_NAME_));
          System.out.println("remote searchable: " + node + " ..");
       } catch (java.rmi.ConnectException e) {
          System.err.println("ERROR: unable to connect to node=" + node + " ...");
       }
    }
 }

 // just some glue to prepare the list of searchables for the
 // ParallelMultiSearcher constructor..
 Searchable[] sch = new Searchable[_searchables.size()];
 for (int i=0; i<_searchables.size(); i++) {
    sch[i] = _searchables.get(i);
 }

 // start up server..
 System.setSecurityManager(new RMISecurityManager());
 LocateRegistry.createRegistry(_port);
 Searcher parallelSearcher = new ParallelMultiSearcher(sch);
 RemoteSearchable parallelImpl = new RemoteSearchable(parallelSearcher);
 Naming.rebind("//" + _nodeID + ":" + _port + "/" + _DEFAULT_SVC_NAME_, parallelImpl);
 System.out.println("SearchServer started " +
    "(nodeID=" + _nodeID +
    ", port=" + _port +
    ", role=" + ((_remoteNodes != null) ? "master" : "slave") +
    ", # searchables=" + _searchables.size() + ")...");

 // enters wait state, ready to accept requests on port 1099...



/*
* search client:
* This basically does an RMI naming lookup to get a reference to
* the master node on port 1099, then sends a search query..
*/

TermQuery query = new TermQuery(new Term("body", "word"));
MultiSearcher searcher = new MultiSearcher(new Searchable[]{_lookupRemote(_DEFAULT_SVC_NAME_)});

Hits hits = searcher.search(query);

Document doc = null;
for (int i=0; i<hits.length(); i++) {
 doc = hits.doc(i);
 // able to get hit info here...
}

// .

private Searchable _lookupRemote(String svcName) throws Exception {
 return (Searchable) Naming.lookup("//" + _host + ":" + _port + "/" + svcName);
}




With both of the above pieces of code, I am able to start a server on box1
(master), another server on box2 (slave), then invoke a client that queries
box1 and gets results from searching the indexes on box1+box2.  With this
working, I tried to incorporate ParallelMultiSearcher into Solr's
SolrIndexSearcher, since I saw that it is the place where Lucene's
IndexSearcher is used.  I replaced it with a ParallelMultiSearcher,
initialized similarly to the client code above.


From that, it seems like Solr itself needs to marshall and unmarshall the
searcher instance SolrIndexSearcher holds, and because the
ParallelMultiSearcher is initialized with RMI stubs, it fails to proceed
with such marshall/unmarshall internal actions.  As mentioned in the first
email, if I use ParallelMultiSearcher to look only at local indexes (no RMI
stub), Solr works just fine.  So I'm wondering if there is a way to use
SolrIndexSearcher to search both local and remote indexes, even if not
through the RMI solution Lucene's e-book has suggested via its
ParallelMultiSearcher class.

tia,
Koji



On 5/9/06, Chris Hostetter [EMAIL PROTECTED] wrote:



I don't really know a lot about RMI, but as I understand it, Serialization
is a core necessity -- if the arguments you want to pass to your Remote
Method aren't serializable, then RMI can't pass those arguments across the
wire.
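
A minimal sketch of that constraint, with a hypothetical TermQueryLike
value class standing in for a real query object: the serialize/deserialize
round trip below is roughly what RMI's marshalling does under the hood, and
it only succeeds because the class implements Serializable:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RoundTrip {
    // Hypothetical query-like value class; implementing Serializable is
    // what allows RMI to marshal it as a remote-method argument.
    public static class TermQueryLike implements Serializable {
        public final String field, text;
        public TermQueryLike(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Serialize then deserialize, roughly what RMI does over the wire.
    public static TermQueryLike roundTrip(TermQueryLike q) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new ObjectOutputStream(buf).writeObject(q);
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
        return (TermQueryLike) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        TermQueryLike copy = roundTrip(new TermQueryLike("body", "word"));
        System.out.println(copy.field + ":" + copy.text);
    }
}
```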

That said: it's not clear to me from the pseudocode/stacktrace you
included *what* isn't serializable ... is it a Solr class or a core Lucene
class?

If it's a Lucene class, you may want to start by making a small
proof-of-concept RMI app that just uses the Lucene core classes; once that
works, then try your changes in Solr.
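
A sketch of such a proof-of-concept, assuming nothing about Lucene: it
exports a trivial, hypothetical Greeter remote object in-process, looks it
up through an RMI registry, and invokes it with a Serializable argument (a
String).  The port number and service name are arbitrary:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiPoc {
    // Hypothetical remote interface; a stand-in for Lucene's Searchable.
    public interface Greeter extends Remote {
        String greet(String name) throws RemoteException;
    }

    public static class GreeterImpl implements Greeter {
        public String greet(String name) { return "hello " + name; }
    }

    // Runs the whole round trip in one JVM: create a registry, export
    // and bind the implementation, look it up, and invoke it.
    public static String runPoc(int port) throws Exception {
        Registry reg = LocateRegistry.createRegistry(port);
        GreeterImpl impl = new GreeterImpl();
        Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
        reg.rebind("greeter", stub);

        Greeter remote = (Greeter) LocateRegistry.getRegistry(port).lookup("greeter");
        // String is Serializable, so this call marshals fine.
        String result = remote.greet("solr");

        // clean up so the JVM can exit
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(reg, true);
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runPoc(51099));
    }
}
```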


: Date: Tue, 9 May 2006 02:32:45 -0700
: From: Koji Miyamoto [EMAIL PROTECTED]
: Reply-To: solr-user@lucene.apache.org
: To: 

Re: extending SolrIndexSearcher

2006-05-09 Thread Chris Hostetter

: IndexSearcher.  I replaced it with ParallelMultiSearcher, where it is
: initialized similar to the client code I mentioned above.
:
: From that, it seems like Solr itself needs to marshall and unmarshall the
: searcher instance SolrIndexSearcher holds, and because the
: ParallelMultiSearcher is initialized with RMI stubs, it fails to proceed
: with such marshall/unmarshall internal actions.  As mentioned in the first
: email, if I use ParallelMultiSearcher to only look at local indexes (no RMI
: stub), Solr works just fine.  So I'm wondering if there is a way use
: SolrIndexSearcher to search both local and remote indexes, even if not
: through the RMI solution Lucene's ebook has suggested via its
: ParallelMultiSearcher class.

As I said, I don't really know a lot about RMI, but I don't think the
client code is expected to marshall/unmarshall things -- but the objects
you want to pass to remote methods (or receive back from remote
methods) need to be serializable.  Do you know which objects you got
serialization exceptions from? (you didn't include any real source -- just
pseudocode, so it's not possible to use the line numbers in your stack
trace to look at the code because we don't know exactly what you changed)



-Hoss