Re: Batching requests using SolrCell with SolrJ

2009-09-22 Thread Grant Ingersoll


On Sep 19, 2009, at 1:22 PM, Jay Hill wrote:


When working with SolrJ I have typically batched a Collection of
SolrInputDocument objects before sending them to the Solr server. I'm
working with the latest nightly build and using the ExtractingRequestHandler
to index documents, and everything is working fine. Except I haven't been
able to figure out how to batch documents when also including literals.

Here's what I've got:

 // Looping over a List of Files
 ContentStreamUpdateRequest req =
     new ContentStreamUpdateRequest("/update/extract");
 req.addFile(fileToIndex);
 req.setParam("literal.id", fileToIndex.getCanonicalPath());

 try {
   getSolrServer().request(req);
 } catch (SolrServerException e) {
   e.printStackTrace();
 }

Which works great, except that each document processed in the loop is
sending a separate request. Previously I built a collection of SolrInput
docs and had SolrJ send them in batches of 100 or whatever.

It seems like I could batch documents by continuing to add them to the
request (req.addFile(eachFileUpToACount)), but the literals seem to present
a problem. By sending one at a time the contents and the literals all wind
up in the same document. But in a batch there will just be an array of
params for literal.id (in this example) not matched to the contents.



It might be nice to be able to specify literals on a per-stream-name
basis, such as literal.site_pdf.id=site_pdf, but there currently isn't
support for this. If there were, you could combine that with the
ContentStreamUpdateRequest to do what is needed, I believe.


-Grant


Batching requests using SolrCell with SolrJ

2009-09-19 Thread Jay Hill
When working with SolrJ I have typically batched a Collection of
SolrInputDocument objects before sending them to the Solr server. I'm
working with the latest nightly build and using the ExtractingRequestHandler
to index documents, and everything is working fine. Except I haven't been
able to figure out how to batch documents when also including literals.
Here's what I've got:

  // Looping over a List of Files
  ContentStreamUpdateRequest req =
      new ContentStreamUpdateRequest("/update/extract");
  req.addFile(fileToIndex);
  req.setParam("literal.id", fileToIndex.getCanonicalPath());

  try {
    getSolrServer().request(req);
  } catch (SolrServerException e) {
    e.printStackTrace();
  }

Which works great, except that each document processed in the loop is
sending a separate request. Previously I built a collection of SolrInput
docs and had SolrJ send them in batches of 100 or whatever.

It seems like I could batch documents by continuing to add them to the
request (req.addFile(eachFileUpToACount)), but the literals seem to present
a problem. By sending one at a time the contents and the literals all wind
up in the same document. But in a batch there will just be an array of
params for literal.id (in this example) not matched to the contents.
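
To make the mismatch concrete, here is a minimal sketch that models the request parameters with plain Java collections. It does not use SolrJ at all: the class name, the method names, and the example paths are all hypothetical, and the multi-valued map is only an assumed stand-in for how repeated parameters would look in a batched request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain collections, no SolrJ classes.
public class BatchedLiteralsSketch {

    // One batched request: every file's id lands under the same parameter
    // name, so the values form a list with nothing tying each one to a stream.
    static Map<String, List<String>> batchParams(List<String> paths) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String path : paths) {
            params.computeIfAbsent("literal.id", k -> new ArrayList<>()).add(path);
        }
        return params;
    }

    // One request per file: each parameter set belongs to exactly one file,
    // so the literal and the extracted content end up in the same document.
    static List<Map<String, String>> perFileParams(List<String> paths) {
        List<Map<String, String>> requests = new ArrayList<>();
        for (String path : paths) {
            Map<String, String> params = new LinkedHashMap<>();
            params.put("literal.id", path);
            requests.add(params);
        }
        return requests;
    }

    public static void main(String[] args) {
        // Example paths are made up for the demonstration.
        List<String> paths = Arrays.asList("/docs/a.pdf", "/docs/b.pdf");
        System.out.println("batched:  " + batchParams(paths));
        System.out.println("per file: " + perFileParams(paths));
    }
}
```

In the batched case there is a single literal.id entry holding both paths, while the per-file case keeps one id per parameter set, which is why sending one request at a time works.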

Can anyone provide a code snippet of how to do this? Or is there no other
approach than sending a request for each document?

Thanks,
-Jay
http://www.lucidimagination.com