Thank you, Yonik.

Yes, I've seen that page, but I went a bit beyond the material there: the
code I wrote can set parameters such as separators, encapsulators, the index
columns, whether to split parameters, and auto-commit, and it can also do
incremental or full index reloads.
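
Roughly, and assuming CSV input (the separator/encapsulator options point to
Solr's CSV update handler), those options map onto request parameters like in
this simplified, hypothetical sketch. It is not the actual utility: the class
and method names are made up, and the parameter names (separator,
encapsulator, fieldnames, f.<field>.split, f.<field>.separator, commit) should
be checked against the CSV handler docs for your Solr version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: converts loader options into CSV update handler params.
// Parameter names are assumed from the Solr CSV handler; verify for your version.
final class CsvLoadOptions {
    private CsvLoadOptions() {}

    static Map<String, String> toParams(char separator,
                                        char encapsulator,
                                        List<String> columns,
                                        List<String> splitFields,
                                        char splitSeparator,
                                        boolean commit) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("separator", String.valueOf(separator));
        params.put("encapsulator", String.valueOf(encapsulator));
        // Explicit index column names, in file order.
        params.put("fieldnames", String.join(",", columns));
        for (String field : splitFields) {
            // Split this field's value into multiple values on splitSeparator.
            params.put("f." + field + ".split", "true");
            params.put("f." + field + ".separator", String.valueOf(splitSeparator));
        }
        // Whether to commit once the file has been loaded.
        params.put("commit", String.valueOf(commit));
        return params;
    }
}
```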

Also, from what I've seen in DirectSolrConnection (version 1.4.1), you have
to supply the document body as a String.  We want to avoid having to load
the entire document into memory, which is why we load the files into
ContentStream objects and pass them to the embedded Solr server (I am
assuming ContentStream actually streams the file, as its name suggests,
instead of loading it into memory).  The utility I wrote takes a path,
a regular expression matching all the files to be loaded, and the parameters
mentioned above, and does either a full or an incremental upload of multiple
files with a single command.
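
The file-selection part could look something like the following simplified
sketch. It is hypothetical: the class and method names are made up, and it
assumes "incremental" means only files modified since the last load (passed
as an epoch-millisecond timestamp). Each selected file would then be wrapped
in a ContentStream (for example SolrJ's ContentStreamBase.FileStream) and sent
to the embedded server; that part is not shown.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: picks the files to load from a directory.
final class FileSelector {
    private FileSelector() {}

    // Returns files in dir whose names fully match regex.
    // since <= 0: full load (every match).
    // since > 0: incremental load (only files modified after 'since', epoch ms).
    static List<File> select(File dir, String regex, long since) {
        Pattern pattern = Pattern.compile(regex);
        List<File> selected = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return selected; // not a directory, or an I/O error
        }
        Arrays.sort(entries); // deterministic load order
        for (File f : entries) {
            if (!f.isFile() || !pattern.matcher(f.getName()).matches()) {
                continue;
            }
            if (since > 0 && f.lastModified() <= since) {
                continue; // unchanged since the last load
            }
            selected.add(f);
        }
        return selected;
    }
}
```

Because matches() must cover the whole file name, a pattern such as .*\.csv
is needed rather than just \.csv.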

We run a very high-load application with SOLR in the back end, which requires
us to use EmbeddedSolrServer to eliminate the network round-trip.  Even a
small incremental gain in performance is important for us.

On Thu, Apr 21, 2011 at 4:02 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Thu, Apr 21, 2011 at 6:26 PM, Kiko Aumond <k...@alum.mit.edu> wrote:
> > Hi
> >
> > I am new to the list and relatively new to SOLR.  I am working on a tool for
> > updating indexes directly through EmbeddedSolrServer thus eliminating the
> > need for sending potentially large documents over HTTP.
>
> http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
> And also DirectSolrConnection
>
> It's generally discouraged as a premature optimization that normally
> gains you only a few percent increase in performance.
>
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
>
