On Thu, Apr 21, 2011 at 7:27 PM, Kiko Aumond <[email protected]> wrote:
> Yes, I've seen that page, but I went a bit beyond the material there, as
> the code I wrote is able to set parameters such as separators,
> encapsulators and the index columns, whether to split parameters,
> auto-commit as well as the ability to do incremental or full index
> reloads.
Is this a CSV loader?  If so, did you know the CSV loader (and other data
loaders) have the option to bypass HTTP also and stream directly from a
local file (or other URL)?

> Also, from what I've seen in DirectSolrConnection (version 1.4.1), you
> have to supply the document body as a String.  We want to avoid having to
> load the entire document into memory, which is why we load the files into
> ContentStream objects and pass them to the embedded Solr server (I am
> assuming ContentStream actually streams the file as its name suggests
> instead of trying to load it into memory).  The utility I wrote gets a
> path, a regex expression for all the files to be loaded, as well as the
> parameters mentioned above, and it does either a full or incremental
> upload of multiple files with a single command.
>
> We run a very high load application with SOLR in the back end that
> requires that we use the embedded Solr server to eliminate the network
> round-trip.  Even a small incremental gain in performance is important
> for us.

Eliminating the network round-trip is certainly important for good bulk
indexing performance.  Luckily you don't have to embed to do that.  You can
use multiple threads (say 16 for a 4 core server), which essentially covers
up any round-trip latency (use persistent connections though! or use SolrJ,
which does by default), or you can use the StreamingUpdateSolrServer, which
eliminates round-trip network delays by streaming documents over multiple
already open connections.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
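
For reference, a rough sketch of the "stream from a local file" route on the
CSV handler: only the request parameters cross the wire, and Solr reads the
file from its own disk.  This assumes remote streaming is enabled
(enableRemoteStreaming="true" in solrconfig.xml); the host, handler path and
file path below are just placeholders.

import java.net.HttpURLConnection;
import java.net.URL;

// Ask Solr's CSV handler to read /data/docs.csv from the server's local
// disk via stream.file -- the CSV body itself is never sent over HTTP.
public class CsvLocalStreamExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8983/solr/update/csv"
        + "?stream.file=/data/docs.csv"   // read by Solr, not uploaded
        + "&separator=%2C"                // ',' (URL-encoded)
        + "&commit=true");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    System.out.println("CSV load returned HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}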
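
On the ContentStream question, a minimal sketch, assuming SolrJ 1.4-era
class names, of how a file can be handed to a SolrServer (embedded or HTTP)
without ever being materialized as one big String.
ContentStreamBase.FileStream only opens a FileInputStream when the update
handler actually pulls the stream; the handler path and parameters are
placeholders.

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

// Stream a CSV file to any SolrServer (e.g. an EmbeddedSolrServer) as a
// ContentStream; the file is read lazily rather than loaded into memory.
public class FileStreamUpdateExample {
  public static void indexCsv(SolrServer server, File csv) throws Exception {
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
    req.addContentStream(new ContentStreamBase.FileStream(csv));
    req.setParam("separator", ",");
    req.setParam("commit", "true");
    server.request(req);
  }
}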
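
And a rough sketch of the StreamingUpdateSolrServer approach mentioned
above, again assuming the SolrJ 1.4-era constructor (url, queueSize,
threadCount); the URL, queue size and thread count are placeholders.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Adds are queued and written over several already-open connections by
// background threads, so the client loop never waits on a round-trip.
public class StreamingIndexExample {
  public static void main(String[] args) throws Exception {
    SolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 10000, 4);

    for (int i = 0; i < 100000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("name", "document " + i);
      server.add(doc);   // returns quickly; the queue/threads do the I/O
    }
    server.commit();     // queued adds are flushed before the commit is sent
  }
}

Here add() just enqueues the document, so the indexing threads stay busy on
the open connections while the loop keeps feeding them.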
