Hello all,

We have been using Solr 4.0 for a while, and suddenly we couldn't get Solr
to come up. As Solr was starting up, it hung after opening a Searcher.
There wasn't anything else obvious in the logs. Eventually we realized
that the problem was that the update log was being read at startup, and
that the update log contained the entire text of all 800,000+ books that we
indexed (about 837GB).

We looked and didn't find any obvious note in the Solr 4.0 release notes on
upgrading from 3.6, or any comment in the example solrconfig.xml, saying
that if you have large documents and you aren't using real-time get, you
may want to turn the update log off (or comment it out) to avoid
transaction logs that can exceed the size of your index.

In the latest 4.0 example/solrconfig.xml (r1433064,
http://svn.apache.org/viewvc?view=revision&revision=1433064), updateLog is
enabled by default in the default Solr updateHandler, and the only comment
is:

<!-- Enables a transaction log, currently used for real-time get.
         "dir" - the target directory for transaction logs, defaults to the
            solr data directory -->


Some users who are either new to Solr or upgrading from earlier versions
of Solr may not understand whether or not they need "real-time get", and
they may not want to delve into the details of near-real-time search or
using Solr as a NoSQL server in order to determine whether they should
comment out the updateLog entry.

I think that either the updateLog should not be enabled by default (I don't
know the pros and cons of this) or, at the very least, the comment should
mention that this can lead to large transaction logs and point to
documentation that would help the user decide whether to enable or disable
it.
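
For anyone in the same situation, here is a sketch of what disabling it
looks like, assuming the stock 4.0 example solrconfig.xml (the "dir" value
below is how I remember the example file; yours may differ). As far as I
understand, real-time get and SolrCloud both depend on the update log, so
this only makes sense if you use neither:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log commented out: no real-time get, no SolrCloud.
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
  -->
</updateHandler>
```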

Is there documentation of this in some obvious place that I just missed?

I did find the text below on the wiki
(http://wiki.apache.org/solr/SolrConfigXml#Update_Handler_Section), but a
user-friendly translation, or a pointer to where someone could read up on
what this means, would be helpful.

<openSearcher>false</openSearcher> <!-- SOLR 4.0.  Optionally don't
open a searcher on hard commit.  This is useful to minimize the size
of transaction logs that keep track of uncommitted updates. -->

I did see that several new Solr 4 users created very large logs before they
asked the mailing list how to avoid this:
http://lucene.472066.n3.nabble.com/Documentation-on-the-new-updateLog-transaction-log-feature-tc4000537.html#a4000538

Perhaps some of the information in this thread on the mailing list might be
added to the documentation somewhere.

http://lucene.472066.n3.nabble.com/Testing-Solr4-first-impressions-and-problems-tc4013628.html#a4013814

I think I almost understand the
hard-commit/soft-commit/autoCommit/openSearcher discussion in the above
thread, and it seems that this could be put in the wiki or in the comments
in the config file as appropriate.
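
If I read that thread correctly, the alternative to disabling the update
log is to hard commit regularly, since a hard commit lets Solr start a new
transaction log and drop old ones, and openSearcher=false keeps those
commits cheap. A sketch, where the 15-second maxTime is only an
illustrative value and not a recommendation:

```xml
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```

With openSearcher set to false, newly indexed documents would not become
visible to searches until a searcher is opened some other way (a soft
commit or an explicit commit).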

Should I open a JIRA issue?

Tom



----
Log entry:
Jan 14, 2013 12:40:31 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@59db9f45 main
