Hi Tom,

That must have been a fun surprise! Maybe just add info to the SolrCloud Wiki page?

Otis
Solr & ElasticSearch Support
http://sematext.com/

On Jan 15, 2013 12:43 PM, "Tom Burton-West" <tburt...@umich.edu> wrote:

> Hello all,
>
> We have been using Solr 4.0 for a while and suddenly we couldn't get Solr
> to come up. As Solr was starting up it hung after opening a Searcher.
> There wasn't anything else obvious in the logs. Eventually we realized
> that the problem was that the updatelog was being read and that the update
> log contained the entire text of all 800,000+ books that we indexed (about
> 837GB).
>
> We looked and didn't find any obvious note in the Solr 4.0 release notes
> on upgrading from 3.6, or any documentation in the example solrconfig.xml,
> mentioning that if you have large documents and you aren't using
> real-time get, you may want to turn this off/comment this out to
> avoid transaction logs that can exceed the size of your index.
>
> In the latest 4.0 example/solrconfig.xml (r1433064,
> http://svn.apache.org/viewvc?view=revision&revision=1433064),
> updateLog is enabled by default in the default Solr updateHandler and the
> only comment is:
>
> <!-- Enables a transaction log, currently used for real-time get.
>      "dir" - the target directory for transaction logs, defaults to the
>      solr data directory -->
>
> Some users who are either new to Solr or upgrading from earlier versions
> of Solr may not understand whether or not they need "real-time get" and
> they may not want to delve into the details of near-realtime search or
> using Solr as a NoSQL server in order to determine whether they should
> comment out the updateLog entry.
>
> I think that either the updateLog should not be enabled by default (don't
> know the pros and cons of this), or at the very least, something should
> mention that this can lead to large transaction logs and there should be a
> pointer to some documentation that would enable the user to decide whether
> or not to enable/disable this.
>
> Is there documentation of this in some obvious place that I just missed?
>
> I did find the text below on the wiki
> http://wiki.apache.org/solr/SolrConfigXml#Update_Handler_Section, but a
> user-friendly translation, or a pointer to where someone could read up on
> what this means, would be helpful.
>
> <openSearcher>false</openSearcher> <!-- SOLR 4.0. Optionally don't open a
> searcher on hard commit. This is useful to minimize the size of transaction
> logs that keep track of uncommitted updates. -->
>
> I did see that several new Solr 4 users created very large logs before
> they asked the mailing list how to avoid this:
>
> http://lucene.472066.n3.nabble.com/Documentation-on-the-new-updateLog-transaction-log-feature-tc4000537.html#a4000538
>
> Perhaps some of the information in this thread on the mailing list might
> be added to the documentation somewhere.
>
> http://lucene.472066.n3.nabble.com/Testing-Solr4-first-impressions-and-problems-tc4013628.html#a4013814
>
> I think I almost understand the
> hard-commit/soft-commit/autocommit/opensearcher discussion in the above
> thread and it would seem that this could be put in the wiki or the comments
> in the config file as appropriate.
>
> Should I open a JIRA issue?
>
> Tom
>
> ----
> Log entry.
> "Jan 14, 2013 12:40:31 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@59db9f45 main
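
For anyone finding this thread later, here is a rough sketch of the part of solrconfig.xml being discussed. It is not a recommendation: the 15000 ms maxTime and the ${solr.data.dir:} value are illustrative assumptions, so check them against the example solrconfig.xml shipped with your own Solr version. As I understand it, a regular hard commit is what lets Solr start a new transaction log and discard old ones, and openSearcher=false keeps that commit from opening a new searcher each time.

```xml
<updateHandler class="solr.DirectUpdateHandler2">

  <!-- Hard commit on a timer so the transaction log does not keep growing
       with uncommitted updates. The maxTime value (milliseconds) is an
       illustrative assumption; tune it for your indexing load. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <!-- Do not open a new searcher on each hard commit. -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Transaction log, currently used for real-time get. If you are sure
       you do not need it, this element can be commented out instead.
       The dir value shown here is an assumption. -->
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>

</updateHandler>
```

Commenting out updateLog entirely avoids the log altogether, but it also disables real-time get, and SolrCloud relies on the update log, so that option probably only makes sense for a plain non-cloud setup like the one Tom describes.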