Re: corrupt solr index on ec2

2008-10-31 Thread Michael McCandless


Bill Graham wrote:


Then it seemed to run well for about an hour and I saw this:

Oct 28, 2008 10:38:51 PM org.apache.solr.update.DirectUpdateHandler2  
commit

INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Oct 28, 2008 10:38:51 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch:  
1156 docs vs 0 length in bytes of _2rv.fdx
   at  
org 
.apache 
.lucene 
.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
   at  
org 
.apache 
.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java: 
83)
   at  
org 
.apache 
.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java: 
47)
   at  
org 
.apache 
.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
   at  
org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java: 
1774)


This particular exception is very spooky -- it really looks like  
something is removing the index files (such as accidentally opening a  
2nd writer on the index).


Mike


corrupt solr index on ec2

2008-10-30 Thread Bill Graham
Hi,

I've been running solr 1.3 on an ec2 instance for a couple of weeks and I've 
had some stability issues. It seems like I need to bounce the app once a day. 
That I could live with and ultimately maybe troubleshoot, but what's more 
disturbing is that three times in the last 2 weeks my index has been corrupted 
when FileNotFoundExceptions started to appear. 

I'm running in jetty and had my index on the local file system until I lost the 
index the first time. Then I moved it to my mounted ebs volume so I could 
restore from a snapshot if needed. I'm wondering if perhaps there are issues 
with the locking mechanize on either the local directory (which is really a 
virual instance), or the mounted xfs volume. Has anyone seem this, or have 
suggestions re the cause? I'm using the single lockType.

I'm running a single solr instance that gets frequent updates from multiple 
threads, and commits about every hour. 

A few things I see in the logs:

- From time to time I see write lock timeouts:
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
out: SingleInstanceLock: write.lock

- I've seen OOM exceptions during warming. I've changed maxWarmingSearchers=1, 
which I suspect will do he trick

- The finally, this is what I fond in the logs today when the index got corrupt:

Oct 29, 2008 12:18:39 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Oct 29, 2008 12:18:41 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: 
/var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:368)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
at 
org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
at 
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.io.FileNotFoundException: 
/var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.init(RandomAccessFile.java:212)
at 
org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.init(FSDirectory.java:552)
at 
org.apache.lucene.store.FSDirectory$FSIndexInput.init(FSDirectory.java:582)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:77)
at 
org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:355)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:304)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:226)
at 
org.apache.lucene.index.MultiSegmentReader.init(MultiSegmentReader.java:56)
at 
org.apache.lucene.index.ReadOnlyMultiSegmentReader.init(ReadOnlyMultiSegmentReader.java:27)
at 

Re: corrupt solr index on ec2

2008-10-30 Thread Yonik Seeley
On Thu, Oct 30, 2008 at 2:06 AM, Bill Graham [EMAIL PROTECTED] wrote:
 I've been running solr 1.3 on an ec2 instance for a couple of weeks and I've 
 had some stability issues. It seems like I need to bounce the app once a day. 
 That I could live with and ultimately maybe troubleshoot, but what's more 
 disturbing is that three times in the last 2 weeks my index has been 
 corrupted when FileNotFoundExceptions started to appear.

 I'm running in jetty and had my index on the local file system until I lost 
 the index the first time. Then I moved it to my mounted ebs volume so I could 
 restore from a snapshot if needed. I'm wondering if perhaps there are issues 
 with the locking mechanize on either the local directory (which is really a 
 virual instance), or the mounted xfs volume. Has anyone seem this, or have 
 suggestions re the cause? I'm using the single lockType.

 I'm running a single solr instance that gets frequent updates from multiple 
 threads, and commits about every hour.

 A few things I see in the logs:

 - From time to time I see write lock timeouts:
 SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
 out: SingleInstanceLock: write.lock

This is really strange.  It suggests that there is another in-process
writer that is holding the lock.  That should be impossible, unless
it's caused by a previous exception trying to open an IndexWriter and
the lock is simply stale.  What seems to be the first exception that
occurs?

Also, you might try changing the lock type from single to simple to
make it visible cross-process.  That would rule out trying to start
another solr instance on the same index directory opening two
writers on the same directory is one way to get missing files like you
appear to have.

 - I've seen OOM exceptions during warming. I've changed 
 maxWarmingSearchers=1, which I suspect will do he trick

OOM errors are really tricky - if they happen in the wrong place, it's
hard to recover gracefully from. Correctly cleaning up after an OOM
error in the IndexWriter recently had some little fixes in lucene
trunk - you might want to try the latest dev version of Lucene and see
if it helps.


-Yonik


 - The finally, this is what I fond in the logs today when the index got 
 corrupt:

 Oct 29, 2008 12:18:39 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
 Oct 29, 2008 12:18:41 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: 
 /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
at 
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:368)
at 
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
at 
 org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
at 
 org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
 org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 Caused by: 

Re: corrupt solr index on ec2

2008-10-30 Thread Michael McCandless


One small correction below:

Yonik Seeley wrote:

- I've seen OOM exceptions during warming. I've changed  
maxWarmingSearchers=1, which I suspect will do he trick


OOM errors are really tricky - if they happen in the wrong place, it's
hard to recover gracefully from. Correctly cleaning up after an OOM
error in the IndexWriter recently had some little fixes in lucene
trunk - you might want to try the latest dev version of Lucene and see
if it helps.


This change (to not commit index changes after IndexWriter hits OOME)  
went in Feb 2008.  Solr 1.3 should already have it.


(I'm working now on adding javadocs to IW explaining this).

Mike


Re: corrupt solr index on ec2

2008-10-30 Thread Bill Graham
 of moving solr to another host with more memory, since 
the box I'm on is pretty tight on memory.

thanks!
Bill


- Original Message 
From: Yonik Seeley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, October 30, 2008 8:52:32 AM
Subject: Re: corrupt solr index on ec2

On Thu, Oct 30, 2008 at 2:06 AM, Bill Graham [EMAIL PROTECTED] wrote:
 I've been running solr 1.3 on an ec2 instance for a couple of weeks and I've 
 had some stability issues. It seems like I need to bounce the app once a day. 
 That I could live with and ultimately maybe troubleshoot, but what's more 
 disturbing is that three times in the last 2 weeks my index has been 
 corrupted when FileNotFoundExceptions started to appear.

 I'm running in jetty and had my index on the local file system until I lost 
 the index the first time. Then I moved it to my mounted ebs volume so I could 
 restore from a snapshot if needed. I'm wondering if perhaps there are issues 
 with the locking mechanize on either the local directory (which is really a 
 virual instance), or the mounted xfs volume. Has anyone seem this, or have 
 suggestions re the cause? I'm using the single lockType.

 I'm running a single solr instance that gets frequent updates from multiple 
 threads, and commits about every hour.

 A few things I see in the logs:

 - From time to time I see write lock timeouts:
 SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
 out: SingleInstanceLock: write.lock

This is really strange.  It suggests that there is another in-process
writer that is holding the lock.  That should be impossible, unless
it's caused by a previous exception trying to open an IndexWriter and
the lock is simply stale.  What seems to be the first exception that
occurs?

Also, you might try changing the lock type from single to simple to
make it visible cross-process.  That would rule out trying to start
another solr instance on the same index directory opening two
writers on the same directory is one way to get missing files like you
appear to have.

 - I've seen OOM exceptions during warming. I've changed 
 maxWarmingSearchers=1, which I suspect will do he trick

OOM errors are really tricky - if they happen in the wrong place, it's
hard to recover gracefully from. Correctly cleaning up after an OOM
error in the IndexWriter recently had some little fixes in lucene
trunk - you might want to try the latest dev version of Lucene and see
if it helps.


-Yonik


 - The finally, this is what I fond in the logs today when the index got 
 corrupt:

 Oct 29, 2008 12:18:39 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
 Oct 29, 2008 12:18:41 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: 
 /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
at 
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:368)
at 
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
at 
 org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
at 
 org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
 org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202