Re: Stop solr without losing documents
On Fri, Nov 13, 2009 at 11:45 PM, Lance Norskog wrote:
> I would go with polling Solr to find what is not yet there. In
> production, it is better to assume that things will break, and have
> backstop janitors that fix them. And then test those janitors
> regularly.

Good idea, Lance. I certainly agree with the idea of backstop janitors. We
don't have a good way of polling Solr for what's in there or not -- we have
a kind of asynchronous, multithreaded updating system sending docs to Solr
-- but we can always find out *externally* which docs have been committed.

Michael
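For anyone wanting to build the kind of backstop janitor Lance describes, a minimal sketch might look like the following. The re-queue logic is shown as a pure function; the core URL, `id` field, and query in the comments are assumptions about a default single-core setup, not something from this thread.

```python
# Sketch of a "backstop janitor" that re-queues documents missing from Solr.
# The URL, core layout, and id field below are hypothetical -- adjust to
# your own setup.

def missing_ids(sent_ids, indexed_ids):
    """Return the ids we sent to Solr that never showed up in the index."""
    return set(sent_ids) - set(indexed_ids)

# In practice, indexed_ids would come from a Solr query such as:
#   http://localhost:8983/solr/select?q=*:*&fl=id&rows=...
# paging through the results (or filtering on a batch-marker field) and
# collecting the id field from each returned document.

if __name__ == "__main__":
    sent = ["doc1", "doc2", "doc3"]
    indexed = ["doc1", "doc3"]  # e.g. collected via the fl=id query above
    print(sorted(missing_ids(sent, indexed)))  # docs to re-send
```

Run periodically (and tested regularly, per Lance's advice), this closes the gap left by the asynchronous updater.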
Re: Stop solr without losing documents
On Fri, Nov 13, 2009 at 11:02 PM, Otis Gospodnetic wrote:
> So I think the question is really:
> "If I stop the servlet container, does Solr issue a commit in the shutdown
> hook in order to ensure all buffered docs are persisted to disk before the
> JVM exits?"

Exactly right, Otis.

> I don't have the Solr source handy, but if I did, I'd look for "Shutdown",
> "Hook" and "finalize" in the code.

Thanks for the direction. I found some talk of close()ing a SolrCore, but I
don't believe that implies a commit. I somehow hadn't thought of actually
*trying* to add a doc and then shut down a Solr instance; shame on me.

Unfortunately, when I test this via
* make a new Solr
* add a doc
* commit
* verify it shows up in a search -- it does
* add a 2nd doc
* shut down

Solr doesn't stop. It stops accepting connections, but the JVM refuses to
actually die. I'm not sure what we're doing wrong on our end, but I see this
frequently and end up having to do a kill (usually not -9!). I guess we'll
stick with externally tracking which docs have committed, so that when we
inevitably have to kill Solr it doesn't cause a problem.

Michael
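Michael's manual test can be written out as the HTTP calls involved, which makes it easy to rerun after each Solr change. This is only a sketch of the request sequence; nothing here talks to a live server, and the port/path assume the default example setup.

```python
# The shutdown test from the message above, expressed as the sequence of
# HTTP requests it implies. URLs assume the default example port and a
# single core; adjust for your container.

def build_requests(doc_xml):
    """Return (method, url, body) tuples for the add/commit/verify/add test."""
    base = "http://localhost:8983/solr"
    return [
        ("POST", base + "/update", doc_xml),        # add a doc
        ("POST", base + "/update", "<commit/>"),    # commit
        ("GET",  base + "/select?q=*:*", None),     # verify it shows up
        ("POST", base + "/update", doc_xml),        # add a 2nd doc
        # ...then stop the servlet container and check whether doc 2 survived
    ]
```

After replaying these against a fresh instance, shutting down the container and searching again tells you whether the buffered second doc was persisted.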
Re: Stop solr without losing documents
On Fri, Nov 13, 2009 at 4:09 PM, Chris Hostetter wrote:
> please don't kill -9 ... it's grossly overkill, and doesn't give your
[ ... snip ... ]
> Alternately, you could take advantage of the "enabled" feature from your
> client (just have it test the enabled URL every N updates or so), and when
> it sees that you have disabled the port it can send one last commit and
> then stop sending updates until it sees the enabled URL work again -- as
> soon as you see the updates stop, you can safely shut down the port.

Thanks, Hoss. I'll use Catalina's stop script instead of kill -9. It's good
to know about the enabled feature -- my team was just discussing whether
something like that existed that we could use -- but since we'd also like to
recover cleanly from power failures and other Solr terminations, I think
we'll track which docs are uncommitted outside of Solr.

Michael
Re: Stop solr without losing documents
I would go with polling Solr to find what is not yet there. In production,
it is better to assume that things will break, and have backstop janitors
that fix them. And then test those janitors regularly.

On Fri, Nov 13, 2009 at 8:02 PM, Otis Gospodnetic wrote:
> So I think the question is really:
> "If I stop the servlet container, does Solr issue a commit in the shutdown
> hook in order to ensure all buffered docs are persisted to disk before the
> JVM exits?"
>
> I don't have the Solr source handy, but if I did, I'd look for "Shutdown",
> "Hook" and "finalize" in the code.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
[ ... snip ... ]

--
Lance Norskog
goks...@gmail.com
Re: Stop solr without losing documents
So I think the question is really:
"If I stop the servlet container, does Solr issue a commit in the shutdown
hook in order to ensure all buffered docs are persisted to disk before the
JVM exits?"

I don't have the Solr source handy, but if I did, I'd look for "Shutdown",
"Hook" and "finalize" in the code.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message
> From: Chris Hostetter
> To: solr-user@lucene.apache.org
> Sent: Fri, November 13, 2009 4:09:00 PM
> Subject: Re: Stop solr without losing documents
>
> : which documents have been updated before a successful commit. Now
> : stopping solr is as easy as kill -9.
>
> please don't kill -9 ... it's grossly overkill, and doesn't give your
[ ... snip ... ]
Re: Stop solr without losing documents
: which documents have been updated before a successful commit. Now
: stopping solr is as easy as kill -9.

please don't kill -9 ... it's grossly overkill, and doesn't give your
servlet container a fair chance to clean things up. A lot of work has been
done to make Lucene indexes robust to hard terminations of the JVM (or
physical machine), but there's no reason to go out of your way to try and
stab it in the heart when you could just shut it down cleanly.

that's not to say your approach isn't a good one -- if you only have one
client sending updates/commits, then having it keep track of what was
indexed prior to the last successful commit is a viable way to deal with
what happens if solr stops responding (either because you shut it down, or
because it crashed for some other reason).

Alternately, you could take advantage of the "enabled" feature from your
client (just have it test the enabled URL every N updates or so), and when
it sees that you have disabled the port it can send one last commit and
then stop sending updates until it sees the enabled URL work again -- as
soon as you see the updates stop, you can safely shut down the port.

-Hoss
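The enabled-URL dance Hoss describes can be sketched as a small client loop. The `check_enabled`, `send_update`, and `send_commit` callables are injected here so the control flow can be shown without a live Solr; the real versions would hit the ping/healthcheck URL and /update, and the function name is illustrative.

```python
# Sketch of the "enabled" pattern: test the enabled URL every n updates;
# on seeing it disabled, send one final commit and stop feeding until it
# comes back. The three callables are stand-ins for real HTTP calls.

def feed(docs, check_enabled, send_update, send_commit, n=100):
    """Send docs, pausing (after a final commit) once the port is disabled.

    Returns how many docs were sent; the caller retries the remainder
    when check_enabled() succeeds again.
    """
    sent = 0
    for doc in docs:
        if sent % n == 0 and not check_enabled():
            send_commit()   # one last commit so buffered docs are persisted
            return sent     # it is now safe to shut the port down
        send_update(doc)
        sent += 1
    send_commit()
    return sent
```

Once the operator sees the updates stop, the container can be shut down cleanly, exactly as described above.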
Re: Stop solr without losing documents
On Fri, Nov 13, 2009 at 4:32 AM, gwk wrote:
> I don't know if this is the best solution, or even if it's applicable to
> your situation but we do incremental updates from a database based on a
> timestamp, (from a simple separate sql table filled by triggers so deletes

Thanks, gwk! This doesn't exactly meet our needs, but it helped us get to a
solution. In short, we are manually committing in our outside updater
process (instead of letting Solr autocommit), and marking which documents
have been updated before a successful commit. Now stopping Solr is as easy
as kill -9.

Michael
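A minimal sketch of the bookkeeping Michael describes: the updater keeps its own journal of pending doc ids and marks them committed only after an explicit commit succeeds, so anything still pending after an unclean shutdown is simply re-sent. The class and method names are illustrative, not from the thread.

```python
# External commit tracking: docs go into "pending" when sent, and move to
# "committed" only after Solr's explicit commit request returns success.
# If Solr dies first (even via kill -9), to_resend() lists what to replay.

class CommitJournal:
    def __init__(self):
        self.pending = []
        self.committed = set()

    def sent(self, doc_id):
        """Record a doc that was sent to Solr but not yet committed."""
        self.pending.append(doc_id)

    def commit_succeeded(self):
        """Call only after the /update?commit=true request succeeds."""
        self.committed.update(self.pending)
        self.pending = []

    def to_resend(self):
        """Docs to replay after an unclean shutdown."""
        return list(self.pending)
```

In a real deployment the journal would live on durable storage (a database table or append-only file) so it survives the same crashes Solr does.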
Re: Stop solr without losing documents
Michael wrote:
> I've got a process external to Solr that is constantly feeding it new
> documents, retrying if Solr is nonresponsive. What's the right way to stop
> Solr (running in Tomcat) so no documents are lost? Currently I'm committing
> all cores and then running Catalina's stop script, but between my commit
> and the stop, more documents can come in that would need *another*
> commit... Lots of people must have had this problem already, so I know the
> answer is simple; I just can't find it! Thanks.
> Michael

I don't know if this is the best solution, or even if it's applicable to
your situation, but we do incremental updates from a database based on a
timestamp (from a simple separate SQL table filled by triggers, so deletes
are captured correctly as well). We store this timestamp in Solr too. Our
index script first does a simple Solr request for the newest timestamp, and
then selects the documents to update with "SELECT * FROM document_updates
WHERE timestamp >= X", where X is the timestamp returned from Solr. (We use
>= for the hopefully extremely rare case where two updates happen at the
same time and the index script runs at that same moment, retrieving only one
of them; this causes some documents to be updated multiple times, but as
document updates are idempotent this is no real problem.)

Regards,

gwk
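The timestamp scheme above boils down to one filter, shown here as a sketch. Only the `>=` comparison and the `document_updates` query come from gwk's message; the function and field names are illustrative.

```python
# gwk's incremental-update filter: pick up every row whose timestamp is >=
# the newest timestamp already stored in Solr. The >= (rather than >) may
# re-fetch a row indexed at exactly that instant, which is harmless because
# document updates are idempotent.

def rows_to_index(rows, last_indexed_ts):
    """rows: iterable of (doc_id, timestamp) from the triggers table."""
    return [(doc_id, ts) for doc_id, ts in rows if ts >= last_indexed_ts]

# The equivalent SQL, with X taken from Solr's newest stored timestamp:
#   SELECT * FROM document_updates WHERE timestamp >= X
```

Because deletes also write a trigger row, the same query picks up removals without any extra bookkeeping.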