Re: SolrException: Unavailable Service
Erick, I was under the misconception that a solr transaction is ACID. From what you said, I guess solr transactions are not Isolated. Thanks, Phong
SolrException: Unavailable Service
Hi, I did not want to hijack this thread (http://www.mail-archive.com/solr-user@lucene.apache.org/msg34181.html) but I am experiencing the exact same problem mentioned there. To sum up the issue, I am getting intermittent Unavailable Service exceptions during the commit phase of indexing. I know that I am calling commit very often, but I do not see any way around this. This is my situation: I am indexing a huge number of documents using multiple instances of the SolrJ client running on multiple servers. There is no way for me to control when commit is called from these clients, so two different clients can call commit at the same time. I am not sure if I can/should use auto/timed commit, because I need to know if a commit failed so I can roll back the batch that failed. What kind of options do I have? Should I try to catch the exception and keep retrying the commit until it goes through? I can see some potential problems with this approach. Do I need to write a request broker to queue up all these commits and send them to Solr one by one in a timely manner? Just wanted to know if anyone has a solution for this problem before I dive off the deep end. Thanks, Phong
Re: SolrException: Unavailable Service
If your commit from the client fails, you don't really know the state of your index anyway. All the threads you have sending documents to Solr are adding them to a single internal buffer. Committing flushes that buffer. So if thread 1 gets an error on commit, it will presumably have some documents from thread 2 in the commit. But thread 2 won't necessarily see the results. So I don't think your statement about needing to know if a commit fails is really
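The shared-buffer behaviour described in this reply can be pictured with a small toy model. This is only a sketch: the class `SharedBufferSketch` and its methods are hypothetical names invented for illustration, and the code is a plain in-memory simulation, not Solr's or SolrJ's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one pending-document buffer shared by all clients (NOT Solr code). */
final class SharedBufferSketch {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    /** Any client may add; documents from every client land in the same buffer. */
    synchronized void add(String docId) {
        pending.add(docId);
    }

    /** A commit from any client flushes everything pending, whoever added it. */
    synchronized List<String> commit() {
        List<String> flushed = new ArrayList<>(pending);
        committed.addAll(flushed);
        pending.clear();
        return flushed;
    }

    /** Number of documents flushed so far across all commits. */
    synchronized int committedCount() {
        return committed.size();
    }

    public static void main(String[] args) {
        SharedBufferSketch index = new SharedBufferSketch();
        index.add("client1-doc1");
        index.add("client2-doc1");
        // Client 1 issues the commit, and client 2's document is flushed with it.
        System.out.println(index.commit());
    }
}
```

In this model a commit has no notion of "my batch": whichever client calls `commit()` flushes everyone's pending documents, which is why a per-client rollback of a failed commit is hard to reason about.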
Re: SolrException: Unavailable Service
Sorry, fat fingers. Sent that last e-mail inadvertently. Anyway, if I have this correct, I'd recommend going to autocommit and NOT committing from the clients. That's usually the recommended procedure. This is especially true if you have a master/slave setup, because each commit from each client will trigger (potentially) a replication. Best Erick
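For reference, autocommit is configured on the server side in solrconfig.xml rather than in the clients. The fragment below is a minimal sketch of such a block; the two threshold values are placeholders chosen for illustration, not figures taken from this thread, and would need tuning for a real indexing load.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit once this many documents are pending (placeholder value) -->
    <maxDocs>10000</maxDocs>
    <!-- or once this many milliseconds have passed since the first pending add (placeholder value) -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

With a block like this in place, the clients only send adds and never call commit themselves.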
Re: SolrException: Unavailable Service
Erick, My setup is not quite the way you described. I have multiple threads indexing simultaneously, but only one thread doing the commit after all indexing threads have finished. I have multiple instances of this running, each in its own Java VM. I'm OK with throwing out all the docs indexed so far if the commit fails. I did not know that the recommended procedure is to use autocommit. I will explore this avenue. I was not aware of the master/slave setup either. The first thing that comes to mind is: how do I know which docs did not get committed if the autocommit ever fails? What is the recommended procedure for handling failure? Any failed docs will need to be indexed at some point in the future. Thanks for the valuable input. Phong
Re: SolrException: Unavailable Service
See below:

On Tue, Apr 12, 2011 at 2:21 PM, Phong Dais phong.gd...@gmail.com wrote:

> Erick, My setup is not quite the way you described. I have multiple threads indexing simultaneously, but only one thread doing the commit after all indexing threads have finished. I have multiple instances of this running, each in its own Java VM. I'm OK with throwing out all the docs indexed so far if the commit fails.

But this is really the same thing. On the back end, Solr is piping them all into a common index and that is where the autocommit happens. The fact that it's happening in separate JVMs doesn't alter the concept; you should let autocommit handle things. The problem here is knowing what hasn't been indexed.

> I did not know that the recommended procedure is to use autocommit. I will explore this avenue. I was not aware of the master/slave setup either. The first thing that comes to mind is: how do I know which docs did not get committed if the autocommit ever fails? What is the recommended procedure for handling failure? Any failed docs will need to be indexed at some point in the future.

Assuming that you have a uniqueKey defined, you can look at the logs to see failures. Then you can choose to re-index all the documents that have changed around that time (backing up as far as you need to be safe). The key here is that you can re-index and the old copy (if any) will be replaced by the re-indexed copy. There's nothing really built into Solr that does this for you; you really have to build this part yourself.

Best
Erick
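The recovery procedure outlined in this reply (pick a failure time from the logs, back up by a safety margin, re-send everything changed since then, and rely on the uniqueKey to replace old copies) might look roughly like the sketch below. Everything here is an assumption made for illustration: `ReindexSketch`, `Doc`, `changedSince`, and `reindex` are hypothetical names, the source records are assumed to carry a last-modified timestamp, and a plain `Map` keyed by id stands in for the index so that the overwrite-by-uniqueKey behaviour can be shown without a running Solr server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReindexSketch {
    /** Hypothetical source record: unique key, body, last-modified time in epoch millis. */
    static final class Doc {
        final String id;
        final String body;
        final long modifiedAt;

        Doc(String id, String body, long modifiedAt) {
            this.id = id;
            this.body = body;
            this.modifiedAt = modifiedAt;
        }
    }

    /** Select every source document modified at or after (failureTime - safetyMargin). */
    static List<Doc> changedSince(List<Doc> source, long failureTimeMillis, long safetyMarginMillis) {
        long cutoff = failureTimeMillis - safetyMarginMillis;
        List<Doc> result = new ArrayList<>();
        for (Doc d : source) {
            if (d.modifiedAt >= cutoff) {
                result.add(d);
            }
        }
        return result;
    }

    /** Re-add documents; put() keyed by id models "the old copy is replaced". */
    static void reindex(Map<String, String> index, List<Doc> docs) {
        for (Doc d : docs) {
            index.put(d.id, d.body);
        }
    }

    public static void main(String[] args) {
        List<Doc> source = Arrays.asList(
                new Doc("1", "alpha", 1000L),
                new Doc("2", "beta v2", 5000L),
                new Doc("3", "gamma", 7000L));

        // State after a failure around t=6000: doc 2 is stale and doc 3 is missing.
        Map<String, String> index = new LinkedHashMap<>();
        index.put("1", "alpha");
        index.put("2", "beta v1");

        // Back up 2000 ms before the failure to be safe, then re-send.
        reindex(index, changedSince(source, 6000L, 2000L));
        System.out.println(index);
    }
}
```

Because re-adding a document with an existing key simply replaces it, choosing a generous safety margin costs extra indexing work but does not create duplicates.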