Re: SolrCloud replication question
Hi! Interesting article in your link. What servlet container do you use and how is it configured wrt. threads etc.? You should be able to utilize all CPUs with a single Solr index, given that you are not I/O bound. Also, what is your mergeFactor? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9 July 2012, at 22:11, avenka wrote: > Hmm, never mind my question about replicating using symlinks. Given that > replication on a single machine improves throughput, I should be able to get > a similar improvement by simply sharding on a single machine. As also > observed at > > http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ > > I am now benchmarking my workload to compare replication vs. sharding > performance on a single machine. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html > Sent from the Solr - User mailing list archive at Nabble.com.
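A quick way to check whether a single Solr instance is actually I/O bound rather than CPU bound is to watch disk and CPU utilization while the indexing or query load runs. A minimal sketch, assuming a Linux box with the usual sysstat/procps tools installed (the tools are an assumption, not something from this thread):

  # %util near 100 on the index disk suggests I/O bound
  iostat -x 2
  # high us/sy CPU columns with idle near 0 suggests CPU bound instead
  vmstat 2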
Re: SolrCloud replication question
The symlink thing sounds... complicated, but as you say, you're going another route. The indexing speed you're seeing is surprisingly slow; I'd get to the root of the timeouts before giving up. SolrCloud simply _can't_ be that slow by design; something about your setup is causing that, I suspect. The timeouts you're seeing are certainly a clue here. Incoming updates have a couple of things happen: 1> the incoming request is pulled apart. Any docs for this shard are indexed and forwarded to any replicas. 2> any docs that are for a different shard are packed up and forwarded to the leader for that shard, which in turn distributes them to any replicas. So I _suspect_ that indexing will be a bit slower, since there's some additional communication going on. But not _that_ much slower. Any clue what your slow server is doing that would cause it to time out? Best Erick On Mon, Jul 9, 2012 at 4:11 PM, avenka wrote: > Hmm, never mind my question about replicating using symlinks. Given that > replication on a single machine improves throughput, I should be able to get > a similar improvement by simply sharding on a single machine. As also > observed at > > http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ > > I am now benchmarking my workload to compare replication vs. sharding > performance on a single machine. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html > Sent from the Solr - User mailing list archive at Nabble.com.
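One way to narrow down where the time goes, along the lines Erick suggests, is to time the same small batch of documents posted directly to each node and compare. A rough sketch; the ports, core name, and batch file are assumptions rather than details from this thread (7574 is the port mentioned in the timeout message elsewhere in the thread):

  # time a batch against the node that appears healthy...
  time curl 'http://localhost:8983/solr/collection1/update?commit=true' \
       -H 'Content-Type: text/xml' --data-binary @batch100.xml
  # ...and against the node that is timing out
  time curl 'http://localhost:7574/solr/collection1/update?commit=true' \
       -H 'Content-Type: text/xml' --data-binary @batch100.xml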
Re: SolrCloud replication question
Hmm, never mind my question about replicating using symlinks. Given that replication on a single machine improves throughput, I should be able to get a similar improvement by simply sharding on a single machine. As also observed at http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ I am now benchmarking my workload to compare replication vs. sharding performance on a single machine. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud replication question
Erick, thanks. I now do see segment files in an index. directory at the replicas. Not sure why they were not getting populated earlier. I have a couple more questions, the second is more elaborate - let me know if I should move it to a separate thread. (1) The speed of adding documents in SolrCloud is excruciatingly slow. It takes about 30-50 seconds to add a batch of 100 documents (and about twice that to add 200, etc.) to the primary but just ~10 seconds to add 5K documents in batches of 200 on a standalone solr 4 server. The log files indicate that the primary is timing out with messages like below and Cloud->Graph in the UI shows the other two replicas in orange after starting green. org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://localhost:7574/solr Any idea why? (2) I am seriously considering using symbolic links for a replicated solr setup with completely independent instances on a *single machine*. Tell me if I am thinking about this incorrectly. Here is my reasoning: (a) Master/slave replication in 3.6 simply seems old school as it doesn't have the nice consistency properties of SolrCloud. Polling, say, every 20 seconds means I don't know exactly how up-to-speed each replica is, which will complicate my request re-distribution. (b) SolrCloud seems like a great alternative to master/slave replication. But it seems slow (see 1) and, having played with it, I don't feel comfortable with the maturity of ZK integration (or my comprehension of it) in solr 4 alpha. (c) Symbolic links seem like the fastest and most space-efficient solution *provided* there is only a single writer, which is just fine for me. I plan to run completely separate solr instances with one designated as the primary and do the following operations in sequence: Add a batch to the primary and commit --> From each replica's index directory, remove all symlinks and re-create symlinks to segment files in the primary (but not the write.lock file) --> Call update?commit=true to force replicas to re-load their in-memory index --> Do whatever read-only processing is required on the batch using the primary and all replicas by manually (randomly) distributing read requests --> Repeat sequence. Is there any downside to 2(c) (other than maintaining a trivial script to manage symlinks and call commit)? I tested it on small index sizes and it seems to work fine. The throughput improves with more replicas (for 2-4 replicas) as a single replica is not enough to saturate the machine (due to high query latency). Am I overlooking something in this setup? Overall, I need high throughput and minimal latency from the time a document is added to the time it is available at a replica. SolrCloud's automated request redirection, consistency, and fault-tolerance are awesome for a physically distributed setup, but I don't see how it beats 2(c) in a single-writer, single-machine, replicated setup. AV On Jul 9, 2012, at 9:43 AM, Erick Erickson [via Lucene] wrote: > No, you're misunderstanding the setup. Each replica has a complete > index. Updates get automatically forwarded to _both_ nodes for a > particular shard. So, when a doc comes in to be indexed, it gets > sent to the leader for, say, shard1. From there: > 1> it gets indexed on the leader > 2> it gets forwarded to the replica(s) where it gets indexed locally. > > Each replica has a complete index (for that shard). > > There is no master/slave setup any more. And you do > _not_ have to configure replication.
> > Best > Erick > > On Sun, Jul 8, 2012 at 1:03 PM, avenka <[hidden email]> wrote: > > > I am trying to wrap my head around replication in SolrCloud. I tried the > > setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication > > for high query throughput. The setup at the URL above appears to maintain > > just one copy of the index at the primary node (instead of a replicated > > index as in a master/slave configuration). Will I still get roughly an > > n-fold increase in query throughput with n replicas? And if so, why would > > one do master/slave replication with multiple copies of the index at all? > > > > -- > > View this message in context: > > http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > If you reply to this email, your message will be added to the discussion > below: > http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3993889.html > To unsubscribe from SolrCloud replication question, click here. > NAML -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3993960.html Sent from the Solr - User mailing list archive at Nabble.com.
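A minimal sketch of the symlink-swap cycle described in 2(c) above, with one primary and two replica instances on the same box; all paths, ports, and the batch file are illustrative assumptions, not details from the thread:

  #!/bin/sh
  # assumed layout: primary Solr on port 8983 with index at /srv/solr/primary/data/index,
  # replicas on ports 8984 and 8985 with homes /srv/solr/replica1 and /srv/solr/replica2
  PRIMARY_INDEX=/srv/solr/primary/data/index

  # 1) add a batch to the primary and commit it
  curl 'http://localhost:8983/solr/update?commit=true' \
       -H 'Content-Type: text/xml' --data-binary @batch.xml

  # 2) re-point each replica's index at the primary's segment files (skipping write.lock),
  # 3) then call update?commit=true on the replica so it reopens its searcher
  for ENTRY in 8984:/srv/solr/replica1 8985:/srv/solr/replica2; do
    PORT=${ENTRY%%:*}
    HOME_DIR=${ENTRY#*:}
    rm -f "$HOME_DIR"/data/index/*
    for F in "$PRIMARY_INDEX"/*; do
      [ "$(basename "$F")" = "write.lock" ] && continue
      ln -s "$F" "$HOME_DIR"/data/index/
    done
    curl "http://localhost:$PORT/solr/update?commit=true"
  done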
Re: SolrCloud replication question
No, you're misunderstanding the setup. Each replica has a complete index. Updates get automatically forwarded to _both_ nodes for a particular shard. So, when a doc comes in to be indexed, it gets sent to the leader for, say, shard1. From there: 1> it gets indexed on the leader 2> it gets forwarded to the replica(s) where it gets indexed locally. Each replica has a complete index (for that shard). There is no master/slave setup any more. And you do _not_ have to configure replication. Best Erick On Sun, Jul 8, 2012 at 1:03 PM, avenka wrote: > I am trying to wrap my head around replication in SolrCloud. I tried the > setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication > for high query throughput. The setup at the URL above appears to maintain > just one copy of the index at the primary node (instead of a replicated > index as in a master/slave configuration). Will I still get roughly an > n-fold increase in query throughput with n replicas? And if so, why would > one do master/slave replication with multiple copies of the index at all? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html > Sent from the Solr - User mailing list archive at Nabble.com.
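In practice this means a client can send updates to any node in the cluster and they end up on the right leader and its replicas. A small sketch against the stock two-shard example from the SolrCloud wiki page; the ports, collection name, and document are assumptions:

  # add a document through one node and commit
  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
       -H 'Content-Type: text/xml' \
       --data-binary '<add><doc><field name="id">testdoc1</field></doc></add>'
  # query a different node; the distributed query finds the doc no matter
  # which shard it was routed to
  curl 'http://localhost:7574/solr/collection1/select?q=id:testdoc1'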
Re: SolrCloud Replication Question
Ok, great. Just wanted to make sure someone was aware. Thanks for looking into this. On Thu, Feb 16, 2012 at 8:26 AM, Mark Miller wrote: > > On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote: > >> Not sure if this is >> expected or not. > > Nope - should be already resolved or will be today though. > > - Mark Miller > lucidimagination.com > > > > > > > > > > >
Re: SolrCloud Replication Question
On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote: > Not sure if this is > expected or not. Nope - should be already resolved or will be today though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
All of the nodes now show as being Active. When starting the replicas I did receive the following message though. Not sure if this is expected or not. INFO: Attempting to replicate from http://JamiesMac.local:8501/solr/slice2_shard2/ Feb 14, 2012 10:53:34 PM org.apache.solr.common.SolrException log SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: null java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) null java.lang.NullPointerExceptionat org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) request: http://JamiesMac.local:8501/solr/admin/cores?action=PREPRECOVERY&core=slice2_shard2&nodeName=JamiesMac.local:8502_solr&coreNodeName=JamiesMac.local:8502_solr_slice2_shard1&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251) at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:208) Feb 14, 2012 10:53:34 PM org.apache.solr.update.UpdateLog dropBufferedUpdates
Re: SolrCloud Replication Question
Doing so now, will let you know if I continue to see the same issues On Tue, Feb 14, 2012 at 4:59 PM, Mark Miller wrote: > Doh - looks like I was just seeing a test issue. Do you mind updating and > trying the latest rev? At the least there should be some better logging > around the recovery. > > I'll keep working on tests in the meantime. > > - Mark > > On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote: > >> Sounds good, if I pull the latest from trunk and rerun will that be >> useful or were you able to duplicate my issue now? >> >> On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller wrote: >>> Okay Jamie, I think I have a handle on this. It looks like an issue with >>> what config files are being used by cores created with the admin core >>> handler - I think it's just picking up default config and not the correct >>> config for the collection. This means they end up using config that has no >>> UpdateLog defined - and so recovery fails. >>> >>> I've added more logging around this so that it's easy to determine that. >>> >>> I'm investigating more and working on a test + fix. I'll file a JIRA issue >>> soon as well. >>> >>> - Mark >>> >>> On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: >>> Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller wrote: > Sorry, have not gotten it yet, but will be back trying later today - > monday, tuesday tend to be slow for me (meetings and crap). > > - Mark > > On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: > >> Has there been any success in replicating this? I'm wondering if it >> could be something with my setup that is causing the issue... >> >> >> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson wrote: >>> Yes, I have the following layout on the FS >>> >>> ./bootstrap.sh >>> ./example (standard example directory from distro containing jetty >>> jars, solr confs, solr war, etc) >>> ./slice1 >>> - start.sh >>> -solr.xml >>> - slice1_shard1 >>> - data >>> - slice2_shard2 >>> -data >>> ./slice2 >>> - start.sh >>> - solr.xml >>> -slice2_shard1 >>> -data >>> -slice1_shard2 >>> -data >>> >>> if it matters I'm running everything from localhost, zk and the solr >>> shards >>> >>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 "Jamie Johnson" kirjoitti: > > - Mark Miller > lucidimagination.com > > > > > > > > > > > >>> >>> - Mark Miller >>> lucidimagination.com >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> > > - Mark Miller > lucidimagination.com > > > > > > > > > > >
Re: SolrCloud Replication Question
Doh - looks like I was just seeing a test issue. Do you mind updating and trying the latest rev? At the least there should be some better logging around the recovery. I'll keep working on tests in the meantime. - Mark On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote: > Sounds good, if I pull the latest from trunk and rerun will that be > useful or were you able to duplicate my issue now? > > On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller wrote: >> Okay Jamie, I think I have a handle on this. It looks like an issue with >> what config files are being used by cores created with the admin core >> handler - I think it's just picking up default config and not the correct >> config for the collection. This means they end up using config that has no >> UpdateLog defined - and so recovery fails. >> >> I've added more logging around this so that it's easy to determine that. >> >> I'm investigating more and working on a test + fix. I'll file a JIRA issue >> soon as well. >> >> - Mark >> >> On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: >> >>> Thanks Mark, not a huge rush, just me trying to get to use the latest >>> stuff on our project. >>> >>> On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller wrote: Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: > Has there been any success in replicating this? I'm wondering if it > could be something with my setup that is causing the issue... > > > On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson wrote: >> Yes, I have the following layout on the FS >> >> ./bootstrap.sh >> ./example (standard example directory from distro containing jetty >> jars, solr confs, solr war, etc) >> ./slice1 >> - start.sh >> -solr.xml >> - slice1_shard1 >> - data >> - slice2_shard2 >> -data >> ./slice2 >> - start.sh >> - solr.xml >> -slice2_shard1 >>-data >> -slice1_shard2 >>-data >> >> if it matters I'm running everything from localhost, zk and the solr >> shards >> >> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren wrote: >>> Do you have unique dataDir for each instance? >>> 13.2.2012 14.30 "Jamie Johnson" kirjoitti: - Mark Miller lucidimagination.com >> >> - Mark Miller >> lucidimagination.com >> >> >> >> >> >> >> >> >> >> >> - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Sounds good, if I pull the latest from trunk and rerun will that be useful or were you able to duplicate my issue now? On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller wrote: > Okay Jamie, I think I have a handle on this. It looks like an issue with what > config files are being used by cores created with the admin core handler - I > think it's just picking up default config and not the correct config for the > collection. This means they end up using config that has no UpdateLog defined > - and so recovery fails. > > I've added more logging around this so that it's easy to determine that. > > I'm investigating more and working on a test + fix. I'll file a JIRA issue > soon as well. > > - Mark > > On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: > >> Thanks Mark, not a huge rush, just me trying to get to use the latest >> stuff on our project. >> >> On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller wrote: >>> Sorry, have not gotten it yet, but will be back trying later today - >>> monday, tuesday tend to be slow for me (meetings and crap). >>> >>> - Mark >>> >>> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: >>> Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson wrote: > Yes, I have the following layout on the FS > > ./bootstrap.sh > ./example (standard example directory from distro containing jetty > jars, solr confs, solr war, etc) > ./slice1 > - start.sh > -solr.xml > - slice1_shard1 > - data > - slice2_shard2 > -data > ./slice2 > - start.sh > - solr.xml > -slice2_shard1 > -data > -slice1_shard2 > -data > > if it matters I'm running everything from localhost, zk and the solr > shards > > On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren wrote: >> Do you have unique dataDir for each instance? >> 13.2.2012 14.30 "Jamie Johnson" kirjoitti: >>> >>> - Mark Miller >>> lucidimagination.com >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> > > - Mark Miller > lucidimagination.com > > > > > > > > > > >
Re: SolrCloud Replication Question
Okay Jamie, I think I have a handle on this. It looks like an issue with what config files are being used by cores created with the admin core handler - I think it's just picking up default config and not the correct config for the collection. This means they end up using config that has no UpdateLog defined - and so recovery fails. I've added more logging around this so that it's easy to determine that. I'm investigating more and working on a test + fix. I'll file a JIRA issue soon as well. - Mark On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote: > Thanks Mark, not a huge rush, just me trying to get to use the latest > stuff on our project. > > On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller wrote: >> Sorry, have not gotten it yet, but will be back trying later today - monday, >> tuesday tend to be slow for me (meetings and crap). >> >> - Mark >> >> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: >> >>> Has there been any success in replicating this? I'm wondering if it >>> could be something with my setup that is causing the issue... >>> >>> >>> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson wrote: Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren wrote: > Do you have unique dataDir for each instance? > 13.2.2012 14.30 "Jamie Johnson" kirjoitti: >> >> - Mark Miller >> lucidimagination.com >> >> >> >> >> >> >> >> >> >> >> - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Thanks Mark, not a huge rush, just me trying to get to use the latest stuff on our project. On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller wrote: > Sorry, have not gotten it yet, but will be back trying later today - monday, > tuesday tend to be slow for me (meetings and crap). > > - Mark > > On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: > >> Has there been any success in replicating this? I'm wondering if it >> could be something with my setup that is causing the issue... >> >> >> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson wrote: >>> Yes, I have the following layout on the FS >>> >>> ./bootstrap.sh >>> ./example (standard example directory from distro containing jetty >>> jars, solr confs, solr war, etc) >>> ./slice1 >>> - start.sh >>> -solr.xml >>> - slice1_shard1 >>> - data >>> - slice2_shard2 >>> -data >>> ./slice2 >>> - start.sh >>> - solr.xml >>> -slice2_shard1 >>> -data >>> -slice1_shard2 >>> -data >>> >>> if it matters I'm running everything from localhost, zk and the solr shards >>> >>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren wrote: Do you have unique dataDir for each instance? 13.2.2012 14.30 "Jamie Johnson" kirjoitti: > > - Mark Miller > lucidimagination.com > > > > > > > > > > >
Re: SolrCloud Replication Question
Sorry, have not gotten it yet, but will be back trying later today - monday, tuesday tend to be slow for me (meetings and crap). - Mark On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote: > Has there been any success in replicating this? I'm wondering if it > could be something with my setup that is causing the issue... > > > On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson wrote: >> Yes, I have the following layout on the FS >> >> ./bootstrap.sh >> ./example (standard example directory from distro containing jetty >> jars, solr confs, solr war, etc) >> ./slice1 >> - start.sh >> -solr.xml >> - slice1_shard1 >> - data >> - slice2_shard2 >> -data >> ./slice2 >> - start.sh >> - solr.xml >> -slice2_shard1 >>-data >> -slice1_shard2 >>-data >> >> if it matters I'm running everything from localhost, zk and the solr shards >> >> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren wrote: >>> Do you have unique dataDir for each instance? >>> 13.2.2012 14.30 "Jamie Johnson" kirjoitti: - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Has there been any success in replicating this? I'm wondering if it could be something with my setup that is causing the issue... On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson wrote: > Yes, I have the following layout on the FS > > ./bootstrap.sh > ./example (standard example directory from distro containing jetty > jars, solr confs, solr war, etc) > ./slice1 > - start.sh > -solr.xml > - slice1_shard1 > - data > - slice2_shard2 > -data > ./slice2 > - start.sh > - solr.xml > -slice2_shard1 > -data > -slice1_shard2 > -data > > if it matters I'm running everything from localhost, zk and the solr shards > > On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren wrote: >> Do you have unique dataDir for each instance? >> 13.2.2012 14.30 "Jamie Johnson" kirjoitti:
Re: SolrCloud Replication Question
Yes, I have the following layout on the FS ./bootstrap.sh ./example (standard example directory from distro containing jetty jars, solr confs, solr war, etc) ./slice1 - start.sh -solr.xml - slice1_shard1 - data - slice2_shard2 -data ./slice2 - start.sh - solr.xml -slice2_shard1 -data -slice1_shard2 -data if it matters I'm running everything from localhost, zk and the solr shards On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren wrote: > Do you have unique dataDir for each instance? > 13.2.2012 14.30 "Jamie Johnson" kirjoitti:
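Each core in the layout above keeps its own data directory under its core directory, which is what the dataDir question is getting at. If there is any doubt, the data directory can also be spelled out when the core is created; a hedged example, assuming the CoreAdmin CREATE action in the build being used accepts the dataDir parameter (the path is illustrative):

  curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1&dataDir=/path/to/slice1/slice1_shard1/data'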
Re: SolrCloud Replication Question
Do you have unique dataDir for each instance? 13.2.2012 14.30 "Jamie Johnson" kirjoitti:
Re: SolrCloud Replication Question
I don't see any errors in the log. Here are the scripts I'm running; to create the cores I run the following commands: curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1' curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice2_shard2&collection=collection1&shard=slice2&collection.configName=config1' curl 'http://localhost:8502/solr/admin/cores?action=CREATE&name=slice2_shard1&collection=collection1&shard=slice2&collection.configName=config1' curl 'http://localhost:8502/solr/admin/cores?action=CREATE&name=slice1_shard2&collection=collection1&shard=slice1&collection.configName=config1' After doing this the nodes are immediately marked as down in clusterstate.json. Restarting the solr instances, I see that whichever I start first shows up as active, and the other is down. There are no errors in the logs either. On Sat, Feb 11, 2012 at 9:48 PM, Mark Miller wrote: > Yeah, that is what I would expect - for a node to be marked as down, it > either didn't finish starting, or it gave up recovering...either case should > be logged. You might try searching for the recover keyword and see if there > are any interesting bits around that. > > Meanwhile, I have dug up a couple issues around recovery and committed fixes > to trunk - still playing around... > > On Feb 11, 2012, at 8:44 PM, Jamie Johnson wrote: > >> I didn't see anything in the logs, would it be an error? >> >> On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller wrote: >>> >>> On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: >>> I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? >>> >>> >>> If recovery fails X times (say because the leader can't be reached from the >>> replica), a node is marked as down. It can't be active, and technically it >>> has stopped trying to recover (it tries X times and eventually give up >>> until you restart it). >>> >>> Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not >>> sure if you have looked at your logs or not, but perhaps it's involved. >>> >>> - Mark Miller >>> lucidimagination.com >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> > > - Mark Miller > lucidimagination.com > > > > > > > > > > > bootstrap.sh Description: Bourne shell script start.sh Description: Bourne shell script start.sh Description: Bourne shell script
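A quick way to cross-check what each instance itself reports for its cores, independent of clusterstate.json, is the CoreAdmin STATUS action; a small sketch using the same ports as the CREATE commands above:

  curl 'http://localhost:8501/solr/admin/cores?action=STATUS&wt=json'
  curl 'http://localhost:8502/solr/admin/cores?action=STATUS&wt=json'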
Re: SolrCloud Replication Question
Yeah, that is what I would expect - for a node to be marked as down, it either didn't finish starting, or it gave up recovering...either case should be logged. You might try searching for the recover keyword and see if there are any interesting bits around that. Meanwhile, I have dug up a couple issues around recovery and committed fixes to trunk - still playing around... On Feb 11, 2012, at 8:44 PM, Jamie Johnson wrote: > I didn't see anything in the logs, would it be an error? > > On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller wrote: >> >> On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: >> >>> I wiped the zk and started over (when I switch networks I get >>> different host names and honestly haven't dug into why). That being >>> said the latest state shows all in sync, why would the cores show up >>> as down? >> >> >> If recovery fails X times (say because the leader can't be reached from the >> replica), a node is marked as down. It can't be active, and technically it >> has stopped trying to recover (it tries X times and eventually give up until >> you restart it). >> >> Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not >> sure if you have looked at your logs or not, but perhaps it's involved. >> >> - Mark Miller >> lucidimagination.com >> >> >> >> >> >> >> >> >> >> >> - Mark Miller lucidimagination.com
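For the log search suggested above, something along these lines is enough if the instances log to files; the file locations are an assumption, since the stock jetty example logs to the console unless redirected:

  grep -in 'recover' slice1/*.log slice2/*.log
  # or, when running the example with output redirected to a file per instance:
  grep -in 'recover' slice1/jetty.out slice2/jetty.out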
Re: SolrCloud Replication Question
I didn't see anything in the logs, would it be an error? On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller wrote: > > On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: > >> I wiped the zk and started over (when I switch networks I get >> different host names and honestly haven't dug into why). That being >> said the latest state shows all in sync, why would the cores show up >> as down? > > > If recovery fails X times (say because the leader can't be reached from the > replica), a node is marked as down. It can't be active, and technically it > has stopped trying to recover (it tries X times and eventually give up until > you restart it). > > Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not > sure if you have looked at your logs or not, but perhaps it's involved. > > - Mark Miller > lucidimagination.com > > > > > > > > > > >
Re: SolrCloud Replication Question
On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote: > I wiped the zk and started over (when I switch networks I get > different host names and honestly haven't dug into why). That being > said the latest state shows all in sync, why would the cores show up > as down? If recovery fails X times (say because the leader can't be reached from the replica), a node is marked as down. It can't be active, and technically it has stopped trying to recover (it tries X times and eventually gives up until you restart it). Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not sure if you have looked at your logs or not, but perhaps it's involved. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
I wiped the zk and started over (when I switch networks I get different host names and honestly haven't dug into why). That being said the latest state shows all in sync, why would the cores show up as down? On Sat, Feb 11, 2012 at 11:08 AM, Mark Miller wrote: > > On Feb 10, 2012, at 9:40 PM, Jamie Johnson wrote: > >> >> >> how'd you resolve this issue? >> > > > I was basing my guess on seeing "JamiesMac.local" and "jamiesmac" in your > first cluster state dump - your latest doesn't seem to mismatch like that > though. > > - Mark Miller > lucidimagination.com > > > > > > > > > > >
Re: SolrCloud Replication Question
On Feb 10, 2012, at 9:40 PM, Jamie Johnson wrote: > > > how'd you resolve this issue? > I was basing my guess on seeing "JamiesMac.local" and "jamiesmac" in your first cluster state dump - your latest doesn't seem to mismatch like that though. - Mark Miller lucidimagination.com
Re: SolrCloud Replication Question
Hmm, perhaps I'm seeing the issue you're speaking of. I have everything running right now and my state is as follows: {"collection1":{ "slice1":{ "JamiesMac.local:8501_solr_slice1_shard1":{ "shard_id":"slice1", "leader":"true", "state":"active", "core":"slice1_shard1", "collection":"collection1", "node_name":"JamiesMac.local:8501_solr", "base_url":"http://JamiesMac.local:8501/solr"}, "JamiesMac.local:8502_solr_slice1_shard2":{ "shard_id":"slice1", "state":"down", "core":"slice1_shard2", "collection":"collection1", "node_name":"JamiesMac.local:8502_solr", "base_url":"http://JamiesMac.local:8502/solr"}}, "slice2":{ "JamiesMac.local:8502_solr_slice2_shard1":{ "shard_id":"slice2", "leader":"true", "state":"active", "core":"slice2_shard1", "collection":"collection1", "node_name":"JamiesMac.local:8502_solr", "base_url":"http://JamiesMac.local:8502/solr"}, "JamiesMac.local:8501_solr_slice2_shard2":{ "shard_id":"slice2", "state":"down", "core":"slice2_shard2", "collection":"dataspace", "node_name":"JamiesMac.local:8501_solr", "base_url":"http://JamiesMac.local:8501/solr" how'd you resolve this issue? On Fri, Feb 10, 2012 at 8:49 PM, Mark Miller wrote: > > On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: > >> jamiesmac > > Another note: > > Have no idea if this is involved, but when I do tests with my linux box and > mac I run into the following: > > My linux box auto finds the address of halfmetal and my macbook mbpro.local. > If I accept those defaults, my mac cannot reach my linux box. It can only > reach the linux box through halfmetal.local, and so I have to override the > host on the linux box to advertise as halfmetal.local and then they can talk. > > In the bad case, if my leaders were on the linux box, they would be able to > forward to the mac no problem, but then if shards on the mac needed to > recover, they would fail to reach the linux box through the halfmetal address. > > - Mark Miller > lucidimagination.com > > > > > > > > > > >
Re: SolrCloud Replication Question
On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: > jamiesmac Another note: Have no idea if this is involved, but when I do tests with my linux box and mac I run into the following: My linux box auto finds the address of halfmetal and my macbook mbpro.local. If I accept those defaults, my mac cannot reach my linux box. It can only reach the linux box through halfmetal.local, and so I have to override the host on the linux box to advertise as halfmetal.local and then they can talk. In the bad case, if my leaders were on the linux box, they would be able to forward to the mac no problem, but then if shards on the mac needed to recover, they would fail to reach the linux box through the halfmetal address. - Mark Miller lucidimagination.com
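If the advertised hostname keeps mismatching, it can be pinned instead of auto-detected. With the stock example solr.xml, which (as an assumption here) reads the host name from the host system property via host="${host:}", something like this on the linux box should do it:

  cd example
  java -Dhost=halfmetal.local -DzkHost=localhost:2181 -jar start.jar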
Re: SolrCloud Replication Question
Thanks. If the given ZK snapshot was the end state, then two nodes are marked as down. Generally that happens because replication failed - if you have not, I'd check the logs for those two nodes. - Mark On Fri, Feb 10, 2012 at 7:35 PM, Jamie Johnson wrote: > nothing seems that different. In regards to the states of each I'll > try to verify tonight. > > This was using a version I pulled from SVN trunk yesterday morning > > On Fri, Feb 10, 2012 at 6:22 PM, Mark Miller > wrote: > > Also, it will help if you can mention the exact version of solrcloud you > are talking about in each issue - I know you have one from the old branch, > and I assume a version off trunk you are playing with - so a heads up on > which and if trunk, what rev or day will help in the case that I'm trying > to dupe issues that have been addressed. > > > > - Mark > > > > On Feb 10, 2012, at 6:09 PM, Mark Miller wrote: > > > >> I'm trying, but so far I don't see anything. I'll have to try and mimic > your setup closer it seems. > >> > >> I tried starting up 6 solr instances on different ports as 2 shards, > each with a replication factor of 3. > >> > >> Then I indexed 20k documents to the cluster and verified doc counts. > >> > >> Then I shutdown all the replicas so that only one instance served each > shard. > >> > >> Then I indexed 20k documents to the cluster. > >> > >> Then I started the downed nodes and verified that they where in a > recovery state. > >> > >> After enough time went by I checked and verified document counts on > each instance - they where as expected. > >> > >> I guess next I can try a similar experiment using multiple cores, but > if you notice anything that stands out that is largely different in what > you are doing, let me know. > >> > >> The cores that are behind, does it say they are down, recovering, or > active in zookeeper? > >> > >> On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote: > >> > >>> Sorry for pinging this again, is more information needed on this? I > >>> can provide more details but am not sure what to provide. > >>> > >>> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson > wrote: > Sorry, I shut down the full solr instance. > > On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller > wrote: > > Can you explain a little more how you doing this? How are you > bringing the cores down and then back up? Shutting down a full solr > instance, unloading the core? > > > > On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: > > > >> I know that the latest Solr Cloud doesn't use standard replication > but > >> I have a question about how it appears to be working. 
I currently > >> have the following cluster state > >> > >> {"collection1":{ > >> "slice1":{ > >> "JamiesMac.local:8501_solr_slice1_shard1":{ > >> "shard_id":"slice1", > >> "state":"active", > >> "core":"slice1_shard1", > >> "collection":"collection1", > >> "node_name":"JamiesMac.local:8501_solr", > >> "base_url":"http://JamiesMac.local:8501/solr"}, > >> "JamiesMac.local:8502_solr_slice1_shard2":{ > >> "shard_id":"slice1", > >> "state":"active", > >> "core":"slice1_shard2", > >> "collection":"collection1", > >> "node_name":"JamiesMac.local:8502_solr", > >> "base_url":"http://JamiesMac.local:8502/solr"}, > >> "jamiesmac:8501_solr_slice1_shard1":{ > >> "shard_id":"slice1", > >> "state":"down", > >> "core":"slice1_shard1", > >> "collection":"collection1", > >> "node_name":"jamiesmac:8501_solr", > >> "base_url":"http://jamiesmac:8501/solr"}, > >> "jamiesmac:8502_solr_slice1_shard2":{ > >> "shard_id":"slice1", > >> "leader":"true", > >> "state":"active", > >> "core":"slice1_shard2", > >> "collection":"collection1", > >> "node_name":"jamiesmac:8502_solr", > >> "base_url":"http://jamiesmac:8502/solr"}}, > >> "slice2":{ > >> "JamiesMac.local:8501_solr_slice2_shard2":{ > >> "shard_id":"slice2", > >> "state":"active", > >> "core":"slice2_shard2", > >> "collection":"collection1", > >> "node_name":"JamiesMac.local:8501_solr", > >> "base_url":"http://JamiesMac.local:8501/solr"}, > >> "JamiesMac.local:8502_solr_slice2_shard1":{ > >> "shard_id":"slice2", > >> "state":"active", > >> "core":"slice2_shard1", > >> "collection":"collection1", > >> "node_name":"JamiesMac.local:8502_solr", > >> "base_url":"http://JamiesMac.local:8502/solr"}, > >> "jamiesmac:8501_solr_slice2_shard2":{ > >> "shard_id":"slice2", > >> "state":"down", > >> "core":"slice2_shard2", > >> "collection":"collection1", > >> "node_name":"jamiesmac:8501_solr", > >>
Re: SolrCloud Replication Question
nothing seems that different. In regards to the states of each I'll try to verify tonight. This was using a version I pulled from SVN trunk yesterday morning On Fri, Feb 10, 2012 at 6:22 PM, Mark Miller wrote: > Also, it will help if you can mention the exact version of solrcloud you are > talking about in each issue - I know you have one from the old branch, and I > assume a version off trunk you are playing with - so a heads up on which and > if trunk, what rev or day will help in the case that I'm trying to dupe > issues that have been addressed. > > - Mark > > On Feb 10, 2012, at 6:09 PM, Mark Miller wrote: > >> I'm trying, but so far I don't see anything. I'll have to try and mimic your >> setup closer it seems. >> >> I tried starting up 6 solr instances on different ports as 2 shards, each >> with a replication factor of 3. >> >> Then I indexed 20k documents to the cluster and verified doc counts. >> >> Then I shutdown all the replicas so that only one instance served each shard. >> >> Then I indexed 20k documents to the cluster. >> >> Then I started the downed nodes and verified that they where in a recovery >> state. >> >> After enough time went by I checked and verified document counts on each >> instance - they where as expected. >> >> I guess next I can try a similar experiment using multiple cores, but if you >> notice anything that stands out that is largely different in what you are >> doing, let me know. >> >> The cores that are behind, does it say they are down, recovering, or active >> in zookeeper? >> >> On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote: >> >>> Sorry for pinging this again, is more information needed on this? I >>> can provide more details but am not sure what to provide. >>> >>> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson wrote: Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller wrote: > Can you explain a little more how you doing this? How are you bringing > the cores down and then back up? Shutting down a full solr instance, > unloading the core? > > On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: > >> I know that the latest Solr Cloud doesn't use standard replication but >> I have a question about how it appears to be working. 
I currently >> have the following cluster state >> >> {"collection1":{ >> "slice1":{ >> "JamiesMac.local:8501_solr_slice1_shard1":{ >> "shard_id":"slice1", >> "state":"active", >> "core":"slice1_shard1", >> "collection":"collection1", >> "node_name":"JamiesMac.local:8501_solr", >> "base_url":"http://JamiesMac.local:8501/solr"}, >> "JamiesMac.local:8502_solr_slice1_shard2":{ >> "shard_id":"slice1", >> "state":"active", >> "core":"slice1_shard2", >> "collection":"collection1", >> "node_name":"JamiesMac.local:8502_solr", >> "base_url":"http://JamiesMac.local:8502/solr"}, >> "jamiesmac:8501_solr_slice1_shard1":{ >> "shard_id":"slice1", >> "state":"down", >> "core":"slice1_shard1", >> "collection":"collection1", >> "node_name":"jamiesmac:8501_solr", >> "base_url":"http://jamiesmac:8501/solr"}, >> "jamiesmac:8502_solr_slice1_shard2":{ >> "shard_id":"slice1", >> "leader":"true", >> "state":"active", >> "core":"slice1_shard2", >> "collection":"collection1", >> "node_name":"jamiesmac:8502_solr", >> "base_url":"http://jamiesmac:8502/solr"}}, >> "slice2":{ >> "JamiesMac.local:8501_solr_slice2_shard2":{ >> "shard_id":"slice2", >> "state":"active", >> "core":"slice2_shard2", >> "collection":"collection1", >> "node_name":"JamiesMac.local:8501_solr", >> "base_url":"http://JamiesMac.local:8501/solr"}, >> "JamiesMac.local:8502_solr_slice2_shard1":{ >> "shard_id":"slice2", >> "state":"active", >> "core":"slice2_shard1", >> "collection":"collection1", >> "node_name":"JamiesMac.local:8502_solr", >> "base_url":"http://JamiesMac.local:8502/solr"}, >> "jamiesmac:8501_solr_slice2_shard2":{ >> "shard_id":"slice2", >> "state":"down", >> "core":"slice2_shard2", >> "collection":"collection1", >> "node_name":"jamiesmac:8501_solr", >> "base_url":"http://jamiesmac:8501/solr"}, >> "jamiesmac:8502_solr_slice2_shard1":{ >> "shard_id":"slice2", >> "leader":"true", >> "state":"active", >> "core":"slice2_shard1", >> "collection":"collection1", >> "node_name":"jamiesmac:8502_solr", >> "base_url":"http://jamiesmac:8502/solr" >> >> I then added some docs to the following shards using Solr
Re: SolrCloud Replication Question
Also, it will help if you can mention the exact version of solrcloud you are talking about in each issue - I know you have one from the old branch, and I assume a version off trunk you are playing with - so a heads up on which and if trunk, what rev or day will help in the case that I'm trying to dupe issues that have been addressed. - Mark On Feb 10, 2012, at 6:09 PM, Mark Miller wrote: > I'm trying, but so far I don't see anything. I'll have to try and mimic your > setup closer it seems. > > I tried starting up 6 solr instances on different ports as 2 shards, each > with a replication factor of 3. > > Then I indexed 20k documents to the cluster and verified doc counts. > > Then I shutdown all the replicas so that only one instance served each shard. > > Then I indexed 20k documents to the cluster. > > Then I started the downed nodes and verified that they where in a recovery > state. > > After enough time went by I checked and verified document counts on each > instance - they where as expected. > > I guess next I can try a similar experiment using multiple cores, but if you > notice anything that stands out that is largely different in what you are > doing, let me know. > > The cores that are behind, does it say they are down, recovering, or active > in zookeeper? > > On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote: > >> Sorry for pinging this again, is more information needed on this? I >> can provide more details but am not sure what to provide. >> >> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson wrote: >>> Sorry, I shut down the full solr instance. >>> >>> On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller wrote: Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: > I know that the latest Solr Cloud doesn't use standard replication but > I have a question about how it appears to be working. 
I currently > have the following cluster state > > {"collection1":{ > "slice1":{ > "JamiesMac.local:8501_solr_slice1_shard1":{ > "shard_id":"slice1", > "state":"active", > "core":"slice1_shard1", > "collection":"collection1", > "node_name":"JamiesMac.local:8501_solr", > "base_url":"http://JamiesMac.local:8501/solr"}, > "JamiesMac.local:8502_solr_slice1_shard2":{ > "shard_id":"slice1", > "state":"active", > "core":"slice1_shard2", > "collection":"collection1", > "node_name":"JamiesMac.local:8502_solr", > "base_url":"http://JamiesMac.local:8502/solr"}, > "jamiesmac:8501_solr_slice1_shard1":{ > "shard_id":"slice1", > "state":"down", > "core":"slice1_shard1", > "collection":"collection1", > "node_name":"jamiesmac:8501_solr", > "base_url":"http://jamiesmac:8501/solr"}, > "jamiesmac:8502_solr_slice1_shard2":{ > "shard_id":"slice1", > "leader":"true", > "state":"active", > "core":"slice1_shard2", > "collection":"collection1", > "node_name":"jamiesmac:8502_solr", > "base_url":"http://jamiesmac:8502/solr"}}, > "slice2":{ > "JamiesMac.local:8501_solr_slice2_shard2":{ > "shard_id":"slice2", > "state":"active", > "core":"slice2_shard2", > "collection":"collection1", > "node_name":"JamiesMac.local:8501_solr", > "base_url":"http://JamiesMac.local:8501/solr"}, > "JamiesMac.local:8502_solr_slice2_shard1":{ > "shard_id":"slice2", > "state":"active", > "core":"slice2_shard1", > "collection":"collection1", > "node_name":"JamiesMac.local:8502_solr", > "base_url":"http://JamiesMac.local:8502/solr"}, > "jamiesmac:8501_solr_slice2_shard2":{ > "shard_id":"slice2", > "state":"down", > "core":"slice2_shard2", > "collection":"collection1", > "node_name":"jamiesmac:8501_solr", > "base_url":"http://jamiesmac:8501/solr"}, > "jamiesmac:8502_solr_slice2_shard1":{ > "shard_id":"slice2", > "leader":"true", > "state":"active", > "core":"slice2_shard1", > "collection":"collection1", > "node_name":"jamiesmac:8502_solr", > "base_url":"http://jamiesmac:8502/solr" > > I then added some docs to the following shards using SolrJ > http://localhost:8502/solr/slice2_shard1 > http://localhost:8502/solr/slice1_shard2 > > I then bring back up the other cores and I don't see replication > happening. Looking at the stats for each core I see that on the 8501 > instance (the instance that was off) the number of docs is 0, so I
Re: SolrCloud Replication Question
I'm trying, but so far I don't see anything. I'll have to try and mimic your setup closer it seems. I tried starting up 6 solr instances on different ports as 2 shards, each with a replication factor of 3. Then I indexed 20k documents to the cluster and verified doc counts. Then I shutdown all the replicas so that only one instance served each shard. Then I indexed 20k documents to the cluster. Then I started the downed nodes and verified that they where in a recovery state. After enough time went by I checked and verified document counts on each instance - they where as expected. I guess next I can try a similar experiment using multiple cores, but if you notice anything that stands out that is largely different in what you are doing, let me know. The cores that are behind, does it say they are down, recovering, or active in zookeeper? On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote: > Sorry for pinging this again, is more information needed on this? I > can provide more details but am not sure what to provide. > > On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson wrote: >> Sorry, I shut down the full solr instance. >> >> On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller wrote: >>> Can you explain a little more how you doing this? How are you bringing the >>> cores down and then back up? Shutting down a full solr instance, unloading >>> the core? >>> >>> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: >>> I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {"collection1":{ "slice1":{ "JamiesMac.local:8501_solr_slice1_shard1":{ "shard_id":"slice1", "state":"active", "core":"slice1_shard1", "collection":"collection1", "node_name":"JamiesMac.local:8501_solr", "base_url":"http://JamiesMac.local:8501/solr"}, "JamiesMac.local:8502_solr_slice1_shard2":{ "shard_id":"slice1", "state":"active", "core":"slice1_shard2", "collection":"collection1", "node_name":"JamiesMac.local:8502_solr", "base_url":"http://JamiesMac.local:8502/solr"}, "jamiesmac:8501_solr_slice1_shard1":{ "shard_id":"slice1", "state":"down", "core":"slice1_shard1", "collection":"collection1", "node_name":"jamiesmac:8501_solr", "base_url":"http://jamiesmac:8501/solr"}, "jamiesmac:8502_solr_slice1_shard2":{ "shard_id":"slice1", "leader":"true", "state":"active", "core":"slice1_shard2", "collection":"collection1", "node_name":"jamiesmac:8502_solr", "base_url":"http://jamiesmac:8502/solr"}}, "slice2":{ "JamiesMac.local:8501_solr_slice2_shard2":{ "shard_id":"slice2", "state":"active", "core":"slice2_shard2", "collection":"collection1", "node_name":"JamiesMac.local:8501_solr", "base_url":"http://JamiesMac.local:8501/solr"}, "JamiesMac.local:8502_solr_slice2_shard1":{ "shard_id":"slice2", "state":"active", "core":"slice2_shard1", "collection":"collection1", "node_name":"JamiesMac.local:8502_solr", "base_url":"http://JamiesMac.local:8502/solr"}, "jamiesmac:8501_solr_slice2_shard2":{ "shard_id":"slice2", "state":"down", "core":"slice2_shard2", "collection":"collection1", "node_name":"jamiesmac:8501_solr", "base_url":"http://jamiesmac:8501/solr"}, "jamiesmac:8502_solr_slice2_shard1":{ "shard_id":"slice2", "leader":"true", "state":"active", "core":"slice2_shard1", "collection":"collection1", "node_name":"jamiesmac:8502_solr", "base_url":"http://jamiesmac:8502/solr" I then added some docs to the following shards using SolrJ http://localhost:8502/solr/slice2_shard1 http://localhost:8502/solr/slice1_shard2 I then bring back up the 
other cores and I don't see replication happening. Looking at the stats for each core I see that on the 8501 instance (the instance that was off) the number of docs is 0, so I've clearly set something up incorrectly. Any help on this would be greatly appreciated. >>> >>> - Mark Miller >>> lucidimagination.com >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> - Mark Miller lucidimagination.com
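For the per-core document counts described above, a non-distributed query against each core keeps the numbers from being merged across the cluster. A small sketch with the ports and core names used elsewhere in this thread:

  # rows=0 returns just numFound; distrib=false restricts the query to that core
  curl 'http://localhost:8501/solr/slice1_shard1/select?q=*:*&rows=0&distrib=false'
  curl 'http://localhost:8502/solr/slice1_shard2/select?q=*:*&rows=0&distrib=false'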
Re: SolrCloud Replication Question
Sorry for pinging this again, is more information needed on this? I can provide more details but am not sure what to provide. On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson wrote: > Sorry, I shut down the full solr instance. > > On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller wrote: >> Can you explain a little more how you doing this? How are you bringing the >> cores down and then back up? Shutting down a full solr instance, unloading >> the core? >> >> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: >> >>> I know that the latest Solr Cloud doesn't use standard replication but >>> I have a question about how it appears to be working. I currently >>> have the following cluster state >>> >>> {"collection1":{ >>> "slice1":{ >>> "JamiesMac.local:8501_solr_slice1_shard1":{ >>> "shard_id":"slice1", >>> "state":"active", >>> "core":"slice1_shard1", >>> "collection":"collection1", >>> "node_name":"JamiesMac.local:8501_solr", >>> "base_url":"http://JamiesMac.local:8501/solr"}, >>> "JamiesMac.local:8502_solr_slice1_shard2":{ >>> "shard_id":"slice1", >>> "state":"active", >>> "core":"slice1_shard2", >>> "collection":"collection1", >>> "node_name":"JamiesMac.local:8502_solr", >>> "base_url":"http://JamiesMac.local:8502/solr"}, >>> "jamiesmac:8501_solr_slice1_shard1":{ >>> "shard_id":"slice1", >>> "state":"down", >>> "core":"slice1_shard1", >>> "collection":"collection1", >>> "node_name":"jamiesmac:8501_solr", >>> "base_url":"http://jamiesmac:8501/solr"}, >>> "jamiesmac:8502_solr_slice1_shard2":{ >>> "shard_id":"slice1", >>> "leader":"true", >>> "state":"active", >>> "core":"slice1_shard2", >>> "collection":"collection1", >>> "node_name":"jamiesmac:8502_solr", >>> "base_url":"http://jamiesmac:8502/solr"}}, >>> "slice2":{ >>> "JamiesMac.local:8501_solr_slice2_shard2":{ >>> "shard_id":"slice2", >>> "state":"active", >>> "core":"slice2_shard2", >>> "collection":"collection1", >>> "node_name":"JamiesMac.local:8501_solr", >>> "base_url":"http://JamiesMac.local:8501/solr"}, >>> "JamiesMac.local:8502_solr_slice2_shard1":{ >>> "shard_id":"slice2", >>> "state":"active", >>> "core":"slice2_shard1", >>> "collection":"collection1", >>> "node_name":"JamiesMac.local:8502_solr", >>> "base_url":"http://JamiesMac.local:8502/solr"}, >>> "jamiesmac:8501_solr_slice2_shard2":{ >>> "shard_id":"slice2", >>> "state":"down", >>> "core":"slice2_shard2", >>> "collection":"collection1", >>> "node_name":"jamiesmac:8501_solr", >>> "base_url":"http://jamiesmac:8501/solr"}, >>> "jamiesmac:8502_solr_slice2_shard1":{ >>> "shard_id":"slice2", >>> "leader":"true", >>> "state":"active", >>> "core":"slice2_shard1", >>> "collection":"collection1", >>> "node_name":"jamiesmac:8502_solr", >>> "base_url":"http://jamiesmac:8502/solr" >>> >>> I then added some docs to the following shards using SolrJ >>> http://localhost:8502/solr/slice2_shard1 >>> http://localhost:8502/solr/slice1_shard2 >>> >>> I then bring back up the other cores and I don't see replication >>> happening. Looking at the stats for each core I see that on the 8501 >>> instance (the instance that was off) the number of docs is 0, so I've >>> clearly set something up incorrectly. Any help on this would be >>> greatly appreciated. >> >> - Mark Miller >> lucidimagination.com >> >> >> >> >> >> >> >> >> >> >>
Re: SolrCloud Replication Question
Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller wrote: > Can you explain a little more how you doing this? How are you bringing the > cores down and then back up? Shutting down a full solr instance, unloading > the core? > > On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: > >> I know that the latest Solr Cloud doesn't use standard replication but >> I have a question about how it appears to be working. I currently >> have the following cluster state >> >> {"collection1":{ >> "slice1":{ >> "JamiesMac.local:8501_solr_slice1_shard1":{ >> "shard_id":"slice1", >> "state":"active", >> "core":"slice1_shard1", >> "collection":"collection1", >> "node_name":"JamiesMac.local:8501_solr", >> "base_url":"http://JamiesMac.local:8501/solr"}, >> "JamiesMac.local:8502_solr_slice1_shard2":{ >> "shard_id":"slice1", >> "state":"active", >> "core":"slice1_shard2", >> "collection":"collection1", >> "node_name":"JamiesMac.local:8502_solr", >> "base_url":"http://JamiesMac.local:8502/solr"}, >> "jamiesmac:8501_solr_slice1_shard1":{ >> "shard_id":"slice1", >> "state":"down", >> "core":"slice1_shard1", >> "collection":"collection1", >> "node_name":"jamiesmac:8501_solr", >> "base_url":"http://jamiesmac:8501/solr"}, >> "jamiesmac:8502_solr_slice1_shard2":{ >> "shard_id":"slice1", >> "leader":"true", >> "state":"active", >> "core":"slice1_shard2", >> "collection":"collection1", >> "node_name":"jamiesmac:8502_solr", >> "base_url":"http://jamiesmac:8502/solr"}}, >> "slice2":{ >> "JamiesMac.local:8501_solr_slice2_shard2":{ >> "shard_id":"slice2", >> "state":"active", >> "core":"slice2_shard2", >> "collection":"collection1", >> "node_name":"JamiesMac.local:8501_solr", >> "base_url":"http://JamiesMac.local:8501/solr"}, >> "JamiesMac.local:8502_solr_slice2_shard1":{ >> "shard_id":"slice2", >> "state":"active", >> "core":"slice2_shard1", >> "collection":"collection1", >> "node_name":"JamiesMac.local:8502_solr", >> "base_url":"http://JamiesMac.local:8502/solr"}, >> "jamiesmac:8501_solr_slice2_shard2":{ >> "shard_id":"slice2", >> "state":"down", >> "core":"slice2_shard2", >> "collection":"collection1", >> "node_name":"jamiesmac:8501_solr", >> "base_url":"http://jamiesmac:8501/solr"}, >> "jamiesmac:8502_solr_slice2_shard1":{ >> "shard_id":"slice2", >> "leader":"true", >> "state":"active", >> "core":"slice2_shard1", >> "collection":"collection1", >> "node_name":"jamiesmac:8502_solr", >> "base_url":"http://jamiesmac:8502/solr" >> >> I then added some docs to the following shards using SolrJ >> http://localhost:8502/solr/slice2_shard1 >> http://localhost:8502/solr/slice1_shard2 >> >> I then bring back up the other cores and I don't see replication >> happening. Looking at the stats for each core I see that on the 8501 >> instance (the instance that was off) the number of docs is 0, so I've >> clearly set something up incorrectly. Any help on this would be >> greatly appreciated. > > - Mark Miller > lucidimagination.com > > > > > > > > > > >
Re: SolrCloud Replication Question
Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: > I know that the latest Solr Cloud doesn't use standard replication but > I have a question about how it appears to be working. I currently > have the following cluster state > > {"collection1":{ >"slice1":{ > "JamiesMac.local:8501_solr_slice1_shard1":{ >"shard_id":"slice1", >"state":"active", >"core":"slice1_shard1", >"collection":"collection1", >"node_name":"JamiesMac.local:8501_solr", >"base_url":"http://JamiesMac.local:8501/solr"}, > "JamiesMac.local:8502_solr_slice1_shard2":{ >"shard_id":"slice1", >"state":"active", >"core":"slice1_shard2", >"collection":"collection1", >"node_name":"JamiesMac.local:8502_solr", >"base_url":"http://JamiesMac.local:8502/solr"}, > "jamiesmac:8501_solr_slice1_shard1":{ >"shard_id":"slice1", >"state":"down", >"core":"slice1_shard1", >"collection":"collection1", >"node_name":"jamiesmac:8501_solr", >"base_url":"http://jamiesmac:8501/solr"}, > "jamiesmac:8502_solr_slice1_shard2":{ >"shard_id":"slice1", >"leader":"true", >"state":"active", >"core":"slice1_shard2", >"collection":"collection1", >"node_name":"jamiesmac:8502_solr", >"base_url":"http://jamiesmac:8502/solr"}}, >"slice2":{ > "JamiesMac.local:8501_solr_slice2_shard2":{ >"shard_id":"slice2", >"state":"active", >"core":"slice2_shard2", >"collection":"collection1", >"node_name":"JamiesMac.local:8501_solr", >"base_url":"http://JamiesMac.local:8501/solr"}, > "JamiesMac.local:8502_solr_slice2_shard1":{ >"shard_id":"slice2", >"state":"active", >"core":"slice2_shard1", >"collection":"collection1", >"node_name":"JamiesMac.local:8502_solr", >"base_url":"http://JamiesMac.local:8502/solr"}, > "jamiesmac:8501_solr_slice2_shard2":{ >"shard_id":"slice2", >"state":"down", >"core":"slice2_shard2", >"collection":"collection1", >"node_name":"jamiesmac:8501_solr", >"base_url":"http://jamiesmac:8501/solr"}, > "jamiesmac:8502_solr_slice2_shard1":{ >"shard_id":"slice2", >"leader":"true", >"state":"active", >"core":"slice2_shard1", >"collection":"collection1", >"node_name":"jamiesmac:8502_solr", >"base_url":"http://jamiesmac:8502/solr" > > I then added some docs to the following shards using SolrJ > http://localhost:8502/solr/slice2_shard1 > http://localhost:8502/solr/slice1_shard2 > > I then bring back up the other cores and I don't see replication > happening. Looking at the stats for each core I see that on the 8501 > instance (the instance that was off) the number of docs is 0, so I've > clearly set something up incorrectly. Any help on this would be > greatly appreciated. - Mark Miller lucidimagination.com
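For comparing counts by hand, the SolrJ adds described above boil down to something like the following plain HTTP requests, which can be replayed against any of the cores; the document fields here are assumptions, not the actual data from the thread:

  curl 'http://localhost:8502/solr/slice2_shard1/update?commit=true' \
       -H 'Content-Type: text/xml' \
       --data-binary '<add><doc><field name="id">doc-1</field></doc></add>'
  curl 'http://localhost:8502/solr/slice1_shard2/update?commit=true' \
       -H 'Content-Type: text/xml' \
       --data-binary '<add><doc><field name="id">doc-2</field></doc></add>'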