Re: Solr Architecture discussion
: B- A backup of the current index would be created
: C- Re-Indexing will happen on Master-core2
: D- When Indexing is done, we'll trigger a swap between Master-core1 and
: core2
...
: But how can B,C, and D. I'll do it manually. Wait! I'm not sure my boss will
: pay for that.
: 1/Can I leverage on some solr mechanisms (that is, by configuration only) in
: order to reach that goal?
: I haven't found how to do it!

your best bet is some external scheduler -- depending on how your build process works, you can fairly easily integrate it into external publishing tools.

: 2/ Is there any issue while replicating master "swapped" index files? I've
: seen in the literature that there might be some issues.

As long as the "new" version of the index is truly "newer" than the old version, there shouldn't be any problem.

Frankly though: i'm not sure you need core swapping on the master either -- it depends largely on how much "churn" will happen each time you do one of these full rebuilds. you could just as easily do incremental reindexing on your master, with occasional commits (or even autocommits) and your slaves picking up those new segments -- either gradually, or all at once when you do a monolithic commit.

if you're ok with the slaves pulling over the *entire* index after you do the core swap, then you should be fine with the slaves pulling over the *entire* index (or maybe just most of it) after a rebuild directly to the existing core.

all you really need to do explicitly on the master is trigger a backup just before you rebuild the world, and if (and only if) something goes terribly wrong, then restore from your backup.

-Hoss
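A minimal sketch of that last suggestion (back up, rebuild in place, restore only on failure), written as something an external scheduler could call. The replication handler's `command=backup` request is a standard Solr HTTP call; the base URL, the core name, and the `rebuild` and `restore` callables are hypothetical placeholders, since the thread does not say how the rebuild or a restore would actually be performed.

```python
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url: str, core: str, command: str) -> str:
    """Build a replication handler URL, e.g. .../core1/replication?command=backup."""
    return f"{base_url}/{core}/replication?{urlencode({'command': command})}"


def http_get(url: str) -> str:
    """Send a GET request and return the response body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def rebuild_with_backup(
    backup: Callable[[], None],
    rebuild: Callable[[], None],
    restore: Callable[[], None],
) -> bool:
    """Back up first, then rebuild; restore if (and only if) the rebuild fails."""
    backup()
    try:
        rebuild()
    except Exception:
        restore()  # hypothetical: how to restore depends on your setup
        raise
    return True


# Example wiring (assumed host and core name; not executed here):
# rebuild_with_backup(
#     backup=lambda: http_get(
#         replication_url("http://localhost:8983/solr", "core1", "backup")),
#     rebuild=my_full_reindex,      # hypothetical function
#     restore=my_restore_procedure, # hypothetical function
# )
```

A cron job or build tool would call `rebuild_with_backup` once per full rebuild; the slaves need no extra coordination because they keep polling the same core.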
Re: Solr Architecture discussion
Thinking twice about this architecture ... I'm concerned about the way I'm going to automate the following steps:

A- The slaves would regularly poll Master-core1 for changes
B- A backup of the current index would be created
C- Re-Indexing will happen on Master-core2
D- When Indexing is done, we'll trigger a swap between Master-core1 and core2
E- Slaves will then poll and pick up the freshly updated index segments
F- and so on!

This seems simple when it's done manually. But I cannot just sit there and press a button to send the events.

To reach that goal, I realized that one solution would be to have 2 cores on the master side, while the slaves would only have one core (as previously discussed). We'll just need to configure the slave polling period (A, E), and send the right HTTP requests (B, C, D).

Well ok, step A is automated "natively". Easy enough, using the internal Solr capabilities. But what about B, C, and D? I'll do it manually. Wait! I'm not sure my boss will pay for that.

All right, so I imagine that I should implement a process that will automate the phases that I would otherwise do manually. This would be an external process not based on Solr mechanisms.

My questions are:

1/ Can I leverage some Solr mechanisms (that is, by configuration only) in order to reach that goal? I haven't found how to do it!

2/ Is there any issue while replicating master "swapped" index files? I've seen in the literature that there might be some issues.

3/ If a Solr configuration based solution does not exist, my first attempt would be to write a shell based process that regularly triggers the events and waits for the end of each phase by polling the current phase status before triggering the next one. Does that sound good to you? Or is there a better and more elegant way to do the trick when indexing and replication have to run at a high pace?

Thank you.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p860942.html Sent from the Solr - User mailing list archive at Nabble.com.
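The external process described in question 3 could look roughly like the sketch below: trigger B, start C, poll until C has finished, then trigger D. The CoreAdmin `action=SWAP` request is a standard Solr HTTP call; everything else (the callables, the poll interval, the timeout, the core names) is a hypothetical placeholder, because how "indexing is done" gets detected depends on how the indexing itself is run.

```python
import time
from typing import Callable
from urllib.parse import urlencode


def swap_url(base_url: str, core: str, other: str) -> str:
    """Build the CoreAdmin URL that swaps two cores."""
    query = urlencode({"action": "SWAP", "core": core, "other": other})
    return f"{base_url}/admin/cores?{query}"


def wait_until(
    is_done: Callable[[], bool],
    poll_seconds: float,
    timeout_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_done() until it returns True; give up after timeout_seconds."""
    deadline = time.monotonic() + timeout_seconds
    while not is_done():
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
    return True


def run_cycle(
    trigger_backup: Callable[[], None],   # step B
    start_reindex: Callable[[], None],    # step C
    reindex_done: Callable[[], bool],     # hypothetical status check for C
    trigger_swap: Callable[[], None],     # step D
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run B, C, D in order; skip the swap if re-indexing never finishes."""
    trigger_backup()
    start_reindex()
    if not wait_until(reindex_done, poll_seconds, timeout_seconds, sleep=sleep):
        return False
    trigger_swap()
    return True
```

Steps A and E stay out of the script on purpose: the slaves' polling interval already covers them, so the script only needs to drive the master.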
Re: Solr Architecture discussion
Hi Chris,

Thanks for your insights. I totally understand your point about steps 4 and 5. I wanted to control the moment when the swap would happen on the slave side, but as you say there is no use for that. It only adds complexity on top of what the internal Solr mechanisms already provide.

For the replication aspect, I re-read the whole documentation, and with the light you shed on that topic, I realize that the only problem here is the huge amount of data that can be passed over the wire, depending on which segments the indexing updates. As you say, optimizing can have a devastating effect on the replication phase since, if I understood you correctly, it could potentially rewrite all the index segments.

OK! So if I rephrase it, the best strategy in my case is to limit the optimization phases in order to prioritize replication performance, and to optimize only when replication speed is less critical, so that search performance does not degrade.

Thank you very much. That helps a lot.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p860767.html
Re: Solr Architecture discussion
: 4- trigger swap between core 1 and core2
: 5- At this point Slave index has been renewed ... we can revert back to the
: previous index if there was any issues with the new one.

these steps are largely unnecessary -- within a single SolrCore Solr already keeps track of the "current" searcher (which is serving requests) and one or more "onDeck" searchers which know about "newer" versions of the index -- it can manage warming up caches for the "onDeck" searcher for you (and can do it more effectively than you can because it can do so based on the contents of the "old" caches from the "current" searcher).

While there may be some value in being able to "revert" on the slaves, this is typically just as easy by configuring it to keep snapshots around for a while and, if you have a problem, manually restoring from one of the older snapshots -- this will tide you over until you fix whatever the problem is on the master, possibly by restoring from a backup, and then start replication again.

: 2 / My first concern is about the size of the index that would need to be
: replicated. We need to perform indexing all day long (every 5min) and
: replicate as soon as the index is built.
: As far as I know, replication copies over all the index files. I think that
: there can not be delta replication (only replicating what changed). That's
: my assumption.
: But, is there any way to make a delta replication if that make any sense?

replication does *not* copy over all of the index files. replication "syncs" all of the index files, but only new data is sent over the wire -- in cases of segment merges (or full optimizations) this can result in fluctuations and "higher than typical" amounts of data getting sent over the wire even when only a few docs have been added/deleted, but this is easy to manage. (don't optimize during times when replication speed is critical, and use conservative merge settings)

-Hoss
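To illustrate why a merge or optimize inflates replication traffic, here is a deliberately simplified model of a file-level sync. It is not Solr's actual implementation: it assumes a slave fetches any file whose name or size differs from what it already holds, and the segment file names and sizes are invented for the example.

```python
from typing import Dict


def files_to_fetch(master: Dict[str, int], slave: Dict[str, int]) -> Dict[str, int]:
    """Return the master files (name -> size) that the slave does not already hold."""
    return {
        name: size
        for name, size in master.items()
        if slave.get(name) != size
    }


# Invented example: the slave holds two segments.
slave_index = {"_0.cfs": 500, "_1.cfs": 300}

# Incremental commit on the master: one small new segment appears.
master_after_commit = {"_0.cfs": 500, "_1.cfs": 300, "_2.cfs": 10}

# Optimize on the master: every segment is rewritten into one new file.
master_after_optimize = {"_3.cfs": 810}

incremental = files_to_fetch(master_after_commit, slave_index)
optimized = files_to_fetch(master_after_optimize, slave_index)

print(sum(incremental.values()))  # 10
print(sum(optimized.values()))    # 810
```

In this model the same handful of added documents costs 10 units after a plain commit but 810 after an optimize, which is the fluctuation described above.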
Re: Solr Architecture discussion
Do you have any insights that could help me and other people who might be interested in this discussion?

Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p828658.html