Re: Solr Architecture discussion

2010-06-14 Thread Chris Hostetter

: B- A backup of the current index would be created
: C- Re-Indexing will happen on Master-core2 
: D- When Indexing is done, we'll trigger a swap between Master-core1 and
: core2
...
: But what about B, C, and D? I could do them manually. Wait! I'm not sure my
: boss will pay for that.

: 1/ Can I leverage some Solr mechanisms (that is, by configuration only) in
: order to reach that goal?
: I haven't found how to do it!

your best bet is some external scheduler -- depending on how your build 
process works, you can fairly easily integrate it into external 
publishing tools.

: 2/ Is there any issue while replicating master "swapped" index files? I've
: seen in the literature that there might be some issues.

As long as the "new" version of the index is truly "newer" than the old 
version, there shouldn't be any problem.

Frankly though: i'm not sure you need core swapping on the master either 
-- it depends largely on how much "churn" will happen each time you do one 
of these full rebuilds.  you could just as easily do incremental 
reindexing on your master, with occasional commits (or even autocommits) 
and your slaves picking up those new segments -- either gradually, or all 
at once when you do a monolithic commit. 

if you're ok with the slaves pulling over the *entire* index after you do 
the core swap, then you should be fine with the slaves pulling over the 
*entire* index (or maybe just most of it) after a rebuild directly to the 
existing core.

all you really need to do explicitly on the master is trigger a backup 
just before you rebuild the world, and if (and only if) something goes 
terribly wrong, then restore from your backup.
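
The backup-then-rebuild flow described above could be sketched roughly as follows. This is only an illustration in Python, not a tested deployment script: `replication_url` assumes the ReplicationHandler is mounted at `/replication` under each core and uses its `command=backup` parameter, while the base URL, the core name, and the three callables passed to `rebuild_with_backup` are hypothetical placeholders. In particular, `restore` stands in for whatever manual procedure you use to put a snapshot back.

```python
from urllib.parse import urlencode


def replication_url(base_url, core, command):
    """Build a ReplicationHandler URL, e.g. .../core1/replication?command=backup.

    Assumes a multi-core layout where each core exposes /replication.
    """
    return f"{base_url}/{core}/replication?{urlencode({'command': command})}"


def rebuild_with_backup(take_backup, rebuild, restore):
    """Back up first, rebuild in place, and restore only if the rebuild fails.

    All three arguments are caller-supplied callables (hypothetical here);
    returns True when the rebuild succeeded, False when a restore was needed.
    """
    take_backup()
    try:
        rebuild()
    except Exception:
        # Something went terribly wrong: fall back to the backup.
        restore()
        return False
    return True
```

An external scheduler (cron, or a step in a publishing tool) would call `rebuild_with_backup` once per full rebuild; `take_backup` would typically issue an HTTP GET against the URL that `replication_url(..., "backup")` returns.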




-Hoss



Re: Solr Architecture discussion

2010-06-01 Thread rabahb

Thinking twice about this architecture ...

I'm concerned about the way I'm going to automate the following steps:

A- The slaves would regularly poll Master-core1 for changes
B- A backup of the current index would be created
C- Re-Indexing will happen on Master-core2 
D- When Indexing is done, we'll trigger a swap between Master-core1 and
core2
E- Slaves will then poll and pickup the freshly updated index segments
F- and so on!

This seems simple when it's done manually. But I cannot just sit there and
push a button to send the events. To reach that goal, I realized that one
solution would be to have 2 cores on the master side, while the slaves would
only have one core (as previously discussed). We'll just need to configure
the slave polling period (A,E), and send the right HTTP requests (B,C,D).

Well ok, step A is automated "natively". Easy enough, using the internal
solr capabilities.
But what about B, C, and D? I could do them manually. Wait! I'm not sure my
boss will pay for that.

All right, so I imagine that I should implement a process that will automate
the phases that I would otherwise do manually. This would be an external
process, not based on Solr mechanisms.

My questions are:

1/ Can I leverage some Solr mechanisms (that is, by configuration only) in
order to reach that goal?
I haven't found how to do it!

2/ Is there any issue while replicating master "swapped" index files? I've
seen in the literature that there might be some issues.

3/ If a Solr configuration-based solution does not exist, my first attempt
would be to write a shell-based process that regularly triggers the events
and waits for the end of each phase, by polling the current phase status,
before triggering the next one. Does that sound good to you? Or is there a
better and more elegant way to do the trick when indexing and replication
must run at a high pace?
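
To make the idea in question 3 concrete, here is a minimal sketch of such an external process, written in Python rather than shell for readability. Everything in it is an assumption for illustration: the phase names, the `trigger` and `is_done` callables (which would wrap the HTTP requests for steps B, C, and D and the corresponding status checks), and the polling interval and timeout values.

```python
import time


def wait_until(is_done, poll_seconds=5.0, timeout_seconds=3600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True; return False if the timeout passes."""
    deadline = clock() + timeout_seconds
    while not is_done():
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
    return True


def run_phases(phases, **wait_kwargs):
    """Run phases in order, each given as a (name, trigger, is_done) tuple.

    A phase is triggered, then polled until it reports completion. The run
    stops at the first phase that times out, so later phases (for example
    the core swap) never fire after a failed reindex.
    Returns the names of the phases that completed.
    """
    completed = []
    for name, trigger, is_done in phases:
        trigger()
        if not wait_until(is_done, **wait_kwargs):
            break
        completed.append(name)
    return completed
```

A scheduler would then call something like `run_phases([("backup", ...), ("reindex", ...), ("swap", ...)])` on each cycle. The swap trigger would presumably be a CoreAdmin request with `action=SWAP`, but the exact URLs depend on how the master is deployed.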

Thank you.
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p860942.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Architecture discussion

2010-06-01 Thread rabahb

Hi Chris,

Thanks for your insights. I totally understand your point about steps 4 and
5. I wanted to control the moment when the swap would happen on the slave
side but, as you say, there is no use for that. It only adds complexity for
something that internal Solr mechanisms already provide.

For the replication aspect, I re-read the whole documentation and with the
light you shed on that topic, I realize that the only problem here is the
huge amount of data that can be passed over the wire depending on the
segments that the indexing will update. As you say, optimizing can have a
devastating effect on the replication phase as, if I have a good
understanding of what you said, this could potentially update all the index
segments. 

OK! So if I rephrase it, the best strategy in my case is to limit the
optimization phases in order to prioritize replication performance, and to
run optimization only when replication activity is not so crucial, in order
to avoid degrading search performance.

Thank you very much. That helps a lot.






-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p860767.html


Re: Solr Architecture discussion

2010-05-26 Thread Chris Hostetter

: 4- trigger swap between core 1 and core2
: 5- At this point Slave index has been renewed ... we can revert back to the
: previous index if there was any issues with the new one.

these steps are largely unnecessary -- within a single SolrCore Solr 
already keeps track of the "current" searcher (which is serving requests) 
and one or more "onDeck" searchers which know about "newer" versions of 
the index -- it can manage warming up caches for the "onDeck" searcher for 
you (and can do it more effectively than you can because it can do so 
based on the contents of the "old" caches from the "current" searcher).

While there may be some value in being able to "revert" on the slaves, 
this is typically just as easy to achieve by configuring it to keep 
snapshots around for a while and, if you have a problem, manually restoring 
from one of the older snapshots -- this will tide you over until you fix 
whatever the problem is on the master, possibly by restoring from a backup, 
and then start replication again.

: 2 / My first concern is about the size of the index that would need to be
: replicated. We need to perform indexing all day long (every 5min) and
: replicate as soon as the index is built.
: As far as I know, replication copies over all the index files. I think that
: there can not be delta replication (only replicating what changed). That's
: my assumption. 
: But, is there any way to make a delta replication if that makes any sense?

replication does *not* copy over all of the index files.  replication 
"syncs" all of the index files, but only new data is sent over the wire -- in 
cases of segment merges (or full optimizations) this can result in 
fluctuations and "higher than typical" amounts of data getting sent over 
the wire even when only a few docs have been added/deleted, but this 
is easy to manage.  (don't optimize during times when replication speed is 
critical, and use conservative merge settings)
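
To illustrate why a merge or optimize sends so much more data than a small add, here is a toy model in Python. It is a simplification for illustration only, not Solr's actual replication code: it treats an index as a mapping of file name to size and assumes the slave fetches any file it lacks or holds at a different size. The file names and sizes are made up.

```python
def files_to_fetch(master_files, slave_files):
    """Return the master's index files that the slave lacks or holds at a different size."""
    return sorted(
        name for name, size in master_files.items()
        if slave_files.get(name) != size
    )


def bytes_to_fetch(master_files, slave_files):
    """Total size of the files the slave would need to pull over the wire."""
    return sum(master_files[name] for name in files_to_fetch(master_files, slave_files))


# Hypothetical index states (file name -> size in arbitrary units).
slave = {"_1.cfs": 100, "_2.cfs": 50, "segments_3": 1}
after_small_add = {"_1.cfs": 100, "_2.cfs": 50, "_3.cfs": 5, "segments_4": 1}
after_optimize = {"_4.cfs": 150, "segments_5": 1}

print(files_to_fetch(after_small_add, slave))  # ['_3.cfs', 'segments_4']
print(bytes_to_fetch(after_small_add, slave))  # 6
print(bytes_to_fetch(after_optimize, slave))   # 151
```

In this model the small add transfers only the new segment, whereas the optimize rewrites everything into one new segment and so the whole index crosses the wire.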



-Hoss



Re: Solr Architecture discussion

2010-05-19 Thread rabahb

Do you have any insights that could help me and other people that might be
interested in that discussion?
Thanks.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p828658.html