Re: Pause and resume indexing on SolR 4 for backups
The point was as much about how to use a backup as about how to make one
in the first place. The replication handler can handle spitting out a
backup, but there's no straightforward way to tell Solr to switch to
another set of index files instead. You'd have to do clever stuff with
the CoreAdminHandler, I reckon.

Upayavira

On Wed, Jan 9, 2013, at 09:27 PM, Paul Jungwirth wrote:
> Yes, I agree about making sure the backups actually work, whatever the
> approach. Thanks for your reply and all you've contributed to the
> Solr/Lucene community. The Lucene in Action book has been a huge help
> to me.
>
> Paul
> [...]
Re: Pause and resume indexing on SolR 4 for backups
Yes, I agree about making sure the backups actually work, whatever the
approach. Thanks for your reply and all you've contributed to the
Solr/Lucene community. The Lucene in Action book has been a huge help to
me.

Paul

On Wed, Jan 9, 2013 at 12:16 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> Hi Paul,
>
> Hot backup is OK. There was a thread on this topic yesterday and the
> day before. But you should always try running from backup regardless
> of what anyone says here, because if you have to do that one day you
> want to know you verified it :)
>
> Otis
> [...]
Re: Pause and resume indexing on SolR 4 for backups
Hi Paul,

Hot backup is OK. There was a thread on this topic yesterday and the day
before. But you should always try running from backup regardless of what
anyone says here, because if you have to do that one day you want to
know you verified it :)

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Jan 9, 2013 at 3:12 PM, Paul Jungwirth wrote:
> This was mentioned before but seems not to have gotten any attention:
> can't you use the ReplicationHandler by just going to a URL like this?
>
> http://host:8080/solr/replication?command=backup&location=/home/jboss/backup
> [...]
Re: Pause and resume indexing on SolR 4 for backups
> Are you sure a commit didn't happen between?
> Also, a background merge might have happened.
>
> As to using a backup, you are right, just stop solr,
> put the snapshot into index/data, and restart.

This was mentioned before but seems not to have gotten any attention:
can't you use the ReplicationHandler by just going to a URL like this?

http://host:8080/solr/replication?command=backup&location=/home/jboss/backup

The 2nd edition Lucene in Action book describes a way to take hot
backups without stopping your IndexWriter (pp. 374ff), and it appears
that ReplicationHandler uses a similar strategy, if I'm reading the code
correctly (Solr 3.6.1; I guess v4 is the same).

It'd be great if someone more knowledgeable could confirm that you can
use the ReplicationHandler to take hot backups. I'm surprised to see
such a long thread about starting/stopping index jobs when there is such
an easy answer. Or am I mistaken, and at risk of corrupt backups if I
use it?

Thanks,
Paul

--
_
Pulchritudo splendor veritatis.
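For anyone wanting to script this, a minimal sketch of driving the ReplicationHandler backup from the shell might look like the following. The host, port, and backup location are the illustrative ones from the URL above; adjust them for your deployment, and note the actual `curl` calls are left commented out so the sketch is side-effect free:

```shell
#!/bin/sh
# Sketch: compose and (optionally) trigger a ReplicationHandler backup.
# SOLR_BASE and BACKUP_DIR are illustrative values, not prescriptions.
SOLR_BASE="${SOLR_BASE:-http://localhost:8080/solr}"
BACKUP_DIR="${BACKUP_DIR:-/home/jboss/backup}"

# Compose the backup URL in the same form as the message above.
backup_url() {
  printf '%s/replication?command=backup&location=%s\n' "$SOLR_BASE" "$BACKUP_DIR"
}

backup_url

# Against a live Solr you would then run:
#   curl -s "$(backup_url)"
# and poll progress with:
#   curl -s "$SOLR_BASE/replication?command=details"
```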
Re: Pause and resume indexing on SolR 4 for backups
On 20/12/12 20:19, alx...@aim.com wrote:
> Depending on your architecture, why not index the same data into two
> machines? One will be your prod, another your backup?

Because we're trying to keep costs and complexity low whilst in the
development stage ;-) But more seriously, this will obviously be a must
sooner or later.

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
T: 0844 9918804
M: 07961605631
E: andy.jew...@sysmicro.co.uk
W: www.sysmicro.co.uk
Re: Pause and resume indexing on SolR 4 for backups
Depending on your architecture, why not index the same data into two
machines? One will be your prod, another your backup?

Thanks.
Alex.

-----Original Message-----
From: Upayavira
To: solr-user
Sent: Thu, Dec 20, 2012 11:51 am
Subject: Re: Pause and resume indexing on SolR 4 for backups

You're saying that there's no chance to catch it in the middle of
writing the segments file? Having said that, the segments file is pretty
small, so the chance would be pretty slim.

Upayavira
[...]
Re: Pause and resume indexing on SolR 4 for backups
You're saying that there's no chance to catch it in the middle of
writing the segments file? Having said that, the segments file is pretty
small, so the chance would be pretty slim.

Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote:
> To be clear: 1) is fine. Lucene index updates are carefully sequenced
> so that the index is never in a bogus state. All data files are
> written and flushed to disk, then the segments.* files are written
> that match the data files. You can capture the files with a set of
> hard links to create a backup.
> [...]
Re: Pause and resume indexing on SolR 4 for backups
To be clear: 1) is fine. Lucene index updates are carefully sequenced so
that the index is never in a bogus state. All data files are written and
flushed to disk, then the segments.* files are written that match the
data files. You can capture the files with a set of hard links to create
a backup.

The CheckIndex program will verify the index backup:

  java -cp yourcopy/lucene-core-SOMETHING.jar \
      org.apache.lucene.index.CheckIndex collection/data/index

lucene-core-SOMETHING.jar is usually in the solr-webapp directory where
Solr is unpacked.

On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:
> Can anyone advise me of a way to pause and resume SolR 4 so I can
> perform a backup? I need to be able to revert to a usable (though not
> necessarily complete) index after a crash or other "disaster" more
> quickly than a re-index operation would yield.
>
> 1) Just backup the existing index files whilst indexing continues
>    + Easy
>    + Fast
>    - Incomplete
>    - Potential for corruption? (e.g. partial files)
> [...]
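The hard-link technique Lance describes can be wrapped in a small helper. A sketch, under the assumption of a GNU userland (`cp -l`); the helper name is made up, and the CheckIndex jar path varies by install:

```shell
#!/bin/sh
# Sketch of the hard-link backup described above. snapshot_index is an
# invented helper name, not part of Solr or Lucene.

# Create an instant, space-free snapshot of a Lucene index directory.
# cp -l creates hard links rather than copying file contents, which is
# safe because Lucene never rewrites an existing segment file in place.
snapshot_index() {
  src="$1"; dest="$2"
  cp -lr "$src" "$dest"
}

# Example usage (illustrative paths):
#   snapshot_index /var/lib/solr/data/index /backups/snapshot.$(date +%s)
# Then verify the snapshot before trusting it:
#   java -cp /path/to/lucene-core-SOMETHING.jar \
#       org.apache.lucene.index.CheckIndex /backups/snapshot.XXXX
```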
Re: Pause and resume indexing on SolR 4 for backups
Are you sure a commit didn't happen in between? Also, a background merge
might have happened.

As to using a backup, you are right: just stop solr, put the snapshot
into index/data, and restart.

Upayavira

On Thu, Dec 20, 2012, at 05:16 PM, Andy D'Arcy Jewell wrote:
> Hmm. Strange - the files created by the backup API don't seem to
> correlate exactly with the files stored under the solr data directory:
> [...]
>
> Am I correct in thinking that to restore from this backup, I would
> need to do the following?
>
> 1. Stop Tomcat (or maybe just solr)
> 2. Remove all files under /var/lib/solr/data/index/
> 3. Move/copy files from /tmp/snapshot.20121220155853703/ to
>    /var/lib/solr/data/index/
> 4. Restart Tomcat (or just solr)
> [...]
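Those restore steps can be sketched as a script. The helper name is invented, the paths are the examples from the message, and stopping/starting Tomcat (steps 1 and 4) is left to whatever init system is in use:

```shell
#!/bin/sh
# Sketch of restore steps 2 and 3 above. restore_index is a made-up
# helper; stop Solr/Tomcat before calling it and start it again after.
restore_index() {
  snap="$1"; index="$2"
  rm -rf "${index:?}"/*        # 2. remove all files under the index dir
  cp -r "$snap"/. "$index"/    # 3. copy the snapshot files into place
}

# Example usage (illustrative paths and service name):
#   service tomcat stop
#   restore_index /tmp/snapshot.20121220155853703 /var/lib/solr/data/index
#   service tomcat start
```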
Re: Pause and resume indexing on SolR 4 for backups
On 20/12/12 13:38, Upayavira wrote:
> The backup directory should just be a clone of the index files. I'm
> curious to know whether it is a cp -r or a cp -lr that the replication
> handler produces.
>
> You would prevent commits by telling your app not to commit. That is,
> Solr only commits when it is *told* to.
>
> Unless you use autocommit, in which case I guess you could monitor
> your logs for the last commit, and do your backup 10 seconds after
> that.

Hmm. Strange - the files created by the backup API don't seem to
correlate exactly with the files stored under the solr data directory:

andydj@me-solr01:~$ find /tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/_2vq.fdx
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tim
/tmp/snapshot.20121220155853703/segments_2vs
/tmp/snapshot.20121220155853703/_2vq_nrm.cfs
/tmp/snapshot.20121220155853703/_2vq.fnm
/tmp/snapshot.20121220155853703/_2vq_nrm.cfe
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.frq
/tmp/snapshot.20121220155853703/_2vq.fdt
/tmp/snapshot.20121220155853703/_2vq.si
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tip

andydj@me-solr01:~$ find /var/lib/solr/data/index/
/var/lib/solr/data/index/
/var/lib/solr/data/index/_2w6_Lucene40_0.frq
/var/lib/solr/data/index/_2w6.si
/var/lib/solr/data/index/segments_2w8
/var/lib/solr/data/index/write.lock
/var/lib/solr/data/index/_2w6_nrm.cfs
/var/lib/solr/data/index/_2w6.fdx
/var/lib/solr/data/index/_2w6_Lucene40_0.tip
/var/lib/solr/data/index/_2w6_nrm.cfe
/var/lib/solr/data/index/segments.gen
/var/lib/solr/data/index/_2w6.fnm
/var/lib/solr/data/index/_2w6.fdt
/var/lib/solr/data/index/_2w6_Lucene40_0.tim

Am I correct in thinking that to restore from this backup, I would need
to do the following?

1. Stop Tomcat (or maybe just solr)
2. Remove all files under /var/lib/solr/data/index/
3. Move/copy files from /tmp/snapshot.20121220155853703/ to
   /var/lib/solr/data/index/
4. Restart Tomcat (or just solr)

Thanks everyone who's pitched in on this! Once I've got this working,
I'll document it.

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E: andy.jew...@sysmicro.co.uk
W: www.sysmicro.co.uk
Re: Pause and resume indexing on SolR 4 for backups
That's neat, but wouldn't that run on every commit? How would you use it
to, say, back up once a day?

Upayavira

On Thu, Dec 20, 2012, at 01:57 PM, Markus Jelsma wrote:
> You can use the postCommit event in updateHandler to execute a task.
> [...]
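One answer to the once-a-day question above is to skip the commit hook entirely and have cron hit the ReplicationHandler directly. The schedule, host, port, and location below are illustrative, not from the thread:

```shell
# Hypothetical crontab entry: trigger a ReplicationHandler backup at
# 03:00 every day, rather than on every commit. Adjust host/port and
# backup location for your deployment.
# 0 3 * * *  curl -s "http://localhost:8080/solr/replication?command=backup&location=/var/backups/solr" > /dev/null
```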
RE: Pause and resume indexing on SolR 4 for backups
You can use the postCommit event in updateHandler to execute a task.

-----Original message-----
From: Upayavira
Sent: Thu 20-Dec-2012 14:45
To: solr-user@lucene.apache.org
Subject: Re: Pause and resume indexing on SolR 4 for backups

> The backup directory should just be a clone of the index files. I'm
> curious to know whether it is a cp -r or a cp -lr that the replication
> handler produces.
>
> You would prevent commits by telling your app not to commit. That is,
> Solr only commits when it is *told* to.
>
> Unless you use autocommit, in which case I guess you could monitor
> your logs for the last commit, and do your backup 10 seconds after
> that.
> [...]
Re: Pause and resume indexing on SolR 4 for backups
The backup directory should just be a clone of the index files. I'm
curious to know whether it is a cp -r or a cp -lr that the replication
handler produces.

You would prevent commits by telling your app not to commit. That is,
Solr only commits when it is *told* to.

Unless you use autocommit, in which case I guess you could monitor your
logs for the last commit, and do your backup 10 seconds after that.

Upayavira

On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
> [...]
> That's the approach that Shawn Heisey proposed, and what I've been
> working towards, but it still leaves open the question of how to
> *pause* SolR or prevent commits during the backup (otherwise we have a
> potential race condition).
Re: Pause and resume indexing on SolR 4 for backups
On 20/12/12 11:58, Upayavira wrote:
> I've never used it, but the replication handler has an option:
>
>    http://master_host:port/solr/replication?command=backup
>
> Which will take you a backup.

I've looked at that this morning as suggested by Markus Jelsma. Looks
good, but I'll have to work out how to use the resultant backup
directory. I've been dealing with another unrelated issue in the
meantime and I haven't had a chance to look for any docu so far.

> Also something to note, if you don't want to use the above, and you
> are running on Unix, you can create fast 'hard link' clones of lucene
> indexes. Doing:
>
>    cp -lr data data.bak
>
> will copy your index instantly. If you can avoid doing this when a
> commit is happening, then you'll have a good index copy, that will
> take no space on your disk and be made instantly. This is because it
> just copies the directory structure, not the files themselves, and
> given files in a lucene index never change (they are only ever deleted
> or replaced), this works as a good copy technique for backing up.

That's the approach that Shawn Heisey proposed, and what I've been
working towards, but it still leaves open the question of how to *pause*
SolR or prevent commits during the backup (otherwise we have a potential
race condition).

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E: andy.jew...@sysmicro.co.uk
W: www.sysmicro.co.uk
Re: Pause and resume indexing on SolR 4 for backups
I've never used it, but the replication handler has an option:

   http://master_host:port/solr/replication?command=backup

Which will take you a backup.

Also something to note, if you don't want to use the above, and you are
running on Unix, you can create fast 'hard link' clones of lucene
indexes. Doing:

   cp -lr data data.bak

will copy your index instantly. If you can avoid doing this when a
commit is happening, then you'll have a good index copy, that will take
no space on your disk and be made instantly. This is because it just
copies the directory structure, not the files themselves, and given
files in a lucene index never change (they are only ever deleted or
replaced), this works as a good copy technique for backing up.

Upayavira

On Thu, Dec 20, 2012, at 10:34 AM, Markus Jelsma wrote:
> You can use the replication handler to fetch a complete snapshot of
> the index over HTTP.
> http://wiki.apache.org/solr/SolrReplication#HTTP_API
> [...]
Re: Pause and resume indexing on SolR 4 for backups
On 20 December 2012 16:14, Andy D'Arcy Jewell wrote:
[...]
> It's attached to a web-app, which accepts uploads and will be
> available 24/7, with a global audience, so "pausing" it may be rather
> difficult (tho I may put this to the developer - it may for instance
> be possible if he has a small number of choke points for input into
> SolR).
[...]

It adds work for the web developer, but one could pause indexing, put
indexing requests into some kind of a queuing system, do the backup, and
flush the queue when the backup is done.

Regards,
Gora
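The queuing idea can be sketched even at the shell level: while the backup job holds a lock file, the app's indexing choke point appends requests to a spool file instead of sending them, and flushes the spool once the lock is released. Every name here (lock path, spool path, the send stub) is illustrative; a real app would POST to Solr's update handler inside send():

```shell
#!/bin/sh
# Sketch of queue-during-backup. All names and paths are made up.
LOCK="${LOCK:-/tmp/solr-backup.lock}"    # held by the backup job while it runs
SPOOL="${SPOOL:-/tmp/solr-index.spool}"  # queued indexing requests

send() {
  # Stub standing in for a real POST to Solr's update handler, e.g.
  #   curl -s --data-binary "$1" "http://localhost:8080/solr/update"
  echo "sent: $1"
}

submit() {
  if [ -e "$LOCK" ]; then
    printf '%s\n' "$1" >> "$SPOOL"   # backup in progress: queue it
  else
    send "$1"                        # no backup running: index immediately
  fi
}

flush_spool() {
  # Call once the backup job releases the lock.
  [ -f "$SPOOL" ] || return 0
  while IFS= read -r doc; do send "$doc"; done < "$SPOOL"
  : > "$SPOOL"                       # empty the spool after replaying it
}
```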
Re: Pause and resume indexing on SolR 4 for backups
On 20/12/12 10:24, Gora Mohanty wrote:
> Unless I am missing something, the index is only being written to when
> you are adding/updating the index. So, the question is how is this
> being done in your case, and could you pause indexing for the duration
> of the backup?
>
> Regards,
> Gora

It's attached to a web-app, which accepts uploads and will be available
24/7, with a global audience, so "pausing" it may be rather difficult
(tho I may put this to the developer - it may for instance be possible
if he has a small number of choke points for input into SolR).

Thanks.

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
T: 0844 9918804
M: 07961605631
E: andy.jew...@sysmicro.co.uk
W: www.sysmicro.co.uk
RE: Pause and resume indexing on SolR 4 for backups
You can use the replication handler to fetch a complete snapshot of the index over HTTP:

http://wiki.apache.org/solr/SolrReplication#HTTP_API

-Original message-
> From: Andy D'Arcy Jewell
> Sent: Thu 20-Dec-2012 11:23
> To: solr-user@lucene.apache.org
> Subject: Pause and resume indexing on SolR 4 for backups
>
> Hi all.
>
> Can anyone advise me of a way to pause and resume Solr 4 so I can
> perform a backup? I need to be able to revert to a usable (though not
> necessarily complete) index after a crash or other "disaster" more
> quickly than a re-index operation would yield.
>
> I can't yet afford the "extravagance" of a separate Solr replica just
> for backups, and I'm not sure if I'll ever have the luxury. I'm
> currently running with just one node, but we are not yet live.
>
> I can think of the following ways to do this, each with various downsides:
>
> 1) Just back up the existing index files whilst indexing continues
>    + Easy
>    + Fast
>    - Incomplete
>    - Potential for corruption? (e.g. partial files)
>
> 2) Stop/start Tomcat
>    + Easy
>    - Very slow and I/O, CPU intensive
>    - Client gets errors when trying to connect
>
> 3) Block/unblock the Solr port with iptables
>    + Fast
>    - Client gets errors when trying to connect
>    - Have to wait for existing transactions to complete (not sure how;
>      maybe watch socket FDs in /proc)
>
> 4) Pause/restart the Solr service
>    + Fast? (hopefully)
>    - Client gets errors when trying to connect
>
> In any event, the web app will have to gracefully handle unavailability
> of Solr, probably by displaying a "down for maintenance" message, but
> this should preferably be for only a very short amount of time.
>
> Can anyone comment on my proposed solutions above, or provide any
> additional ones?
>
> Thanks for any input you can provide!
>
> -Andy
>
> --
> Andy D'Arcy Jewell
>
> SysMicro Limited
> Linux Support
> E: andy.jew...@sysmicro.co.uk
> W: www.sysmicro.co.uk
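Driving the ReplicationHandler's backup command from a script amounts to a single HTTP GET. A minimal sketch follows; the host, port, core name, and backup location are example values, not anything from this thread:

```python
from urllib.parse import urlencode

def backup_url(base="http://localhost:8983/solr", core="collection1",
               location="/var/backups/solr"):
    """Build the ReplicationHandler backup URL. The base URL, core name,
    and location are placeholders; adjust for your deployment."""
    params = urlencode({"command": "backup", "location": location})
    return f"{base}/{core}/replication?{params}"

# Against a live Solr, triggering the backup is just:
#   from urllib.request import urlopen
#   urlopen(backup_url())
# Solr then writes a snapshot.* directory under the given location.
```

The appeal of this route, compared to options 2-4 in the original message, is that the node stays up and serving queries while the snapshot is taken.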
Re: Pause and resume indexing on SolR 4 for backups
On 20 December 2012 15:46, Andy D'Arcy Jewell wrote:
> Hi all.
>
> Can anyone advise me of a way to pause and resume Solr 4 so I can
> perform a backup? I need to be able to revert to a usable (though not
> necessarily complete) index after a crash or other "disaster" more
> quickly than a re-index operation would yield.
[...]

Unless I am missing something, the index is only being written to when you are adding/updating the index. So, the question is how is this being done in your case, and could you pause indexing for the duration of the backup?

Regards,
Gora
Pause and resume indexing on SolR 4 for backups
Hi all.

Can anyone advise me of a way to pause and resume Solr 4 so I can perform a backup? I need to be able to revert to a usable (though not necessarily complete) index after a crash or other "disaster" more quickly than a re-index operation would yield.

I can't yet afford the "extravagance" of a separate Solr replica just for backups, and I'm not sure if I'll ever have the luxury. I'm currently running with just one node, but we are not yet live.

I can think of the following ways to do this, each with various downsides:

1) Just back up the existing index files whilst indexing continues
   + Easy
   + Fast
   - Incomplete
   - Potential for corruption? (e.g. partial files)

2) Stop/start Tomcat
   + Easy
   - Very slow and I/O, CPU intensive
   - Client gets errors when trying to connect

3) Block/unblock the Solr port with iptables
   + Fast
   - Client gets errors when trying to connect
   - Have to wait for existing transactions to complete (not sure how;
     maybe watch socket FDs in /proc)

4) Pause/restart the Solr service
   + Fast? (hopefully)
   - Client gets errors when trying to connect

In any event, the web app will have to gracefully handle unavailability of Solr, probably by displaying a "down for maintenance" message, but this should preferably be for only a very short amount of time.

Can anyone comment on my proposed solutions above, or provide any additional ones?

Thanks for any input you can provide!

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E: andy.jew...@sysmicro.co.uk
W: www.sysmicro.co.uk
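The "gracefully handle unavailability" point above could be sketched roughly as follows: try the query, and fall back to a maintenance notice if Solr is unreachable. The URL, timeout, and message are assumptions, not anything specified in this thread:

```python
import urllib.error
import urllib.request

def search_or_maintenance(query_url):
    """Return Solr's raw response body, or a maintenance notice if Solr
    is unreachable (e.g. stopped or firewalled during a backup).
    query_url is an example value; the real app would build it from the
    user's search terms."""
    try:
        with urllib.request.urlopen(query_url, timeout=2) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, OSError):
        # Connection refused / timed out: Solr is down for the backup window.
        return "Search is down for maintenance; please try again shortly."
```

With a wrapper like this, options 2-4 above degrade to a friendly message rather than raw connection errors reaching the user.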