Re: Pause and resume indexing on SolR 4 for backups

2013-01-09 Thread Upayavira
The point was as much about how to use a backup as about how to make one
in the first place. The replication handler can handle spitting out a
backup, but there's no straightforward way to tell Solr to switch to
another set of index files instead. You'd have to do clever stuff with
the CoreAdminHandler, I reckon.
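For the record, a hypothetical sketch of that CoreAdminHandler route (the host, port, and core names here are made up, not from this thread): load the restored snapshot as a second core, then SWAP it with the live one.

```shell
# Hypothetical sketch: register a core whose data directory holds the restored
# snapshot, then swap it with the live core. Host, port, and the core names
# "live"/"restored" are assumptions, not taken from this thread.
SOLR="http://localhost:8080/solr"
CREATE_URL="$SOLR/admin/cores?action=CREATE&name=restored&instanceDir=restored"
SWAP_URL="$SOLR/admin/cores?action=SWAP&core=live&other=restored"

# curl "$CREATE_URL"   # register the core backed by the snapshot
# curl "$SWAP_URL"     # atomically switch query traffic to the restored core
```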

Upayavira

On Wed, Jan 9, 2013, at 09:27 PM, Paul Jungwirth wrote:
> Yes, I agree about making sure the backups actually work, whatever the
> approach. Thanks for your reply and all you've contributed to the
> Solr/Lucene community. The Lucene in Action book has been a huge help to
> me.
> 
> Paul
> 
> 
> On Wed, Jan 9, 2013 at 12:16 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
> 
> > Hi Paul,
> >
> > Hot backup is OK.  There was a thread on this topic yesterday and the day
> > before.  But you should always try running from backup regardless of what
> > anyone says here, because if you have to do that one day you want to
> > know you verified it :)
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> >
> >
> >
> >
> > On Wed, Jan 9, 2013 at 3:12 PM, Paul Jungwirth
> > wrote:
> >
> > > > Are you sure a commit didn't happen between?
> > > > Also, a background merge might have happened.
> > > >
> > > > As to using a backup, you are right, just stop solr,
> > > > put the snapshot into data/index, and restart.
> > >
> > >
> > > This was mentioned before but seems not to have gotten any attention:
> > can't
> > > you use the ReplicationHandler by just going to a URL like this?:
> > >
> > >
> > >
> > >
> > http://host:8080/solr/replication?command=backup&location=/home/jboss/backup
> > >
> > > The 2nd edition Lucene in Action book describes a way to take hot backups
> > > without stopping your IndexWriter (pp. 374ff), and it appears that
> > > ReplicationHandler uses a similar strategy if I'm reading the code
> > > correctly (Solr 3.6.1; I guess v4 is the same).
> > >
> > > It'd be great if someone more knowledgeable could confirm that you can
> > use
> > > the ReplicationHandler to take hot backups. I'm surprised to see such a
> > > long thread about starting/stopping index jobs when there is such an easy
> > > answer. Or am I mistaken and at risk of corrupt backups if I use it?
> > >
> > > Thanks,
> > > Paul
> > >
> > > --
> > > _
> > > Pulchritudo splendor veritatis.
> > >
> >
> 
> 
> 
> -- 
> _
> Pulchritudo splendor veritatis.


Re: Pause and resume indexing on SolR 4 for backups

2013-01-09 Thread Paul Jungwirth
Yes, I agree about making sure the backups actually work, whatever the
approach. Thanks for your reply and all you've contributed to the
Solr/Lucene community. The Lucene in Action book has been a huge help to me.

Paul


On Wed, Jan 9, 2013 at 12:16 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi Paul,
>
> Hot backup is OK.  There was a thread on this topic yesterday and the day
> before.  But you should always try running from backup regardless of what
> anyone says here, because if you have to do that one day you want to
> know you verified it :)
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Wed, Jan 9, 2013 at 3:12 PM, Paul Jungwirth
> wrote:
>
> > > Are you sure a commit didn't happen between?
> > > Also, a background merge might have happened.
> > >
> > > As to using a backup, you are right, just stop solr,
> > > put the snapshot into data/index, and restart.
> >
> >
> > This was mentioned before but seems not to have gotten any attention:
> can't
> > you use the ReplicationHandler by just going to a URL like this?:
> >
> >
> >
> >
> http://host:8080/solr/replication?command=backup&location=/home/jboss/backup
> >
> > The 2nd edition Lucene in Action book describes a way to take hot backups
> > without stopping your IndexWriter (pp. 374ff), and it appears that
> > ReplicationHandler uses a similar strategy if I'm reading the code
> > correctly (Solr 3.6.1; I guess v4 is the same).
> >
> > It'd be great if someone more knowledgeable could confirm that you can
> use
> > the ReplicationHandler to take hot backups. I'm surprised to see such a
> > long thread about starting/stopping index jobs when there is such an easy
> > answer. Or am I mistaken and at risk of corrupt backups if I use it?
> >
> > Thanks,
> > Paul
> >
> > --
> > _
> > Pulchritudo splendor veritatis.
> >
>



-- 
_
Pulchritudo splendor veritatis.


Re: Pause and resume indexing on SolR 4 for backups

2013-01-09 Thread Otis Gospodnetic
Hi Paul,

Hot backup is OK.  There was a thread on this topic yesterday and the day
before.  But you should always try running from backup regardless of what
anyone says here, because if you have to do that one day you want to
know you verified it :)

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Jan 9, 2013 at 3:12 PM, Paul Jungwirth
wrote:

> > Are you sure a commit didn't happen between?
> > Also, a background merge might have happened.
> >
> > As to using a backup, you are right, just stop solr,
> > put the snapshot into data/index, and restart.
>
>
> This was mentioned before but seems not to have gotten any attention: can't
> you use the ReplicationHandler by just going to a URL like this?:
>
>
>
> http://host:8080/solr/replication?command=backup&location=/home/jboss/backup
>
> The 2nd edition Lucene in Action book describes a way to take hot backups
> without stopping your IndexWriter (pp. 374ff), and it appears that
> ReplicationHandler uses a similar strategy if I'm reading the code
> correctly (Solr 3.6.1; I guess v4 is the same).
>
> It'd be great if someone more knowledgeable could confirm that you can use
> the ReplicationHandler to take hot backups. I'm surprised to see such a
> long thread about starting/stopping index jobs when there is such an easy
> answer. Or am I mistaken and at risk of corrupt backups if I use it?
>
> Thanks,
> Paul
>
> --
> _
> Pulchritudo splendor veritatis.
>


Re: Pause and resume indexing on SolR 4 for backups

2013-01-09 Thread Paul Jungwirth
> Are you sure a commit didn't happen between?
> Also, a background merge might have happened.
>
> As to using a backup, you are right, just stop solr,
> put the snapshot into data/index, and restart.


This was mentioned before but seems not to have gotten any attention: can't
you use the ReplicationHandler by just going to a URL like this?:


http://host:8080/solr/replication?command=backup&location=/home/jboss/backup
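In script form that might look like the following sketch (the host, port, and location are just the ones from the URL above; using command=details to check on the snapshot afterwards is my assumption about the handler's API):

```shell
# Trigger a ReplicationHandler backup, then check on it. Values are taken
# from the URL above; command=details as a status check is an assumption.
HOST="http://host:8080/solr"
BACKUP_URL="$HOST/replication?command=backup&location=/home/jboss/backup"

# curl "$BACKUP_URL"                          # kicks off an async snapshot
# curl "$HOST/replication?command=details"    # reports backup status
```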

The 2nd edition Lucene in Action book describes a way to take hot backups
without stopping your IndexWriter (pp. 374ff), and it appears that
ReplicationHandler uses a similar strategy if I'm reading the code
correctly (Solr 3.6.1; I guess v4 is the same).

It'd be great if someone more knowledgeable could confirm that you can use
the ReplicationHandler to take hot backups. I'm surprised to see such a
long thread about starting/stopping index jobs when there is such an easy
answer. Or am I mistaken and at risk of corrupt backups if I use it?

Thanks,
Paul

-- 
_
Pulchritudo splendor veritatis.


Re: Pause and resume indexing on SolR 4 for backups

2012-12-21 Thread Andy D'Arcy Jewell

On 20/12/12 20:19, alx...@aim.com wrote:

Depending on your architecture, why not index the same data into two machines? 
One will be your prod, the other your backup?
Because we're trying to keep costs and complexity low whilst in the 
development stage ;-)


But more seriously, this will obviously be a must sooner or later.

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
T:  0844 9918804
M:  07961605631
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread alxsss
Depending on your architecture, why not index the same data into two machines? 
One will be your prod, the other your backup?

Thanks.
Alex.

-Original Message-
From: Upayavira 
To: solr-user 
Sent: Thu, Dec 20, 2012 11:51 am
Subject: Re: Pause and resume indexing on SolR 4 for backups


You're saying that there's no chance to catch it in the middle of
writing the segments file?

Having said that, the segments file is pretty small, so the chance would
be pretty slim.

Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote:
> To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
> that the index is never in a bogus state. All data files are written and 
> flushed to disk, then the segments.* files are written that match the 
> data files. You can capture the files with a set of hard links to create 
> a backup.
> 
> The CheckIndex program will verify the index backup.
> java -cp yourcopy/lucene-core-SOMETHING.jar 
> org.apache.lucene.index.CheckIndex collection/data/index
> 
> lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
> Solr is unpacked.
> 
> On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:
> > Hi all.
> >
> > Can anyone advise me of a way to pause and resume SolR 4 so I can 
> > perform a backup? I need to be able to revert to a usable (though not 
> > necessarily complete) index after a crash or other "disaster" more 
> > quickly than a re-index operation would yield.
> >
> > I can't yet afford the "extravagance" of a separate SolR replica just 
> > for backups, and I'm not sure if I'll ever have the luxury. I'm 
> > currently running with just one node, but we are not yet live.
> >
> > I can think of the following ways to do this, each with various 
> > downsides:
> >
> > 1) Just backup the existing index files whilst indexing continues
> > + Easy
> > + Fast
> > - Incomplete
> > - Potential for corruption? (e.g. partial files)
> >
> > 2) Stop/Start Tomcat
> > + Easy
> > - Very slow and I/O, CPU intensive
> > - Client gets errors when trying to connect
> >
> > 3) Block/unblock SolR port with IpTables
> > + Fast
> > - Client gets errors when trying to connect
> > - Have to wait for existing transactions to complete (not sure 
> > how, maybe watch socket FD's in /proc)
> >
> > 4) Pause/Restart SolR service
> > + Fast ? (hopefully)
> > - Client gets errors when trying to connect
> >
> > In any event, the web app will have to gracefully handle 
> > unavailability of SolR, probably by displaying a "down for 
> > maintenance" message, but this should preferably be only a very short 
> > amount of time.
> >
> > Can anyone comment on my proposed solutions above, or provide any 
> > additional ones?
> >
> > Thanks for any input you can provide!
> >
> > -Andy
> >
> 

 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
You're saying that there's no chance to catch it in the middle of
writing the segments file?

Having said that, the segments file is pretty small, so the chance would
be pretty slim.

Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote:
> To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
> that the index is never in a bogus state. All data files are written and 
> flushed to disk, then the segments.* files are written that match the 
> data files. You can capture the files with a set of hard links to create 
> a backup.
> 
> The CheckIndex program will verify the index backup.
> java -cp yourcopy/lucene-core-SOMETHING.jar 
> org.apache.lucene.index.CheckIndex collection/data/index
> 
> lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
> Solr is unpacked.
> 
> On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:
> > Hi all.
> >
> > Can anyone advise me of a way to pause and resume SolR 4 so I can 
> > perform a backup? I need to be able to revert to a usable (though not 
> > necessarily complete) index after a crash or other "disaster" more 
> > quickly than a re-index operation would yield.
> >
> > I can't yet afford the "extravagance" of a separate SolR replica just 
> > for backups, and I'm not sure if I'll ever have the luxury. I'm 
> > currently running with just one node, but we are not yet live.
> >
> > I can think of the following ways to do this, each with various 
> > downsides:
> >
> > 1) Just backup the existing index files whilst indexing continues
> > + Easy
> > + Fast
> > - Incomplete
> > - Potential for corruption? (e.g. partial files)
> >
> > 2) Stop/Start Tomcat
> > + Easy
> > - Very slow and I/O, CPU intensive
> > - Client gets errors when trying to connect
> >
> > 3) Block/unblock SolR port with IpTables
> > + Fast
> > - Client gets errors when trying to connect
> > - Have to wait for existing transactions to complete (not sure 
> > how, maybe watch socket FD's in /proc)
> >
> > 4) Pause/Restart SolR service
> > + Fast ? (hopefully)
> > - Client gets errors when trying to connect
> >
> > In any event, the web app will have to gracefully handle 
> > unavailability of SolR, probably by displaying a "down for 
> > maintenance" message, but this should preferably be only a very short 
> > amount of time.
> >
> > Can anyone comment on my proposed solutions above, or provide any 
> > additional ones?
> >
> > Thanks for any input you can provide!
> >
> > -Andy
> >
> 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Lance Norskog
To be clear: 1) is fine. Lucene index updates are carefully sequenced so 
that the index is never in a bogus state. All data files are written and 
flushed to disk, then the segments.* files are written that match the 
data files. You can capture the files with a set of hard links to create 
a backup.


The CheckIndex program will verify the index backup.
java -cp yourcopy/lucene-core-SOMETHING.jar 
org.apache.lucene.index.CheckIndex collection/data/index


lucene-core-SOMETHING.jar is usually in the solr-webapp directory where 
Solr is unpacked.
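Putting that together, a hedged sketch of verifying a snapshot (the snapshot path is the one from elsewhere in this thread; the jar path is a placeholder for wherever Solr is unpacked):

```shell
# Sketch of verifying an index snapshot with CheckIndex, as described above.
# SNAPSHOT and JAR are placeholders; adjust them for your install.
SNAPSHOT=/tmp/snapshot.20121220155853703
JAR=solr-webapp/webapp/WEB-INF/lib/lucene-core-4.0.0.jar

has_segments() {
  # A usable snapshot must contain at least one segments_N file.
  ls "$1"/segments_* >/dev/null 2>&1
}

if has_segments "$SNAPSHOT"; then
  java -cp "$JAR" org.apache.lucene.index.CheckIndex "$SNAPSHOT"
fi
```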


On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote:

Hi all.

Can anyone advise me of a way to pause and resume SolR 4 so I can 
perform a backup? I need to be able to revert to a usable (though not 
necessarily complete) index after a crash or other "disaster" more 
quickly than a re-index operation would yield.


I can't yet afford the "extravagance" of a separate SolR replica just 
for backups, and I'm not sure if I'll ever have the luxury. I'm 
currently running with just one node, but we are not yet live.


I can think of the following ways to do this, each with various 
downsides:


1) Just backup the existing index files whilst indexing continues
+ Easy
+ Fast
- Incomplete
- Potential for corruption? (e.g. partial files)

2) Stop/Start Tomcat
+ Easy
- Very slow and I/O, CPU intensive
- Client gets errors when trying to connect

3) Block/unblock SolR port with IpTables
+ Fast
- Client gets errors when trying to connect
- Have to wait for existing transactions to complete (not sure 
how, maybe watch socket FD's in /proc)


4) Pause/Restart SolR service
+ Fast ? (hopefully)
- Client gets errors when trying to connect

In any event, the web app will have to gracefully handle 
unavailability of SolR, probably by displaying a "down for 
maintenance" message, but this should preferably be only a very short 
amount of time.


Can anyone comment on my proposed solutions above, or provide any 
additional ones?


Thanks for any input you can provide!

-Andy





Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
Are you sure a commit didn't happen between? Also, a background merge
might have happened.

As to using a backup, you are right, just stop solr, put the snapshot
into data/index, and restart.

Upayavira

On Thu, Dec 20, 2012, at 05:16 PM, Andy D'Arcy Jewell wrote:
> On 20/12/12 13:38, Upayavira wrote:
> > The backup directory should just be a clone of the index files. I'm
> > curious to know whether it is a cp -r or a cp -lr that the replication
> > handler produces.
> >
> > You would prevent commits by telling your app not to commit. That is,
> > Solr only commits when it is *told* to.
> >
> > Unless you use autocommit, in which case I guess you could monitor your
> > logs for the last commit, and do your backup 10 seconds after that.
> >
> >
> Hmm. Strange - the files created by the backup API don't seem to 
> correlate exactly with the files stored under the solr data directory:
> 
> andydj@me-solr01:~$ find /tmp/snapshot.20121220155853703/
> /tmp/snapshot.20121220155853703/
> /tmp/snapshot.20121220155853703/_2vq.fdx
> /tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tim
> /tmp/snapshot.20121220155853703/segments_2vs
> /tmp/snapshot.20121220155853703/_2vq_nrm.cfs
> /tmp/snapshot.20121220155853703/_2vq.fnm
> /tmp/snapshot.20121220155853703/_2vq_nrm.cfe
> /tmp/snapshot.20121220155853703/_2vq_Lucene40_0.frq
> /tmp/snapshot.20121220155853703/_2vq.fdt
> /tmp/snapshot.20121220155853703/_2vq.si
> /tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tip
> andydj@me-solr01:~$ find /var/lib/solr/data/index/
> /var/lib/solr/data/index/
> /var/lib/solr/data/index/_2w6_Lucene40_0.frq
> /var/lib/solr/data/index/_2w6.si
> /var/lib/solr/data/index/segments_2w8
> /var/lib/solr/data/index/write.lock
> /var/lib/solr/data/index/_2w6_nrm.cfs
> /var/lib/solr/data/index/_2w6.fdx
> /var/lib/solr/data/index/_2w6_Lucene40_0.tip
> /var/lib/solr/data/index/_2w6_nrm.cfe
> /var/lib/solr/data/index/segments.gen
> /var/lib/solr/data/index/_2w6.fnm
> /var/lib/solr/data/index/_2w6.fdt
> /var/lib/solr/data/index/_2w6_Lucene40_0.tim
> 
> Am I correct in thinking that to restore from this backup, I would need 
> to do the following?
> 
> 1. Stop Tomcat (or maybe just solr)
> 2. Remove all files under /var/lib/solr/data/index/
> 3. Move/copy files from /tmp/snapshot.20121220155853703/ to 
> /var/lib/solr/data/index/
> 4. Restart Tomcat (or just solr)
> 
> 
> Thanks everyone who's pitched in on this! Once I've got this working, 
> I'll document it.
> -Andy
> 
> -- 
> Andy D'Arcy Jewell
> 
> SysMicro Limited
> Linux Support
> E:  andy.jew...@sysmicro.co.uk
> W:  www.sysmicro.co.uk
> 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 13:38, Upayavira wrote:

The backup directory should just be a clone of the index files. I'm
curious to know whether it is a cp -r or a cp -lr that the replication
handler produces.

You would prevent commits by telling your app not to commit. That is,
Solr only commits when it is *told* to.

Unless you use autocommit, in which case I guess you could monitor your
logs for the last commit, and do your backup 10 seconds after that.


Hmm. Strange - the files created by the backup API don't seem to 
correlate exactly with the files stored under the solr data directory:


andydj@me-solr01:~$ find /tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/_2vq.fdx
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tim
/tmp/snapshot.20121220155853703/segments_2vs
/tmp/snapshot.20121220155853703/_2vq_nrm.cfs
/tmp/snapshot.20121220155853703/_2vq.fnm
/tmp/snapshot.20121220155853703/_2vq_nrm.cfe
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.frq
/tmp/snapshot.20121220155853703/_2vq.fdt
/tmp/snapshot.20121220155853703/_2vq.si
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tip
andydj@me-solr01:~$ find /var/lib/solr/data/index/
/var/lib/solr/data/index/
/var/lib/solr/data/index/_2w6_Lucene40_0.frq
/var/lib/solr/data/index/_2w6.si
/var/lib/solr/data/index/segments_2w8
/var/lib/solr/data/index/write.lock
/var/lib/solr/data/index/_2w6_nrm.cfs
/var/lib/solr/data/index/_2w6.fdx
/var/lib/solr/data/index/_2w6_Lucene40_0.tip
/var/lib/solr/data/index/_2w6_nrm.cfe
/var/lib/solr/data/index/segments.gen
/var/lib/solr/data/index/_2w6.fnm
/var/lib/solr/data/index/_2w6.fdt
/var/lib/solr/data/index/_2w6_Lucene40_0.tim

Am I correct in thinking that to restore from this backup, I would need 
to do the following?


1. Stop Tomcat (or maybe just solr)
2. Remove all files under /var/lib/solr/data/index/
3. Move/copy files from /tmp/snapshot.20121220155853703/ to 
/var/lib/solr/data/index/

4. Restart Tomcat (or just solr)
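Those four steps, as a rough shell sketch (whether to stop all of Tomcat or just Solr, and the exact service commands, are left as comments to verify; paths are the ones from this thread):

```shell
# Sketch of the four restore steps above. The Tomcat stop/start commands are
# comments because the service name varies by install.
restore_snapshot() {
  src="$1"; dst="$2"
  # service tomcat6 stop      # step 1: stop Tomcat (or just Solr)
  rm -rf "$dst"               # step 2: clear the live index directory
  mkdir -p "$dst"
  cp -p "$src"/* "$dst"/      # step 3: copy the snapshot files in
  # service tomcat6 start     # step 4: restart
}

# e.g. restore_snapshot /tmp/snapshot.20121220155853703 /var/lib/solr/data/index
```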


Thanks everyone who's pitched in on this! Once I've got this working, 
I'll document it.

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
That's neat, but wouldn't that run on every commit? How would you use it
to, say, back up once a day?

Upayavira

On Thu, Dec 20, 2012, at 01:57 PM, Markus Jelsma wrote:
> You can use the postCommit event in updateHandler to execute a task. 
>  
> -Original message-
> > From:Upayavira 
> > Sent: Thu 20-Dec-2012 14:45
> > To: solr-user@lucene.apache.org
> > Subject: Re: Pause and resume indexing on SolR 4 for backups
> > 
> > The backup directory should just be a clone of the index files. I'm
> > curious to know whether it is a cp -r or a cp -lr that the replication
> > handler produces.
> > 
> > You would prevent commits by telling your app not to commit. That is,
> > Solr only commits when it is *told* to.
> > 
> > Unless you use autocommit, in which case I guess you could monitor your
> > logs for the last commit, and do your backup 10 seconds after that.
> > 
> > Upayavira
> > 
> > On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
> > > On 20/12/12 11:58, Upayavira wrote:
> > > > I've never used it, but the replication handler has an option:
> > > >
> > > >http://master_host:port/solr/replication?command=backup
> > > >
> > > > Which will take a backup for you.
> > > I've looked at that this morning as suggested by Markus Jelsma. Looks 
> > > good, but I'll have to work out how to use the resultant backup 
> > > directory. I've been dealing with another unrelated issue in the 
> > > mean-time and I haven't had a chance to look for any docu so far.
> > > > Also something to note, if you don't want to use the above, and you are
> > > > running on Unix, you can create fast 'hard link' clones of lucene
> > > > indexes. Doing:
> > > >
> > > > cp -lr data data.bak
> > > >
> > > > will copy your index instantly. If you can avoid doing this when a
> > > > commit is happening, then you'll have a good index copy, that will take
> > > > no space on your disk and be made instantly. This is because it just
> > > > copies the directory structure, not the files themselves, and given
> > > > files in a lucene index never change (they are only ever deleted or
> > > > replaced), this works as a good copy technique for backing up.
> > > That's the approach that Shawn Heisey proposed, and what I've been 
> > > working towards,  but it still leaves open the question of how to 
> > > *pause* SolR or prevent commits during the backup (otherwise we have a 
> > > potential race condition).
> > > 
> > > -Andy
> > > 
> > > 
> > > -- 
> > > Andy D'Arcy Jewell
> > > 
> > > SysMicro Limited
> > > Linux Support
> > > E:  andy.jew...@sysmicro.co.uk
> > > W:  www.sysmicro.co.uk
> > > 
> > 


RE: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Markus Jelsma
You can use the postCommit event in updateHandler to execute a task. 
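For example, roughly as in the old example solrconfig.xml (a sketch only; the snapshooter script and directory are placeholders for whatever backup command you want to run):

```xml
<!-- Sketch: run an executable after every commit. The exe/dir values are
     placeholders; point them at your own backup script. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">solr/bin</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>
```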
 
-Original message-
> From:Upayavira 
> Sent: Thu 20-Dec-2012 14:45
> To: solr-user@lucene.apache.org
> Subject: Re: Pause and resume indexing on SolR 4 for backups
> 
> The backup directory should just be a clone of the index files. I'm
> curious to know whether it is a cp -r or a cp -lr that the replication
> handler produces.
> 
> You would prevent commits by telling your app not to commit. That is,
> Solr only commits when it is *told* to.
> 
> Unless you use autocommit, in which case I guess you could monitor your
> logs for the last commit, and do your backup 10 seconds after that.
> 
> Upayavira
> 
> On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
> > On 20/12/12 11:58, Upayavira wrote:
> > > I've never used it, but the replication handler has an option:
> > >
> > >http://master_host:port/solr/replication?command=backup
> > >
> > > Which will take a backup for you.
> > I've looked at that this morning as suggested by Markus Jelsma. Looks 
> > good, but I'll have to work out how to use the resultant backup 
> > directory. I've been dealing with another unrelated issue in the 
> > mean-time and I haven't had a chance to look for any docu so far.
> > > Also something to note, if you don't want to use the above, and you are
> > > running on Unix, you can create fast 'hard link' clones of lucene
> > > indexes. Doing:
> > >
> > > cp -lr data data.bak
> > >
> > > will copy your index instantly. If you can avoid doing this when a
> > > commit is happening, then you'll have a good index copy, that will take
> > > no space on your disk and be made instantly. This is because it just
> > > copies the directory structure, not the files themselves, and given
> > > files in a lucene index never change (they are only ever deleted or
> > > replaced), this works as a good copy technique for backing up.
> > That's the approach that Shawn Heisey proposed, and what I've been 
> > working towards,  but it still leaves open the question of how to 
> > *pause* SolR or prevent commits during the backup (otherwise we have a 
> > potential race condition).
> > 
> > -Andy
> > 
> > 
> > -- 
> > Andy D'Arcy Jewell
> > 
> > SysMicro Limited
> > Linux Support
> > E:  andy.jew...@sysmicro.co.uk
> > W:  www.sysmicro.co.uk
> > 
> 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
The backup directory should just be a clone of the index files. I'm
curious to know whether it is a cp -r or a cp -lr that the replication
handler produces.

You would prevent commits by telling your app not to commit. That is,
Solr only commits when it is *told* to.

Unless you use autocommit, in which case I guess you could monitor your
logs for the last commit, and do your backup 10 seconds after that.

Upayavira

On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote:
> On 20/12/12 11:58, Upayavira wrote:
> > I've never used it, but the replication handler has an option:
> >
> >http://master_host:port/solr/replication?command=backup
> >
> > Which will take a backup for you.
> I've looked at that this morning as suggested by Markus Jelsma. Looks 
> good, but I'll have to work out how to use the resultant backup 
> directory. I've been dealing with another unrelated issue in the 
> mean-time and I haven't had a chance to look for any docu so far.
> > Also something to note, if you don't want to use the above, and you are
> > running on Unix, you can create fast 'hard link' clones of lucene
> > indexes. Doing:
> >
> > cp -lr data data.bak
> >
> > will copy your index instantly. If you can avoid doing this when a
> > commit is happening, then you'll have a good index copy, that will take
> > no space on your disk and be made instantly. This is because it just
> > copies the directory structure, not the files themselves, and given
> > files in a lucene index never change (they are only ever deleted or
> > replaced), this works as a good copy technique for backing up.
> That's the approach that Shawn Heisey proposed, and what I've been 
> working towards,  but it still leaves open the question of how to 
> *pause* SolR or prevent commits during the backup (otherwise we have a 
> potential race condition).
> 
> -Andy
> 
> 
> -- 
> Andy D'Arcy Jewell
> 
> SysMicro Limited
> Linux Support
> E:  andy.jew...@sysmicro.co.uk
> W:  www.sysmicro.co.uk
> 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 11:58, Upayavira wrote:

I've never used it, but the replication handler has an option:

   http://master_host:port/solr/replication?command=backup

Which will take a backup for you.
I've looked at that this morning as suggested by Markus Jelsma. Looks 
good, but I'll have to work out how to use the resultant backup 
directory. I've been dealing with another unrelated issue in the 
mean-time and I haven't had a chance to look for any docu so far.

Also something to note, if you don't want to use the above, and you are
running on Unix, you can create fast 'hard link' clones of lucene
indexes. Doing:

cp -lr data data.bak

will copy your index instantly. If you can avoid doing this when a
commit is happening, then you'll have a good index copy, that will take
no space on your disk and be made instantly. This is because it just
copies the directory structure, not the files themselves, and given
files in a lucene index never change (they are only ever deleted or
replaced), this works as a good copy technique for backing up.
That's the approach that Shawn Heisey proposed, and what I've been 
working towards,  but it still leaves open the question of how to 
*pause* SolR or prevent commits during the backup (otherwise we have a 
potential race condition).


-Andy


--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Upayavira
I've never used it, but the replication handler has an option:

  http://master_host:port/solr/replication?command=backup 

Which will take a backup for you.

Also something to note, if you don't want to use the above, and you are
running on Unix, you can create fast 'hard link' clones of lucene
indexes. Doing:

cp -lr data data.bak

will copy your index instantly. If you can avoid doing this when a
commit is happening, then you'll have a good index copy, that will take
no space on your disk and be made instantly. This is because it just
copies the directory structure, not the files themselves, and given
files in a lucene index never change (they are only ever deleted or
replaced), this works as a good copy technique for backing up.
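As a small sketch of that (directory names are illustrative):

```shell
# Hard-link clone of an index directory, as described above. Instant and
# space-free because file contents are never duplicated; safe because Lucene
# index files are never modified in place, only added or deleted.
snapshot_index() {
  cp -lr "$1" "$2"
}

# e.g. snapshot_index /var/lib/solr/data /var/lib/solr/data.bak
```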

Upayavira

On Thu, Dec 20, 2012, at 10:34 AM, Markus Jelsma wrote:
> You can use the replication handler to fetch a complete snapshot of the
> index over HTTP.
> http://wiki.apache.org/solr/SolrReplication#HTTP_API
>  
>  
> -Original message-
> > From:Andy D'Arcy Jewell 
> > Sent: Thu 20-Dec-2012 11:23
> > To: solr-user@lucene.apache.org
> > Subject: Pause and resume indexing on SolR 4 for backups
> > 
> > Hi all.
> > 
> > Can anyone advise me of a way to pause and resume SolR 4 so I can 
> > perform a backup? I need to be able to revert to a usable (though not 
> > necessarily complete) index after a crash or other "disaster" more 
> > quickly than a re-index operation would yield.
> > 
> > I can't yet afford the "extravagance" of a separate SolR replica just 
> > for backups, and I'm not sure if I'll ever have the luxury. I'm 
> > currently running with just one node, but we are not yet live.
> > 
> > I can think of the following ways to do this, each with various downsides:
> > 
> > 1) Just backup the existing index files whilst indexing continues
> >  + Easy
> >  + Fast
> >  - Incomplete
> >  - Potential for corruption? (e.g. partial files)
> > 
> > 2) Stop/Start Tomcat
> >  + Easy
> >  - Very slow and I/O, CPU intensive
> >  - Client gets errors when trying to connect
> > 
> > 3) Block/unblock SolR port with IpTables
> >  + Fast
> >  - Client gets errors when trying to connect
> >  - Have to wait for existing transactions to complete (not sure how, 
> > maybe watch socket FD's in /proc)
> > 
> > 4) Pause/Restart SolR service
> >  + Fast ? (hopefully)
> >  - Client gets errors when trying to connect
> > 
> > In any event, the web app will have to gracefully handle unavailability 
> > of SolR, probably by displaying a "down for maintenance" message, but 
> > this should preferably be only a very short amount of time.
> > 
> > Can anyone comment on my proposed solutions above, or provide any 
> > additional ones?
> > 
> > Thanks for any input you can provide!
> > 
> > -Andy
> > 
> > -- 
> > Andy D'Arcy Jewell
> > 
> > SysMicro Limited
> > Linux Support
> > E:  andy.jew...@sysmicro.co.uk
> > W:  www.sysmicro.co.uk
> > 
> > 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Gora Mohanty
On 20 December 2012 16:14, Andy D'Arcy Jewell
 wrote:
[...]
> It's attached to a web-app, which accepts uploads and will be available
> 24/7, with a global audience, so "pausing" it may be rather difficult (though
> I may put this to the developer - it may for instance be possible if he has a
> small number of choke points for input into SolR).
[...]

It adds work for the web developer, but one could pause indexing, put
indexing requests into some kind of queuing system, do the backup, and
flush the queue when the backup is done.
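A minimal sketch of that idea (the function names and the `send_to_solr` callback are hypothetical; a real setup would use whatever Solr client the web app already has):

```python
import queue
import threading

index_queue = queue.Queue()            # parks documents while a backup runs
backup_in_progress = threading.Event()  # set by the backup script, cleared after

def submit(doc, send_to_solr):
    """Index immediately, or queue the document if a backup is in progress."""
    if backup_in_progress.is_set():
        index_queue.put(doc)
    else:
        send_to_solr(doc)

def flush(send_to_solr):
    """After the backup: drain everything that was queued in the meantime."""
    while not index_queue.empty():
        send_to_solr(index_queue.get())
```

The web app's "choke points" mentioned above would call `submit()` instead of talking to Solr directly, and the backup script would set the event, run the backup, clear it, and call `flush()`.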

Regards,
Gora


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

On 20/12/12 10:24, Gora Mohanty wrote:
> Unless I am missing something, the index is only being written to
> when you are adding/updating the index. So, the question is how
> is this being done in your case, and could you pause indexing for
> the duration of the backup?
>
> Regards,
> Gora
It's attached to a web app, which accepts uploads and will be available 
24/7 to a global audience, so "pausing" it may be rather difficult 
(though I may put this to the developer; it may, for instance, be 
possible if he has a small number of choke points for input into Solr).


Thanks.

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
T:  0844 9918804
M:  07961605631
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



RE: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Markus Jelsma
You can use the replication handler to fetch a complete snapshot of the index 
over HTTP.
http://wiki.apache.org/solr/SolrReplication#HTTP_API
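For example, triggering a backup is a single HTTP GET against the replication handler. A minimal sketch (the host, port, and backup location are placeholders, not values from this thread):

```python
from urllib.parse import urlencode

def backup_url(host="localhost", port=8080, location="/home/solr/backups"):
    """Build a ReplicationHandler backup URL; host/port/location are examples."""
    params = urlencode({"command": "backup", "location": location})
    return f"http://{host}:{port}/solr/replication?{params}"

# Against a live Solr, the backup would be triggered with e.g.:
#   urllib.request.urlopen(backup_url())
print(backup_url())
```

Because the handler snapshots the committed index itself, indexing does not need to be paused while this runs.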
 
 
-Original message-
> From: Andy D'Arcy Jewell
> Sent: Thu 20-Dec-2012 11:23
> To: solr-user@lucene.apache.org
> Subject: Pause and resume indexing on SolR 4 for backups
> 
> Hi all.
> 
> Can anyone advise me of a way to pause and resume SolR 4 so I can 
> perform a backup? I need to be able to revert to a usable (though not 
> necessarily complete) index after a crash or other "disaster" more 
> quickly than a re-index operation would yield.
> 
> I can't yet afford the "extravagance" of a separate SolR replica just 
> for backups, and I'm not sure if I'll ever have the luxury. I'm 
> currently running with just one node, but we are not yet live.
> 
> I can think of the following ways to do this, each with various downsides:
> 
> 1) Just backup the existing index files whilst indexing continues
>  + Easy
>  + Fast
>  - Incomplete
>  - Potential for corruption? (e.g. partial files)
> 
> 2) Stop/Start Tomcat
>  + Easy
>  - Very slow and I/O, CPU intensive
>  - Client gets errors when trying to connect
> 
> 3) Block/unblock SolR port with IpTables
>  + Fast
>  - Client gets errors when trying to connect
>  - Have to wait for existing transactions to complete (not sure how, 
> maybe watch socket FD's in /proc)
> 
> 4) Pause/Restart SolR service
>  + Fast ? (hopefully)
>  - Client gets errors when trying to connect
> 
> In any event, the web app will have to gracefully handle unavailability 
> of SolR, probably by displaying a "down for maintenance" message, but 
> this should preferably be only a very short amount of time.
> 
> Can anyone comment on my proposed solutions above, or provide any 
> additional ones?
> 
> Thanks for any input you can provide!
> 
> -Andy
> 
> -- 
> Andy D'Arcy Jewell
> 
> SysMicro Limited
> Linux Support
> E:  andy.jew...@sysmicro.co.uk
> W:  www.sysmicro.co.uk
> 
> 


Re: Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Gora Mohanty
On 20 December 2012 15:46, Andy D'Arcy Jewell wrote:
> Hi all.
>
> Can anyone advise me of a way to pause and resume SolR 4 so I can perform a
> backup? I need to be able to revert to a usable (though not necessarily
> complete) index after a crash or other "disaster" more quickly than a
> re-index operation would yield.
[...]

Unless I am missing something, the index is only being written to
when you are adding/updating the index. So, the question is how
is this being done in your case, and could you pause indexing for
the duration of the backup?

Regards,
Gora


Pause and resume indexing on SolR 4 for backups

2012-12-20 Thread Andy D'Arcy Jewell

Hi all.

Can anyone advise me of a way to pause and resume Solr 4 so I can 
perform a backup? I need to be able to revert to a usable (though not 
necessarily complete) index after a crash or other "disaster" more 
quickly than a full re-index would allow.


I can't yet afford the "extravagance" of a separate Solr replica just 
for backups, and I'm not sure I'll ever have the luxury. I'm currently 
running with just one node, but we are not yet live.


I can think of the following ways to do this, each with various downsides:

1) Just back up the existing index files whilst indexing continues
+ Easy
+ Fast
- Incomplete
- Potential for corruption? (e.g. partial files)

2) Stop/start Tomcat
+ Easy
- Very slow, and I/O- and CPU-intensive
- Clients get errors when trying to connect

3) Block/unblock the Solr port with iptables
+ Fast
- Clients get errors when trying to connect
- Have to wait for existing transactions to complete (not sure how; 
maybe watch socket FDs in /proc)


4) Pause/restart the Solr service
+ Fast? (hopefully)
- Clients get errors when trying to connect
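For what it's worth, a variant of option 1 can shrink the window for partial files: hard-link the index files into a snapshot directory first (near-instant, no extra disk space until segments are rewritten), then back up the snapshot at leisure. The paths here are examples, and this still isn't safe against a commit or merge in flight:

```python
import os

def snapshot_index(index_dir, snapshot_dir):
    """Hard-link every file in index_dir into snapshot_dir.
    Both directories must be on the same filesystem."""
    os.makedirs(snapshot_dir, exist_ok=True)
    for name in os.listdir(index_dir):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))

# Example (paths are illustrative):
# snapshot_index("/var/solr/data/index", "/var/backups/solr-20121220")
```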

In any event, the web app will have to handle unavailability of Solr 
gracefully, probably by displaying a "down for maintenance" message, but 
this should preferably last only a very short time.


Can anyone comment on my proposed solutions above, or provide any 
additional ones?


Thanks for any input you can provide!

-Andy

--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk