Re: Create incremental snapshot

2009-07-16 Thread Chris Hostetter

: Thanks for the reply Asif. We have already tried removing the optimization
: step. Unfortunately the commit command alone is also causing identical
: behaviour. Is there anything else that we are missing?

the hardlinking behavior of snapshots is based on the files in the index 
directory, and the files in the index directory are based on the current 
segments of your index -- so if you make enough changes to your index to 
cause all of the segments to change, every snapshot will be different 
(and will share no files with the one before it).

optimizing guarantees you that every segment will be different (because all 
the old segments are gone, and a new segment is created) but if your merge 
settings are set to be really aggressive, then it's equally possible that 
some number of delete/add calls will also cause every segment to be 
replaced.
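
A rough way to check how much space a set of snapshots really shares is to count each inode only once. The sketch below is not from the thread: `apparent_and_actual_bytes`, `demo`, and the file names are made up for illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def apparent_and_actual_bytes(snapshot_dirs):
    """Return (apparent, actual) byte totals for a list of snapshot directories.

    apparent: every directory entry is counted at its full size.
    actual:   each (device, inode) pair is counted once, so files shared
              between snapshots through hard links are not double counted.
    """
    apparent = 0
    actual = 0
    seen = set()
    for directory in snapshot_dirs:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                st = os.stat(os.path.join(root, name))
                apparent += st.st_size
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    actual += st.st_size
    return apparent, actual


def demo():
    """Two snapshots of a toy index in which one of two files is replaced."""
    with tempfile.TemporaryDirectory() as tmp:
        index = os.path.join(tmp, "index")
        os.mkdir(index)
        for name in ("_a.fdt", "_b.fdt"):  # hypothetical segment files
            with open(os.path.join(index, name), "wb") as f:
                f.write(b"x" * 1000)
        snapshots = []
        for i in range(2):
            snap = os.path.join(tmp, "snapshot.%d" % i)
            os.mkdir(snap)
            for name in os.listdir(index):
                os.link(os.path.join(index, name), os.path.join(snap, name))
            snapshots.append(snap)
            if i == 0:
                # Replace one file with a brand-new one, as a merge would.
                os.remove(os.path.join(index, "_b.fdt"))
                with open(os.path.join(index, "_b.fdt"), "wb") as f:
                    f.write(b"y" * 1000)
        return apparent_and_actual_bytes(snapshots)
```

If the two totals come out equal for real snapshot directories, no files are shared between them, which is the situation described above where every segment was replaced between snapshots.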

without your configs, and directory listings of subsequent snapshots, it's 
hard to guess what the problem might be (if you already stopped optimizing 
on every batch)

But I think we have an XY problem here...

:  This process continues for around 160,000 documents i.e. 800 times and by
:  the end of it we have 800 snapshots.

Why do you keep 800 snapshots?

you really only need snapshots around long enough to ensure that a slave 
isn't snappulling one while you are in the middle of deleting it ... unless 
you have some really funky usecase where you want some of your query boxes 
to deliberately fetch old versions of the index, you don't really need more 
than a couple of snapshots at one time.

it can be prudent to keep more snapshots than you need around in case 
of logical index corruption (ie: someone foolishly deletes a bunch of 
docs they shouldn't have) because snapshots are *usually* more disk 
space efficient than full backup copies -- but if you are finding that 
that's not the case, why bother keeping them?
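
As a sketch of the "keep only a couple" idea (not part of the original message): the helper below picks which snapshot directories to remove while keeping the newest few. The function name and the `keep` parameter are hypothetical; the only thing taken from the thread is the `snapshot.<14-digit timestamp>` naming. The snapcleaner script mentioned elsewhere in this thread is the tool meant for the real job.

```python
import re

# Directory names such as snapshot.20090418064010 (14-digit timestamp).
SNAPSHOT_NAME = re.compile(r"^snapshot\.(\d{14})$")


def snapshots_to_delete(names, keep=2):
    """Return the snapshot names to remove, oldest first, keeping the newest `keep`.

    Names that do not look like snapshot directories are ignored. Because the
    timestamp has a fixed width, plain string sorting is chronological.
    """
    stamped = sorted(n for n in names if SNAPSHOT_NAME.match(n))
    if keep <= 0:
        return stamped
    return stamped[:-keep]
```

The function only returns names; it deliberately does not delete anything, so the result can be inspected before anything is removed.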


-Hoss


Re: Create incremental snapshot

2009-07-12 Thread tushar kapoor

Thanks for the reply Asif. We have already tried removing the optimization
step. Unfortunately the commit command alone is also causing identical
behaviour. Is there anything else that we are missing?


:-((
-- 
View this message in context: 
http://www.nabble.com/Create-incremental-snapshot-tp23109877p24447593.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Create incremental snapshot

2009-07-10 Thread Asif Rahman
Tushar:

Is it necessary to do the optimize on each iteration?  When you run an
optimize, the entire index is rewritten.  Thus each index file can have at
most one hard link and each snapshot will consume the full amount of space
on your disk.

Asif


-- 
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com


Re: Create incremental snapshot

2009-07-09 Thread tushar kapoor

What I gather from this discussion is -

1. Snapshots are always hard links and not actual files so they cannot
possibly consume the same amount of space.
2. Snapshots contain hard links to existing docs + delta docs.

We are facing a situation wherein each snapshot occupies the same space as
the actual index, thus violating the first point.
We have a batch processing scheme for refreshing indexes. The steps we
follow are -

1. Delete 200 documents in one go.
2. Do an optimize.
3. Create the 200 documents deleted earlier.
4. Do a commit.

This process continues for around 160,000 documents i.e. 800 times and by
the end of it we have 800 snapshots.

The size of the actual index is 200 MB and, remarkably, all 800 snapshots
are around 200 MB each. In effect this process consumes around 160
GB of space on our disks. This is causing a lot of pain right now.
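
The figures above can be checked with a few lines of arithmetic. This is only a sketch (the function name is made up), and it models the worst case, in which no file is shared between any two snapshots.

```python
def worst_case_snapshot_mb(total_docs, docs_per_batch, index_size_mb):
    """Return (number of batches, total MB) if every snapshot is a full copy."""
    batches = total_docs // docs_per_batch
    return batches, batches * index_size_mb


# 160,000 documents in batches of 200, with a 200 MB index:
batches, total_mb = worst_case_snapshot_mb(160000, 200, 200)
total_gb = total_mb / 1000  # decimal gigabytes
```

With these inputs there are 800 batches and 160,000 MB, or 160 GB, which matches the reported disk usage and suggests that the snapshots share essentially nothing with each other.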

My concerns are: Is our understanding of the snapshooter correct? Should
this massive space consumption be happening at all? Are we missing
something critical?

Regards,
Tushar.

-- 
View this message in context: 
http://www.nabble.com/Create-incremental-snapshot-tp23109877p24405434.html
Sent from the Solr - User mailing list archive at Nabble.com.



Create incremental snapshot

2009-04-18 Thread Koushik Mitra
Hi,

We want to create snapshots incrementally.

What we want is that every time the snapshooter script runs, it should not create a 
snapshot with pre-existing (last snapshot indexes) + delta (newly created 
indexes), but rather just create a snapshot with the delta (newly created indexes).

Any references here would be highly appreciated.

Regards,
Koushik

 CAUTION - Disclaimer *
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely 
for the use of the addressee(s). If you are not the intended recipient, please 
notify the sender by e-mail and delete the original message. Further, you are 
not 
to copy, disclose, or distribute this e-mail or its contents to any other 
person and 
any such actions are unlawful. This e-mail may contain viruses. Infosys has 
taken 
every reasonable precaution to minimize this risk, but is not liable for any 
damage 
you may sustain as a result of any virus in this e-mail. You should carry out 
your 
own virus checks before opening the e-mail or attachment. Infosys reserves the 
right to monitor and review the content of all messages sent to or from this 
e-mail 
address. Messages sent to or from this e-mail address may be stored on the 
Infosys e-mail system.
***INFOSYS End of Disclaimer INFOSYS***


Re: Create incremental snapshot

2009-04-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
the snapshooter does not really copy any files. They are just hardlinks
(which do not consume additional disk space), so even a full copy is not very
expensive
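
To make the idea concrete, here is a minimal Python sketch of a hard-link snapshot. It is not the snapshooter's implementation (the real tool is a shell script); `hardlink_snapshot`, `demo`, and the file contents are made up for illustration.

```python
import os
import tempfile


def hardlink_snapshot(index_dir, snapshot_dir):
    """Create snapshot_dir and hard-link every regular file of index_dir into it.

    No file data is copied: each new directory entry points at the same
    inode as the corresponding file in index_dir.
    """
    os.mkdir(snapshot_dir)
    for name in sorted(os.listdir(index_dir)):
        source = os.path.join(index_dir, name)
        if os.path.isfile(source):
            os.link(source, os.path.join(snapshot_dir, name))
    return snapshot_dir


def demo():
    """Snapshot a one-file toy index; return (same underlying file?, link count)."""
    with tempfile.TemporaryDirectory() as tmp:
        index = os.path.join(tmp, "index")
        os.mkdir(index)
        with open(os.path.join(index, "_i.fdt"), "wb") as f:
            f.write(b"stored fields")
        snap = hardlink_snapshot(index, os.path.join(tmp, "snapshot.20090418064010"))
        same = os.path.samefile(os.path.join(index, "_i.fdt"),
                                os.path.join(snap, "_i.fdt"))
        return same, os.stat(os.path.join(snap, "_i.fdt")).st_nlink
```

The snapshot costs only a directory and its entries; the space for the file contents is paid once, for as long as any name still points at it.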


-- 
--Noble Paul


Re: Create incremental snapshot

2009-04-18 Thread Koushik Mitra
When we run the snapshooter script, it creates a snapshot folder e.g. 
snapshot.20090418064010 and this snapshot folder contains physical index files 
which take space on the file system (as shown below). Are we missing anything 
here?

-rw-r-  46 test  test         59 Apr 17 23:26 _i.tii
-rw-r-  46 test  test        507 Apr 17 23:26 _i.prx
-rw-r-  46 test  test         14 Apr 17 23:26 _i.nrm
-rw-r-  46 test  test        333 Apr 17 23:26 _i.frq
-rw-r-  46 test  test        135 Apr 17 23:26 _i.fnm
-rw-r-  46 test  test         12 Apr 17 23:26 _i.fdx
-rw-r-  46 test  test       1433 Apr 17 23:26 _i.fdt

Regards,
Koushik





Re: Create incremental snapshot

2009-04-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
yeah, that is right, but those are hardlinks

http://linux.about.com/cs/linux101/g/hardlinks.htm
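
In the `ls -l` listing posted earlier in this thread, the second column (46 on every line) is the hard-link count: the number of directory entries that point at the same file data. The sketch below reads the same number through `os.stat`; the helper names and the toy files are assumptions made for the example.

```python
import os
import tempfile


def link_counts(directory):
    """Map each regular file name in `directory` to its hard-link count."""
    counts = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            counts[name] = os.stat(path).st_nlink
    return counts


def demo():
    """One index file linked into two snapshot directories: three names in total."""
    with tempfile.TemporaryDirectory() as tmp:
        index = os.path.join(tmp, "index")
        os.mkdir(index)
        with open(os.path.join(index, "_i.fdt"), "wb") as f:
            f.write(b"stored fields")
        for i in range(2):
            snap = os.path.join(tmp, "snapshot.%d" % i)
            os.mkdir(snap)
            os.link(os.path.join(index, "_i.fdt"), os.path.join(snap, "_i.fdt"))
        return link_counts(index)
```

A count above 1 means the file is listed in more than one place but stored only once; a count of 1 on every file in a snapshot would mean that the snapshot shares nothing with any other directory.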



-- 
--Noble Paul


Re: Create incremental snapshot

2009-04-18 Thread Koushik Mitra
Ok

If these are hard links, then where does the index data get stored? Those must 
be getting stored somewhere in the file system.

Regards,
Koushik



Re: Create incremental snapshot

2009-04-18 Thread Shalin Shekhar Mangar
On Sat, Apr 18, 2009 at 1:06 PM, Koushik Mitra koushik_mi...@infosys.com wrote:

 Ok

 If these are hard links, then where does the index data get stored? Those
 must be getting stored somewhere in the file system.


Yes, of course they are stored on disk. The hard links are created from the
actual files inside the index directory. When those older files are deleted
by Solr, they are still left on the disk if at least one hard link to that
file exists. If you are looking for how to clean old snapshots, you could
use the snapcleaner script.
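
The behaviour described above, where data stays on disk as long as one hard link remains, can be seen in a small experiment. This is an illustrative sketch with made-up file names, not something taken from Solr.

```python
import os
import tempfile


def demo():
    """Delete the original name of a file and read it back through a second link."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "segment.dat")
        snapshot_link = os.path.join(tmp, "snapshot_segment.dat")
        with open(original, "wb") as f:
            f.write(b"index data")
        os.link(original, snapshot_link)   # what a snapshot does
        os.remove(original)                # what happens when the index drops an old file
        with open(snapshot_link, "rb") as f:
            survived = f.read()
        links_left = os.stat(snapshot_link).st_nlink
        return survived, links_left
```

The space is released only when the last remaining name is removed, which is why cleaning up old snapshots is what actually frees disk space.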

Is that what you wanted to do?

-- 
Regards,
Shalin Shekhar Mangar.