Re: Create incremental snapshot
: Thanks for the reply Asif. We have already tried removing the optimization
: step. Unfortunately the commit command alone is also causing an identical
: behaviour. Is there anything else that we are missing?

The hardlinking behavior of snapshots is based on the files in the index directory, and the files in the index directory are based on the current segments of your index -- so if you make enough changes to your index to cause all of the segments to change, every snapshot will be different.

Optimizing guarantees that every segment will be different (because all the old segments are gone, and a new segment is created), but if your merge settings are really aggressive, then it's equally possible that some number of delete/add calls will also cause every segment to be replaced.

Without your configs, and directory listings of subsequent snapshots, it's hard to guess what the problem might be (if you have already stopped optimizing on every batch).

But I think we have an XY problem here...

: This process continues for around 160,000 documents i.e. 800 times and by
: the end of it we have 800 snapshots.

Why do you keep 800 snapshots?

You really only need to keep snapshots around long enough to ensure that a slave isn't snappulling one while you are in the middle of deleting it. Unless you have some really funky use case where you want some of your query boxes to deliberately fetch old versions of the index, you don't really need more than a couple of snapshots at one time.

It can be prudent to keep more snapshots than you need around in case of logical index corruption (ie: someone foolishly deletes a bunch of docs they shouldn't have) because snapshots are *usually* more disk space efficient than full backup copies -- but if you are finding that that's not the case, why bother keeping them?

-Hoss
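One way to get the kind of numbers asked for above is to compare the apparent size of the snapshot directories with the space they occupy once hard-linked files are counted only once. A minimal sketch in Python, assuming the snapshots are plain directories on a single machine; the function name `snapshot_usage` is made up for illustration and is not part of Solr:

```python
import os


def snapshot_usage(snapshot_dirs):
    """Return (apparent_bytes, unique_bytes) for a list of snapshot directories.

    apparent_bytes adds up every file in every snapshot.
    unique_bytes counts each underlying file (device + inode) only once,
    which is closer to what hard-linked snapshots really occupy on disk.
    """
    apparent = 0
    unique = 0
    seen = set()  # (st_dev, st_ino) pairs that were already counted
    for snapshot in snapshot_dirs:
        for root, _dirs, files in os.walk(snapshot):
            for name in files:
                st = os.lstat(os.path.join(root, name))
                apparent += st.st_size
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    unique += st.st_size
    return apparent, unique
```

Run on two consecutive snapshot directories, a `unique_bytes` value close to `apparent_bytes` would suggest that almost no segment files survived from one snapshot to the next.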
Re: Create incremental snapshot
Thanks for the reply Asif. We have already tried removing the optimization step. Unfortunately the commit command alone is also causing identical behaviour. Is there anything else that we are missing? :-((

-- View this message in context: http://www.nabble.com/Create-incremental-snapshot-tp23109877p24447593.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Create incremental snapshot
Tushar: Is it necessary to do the optimize on each iteration? When you run an optimize, the entire index is rewritten. Thus each index file can be hard-linked into at most one snapshot, and each snapshot will consume the full amount of space on your disk.

Asif

--
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com
Re: Create incremental snapshot
What I gather from this discussion is -

1. Snapshots are always hard links and not actual files, so they cannot possibly consume the same amount of space.
2. Snapshots contain hard links to existing docs + delta docs.

We are facing a situation wherein the snapshot occupies the same space as the actual indexes, thus violating the first point.

We have a batch processing scheme for refreshing indexes. The steps we follow are -

1. Delete 200 documents in one go.
2. Do an optimize.
3. Create the 200 documents deleted earlier.
4. Do a commit.

This process continues for around 160,000 documents i.e. 800 times and by the end of it we have 800 snapshots.

The size of the actual indexes is 200 MB and remarkably all the 800 snapshots are of size around 200 MB each. In effect this process consumes around 160 GB of space on our disks. This is causing a lot of pain right now.

My concerns are -

- Is our understanding of the snapshooter correct?
- Should this massive space consumption be happening at all?
- Are we missing something critical?

Regards,
Tushar.

-- View this message in context: http://www.nabble.com/Create-incremental-snapshot-tp23109877p24405434.html Sent from the Solr - User mailing list archive at Nabble.com.
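The 160 GB figure follows directly from the numbers above if no file is shared between snapshots. A back-of-the-envelope sketch, assuming (hypothetically) that each batch rewrites a fixed number of megabytes of index files and that everything else stays hard-linked; `total_snapshot_mb` is an illustrative helper, not something provided by Solr:

```python
def total_snapshot_mb(num_snapshots, index_mb, rewritten_mb_per_batch):
    """Disk space held by a chain of hard-linked snapshots, in MB.

    The first snapshot pins the whole index; each later snapshot only adds
    the files that were rewritten since the previous one.
    """
    if num_snapshots == 0:
        return 0
    return index_mb + (num_snapshots - 1) * rewritten_mb_per_batch


# Every batch rewrites the whole 200 MB index (for example after an optimize):
print(total_snapshot_mb(800, 200, 200))  # 160000 MB, i.e. about 160 GB

# Hypothetical case where each batch rewrites only 2 MB of index files:
print(total_snapshot_mb(800, 200, 2))    # 1798 MB
```

The gap between the two results is the saving that hard links can give, and it only materialises when most index files are left untouched between snapshots.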
Create incremental snapshot
Hi,

We want to create snapshots incrementally.

What we want is that every time the snapshooter script runs, it should not create a snapshot with pre-existing (last snapshot indexes) + delta (newly created indexes), but rather just create a snapshot with the delta (newly created indexes).

Any references here would be highly appreciated.

Regards,
Koushik

CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS***
Re: Create incremental snapshot
The snapshooter does not really copy any files. They are just hardlinks (which do not consume additional disk space), so even a full copy is not very expensive.

--Noble Paul
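The idea can be sketched in a few lines of Python. This is not the snapshooter script itself, only an illustration of a "copy" made entirely of hard links; `hardlink_snapshot` and its arguments are hypothetical names, and the sketch assumes a flat index directory on a filesystem that supports hard links:

```python
import os


def hardlink_snapshot(index_dir, snapshot_dir):
    """Create snapshot_dir containing a hard link to every file in index_dir."""
    os.mkdir(snapshot_dir)
    for name in sorted(os.listdir(index_dir)):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            # os.link adds a second directory entry for the same inode;
            # no file data is copied.
            os.link(src, os.path.join(snapshot_dir, name))
    return snapshot_dir
```

After such a call, each file in the snapshot and its counterpart in the index directory share one inode, so the snapshot adds directory entries but no new file data.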
Re: Create incremental snapshot
When we run the snapshooter script, it creates a snapshot folder, e.g. snapshot.20090418064010, and this snapshot folder contains physical index files which take space on the file system (as shown below). Are we missing anything here?

-rw-r- 46 test test   59 Apr 17 23:26 _i.tii
-rw-r- 46 test test  507 Apr 17 23:26 _i.prx
-rw-r- 46 test test   14 Apr 17 23:26 _i.nrm
-rw-r- 46 test test  333 Apr 17 23:26 _i.frq
-rw-r- 46 test test  135 Apr 17 23:26 _i.fnm
-rw-r- 46 test test   12 Apr 17 23:26 _i.fdx
-rw-r- 46 test test 1433 Apr 17 23:26 _i.fdt

Regards,
Koushik
Re: Create incremental snapshot
Yeah, that is right, but those are hardlinks: http://linux.about.com/cs/linux101/g/hardlinks.htm

--Noble Paul
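In `ls -l` output the second column is the hard link count, so the `46` in the listing above indicates that each of those files has 46 directory entries pointing at the same data, presumably the live index plus a number of snapshots. The same information can be read programmatically; a small sketch, where `link_counts` is an illustrative name:

```python
import os


def link_counts(directory):
    """Map each regular file in directory to (size_in_bytes, hard_link_count)."""
    result = {}
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            st = os.lstat(entry.path)
            result[entry.name] = (st.st_size, st.st_nlink)
    return result
```

A link count of 1 would mean the file is not shared with any other directory, while a higher count means its data is stored once and referenced from several places.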
Re: Create incremental snapshot
OK. If these are hard links, then where does the index data get stored? It must be stored somewhere in the file system.

Regards,
Koushik
Re: Create incremental snapshot
On Sat, Apr 18, 2009 at 1:06 PM, Koushik Mitra koushik_mi...@infosys.com wrote:

> Ok If these are hard links, then where does the index data get stored? Those must be getting stored somewhere in the file system.

Yes, of course they are stored on disk. The hard links are created from the actual files inside the index directory. When those older files are deleted by Solr, they are still left on the disk if at least one hard link to that file exists.

If you are looking for how to clean old snapshots, you could use the snapcleaner script. Is that what you wanted to do?

--
Regards,
Shalin Shekhar Mangar.
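This is ordinary hard link behaviour: the data is freed only when the last name pointing at it is removed. A minimal demonstration, using throwaway files in a temporary directory (the file names are only illustrative and the function is not part of Solr):

```python
import os
import tempfile


def survives_deletion():
    """Show that data stays readable through a hard link after the
    original name is deleted. Returns the bytes read via the link."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "_i.fdt")            # stands in for an index file
        snapshot_copy = os.path.join(tmp, "_i.fdt.snap")  # stands in for the snapshot's link
        with open(original, "wb") as f:
            f.write(b"segment data")
        os.link(original, snapshot_copy)
        os.remove(original)  # the index directory drops its name for the file
        with open(snapshot_copy, "rb") as f:
            return f.read()


print(survives_deletion())  # b'segment data'
```

Space is reclaimed only once the snapshot directories holding the remaining links are removed as well, which is what cleaning old snapshots achieves.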