RE: Medium sized binaries, lots of commits and performance
Paul and Doug, Thanks a lot for the advice (Larry too, but I replied to him elsewhere). You've given me plenty to go on. I believe it's up to me now to figure out what's most appropriate. Regards, Jesper Vad Kristensen Aarhus, Denmark ___ Info-cvs mailing list Info-cvs@gnu.org http://lists.gnu.org/mailman/listinfo/info-cvs
RE: Medium sized binaries, lots of commits and performance
Larry Jones wrote: I and the rest of us out here work with Oracle Forms and that means binary source code. Are you sure there isn't a way to store them as text or to convert them to text? Source control systems are popular enough that there almost certainly is. Storing them in text form rather than in binary is by far the best solution to your potential problem.

I'm absolutely sure there is such a way :) The Oracle Form Builder tool supports this natively. I'm a bit hesitant to go this way, however, because it complicates most people's lives here (having to do the conversion explicitly every time). Also, we pull source code directly from CVS (using a Perl script) and compile our releases/launches from it, but your suggestion may of course make it far more expedient to pull everything as ASCII files, convert them to binaries, and then compile as usual. Your advice may well be implemented if I don't succeed with any of the other very fine suggestions from the people on this list.

BTW, a 2.8 MB binary converts to a 6.4 MB text file. Although the ASCII version is bigger, I'm assuming CVS will be able to handle it more efficiently (smaller deltas).

Thanks a lot for the advice! Regards, Jesper Vad Kristensen Aarhus, Denmark
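Jesper's expectation of smaller deltas from the text export rests on how line-based diffs behave. If a form is saved as text, an edit that touches one property ideally changes only a few lines, so the stored delta stays small even though the file itself is larger. The Python sketch below illustrates that under one loud assumption: unchanged parts of the form export byte-for-byte identically from one save to the next. It uses `difflib` rather than the diff CVS uses internally, the 1000-line "export" is made-up data, and `line_delta_size` is a hypothetical helper name.

```python
import difflib


def line_delta_size(old_text: str, new_text: str) -> int:
    """Count the lines a line-based delta must delete or insert to turn old_text into new_text."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Lines removed from the old version plus lines added in the new one.
            changed += (i2 - i1) + (j2 - j1)
    return changed


# Made-up stand-in for a text export of a form: 1000 distinct lines.
old_export = "\n".join(f"item_{i} = {i}" for i in range(1000))
# Simulate a small edit that changes a single property.
new_export = old_export.replace("item_500 = 500", "item_500 = 501")

print(line_delta_size(old_export, new_export))  # 2: one line deleted, one inserted
```

For a binary form, by contrast, a small edit can shift or rewrite large parts of the file, so a line-based delta gains much less.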
Re: Medium sized binaries, lots of commits and performance
You first asked (or at least seemed to want to know :-) ) why performance on a large binary CVS file goes way down when you update from a branch instead of from HEAD. Answer: CVS stores the trunk such that getting the HEAD revision is simply a matter of retrieving a copy of it from the CVS file. Getting a branched revision, however, requires retrieving the first version in the branch, then all the deltas from there to the revision you want, going forward through the branch revisions. Unfortunately, I would therefore regard this performance hit as a natural consequence of using CVS for binary source code. For a binary file, as you know, a delta can be a considerable percentage of the original file size.

Your second question was how to remove old revisions in order to improve performance. I don't have a CVS manual URL handy like most participants on this list seem to have, but check out the cvs admin command (specifically its -o option). It can indeed permanently delete revisions and ranges of them. You could, for example, delete all the revisions from the start of a branch until two or so revisions behind its current state, so as to speed up retrieval of revisions on that branch. Good luck.

On Wed, Feb 09, 2005 at 04:37:14PM +0100, Jesper Vad Kristensen wrote: Hi folks, I've searched the net and mail archives for some help or a workaround for my problem, but most binary issues tend to deal with the impossibility of diff/merge or whether very large files can be stuffed into CVS. I and the rest of us out here work with Oracle Forms and that means binary source code. At first I was very suspicious of moving to CVS because we have binary source code, but as it turns out I and everyone else have become extremely happy with CVS. We can't merge, granted, but with our external diff application we reap enormous benefits from using CVS. Even branching is manageable. But here's the problem, especially with our largest 3.5 MB file, which has been committed approx. 70 times.
When doing a cvs update -r HEAD filename, things work really fast (5 seconds). But if we do a cvs update -r <branch> filename, performance drops from 5 seconds to a minute and a half. I imagine something ugly is happening with the filename,v file on the CVS server, which is 200 MB. The performance isn't killing us right now, but in maybe 6 months to a year, who knows how bad it may have gotten? So the question is whether there are any administrative tools one can use to compress/rationalize/index the file so branch access becomes faster. Is there a way to permanently erase stuff older than 6 months? And if not, opinions about my ideas below would be great.

My ideas so far:

MOVE variant: I wouldn't _like_ to lose the history of the application, but it might be acceptable if performance degrades too much. I figure I could move the filename,v file in the cvsroot repository (to a backup folder), then delete it from the client and add a fresh one plus the 1-2 active branches. But can any history be kept if you do this? Will the old history be in the backup folder?

MIGRATE: An alternative would be to create a new folder (while keeping the old one) and simply migrate _all_ 85 files to the new folder (grab HEAD, add everything in HEAD to the new folder, grab the endpoints on branches, add all branches as best I can).

Regards, Jesper Vad Kristensen Aarhus, Denmark

-- Doug Lee [EMAIL PROTECTED] http://www.dlee.org Bartimaeus Group [EMAIL PROTECTED] http://www.bartsite.com It is difficult to produce a television documentary that is both incisive and probing when every twelve minutes one is interrupted by dancing rabbits singing about toilet paper. --Rod Serling
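Jesper's figures (a 3.5 MB file, about 70 commits, a 200 MB ,v file) give a rough feel for Doug's remark that a binary delta can be a considerable percentage of the file. The back-of-the-envelope Python below assumes, as a simplification, that the ,v file consists almost entirely of the intact head revision plus one delta per remaining commit (ignoring log messages and other RCS metadata) and that the file stayed near 3.5 MB throughout; `average_delta_mb` is a hypothetical helper name.

```python
def average_delta_mb(archive_mb: float, head_mb: float, revisions: int) -> float:
    """Rough average delta size, assuming the archive holds the head intact plus one delta per other revision."""
    return (archive_mb - head_mb) / (revisions - 1)


avg = average_delta_mb(archive_mb=200.0, head_mb=3.5, revisions=70)
print(f"average delta: {avg:.2f} MB ({avg / 3.5:.0%} of the full file)")
# average delta: 2.85 MB (81% of the full file)
```

Under these assumptions, every extra delta on the path to a branch revision means processing a few more megabytes, which helps explain why the branch update is so much slower than the HEAD update.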
Re: Medium sized binaries, lots of commits and performance
Doug Lee writes: To get a branched revision, however, requires the retrieval of the first version in the branch, then all the deltas from then to the revision you want, going forward through branch revisions.

It's even worse than that, since retrieving the first version in the branch requires retrieving the head of the trunk (the only revision that's stored intact), then all the deltas from there to the branch point, going backwards through trunk revisions. So, for example, if you branched at revision 1.1, the head of the trunk is now 1.5, and the head of the branch is 1.1.2.5, CVS does the following:

1) Retrieve revision 1.5
2) Retrieve the 1.4 - 1.5 delta and apply it backwards to recreate revision 1.4
3) Retrieve the 1.3 - 1.4 delta and apply it backwards to recreate revision 1.3
4) Retrieve the 1.2 - 1.3 delta and apply it backwards to recreate revision 1.2
5) Retrieve the 1.1 - 1.2 delta and apply it backwards to recreate revision 1.1
6) Retrieve the 1.1 - 1.1.2.1 delta and apply it to recreate revision 1.1.2.1
7) Retrieve the 1.1.2.1 - 1.1.2.2 delta and apply it to recreate revision 1.1.2.2
8) Retrieve the 1.1.2.2 - 1.1.2.3 delta and apply it to recreate revision 1.1.2.3
9) Retrieve the 1.1.2.3 - 1.1.2.4 delta and apply it to recreate revision 1.1.2.4
10) Retrieve the 1.1.2.4 - 1.1.2.5 delta and apply it to recreate revision 1.1.2.5

-Larry Jones Moms and reason are like oil and water. -- Calvin
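Larry's walk-through generalizes to any branch revision. The Python sketch below lists the same steps for an arbitrary target; it is a simplified model of the order he describes, not CVS source code. It assumes trunk revisions are numbered 1.N and that the target is either a trunk revision or a revision on a branch sprouted directly from the trunk (such as 1.1.2.5); `reconstruction_steps` is a hypothetical name.

```python
def reconstruction_steps(trunk_head: str, target: str) -> list[str]:
    """List the steps needed to rebuild `target`, starting from the intact trunk head."""
    head = int(trunk_head.split(".")[1])
    parts = [int(p) for p in target.split(".")]
    branch_point = parts[1]  # for a trunk target this is the target revision itself
    steps = [f"retrieve 1.{head} (stored intact)"]
    # Walk backwards down the trunk, applying each reverse delta.
    for rev in range(head - 1, branch_point - 1, -1):
        steps.append(f"apply 1.{rev}-1.{rev + 1} delta backwards -> 1.{rev}")
    if len(parts) == 2:
        return steps
    # Walk forwards up the branch, applying each forward delta.
    branch_prefix = f"1.{branch_point}.{parts[2]}"
    previous = f"1.{branch_point}"
    for n in range(1, parts[3] + 1):
        current = f"{branch_prefix}.{n}"
        steps.append(f"apply {previous}-{current} delta forwards -> {current}")
        previous = current
    return steps


for number, step in enumerate(reconstruction_steps("1.5", "1.1.2.5"), start=1):
    print(f"{number}) {step}")
```

Run on Larry's example, this prints the same ten steps; asking for the trunk head itself (`reconstruction_steps("1.5", "1.5")`) yields just the single retrieval, which is why HEAD updates are fast.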
Re: Medium sized binaries, lots of commits and performance
Jesper Vad Kristensen writes: I and the rest of us out here work with Oracle Forms and that means binary source code. Are you sure there isn't a way to store them as text or to convert them to text? Source control systems are popular enough that there almost certainly is. Storing them in text form rather than in binary is by far the best solution to your potential problem. -Larry Jones I've never seen a sled catch fire before. -- Hobbes
Re: Medium sized binaries, lots of commits and performance
Larry gave a great description of why you're seeing your performance degrade over time. As you can see, the more versions sit between the head and the version you want, the longer it takes to construct the version you want. I can think of two effective and usable ways to combat the problem, plus one marginal one. All of them essentially move the version you want closer to the head.

The first method is your MIGRATE method, which is a time-honored technique with CVS.

The second, which I believe was mentioned, is to reduce the number of revisions by obsoleting those that are no longer needed. This is the marginal technique because history is lost, and the nature of the differences may not buy you anything.

The third method is to spawn new branches off the head and merge the latest versions of your existing branches onto the new branches, then convert your process to use the new branches instead. This must be repeated periodically to keep a cap on response time.

-- Paul Sander | To do two things at once is to do neither [EMAIL PROTECTED] | Publilius Syrus, Roman philosopher, 100 B.C.
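Paul's third method can be quantified with the same model Larry described: rebuilding a branch revision costs one reverse delta per trunk revision between the head and the branch point, plus one forward delta per branch revision. The Python sketch below uses made-up revision numbers and a hypothetical helper name `deltas_to_apply`; it assumes trunk revisions 1.N and a branch sprouted directly from the trunk.

```python
def deltas_to_apply(trunk_head: int, branch_point: int, branch_depth: int) -> int:
    """Deltas applied to rebuild revision 1.<branch_point>.x.<branch_depth>.

    Reverse deltas walk down the trunk from 1.<trunk_head> to 1.<branch_point>;
    forward deltas then walk up the branch to the requested revision.
    """
    return (trunk_head - branch_point) + branch_depth


# Hypothetical numbers: a branch sprouted at 1.10, the trunk is now at 1.60,
# and the branch head is the 10th revision on the branch.
before = deltas_to_apply(trunk_head=60, branch_point=10, branch_depth=10)

# Third method: branch again from the current head (1.60) and commit the latest
# version of the old branch as the first revision on the new branch.
after = deltas_to_apply(trunk_head=60, branch_point=60, branch_depth=1)

# The trunk keeps moving, so the chain grows again; hence "repeat periodically".
later = deltas_to_apply(trunk_head=90, branch_point=60, branch_depth=1)

print(before, after, later)  # 60 1 31
```

For Larry's example (trunk at 1.5, branch at 1.1, branch head 1.1.2.5) the function gives 9, matching the nine delta applications in his ten-step list.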
Re: Medium sized binaries, lots of commits and performance
On Feb 9, 2005, at 8:53 AM, [EMAIL PROTECTED] wrote: Jesper Vad Kristensen writes: I and the rest of us out here work with Oracle Forms and that means binary source code. Are you sure there isn't a way to store them as text or to convert them to text? Source control systems are popular enough that there almost certainly is. Storing them in text form rather than in binary is by far the best solution to your potential problem. Jesper also wrote: We can't merge, granted, but with our external diff application we reap enormous benefits from using CVS. Even branching is manageable. This appears to be a case where adding support for external datatype-specific diff and merge tools would be useful. -- Paul Sander | Lets stick to the new mistakes and get rid of the old [EMAIL PROTECTED] | ones -- William Brown
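Paul's suggestion is a proposed feature, not something the thread says CVS already provides. As a purely hypothetical sketch of the idea, a client could choose a diff tool by file type. In the Python below, the `.fmb` extension is assumed to be the Oracle Forms binary module, `forms-diff` is a made-up name standing in for the team's external diff application, and `DIFF_TOOLS` and `diff_command` are hypothetical names. A merge-tool table could follow the same pattern.

```python
from pathlib import PurePath

# Hypothetical registry mapping a file extension to an external diff command.
# ".fmb" is assumed to be the Oracle Forms binary extension; "forms-diff" is made up.
DIFF_TOOLS = {
    ".fmb": ["forms-diff", "{old}", "{new}"],
}
# Fallback for ordinary text files.
DEFAULT_DIFF = ["diff", "-u", "{old}", "{new}"]


def diff_command(path: str, old: str, new: str) -> list[str]:
    """Pick a diff command by (case-insensitive) extension and fill in the two files to compare."""
    template = DIFF_TOOLS.get(PurePath(path).suffix.lower(), DEFAULT_DIFF)
    return [arg.format(old=old, new=new) for arg in template]


print(diff_command("orders.fmb", "orders.fmb.1.4", "orders.fmb.1.5"))
print(diff_command("build.pl", "build.pl.1.2", "build.pl.1.3"))
```

Keying on the file type keeps the version-control side generic: it only has to produce the two revisions as files and hand them to whatever tool understands the format.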