I followed this discussion only loosely and kept silent because I suspect someone will shoot me to pieces for the complaint I'm about to make, but now that we're to the stage of actual implementation, I guess I'd like to say this anyway...
I have reservations about any system that makes whitespace significant in a text file. I can make an exception for indent levels, as used by Python, because these are visible and errors are obvious without resorting to odd tactics like hex editors, vi's :list command, etc. I say I expect to be shot down because, of course, the "proper" theory is that all in a CVS file is opaque and should not be depended upon by CVS users. True in theory, but in practice, sometimes I've found it much quicker to fix, say, a log mistake by hand in a CVS file (yes I'm aware of the section specifically addressing this in Cederqvist). The current danger to editing the file directly is real, but I think much more easily avoided now than if we come to require a lot of consecutive lines of just whitespace which, if mangled, could cause overwrites of other data and suchlike. I'll leave the message that spurred this commentary below for reference, but I top-posted because it's really a new subject (well that, and I suppose I have a bias against bottom-posting: I'm blind, and bottom-posting makes me read through the whole blooming family tree of messages every time a new one comes along <grin>--but I digress, and I'm not about to try to change the list's standard on this). To the author of the idea being discussed, my apologies for weighing in so tardily. I guess I'm guilty of having complacently assumed nothing would happen. I see your new behavior is made optional and a user choice, which I appreciate. On Mon, Mar 28, 2005 at 07:06:36PM +0100, Dr. David Alan Gilbert wrote: Hi, Well, I've had a crack at implementing the optimisation; and attached is a patch which seems to work - but there is at least one nasty hack in it; more about that in a sec. To enable it you need to add: TagOverwriteEnable=yes to the config file in the CVSROOT; without that it should not change behaviour in any way (except adding that as a commented out option with warning to a newly created repository). It won't give you any performance benefit on the first tag, but should give something on subsequent tags. I see some improvement (~15%) but it is variable, on a large repository that doesn't fit in memory on my home machine. It is my first dig into the CVS code base, so I would appreciate (gentle) comments. Now some details; 1) The real nasty hack; this is something that I hadn't thought of (and I don't think anyone else noticed?) in my original description; the permissions on the rcs files is read only so when I need to open them to overwrite I can't - this is a pain; this patch has a gratuitous (and obviously WRONG) hack in of chmod'ing it before the open - I'm open for any suggestions *if* there is a right way of doing this. (This was a pain because it was at the very last stage of the patch that I noticed this!). 2) I don't currently create the dummy ,foo, locking file. 3) I haven't written any docs yet. 4) I needed to get a couple of values out of rcsbuf_getkey and have shoved them in globals for the moment; I was looking for a neater way that wouldn't mean changing all the callers. 5) I'm worried about the right types to use for file offsets in a portable way. (Has anyone tried cvs with rcs files over 2GB?) The patch is against 1.12.9 which is the version my debian happened to have. As I say, suggestions - and experiences welcome. Dave -----Open up your eyes, open up your mind, open up your code ------- / Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \ \ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex / \ _________________________|_____ http://www.treblig.org |_______/ diff -ur orig/cvs-1.12.9/ChangeLog cvs-1.12.9/ChangeLog --- orig/cvs-1.12.9/ChangeLog 2004-06-09 15:52:32.000000000 +0100 +++ cvs-1.12.9/ChangeLog 2005-03-24 23:43:48.000000000 +0000 @@ -1,3 +1,6 @@ +2005-03-24 Dave Gilbert <[EMAIL PROTECTED]> + * Added fast tagging mechanism; rcs.h/c, parseinfo.c,mkmodules.c + 2004-06-09 Derek Price <[EMAIL PROTECTED]> * NEWS: Note Stefan & Sebastian's security fixes. diff -ur orig/cvs-1.12.9/src/admin.c cvs-1.12.9/src/admin.c --- orig/cvs-1.12.9/src/admin.c 2004-03-22 15:37:34.000000000 +0000 +++ cvs-1.12.9/src/admin.c 2005-03-27 20:39:38.000000000 +0100 @@ -792,7 +792,7 @@ || (rev = RCS_tag2rev (rcs, p))) /* tag2rev may exit */ { RCS_check_tag (tag); /* exit if not a valid tag */ - RCS_settag (rcs, tag, rev); + RCS_settag (rcs, tag, rev, NULL); free (rev); } else diff -ur orig/cvs-1.12.9/src/commit.c cvs-1.12.9/src/commit.c --- orig/cvs-1.12.9/src/commit.c 2004-06-09 15:52:37.000000000 +0100 +++ cvs-1.12.9/src/commit.c 2005-03-27 20:39:45.000000000 +0100 @@ -2144,7 +2144,7 @@ head = RCS_getversion (rcs, NULL, NULL, 0, (int *) NULL); magicrev = RCS_magicrev (rcs, head); - retcode = RCS_settag (rcs, tag, magicrev); + retcode = RCS_settag (rcs, tag, magicrev, NULL); RCS_rewrite (rcs, NULL, NULL); free (head); diff -ur orig/cvs-1.12.9/src/import.c cvs-1.12.9/src/import.c --- orig/cvs-1.12.9/src/import.c 2004-04-27 22:08:40.000000000 +0100 +++ cvs-1.12.9/src/import.c 2005-03-27 20:39:59.000000000 +0100 @@ -770,7 +770,7 @@ if (noexec) return (0); - if ((retcode = RCS_settag(rcs, vtag, vbranch)) != 0) + if ((retcode = RCS_settag(rcs, vtag, vbranch, NULL)) != 0) { ierrno = errno; fperrmsg (logfp, 0, retcode == -1 ? ierrno : 0, @@ -792,7 +792,7 @@ vers = Version_TS (&finfo, NULL, vtag, NULL, 1, 0); for (i = 0; i < targc; i++) { - if ((retcode = RCS_settag (rcs, targv[i], vers->vn_rcs)) == 0) + if ((retcode = RCS_settag (rcs, targv[i], vers->vn_rcs, NULL)) == 0) RCS_rewrite (rcs, NULL, NULL); else { diff -ur orig/cvs-1.12.9/src/mkmodules.c cvs-1.12.9/src/mkmodules.c --- orig/cvs-1.12.9/src/mkmodules.c 2004-05-29 05:48:52.000000000 +0100 +++ cvs-1.12.9/src/mkmodules.c 2005-03-24 23:43:38.000000000 +0000 @@ -349,6 +349,23 @@ "# Be warned that these strings could be disabled in any new version of CVS.\n", "UseNewInfoFmtStrings=yes\n", #endif /* SUPPORT_OLD_INFO_FMT_STRINGS */ + "# Options relating to the Tag overwrite optimisation\n", + "# ** WARNING ** Only enable this after reading the appropriate documentation\n", + "# since it can cause incompatibilities with none-cvs tools that operate\n", + "# on the repository and also has the potential to be unsafe in the case\n", + "# of interrupted operation\n", + "# having said that lot - it might give a good performance boost on tagging\n", + "#TagOverwriteEnable=yes\n", + "# The following must be set to the filesystem blocksize of the repository\n", + "# such that a write entirely within a block may not be partially written\n", + "# 512 is the default\n", + "#TagOverwriteBlocksize=512\n", + "# The overwrite optimisation is only performed on files of a minimum size\n", + "# this option sets that threshold, it is in bytes\n", + "#TagOverwriteThreshold=16384\n", + "# This is the amount of blankspace to add into files to make room for tags\n", + "#TagOverwriteBlankspace=1024\n", + NULL }; diff -ur orig/cvs-1.12.9/src/parseinfo.c cvs-1.12.9/src/parseinfo.c --- orig/cvs-1.12.9/src/parseinfo.c 2004-05-29 05:48:52.000000000 +0100 +++ cvs-1.12.9/src/parseinfo.c 2005-03-24 23:35:56.000000000 +0000 @@ -432,6 +432,42 @@ } } #endif /* SUPPORT_OLD_INFO_FMT_STRINGS */ + else if (strcmp (line, "TagOverwriteEnable") == 0) + { + if (strcmp (p, "no") == 0) + tag_overwrite_enable = 0; + else if (strcmp (p, "yes") == 0) + tag_overwrite_enable = 1; + else + { + error (0, 0, "unrecognized value '%s' for TagOverwriteEnable", p); + goto error_return; + } + } + else if (strcmp (line, "TagOverwriteBlocksize") == 0) + { + tag_overwrite_blocksize=atoi(p); + if ((tag_overwrite_blocksize<128) || (tag_overwrite_blocksize>(1<<24))) + { + error (0, 0, "invalid value '%d' for TagOverwriteBlocksize", + tag_overwrite_blocksize); + goto error_return; + } + } + else if (strcmp (line, "TagOverwriteThreshold") == 0) + { + tag_overwrite_threshold=atoi(p); + } + else if (strcmp (line, "TagOverwriteBlankspace") == 0) + { + tag_overwrite_blankspace=atoi(p); + if ((tag_overwrite_blocksize<128) || (tag_overwrite_blocksize>4096)) + { + error (0, 0, "invalid value '%d' for TagOverwriteBlankspace", + tag_overwrite_blankspace); + goto error_return; + } + } else { /* We may be dealing with a keyword which was added in a diff -ur orig/cvs-1.12.9/src/rcs.c cvs-1.12.9/src/rcs.c --- orig/cvs-1.12.9/src/rcs.c 2004-05-16 02:39:52.000000000 +0100 +++ cvs-1.12.9/src/rcs.c 2005-03-28 16:33:46.000000000 +0100 @@ -25,6 +25,14 @@ int preserve_perms = 0; +/* Options from the config file in the repository related + * to tag_overwrite optimisation. + */ +unsigned int tag_overwrite_enable=0; +unsigned int tag_overwrite_blocksize=512; +unsigned int tag_overwrite_threshold=32768; +unsigned int tag_overwrite_blankspace=1024; + /* The RCS -k options, and a set of enums that must match the array. These come first so that we can use enum kflag in function prototypes. */ @@ -165,7 +173,15 @@ static char *rcs_lockfile = NULL; static int rcs_lockfd = -1; - +/* These are set by rcsbuf_getkey and indicate the first character + * after the key and the last character before the start of the value. + * The values are offsets within the actual file. + * postkeyoffset points either to a trailing whitespace or a ; + * TODO: Find a better way than globals! + * (I don't think we can use pointer differences because of the + * non-mmap case). + */ +static off_t rcsbuf_postkeyoffset, rcsbuf_prevalueoffset; /* * char * @@ -282,6 +298,13 @@ rcs = RCS_parsercsfile_i(fp, rcsfile); if (rcs != NULL) { + struct stat sb; + if (CVS_STAT(rcsfile,&sb)<0) + { + error (0, errno, "cannot stat %s", rcsfile); + } + rcs->file_size=sb.st_size; + rcs->flags |= VALID; if ( inattic ) rcs->flags |= INATTIC; @@ -511,6 +534,10 @@ if (STREQ (RCSSYMBOLS, key)) { + /* Store away this white space info for the tag overwrite hack */ + rdata->symbols_whitespacestart=rcsbuf_postkeyoffset; + rdata->symbols_whitespaceend=rcsbuf_prevalueoffset; + if (value != NULL) rdata->symbols_data = rcsbuf_valcopy (&rcsbuf, value, 0, (size_t *) NULL); @@ -1150,11 +1177,16 @@ to terminate the key. If the key ended with ';', then there is no value. */ + rcsbuf_postkeyoffset=rcsbuf->pos+(ptr-rcsbuf_buffer); + *ptr = '\0'; ++ptr; if (c == ';') { + // No value, no offset! + rcsbuf_prevalueoffset=0; + *valp = NULL; rcsbuf->ptr = ptr; return 1; @@ -1176,6 +1208,9 @@ c = *ptr; if (c == ';') { + /* Point to the last valid white space */ + rcsbuf_prevalueoffset=rcsbuf->pos+(ptr-rcsbuf_buffer)-1; + *valp = NULL; rcsbuf->ptr = ptr + 1; return 1; @@ -1187,6 +1222,8 @@ /* Now PTR points to the start of the value, and C is the first character of the value. */ + /* Point to the last valid white space */ + rcsbuf_prevalueoffset=rcsbuf->pos+(ptr-rcsbuf_buffer)-1; if (c != '@') *valp = ptr; @@ -5580,10 +5617,12 @@ /* For RCS file RCS, make symbolic tag TAG point to revision REV. This validates that TAG is OK for a user to use. Return value is -1 for error (and errno is set to indicate the error), positive for - error (and an error message has been printed), or zero for success. */ + error (and an error message has been printed), or zero for success. + New node is set to a new node if and only if a new node has been created; + if newnode is null no attempt is made to write to it.*/ int -RCS_settag (RCSNode *rcs, const char *tag, const char *rev) +RCS_settag (RCSNode *rcs, const char *tag, const char *rev, Node** newnode) { List *symbols; Node *node; @@ -5631,6 +5670,8 @@ node->key = xstrdup (tag); node->data = xstrdup (rev); (void)addnode_at_front (symbols, node); + if (newnode) + *newnode=node; } return 0; @@ -7714,6 +7755,39 @@ fputs (";\n", fp); fputs (RCSSYMBOLS, fp); + + /* + * The tag overwrite enable is a hack that places some white space + * before the symbols to allow new tags to be added by a simple overwrite + * but we only do it for large files and only if enabled + */ + if ((tag_overwrite_enable) && (rcs->file_size>=tag_overwrite_threshold)) + { + /* if we are going to add extra space this is the point it needs + * to go in. + * We add the white space as lines of 63 spaces followed by a newline + * if the amount to be added is not a multiple of 64 then we just + * tack a few more spaces on + * (This is because it seems unreasonable to have a very long line in + * the header since something somewhere might not expect it) + * (TODO: Could just prebuild this string!) + */ + unsigned int remain=tag_overwrite_blankspace; + + while (remain) + { + if (remain>=64) + { + fputs (" \n", + fp); + remain-=64; + } else { + fputs (" ",fp); + remain--; + } + } + } + /* If we haven't had to convert the symbols to a list yet, don't force a conversion now; just write out the string. */ if (rcs->symbols == NULL && rcs->symbols_data != NULL) @@ -8317,6 +8391,150 @@ rcs_internal_unlockfile (fout, rcs->path); } +/* This is called by RCS_rewrite_withtagoverwritecheck if it has come to + * the conclusion we can safely do an overwrite. + * lastwritelocation is the location the last character of our change must + * land on. + */ +static void +do_tagoverwritetrick(RCSNode *rcs, Node *newtagnode, off_t lastwritelocation) +{ + /* TODO: Consider creating a ,foo, file to passify any rcs processes + * but in reality if rcs processes are poking about in the repository + * at the same time as we're doing this it is probably bad news + */ + struct stat sb; + char *buffertowrite; + unsigned int keydatalen=strlen(newtagnode->key)+strlen(newtagnode->data); + + /*fprintf(stderr,"RCS_rewrite_withtagoverwritecheck: start/end=%lu,%lu key=%s data=%s lastwritelocation=%ld\n", + (unsigned long)(rcs->symbols_whitespacestart), + (unsigned long)(rcs->symbols_whitespaceend), + newtagnode->key,newtagnode->data,(unsigned long)lastwritelocation); + */ + + /* Actually we add 1 for the : and then take one off because lastwriteloc + * is inclusive */ + off_t seekpos=lastwritelocation-keydatalen; + + /* The '2' is one for the : and one for the terminating \0 that isn't + * written */ + buffertowrite=xmalloc(keydatalen+2); + + sprintf(buffertowrite,"%s:%s",newtagnode->key,newtagnode->data); + + /**********************************************************************/ + /* HACK HACK - TEMPORARY HACK */ + /* Bah by default we create all files read only by everyone */ + /* What the hell is the correct solution? */ + /**********************************************************************/ + if (CVS_STAT(rcs->path,&sb)<0) + { + error (1, errno, "cannot stat during overwrite %s", rcs->path); + } + if (chmod(rcs->path, sb.st_mode | S_IWRITE | S_IWGRP)!=0) + { + error (1, errno, "cannot chmod +w during overwrite %s", rcs->path); + } + int fd=open (rcs->path, OPEN_BINARY | O_RDWR | O_EXCL); + if (fd<0) + { + free(buffertowrite); + error (1, errno, "could not open file for tag overwrite`%s'", rcs->path); + } + + if (lseek(fd, seekpos, SEEK_SET)!=seekpos) + { + free(buffertowrite); + close(fd); + error (1, errno, "failed to seek in file for tag overwrite`%s'", rcs->path); + } + + if (write(fd, buffertowrite, keydatalen+1)!=(keydatalen+1)) + { + free(buffertowrite); + close(fd); + /* Eeep this is possibly very bad if a partial write happened */ + error (1, errno, "Failure to write for tag overwrite in `%s' - CHECK CONSISTENCY!", rcs->path); + } + + free(buffertowrite); + close(fd); + if (chmod(rcs->path, sb.st_mode)!=0) + { + error (0, errno, "cannot chmod back after overwrite %s", rcs->path); + } +} + +/* Perform a rewrite unless we can perform the tag overwrite optimisation + * the 'newtagnode' must be the node created by a settag; it will only + * return one if it is a new node and hence isn't changing an existing tag + */ +void +RCS_rewrite_withtagoverwritecheck(RCSNode *rcs, Node *newtagnode) +{ + if ((tag_overwrite_enable) && newtagnode) + { + /* This looks promising, now we have to check that the new tag + * will fit in the given amount of white space and that we can + * write it without going over a block boundary + * Note: The whitespacestart/end *include* a white space + */ + + /* Amount of text we want to put in; key:data, +1 is : */ + unsigned int spaceNeeded=strlen(newtagnode->key)+ + strlen(newtagnode->data)+1; + + /* Check that 1) There is enough space, 2) That the spaceneeded is + * smaller than our block. The initial space check saves + * us from doing a subtraction later and I want to be careful + * because things are unsigned. Note we need 2 extra locations + * for the white space before and after the new tag but those + * don't get rewritten. + */ + if ((spaceNeeded<tag_overwrite_blocksize) && + ((rcs->symbols_whitespacestart+spaceNeeded+2)<rcs->symbols_whitespaceend)) + { + /* looks promising, so the only thing we have to check for now + * is block boundary alignment - we aren't going to write + * accross a block boundary. + */ + off_t firstwritelocation, lastwritelocation; + + /* last place to be written is character before the end of the white + * space leaving us one white space */ + lastwritelocation=rcs->symbols_whitespaceend-1; + firstwritelocation=(lastwritelocation-spaceNeeded)+1; + + if ((lastwritelocation/tag_overwrite_blocksize)== + (firstwritelocation/tag_overwrite_blocksize)) + { + /* same block */ + do_tagoverwritetrick(rcs,newtagnode, lastwritelocation); + return; + } else { + /* spans block boundary - so we look whether we can pull the + * write earlier in the file into the previous block and see + * if there is still space. + * We need to check that there is still space, also note + * that we know that there is whitespace at the start of the + * next block so we don't need to worry about that here. + */ + off_t oldlast=lastwritelocation; + lastwritelocation=oldlast-(1+(oldlast % tag_overwrite_blocksize)); + + if ((rcs->symbols_whitespacestart+spaceNeeded+1)<lastwritelocation) + { + /* looks good */ + do_tagoverwritetrick(rcs,newtagnode, lastwritelocation); + return; + } + } + } + } + RCS_rewrite(rcs,NULL,NULL); +} + /* Abandon changes to an RCS file. */ void diff -ur orig/cvs-1.12.9/src/rcs.h cvs-1.12.9/src/rcs.h --- orig/cvs-1.12.9/src/rcs.h 2004-03-22 15:37:34.000000000 +0000 +++ cvs-1.12.9/src/rcs.h 2005-03-27 20:42:39.000000000 +0100 @@ -111,6 +111,11 @@ is the "id" which introduces the newphrase, and the value of which is the value from the newphrase. */ List *other; + + /* Size of the file (in bytes) when parsed */ + off_t file_size; + /* Offsets to start and end of white space after the 'symbols' keyword */ + off_t symbols_whitespacestart, symbols_whitespaceend; }; typedef struct rcsnode RCSNode; @@ -222,7 +227,7 @@ const char *rev, int flags); int RCS_cmp_file (RCSNode *, const char *, char **, const char *, const char *, const char * ); -int RCS_settag (RCSNode *, const char *, const char *); +int RCS_settag (RCSNode *, const char *, const char *, Node** newnode); int RCS_deltag (RCSNode *, const char *); int RCS_setbranch (RCSNode *, const char *); int RCS_lock (RCSNode *, const char *, int); @@ -232,6 +237,7 @@ void RCS_delaccess (RCSNode *, char *); char *RCS_getaccess (RCSNode *); void RCS_rewrite (RCSNode *, Deltatext *, char *); +void RCS_rewrite_withtagoverwritecheck(RCSNode *rcs, Node* newtagnode); void RCS_abandon (RCSNode *); int rcs_change_text (const char *, char *, size_t, const char *, size_t, char **, size_t *); @@ -244,6 +250,19 @@ extern int preserve_perms; +/* Whether to enable the fast tagging by overwrite mechanism; default false */ +extern unsigned int tag_overwrite_enable; +/* The blocksize assumed for disc I/O - we don't do an overwrite if it would + * cross the blocksize boundary this hopefully makes partial writes impossible + * (hopefully!) */ +extern unsigned int tag_overwrite_blocksize; +/* The size of the existing RCS file must be before we add extra blankspace + * this means that small files don't get bloated by whitespace + */ +extern unsigned int tag_overwrite_threshold; +/* The amount of blankspace to add to RCS files */ +extern unsigned int tag_overwrite_blankspace; + /* From import.c. */ extern int add_rcs_file (const char *, const char *, const char *, const char *, const char *, const char *, diff -ur orig/cvs-1.12.9/src/tag.c cvs-1.12.9/src/tag.c --- orig/cvs-1.12.9/src/tag.c 2004-04-30 15:11:49.000000000 +0100 +++ cvs-1.12.9/src/tag.c 2005-03-27 20:43:07.000000000 +0100 @@ -858,13 +858,14 @@ * the branch. Use a symbolic tag for that. */ rev = branch_mode ? RCS_magicrev (rcsfile, version) : numtag; - retcode = RCS_settag(rcsfile, symtag, numtag); + retcode = RCS_settag(rcsfile, symtag, numtag, NULL); if (retcode == 0) - RCS_rewrite (rcsfile, NULL, NULL); + RCS_rewrite (rcsfile,NULL,NULL); } else { char *oversion; + Node** newtagnode; /* * As an enhancement for the case where a tag is being re-applied to @@ -924,9 +925,10 @@ } free (oversion); } - retcode = RCS_settag(rcsfile, symtag, rev); + newtagnode=NULL; + retcode = RCS_settag(rcsfile, symtag, rev,&newtagnode); if (retcode == 0) - RCS_rewrite (rcsfile, NULL, NULL); + RCS_rewrite_withtagoverwritecheck (rcsfile,newtagnode); } if (retcode != 0) @@ -1020,6 +1022,7 @@ Vers_TS *vers; int retcode = 0; int retval = 0; + Node* newtagnode; vers = Version_TS (finfo, NULL, NULL, NULL, 0, 0); @@ -1189,7 +1192,8 @@ free (oversion); } - if ((retcode = RCS_settag(vers->srcfile, symtag, rev)) != 0) + newtagnode=NULL; + if ((retcode = RCS_settag(vers->srcfile, symtag, rev, &newtagnode)) != 0) { error (1, retcode == -1 ? errno : 0, "failed to set tag %s to revision %s in %s", @@ -1201,7 +1205,7 @@ } if (branch_mode) free (rev); - RCS_rewrite (vers->srcfile, NULL, NULL); + RCS_rewrite_withtagoverwritecheck (vers->srcfile, newtagnode); /* more warm fuzzies */ if (!really_quiet) _______________________________________________ Info-cvs mailing list Info-cvs@gnu.org http://lists.gnu.org/mailman/listinfo/info-cvs -- Doug Lee [EMAIL PROTECTED] http://www.dlee.org Bartimaeus Group [EMAIL PROTECTED] http://www.bartsite.com "It is difficult to produce a television documentary that is both incisive and probing when every twelve minutes one is interrupted by dancing rabbits singing about toilet paper." --Rod Serling _______________________________________________ Info-cvs mailing list Info-cvs@gnu.org http://lists.gnu.org/mailman/listinfo/info-cvs