Hi,
I would like to download the entire
CVS repository for LyX. However,
webcvs gives a repository whose history has been
cut off at some point in the past, so I can no longer
go back to the very first versions of the files.
Do you know where I can get a fuller CVS
repository for LyX? While I need
versions from the very beginning (or as early
as possible), I do not need the very latest versions.
Amir
P.S. This is to test a new kind of search tool
for code. It will only work well if I have access
to the early commits to the code.
--- FYI, tool description follows
CVSSearch: A New Way to Search through Source Code
CVSSearch searches for code fragments
using CVS comments. Specifically, it
takes advantage of the fact that a CVS comment
typically describes the lines of code involved
in the commit and that this description
will typically hold for many future versions.
In other words, CVSSearch lets you search
the most recent version of the code more
effectively by using the comments on previous
versions to understand the current one.
It works as follows:
* typically, each comment in a CVS commit not only
describes the change made but also
indirectly describes the purpose of the lines of code
involved in that change (e.g., "added footnote feature"
indirectly tells you that the lines involved
in the commit have something to do with footnotes)
* each line in the code accumulates
a "profile" that contains all words in the comments of
commits that involved that line, and each word
has an associated frequency, which is
the number of commits that involved that
line and whose comment contains that word.
The idea is to let you search the code base
based on the profiles extracted from the CVS
comments.
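To make the profile idea concrete, here is a rough sketch in Python.
It is only an illustration, not the actual CVSSearch code: the
function names, the (comment, line_ids) input format, and the
scoring rule (sum of the frequencies of the query words) are all
assumptions. In particular, it assumes each line already has a
stable identifier across versions, which in practice is the hard
part.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a comment or query and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_profiles(commits):
    """Build a word-frequency profile for every line.

    `commits` is a list of (comment, line_ids) pairs. The line_ids are
    hypothetical stable identifiers for the lines a commit touched.
    """
    profiles = defaultdict(Counter)
    for comment, line_ids in commits:
        # A word counts once per commit, however often the comment repeats it.
        words = set(tokenize(comment))
        for line_id in line_ids:
            profiles[line_id].update(words)
    return profiles


def search(profiles, query):
    """Rank lines by the summed profile frequency of the query words."""
    terms = tokenize(query)
    scores = {}
    for line_id, profile in profiles.items():
        score = sum(profile[term] for term in terms)
        if score > 0:
            scores[line_id] = score
    # Highest score first; ties broken by line identifier.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


commits = [
    ("added footnote feature", ["a.c:10", "a.c:11"]),
    ("bug fixed in footnote layout", ["a.c:11"]),
    ("bug fixed", ["b.c:5"]),
]
profiles = build_profiles(commits)
print(search(profiles, "footnote"))
```

Here a search for "footnote" ranks a.c:11 above a.c:10, because two
commits touching a.c:11 mention footnotes, even if neither line
contains the word itself.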
This has several advantages:
* if a line is affected by many commits, then
you get multiple summaries/aspects of
the purpose of that line, as described by
multiple authors in multiple commits
(in contrast, a comment in the code itself
can be viewed as just one summary)
* you can search for something like
"editing window" and get a match even
if the code does not contain these words
but at least one author decided to use
those terms to describe his modifications
to the code. (That is, this allows us to
address the vocabulary mismatch problem.)
* you can search for "bug" to find lines
in the code that are especially bug prone
(since such lines are touched by many commits
commented "bug fixed" or something similar)
* you get very precise information about
the exact lines in the code that relate
to your query (which need not appear
in a contiguous region of code)
Intuitively speaking, a comment on a particular
version of an application will probably continue
to hold for a lot of versions that follow, so
it makes sense to combine commit comments
in this way.
The method I described can be viewed as computing
a *vertical* profile for each line from previous changes to the code.
It is also possible to compute a *horizontal* profile for lines
by looking at CVS comments in other projects with similar code.
Thus, to get a meaningful profile for a line/group of lines, it is
only necessary that a CVS comment has applied to those lines
at some point in the history of the current application or of some other
application with similar code. (You can use local similarity, as is done
with DNA sequences, to identify similar code fragments in different contexts.)
Of course, you can combine vertical and horizontal profiles.
In this way, we can get around the great variation in CVS
comment quality.
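
As a rough illustration of the horizontal idea, the sketch below
uses Smith-Waterman local alignment (the standard local-similarity
method for DNA sequences) to find matching lines between two code
fragments, and then merges a horizontal profile into a vertical one.
This is an assumption about how it could be done, not a description
of CVSSearch: the function names, the scoring values, the exact-match
comparison of lines, and the 0.5 weight are all made up for the
example.

```python
from collections import Counter


def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment of two sequences of code lines.

    Returns (best_score, pairs), where pairs lists the (i, j) index
    pairs of identical lines inside the best local alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the alignment score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            if a[i - 1] == b[j - 1]:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


def combine(vertical, horizontal, weight=0.5):
    """Merge a horizontal profile into a vertical one, down-weighted."""
    combined = Counter(vertical)
    for word, count in horizontal.items():
        combined[word] += weight * count
    return combined


ours = ["int x;", "foo();", "bar();", "return x;"]
theirs = ["setup();", "foo();", "bar();", "done();"]
print(local_alignment(ours, theirs))
```

Each matched pair (i, j) says that line i of our code can borrow the
profile of line j in the other project, which combine() then adds to
whatever vertical profile the line already has.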