Hi, everyone. I'd like feedback on an idea that I've had for some
years now but have never written up before.
Subversion can already be used to manage large (usually binary)
files. In fact, we use SVN for this at my company and it works
decently. However, there are two possible features that would
make Subversion go beyond "decent" all the way to "quite good" at
this :-). They are:
1) Make pristine text-base files optional. See issue #525 for
details. In summary: currently, every large file uses twice the
storage on the client side, and yet for most of these files
there's little benefit. They're usually not plaintext, so 'svn
diff' against the pristine base is pointless (unless you have some
specialized diff tool for the particular binary format, but that's
rare), and 'svn commit' likewise just sends up the whole working
file. The only thing a local base gets you is local 'svn revert',
which can be nice, but many of us would happily give it up for
large files to avoid the 2x local storage cost.
Note that this is a purely client-side change, controlled entirely
by client-side configuration. Different people can thus have
different thresholds, depending on how much local disk space they
have. A server would never even know if a client is or isn't
saving text-bases.
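
To make the client-side threshold idea concrete, here is a rough Python sketch of the decision and its effect on disk usage. Nothing here is an existing Subversion setting or API: `threshold_bytes` and both function names are hypothetical, made up purely for illustration.

```python
def keep_pristine(file_size_bytes, threshold_bytes):
    """Decide whether to store a pristine text-base copy for one file.

    threshold_bytes is a hypothetical client-side setting. None means
    "always keep the pristine copy", which matches today's behavior.
    """
    if threshold_bytes is None:
        return True
    return file_size_bytes < threshold_bytes


def local_storage_cost(file_sizes, threshold_bytes):
    """Total bytes on disk: working files plus any pristine copies kept."""
    total = 0
    for size in file_sizes:
        total += size  # the working file itself
        if keep_pristine(size, threshold_bytes):
            total += size  # the pristine copy doubles the cost
    return total
```

With a 100-byte file and a 5000-byte file, the cost is 10200 bytes when every pristine copy is kept, and 5200 bytes with a threshold of 1000, since only the small file keeps its base.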
2) Add a new '--depth=directories' depth type to make it easy to
check out a sparse tree, that is, a skeleton directory tree
without the files. Then, within a given directory, you can do
'svn update --depth=files' or check out a particular file by name
as needed. There's no ticket associated with this feature, as far
as I know, but I can file one after this post if people think this
idea is worthwhile.
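
Until something like '--depth=directories' exists, the same skeleton can be approximated with flags that are already there. The Python sketch below is only an illustration under some assumptions: it takes the output of 'svn list --recursive' (where directories end in a slash), and builds an empty-depth checkout followed by one 'svn update --set-depth empty' per directory. The function names and the example URL are hypothetical.

```python
def directories_from_listing(listing):
    """Extract directory paths from 'svn list --recursive' output.

    'svn list' marks directories with a trailing slash; files have none.
    Sorting puts every parent before its children, because a parent
    path is always a proper prefix of its children's paths.
    """
    dirs = []
    for line in listing.splitlines():
        entry = line.strip()
        if entry.endswith("/"):
            dirs.append(entry.rstrip("/"))
    return sorted(dirs)


def skeleton_commands(url, wc_path, listing):
    """Build svn invocations that approximate a directories-only checkout."""
    # Start with an empty working copy of the root.
    commands = [["svn", "checkout", "--depth", "empty", url, wc_path]]
    # Then pull in each directory, without its files or children.
    for d in directories_from_listing(listing):
        commands.append(
            ["svn", "update", "--set-depth", "empty", f"{wc_path}/{d}"]
        )
    return commands
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`. This costs one client invocation per directory, which is exactly the overhead a built-in depth type would avoid.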
It's easy to see how these two features would work together to
make Subversion a quite good system for managing blobs ("binary
large objects"):
* Put your blobs into the repository using whatever client you
want. (I tend to use both regular 'svn' and the wonderfully handy
'svnmucc'.)
* Organize the repository however you want. Each person working
with the blobs maintains a sparse checkout of the tree.
* When someone needs a blob locally, they just check out (i.e.,
update) that blob. There are various ways to do this, and it
would even be easy to script new tools based on 'svn ls' that
auto-complete the filenames or whatever. When one is done with
the file, one can keep it around or make it disappear locally.
(Right now making it go away requires some fancy dance moves, but
we could fix 'svn update --depth=empty FILENAME' to Do The Right
Thing, or we could add a new flag, or whatever. Also, people
would presumably write scripts to help with blob management in
SVN, and eventually some of those scripts would make their way
into our contrib/ area.)
* Subversion's existing path-based authorization can be used so
that each person's sparse checkout has the directories it needs
and doesn't have any subtrees that it shouldn't have.
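
As a sketch of the kind of helper script mentioned in the list above, the following Python builds on 'svn ls' output to complete blob names, fetch one blob into a sparse directory, and drop it again. All function names are hypothetical. The release step assumes that 'svn update --set-depth exclude' works on a single file in your client version; that is worth verifying before relying on it.

```python
def complete_blob_names(listing, prefix):
    """Return file entries from 'svn list' output that start with prefix.

    Directory entries (trailing slash) are skipped, since only blobs
    are being completed.
    """
    entries = (line.strip() for line in listing.splitlines())
    return sorted(
        e for e in entries
        if e and not e.endswith("/") and e.startswith(prefix)
    )


def fetch_command(wc_dir, name):
    """Command to bring one file into a sparse working copy directory."""
    return ["svn", "update", f"{wc_dir}/{name}"]


def release_command(wc_dir, name):
    """Command to make one file disappear locally.

    Assumption: the client accepts --set-depth exclude for files.
    """
    return ["svn", "update", "--set-depth", "exclude", f"{wc_dir}/{name}"]
```

A wrapper would run 'svn ls' on the directory's URL, offer the completions to the user, and pass the chosen command to `subprocess.run(cmd, check=True)`.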
Neither of these two proposed changes is huge. Of the two, issue
#525 is bigger, and there has recently been some interest in solving it
(I need to follow up with some other folks who have shown
interest, and I will post back here if it looks like we have a
coalition). The --depth change shouldn't be very hard at all,
though please correct me if I'm mistaken about that.
I wanted to circulate this to see if it sounds good to others, and
because people might suggest refinements -- or even suggest better
ideas entirely for managing blobs in Subversion.
In the meantime, I'm going to follow up with some folks who have
written recently about issue #525. I'll post in a separate thread
if there's any news on that front.
Best regards,
-Karl