Hi, everyone. I'd like feedback on an idea that I've had for some years now but have never written up before.

Subversion can already be used to manage large (usually binary) files. In fact, we use SVN for this at my company and it works decently. However, there are two possible features that would make Subversion go beyond "decent" all the way to "quite good" at this :-). They are:

1) Make pristine text-base files optional. See issue #525 for details. In summary: currently, every large file uses twice the storage on the client side, and yet for most of these files there's little benefit. They're usually not plaintext, so 'svn diff' against the pristine base is pointless (unless you have some specialized diff tool for the particular binary format, but that's rare), and 'svn commit' likewise just sends up the whole working file. The only thing a local base gets you is local 'svn revert', which can be nice, but many of us would happily give it up for large files to avoid the 2x local storage cost.

Note that this is a purely client-side change, controlled entirely by client-side configuration. Different people can thus have different thresholds, depending on how much local disk space they have. A server would never even know if a client is or isn't saving text-bases.

2) Add a new '--depth=directories' depth type to make it easy to check out a sparse tree, that is, a skeleton directory tree without the files. Then, within a given directory, you can do 'svn update --depth=files' or check out a particular file by name as needed. There's no ticket associated with this feature, as far as I know, but I can file one after this post if people think this idea is worthwhile.
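To make the two proposals a bit more concrete, here is a small Python model of both. It is only a sketch under stated assumptions. The size-threshold parameter and all function names are hypothetical; they are not real Subversion configuration options or APIs. The repository listing is assumed to look like 'svn list -R' output, where directory entries end in '/'.

```python
from typing import List, Optional

# Hypothetical model of the two proposals. None of these names are real
# Subversion APIs or configuration options; they only illustrate the idea.


def keeps_text_base(size_bytes: int, threshold_bytes: Optional[int]) -> bool:
    """Proposal 1: keep a pristine text-base only for files under a
    client-chosen size threshold. None models today's behavior (always keep)."""
    return threshold_bytes is None or size_bytes < threshold_bytes


def client_storage(sizes: List[int], threshold_bytes: Optional[int]) -> int:
    """Client-side bytes used: each working file, plus its text-base if kept."""
    return sum(
        2 * size if keeps_text_base(size, threshold_bytes) else size
        for size in sizes
    )


def directory_skeleton(listing: List[str]) -> List[str]:
    """Proposal 2: the directories a '--depth=directories' checkout would
    create. `listing` mimics 'svn list -R' output (directories end in '/')."""
    return sorted(entry.rstrip("/") for entry in listing if entry.endswith("/"))


def files_directly_in(listing: List[str], directory: str) -> List[str]:
    """Files that a later 'svn update --depth=files' in `directory` would
    bring in: immediate children only, no subdirectories."""
    prefix = directory.rstrip("/") + "/" if directory else ""
    return sorted(
        entry
        for entry in listing
        if not entry.endswith("/")
        and entry.startswith(prefix)
        and "/" not in entry[len(prefix):]
    )


if __name__ == "__main__":
    listing = [
        "README.txt",
        "art/",
        "art/hero.psd",
        "art/textures/",
        "art/textures/rock.png",
        "audio/",
        "audio/theme.wav",
    ]
    print(directory_skeleton(listing))        # ['art', 'art/textures', 'audio']
    print(files_directly_in(listing, "art"))  # ['art/hero.psd']
    print(client_storage([10, 2000], None))   # 4020
    print(client_storage([10, 2000], 1000))   # 2020
```

With a 1000-byte threshold, the 2000-byte blob is stored once instead of twice, while the small file keeps its local text-base (and so keeps local 'svn revert'). Because the threshold is just a parameter of the client, two people can pick different values without the server ever knowing.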

It's easy to see how these two features would work together to make Subversion a quite good system for managing blobs ("binary large objects"):

* Put your blobs into the repository using whatever client you want. (I tend to use both regular 'svn' and the wonderfully handy 'svnmucc'.)

* Organize the repository however you want. Each person working with the blobs maintains a sparse checkout of the tree.

* When someone needs a blob locally, they just check out (i.e., update) that blob. There are various ways to do this, and it would even be easy to script new tools based on 'svn ls' that auto-complete the filenames or whatever. When one is done with the file, one can keep it around or make it disappear locally. (Right now making it go away requires some fancy dance moves, but we could fix 'svn update --depth=empty FILENAME' to Do The Right Thing, or we could add a new flag, or whatever. Also, people would presumably write scripts to help with blob management in SVN, and eventually some of those scripts would make their way into our contrib/ area.)

* Subversion's existing path-based authorization can be used so that each person's sparse checkout has the directories it needs and doesn't have any subtrees that it shouldn't have.
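Here is a rough Python sketch of the kind of 'svn ls'-based completion helper mentioned above. The function names are hypothetical, and the input is assumed to be the text of 'svn list -R' run at the repository root (one path per line, directories ending in '/'). It completes only the next path component, the way shell filename completion usually does.

```python
from typing import List


def complete(partial: str, listing: List[str]) -> List[str]:
    """Hypothetical completion helper: return the possible next path
    components for `partial`, given 'svn list -R' style output in which
    directory entries end in '/'."""
    base = partial[: partial.rfind("/") + 1]  # directory part; '' at top level
    candidates = set()
    for entry in listing:
        if not entry.startswith(partial):
            continue
        rest = entry[len(base):]
        if not rest:
            continue  # the directory being completed inside; skip itself
        slash = rest.find("/")
        # Keep only the next component; directories keep their trailing '/'.
        candidates.add(base + (rest if slash == -1 else rest[: slash + 1]))
    return sorted(candidates)


def parse_listing(svn_list_output: str) -> List[str]:
    """Split the text of 'svn list -R' into non-empty paths."""
    return [line for line in svn_list_output.splitlines() if line]


if __name__ == "__main__":
    output = "README.txt\nart/\nart/hero.psd\nart/textures/\nart/textures/rock.png\naudio/\n"
    listing = parse_listing(output)
    print(complete("a", listing))      # ['art/', 'audio/']
    print(complete("art/", listing))   # ['art/hero.psd', 'art/textures/']
    print(complete("art/t", listing))  # ['art/textures/']
```

Hooking this into a particular shell's completion machinery is left out; the point is only that 'svn list' already provides everything such a tool needs.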

Neither of these two proposed changes is huge. Of the two, issue #525 is bigger, and recently there has been some interest in solving it (I need to follow up with some other folks who have shown interest, and I will post back here if it looks like we have a coalition). The --depth change shouldn't be very hard at all, though please correct me if I'm mistaken about that.

I wanted to circulate this to see if it sounds good to others, and because people might suggest refinements -- or even suggest better ideas entirely for managing blobs in Subversion.

In the meantime, I'm going to follow up with some folks who have written recently about issue #525. If there's any news, I'll post about it in a separate thread.

Best regards,
-Karl
