Hi,

Well, you could potentially work around the scale issue by using a
shallow clone (--depth 1). I'll use git.git as an example, since your
repos only have a small history so far. Examples below:
$ time git clone --depth 1 git://github.com/git/git.git
Cloning into 'git'...
remote: Counting objects: 26189, done.
remote: Compressing objects: 100% (13591/13591), done.
remote: Total 26189 (delta 21763), reused 15849 (delta 12193)
Receiving objects: 100% (26189/26189), 9.05 MiB | 1.13 MiB/s, done.
Resolving deltas: 100% (21763/21763), done.

real    0m16.753s
user    0m3.931s
sys     0m0.609s
$ cd git
[master]$ git log
commit 4570aeb0d85f3b5ff274b6d5a651c2ee06d25d76
Merge: 228c341 28755db
Author: Junio C Hamano <[email protected]>
Date:   Tue Jan 3 14:09:28 2012 -0800

    Merge branch 'pw/p4-docs-and-tests'

    * pw/p4-docs-and-tests:
      git-p4: document and test submit options
      git-p4: test and document --use-client-spec
      git-p4: test --keep-path
      git-p4: test --max-changes
      git-p4: document and test --import-local
      git-p4: honor --changesfile option and test
      git-p4: document and test clone --branch
      git-p4: test cloning with two dirs, clarify doc
      git-p4: clone does not use --git-dir
      git-p4: introduce asciidoc documentation
      rename git-p4 tests

commit 228c3418356d06d0596408bee1c863e53ca27d58
Author: Junio C Hamano <[email protected]>
Date:   Tue Jan 3 13:48:00 2012 -0800

    Merge branch 'maint'

    * maint:
      docs: describe behavior of relative submodule URLs
      fix hang in git fetch if pointed at a 0 length bundle
      Documentation: read-tree --prefix works with existing subtrees
      Add MYMETA.json to perl/.gitignore

commit 28755dbaa5213032b2da202652c214a9f94ff853
Author: Pete Wyckoff <[email protected]>
Date:   Sat Dec 24 21:07:40 2011 -0500

    git-p4: document and test submit options

    Clarify there is a -M option, but no -C. These are both
    configurable through variables. Explain that the allowSubmit
    variable takes a comma-separated list of branch names. Catch
    earlier an invalid branch name given as an argument to
    "git p4 clone". Test option --origin, variable allowSubmit, and
    explicit master branch name.
    Signed-off-by: Pete Wyckoff <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>

And then compare that with the time to clone the full repo:

[master]$ cd ..
$ rm -rf git
$ time git clone git://github.com/git/git.git
Cloning into 'git'...
remote: Counting objects: 127389, done.
remote: Compressing objects: 100% (41918/41918), done.
remote: Total 127389 (delta 92731), reused 117665 (delta 83665)
Receiving objects: 100% (127389/127389), 27.95 MiB | 1.35 MiB/s, done.
Resolving deltas: 100% (92731/92731), done.

real    0m46.661s
user    0m14.107s
sys     0m1.865s

Since you don't care about the history in your use case, you can use a
shallow clone to pull down the least amount of data necessary...

I think the idea of providing a tarball on the server side is the way
to go though... git really is a distributed code management tool meant
for keeping track of change. It's not ideally suited for pure
distribution. Use plain git-archive (which will also do the gzip
compression for you) on the backend to build the tarball, have a git
hook regenerate it whenever code is updated there, and just pull that
down to the client.

--Kevin

On 01/04/12 10:23:42, Phillip Moore wrote:
> Well, "git archive" comes very close to what we want, but it only
> works against remote repositories using ssh, so that's not going to
> work for any of the real world sites that are using (or hopefully will
> soon be using) EFS.
>
> This really seems like a shortcoming in git. If you can
> anonymously clone an entire repo, it should be easy to get just a
> working directory for the HEAD of master anonymously, too.
>
> I think we need to come up with a mechanism for auto-generating a
> "latest" tarball for each of these via a commit hook, so I'll go take
> a look at the code Jerry wrote to implement the hooks we have today,
> and see how we extend that to add a new one. The creation of the
> tarball will end up being a VERY short script, since a one-liner with
> git/gzip can create it.
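For what it's worth, here is a minimal sketch of what such a hook body
could look like. It is not the existing hook code: the function name
publish_latest, the export directory, and the tarball naming scheme are
all assumptions made for illustration. It relies only on "git archive"
(with its built-in tar.gz format) and a relative "latest" symlink.

```shell
#!/bin/sh
# Hypothetical helper for a server-side hook (e.g. hooks/post-receive).
# The name, arguments and file layout are illustrative assumptions.
#
# publish_latest REPO_DIR EXPORT_DIR NAME [REF]
#   Writes EXPORT_DIR/NAME-<timestamp>.tar.gz from REF (default: master)
#   and repoints EXPORT_DIR/NAME-latest.tar.gz at it.
publish_latest() {
    repo_dir=$1
    export_dir=$2
    name=$3
    ref=${4:-master}
    stamp=$(date +%Y%m%d%H%M%S)
    tarball="$name-$stamp.tar.gz"

    mkdir -p "$export_dir" || return 1

    # git archive does the gzip compression itself for the tar.gz format;
    # --prefix makes the tarball unpack into a single NAME/ directory.
    git --git-dir="$repo_dir" archive --format=tar.gz \
        --prefix="$name/" -o "$export_dir/$tarball" "$ref" || return 1

    # Relative symlink, so the export directory can be moved or served as-is.
    ln -sfn "$tarball" "$export_dir/$name-latest.tar.gz"
}
```

Inside a post-receive hook of a bare repo this could be called as
publish_latest "$PWD" <export directory> deploy-config, where the
export directory is whatever the ftp/http server publishes. The client
side then needs nothing but wget and tar, and never downloads any
history.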
>
> On Wed, Jan 4, 2012 at 9:55 AM, Phillip Moore <[email protected]>
> wrote:
> > This is a great idea except that I have no clue how git works,
> > obviously....
> >
> > I had confused "git checkout" with "svn export", and now that I look,
> > I can't find a way to accomplish this after all. What I wanted might
> > not be possible with git -- namely a way to download the repo, and
> > just get a working tree with no repo metadata.
> >
> > What I want is the equivalent result of "svn export", which gives you
> > HEAD of your SVN repo, without all the .svn dirs.
> >
> > Now, obviously, you can do this:
> >
> >     git clone $url .
> >     rm -r .git
> >
> > But that will NEVER scale, as the size of the git history grows.
> >
> > Maybe the better mechanism is to have a commit hook which does this,
> > and publishes a tarball on ftp.openefs.org with a "latest" symlink.
> > Then the code can use wget and tar to achieve this goal, rather than
> > using git directly.
> >
> > If one of you knows of a means to do this using git, directly, please
> > let me know. I will continue researching this...
> >
> > On Wed, Jan 4, 2012 at 8:28 AM, Phillip Moore <[email protected]>
> > wrote:
> >> I came up with an alternate way to manage deploying these
> >> deploy-config projects, that will make it trivial to keep them
> >> up to date, AND deal with the fact that we're managing them in
> >> multiple repos.
> >>
> >> First of all, for flexibility, I'm still going to implement the search
> >> mechanism for the efsdeploy directory as I described before. However,
> >> based on the way I've structured the git repos, you can actually do a
> >> "git checkout" and drop them all into the same root directory.
> >>
> >> I'm going to try this today, since it's so damn simple.
> >> efsdeploy_config_update will be the script that does the following:
> >>
> >>     efs create autorelease efs deploy-config
> >>     cd /efs/dev/efs/deploy-config/next/install/common
> >>     git checkout http://git.openefs.org/deploy-config
> >>     git checkout http://git.openefs.org/deploy-config-aix
> >>     git checkout http://git.openefs.org/deploy-config-gnu
> >>     ....
> >>     efs dist autorelease efs deploy-config
> >>
> >> Now, you have /efs/dist/efs/deploy-config/current/common with ALL of
> >> the published git configs.
> >>
> >> Note that because ALL of these repos are structured with a
> >> metaproj/project layout, they can ALL co-exist in the same
> >> directory tree (if you use checkout, I think -- I haven't tried this
> >> yet, but since you don't get the .git directory, I don't see why this
> >> won't work -- I'll figure out how to make it work :-P)
> >>
> >> Even better, we can drop a simple file into the root of each repo,
> >> giving the names of the "child" repos in the obvious hierarchy here.
> >> For example, in the root of deploy-config, the contents of
> >> subrepos.txt might be:
> >>
> >>     deploy-config-aix
> >>     deploy-config-gnu
> >>     deploy-config-rhel
> >>     deploy-config-sunos
> >>
> >> The subrepos.txt file in deploy-config-gnu will have to live in the
> >> gnu subdir, to avoid clashes, but then, since the top tells us to
> >> check out deploy-config-gnu, we then know to look for the next
> >> subrepos.txt file in ./gnu. This will then contain:
> >>
> >>     deploy-config-gnu-gcc
> >>     deploy-config-gnu-gcclib
> >>
> >> This will give us the full flexibility of an easy to use, well managed
> >> default (you only get the published, committed master branch), with the
> >> ability to create and manage your own local repos as well.
> >> For example, there will never be an "fsf" metaproj in the OpenEFS
> >> namespace, and in practice, you're going to be migrating stuff to gnu,
> >> I assume, but if you wanted to maintain your own deploy-config-fsf git
> >> repo, that works fine. You would simply manage it in:
> >>
> >>     /efs/dist/fsf/deploy-config-fsf
> >>
> >> I can even support publishing this using efsdeploy_config_update via
> >> CLI args, if you wanted to use the same, simple mechanism.
> >>
> >> This is starting to come together very nicely, and now all we really
> >> need are....
> >>
> >> Users :-(
> >>
> >> On Fri, Dec 30, 2011 at 12:57 PM, Phillip Moore
> >> <[email protected]> wrote:
> >>> On Fri, Dec 30, 2011 at 12:09 PM, Phillip Moore
> >>> <[email protected]> wrote:
> >>>> More thoughts, and some significant progress in this area....
> >>>>
> >>>> I spent most of yesterday collecting the efsdeploy rules for
> >>>> EVERYTHING I've built into /efs/dist over the last few months (it's a
> >>>> lot), by copying the src directory to:
> >>>>
> >>>>     ~/dev/efs/deploy-config/$metaproj/$project
> >>>
> >>> OK, so once everything in that directory has been sanitized of ALL
> >>> site-specific information, then we have to figure out how to manage
> >>> it. Here's what I'm currently thinking, although this is going to
> >>> evolve, of course.
> >>>
> >>> First of all, note that efsdeploy is going to start whining at you to
> >>> switch from efs/deploy-config to efs/deploy-site, because I want to
> >>> use the name deploy-config for all of this data. Deal with it....
> >>> It's *trivial* to switch, and takes about 5-10 minutes, if you type
> >>> slow.
> >>>
> >>> I want to create 3 types of git repo to manage this data:
> >>>
> >>>     deploy-config-$metaproj-$project.git
> >>>     deploy-config-$metaproj.git
> >>>     deploy-config.git
> >>>
> >>> For things like gnu/gcc, we'll obviously create a project-specific git
> >>> repo, and for large metaprojs where we expect a lot of similarity
> >>> among the projects, we can create metaproj-specific ones. The
> >>> default, global git repo would contain all the small, simple stuff,
> >>> like oss/zlib. For starters, I expect to create these:
> >>>
> >>>     deploy-config-gnu-gcc.git (which will be used for rhel/gcc as well)
> >>>     deploy-config-gnu-gcclib.git (also for rhel/gcclib)
> >>>     deploy-config-gnu.git
> >>>     deploy-config-perl5-core.git
> >>>     deploy-config-perl5.git
> >>>     deploy-config-apache.git (might get its own system, too -- we'll
> >>>     see...)
> >>>
> >>> And of course the generic one. What I like about this is we can always
> >>> migrate things from one to the other pretty easily. If we find that,
> >>> say, oss/openssl has grown complex enough, we can yank it out of
> >>> deploy-config, and create deploy-config-oss-openssl.
> >>>
> >>> So how do we deploy this data? Having it well managed in git is
> >>> great, but how do we access it when building things with efsdeploy,
> >>> and where does it get copied/cached?
> >>>
> >>> Let's start with the generic repo first.
> >>> Just as we use efs/deploy-site/current to abstract the site-specific
> >>> config information, I think we can do the following:
> >>>
> >>>     deploy-config.git => /efs/dist/efs/deploy-config/current
> >>>
> >>> The metaproj- and project-specific ones would then map to:
> >>>
> >>>     deploy-config-$metaproj.git =>
> >>>         /efs/dist/$metaproj/deploy-config-$metaproj/current
> >>>     deploy-config-$metaproj-$project.git =>
> >>>         /efs/dist/$metaproj/deploy-config-$metaproj-$project/current
> >>>
> >>> This would allow us to publish, probably date-based, any of these
> >>> repositories with the "latest" set of efsdeploy build rules.
> >>> Note that the default rules go into the efs metaproj, obviously, but
> >>> we can still have a "deploy-config-efs.git" repo if we want, with no
> >>> conflict.
> >>>
> >>> It is very straightforward to code a solution that allows us to
> >>> automate keeping the local copies of these rules up to date as they
> >>> change. I will almost certainly have a first pass at this within the
> >>> next month. However, what is NOT clear is just how to use this
> >>> information in efsdeploy when building a release.
> >>>
> >>> Reproducibility concerns me. The rules are going to evolve, and when
> >>> we make gnu/gcc rule changes to build, say, 4.7.0, we don't want to
> >>> break builds of 4.4.6, and yet *testing* that is extremely expensive.
> >>> For that reason, I think the contents of the efsdeploy directory
> >>> should be CACHED in the release, rather than read from these projects
> >>> during the build. Just as we are going to provide generic dependency
> >>> specs (see email from 30 minutes ago), and expand those into
> >>> specific releasealiases to be used for the duration of the build, I
> >>> think we should do the same for the project-specific build rules, or
> >>> at least make it optional.
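The repo-to-path mapping described earlier in this message can be
sketched as a small shell function. This is only an illustration: the
function name config_dist_path is made up, and it assumes metaproj
names never contain a hyphen (otherwise a name like
deploy-config-a-b-c cannot be split unambiguously).

```shell
#!/bin/sh
# Hypothetical helper: map a deploy-config repo name to the directory it
# would be published under, following the three patterns in the thread.
# Assumption: metaproj names contain no "-".
config_dist_path() {
    repo=${1%.git}                      # accept names with or without .git
    case $repo in
        deploy-config)
            # The generic repo lives in the efs metaproj.
            printf '%s\n' "/efs/dist/efs/deploy-config/current"
            ;;
        deploy-config-*)
            rest=${repo#deploy-config-}
            metaproj=${rest%%-*}        # text up to the first "-", if any
            [ -n "$metaproj" ] || return 1
            printf '%s\n' "/efs/dist/$metaproj/$repo/current"
            ;;
        *)
            echo "not a deploy-config repo: $1" >&2
            return 1
            ;;
    esac
}
```

With this, config_dist_path deploy-config-gnu-gcc.git prints
/efs/dist/gnu/deploy-config-gnu-gcc/current. A date-based release would
replace the trailing "current" with the release name.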
> >>>
> >>> In theory, if we just have efsdeploy search for these rules the same
> >>> way it searches for system-specific (i.e., gnu, perl5, etc.) rules, and
> >>> then site-specific rules, then I could actually build EVERYTHING I
> >>> have in /efs/dist with EMPTY source directories!! If a project is
> >>> supported by one of these repos, then you can build a new release with
> >>> nothing more than:
> >>>
> >>>     efs create project ...
> >>>     efs create release ...
> >>>     cd ..../src
> >>>     efsdeploy down:up
> >>>
> >>> The src directory would contain NOTHING but the
> >>> changes you had to make (hooks, configs, whatever) to get the release
> >>> to build. Those changes should then be re-integrated with the git
> >>> repo in a controlled fashion, so that the next person building that
> >>> MPR has no pain. The specific workflow for how a new change gets
> >>> rolled into the published git repos will need to be worked out, but I
> >>> think that will be straightforward.
> >>>
> >>> Now, obviously, in order to *develop* changes to the rules, we'll need
> >>> a simple means of overriding the path to these published rules.
> >>> Maybe you want to install the latest set of gnu/gcc rules, but not
> >>> make them current until you've actually done a test-build of the
> >>> releases you care about. Maybe something in efsdeploy.conf (which
> >>> will now be a site/release-specific file, by definition) like this.
> >>> Say we wanted to test out some local changes right from the source
> >>> tree (I've been doing this with symlinks for now):
> >>>
> >>>     [rules]
> >>>     $metaproj/$project = /home/efsops/dev/efs/deploy-config-gnu-gcc
> >>>
> >>> or, perhaps, if we use date-based releases, you could install the
> >>> latest update into /efs/dist, and test it out this way:
> >>>
> >>>     [rules]
> >>>     $metaproj/$project = /efs/dist/gnu/deploy-config-gnu-gcc/20111230
> >>>
> >>> Alternately, you could just rsync the efsdeploy directory right into a
> >>> release, and work with a copy.
> >>>
> >>> OK, that's enough of Phil's rantings for one day. Not that anyone's
> >>> paying attention, but you will see commits that implement many of
> >>> these features over the next few weeks.
> _______________________________________________
> EFS-dev mailing list
> [email protected]
> http://mailman.openefs.org/mailman/listinfo/efs-dev
