If Dawid is volunteering to sort out this mess, +1 to letting him make it a move to git. I don't care if we disagree about JARs; I trust he will do a good job, and that is more important.

On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> It's not true that nobody is working on this. I have been working on the SVN
> dump in the meantime. You would not believe how incredibly complex the
> process of processing that (remote) dump is. Let me highlight a few key
> issues:
>
> 1) There is no "one" Lucene SVN repository that can be transferred to git.
> The history is a mess. Trunk, branches, tags -- all change paths at various
> points in history. Entire projects are copied from *outside* the official
> Lucene ASF path (when Solr, Nutch or Tika moved from the incubator, for
> example).
>
> 2) The history of commits to Lucene's subpath of the SVN is ~50k commits.
> ASF's commit history in which those 50k commits live is 1.8 *million*
> commits. I think the git-svn sync crashes due to the sheer number of (empty)
> commits in between actual changes.
>
> 3) There are a few commits that are gigantic. I mentioned Grant's 1.2G
> patch, for example, but there are others (the second largest is 190 megs,
> the third is 136 megs).
>
> 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
> locally (including empty interim commits to cater for svn:mergeinfos) is 4G.
> If you strip the stuff like javadocs and side projects (Nutch, Tika, Mahout)
> then I bet the entire history can fit in 1G total. Of course stripping JARs
> is also doable.
>
> 5) There is lots of junk at the main SVN path so you can't just version the
> top-level folder. If you wanted to checkout /asf/lucene then the size of the
> resulting folder is enormous -- I terminated the checkout after I reached
> over 20 gigs. Well, technically you *could* do it, it'd preserve perfect
> history, but I wouldn't want to git co a past version that checks out all
> the tags, branches, etc. This has to be mapped in a sensible way.
>
> What I think is that all the above makes (straightforward) conversion to git
> problematic. Especially moving paths are a problem -- how to mark
> tags/branches, where the main line of development is, etc. This conversion
> would have to be guided and hand-tuned to make sense. This effort would only
> pay for itself if we move to git, otherwise I don't see the benefit. Paul's
> script is fine for keeping short-term history.
>
> Dawid
>
> P.S. Either the SVN repo at Apache is broken or the SVN is broken, which
> makes processing SVN history even more fun. This dump indicates Tika being
> moved from the incubator to Lucene:
>
> svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/ > out
>
> But when you dump just Lucene's subpath, the output is broken (last
> changeset in the file is an invalid changeset, it carries no target):
>
> svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/lucene > out
>
> On Tue, Dec 15, 2015 at 6:04 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>>
>> If we move to git, stripping out jars seems to be an independent decision?
>> Can you even strip out jars and preserve history (i.e. not change
>> hashes and invalidate everyone's forks/clones)?
>> I did run across this:
>>
>> http://stackoverflow.com/questions/17470780/is-it-possible-to-slim-a-git-repository-without-rewriting-history
>>
>> -Yonik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
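
On Yonik's question in the thread above (can JARs be stripped while keeping hashes stable?), here is a rough sketch of why the answer is generally no. Git names each object by a SHA-1 over its content, and a commit's id covers both its tree and its parent ids, so removing a file from one old commit changes the id of every commit after it. The blob header rule (`blob <size>\0<content>`) is git's real one; the helper names (`git_object_id`, `commit_id`, `build_history`), the file names, and the simplified tree and commit layouts are assumptions made up for illustration, so the ids printed here will not match what real git would produce for trees and commits.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> str:
    """SHA-1 over '<kind> <size>' + NUL + payload, as git does for objects."""
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


def commit_id(tree_id: str, parent_ids: list[str], message: str) -> str:
    # Simplified commit body (assumption): real commits also carry
    # author/committer lines with timestamps.
    lines = [f"tree {tree_id}"] + [f"parent {p}" for p in parent_ids]
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_object_id("commit", body.encode())


def build_history(snapshots: list[dict[str, bytes]]) -> list[str]:
    """Return commit ids for a linear history of {path: content} snapshots."""
    ids: list[str] = []
    for n, files in enumerate(snapshots):
        # Simplified tree (assumption): sorted 'path blob-id' text lines.
        # Real git trees are a binary format, but the dependency is the same.
        entries = [
            f"{path} {git_object_id('blob', data)}\n"
            for path, data in sorted(files.items())
        ]
        tree = git_object_id("tree", "".join(entries).encode())
        parents = ids[-1:]  # empty for the root commit, else the previous id
        ids.append(commit_id(tree, parents, f"commit {n}"))
    return ids


# Hypothetical three-commit history: a JAR is added in commit 0 and
# removed again in commit 1.
original = [
    {"src/A.java": b"class A {}", "lib/dep.jar": b"PK\x03\x04 fake jar bytes"},
    {"src/A.java": b"class A {}"},
    {"src/A.java": b"class A { int x; }"},
]

# The same history with every *.jar stripped from every snapshot.
stripped = [
    {path: data for path, data in snap.items() if not path.endswith(".jar")}
    for snap in original
]

orig_ids = build_history(original)
stripped_ids = build_history(stripped)

for n, (a, b) in enumerate(zip(orig_ids, stripped_ids)):
    print(f"commit {n}: {a[:12]} -> {b[:12]}  changed={a != b}")
```

Commits 1 and 2 have identical file contents in both histories, yet their ids still differ, because each one hashes the id of its (changed) parent. That is the mechanism behind "stripping JARs invalidates everyone's forks/clones": any content change in an old commit ripples forward through every descendant.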