I filed LUCENE-6937 as a parent issue for an SVN->Git migration. I've linked the issue that Dawid is working on, as well as a new issue for converting the build to work correctly in a Git checkout rather than SVN.
- Mark

On Tue, Dec 15, 2015 at 1:26 PM Mark Miller <markrmil...@gmail.com> wrote:

> Let's just make some JIRA issues. I'm not worried about volunteers for any
> of it yet, just a direction we agree upon. Once we know where we are going,
> we generally don't have a big volunteer problem. We haven't heard from Uwe
> yet, but really does seem like moving to Git makes the most sense.
>
> I'm certainly willing to spend some free time on this.
>
> - Mark
>
> On Tue, Dec 15, 2015 at 1:22 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
>
>>
>> Oh, just for completeness -- moving to git is not just about the version
>> management, it's also:
>>
>> 1) all the scripts that currently do validations, etc.
>> 2) what to do with svn:* properties
>> 3) what to do with empty folders (not available in git).
>>
>> I don't volunteer to solve these :)
>>
>> Dawid
>>
>>
>> On Tue, Dec 15, 2015 at 7:09 PM, Dawid Weiss <dawid.we...@gmail.com>
>> wrote:
>>
>>>
>>> Ok, give me some time and I'll see what I can achieve. Now that I
>>> actually wrote an SVN dump parser (validator and serializer) things are
>>> under much better control...
>>>
>>> I'll try to achieve the following:
>>>
>>> 1) selectively drop unnecessary stuff from history (cms/, javadocs/,
>>> JARs and perhaps other binaries),
>>> 2) *preserve* history of all core sources. So svn log IndexWriter has to
>>> go back all the way back to when Doug was young and pretty. Ooops, he's
>>> still pretty of course.
>>> 3) provide a way to link git history with svn revisions. I would,
>>> ideally, include a "imported from svn:rev XXX" in the commit log message.
>>> 4) annotate release tags and branches. I don't care much about interim
>>> branches -- they are not important to me (please speak up if you think
>>> otherwise).
>>>
>>> Dawid
>>>
>>> On Tue, Dec 15, 2015 at 7:03 PM, Robert Muir <rcm...@gmail.com> wrote:
>>>
>>>> If Dawid is volunteering to sort out this mess, +1 to let him make it
>>>> a move to git.
>>>> I don't care if we disagree about JARs, I trust he will
>>>> do a good job and that is more important.
>>>>
>>>> On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss <dawid.we...@gmail.com>
>>>> wrote:
>>>> >
>>>> > It's not true that nobody is working on this. I have been working on
>>>> > the SVN dump in the meantime. You would not believe how incredibly
>>>> > complex the process of processing that (remote) dump is. Let me
>>>> > highlight a few key issues:
>>>> >
>>>> > 1) There is no "one" Lucene SVN repository that can be transferred to
>>>> > git. The history is a mess. Trunk, branches, tags -- all change paths
>>>> > at various points in history. Entire projects are copied from
>>>> > *outside* the official Lucene ASF path (when Solr, Nutch or Tika moved
>>>> > from the incubator, for example).
>>>> >
>>>> > 2) The history of commits to Lucene's subpath of the SVN is ~50k
>>>> > commits. ASF's commit history in which those 50k commits live is 1.8
>>>> > *million* commits. I think the git-svn sync crashes due to the sheer
>>>> > number of (empty) commits in between actual changes.
>>>> >
>>>> > 3) There are a few commits that are gigantic. I mentioned Grant's 1.2G
>>>> > patch, for example, but there are others (the second larger is
>>>> > 190megs, the third is 136 megs).
>>>> >
>>>> > 4) The size of JARs is really not an issue. The entire SVN repo I
>>>> > mirrored locally (including empty interim commits to cater for
>>>> > svn:mergeinfos) is 4G. If you strip the stuff like javadocs and side
>>>> > projects (Nutch, Tika, Mahout) then I bet the entire history can fit
>>>> > in 1G total. Of course stripping JARs is also doable.
>>>> >
>>>> > 5) There is lots of junk at the main SVN path so you can't just
>>>> > version the top-level folder.
>>>> > If you wanted to checkout /asf/lucene then the size of the
>>>> > resulting folder is enormous -- I terminated the checkout after I
>>>> > reached over 20 gigs. Well, technically you *could* do it, it'd
>>>> > preserve perfect history, but I wouldn't want to git co a past version
>>>> > that checks out all the tags, branches, etc. This has to be mapped in
>>>> > a sensible way.
>>>> >
>>>> > What I think is that all the above makes (straightforward) conversion
>>>> > to git problematic. Especially moving paths are a problem -- how to
>>>> > mark tags/branches, where the main line of development is, etc. This
>>>> > conversion would have to be guided and hand-tuned to make sense. This
>>>> > effort would only pay for itself if we move to git, otherwise I don't
>>>> > see the benefit. Paul's script is fine for keeping short-term history.
>>>> >
>>>> > Dawid
>>>> >
>>>> > P.S. Either the SVN repo at Apache is broken or the SVN is broken,
>>>> > which makes processing SVN history even more fun. This dump indicates
>>>> > Tika being moved from the incubator to Lucene:
>>>> >
>>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/ > out
>>>> >
>>>> > But when you dump just Lucene's subpath, the output is broken (last
>>>> > changeset in the file is an invalid changeset, it carries no target):
>>>> >
>>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/lucene > out
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Dec 15, 2015 at 6:04 PM, Yonik Seeley <ysee...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> If we move to git, stripping out jars seems to be an independent
>>>> >> decision?
>>>> >> Can you even strip out jars and preserve history (i.e. not change
>>>> >> hashes and invalidate everyone's forks/clones)?
>>>> >> I did run across this:
>>>> >>
>>>> >> http://stackoverflow.com/questions/17470780/is-it-possible-to-slim-a-git-repository-without-rewriting-history
>>>> >>
>>>> >> -Yonik
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>> >>
>>>> >
>>>>
>>>
>>
> --
> - Mark
> about.me/markrmiller
>
--
- Mark
about.me/markrmiller