If Dawid is volunteering to sort out this mess, +1 to let him make it
a move to git. I don't care if we disagree about JARs; I trust he will
do a good job, and that is more important.

On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> It's not true that nobody is working on this. I have been working on the SVN
> dump in the meantime. You would not believe how incredibly complex it is to
> process that (remote) dump. Let me highlight a few key issues:
>
> 1) There is no "one" Lucene SVN repository that can be transferred to git.
> The history is a mess. Trunk, branches, tags -- all change paths at various
> points in history. Entire projects are copied from *outside* the official
> Lucene ASF path (when Solr, Nutch or Tika moved from the incubator, for
> example).
>
> 2) The history of commits to Lucene's subpath of the SVN is ~50k commits.
> ASF's commit history in which those 50k commits live is 1.8 *million*
> commits. I think the git-svn sync crashes due to the sheer number of (empty)
> commits in between actual changes.
>
> 3) There are a few commits that are gigantic. I mentioned Grant's 1.2G
> patch, for example, but there are others (the second largest is 190 megs,
> the third is 136 megs).
>
> 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
> locally (including empty interim commits to cater for svn:mergeinfos) is 4G.
> If you strip the stuff like javadocs and side projects (Nutch, Tika, Mahout)
> then I bet the entire history can fit in 1G total. Of course stripping JARs
> is also doable.
>
> 5) There is lots of junk at the main SVN path so you can't just version the
> top-level folder. If you wanted to check out /asf/lucene, the resulting
> folder would be enormous -- I terminated the checkout after it exceeded
> 20 gigs. Well, technically you *could* do it, and it'd preserve perfect
> history, but I wouldn't want to git co a past version that checks out all
> the tags, branches, etc. This has to be mapped in a sensible way.
>
> What I think is that all the above makes (straightforward) conversion to git
> problematic. Moving paths are especially a problem -- how to mark
> tags/branches, where the main line of development is, etc. This conversion
> would have to be guided and hand-tuned to make sense. This effort would only
> pay for itself if we move to git; otherwise I don't see the benefit. Paul's
> script is fine for keeping short-term history.
>
> Dawid
>
> P.S. Either the SVN repo at Apache is broken or the SVN is broken, which
> makes processing SVN history even more fun. This dump indicates Tika being
> moved from the incubator to Lucene:
>
> svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/ > out
>
> But when you dump just Lucene's subpath, the output is broken (the last
> changeset in the file is invalid: it carries no target):
>
> svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/lucene > out
>
>
>
> On Tue, Dec 15, 2015 at 6:04 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>>
>> If we move to git, stripping out jars seems to be an independent decision?
>> Can you even strip out jars and preserve history (i.e. not change
>> hashes and invalidate everyone's forks/clones)?
>> I did run across this:
>>
>> http://stackoverflow.com/questions/17470780/is-it-possible-to-slim-a-git-repository-without-rewriting-history
>>
>> -Yonik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>

