FYI.

- All of Lucene's SVN, incremental deltas, uncompressed: 5.0G
- the above, tar.bz2: 1.2G

Sadly, I didn't succeed at recreating a local SVN repo from those
incremental dumps. svnadmin load fails with a cryptic error related to
the fact that the revision numbers in node-copy operations refer to
the original SVN revision numbers, while revisions are apparently
renumbered on import. svnadmin isn't smart enough to keep a mapping to
those original numbers, and svndumpfilter can't work with incremental
dump files... The seemingly trivial task of splitting a repo on a clean
boundary turns out to be incredibly hard with SVN...
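
For anyone who wants to reproduce the dumps: the procedure outlined in
the quoted message below (svn log, extract the revision numbers, then
svnrdump each revision in incremental mode) might look roughly like the
Python sketch here. It is an illustration only, not the script that was
actually used; the function names and the one-file-per-revision layout
are assumptions.

```python
import os
import re
import subprocess

REPO_URL = "https://svn.apache.org/repos/asf/lucene"

# Header line of each `svn log` entry, e.g. "r1240618 | gsingers | ..."
REVISION_HEADER = re.compile(r"^r(\d+) \|", re.MULTILINE)


def parse_revisions(svn_log_output):
    """Return the revision numbers found in `svn log` output, oldest first."""
    return sorted({int(rev) for rev in REVISION_HEADER.findall(svn_log_output)})


def dump_command(url, revision):
    """Build the svnrdump invocation for a single incremental revision dump."""
    return ["svnrdump", "dump", "--incremental", "-r", str(revision), url]


def dump_all(url, out_dir):
    """Dump every revision touching `url` into out_dir/r<REV>.dump (assumed layout)."""
    log = subprocess.run(
        ["svn", "log", "-q", url],
        check=True, capture_output=True, text=True,
    ).stdout
    os.makedirs(out_dir, exist_ok=True)
    for rev in parse_revisions(log):
        path = os.path.join(out_dir, f"r{rev}.dump")
        with open(path, "wb") as out:
            subprocess.run(dump_command(url, rev), check=True, stdout=out)
```

Calling dump_all(REPO_URL, "dumps") would start the (multi-hour) run
against the remote repository.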

If anybody wishes to play with the dump files, here they are:
http://goo.gl/m6q3J8

Dawid

On Tue, Dec 8, 2015 at 10:49 PM, Upayavira <u...@odoko.co.uk> wrote:
> You can't avoid having the history in SVN. The ASF has one large repo, and
> won't be deleting that repo, so the history will survive in perpetuity,
> regardless of what we do now.
>
> Upayavira
>
> On Tue, Dec 8, 2015, at 09:24 PM, Doug Turnbull wrote:
>
> It seems you'd want to preserve that history in a frozen/archived Apache SVN
> repo for Lucene. Then make the new git repo slimmer before switching. Folks
> who want very old versions or are doing research can at least go through the
> original SVN repo.
>
> On Tuesday, December 8, 2015, Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> One more thing, perhaps of importance: the raw Lucene repo contains
> all the history of projects that later became top-level (Nutch,
> Mahout). These could also be dropped (or ignored) when converting to
> git. If we agree JARs are not relevant, why should projects not
> directly related to Lucene/Solr be?
>
> Dawid
>
> On Tue, Dec 8, 2015 at 10:05 PM, Dawid Weiss <dawid.we...@gmail.com> wrote:
>>> Don’t know how much we have of historic jars in our history.
>>
>> I actually do know. Or will know. In about 10 hours. I wrote a script
>> that does the following:
>>
>> 1) svn log all revisions touching https://svn.apache.org/repos/asf/lucene
>> 2) grep revision numbers
>> 3) use svnrdump to get every single commit (revision) above, in
>> incremental mode.
>>
>> This will allow me to:
>>
>> 1) recreate only the Lucene/Solr SVN, locally.
>> 2) measure the size of the SVN repo.
>> 3) measure the size of any conversion to git (even if it's a one-by-one
>> checkout, then sync with git).
>>
>> From what I see so far, size should not be an issue at all. Even
>> with all binary blobs, the SVN incremental dumps measure ~3.7G so far
>> (and I'm about 75% done). There is one interesting super-large commit,
>> this one:
>>
>> svn log -r1240618 https://svn.apache.org/repos/asf/lucene
>> ------------------------------------------------------------------------
>> r1240618 | gsingers | 2012-02-04 22:45:17 +0100 (Sat, 04 Feb 2012) | 1 line
>>
>> LUCENE-2748: bring in old Lucene docs
>>
>> This commit's diff weighs... wait for it... 1.3G! I didn't check what
>> it actually was.
>>
>> Will keep you posted.
>>
>> D.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
>
>
> --
> Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC |
> 240.476.9983
> Author:Relevant Search
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>
>
