I've made some comments about the conversion process here: https://issues.apache.org/jira/browse/LUCENE-6933?focusedCommentId=15064208#comment-15064208
Feel free to try it out.

https://github.com/dweiss/lucene-solr-svn2git

I don't know what the next steps are. This looks like a good starting point for switching all development over to git? The only thing I still plan on doing is getting rid of a few large binary blobs in historical resources, but even without that the size seems acceptable (~200 MB).

Dawid

On Thu, Dec 17, 2015 at 9:13 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> The question I had (I am sure a very dumb one): WHY do we care about history
> preserved perfectly in Git?
>
> For me it's for sentimental, archival and task-challenge reasons. Robert's
> requirement is that git praise/blame/log works on a given file and
> shows its true history of changes. Everyone has his own reasons I guess. If
> the initial clone is small enough then I see no problem in keeping the
> history if we can preserve it.
>
> Dawid
>
> On Thu, Dec 17, 2015 at 4:52 AM, david.w.smi...@gmail.com <david.w.smi...@gmail.com> wrote:
>
>> +1 totally agree. Anyway, the bloat should largely be the binaries &
>> unrelated projects, not code (small text files).
>>
>> On Wed, Dec 16, 2015 at 10:36 PM Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
>>
>>> In defense of more history immediately available--it is often far more
>>> useful to poke around code history/run blame to figure out some code than
>>> by taking it at face value. Putting this in a secondary place like
>>> Apache SVN repo IMO reduces the readability of the code itself. This is
>>> doubly true for new developers that won't know about Apache's SVN. And
>>> Lucene can be quite intricate code. Further in my own work poking around in
>>> github mirrors I frequently hit the current cutoff. Which is one reason I
>>> stopped using them for anything but the casual investigation.
>>>
>>> I'm not totally against a cutoff point, but I'd advocate for exhausting
>>> other options first, such as trimming out unrelated projects, binaries, etc.
>>>
>>> -Doug
>>>
>>> On Wednesday, December 16, 2015, Shawn Heisey <apa...@elyograg.org> wrote:
>>>
>>>> On 12/16/2015 5:53 PM, Alexandre Rafalovitch wrote:
>>>> > On 16 December 2015 at 00:44, Dawid Weiss <dawid.we...@gmail.com> wrote:
>>>> >> 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
>>>> >> locally (including empty interim commits to cater for svn:mergeinfos) is 4G.
>>>> >> If you strip the stuff like javadocs and side projects (Nutch, Tika, Mahout)
>>>> >> then I bet the entire history can fit in 1G total. Of course stripping JARs
>>>> >> is also doable.
>>>> > I think this answered one of the issues. So, this is not something to focus on.
>>>> >
>>>> > The question I had (I am sure a very dumb one): WHY do we care about
>>>> > history preserved perfectly in Git? Because that seems to be the real
>>>> > bottleneck now. Does anybody still check out an intermediate commit
>>>> > in Solr 1.4 branch?
>>>>
>>>> I do not think we need every bit of history -- at least in the primary
>>>> read/write repository. I wonder how much of a size difference there
>>>> would be between tossing all history before 5.0 and tossing all history
>>>> before the ivy transition was completed.
>>>>
>>>> In the interests of reducing the size and download time of a clone
>>>> operation, I definitely think we should trim history in the main repo to
>>>> some arbitrary point, as long as the full history is available
>>>> elsewhere. It's my understanding that it will remain in svn.apache.org
>>>> (possibly forever), and I think we could also create "historical"
>>>> read-only git repos.
>>>>
>>>> Almost every time I am working on the code, I only care about the stable
>>>> branch and trunk.
>>>> Sometimes I will check out an older 4.x tag so I can
>>>> see the exact code referenced by a stacktrace in a user's error message,
>>>> but when this is required, I am willing to go to an entirely different
>>>> repository and chew up bandwidth/disk resources to obtain it, and I do
>>>> not care whether it is git or svn. As time marches on, fewer people
>>>> will have reasons to look at the historical record.
>>>>
>>>> Thanks,
>>>> Shawn