On Tue, Oct 27, 2020 at 12:17 PM Christopher Schultz <ch...@christopherschultz.net> wrote:

> Konstantin,
>
> On 10/26/20 20:47, Konstantin Kolinko wrote:
> > On Fri, 2 Oct 2020 at 00:09, Mark Thomas <ma...@apache.org> wrote:
> >>
> >> Hi all,
> >>
> >> The topic came up at the BoF session at the end of the Tomcat track of
> >> migrating the website from svn to git. There were strong opinions both
> >> for migrating and for sticking with svn.
> >>
> >> As a middle ground I'd like to propose we ask Infra to create a git
> >> mirror of the svn repo.
> >>
> >> For those who favour git:
> >> The git mirror would be read-only but it would be possible to:
> >> - clone the git mirror
> >> - make changes in git
> >> - use git-svn to commit those changes back to svn
> >> - then the mirror automatically replicates them back to git
> >>
> >> For those who favour svn there would be no change.
> >>
> >> If there is agreement on this approach, I volunteer to contact infra to
> >> get it set up.
> >
> > My proposal at BoF was for a partial mirror.
> >
> > The issue is that
> >
> > 1. I think that this mirror is intended as a tool to collect feedback
> > / patches from random people, and to lower the barriers to contribution.
> >
> > 2. The full Tomcat site is large. It includes documentation for all
> > versions of Tomcat, including javadocs. Those pages are changed rarely
> > and are not needed for people who contribute small changes for the
> > site. The source code for those pages is elsewhere.
>
> The question I have to ask here is: why do we bother putting all those
> files in revision control? The user's guide for four different versions of
> Tomcat is not a problem, but the javadocs are just stupid to store.
>
> Is there some policy we are following by having all those files in
> there? Or is it just to make sure that website "publication" is as
> simple as "svn checkout"?
>
> > 3. Subversion has easy commands to cope with such large source trees.
> > This feature is called "sparse checkouts".
> >
> > For our site the necessary commands are documented in README.txt.
> > Essentially, it is done with the --depth and --set-depth arguments to the
> > "svn checkout" and "svn update" commands.
> >
> > Speaking about Git, there are huge repositories [1] out there, but I
> > think that the majority of people are not accustomed to them.
> >
> > [1] https://en.wikipedia.org/wiki/Monorepo
> >
> > I see that Git developers recently did some work to make dealing with
> > such repositories simpler, with the addition of the "git sparse-checkout"
> > command in Git 2.25.0 [2], released in January 2020.
> >
> > [2] https://github.com/git/git/blob/v2.25.0/Documentation/RelNotes/2.25.0.txt
> >
> > However, I think that tool support is still lacking; for example, it is
> > missing in TortoiseGit. [3]
> >
> > [3] https://gitlab.com/tortoisegit/tortoisegit/issues/1599
> >
> >
> > If we go with a full Git mirror or with migration to Git, then I think
> > that somebody has to prepare an update to README.txt.
> >
> > If we go with a partial Git mirror, I think it could be named
> > "tomcat-site-dev", reserving the name "tomcat-site" for a full mirror
> > if we ever make one.
> >
> >
> > Ignored paths for git-svn are configured with "--ignore-paths"
> > argument or with "svn-remote.<name>.ignore-paths" configuration
> > option. [4]
> >
> > [4] https://git-scm.com/docs/git-svn
> >
> >
> > Other notes:
> >
> > 4. Release managers use Subversion to publish the binaries.
> >
> > Thus I expect that they are able to update the published documentation
> > with Subversion as well.
> >
> > 5. Publishing the javadocs generates small changes over a large number
> > of files. The script that generates the commit email notes that the
> > diff is huge and trims it all to a small summary.
> >
> > If we ever migrate to Git, I wonder whether a similar script for Git
> > would be able to cope with it.
>
> We might also want to consider complicating the website-building process
> in order to simplify the repository. Yes, "disk space is cheap" but it's
> kind of ridiculous that we have all that derivative content in RCS,
> separate from its canonical source.
>

That makes a lot of sense to me.  I'm sure that the whole process can be
scripted, including the commit-email script that Konstantin mentioned in
his item 5.

I also wonder if Git LFS (Large File Storage) [1][2] would solve the issue
of repo size here.

From [1]: "Git Large File Storage (LFS) replaces large files such as audio
samples, videos, datasets, and graphics with text pointers inside Git,
while storing the file contents on a remote server".  It lets you specify
file patterns, e.g. "*.jpg" or "/javadoc/*", for files whose contents
should be kept out of the Git history and only downloaded from the server
when they are actually needed.
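To make the "text pointers" idea concrete, here is a minimal Python sketch
of what Git would store in place of a large file, assuming the pointer
layout from the Git LFS specification (a version line, then the SHA-256
oid and the size in bytes). The function name lfs_pointer is hypothetical
and not part of any Git LFS tool; the real client also handles uploading
the contents to the LFS server, which is not shown here.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the small text pointer that Git LFS commits in place of the
    real file contents (sketch based on the v1 pointer format)."""
    # The object id is the SHA-256 hash of the full file contents.
    oid = hashlib.sha256(content).hexdigest()
    # "version" comes first; the remaining keys (oid, size) follow in
    # alphabetical order, and every line ends with a Unix newline.
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # A stand-in for a large generated file, such as a javadoc page.
    big_file = b"<html>" + b"x" * 1_000_000 + b"</html>"
    pointer = lfs_pointer(big_file)
    print(pointer, end="")
    print(f"{len(big_file)} bytes of content -> {len(pointer)} byte pointer")
```

The point for the site repository would be that each changed javadoc file
adds only a few lines of pointer text to the Git history, while the full
contents live on the LFS server.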

Igal

[1] https://git-lfs.github.com/
[2] https://www.atlassian.com/git/tutorials/git-lfs



>
> -chris
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: dev-h...@tomcat.apache.org
>
>
