On Tue, Oct 27, 2020 at 12:17 PM Christopher Schultz <ch...@christopherschultz.net> wrote:
> Konstantin,
>
> On 10/26/20 20:47, Konstantin Kolinko wrote:
> > On Fri, 2 Oct 2020 at 00:09, Mark Thomas <ma...@apache.org> wrote:
> >>
> >> Hi all,
> >>
> >> The topic of migrating the website from svn to git came up at the BoF
> >> session at the end of the Tomcat track. There were strong opinions both
> >> for migrating and for sticking with svn.
> >>
> >> As a middle ground, I'd like to propose that we ask Infra to create a git
> >> mirror of the svn repo.
> >>
> >> For those who favour git:
> >> The git mirror would be read-only, but it would be possible to:
> >> - clone the git mirror
> >> - make changes in git
> >> - use git-svn to commit those changes back to svn
> >> - then the mirror automatically replicates them back to git
> >>
> >> For those who favour svn there would be no change.
> >>
> >> If there is agreement on this approach, I volunteer to contact Infra to
> >> get it set up.
> >
> > My proposal at the BoF was for a partial mirror.
> >
> > The issues are:
> >
> > 1. I think that this mirror is intended as a tool to collect feedback
> > and patches from random people, and to lower the barriers to contribution.
> >
> > 2. The full Tomcat site is large. It includes documentation for all
> > versions of Tomcat, including javadocs. Those pages are rarely changed
> > and are not needed by people who contribute small changes to the
> > site. The source code for those pages is elsewhere.
>
> The question I have to ask here is: why do we bother putting all those
> files in revision control? The user guides for 4 different versions of
> Tomcat are not a problem, but the javadocs are just stupid to store.
>
> Is there some policy we are following by having all those files in
> there? Or is it just to make sure that website "publication" is as
> simple as "svn checkout"?
>
> > 3. Subversion has easy commands to cope with such large source trees.
> > This feature is called "sparse checkouts".
> >
> > For our site the necessary commands are documented in README.txt.
> > Essentially, it is done with the --depth and --set-depth arguments to
> > the "svn checkout" and "svn update" commands.
> >
> > Speaking of Git, there are huge repositories [1] out there, but I
> > think that the majority of people are not accustomed to them.
> >
> > [1] https://en.wikipedia.org/wiki/Monorepo
> >
> > I see that Git developers recently did some work to make dealing with
> > such repositories simpler, with the addition of the "git sparse-checkout"
> > command in Git 2.25.0 [2], released in January 2020.
> >
> > [2] https://github.com/git/git/blob/v2.25.0/Documentation/RelNotes/2.25.0.txt
> >
> > However, I think that tool support is still lacking; e.g. it is missing
> > in TortoiseGit. [3]
> >
> > [3] https://gitlab.com/tortoisegit/tortoisegit/issues/1599
> >
> >
> > If we go with a full Git mirror or with migration to Git, then I think
> > that somebody has to prepare an update to README.txt.
> >
> > If we go with a partial Git mirror, I think it could be named
> > "tomcat-site-dev", reserving the name "tomcat-site" for a full mirror
> > if we ever make one.
> >
> >
> > Ignored paths for git-svn are configured with the "--ignore-paths"
> > argument or with the "svn-remote.<name>.ignore-paths" configuration
> > option. [4]
> >
> > [4] https://git-scm.com/docs/git-svn
> >
> >
> > Other notes:
> >
> > 4. Release managers use Subversion to publish the binaries.
> >
> > Thus I expect that they are able to update the published documentation
> > with Subversion as well.
> >
> > 5. Publishing the javadocs generates small changes over a large number
> > of files. The script that generates the commit email notes that the
> > diff is huge and trims it all to a small summary.
> >
> > If we ever migrate to Git, I wonder whether a similar script for Git
> > would be able to cope with it.
>
> We might also want to consider complicating the website-building process
> in order to simplify the repository.
> Yes, "disk space is cheap", but it's kind of ridiculous that we have
> all that derivative content in RCS, separate from its canonical source.
>

That makes a lot of sense to me. I'm sure that the whole process can be
scripted, including the script that Konstantin mentioned in his item 5.

I also wonder if Git LFS (Large File Storage) [1][2] would solve the issue
of repo size here.

From [1]: "Git Large File Storage (LFS) replaces large files such as audio
samples, videos, datasets, and graphics with text pointers inside Git,
while storing the file contents on a remote server".

It allows you to set file patterns, e.g. "*.jpg" or "/javadoc/*", for
files whose contents should be stored on the LFS server rather than in the
Git repository itself (Git keeps only small pointer files for them), and
which, with the appropriate settings, need not be downloaded from the
server unless requested.

Igal

[1] https://git-lfs.github.com/
[2] https://www.atlassian.com/git/tutorials/git-lfs

>
> -chris
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: dev-h...@tomcat.apache.org
>
>
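To make Konstantin's partial-mirror idea a little more concrete, here is a minimal Python sketch for previewing which paths a git-svn "--ignore-paths" regex would skip, before anyone asks Infra to configure it. The pattern IGNORE_PATHS, the helper split_paths and the sample paths are all hypothetical; they are not the real tomcat-site layout. git-svn evaluates the option as a Perl regular expression, and as far as I can tell it is an unanchored match (which is why the examples in the git-svn documentation start with "^"); Python's re.search only approximates that for simple patterns like this one. Which path prefix git-svn actually matches against depends on how the svn-remote is set up, so check the git-svn documentation before relying on a pattern.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern: skip the generated javadoc trees under docs/.
# The explicit "^" anchors it, because the match itself is unanchored
# (re.search below mimics that behaviour).
IGNORE_PATHS = r"^docs/tomcat-[0-9.]+-doc/api/"


def split_paths(paths: Iterable[str], pattern: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (kept, ignored) for repository-relative paths."""
    regex = re.compile(pattern)
    kept: List[str] = []
    ignored: List[str] = []
    for path in paths:
        # A path is ignored if the regex matches anywhere in it.
        (ignored if regex.search(path) else kept).append(path)
    return kept, ignored


if __name__ == "__main__":
    # Hypothetical sample paths, for illustration only.
    sample = [
        "xdocs/index.xml",
        "docs/index.html",
        "docs/tomcat-9.0-doc/index.html",
        "docs/tomcat-9.0-doc/api/index.html",
        "docs/tomcat-8.5-doc/api/org/apache/catalina/Context.html",
    ]
    kept, ignored = split_paths(sample, IGNORE_PATHS)
    print("kept:", kept)
    print("ignored:", ignored)
```

Running it lists the two "api/" paths as ignored and the other three as kept. Once a pattern looks right, the same string could be passed to "git svn clone --ignore-paths=..." or stored under the "svn-remote.svn.ignore-paths" configuration key ("svn" being git-svn's default remote name).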