Thanks Javi, I am aware of GitHub's LFS. It is a good idea, but I am not sure how well it would work for a project with a community as large as ctakes. Requiring the LFS tool seems like an inhibitor to easy adoption, which is not where we want to go.

Just to be clear, I am not talking about a migration to github. The desire is to have a mirror of the svn repo on github. The last I spoke with apache infra on this, the lfs was not a viable solution to the problem because it didn't fit into the mirroring technique. The details on that were all behind a door that I never opened, so that is where my knowledge of the matter ends.

Thanks, and keep the ideas rolling,
Sean

________________________________________
From: Javi Roman <jroman.espi...@gmail.com>
Sent: Monday, May 3, 2021 4:13 AM
To: dev@ctakes.apache.org
Subject: Re: svn or github [EXTERNAL]

* External Email - Caution *

A way to properly work with large files and GitHub is to use the Git Large File Storage (LFS) plugin created by GitHub.

The following is a session using this feature:

$ ctakes-testbed.git(main)]$ find -size +50M
./rest-api/healthnlp-examples/ctakes-temporal-demo/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
./rest-api/healthnlp-examples/ctakes-temporal-demo/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx.script
./rest-api/healthnlp-examples/ctakes-web-client/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
./rest-api/healthnlp-examples/ctakes-web-client/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx.script

$ git lfs track ctakessnorx.script
$ git lfs track ctakessnorx.script
$ git lfs track
Listing tracked patterns
    ctakessnorx.script (.gitattributes)
    ctakessnorx.script (.gitattributes)

$ git add .gitattributes
$ git add .
$ git commit -m "....."
$ git push origin main

That enables the git version control system to track huge binary blobs. It does so by creating a text-based reference to the blob, then tracking and storing the blob in a location external to the git repository itself, in this case hosted by GitHub.

This is just an idea.

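As a rough illustration of the "text-based reference" described above, the sketch below prints pointer text in the layout defined by the Git LFS v1 pointer format (a version line, a SHA-256 object id, and the size in bytes). The function name `lfs_pointer` is hypothetical and is not part of git-lfs; the sketch also assumes GNU coreutils (`sha256sum`, `wc`) are available. The real client generates this pointer itself when a tracked file is added, so this is only meant to show what ends up in the git repository in place of the blob.

```shell
# Sketch only: print the pointer text that stands in for a large file.
# `lfs_pointer` is a hypothetical helper name, not a git-lfs command.
# Assumes GNU coreutils (sha256sum, wc) are on the PATH.
lfs_pointer() {
    local file="$1"
    local oid size

    # The object id is the SHA-256 digest of the file contents.
    oid=$(sha256sum "$file" | cut -d' ' -f1)

    # The size is the file length in bytes.
    size=$(wc -c < "$file" | tr -d ' ')

    # Three lines: spec version, object id, size.
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}
```

Calling `lfs_pointer ctakessnorx.script` in the directory holding that file would print three short lines, which is all that git itself would version; the blob lives on the LFS server. Separately, each `git lfs track <pattern>` call in the session records a line of the form `<pattern> filter=lfs diff=lfs merge=lfs -text` in `.gitattributes`, which is why that file is added before the commit.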
--
Javi Roman

Twitter: @javiromanrh
GitHub: github.com/javiroman
Linkedin: es.linkedin.com/in/javiroman
Big Data Blog: dataintensive.info

On Tue, Apr 27, 2021 at 8:16 PM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Javi,
>
> https://github.com/apache/ctakes is / was an attempt at a mirror of the svn trunk repository. There is nothing more complicated than that.
>
> Sean
>
> ________________________________________
> From: Javi Roman <jroman.espi...@gmail.com>
> Sent: Tuesday, April 27, 2021 12:58 PM
> To: dev@ctakes.apache.org
> Subject: Re: svn or github [EXTERNAL]
>
> Many thanks Sean.
>
> Any documentation about the repositories organization in Subversion? If I understand correctly, the mirror in Github is only the trunk folder in Subversion.
>
> On Tue, Apr 27, 2021 at 5:15 PM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Javi,
> >
> > I too would like to get more developers and activity with source available on Github. Hopefully you can help us do it.
> >
> > One problem that we had in the past concerning use of Github is caused by large machine learning models in ctakes. Github has file size limits for repositories and some of our models surpassed these limits, which caused a corruption of the original migration attempt and errors with subsequent auto-merge checkins. ctakes had to be removed from the svn : github "mirroring".
> >
> > While large files (models, etc.) can be hosted as "release" binaries in github, modifying ctakes' github use in such a way breaks mirroring that would keep both the apache svn and github repositories synchronized.
> > Removing the large model files from the svn area could require further customization of not only that layout but also getting things published in maven central.
> >
> > There might be a simple way to reorganize files, simply maintain version control on large files, keep repository mirroring and publication automated and document the whole paradigm so that a community can support it. Unfortunately, when this topic was last visited nobody authored or implemented such a solution.
> >
> > It has been many years since this topic was discussed, maybe some fresh perspectives or modernizations can get ctakes on github.
> >
> > Thanks,
> > Sean
> >
> > ________________________________________
> > From: Javi Roman <jroman.espi...@gmail.com>
> > Sent: Tuesday, April 27, 2021 10:26 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: svn or github [EXTERNAL]
> >
> > I've just seen the development is based on subversion.
> >
> > It looks like some movement for migrating the subversion to GitHub (most of ASF projects migrated to github) in this issue [1], however the issue was created at 19/Nov/17 (it's in progress) and there aren't updates.
> >
> > I would like to open this discussion (fully migration to git) in order to get more developers and activity with an easier interface.
> >
> > Many thanks.
> >
> > [1] https://issues.apache.org/jira/browse/CTAKES-482
> >
> > On Tue, Apr 27, 2021 at 3:53 PM Javi Roman <jroman.espi...@gmail.com> wrote:
> >
> > > Hi community!
> > >
> > > Is cTakes development currently done in github or subversion?