Thanks Javi,

I am aware of GitHub's LFS.  It is a good idea, but I am not sure how well it
would work for a project with a community as large as ctakes.  Requiring the
LFS tool seems like a barrier to easy adoption, which is not where we want to
go.

Just to be clear, I am not talking about a migration to GitHub.  The desire is
to have a mirror of the svn repo on GitHub.  The last I spoke with Apache Infra
on this, LFS was not a viable solution to the problem because it didn't fit
into the mirroring technique.  The details on that were all behind a door that
I never opened, so that is where my knowledge of the matter ends.

Thanks, and keep the ideas rolling,

Sean
________________________________________
From: Javi Roman <jroman.espi...@gmail.com>
Sent: Monday, May 3, 2021 4:13 AM
To: dev@ctakes.apache.org
Subject: Re: svn or github [EXTERNAL]

* External Email - Caution *


A way to properly work with large files and GitHub is to use the Git Large
File Storage (LFS) plugin created by GitHub. The following is a session
using this feature:

$ find . -size +50M
./rest-api/healthnlp-examples/ctakes-temporal-demo/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
./rest-api/healthnlp-examples/ctakes-temporal-demo/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx.script
./rest-api/healthnlp-examples/ctakes-web-client/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
./rest-api/healthnlp-examples/ctakes-web-client/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx.script
$ git lfs track ctakessnorx.script
$ git lfs track ctakessnorx.script
$ git lfs track
Listing tracked patterns
    ctakessnorx.script (.gitattributes)
    ctakessnorx.script (.gitattributes)
$ git add .gitattributes
$ git add .
$ git commit -m "....."
$ git push origin main

That enables the git version control system to track huge binary blobs. It
does so by creating a text-based reference to the blob, then tracking and
storing the blob in a location external to the git repository itself, in
this case hosted by GitHub.
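
To make the "text-based reference" concrete: under the Git LFS pointer format,
the file that git itself stores is a short text stub holding the spec version,
the SHA-256 of the blob, and its size in bytes. The following is only a sketch
that prints such a stub for a given file. It assumes GNU coreutils `sha256sum`
is available, and `lfs_pointer` is a hypothetical helper name used for
illustration, not a git-lfs command:

```shell
#!/bin/sh
# Sketch only: print the Git LFS pointer text for a file.
# Assumption: GNU coreutils `sha256sum` is on the PATH.
# `lfs_pointer` is a hypothetical helper, not part of git or git-lfs.
lfs_pointer() {
    file="$1"
    # SHA-256 of the blob contents (first field of sha256sum output)
    oid=$(sha256sum "$file" | cut -d' ' -f1)
    # Blob size in bytes; tr strips padding that some wc builds emit
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}
```

Calling `lfs_pointer ctakessnorx.script` would print the three lines that end
up in the repository in place of the large file; the blob itself is uploaded
to the LFS store on push.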

This is just an idea.
--
Javi Roman

Twitter: @javiromanrh
GitHub: github.com/javiroman
Linkedin: es.linkedin.com/in/javiroman
Big Data Blog: dataintensive.info


On Tue, Apr 27, 2021 at 8:16 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Javi,
>
> https://github.com/apache/ctakes
> is / was an attempt at a mirror of the svn trunk repository.  There is
> nothing more complicated than that.
>
> Sean
>
> ________________________________________
> From: Javi Roman <jroman.espi...@gmail.com>
> Sent: Tuesday, April 27, 2021 12:58 PM
> To: dev@ctakes.apache.org
> Subject: Re: svn or github [EXTERNAL]
>
> Many thanks Sean.
>
> Is there any documentation about how the repositories are organized in
> Subversion? If I understand correctly, the mirror in Github is only the
> trunk folder in Subversion.
>
> --
> Javi Roman
>
> Twitter: @javiromanrh
> GitHub: github.com/javiroman
> Linkedin: es.linkedin.com/in/javiroman
> Big Data Blog: dataintensive.info
>
>
> On Tue, Apr 27, 2021 at 5:15 PM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Javi,
> >
> > I too would like to get more developers and activity with source
> > available on Github.  Hopefully you can help us do it.
> >
> > One problem that we had in the past concerning use of Github is caused by
> > large machine learning models in ctakes.  Github has file size limits for
> > repositories and some of our models surpassed these limits, which caused
> > a corruption of the original migration attempt and errors with subsequent
> > auto-merge checkins.  ctakes had to be removed from the svn : github
> > "mirroring".
> >
> > While large files (models, etc.) can be hosted as "release" binaries in
> > github, modifying ctakes' github use in such a way breaks mirroring that
> > would keep both the apache svn and github repositories synchronized.
> > Removing the large model files from the svn area could require further
> > customization of not only that layout but also getting things published
> > in maven central.
> >
> > There might be a simple way to reorganize files, simply maintain version
> > control on large files, keep repository mirroring and publication
> > automated and document the whole paradigm so that a community can support
> > it.
> > Unfortunately, when this topic was last visited nobody authored or
> > implemented such a solution.
> >
> > It has been many years since this topic was discussed, maybe some fresh
> > perspectives or modernizations can get ctakes on github.
> >
> > Thanks,
> > Sean
> >
> > ________________________________________
> > From: Javi Roman <jroman.espi...@gmail.com>
> > Sent: Tuesday, April 27, 2021 10:26 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: svn or github [EXTERNAL]
> >
> > I've just seen the development is based on subversion.
> >
> > It looks like there was some movement toward migrating from subversion to
> > GitHub (most ASF projects have migrated to GitHub) in this issue [1];
> > however, the issue was created on 19/Nov/17 (it's in progress) and there
> > have been no updates.
> >
> > I would like to open this discussion (full migration to git) in order to
> > get more developers and activity with an easier interface.
> >
> > Many thanks.
> >
> >
> > [1] https://issues.apache.org/jira/browse/CTAKES-482
> > --
> > Javi Roman
> >
> > Twitter: @javiromanrh
> > GitHub: github.com/javiroman
> > Linkedin: es.linkedin.com/in/javiroman
> > Big Data Blog: dataintensive.info
> >
> >
> > On Tue, Apr 27, 2021 at 3:53 PM Javi Roman <jroman.espi...@gmail.com>
> > wrote:
> >
> > > Hi community!
> > >
> > > Is cTAKES development currently done in GitHub or Subversion?
> > >
> > > --
> > > Javi Roman
> > >
> > > Twitter: @javiromanrh
> > > GitHub: github.com/javiroman
> > > Linkedin: es.linkedin.com/in/javiroman
> > > Big Data Blog: dataintensive.info
> > >
> >
>
