Hi Kean,

Thank you for the suggestion and the link.  I am really glad that people are 
interested in this GitHub topic and taking it seriously.  It would be great if 
we could make it happen.

While definitely a possibility, the git LFS paradigm is something that I would 
like to avoid.  

Like keeping our models on SVN, it would also require separating models from 
code into two different repos, e.g. github and bitbucket.  As opposed to 
bitbucket, the apache svn repos are long established, familiar to and supported 
by the apache infrastructure team.  The same goes for the apache foundation use 
of github.  I like being able to lean on the apache infra team for help.

The apache Jenkins servers are linked to the svn repos, making continuous 
integration easy - on the rare occasion when somebody does change something in 
a model repo.  While I expect anybody savvy enough to work on models to also 
have the knowhow and wherewithal to work with a separate svn repo, I don't want 
them to need to get out to jenkins and manually kick off snapshot builds.

Probably most important is the requirement of the client user to have the LFS 
command line client.  I think that there are enough hoops stuck in front of 
getting ctakes installed/checked out/cloned/etc. and it seems to me that one of 
the biggest reasons to use github is to make things easier for absolute newbies 
to just pull down code and experiment.

Keeping the models on a separate svn repo would mean that they aren't checked 
out as code, but would be put in the .m2 maven area when a user runs maven 
compile.  While the total footprint of full ctakes would still be the same 
size, it would essentially make the code directory smaller and initial 
downloads/checkouts would be faster.  Plus, if done properly maybe it could 
"clean up" all of those nearly identically named modules in my intellij project 
window and I'd stop clicking on the wrong one when I've had too much coffee.

The LFS system is great for people who want to work on (in development) large 
files, but given the very lopsided ratio of model reuse vs. 
creation/modification in ctakes I don't think that we need to go that route.

I am only one voice of many, so this is obviously up for debate.  Thanks again,

Sean

________________________________________
From: Kean Kaufmann <k...@recordsone.com.INVALID>
Sent: Monday, June 6, 2022 9:07 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS]

* External Email - Caution *


Is Git LFS an option?
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs
Needs an LFS-aware host e.g. Bitbucket; I don't know what the Apache
hosting setup is like.


On Fri, Jun 3, 2022 at 9:31 AM Finan, Sean
<sean.fi...@childrens.harvard.edu.invalid> wrote:

> Hi Tim,
>
> >we ran into issues in previous attempts at migration with the large file
> sizes in our repo
>
> Indeed we did, and over the years I have had thoughts on that.
>
> Those large files are large ml models, which are (mostly) static,
> replaceable/interchangeable, not always necessary, and in separate resource
> (-res) modules separated from code modules.
>
> When I was a ctakes newbie I really disliked the separation of code from
> resources by entirely separate -res modules.  Since then, through working
> on projects that use ctakes code but not (huge) resources as dependencies,
> I have realized the wisdom of the modular separation.  In fact, I put a
> -huge- model in its own -res module so that I could <exclude> it from a
> ctakes-dependent project, saving compile (download) time and disk space.
> Like you, I don't like to "download the internet" with maven   ;^)
>
> Right now we have the ner dictionaries in sourceforge, not the apache
> repos.  While this is done for legal reasons it has worked pretty well.
>
> I think that we could maintain an apache SVN repo of -res modules
> containing only huge model files.   I am guessing that we would have to
> make it a "side/sub project" to maintain a separate repo (jenkins build,
> etc.).
>
> Anyway, it would give us the freedom to use a github repo for code (and
> non-model resources) without users needing to go through the github
> large-file workflow, which I see as a barrier to entry.
>
> Thoughts?
>
> ________________________________________
> From: Miller, Timothy <timothy.mil...@childrens.harvard.edu.INVALID>
> Sent: Thursday, June 2, 2022 6:21 PM
> To: dev@ctakes.apache.org
> Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]
> [SUSPICIOUS] [SUSPICIOUS]
>
> My recollection was that we ran into issues in previous attempts at
> migration with the large file sizes in our repo.
> Tim
>
>
> On Thu, 2022-06-02 at 20:55 +0000, Finan, Sean wrote:
>
> Thank you Gandhi and Richard.
>
>
> Unless somebody else beats me to it I will perform some research and see
> what approaches can be used and which might be best.  In the end the cTAKES
> Project Management Committee will need to vote for any action as sweeping
> as moving to github.
>
>
> Sean
>
> ________________________________________
>
> From: gandhi rajan <gandhiraja...@gmail.com>
> Sent: Thursday, June 2, 2022 9:02 AM
> To: dev@ctakes.apache.org
> Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]
>
> Hi Sean,
>
>
> If we are sure that the SVN has all the latest changes and active
> development is primarily on SVN, then why don't we request a fresh git
> repository and push all the changes over there?
>
>
> More info on
> https://infra.apache.org/svn-to-git-migration.html
>
>
>
> On Thu, Jun 2, 2022 at 5:52 PM Finan, Sean
> <sean.fi...@childrens.harvard.edu.invalid> wrote:
>
>
> Hi Richard, you bring up a valid concern.
>
>
> cTAKES Developers:
>
>
> The Apache Foundation has had an initiative to "move" all projects to
> GitHub for some time now.
>
>
> I don't know much about how this is done.  If anybody out there has
> knowledge or experience that they can pass on, please share.
>
>
> Thanks,
>
> Sean
>
> ________________________________________
>
> From: Richard Eckart de Castilho <r...@apache.org>
> Sent: Thursday, June 2, 2022 3:39 AM
> To: dev@ctakes.apache.org
> Subject: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]
>
> Hi,
>
>
> it appears that the GitHub mirror of Apache cTAKES may be stuck.
>
>
> When I check the svn log of
> https://svn.apache.org/repos/asf/ctakes/trunk/ , I can
> see activity as recent as May 2022.
>
>
> However, on GitHub, I can only see stale branches:
>
> https://github.com/apache/ctakes/branches
>
>
>
> Wouldn't it be good if the GitHub mirror would be kept up-to-date?
>
>
> Best,
>
>
> -- Richard
>
>
>
>
> --
>
> Regards,
>
> Gandhi
>
>
> "The best way to find urself is to lose urself in the service of others
> !!!"
>
