Hi Tim,

>we ran into issues in previous attempts at migration with the large file sizes
>in our repo

Indeed we did, and over the years I have had thoughts on that.

Those large files are large ml models, which are (mostly) static, replaceable/interchangeable, not always necessary, and in separate resource (-res) modules separated from code modules. When I was a ctakes newbie I really disliked the separation of code from resources by entirely separate -res modules. Since then, through working on projects that use ctakes code but not (huge) resources as dependencies, I have realized the wisdom of the modular separation. In fact, I put a -huge- model in its own -res module so that I could <exclude> it from a ctakes-dependent project, saving compile (download) time and disk space. Like you, I don't like to "download the internet" with maven ;^)

Right now we have the ner dictionaries in sourceforge, not the apache repos. While this is done for legal reasons, it has worked pretty well. I think that we could maintain an apache SVN repo of -res modules containing only huge model files. I am guessing that we would have to make it a "side/sub project" to maintain a separate repo (jenkins build, etc.).

Anyway, it would give us the freedom to use a github repo for code (and non-model resources) without users needing to go through the github large-file workflow, which I see as a barrier to entry.

Thoughts?

________________________________________
From: Miller, Timothy <timothy.mil...@childrens.harvard.edu.INVALID>
Sent: Thursday, June 2, 2022 6:21 PM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

My recollection was that we ran into issues in previous attempts at migration with the large file sizes in our repo.

Tim

On Thu, 2022-06-02 at 20:55 +0000, Finan, Sean wrote:

Thank you Gandhi and Richard.

Unless somebody else beats me to it I will perform some research and see what approaches can be used and which might be best.
In the end the cTAKES Project Management Committee will need to vote for any action as sweeping as moving to github.

Sean

________________________________________
From: gandhi rajan <gandhiraja...@gmail.com>
Sent: Thursday, June 2, 2022 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi Sean,

If we are sure that the SVN has all the latest changes and active development is primarily on SVN, then why don't we request a fresh git repository and push all the changes over there? More info on https://infra.apache.org/svn-to-git-migration.html

On Thu, Jun 2, 2022 at 5:52 PM Finan, Sean <sean.fi...@childrens.harvard.edu.invalid> wrote:

Hi Richard, you bring up a valid concern.

cTAKES Developers:
The Apache Foundation has had an initiative to "move" all projects to GitHub for some time now. I don't know much about how this is done. If anybody out there has knowledge or experience that they can pass on, please share.

Thanks,
Sean

________________________________________
From: Richard Eckart de Castilho <r...@apache.org>
Sent: Thursday, June 2, 2022 3:39 AM
To: dev@ctakes.apache.org
Subject: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi,

it appears that the GitHub mirror of Apache cTAKES may be stuck.
When I check the svn log of https://svn.apache.org/repos/asf/ctakes/trunk/ , I can see activity as recent as May 2022.

However, on GitHub, I can only see stale branches: https://github.com/apache/ctakes/branches

Wouldn't it be good if the GitHub mirror would be kept up-to-date?

Best,

-- Richard

--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"