Re: CTAKES mirroring on github.
Hi Steve,

It may or may not be the issue. You are right, Infra hasn't given any reason why the repo only goes up to August 2013. My theory is that the overall repo size is causing memory issues that prevent the mirror from going beyond August 2013, but it is just a guess.

On my local machine, which has a large amount of RAM, I was able to run git svn fetch correctly, so there doesn't appear to be anything corrupt or problematic with the git svn fetch call itself.

I've had my own personal git repos consume all the available memory on VMs before because of large files. Git really doesn't handle large files well, as it usually tries to put everything into RAM / swap space. If the entire repo size exceeds the RAM / swap space, git will crash, generally making a mess of things.

GitHub limiting file size is just an interesting side note.

I'm really interested in making use of the git repo rather than the svn repo, so I'm hoping to get things to move forward here.

IMAT Solutions http://imatsolutions.com
Kim Ebert
Software Engineer
Office: 208.971.1509
kim.eb...@imatsolutions.com

On 05/28/2015 09:31 AM, Steven Bethard wrote:
> On Thu, May 14, 2015 at 1:56 PM, Kim Ebert kim.eb...@perfectsearchcorp.com wrote:
>> I've done some investigation into using / working with the git repo for cTAKES, and I found that it is huge. It doesn't work well with GitHub either, as I keep running into timeouts.
>>
>> I would like to suggest that we remove two cTAKES build files and the ctakes-gui-0.0.1.zip file. This takes the repo from about 8 GB down to 1.8 GB. It is likely that the git mirror is failing because of the large size of the repo.
>
> While I'm all for removing some of the huge files, note that the file size is not the problem. GitHub is mirroring everything (except maybe the large files); it's just that git://git.apache.org/ctakes.git is not complete. It only goes up to August 2013.
>
> Steve
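If memory pressure while packing really is what stops the mirror (the thread offers this only as a guess), one thing that could be tried is capping git's packing memory in the affected clone. The sketch below is a hypothetical helper, not something Infra is known to use; the function name and every value in it are illustrative assumptions. The configuration keys themselves are standard git settings.

```shell
#!/bin/sh
# Hypothetical helper: cap the memory git uses when packing objects in one repo.
# All values are illustrative assumptions, not tuned for the cTAKES repository.
limit_git_memory() {
    repo="$1"
    # Limit the delta-search window memory used by each packing thread.
    git -C "$repo" config pack.windowMemory 256m
    # Split output so that no single pack file grows beyond this size.
    git -C "$repo" config pack.packSizeLimit 512m
    # Use a single packing thread so the per-thread limit is the total.
    git -C "$repo" config pack.threads 1
    # Store files above this size without attempting delta compression.
    git -C "$repo" config core.bigFileThreshold 50m
}
```

Calling `limit_git_memory /path/to/clone` only writes to that clone's local configuration, so it can be undone with `git config --unset` on each key.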
Re: CTAKES mirroring on github.
On Thu, May 14, 2015 at 1:56 PM, Kim Ebert kim.eb...@perfectsearchcorp.com wrote:
> I've done some investigation into using / working with the git repo for cTAKES, and I found that it is huge. It doesn't work well with GitHub either, as I keep running into timeouts.
>
> I would like to suggest that we remove two cTAKES build files and the ctakes-gui-0.0.1.zip file. This takes the repo from about 8 GB down to 1.8 GB. It is likely that the git mirror is failing because of the large size of the repo.

While I'm all for removing some of the huge files, note that the file size is not the problem. GitHub is mirroring everything (except maybe the large files); it's just that git://git.apache.org/ctakes.git is not complete. It only goes up to August 2013.

Steve
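A quick way to see how far a clone of a mirror actually goes is to ask git for the ref with the newest commit. This is a minimal sketch; the function name `latest_ref` is made up for illustration, and it assumes the mirror has already been cloned locally.

```shell
#!/bin/sh
# Hypothetical helper: print "<date> <ref>" for the most recently
# committed-to ref (branch, tag, or remote-tracking branch) in a clone.
latest_ref() {
    repo="$1"
    git -C "$repo" for-each-ref --sort=-committerdate --count=1 \
        --format='%(committerdate:short) %(refname:short)'
}
```

Run against a clone of git://git.apache.org/ctakes.git, a date in August 2013 would match what Steve describes.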
Re: CTAKES mirroring on github.
One of the visions behind the *-res projects was to separate the resources from the code. In theory, one can filter out all *-res projects from their git repo and pull in any version of the resources from Maven Central.

I don't have enough bandwidth at the moment to try it out or work on the git piece, though.

--Pei

On Thu, May 14, 2015 at 1:56 PM, Kim Ebert kim.eb...@perfectsearchcorp.com wrote:
> I've done some investigation into using / working with the git repo for cTAKES, and I found that it is huge. It doesn't work well with GitHub either, as I keep running into timeouts.
>
> I would like to suggest that we remove two cTAKES build files and the ctakes-gui-0.0.1.zip file. This takes the repo from about 8 GB down to 1.8 GB. It is likely that the git mirror is failing because of the large size of the repo. GitHub will also filter out some of these very large files, as GitHub's max file size is 100 MB.
>
>     git filter-branch --tree-filter 'rm -rf ctakes-gui-0.0.1.zip' origin/cTAKES-GUI-0.0.1
>     git filter-branch -f --tree-filter 'rm -rf _cTAKES_build_/cTAKES-2.5*.zip' origin/maven-sandbox
>     git filter-branch -f --tree-filter 'rm -rf _cTAKES_build_/cTAKES-2.5*.zip' origin/SHARPn-cTAKES
>
>     # Clean out unreferenced objects from repo
>     git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 \
>         -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc
>
> It may also be helpful to remove ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/clearparser_models.jar (238,248,287 bytes) from the git repo as well.
>
> Thoughts?
Re: CTAKES mirroring on github.
I've done some investigation into using / working with the git repo for cTAKES, and I found that it is huge. It doesn't work well with GitHub either, as I keep running into timeouts.

I would like to suggest that we remove two cTAKES build files and the ctakes-gui-0.0.1.zip file. This takes the repo from about 8 GB down to 1.8 GB. It is likely that the git mirror is failing because of the large size of the repo. GitHub will also filter out some of these very large files, as GitHub's max file size is 100 MB.

    git filter-branch --tree-filter 'rm -rf ctakes-gui-0.0.1.zip' origin/cTAKES-GUI-0.0.1
    git filter-branch -f --tree-filter 'rm -rf _cTAKES_build_/cTAKES-2.5*.zip' origin/maven-sandbox
    git filter-branch -f --tree-filter 'rm -rf _cTAKES_build_/cTAKES-2.5*.zip' origin/SHARPn-cTAKES

    # Clean out unreferenced objects from repo
    git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 \
        -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc

It may also be helpful to remove ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/clearparser_models.jar (238,248,287 bytes) from the git repo as well.

Thoughts?

IMAT Solutions http://imatsolutions.com
Kim Ebert
Software Engineer
Office: 208.971.1509
kim.eb...@imatsolutions.com

On 05/06/2015 01:17 PM, Steven Bethard wrote:
> Yes, I ping this issue every couple of months, but no luck so far. (They take a look each time I ask, but haven't yet pushed a working git mirror for us.)
>
> Steve
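Before rewriting history, it can help to confirm which files actually account for the size. The following is a minimal sketch, not part of Kim's procedure: it lists the largest blobs anywhere in a repository's history, including files that were deleted later. The function name `largest_blobs` and its default count are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest blobs in all of history,
# one per line, as "<size-in-bytes> <path>".
largest_blobs() {
    repo="$1"
    count="${2:-10}"
    # rev-list emits "<object-id> <path>"; cat-file adds type and size
    # and passes the path through as %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        # Keep only file contents (blobs) and drop the leading "blob " column.
        awk '$1 == "blob" { print substr($0, 6) }' |
        # Largest first.
        sort -k1,1nr |
        head -n "$count"
}
```

For example, `largest_blobs /path/to/ctakes 20` would print the twenty largest files ever committed, which could then be checked against GitHub's 100 MB limit mentioned above. Note also that `git filter-branch` keeps backup refs under `refs/original/`, so rewritten objects stay reachable (and `git gc` will not discard them) until those refs are deleted.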
Re: CTAKES mirroring on github.
Ah, looks like the issue is still being looked into.

https://issues.apache.org/jira/browse/INFRA-8553

On Mon, May 4, 2015 at 4:54 PM, jay vyas jayunit100.apa...@gmail.com wrote:
> Thanks, Kim. Can you file an infra issue? They will look into it. I filed one originally.
Re: CTAKES mirroring on github.
It looks like the GitHub mirror hasn't been updated in a while. Any reason?

Thanks,
Kim

On Tue, Feb 17, 2015 at 10:36 AM, Finan, Sean sean.fi...@childrens.harvard.edu wrote:
> Our request is for a read-only mirror. However, if it ever becomes read/write, I don't know if this will have what you want, but http://git.apache.org/ links to documentation (mostly server setup), http://www.apache.org/dev/git.html, and a wiki (check toward the middle and bottom for committer info), https://wiki.apache.org/general/GitAtApache
Re: CTAKES mirroring on github.
Is there any existing resource to help people who want to use git understand the right workflow to contribute to cTAKES (i.e., how this interacts with the svn repos)?

Tim

On 02/17/2015 12:23 PM, jay vyas wrote:
> Hi CTakes. Looks like infra finally got onto the JIRA I made for this a while back. They are currently working on fixing a couple of minor glitches with the mirroring (not showing all commits), but there is now a mirror for CTakes on GitHub.
>
> https://github.com/apache/ctakes
Re: CTAKES mirroring on github.
For now, it's read only.

0) Click Fork for CTakes on GitHub: https://github.com/apache/ctakes
1) git clone https://github.com/<your github id>/ctakes
2) Write some code.
3) git diff > mypatch.patch
4) Attach the patch to JIRA and have a CTakes committer push it to SVN for you :)

Should be painless for most?

-- jay vyas

On Tue, Feb 17, 2015 at 12:25 PM, Miller, Timothy timothy.mil...@childrens.harvard.edu wrote:
> Is there any existing resource to help people who want to use git understand the right workflow to contribute to cTAKES (i.e., how this interacts with the svn repos)?
>
> Tim
>
> On 02/17/2015 12:23 PM, jay vyas wrote:
>> Hi CTakes. Looks like infra finally got onto the JIRA I made for this a while back. They are currently working on fixing a couple of minor glitches with the mirroring (not showing all commits), but there is now a mirror for CTakes on GitHub.
>>
>> https://github.com/apache/ctakes
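The patch step in the workflow above can be sketched as a small helper. This is an illustration under stated assumptions rather than an official cTAKES procedure: the function name `make_patch` is made up, and it takes an explicit base revision because a bare `git diff` only shows changes that have not been staged or committed yet.

```shell
#!/bin/sh
# Hypothetical helper: write every tracked change since a base revision
# (committed or not) into a patch file suitable for a JIRA attachment.
make_patch() {
    repo="$1"   # path to the local clone of your fork
    base="$2"   # revision the work started from, e.g. a remote-tracking branch
    out="$3"    # patch file to create
    # Compare the working tree against the base revision.
    git -C "$repo" diff "$base" -- > "$out"
}
```

For example, `make_patch ctakes HEAD mypatch.patch` captures uncommitted edits only. The output uses git's `a/` and `b/` path prefixes, so a committer working from an SVN checkout would typically apply it with `patch -p1`.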
RE: CTAKES mirroring on github.
Our request is for a read-only mirror. However, if it ever becomes read/write, I don't know if this will have what you want, but http://git.apache.org/ links to documentation (mostly server setup), http://www.apache.org/dev/git.html, and a wiki (check toward the middle and bottom for committer info), https://wiki.apache.org/general/GitAtApache

-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Tuesday, February 17, 2015 12:31 PM
To: dev@ctakes.apache.org
Subject: Re: CTAKES mirroring on github.

Is there any existing resource to help people who want to use git understand the right workflow to contribute to cTAKES (i.e., how this interacts with the svn repos)?

Tim

On 02/17/2015 12:23 PM, jay vyas wrote:
> Hi CTakes. Looks like infra finally got onto the JIRA I made for this a while back. They are currently working on fixing a couple of minor glitches with the mirroring (not showing all commits), but there is now a mirror for CTakes on GitHub.
>
> https://github.com/apache/ctakes