Re: CTAKES mirroring on github.

2015-05-28 Thread Kim Ebert
Hi Steve,

It may or may not be the issue. You are right, Infra hasn't given any
reason why the repo only goes up to August 2013. My theory is that the
overall repo size causes memory issues that prevent the mirror from
going beyond August 2013, but that is just a guess. On my local machine,
which has a large amount of RAM, I was able to run git svn fetch
correctly, so there doesn't appear to be anything corrupt or
problematic with the git svn fetch call itself.

I've had issues with my own personal git repos consuming all the
available memory on VMs before because of large files. Git really
doesn't handle large files well, as it usually tries to put everything
into RAM / swap space. If the entire repo size exceeds the RAM / swap
space, git will crash, generally making a mess of things.
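
If memory is what is stopping the mirror job, git does have settings
that bound how much it holds at once while packing. A minimal sketch,
assuming a local clone whose path is passed in; the function name and
the values are made up for illustration and are not tuned for cTAKES,
and whether Infra's mirror setup would honor them is an open question:

```shell
# Hypothetical helper: cap git's packing memory in the clone at path $1.
# All values are illustrative, not tuned for the cTAKES repo.
limit_pack_memory() {
    repo="$1"
    # Store blobs larger than 50 MB without attempting delta compression.
    git -C "$repo" config core.bigFileThreshold 50m
    # Cap the memory each thread may use for the delta search window.
    git -C "$repo" config pack.windowMemory 256m
    # Use a single packing thread so the window memory is not multiplied.
    git -C "$repo" config pack.threads 1
    # Split output packs so no single pack file grows past 1 GB.
    git -C "$repo" config pack.packSizeLimit 1g
}
```

Calling limit_pack_memory /path/to/clone only writes to that clone's
own config, so it is easy to undo by editing .git/config.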

GitHub's file size limit is just an interesting side note.

I'm really interested in making use of the git repo vs the svn repo, so
I'm hoping to get things to move forward here.

IMAT Solutions http://imatsolutions.com
Kim Ebert
Software Engineer
Office: 208.971.1509
kim.eb...@imatsolutions.com
On 05/28/2015 09:31 AM, Steven Bethard wrote:
 On Thu, May 14, 2015 at 1:56 PM, Kim Ebert
 kim.eb...@perfectsearchcorp.com wrote:
 I've done some investigation into using / working with the git repo for
 cTAKES, and I found that it is huge. It doesn't work well with GitHub
 either, as I keep running into timeouts.

 I would like to suggest that we remove two cTAKES build files and
 the ctakes-gui-0.0.1.zip file. This takes the repo from about 8 GB down to
 1.8 GB. It is likely that the git mirror is failing because of the
 large size of the repo.

 While I'm all for removing some of the huge files, note that the file
 size is not the problem. GitHub is mirroring everything (except maybe
 the large files), it's just that git://git.apache.org/ctakes.git is
 not complete. It only goes up to August 2013.

 Steve




Re: CTAKES mirroring on github.

2015-05-28 Thread Steven Bethard
On Thu, May 14, 2015 at 1:56 PM, Kim Ebert
kim.eb...@perfectsearchcorp.com wrote:
 I've done some investigation into using / working with the git repo for
 cTAKES, and I found that it is huge. It doesn't work well with GitHub
 either, as I keep running into timeouts.

 I would like to suggest that we remove two cTAKES build files and
 the ctakes-gui-0.0.1.zip file. This takes the repo from about 8 GB down to
 1.8 GB. It is likely that the git mirror is failing because of the
 large size of the repo.

While I'm all for removing some of the huge files, note that the file
size is not the problem. GitHub is mirroring everything (except maybe
the large files), it's just that git://git.apache.org/ctakes.git is
not complete. It only goes up to August 2013.
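
One way to see how far a mirror actually goes is to ask a clone for its
most recently committed ref. A small sketch, assuming a local clone of
the mirror; the function name newest_ref is only illustrative:

```shell
# Hypothetical helper: print the commit date and short name of the ref
# with the newest committer date in the clone at path $1.
newest_ref() {
    git -C "$1" for-each-ref --sort=-committerdate --count=1 \
        --format='%(committerdate:short) %(refname:short)'
}
```

If the date printed for a clone of git://git.apache.org/ctakes.git is
much older than the latest svn revision, the mirror is behind,
regardless of file sizes.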

Steve


Re: CTAKES mirroring on github.

2015-05-18 Thread Pei Chen
One of the visions behind the *-res projects was to separate out the
resources from code.  In theory, one can filter out all *-res projects from
their git repo and pull in any version of the resources from maven
central...  I won't have enough bandwidth at the moment to try it out or
work on the git piece though...
--Pei

On Thu, May 14, 2015 at 1:56 PM, Kim Ebert kim.eb...@perfectsearchcorp.com
wrote:

 I've done some investigation into using / working with the git repo for
 cTAKES, and I found that it is huge. It doesn't work well with GitHub
 either, as I keep running into timeouts.

 I would like to suggest that we remove two cTAKES build files and
 the ctakes-gui-0.0.1.zip file. This takes the repo from about 8 GB down to
 1.8 GB. It is likely that the git mirror is failing because of
 the large size of the repo. GitHub will also filter out some of these very
 large files, as GitHub's max file size is 100 MB.

 git filter-branch --tree-filter 'rm -rf ctakes-gui-0.0.1.zip' origin/cTAKES-GUI-0.0.1
 git filter-branch -f --tree-filter 'rm -rf _cTAKES_build_/cTAKES-2.5*.zip' origin/maven-sandbox
 git filter-branch -f --tree-filter 'rm -rf _cTAKES_build_/cTAKES-2.5*.zip' origin/SHARPn-cTAKES

 # Clean out unreferenced objects from repo
 git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 \
     -c gc.rerereresolved=0 -c gc.rerereunresolved=0 \
     -c gc.pruneExpire=now gc


 It may also be helpful to remove
 ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/clearparser_models.jar
 from the git repo as well. (238,248,287 bytes)

 Thoughts?

 IMAT Solutions http://imatsolutions.com
 Kim Ebert
 Software Engineer
 Office: 208.971.1509
 kim.eb...@imatsolutions.com
  On 05/06/2015 01:17 PM, Steven Bethard wrote:

 Yes, I ping this issue every couple months, but no luck so far. (They
 take a look each time I ask, but haven't yet pushed a working git
 mirror for us.)

 Steve

 On Tue, May 5, 2015 at 12:09 PM, Kim Ebert
 kim.eb...@perfectsearchcorp.com wrote:

 Ah, looks like the issue is still being looked into.
 https://issues.apache.org/jira/browse/INFRA-8553

 On Mon, May 4, 2015 at 4:54 PM, jay vyas jayunit100.apa...@gmail.com
 wrote:

 Thanks kim.

 Can you file an infra issue ?

 they will look into it.

 I filed one originally
 On May 4, 2015 6:32 PM, Kim Ebert kim.eb...@perfectsearchcorp.com
 wrote:

 It looks like the github hasn't been updated in a while. Any reason?

 Thanks,

 Kim

 On Tue, Feb 17, 2015 at 10:36 AM, Finan, Sean 
 sean.fi...@childrens.harvard.edu wrote:


 Our request is for a read-only mirror.  However, if it ever becomes i/o, I
 don't know if this will have what you want, but http://git.apache.org/
 Links to documentation (mostly server setup)
 http://www.apache.org/dev/git.html and a wiki (check toward middle and
 bottom for committer info) https://wiki.apache.org/general/GitAtApache



 -Original Message-
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
 Sent: Tuesday, February 17, 2015 12:31 PM
 To: dev@ctakes.apache.org
 Subject: Re: CTAKES mirroring on github.

 Is there any existing resource to help people who want to use git
 understand the right workflow to contribute to ctakes? (i.e. how this
 interacts with svn repos).
 Tim


 On 02/17/2015 12:23 PM, jay vyas wrote:

 Hi CTakes.  Looks like infra finally got onto the JIRA i made for
 this a while back.  They are currently working on fixing a couple of
 minor glitches w/ the mirroring (not showing all commits)... but there
 now is a mirror for CTakes on github.

 https://github.com/apache/ctakes






Re: CTAKES mirroring on github.

2015-05-18 Thread Kim Ebert
...@perfectsearchcorp.com
 wrote:

 It looks like the github hasn't been updated in a while. Any reason?

 Thanks,

 Kim

 On Tue, Feb 17, 2015 at 10:36 AM, Finan, Sean
 sean.fi...@childrens.harvard.edu wrote:

 Our request is for a read-only mirror.  However, if it ever becomes i/o, I
 don't know if this will have what you want, but http://git.apache.org/
 Links to documentation (mostly server setup)
 http://www.apache.org/dev/git.html and a wiki (check toward middle and
 bottom for committer info) https://wiki.apache.org/general/GitAtApache



 -Original Message-
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
 Sent: Tuesday, February 17, 2015 12:31 PM
 To: dev@ctakes.apache.org
 Subject: Re: CTAKES mirroring on github.

 Is there any existing resource to help people who want to use git
 understand the right workflow to contribute to ctakes? (i.e. how this
 interacts with svn repos).
 Tim


 On 02/17/2015 12:23 PM, jay vyas wrote:
 Hi CTakes.  Looks like infra finally got onto the JIRA i made for
 this a while back.  They are currently working on fixing a couple of
 minor glitches w/ the mirroring (not showing all commits)... but there
 now is a mirror for CTakes on github.

 https://github.com/apache/ctakes






Re: CTAKES mirroring on github.

2015-05-14 Thread Kim Ebert
I've done some investigation into using / working with the git repo for
cTAKES, and I found that it is huge. It doesn't work well with GitHub
either, as I keep running into timeouts.

I would like to suggest that we remove two cTAKES build files
and the ctakes-gui-0.0.1.zip file. This takes the repo from about 8 GB
down to 1.8 GB. It is likely that the git mirror is failing
because of the large size of the repo. GitHub will also filter out some
of these very large files, as GitHub's max file size is 100 MB.

git filter-branch --tree-filter 'rm -rf ctakes-gui-0.0.1.zip' origin/cTAKES-GUI-0.0.1
git filter-branch -f --tree-filter 'rm -rf _cTAKES_build_/cTAKES-2.5*.zip' origin/maven-sandbox
git filter-branch -f --tree-filter 'rm -rf _cTAKES_build_/cTAKES-2.5*.zip' origin/SHARPn-cTAKES

# Clean out unreferenced objects from repo
git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 \
    -c gc.rerereresolved=0 -c gc.rerereunresolved=0 \
    -c gc.pruneExpire=now gc


It may also be helpful to remove
ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/clearparser_models.jar
from the git repo as well. (238,248,287 bytes)
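
To see which files dominate the history before deciding what to strip,
something like the following lists the largest blobs reachable from any
ref. This is a sketch, assuming a local clone whose path is passed in;
the function name largest_blobs is illustrative, not an existing tool:

```shell
# Hypothetical helper: list the largest blobs in the clone at path $1.
# $2 is how many to show (default 10). Output: size in bytes, then path.
largest_blobs() {
    repo="$1"
    count="${2:-10}"
    # Enumerate every object reachable from any ref, with its path.
    git -C "$repo" rev-list --objects --all |
        # Ask git for each object's type and size; %(rest) keeps the path.
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        # Keep blobs only and print "size path" (paths may contain spaces).
        awk '$1 == "blob" { size = $2; $1 = ""; $2 = ""; sub(/^ +/, ""); print size, $0 }' |
        # Largest first, then truncate to the requested count.
        sort -k1,1nr |
        head -n "$count"
}
```

Running largest_blobs /path/to/ctakes 20 on a full clone should put
files such as the build zips and clearparser_models.jar near the top,
if they are as large as reported above.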

Thoughts?

IMAT Solutions http://imatsolutions.com
Kim Ebert
Software Engineer
Office: 208.971.1509
kim.eb...@imatsolutions.com
On 05/06/2015 01:17 PM, Steven Bethard wrote:
 Yes, I ping this issue every couple months, but no luck so far. (They
 take a look each time I ask, but haven't yet pushed a working git
 mirror for us.)

 Steve

 On Tue, May 5, 2015 at 12:09 PM, Kim Ebert
 kim.eb...@perfectsearchcorp.com wrote:
 Ah, looks like the issue is still being looked into.

 https://issues.apache.org/jira/browse/INFRA-8553

 On Mon, May 4, 2015 at 4:54 PM, jay vyas jayunit100.apa...@gmail.com
 wrote:

 Thanks kim.

 Can you file an infra issue ?

 they will look into it.

 I filed one originally
 On May 4, 2015 6:32 PM, Kim Ebert kim.eb...@perfectsearchcorp.com
 wrote:

 It looks like the github hasn't been updated in a while. Any reason?

 Thanks,

 Kim

 On Tue, Feb 17, 2015 at 10:36 AM, Finan, Sean 
 sean.fi...@childrens.harvard.edu wrote:

 Our request is for a read-only mirror.  However, if it ever becomes i/o, I
 don't know if this will have what you want, but http://git.apache.org/
 Links to documentation (mostly server setup)
 http://www.apache.org/dev/git.html and a wiki (check toward middle and
 bottom for committer info) https://wiki.apache.org/general/GitAtApache



 -Original Message-
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
 Sent: Tuesday, February 17, 2015 12:31 PM
 To: dev@ctakes.apache.org
 Subject: Re: CTAKES mirroring on github.

 Is there any existing resource to help people who want to use git
 understand the right workflow to contribute to ctakes? (i.e. how this
 interacts with svn repos).
 Tim


 On 02/17/2015 12:23 PM, jay vyas wrote:
 Hi CTakes.  Looks like infra finally got onto the JIRA i made for
 this a while back.  They are currently working on fixing a couple of
 minor glitches w/ the mirroring (not showing all commits)... but there
 now is a mirror for CTakes on github.

 https://github.com/apache/ctakes





Re: CTAKES mirroring on github.

2015-05-05 Thread Kim Ebert
Ah, looks like the issue is still being looked into.

https://issues.apache.org/jira/browse/INFRA-8553

On Mon, May 4, 2015 at 4:54 PM, jay vyas jayunit100.apa...@gmail.com
wrote:

 Thanks kim.

 Can you file an infra issue ?

 they will look into it.

 I filed one originally
 On May 4, 2015 6:32 PM, Kim Ebert kim.eb...@perfectsearchcorp.com
 wrote:

  It looks like the github hasn't been updated in a while. Any reason?
 
  Thanks,
 
  Kim
 
  On Tue, Feb 17, 2015 at 10:36 AM, Finan, Sean 
  sean.fi...@childrens.harvard.edu wrote:
 
   Our request is for a read-only mirror.  However, if it ever becomes i/o, I
   don't know if this will have what you want, but http://git.apache.org/
   Links to documentation (mostly server setup)
   http://www.apache.org/dev/git.html and a wiki (check toward middle and
   bottom for committer info) https://wiki.apache.org/general/GitAtApache
  
  
  
   -Original Message-
   From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
   Sent: Tuesday, February 17, 2015 12:31 PM
   To: dev@ctakes.apache.org
   Subject: Re: CTAKES mirroring on github.
  
   Is there any existing resource to help people who want to use git
   understand the right workflow to contribute to ctakes? (i.e. how this
   interacts with svn repos).
   Tim
  
  
   On 02/17/2015 12:23 PM, jay vyas wrote:
    Hi CTakes.  Looks like infra finally got onto the JIRA i made for
    this a while back.  They are currently working on fixing a couple of
    minor glitches w/ the mirroring (not showing all commits)... but there
    now is a mirror for CTakes on github.

    https://github.com/apache/ctakes
   
  
  
 



Re: CTAKES mirroring on github.

2015-05-04 Thread Kim Ebert
It looks like the github hasn't been updated in a while. Any reason?

Thanks,

Kim

On Tue, Feb 17, 2015 at 10:36 AM, Finan, Sean 
sean.fi...@childrens.harvard.edu wrote:

 Our request is for a read-only mirror.  However, if it ever becomes i/o, I
 don't know if this will have what you want, but http://git.apache.org/
 Links to documentation (mostly server setup)
 http://www.apache.org/dev/git.html and a wiki (check toward middle and
 bottom for committer info) https://wiki.apache.org/general/GitAtApache



 -Original Message-
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
 Sent: Tuesday, February 17, 2015 12:31 PM
 To: dev@ctakes.apache.org
 Subject: Re: CTAKES mirroring on github.

 Is there any existing resource to help people who want to use git
 understand the right workflow to contribute to ctakes? (i.e. how this
 interacts with svn repos).
 Tim


 On 02/17/2015 12:23 PM, jay vyas wrote:
  Hi CTakes.  Looks like infra finally got  onto the JIRA i made for
  this a while back.  They are currently working on fixing a couple of
  minor glitches w/ the mirroring (not showing all commits)... but there
  now is a mirror for CTakes on github.
 
 
  https://github.com/apache/ctakes
 




Re: CTAKES mirroring on github.

2015-02-17 Thread Miller, Timothy
Is there any existing resource to help people who want to use git
understand the right workflow to contribute to ctakes? (i.e. how this
interacts with svn repos).
Tim


On 02/17/2015 12:23 PM, jay vyas wrote:
 Hi CTakes.  Looks like infra finally got  onto the JIRA i made for this a
 while back.  They are currently working on fixing a couple of minor
 glitches w/ the mirroring (not showing all commits)... but there now is a
 mirror for CTakes on github.


 https://github.com/apache/ctakes
  




Re: CTAKES mirroring on github.

2015-02-17 Thread jay vyas
For now, it's read only.

0) Click Fork for CTakes on GitHub: https://github.com/apache/ctakes
1) git clone https://github.com/<your github id>/ctakes
2) Write some code
3) git diff > mypatch.patch
4) Attach the patch to JIRA and have a CTakes committer push it to SVN for you
:)

Should be painless for most ?
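
The patch step above can be sketched as a small helper. This assumes a
local clone of your fork; the function name make_patch is hypothetical,
and it uses git diff HEAD rather than plain git diff so that staged
changes (including newly added files) end up in the patch too:

```shell
# Hypothetical helper: write every change since the last commit in the
# clone at path $1 (staged or not) to the patch file named by $2.
make_patch() {
    repo="$1"
    out="$2"
    # Files that were never "git add"ed are still not included.
    git -C "$repo" diff HEAD > "$out"
}
```

For example, make_patch ./ctakes mypatch.patch produces the file to
attach to the JIRA issue.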


On Tue, Feb 17, 2015 at 12:25 PM, Miller, Timothy 
timothy.mil...@childrens.harvard.edu wrote:

 Is there any existing resource to help people who want to use git
 understand the right workflow to contribute to ctakes? (i.e. how this
 interacts with svn repos).
 Tim


 On 02/17/2015 12:23 PM, jay vyas wrote:
  Hi CTakes.  Looks like infra finally got  onto the JIRA i made for this a
  while back.  They are currently working on fixing a couple of minor
  glitches w/ the mirroring (not showing all commits)... but there now is a
  mirror for CTakes on github.
 
 
 
 https://github.com/apache/ctakes
 




-- 
jay vyas


RE: CTAKES mirroring on github.

2015-02-17 Thread Finan, Sean
Our request is for a read-only mirror.  However, in case it ever becomes
read/write (i/o): I don't know if this will have what you want, but see
http://git.apache.org/
There are links to documentation (mostly server setup) at
http://www.apache.org/dev/git.html and a wiki (check toward the middle and
bottom for committer info) at https://wiki.apache.org/general/GitAtApache



-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Tuesday, February 17, 2015 12:31 PM
To: dev@ctakes.apache.org
Subject: Re: CTAKES mirroring on github.

Is there any existing resource to help people who want to use git understand 
the right workflow to contribute to ctakes? (i.e. how this interacts with svn 
repos).
Tim


On 02/17/2015 12:23 PM, jay vyas wrote:
 Hi CTakes.  Looks like infra finally got  onto the JIRA i made for 
 this a while back.  They are currently working on fixing a couple of 
 minor glitches w/ the mirroring (not showing all commits)... but there 
 now is a mirror for CTakes on github.


 https://github.com/apache/ctakes