Hi Tim, Alex,

Great ideas. I like your idea, Tim, to
1. start with removing commented-out code.
Then maybe move on to
2. sanity-test-type unit tests: little two- or three-line "does this method 
crack" tests.
And another that is simply
3. "populate a test CAS with type(s) X", plus a factory with "getSectionTestCas", 
"getSentenceTestCas", "getPosTestCas", "getChunkTestCas" ... just really simple 
reusables for tests.
Then
4. refactor to extract and consolidate duplicate code; it is all over the 
place ...

These are just my initial thoughts and suggestions, but I think those four 
tasks can be performed by anybody, at any experience level. They build upon 
each other and should help the implementers better understand cTAKES. After 
that, the sky is the limit.

A couple of years ago I sat on a panel at a workshop for open-source scientific 
software. For the half dozen or so highlighted projects (cTAKES was one!), the 
common thread was that getting people to contribute is extremely difficult.
I have a tendency to assume that people always act in their own best interests. 
Any student thinking of going into industry should be jumping at the 
opportunity to contribute to a large, production-quality project. They 
should also realize that contributing can lead to recommendations (and 
possibly hiring interest) from the established developers, physicians and 
researchers who use cTAKES. Even just answering questions on a user or dev list 
builds credibility and can build a network.

Active researchers could discover shared ideas and directions that might 
lead to collaboration outside cTAKES. Researchers and companies trying to 
build upon open source should realize that contributing directly is easier than 
maintaining a custom substitute. Plus, it is in their best interests that the 
code does what they need it to do in the fastest, lightest, most stable way 
possible.

With a project like cTAKES there is a lot that can be done, and there 
are great opportunities to really shine. "I wrote this tool for my thesis that 
performs some NLP task" sounds good. Appending "in an Apache product, and it 
has been taken up by thousands across the globe" makes it sound a lot better.

At my previous job in industry, the company actively contributed to several 
open source projects. We had a few people for whom that was 50% of their job. 
Why? Because we had made a commitment to use that open source software. It was 
a better use of our resources to contribute to it, improve it, keep its 
momentum going, and prevent it from becoming stale (or abandoned) while our own 
software continued to move forward.

Hmm, that was a touch more than I had planned to write.  A whole cup of coffee 
in that one.

Sean




-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Saturday, November 18, 2017 8:13 AM
To: dev@ctakes.apache.org
Subject: Re: unknown dependencies [EXTERNAL] [SUSPICIOUS]

Thanks Alex, looks like that was probably a fat-fingered auto-import on my part.

I like your idea, and I don't know the best way to start either, but maybe 
one suggestion is to start with one or two focused things to clean up, and then 
ask for volunteers to take on specific modules? Then people can contribute an 
hour here and there to clean up their task/module and try to finish that 
thing in a one- to two-month-long sprint. I am happy to contribute to the cleanup 
(I am responsible for my fair share of unclean code), but since I don't have 
strong software engineering chops, it would be good to have people with that 
background propose the tasks and describe exactly what needs to be done. My idea 
of cleaning is just to delete commented-out sections of evaluation code.

Tim

________________________________________
From: Alexandru Zbarcea <al...@apache.org>
Sent: Friday, November 17, 2017 4:46 PM
To: Apache cTAKES Dev
Subject: unknown dependencies [EXTERNAL]

Hi,

I noticed that an unwanted dependency has slipped into the code:
jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter;

Now that the Jenkins build is successful, I think it will be easier to clean up 
the code. I would like this to be a common effort, but I don't know the best 
way to approach it.

Looking forward to your advice,
Alex
