Hi Tim, Alex,

Great ideas. I like your (Tim) idea to
1. start with commented code removal. Then maybe move on to
2. sanity-test type unit tests - little two- or three-line "does this method crack" tests. And another that is simply
3. "populate a test CAS with type(s) X" and a factory with "getSectionTestCas", "getSentenceTestCas", "getPosTestCas", "getChunkTestCas" ... just really simple reusables for tests. Then
4. refactor to extract and consolidate duplicate code - it is all over the place ...
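To make tasks 2 and 3 concrete, here is a minimal sketch of what a reusable test-CAS factory plus a sanity test might look like. Everything in it is an assumption for illustration: `SimpleCas` and `Span` are hypothetical stand-ins (a real cTAKES test would presumably populate a UIMA JCas with the cTAKES type system instead), and the sample note text is made up. Only the factory method names come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: a real cTAKES test would use a UIMA JCas and
// the cTAKES type system rather than these simplified classes.
public class TestCasFactory {

    /** Minimal stand-in for an annotation: a type name plus a text span. */
    public static final class Span {
        public final String type;
        public final int begin;
        public final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Minimal stand-in for a CAS: document text plus its annotations. */
    public static final class SimpleCas {
        public final String text;
        public final List<Span> spans = new ArrayList<>();

        public SimpleCas(String text) {
            this.text = text;
        }

        /** Returns the covered text of every span of the given type. */
        public List<String> coveredText(String type) {
            List<String> result = new ArrayList<>();
            for (Span span : spans) {
                if (span.type.equals(type)) {
                    result.add(text.substring(span.begin, span.end));
                }
            }
            return result;
        }
    }

    // Made-up sample note shared by all factory methods.
    private static final String NOTE =
            "HISTORY: Patient reports chest pain. Denies fever.";

    /** A CAS holding one section that covers the whole note. */
    public static SimpleCas getSectionTestCas() {
        SimpleCas cas = new SimpleCas(NOTE);
        cas.spans.add(new Span("Section", 0, NOTE.length()));
        return cas;
    }

    /** Builds on the section CAS and adds one span per sentence. */
    public static SimpleCas getSentenceTestCas() {
        SimpleCas cas = getSectionTestCas();
        int firstBegin = NOTE.indexOf("Patient");
        int firstEnd = NOTE.indexOf("Denies") - 1; // exclude the separating space
        cas.spans.add(new Span("Sentence", firstBegin, firstEnd));
        cas.spans.add(new Span("Sentence", firstEnd + 1, NOTE.length()));
        return cas;
    }

    /** Sanity test in the "does this method crack" spirit. */
    public static void main(String[] args) {
        SimpleCas cas = getSentenceTestCas();
        List<String> sentences = cas.coveredText("Sentence");
        if (sentences.size() != 2) {
            throw new AssertionError("expected 2 sentences, got " + sentences.size());
        }
        System.out.println(sentences);
    }
}
```

Each factory method builds on the previous one, so a POS or chunk variant would start from `getSentenceTestCas()` and add its own spans. That layering is a design choice of this sketch, not something the existing code base prescribes.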
These are just my initial thoughts and suggestions, but I think that those 4 tasks can be performed by anybody of any experience level. They build upon each other and should help the implementers better understand ctakes. After that the sky is the limit.

A couple of years ago I sat on a panel at a workshop for open source scientific software. For the half dozen or so highlighted projects (ctakes was one!) the common thread was that getting people to contribute is extremely difficult.

I have a tendency to assume that people always act in their best interests. Any student thinking of going towards industry should be jumping at the opportunity to contribute to a large, production-quality project. They should also realize that contribution means potential recommendation (and possibly hiring interest) by established developers, physicians and researchers that use ctakes. Even just answering questions on a user or dev list creates credibility and can build a network. Active researchers could discover common thoughts and directions that could lead to collaboration outside ctakes. Researchers and companies trying to build upon open source should realize that direct contribution is easier than custom substitution. Plus, it is in their best interests that code does what they need it to do in the fastest, lightest, most stable way possible.

With a project like ctakes there are a lot of things that can be done, and there are great opportunities to really shine. "I wrote this tool for my thesis that performs some nlp task" sounds good. Appending "in an Apache product and it has been taken up by thousands across the globe" makes it sound a lot better.

At my previous job in industry the company actively contributed to several open source projects. We had a few people for whom that was 50% of their job. Why? Because we made a commitment to use that open source software.
It was a better use of our resources to contribute to it, improve it, keep its momentum going and prevent it from becoming stale (or abandoned) while our software continued to move forward.

Hmm, that was a touch more than I had planned to write. A whole cup of coffee in that one.

Sean

-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Saturday, November 18, 2017 8:13 AM
To: dev@ctakes.apache.org
Subject: Re: unknown dependencies [EXTERNAL] [SUSPICIOUS]

Thanks Alex, looks like that was probably a fat-fingered auto-import on my part.

I like your idea, and I don't know the best way to start either, but maybe one suggestion is to start with one or two focused things to clean up, and then ask for volunteers to take on specific modules? Then people can contribute an hour here and there to do cleanup on their task/module and try to fix that thing in a 1-2-month long sprint.

I am happy to contribute to cleanup. I am responsible for my fair share of unclean code, but since I don't have strong software engineering chops it would be good to have people with that background propose the tasks and describe exactly what needs to be done. My idea of cleaning is just to delete commented-out sections of evaluation code.

Tim

________________________________________
From: Alexandru Zbarcea <al...@apache.org>
Sent: Friday, November 17, 2017 4:46 PM
To: Apache cTAKES Dev
Subject: unknown dependencies [EXTERNAL]

Hi,

I notice that a mis-dependency has slipped into the code: jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter;

Now that the Jenkins build is successful, I think it is easier to clean up the code. I would like this to be a common effort. I don't know the best way to approach this.

Looking forward to your advice,
Alex
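A stray auto-import like the `jdk.internal...AnalyzerAdapter` one Alex reports could be caught mechanically as part of the cleanup. The following is only a sketch under stated assumptions: the class name `InternalImportCheck`, the method `findInternalImports` and the list of forbidden prefixes are all hypothetical, and the check looks at import lines in plain text rather than parsing Java properly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InternalImportCheck {

    // Assumed list of package prefixes that project code should not import;
    // adjust to whatever the project actually wants to forbid.
    private static final List<String> FORBIDDEN_PREFIXES =
            Arrays.asList("jdk.internal.", "sun.");

    /**
     * Returns every imported name in the given source lines that starts with
     * a forbidden prefix. Only single-line import statements are recognized.
     */
    public static List<String> findInternalImports(List<String> sourceLines) {
        List<String> hits = new ArrayList<>();
        for (String line : sourceLines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("import ") || !trimmed.endsWith(";")) {
                continue; // not an import statement
            }
            // Strip the "import" keyword and the trailing semicolon.
            String name = trimmed
                    .substring("import ".length(), trimmed.length() - 1)
                    .trim();
            if (name.startsWith("static ")) {
                name = name.substring("static ".length()).trim();
            }
            for (String prefix : FORBIDDEN_PREFIXES) {
                if (name.startsWith(prefix)) {
                    hits.add(name);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample source containing the import from this thread.
        List<String> source = Arrays.asList(
                "package example;",
                "import java.util.List;",
                "import jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter;",
                "import org.objectweb.asm.commons.AnalyzerAdapter;");
        System.out.println(findInternalImports(source));
    }
}
```

Run over each module's source files, a check like this would flag the JDK-internal class while leaving the regular `org.objectweb.asm` import alone.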