Sean, I can share a couple of things that have been obstacles for me. It may
seem a minor point to some, but I left Subversion behind years ago and
really have no desire to go back. If the project were moved over to
Git/GitHub, it would really smooth the way, for me at least. I would be happy
to help out with this. One of the other things I would really like to see
is the mailing list moved onto a discussion board platform. It seems to me
that a discussion board style of tool tends to create a more active
community than a mailing list does.

The other thing that might help get new people involved is making it easier
to find information about the development environment. Things like
branching strategies, coding conventions, etc. are really hard to find from
the main cTAKES web site. I saw some references to Jenkins builds recently
on the list. I had no idea there was a Jenkins CI server for the project
somewhere. It also takes some digging to find a link to Jira. Maybe we
could create a Wiki page that describes where all these tools are and how
they are used.

You guys have really done some great work over the last couple of years
cleaning up the code base and improving the documentation by a ton. Things
like the fast dictionary annotator and the dictionary creator GUI are great
additions and make it a lot easier for other people to get up and running
more quickly. As I'm ramping up my research as well as some proof of
concept stuff at work I'll be working more and more with cTAKES and would
love to contribute more to the project.

Just my thoughts.

- Dave


On Sat, Nov 18, 2017 at 11:10 AM, Finan, Sean <
[email protected]> wrote:

> Hi Tim, Alex,
>
> Great ideas.  I like your (Tim) idea to
> 1. start with commented code removal.
> Then maybe move on to
> 2. sanity-test type unit tests - Little two or three-line "does this
> method crack" tests.
> And another that is simply
> 3. "populate a test CAS with type(s) X" and a factory with
> "getSectionTestCas" "getSentenceTestCas" "getPosTestCas" "getChunkTestCas"
> ...  just really simple reusables for tests.
> Then
> 4. refactor to extract and consolidate duplicate code - it is all over the
> place ...
>
> These are just my initial thoughts and suggestions, but I think that those
> 4 tasks can be performed by anybody of any experience level.   They build
> upon each other and should help the implementers better understand ctakes.
> After that the sky is the limit.
>
> A couple of years ago I sat on a panel at a workshop for open source
> scientific software.  For the half dozen or so highlighted projects (ctakes
> was one!) the common thread was that getting people to contribute is
> extremely difficult.
> I have a tendency to assume that people always act in their best
> interests.  Any student thinking of going towards industry should be
> jumping at the opportunity to contribute to a large, production-quality
> project.  They should also realize that contribution means potential
> recommendation (and possibly hiring interest) by established developers,
> physicians and researchers that use ctakes.  Even just answering questions
> on a user or dev list creates credibility and can build a network.
> Active researchers could discover common thoughts and directions that
> could lead to collaboration outside ctakes.  Researchers and companies
> trying to build upon open source should realize that direct contribution is
> easier than custom substitution.  Plus, it is in their best interests that
> code does what they need it to do in the fastest, lightest, most stable way
> possible.
> With a project like ctakes there are a lot of things that can be done, and
> there are great opportunities to really shine.  "I wrote this tool for my
> thesis that performs some nlp task" sounds good.  Appending "in an Apache
> product and it has been taken up by thousands across the globe" makes it
> sound a lot better.
> At my previous job in industry the company actively contributed to several
> open source projects.  We had a few people for whom that was 50% of their
> job.  Why?  Because we made a commitment to use that open source software.
> It was a better use of our resources to contribute to it, improve it and
> keep its momentum going and prevent it from becoming stale (or abandoned)
> while our software continued to move forward.
>
> Hmm, that was a touch more than I had planned to write.  A whole cup of
> coffee in that one.
>
> Sean
>
>
>
>
> -----Original Message-----
> From: Miller, Timothy [mailto:[email protected]]
> Sent: Saturday, November 18, 2017 8:13 AM
> To: [email protected]
> Subject: Re: unknown dependencies [EXTERNAL] [SUSPICIOUS]
>
> Thanks Alex, looks like that was probably a fat-fingered auto-import on my
> part.
>
> I like your idea, and I don't know the best way to start either, but
> maybe one suggestion is to start with one or two focused things to clean
> up, and then ask for volunteers to take on specific modules? Then people
> can contribute an hour here and there to do cleanup on their task/module
> and try to fix that thing in a one- to two-month sprint. I am happy to
> contribute to cleanup, I am responsible for my fair share of unclean code,
> but since I don't have strong software engineering chops it would be good
> to have people with that background propose the tasks and describe exactly
> what needs to be done. My idea of cleaning is just to delete commented-out
> sections of evaluation code.
>
> Tim
>
> ________________________________________
> From: Alexandru Zbarcea <[email protected]>
> Sent: Friday, November 17, 2017 4:46 PM
> To: Apache cTAKES Dev
> Subject: unknown dependencies [EXTERNAL]
>
> Hi,
>
> I notice that an unwanted dependency has slipped into the code:
> jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter;
>
> Now that the Jenkins build is successful, I think it will be easier to
> clean up the code. I would like this to be a common effort. I don't know the
> best way to approach this.
>
> Looking forward to your advice,
> Alex
>
