Hi,

Growing the community and driving more adoption is my goal, too. I
could not agree more with what you said, Sean and Tim.

I have discussed cTAKES adoption with Hadrian (an Apache member), and I
think he has great ideas about the priorities for growing this community. I
would like to introduce him to the community and let him share some ideas.

Regarding the technical issues that were already identified in this
thread, I would like to understand your perspective and prioritization.

   - There is code commented out, but much of it still seems valuable, as
   if it was commented out during a migration and left for somebody to
   follow up on (e.g. unit tests).
   - There are issues reported by SonarQube [1] like:
      - 3.3K bugs [2]
      - 16.5% code duplication (24K LoC) [3]
      - 174 bugs in the last month [4]
   - I would like to see more unit tests for the code. Some new commits are
   not tied to a feature description, so it is not clear what a review
   should focus on. I think this relates to Sean's request for "sanity-test
   type unit tests - Little two or three-line "does this method crack"
   tests." I see this task as one of the most important ones.
   - Removal of hardcoded paths like "/tmp" and
   "C:/Users/<some-user>/<some-path>".
   - Migrate scripts from Ant (files like build-*.xml) to Maven. The Ant
   scripts make the build unpredictable, and I find them difficult to
   navigate when tests depend on their execution.
   - Classpaths manually specified.
   - Deprecated code
   - Old libraries which involve security risks in production (e.g. Spring
   that was just upgraded)
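
To make the sanity-test and hardcoded-path items concrete, here is a minimal
sketch in plain Java (no JUnit, so it runs standalone). The class and method
names (WorkDirs, newWorkDir) and the "ctakes-test-" prefix are hypothetical
and not existing cTAKES code. The only library call is the standard
Files.createTempDirectory, which lets the JVM pick the temp location instead
of a hardcoded "/tmp" or "C:/Users/...".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of cTAKES: asks the JVM for a temp
// directory instead of hardcoding an OS-specific path.
public class WorkDirs {

    // Creates a fresh, uniquely named directory under java.io.tmpdir.
    public static Path newWorkDir(String prefix) throws IOException {
        return Files.createTempDirectory(prefix);
    }

    // Sanity check in the "does this method crack" spirit:
    // call the method once and verify the most basic expectation.
    public static void main(String[] args) throws IOException {
        Path dir = newWorkDir("ctakes-test-");
        if (!Files.isDirectory(dir)) {
            throw new AssertionError("expected a directory: " + dir);
        }
        Files.delete(dir); // clean up so repeated runs leave nothing behind
        System.out.println("ok");
    }
}
```

In a real module the check in main would more naturally live in a unit test
class; it is inlined here only to keep the sketch self-contained.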

Other tasks relate more to productivity:

   - I think it is time to define some conventions for:
      - formatting (indentation),
      - crlf conventions (see .gitattributes)
      - etc
   - For Git vs. Subversion, I am able to use the same working folder with
   both .git and .svn, and I documented this on the wiki [5].
   - There are commits without any reference to a Jira issue or other
   documentation. As a consequence, when a release comes, it will be very
   hard to hunt down those changes and understand why they were made: bugs
   vs. features. Also, given the decision to use semantic versioning, we
   will need to choose between 4.0.1 and 4.1.0.
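
To illustrate why the bug-vs-feature distinction matters, here is a small
sketch in Java. Everything in it is an assumption for illustration: the class
name ReleaseNotes, the Change enum, the idea that Jira keys look like
"CTAKES-<number>", the made-up issue number, and a current release of 4.0.0.
It follows the usual semantic versioning rule that bug fixes bump the patch
number and backward-compatible features bump the minor number; breaking
changes (a major bump) are deliberately not modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not existing cTAKES tooling.
public class ReleaseNotes {

    public enum Change { BUG_FIX, FEATURE }

    // Assumption: issue keys look like "CTAKES-123".
    private static final Pattern JIRA_KEY = Pattern.compile("\\bCTAKES-\\d+\\b");

    // True if the commit message mentions at least one Jira key.
    public static boolean hasJiraKey(String commitMessage) {
        return JIRA_KEY.matcher(commitMessage).find();
    }

    // Picks the next version from the kinds of change since the last release.
    public static String nextVersion(String current, List<Change> changes) {
        if (changes.isEmpty()) {
            return current; // nothing to release
        }
        String[] parts = current.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);
        if (changes.contains(Change.FEATURE)) {
            return major + "." + (minor + 1) + ".0"; // feature: minor bump
        }
        return major + "." + minor + "." + (patch + 1); // fixes only: patch bump
    }

    public static void main(String[] args) {
        System.out.println(hasJiraKey("CTAKES-123: fix NPE in chunker")); // true
        System.out.println(hasJiraKey("cleanup"));                        // false
        System.out.println(nextVersion("4.0.0",
                Arrays.asList(Change.BUG_FIX)));                          // 4.0.1
        System.out.println(nextVersion("4.0.0",
                Arrays.asList(Change.BUG_FIX, Change.FEATURE)));          // 4.1.0
    }
}
```

Without an issue reference on each commit, the list of changes passed to
nextVersion cannot be built reliably, which is exactly the release-time
problem described above.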

My $0.02,
Alex

[1] -
https://builds.apache.org/analysis/overview?id=org.apache.ctakes%3Actakes
[2] -
https://builds.apache.org/analysis/component_issues?id=org.apache.ctakes%3Actakes#resolved=false|types=BUG
[3] -
https://builds.apache.org/analysis/component_measures/metric/duplicated_blocks/list?id=org.apache.ctakes%3Actakes
[4] -
https://builds.apache.org/analysis/component_issues?id=org.apache.ctakes%3Actakes#resolved=false|types=BUG|sinceLeakPeriod=true
[5] -
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Developer+Install+Guide#cTAKES4.0DeveloperInstallGuide-Subversion+Git



On Mon, Nov 20, 2017 at 6:32 AM, Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Git is available to apache projects, and many projects have moved over
> (see here: https://git-wip-us.apache.org/repos/asf):
> Here is the general info on what that looks like:
> https://www.apache.org/dev/writable-git
>
> A few points from that link:
> > Projects can request moving to Git as their main code repository, by
> creating an INFRA issue. See also the infra-contact page.
> > Projects can request new, blank repositories by using reporeq.apache.org.
> > The current system has basic git support only. We are working on
> extending this service in the near future.
> > Custom commit or other hooks will not be supported, all projects get the
> same hooks. Setting up gitpubsub should provide sufficient flexibility
> without impacting the core Git setup, volunteers are welcome to make that
> happen.
>
> (Not sure what basic support only means.)
>
> There are also read-only git repos available by default for every project
> and updated in near-real-time:
> https://www.apache.org/dev/git.html
>
> with those I guess the suggested workflow is to work off of that repo and
> then just submit patches to someone who commits with svn rather than
> committing directly.
>
> I've been using the git-svn connector myself recently since I just vastly
> prefer the git lightweight branching for focused development, as it helps
> me keep a cleaner working directory. But that adds some additional annoying
> steps.
>
> Tim
>
> ________________________________________
> From: Finan, Sean <sean.fi...@childrens.harvard.edu>
> Sent: Saturday, November 18, 2017 1:23 PM
> To: dev@ctakes.apache.org
> Subject: RE: Contribute to ctakes: it is in your best interests! RE:
> unknown dependencies [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
>
> Hi Dave,
>
> Those are some great thoughts.  Being an apache project I am not sure how
> far we can move from svn, but there may be a way.  You are not the first to
> voice this desire for an active github repo and I'm sure that you won't be
> the last.
>
> I completely agree with your discussion board preference.  Do you have any
> recommendations?
>
> You make a great point regarding documentation.  In reference to things
> that anybody can quickly contribute ... that would be a big one.
> Volunteers?!?
>
> I am really happy to hear that you want to contribute - more than you
> already have, which is actually quite a bit!
>
> Cheers,
> Sean
>
> -----Original Message-----
> From: David Kincaid [mailto:kincaid.d...@gmail.com]
> Sent: Saturday, November 18, 2017 1:10 PM
> To: dev@ctakes.apache.org
> Subject: Re: Contribute to ctakes: it is in your best interests! RE:
> unknown dependencies [EXTERNAL] [SUSPICIOUS]
>
> Sean, I can share a couple things that have been an obstacle for me. It
> may seem a minor point to some, but I left Subversion behind years ago and
> really have no desire to go back. If the project were moved over to
> Git/Github it would really smooth the way for me at least. I would be happy
> to help out with this. One of the other things I would really like to see
> is the mailing list moved onto a discussion board platform. It seems to me
> that a discussion board style of tool tends to create a more active
> community than a mailing list does.
>
> The other thing that might help get new people involved is making it
> easier to find information about the development environment. Things like
> branching strategies, coding conventions, etc are really hard to find from
> the main cTAKES web site. I saw some references to Jenkins builds recently
> on the list. I had no idea there was a Jenkins CI server for the project
> somewhere. It also takes some digging to find a link to Jira. Maybe we
> could create a Wiki page that describes where all these tools are and how
> they are used.
>
> You guys have really done some great work over the last couple of years
> cleaning up the code base and improving the documentation by a ton. Things
> like the fast dictionary annotator, dictionary creator GUI are a great
> addition and make it a lot easier for other people to get up and running
> more quickly. As I'm ramping up my research as well as some proof of
> concept stuff at work I'll be working more and more with cTAKES and would
> love to contribute more to the project.
>
> Just my thoughts.
>
> - Dave
>
>
> On Sat, Nov 18, 2017 at 11:10 AM, Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Tim, Alex,
> >
> > Great ideas.  I like your (Tim) idea to 1. start with commented code
> > removal.
> > Then maybe move on to
> > 2. sanity-test type unit tests - Little two or three-line "does this
> > method crack" tests.
> > And another that is simply
> > 3. "populate a test cas with type(s) X" and a factory with
> > "getSectionTestCas" "getSetenceTestCas" "getPosTestCas" "getChunkTestCas"
> > ...  just really simple reusables for tests.
> > Then
> > 4. refactor to extract and consolidate duplicate code - it is all over
> > the place ...
> >
> > These are just my initial thoughts and suggestions, but I think that those
> > 4 tasks can be performed by anybody of any experience level.   They build
> > upon each other and should help the implementers better understand ctakes.
> > After that the sky is the limit.
> >
> > A couple of years ago I sat on a panel at a workshop for open source
> > scientific software.  For the half dozen or so highlighted projects
> > (ctakes was one!) the common thread was that getting people to
> > contribute is extremely difficult.
> > I have a tendency to assume that people always act in their best
> > interests.  Any student thinking of going towards industry should be
> > jumping at the opportunity to contribute to a large,
> > production-quality project.  They should also realize that
> > contribution means potential recommendation (and possibly hiring
> > interest) by established developers, physicians and researchers that
> > use ctakes.  Even just answering questions on a user or dev list creates
> > credibility and can build a network.
> > Active researchers could discover common thoughts and directions that
> > could lead to collaboration outside ctakes.  Researchers and companies
> > trying to build upon open source should realize that direct
> > contribution is easier than custom substitution.  Plus, it is in their
> > best interests that code does what they need it to do in the fastest,
> > lightest, most stable way possible.
> > With a project like ctakes there are a lot of things that can be done,
> > there are great opportunities to really shine.  "I wrote this tool for
> > my thesis that performs some nlp task" sounds good.  Appending "in an
> > Apache product and it has been taken up by thousands across the globe"
> > makes it sound a lot better.
> > At my previous job in industry the company actively contributed to
> > several open source projects.  We had a few people for whom that was
> > 50% of their job.  Why?  Because we made a commitment to use that open
> > source software.
> > It was a better use of our resources to contribute to it, improve it
> > and keep its momentum going and prevent it from becoming stale (or
> > abandoned) while our software continued to move forward.
> >
> > Hmm, that was a touch more than I had planned to write.  A whole cup
> > of coffee in that one.
> >
> > Sean
> >
> >
> >
> >
> > -----Original Message-----
> > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> > Sent: Saturday, November 18, 2017 8:13 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: unknown dependencies [EXTERNAL] [SUSPICIOUS]
> >
> > Thanks Alex, looks like that was probably a fat-fingered auto-import
> > on my part.
> >
> > I like your idea, and I don't know the best way to start either,
> > but maybe one suggestion is to start with one or two focused things to
> > clean up, and then ask for volunteers to take on specific modules?
> > Then people can contribute an hour here and there to do cleanup on
> > their task/module and try to fix that thing in a 1-2-month long
> > sprint. I am happy to contribute to cleanup, I am responsible for my
> > fair share of unclean code, but since I don't have strong software
> > engineering chops it would be good to have people with that background
> > propose the tasks and describe exactly what needs to be done. My idea
> > of cleaning is just to delete commented out sections of evaluation code.
> >
> > Tim
> >
> > ________________________________________
> > From: Alexandru Zbarcea <al...@apache.org>
> > Sent: Friday, November 17, 2017 4:46 PM
> > To: Apache cTAKES Dev
> > Subject: unknown dependencies [EXTERNAL]
> >
> > Hi,
> >
> > I noticed that an incorrect dependency has slipped into the code:
> > jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter;
> >
> > Now that the Jenkins build is successful, I think it is easier to
> > clean up the code. I would like this to be a common effort. I don't know
> > the best way to approach this.
> >
> > Looking forward to your advice,
> > Alex
> >
>
