Re: Junk E-Mail Fwd: Initial CTakes analysis

Peter Abramowitsch Fri, 11 Aug 2023 10:53:28 -0700

Paul
There are many images and papers online that describe cTakes at a high
level.  Probably none of them are 100% comprehensive but they will get you
started.


Try this:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Fast+Dictionary+Lookup

Peter

On Fri, Aug 11, 2023 at 10:37 AM Peter Abramowitsch <pabramowit...@gmail.com>
wrote:

> Hi Paul
>
> 1.  The cTakes ecosystem is Java with some optional Python code. I have
> little experience running it in a Windows environment and so perhaps
> someone else in the group can give you pointers.   My instinct would be to
> run it in a Linux based Docker instance - which I do anyway for some
> clients.   You can package it yourself as a standalone application talking
> to a database or you can use a Webservice wrapper around it which exists in
> the codebase (that is either Dockerized or packaged as a WAR or both).
> Then you can implement a REST client in a pure Windows environment if that
> is easier for you.
>
> 2.  cTakes is an open source project going back to 2012, and as such, uses
> many different technical approaches in its various components:  pattern
> recognition, state machines, POS and treebank extractors, and some ML
> techniques. It does not have a user-friendly training mechanism for
> those components, although there are some examples.
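As a rough illustration of the REST client mentioned in point 1, here is a minimal Python sketch using only the standard library. The host, port, service path, `pipeline` query parameter, and the JSON response are all assumptions about how the web service wrapper happens to be deployed, so check them against your own instance. `build_request` and `analyze` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of the REST wrapper; adjust to your own deployment.
BASE_URL = "http://localhost:8080/ctakes-web-rest/service/analyze"


def build_request(note_text, base_url=BASE_URL, pipeline="Default"):
    """Build a POST request that carries the note as a plain-text body."""
    url = base_url + "?" + urllib.parse.urlencode({"pipeline": pipeline})
    return urllib.request.Request(
        url,
        data=note_text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def analyze(note_text, timeout=60, **kwargs):
    """Send the note and decode the reply, assuming the service returns JSON."""
    request = build_request(note_text, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the client easy to test without a running server, and the same pattern ports directly to a Windows service or any other caller.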
>
> The best way to understand it is to download it and get started.
>
> Peter
>
> On Fri, Aug 11, 2023 at 6:45 AM Paul Stearns <pa...@compuace.com.invalid>
> wrote:
>
>> Peter:
>>
>> Thanks for the detailed and thoughtful explanation.
>>
>> The easiest part for me to understand and work through would be #6. My MO
>> for this sort of thing is one of two approaches, both already used in the
>> existing target system: Windows services with associated DB queues, and
>> DLLs called from the application. The former is for items which are not
>> needed as part of the "real time" application and the latter for those
>> which are.
>>
>> I currently have a homegrown application which looks for keywords and
>> negation modifiers within a certain distance from the keywords, which works
>> moderately well.
>>
>> What I don't know about NLP tools like CTakes is whether they are
>> keyword driven or self-learning. If it is the latter, I have a
>> fairly large collection of human-curated data which I could feed to a
>> training module.
>>
>> Where can I find an "executive overview" (30,000 foot view) of how
>> CTakes works?
>>
>> Paul R. Stearns
>> Advanced Consulting Enterprises, Inc.
>> 15150 NW 79th Court,
>> Suite: 206
>> Miami Lakes Fl, 33016
>>
>> Voice: (305)623-0360 x107
>> Fax: (305)623-4588
>>
>> ----------------------------------------
>> From: "Peter Abramowitsch" <pabramowit...@gmail.com>
>> Sent: 8/10/23 11:59 PM
>> To: dev@ctakes.apache.org
>> Subject: Junk E-Mail Fwd: Initial CTakes analysis
>>
>> Hi Paul
>> Out of the box, cTakes would get you part of the way there, but would
>> require several types of customization to meet your requirements. All of
>> these are the kind of customizations that most of us have had to do, so
>> there's nothing new here, but they are not trivial. As I see it they fall
>> into these categories.
>>
>> 1. getting familiar with the cTakes Application, pipeline, annotator and
>> vocabulary ecosystem
>> 2. choosing a vocabulary subset that gives the best coverage of the terms
>> you are looking for
>> 3. adding one or more custom dictionaries to add terms & synonyms that are
>> not present
>> 4. maybe employing the anatomical site annotator in your pipeline
>> 5. deciding how to harvest and structure the data you extract from the CAS
>> object which all the annotators target
>> 6. deciding how to deploy the application (standalone? webservices host?
>> multi-instance?). Many considerations go into this and greatly affect the
>> ability to scale. There is more than one architectural solution that will
>> work and allow you to get to your "fully automated" goal, but you will
>> need to implement that yourself.
>>
>> A hint about highlighting the text - all annotations carry text offsets so
>> with these you can write code (usually JS and CSS) to do your
>> highlighting. Native cTakes does not have any graphical display
>> functionality.
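The offset-based highlighting described above could be sketched as follows. This is only an illustration in Python rather than the JS/CSS a browser front end would normally use; the `highlight` function and the sample spans are hypothetical, and the spans stand in for the begin/end offsets that annotations carry.

```python
from html import escape


def highlight(text, spans):
    """Wrap each (begin, end) character span in <mark> tags.

    Overlapping or touching spans are merged first so the generated
    markup stays well formed; all text is HTML-escaped.
    """
    merged = []
    for begin, end in sorted(spans):
        if merged and begin <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])

    pieces, cursor = [], 0
    for begin, end in merged:
        pieces.append(escape(text[cursor:begin]))
        pieces.append("<mark>" + escape(text[begin:end]) + "</mark>")
        cursor = end
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)


note = "Invasive ductal carcinoma of the left breast."
# Hypothetical offsets for "ductal carcinoma" and "breast".
print(highlight(note, [(9, 25), (38, 44)]))
```

The `<mark>` elements can then be styled with CSS, for example with a different class per annotation type if the spans also carry a label.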
>>
>> Another hint learned from experience. If you have many large texts (say,
>> 20kb and above with lots of potential terms to discover), you can achieve
>> much better throughput by breaking these into smaller chunks at sentence
>> boundaries and tweaking offsets accordingly as you reassemble the chunks.
>> The memory requirements grow rapidly with the size of the note.
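A minimal sketch of that chunk-and-reassemble idea, in Python. The sentence splitter here is a naive regular expression used purely for illustration, `fake_annotate` is a stand-in for a real pipeline call, and every function name and the `max_chars` threshold are assumptions rather than anything provided by cTakes.

```python
import re


def chunk_at_sentences(text, max_chars=2000):
    """Split text into (offset, chunk) pairs, cutting only after sentence ends.

    A single sentence longer than max_chars is kept whole.
    """
    # Candidate cut points: just after sentence punctuation plus whitespace,
    # or after a run of newlines.
    cuts = [m.end() for m in re.finditer(r"[.!?]\s+|\n+", text)]
    cuts.append(len(text))

    chunks, start, last = [], 0, 0
    for cut in cuts:
        if cut - start > max_chars and last > start:
            chunks.append((start, text[start:last]))
            start = last
        last = cut
    if start < len(text):
        chunks.append((start, text[start:]))
    return chunks


def shift_offsets(chunk_offset, annotations):
    """Translate chunk-relative (begin, end, label) tuples to document offsets."""
    return [(b + chunk_offset, e + chunk_offset, label) for b, e, label in annotations]


def annotate_in_chunks(text, annotate, max_chars=2000):
    """Run annotate() on each chunk and return document-relative annotations."""
    results = []
    for offset, chunk in chunk_at_sentences(text, max_chars):
        results.extend(shift_offsets(offset, annotate(chunk)))
    return results


def fake_annotate(chunk):
    # Stand-in for a real annotator: finds one hard-coded term.
    return [(m.start(), m.end(), "carcinoma") for m in re.finditer("carcinoma", chunk)]


text = "No tumor seen. Ductal carcinoma present. Margins clear."
for begin, end, label in annotate_in_chunks(text, fake_annotate, max_chars=20):
    print(begin, end, label, repr(text[begin:end]))
```

Because each chunk remembers its starting offset in the original note, the reassembled annotations can be used directly against the full text, for instance for highlighting.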
>>
>> In summary, a strong developer background is a good starting point. To
>> that you'd want to add medical informatics, and experience with scalable
>> architectures. cTakes is a great kernel for your system, but be prepared to
>> dive deep.
>>
>> Peter
>>
>> On Thu, Aug 10, 2023 at 10:06 AM Paul Stearns <pa...@compuace.com.invalid>
>> wrote:
>>
>> > I am looking for an NLP tool to read pathology reports and extract cancer
>> > related site, histology, stage and any other DX/RX data available. In
>> > looking at CTakes, I have a few questions:
>> >
>> > - Is CTakes an appropriate tool to automate this task?
>> > - The end goal would be a fully automated tool where text was presented
>> > to an API and data was returned.
>> > - An added bonus would be for the tool to annotate the text, so that a
>> > reviewer can more easily find the relevant data.
>> > - For someone with a strong IT/software development background, but no
>> > NLP background, what is the level of difficulty in getting started with
>> > this product?
>> >
>> > Paul R. Stearns
>> > Advanced Consulting Enterprises, Inc.
>> > 15150 NW 79th Court,
>> > Suite: 206
>> > Miami Lakes Fl, 33016
>> >
>> > Voice: (305)623-0360 x107
>> > Fax: (305)623-4588
>> >
>>
>
