Paul

There are many images and papers online that describe cTakes at a high level. Probably none of them are 100% comprehensive, but they will get you started.
Try this:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Fast+Dictionary+Lookup

Peter

On Fri, Aug 11, 2023 at 10:37 AM Peter Abramowitsch <pabramowit...@gmail.com> wrote:

> Hi Paul
>
> 1. The cTakes ecosystem is Java with some optional Python code. I have
> little experience running it in a Windows environment, so perhaps
> someone else in the group can give you pointers. My instinct would be to
> run it in a Linux-based Docker instance, which I do anyway for some
> clients. You can package it yourself as a standalone application talking
> to a database, or you can use the webservice wrapper around it that exists
> in the codebase (either Dockerized or packaged as a WAR, or both).
> Then you can implement a REST client in a pure Windows environment if that
> is easier for you.
>
> 2. cTakes is an open source project going back to 2012 and, as such, uses
> many different technical approaches in its various components: pattern
> recognition, state machines, POS and treebank extractors, and some ML
> techniques. It does not have a user-friendly training mechanism for
> those components, although there are some examples.
>
> The best way to understand it is to download it and get started.
>
> Peter
>
> On Fri, Aug 11, 2023 at 6:45 AM Paul Stearns <pa...@compuace.com.invalid>
> wrote:
>
>> Peter:
>>
>> Thanks for the detailed and thoughtful explanation.
>>
>> The easiest part for me to understand and work through would be #6. My MO
>> for this sort of thing is Windows services with associated DB queues, and
>> DLLs called from the application; both are currently used in the existing
>> target system. The former are for items which are not needed as part of the
>> "real time" application and the latter for those which are.
>>
>> I currently have a homegrown application which looks for keywords and
>> negation modifiers within a certain distance from the keywords, which works
>> moderately well.
>>
>> My ignorance regarding NLP algorithms like CTakes is whether it is
>> keyword driven or self-learning. If it is the latter, I have a
>> fairly large collection of human-curated data which I could feed to a
>> training module.
>>
>> Where can I find an "executive overview" (30,000 foot view) of how
>> CTakes works?
>>
>> Paul R. Stearns
>> Advanced Consulting Enterprises, Inc.
>> 15150 NW 79th Court,
>> Suite: 206
>> Miami Lakes Fl, 33016
>>
>> Voice: (305)623-0360 x107
>> Fax: (305)623-4588
>>
>> ----------------------------------------
>> From: "Peter Abramowitsch" <pabramowit...@gmail.com>
>> Sent: 8/10/23 11:59 PM
>> To: dev@ctakes.apache.org
>> Subject: Junk E-Mail Fwd: Initial CTakes analysis
>>
>> Hi Paul
>> Out of the box, cTakes would get you part of the way there, but would
>> require several types of customization to meet your requirements. All of
>> these are the kind of customizations that most of us have had to do, so
>> there's nothing new here, but they are not trivial. As I see it, they fall
>> into these categories:
>>
>> 1. getting familiar with the cTakes application, pipeline, annotator and
>> vocabulary ecosystem
>> 2. choosing a vocabulary subset that gives the best coverage of the terms
>> you are looking for
>> 3. adding one or more custom dictionaries to add terms & synonyms that are
>> not present
>> 4. maybe employing the anatomical site annotator in your pipeline
>> 5. deciding how to harvest and structure the data you extract from the CAS
>> object which all the annotators target
>> 6. deciding how to deploy the application (standalone? webservices host?
>> multi-instance?). Many considerations go into this and greatly affect the
>> ability to scale. There is more than one architectural solution that will
>> work and allow you to get to your "fully automated" goal, but you will
>> need to implement that yourself.
>>
>> A hint about highlighting the text: all annotations carry text offsets, so
>> with these you can write code (usually JS and CSS) to do your
>> highlighting. Native cTakes does not have any graphical display
>> functionality.
>>
>> Another hint learned from experience: if you have many large texts (say,
>> 20 KB and above with lots of potential terms to discover), you can achieve
>> much better throughput by breaking these into smaller chunks at sentence
>> boundaries and tweaking offsets accordingly as you reassemble the chunks.
>> The memory requirements grow rapidly with the size of the note.
>>
>> In summary, a strong developer background is a good starting point. To
>> that you'd want to add medical informatics and experience with scalable
>> architectures. cTakes is a great kernel for your system, but be prepared to
>> dive deep.
>>
>> Peter
>>
>> On Thu, Aug 10, 2023 at 10:06 AM Paul Stearns <pa...@compuace.com.invalid>
>> wrote:
>>
>> > I am looking for an NLP tool to read pathology reports and extract
>> > cancer-related site, histology, stage and any other DX/RX data available.
>> > In looking at CTakes, I have a few questions:
>> >
>> > - Is CTakes an appropriate tool to automate this task?
>> > - The end goal would be a fully automated tool where text is presented to
>> > an API and data is returned.
>> > - An added bonus would be for the tool to annotate the text, so that a
>> > reviewer can more easily find the relevant data.
>> > - For someone with a strong IT/software development background but no NLP
>> > background, what is the level of difficulty in getting started with this
>> > product?
>> >
>> > Paul R. Stearns
>> > Advanced Consulting Enterprises, Inc.
>> > 15150 NW 79th Court,
>> > Suite: 206
>> > Miami Lakes Fl, 33016
>> >
>> > Voice: (305)623-0360 x107
>> > Fax: (305)623-4588
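
The chunking hint in the thread (split large notes at sentence boundaries, then shift the offsets back as the chunks are reassembled) can be sketched roughly as below. This is only an illustration in Python, not cTakes code: split_at_sentences, annotate_in_chunks and toy_annotator are hypothetical names, the regex sentence splitter is a naive stand-in for a real sentence detector, and toy_annotator merely stands in for whatever call actually sends a chunk to a cTakes pipeline. The 20000-character default simply echoes the "20 KB" figure mentioned above.

```python
import re
from typing import Callable, List, Tuple

# (begin, end, label); offsets are relative to the text that was annotated.
Annotation = Tuple[int, int, str]

# Naive sentence boundary: ., ! or ? followed by whitespace (assumption for
# illustration; clinical text usually needs a proper sentence detector).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_at_sentences(text: str, max_chars: int) -> List[Tuple[int, str]]:
    """Return (start_offset, chunk) pairs that together cover the whole text."""
    boundaries = [m.end() for m in _SENTENCE_END.finditer(text)]
    if not boundaries or boundaries[-1] != len(text):
        boundaries.append(len(text))
    chunks: List[Tuple[int, str]] = []
    start = 0
    prev = 0
    for b in boundaries:
        # Close the current chunk before it would exceed max_chars, unless it
        # is still empty (a single over-long sentence is kept whole).
        if b - start > max_chars and prev > start:
            chunks.append((start, text[start:prev]))
            start = prev
        prev = b
    if start < len(text):
        chunks.append((start, text[start:]))
    return chunks


def annotate_in_chunks(
    text: str,
    annotate: Callable[[str], List[Annotation]],
    max_chars: int = 20000,
) -> List[Annotation]:
    """Annotate chunk by chunk and re-base offsets onto the original text."""
    merged: List[Annotation] = []
    for start, chunk in split_at_sentences(text, max_chars):
        for begin, end, label in annotate(chunk):
            merged.append((begin + start, end + start, label))
    return merged


def toy_annotator(chunk: str) -> List[Annotation]:
    """Hypothetical stand-in for a real annotator: plain keyword matching."""
    found: List[Annotation] = []
    lowered = chunk.lower()
    for term in ("carcinoma", "breast", "tumor"):
        for m in re.finditer(re.escape(term), lowered):
            found.append((m.start(), m.end(), term))
    return sorted(found)


if __name__ == "__main__":
    note = "No tumor seen. Carcinoma in left breast. Margins are clear."
    for begin, end, label in annotate_in_chunks(note, toy_annotator, max_chars=30):
        # Each re-based offset pair indexes directly into the original note.
        print(begin, end, label, repr(note[begin:end]))
```

Because the re-based offsets index into the original note, the same (begin, end) pairs could also drive the highlighting mentioned in the thread, for example by wrapping each span in a styled element on the client side.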