Re: [I] [Question]: Initial CTakes analysis [ctakes]

via GitHub Mon, 28 Apr 2025 20:57:56 -0700


pabramowitsch commented on issue #56:
URL: https://github.com/apache/ctakes/issues/56#issuecomment-2837378968


   I've been using cTAKES at UCSF for 7 years on millions of notes and am 
familiar with the task(s) you're suggesting.   I'm guessing you're new to the 
system, so my answers will be fairly high level. 
   
      
      - In general, given an appropriate configuration, disease types, 
anatomical location, and DX/RX data can be obtained fairly accurately from any 
report, including path reports, using cTAKES.   However, staging, specific sites, 
lesion types, sizes, etc. are not well detected.   We have experiments here at 
UCSF and also at UCSD using some supplementary ML techniques and LLMs to do a 
better job on those data items.
      - The API part has been implemented several times in the cTAKES archive, 
and I've done my own.  See the included project ctakes-web-rest for one 
example.  How you do it depends largely on how you want to consume the 
output: as a raw CAS object or as some new data structure that serves your needs.   
A raw CAS is very large, so it often makes sense to build the output message as 
a new object that you populate by scanning the CAS in the server process.  The CAS 
is part of the Apache UIMA project, which is integral to cTAKES.
      - We've written a reviewer-facing annotation tool, but it belongs to UCSF.   
There isn't one built into the open source code.
   
      - As a system built mostly on deterministic methods such as pattern 
matching and regular expressions, with some basic classification machinery, it's 
not too difficult to get started.   But without a clinical informatics background 
or NLP basics you'll have a serious ramp-up period to understand the components of 
the pipeline, the data types you're looking for, and how to harvest them from 
the CAS.   The CAS is a pretty complex in-memory representation of the complete 
parse results.   The Java infrastructure retains quite a bit of the 2010-2015 
pluggable / dependency-injection design patterns, which, if you haven't 
encountered them, may also take time to understand.   You'll also need to 
familiarize yourself with the dictionary structures and how to customize them.
      - Best thing is to get started and allow yourself several months at least.
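
   To illustrate the point above about returning a compact message instead of 
the raw CAS, here is a minimal Python sketch.  It does not use the UIMA or 
cTAKES APIs: `Mention`, `summarize`, the type names and the concept ID are all 
hypothetical stand-ins for whatever you would read out of the CAS in the server 
process, so treat it as an outline of the idea rather than working cTAKES code.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Mention:
    """Hypothetical stand-in for one annotation read out of the CAS."""
    semantic_type: str   # assumed labels, e.g. "DiseaseDisorder"
    begin: int           # character offsets into the note text
    end: int
    negated: bool = False
    cuis: list = field(default_factory=list)  # concept identifiers, if any


def summarize(note_text, mentions, wanted_types):
    """Build a small, JSON-friendly message from the full annotation set."""
    items = []
    for m in sorted(mentions, key=lambda m: (m.begin, m.end)):
        if m.semantic_type not in wanted_types:
            continue  # drop tokens, sentences and other bulk annotations
        items.append({
            "type": m.semantic_type,
            "text": note_text[m.begin:m.end],
            "span": [m.begin, m.end],
            "negated": m.negated,
            "cuis": sorted(set(m.cuis)),  # de-duplicate concept IDs
        })
    return {"mention_count": len(items), "mentions": items}


if __name__ == "__main__":
    note = "Invasive ductal carcinoma of the left breast. No metastasis."

    def at(kind, phrase, **extra):
        # Locate the phrase so the offsets always match the note text.
        start = note.index(phrase)
        return Mention(kind, start, start + len(phrase), **extra)

    found = [
        at("Token", "of"),
        at("DiseaseDisorder", "Invasive ductal carcinoma",
           cuis=["C-PLACEHOLDER"]),  # placeholder, not a real concept ID
        at("AnatomicalSite", "left breast"),
        at("DiseaseDisorder", "metastasis", negated=True),
    ]
    wanted = {"DiseaseDisorder", "AnatomicalSite"}
    print(json.dumps(summarize(note, found, wanted), indent=2))
```

   Keeping the character spans in the message also lets a client highlight the 
matched text for a reviewer without shipping the whole CAS.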
   
   Peter
      On Monday, April 28, 2025 at 10:22:32 AM PDT, Johnsd11 ***@***.***> 
wrote:  
    
    Johnsd11 created an issue (apache/ctakes#56)
   What’s your question?
   
   I am looking for an NLP tool to read pathology reports and extract cancer-related 
site, histology, stage and any other DX/RX data available. In looking at 
CTakes, I have a few questions:
      
      - Is CTakes an appropriate tool to automate this task?
      - The end goal would be a fully automated tool where text was presented 
to an API and data was returned.
      - An added bonus would be for the tool to annotate the text, so that a 
reviewer can more easily find the relevant data.
      - For someone with a strong IT/software development background, but no 
NLP background what is the level of difficulty in getting started with this 
product?
   
   


