pabramowitsch commented on issue #56:
URL: https://github.com/apache/ctakes/issues/56#issuecomment-2837378968
I've been using cTakes at UCSF for 7 years on millions of notes and am
familiar with the task(s) you're suggesting. I'm guessing you're new to the
system so my answers will be fairly high level.
- In general, given an appropriate configuration, disease types,
anatomical location, and DX/RX data can be obtained fairly accurately from any
report, including path reports, using cTakes. However, staging, specific sites,
lesion types, sizes, etc. are not well detected. We have experiments here at
UCSF and also at UCSD using supplementary ML techniques and LLMs to do a
better job on those data items.
- The API part has been implemented several times in the cTakes archive,
and I've done my own. See the included project ctakes-web-rest for one
example. How you do it depends largely on how you want to consume the
output: as a raw CAS object or as a new data structure that serves your needs.
A raw CAS is very large, so it often makes sense to build the output message as
a new object that you populate by scanning the CAS in the server process. The CAS
(Common Analysis Structure) is part of the Apache UIMA project, which is
integral to cTakes.
- We've written one (a tool that annotates the text for reviewers), but it
belongs to UCSF. There isn't one built into the open source code.
- As a system built mostly on deterministic methods such as pattern
matching and regexps, with some basic classification machinery, it's not too
difficult to get started. But without a clinical informatics background or
NLP basics you'll have a serious ramp-up period to understand the components of
the pipeline, the data types you're looking for, and how to harvest them from
the CAS. It's a pretty complex in-memory representation of the complete parse
results. The Java infrastructure retains quite a bit of the 2010-2015
pluggable / dependency-injection design patterns, which, if you haven't
encountered them, may also take time to understand. You'll also need to
familiarize yourself with the dictionary structures and how to customize them.
- Best thing is to get started and allow yourself several months at least.
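
To make the API point above a bit more concrete, here is a minimal sketch of the "scan the CAS and populate a smaller object" idea. It deliberately does not use the real UIMA or cTakes classes: Mention is a hypothetical stand-in for an annotation you would read out of the CAS, Finding is a hypothetical compact output type, and the code strings are placeholders rather than real concept codes. Treat it as an illustration of the shape of the approach, not as how ctakes-web-rest or our own service does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: stand-in types, not the UIMA/cTakes API.
class CasSummarySketch {

    // Hypothetical stand-in for one annotation found in the CAS:
    // a semantic type, character offsets into the note, a concept code,
    // and a negation flag.
    static final class Mention {
        final String type;
        final int begin;
        final int end;
        final String code;
        final boolean negated;

        Mention(String type, int begin, int end, String code, boolean negated) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.code = code;
            this.negated = negated;
        }
    }

    // Hypothetical compact record returned to the API caller
    // instead of the full CAS.
    static final class Finding {
        final String type;
        final String text;
        final String code;

        Finding(String type, String text, String code) {
            this.type = type;
            this.text = text;
            this.code = code;
        }

        @Override
        public String toString() {
            return type + ": \"" + text + "\" [" + code + "]";
        }
    }

    // Walk the mentions once, skip negated ones, and copy only the
    // fields the consumer needs, resolving offsets to covered text.
    static List<Finding> summarize(String noteText, List<Mention> mentions) {
        List<Finding> findings = new ArrayList<>();
        for (Mention m : mentions) {
            if (m.negated) {
                continue;
            }
            findings.add(new Finding(m.type, noteText.substring(m.begin, m.end), m.code));
        }
        return findings;
    }

    public static void main(String[] args) {
        String note = "Invasive ductal carcinoma of the left breast. No metastasis.";
        List<Mention> mentions = Arrays.asList(
                new Mention("DiseaseDisorder", 0, 25, "PLACEHOLDER-1", false),
                new Mention("AnatomicalSite", 33, 44, "PLACEHOLDER-2", false),
                new Mention("DiseaseDisorder", 49, 59, "PLACEHOLDER-3", true));
        for (Finding f : summarize(note, mentions)) {
            System.out.println(f);
        }
    }
}
```

In a real service the loop would iterate over annotations taken from the CAS rather than a hand-built list, and you would decide per use case whether negated or uncertain mentions should be dropped or returned with a flag.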
Peter
On Monday, April 28, 2025 at 10:22:32 AM PDT, Johnsd11 ***@***.***>
wrote:
Johnsd11 created an issue (apache/ctakes#56)
What’s your question?
I am looking for an NLP tool to read pathology reports and extract cancer-related
site, histology, stage, and any other DX/RX data available. In looking at
CTakes, I have a few questions:
- Is CTakes an appropriate tool to automate this task?
- The end goal would be a fully automated tool where text was presented
to an API and data was returned.
- An added bonus would be for the tool to annotate the text, so that a
reviewer can more easily find the relevant data.
- For someone with a strong IT/software development background but no
NLP background, what is the level of difficulty in getting started with this
product?