This is an automated email from the ASF dual-hosted git repository.
seanfinan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/ctakes.git
The following commit(s) were added to refs/heads/main by this push:
new b94c667 Improved getting started for new developers.
new 3918e0e Merge remote-tracking branch 'origin/main' into main
b94c667 is described below
commit b94c667a7ca2b0e55678090dfc86b36383aa4467
Author: Sean Finan <[email protected]>
AuthorDate: Mon Jun 9 10:22:45 2025 -0400
Improved getting started for new developers.
---
README.md | 93 ++++++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 63 insertions(+), 30 deletions(-)
diff --git a/README.md b/README.md
index 29619e7..23e76ae 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,6 @@
## Introduction
-
The Apache™ clinical Text Analysis and Knowledge Extraction System (cTAKES™)
focuses on extracting knowledge
from clinical text through Natural Language Processing (NLP) techniques.
@@ -31,58 +30,92 @@ We encourage people from all backgrounds to get involved!
(link)
<br>
## Supported Environments
-1. **Java 1.8** is required to run cTAKES versions 5.x and older. Versions 6+
require java 17. Run this command to check your Java version:
-```
-$ java -version
-```
-2. **Maven 3** is required to build cTAKES. Run this to command to check your
Maven version:
+1. **Java 17** is required to run cTAKES 6.0.0 and higher. **Java 8 or Java 11** is required to run cTAKES 5. Run this command to check your Java version:
```
-$ mvn -version
+java -version
```
-3. A license for the [Unified Medical Language System
(UMLS)](https://www.nlm.nih.gov/research/umls/index.html)
+> [!NOTE]
+> If you are using an integrated development environment (IDE), please see its
documentation on using Java.
+2. A license for the National Library of Medicine's [Unified Medical Language System (UMLS)](https://www.nlm.nih.gov/research/umls/index.html)
is required to use the named entity recognition module (dictionary lookup)
with the default dictionary.
-4. **Python 3** is required to use cTAKES [Python Bridge to Java
(PBJ)](https://github.com/apache/ctakes/wiki/pbj_intro).
-Run this to command to check your Python version:
+3. **Python 3** is required to use cTAKES [Python Bridge to Java (PBJ)](https://github.com/apache/ctakes/wiki/pbj_intro).
+ Run this command to check your Python version:
```
-$ python -V
+python -V
```
+> [!NOTE]
+> If you are using an integrated development environment (IDE), please see its documentation on using Python.
+<br/>
+### For developers:
+1. Apache **Maven 3** is required to build cTAKES. Run this command to check your Maven version:
+```
+mvn -version
+```
+> [!NOTE]
+> If you are using an integrated development environment (IDE), please see its
documentation on using Apache Maven.
<br/>
## Getting Started
-### New Users
-
-The easiest way for new users to get a jump start running cTAKES is to use the
[Standard Pipeline Installation Facility](artifacts).
-The Standard Pipeline Installation Facility is a tool that can install cTAKES
configured to run the most popular cTAKES pre-built pipelines.
-You can then use the [Piper File
Submitter](https://github.com/apache/ctakes/wiki/Piper+File+Submitter) GUI to
submit jobs or submit them from the command line.
+### New Users (Non-Developers)
-For access to all cTAKES capabilities, download a [zip]() or [tar.z]() file
containing a fully-built installation of the most recent cTAKES release.
-Then, after obtaining a UMLS license, use the [UMLS Package
Fetcher](https://github.com/apache/ctakes/wiki/cTAKES+UMLS+Package+Fetcher) GUI
to install a copy of the
+For access to all cTAKES capabilities, download a pre-built copy of a cTAKES installation from the [release area](https://github.com/apache/ctakes/releases).
+The names of pre-built installations follow the format
`apache-ctakes-#.#.#-bin.zip`.
+After unzipping the release file and obtaining a UMLS license, use the [UMLS Package Fetcher](https://github.com/apache/ctakes/wiki/cTAKES+UMLS+Package+Fetcher) GUI to install a copy of the
default dictionary for Named Entity Recognition (NER) using cTAKES Fast
Dictionary Lookup.
+You can then use the [Piper File Submitter](https://github.com/apache/ctakes/wiki/Piper+File+Submitter) GUI to submit jobs,
+or run any of the scripts in the `bin/` directory.
-### New Developers
-__Notice:__ cTAKES 7.0.0-SNAPSHOT requires jdk 17 to build and run.
+### New Developers
All source code for cTAKES versions 5+ is available from the [cTAKES GitHub
repository](https://github.com/apache/ctakes).
-1. Clone this repository
+1. Clone the cTAKES code repository using git.
```
-$ git clone https://github.com/apache/ctakes.git
+git clone https://github.com/apache/ctakes.git
```
-2. Open your local copy of the repository in an IDE of your choice.
-3. Run directly from the code (link).
- or
-4. Build a binary installation (link), and
-5. Run a binary installation (link).
+> [!NOTE]
+> If you are using an integrated development environment (IDE), please see its
documentation on using git.
+2. Compile the cTAKES code using Apache Maven. In your cTAKES root directory,
run this command:
+```
+mvn clean compile
+```
+> [!NOTE]
+> If you are using an integrated development environment (IDE), please see its
documentation on using Apache Maven.
+3. [Download](https://sourceforge.net/projects/ctakesresources/files/sno_rx_16ab.zip) the default cTAKES dictionary zip file.
+4. Copy the contents of the zip file to the
`resources/org/apache/ctakes/dictionary/lookup/fast` directory.
+> [!NOTE]
+> As an alternative to steps 3 and 4, you can use the [UMLS Package Fetcher](https://github.com/apache/ctakes/wiki/cTAKES+UMLS+Package+Fetcher) GUI.
+> Run the class `DictionaryDownloader.java` to launch that tool, or use the
`getUmlsDictionary` script if using a full build of cTAKES.
+5. Run the cTAKES default pipeline using the Java class `PiperFileRunner.java`. To use the [Piper File Submitter](https://github.com/apache/ctakes/wiki/Piper+File+Submitter) GUI, run the `PiperRunnerGui.java` class.
+> [!NOTE]
+> To run the cTAKES Java classes, the full Java classpath must be configured.
Setting up a classpath is beyond the scope of this document.
+> An integrated development environment (IDE) should set up the classpath for you; please see its documentation.
+
+<br>
+> [!IMPORTANT]
+> You cannot run scripts in the `bin/` directory within a development
environment.
+> Within a cTAKES development environment you can run Java classes and Maven
profiles, but no scripts in the `bin` directory.
+
+> [!TIP]
+> You can build your own cTAKES installation from a development environment
using Apache Maven.
+> A cTAKES installation is required to run scripts in the `bin/` directory.
+6. Build using Apache Maven:
+```
+mvn clean compile package
+```
+> [!NOTE]
+> If you are using an integrated development environment (IDE), please see its
documentation on using Apache Maven.
+After packaging, there should be tar and zip files for `apache-ctakes-...-bin` and `apache-ctakes-...-src` in your `ctakes-distribution/target/` directory.
+7. Unzip the `apache-ctakes-...-bin` file into a directory *outside* your cTAKES development area.
-## More information
-Much more information can be found on the [cTAKES
wiki](https://github.com/apache/ctakes/wiki).
+## More information
-You can also write to the cTAKES user and developer mailing lists: user at
ctakes.apache.org and dev at apache.ctakes.org
+You can write to the cTAKES user and developer mailing lists: **user** at `ctakes.apache.org` and **dev** at `ctakes.apache.org`
and find answers to previously asked questions by searching the
[user](https://lists.apache.org/[email protected])
and [developer](https://lists.apache.org/[email protected])
mail archives.
\ No newline at end of file
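
The Java requirements stated in the updated "Supported Environments" section (Java 17 for cTAKES 6.0.0 and higher, Java 8 or 11 for cTAKES 5) could be checked with a small script along the lines below. This is only a sketch and not part of the commit: the function names `java_major` and `ctakes_for_java` are hypothetical, and the parsing assumes the usual `1.8.0_292` and `17.0.2` version string shapes.

```shell
#!/bin/sh
# Hypothetical helper: reduce a Java version string to its major version.
# Legacy strings look like "1.8.0_292" (major is the second field);
# modern strings look like "17.0.2" (major is the first field).
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# Hypothetical helper: report which cTAKES line the README pairs with
# a given Java major version. Anything else is reported as unlisted.
ctakes_for_java() {
    case "$1" in
        17)   echo "cTAKES 6.0.0 and higher" ;;
        8|11) echo "cTAKES 5" ;;
        *)    echo "not listed in the README" ;;
    esac
}

# Possible use against the local JVM (java -version writes to stderr):
#   v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
#   ctakes_for_java "$(java_major "$v")"
```

The mapping deliberately covers only the versions the README names; other Java versions may or may not work, which the README does not say.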