Hi Sebastien,

cTAKES is a large package with many moving parts and a long history, and it has been packaged in many different ways by various users. It would be impossible to answer your questions in a single email.

If you haven't done it already, I suggest going back to basics: download and configure just the stable binary version exactly as documented, and run the simplest possible configuration, just to get off the ground.

Make sure your UMLS user credentials are present in your environment.
Copy ./resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper into a work area.
Create infiles and outfiles subfolders in that work area.
Put a single note into the infiles folder.
Source the ./bin/runPiperCreator script, and have it read in that copy of DefaultFastPipeline.piper. Become familiar with that.

Insert a reader entry at the top, a simple folder reader. You can copy and paste this if you want, and modify the input path:

reader org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader InputDirectory=<full-path to input files folder>

Then add a simple writer at the bottom, an XMI writer. You can copy and paste this if you want, and modify the output path:

add org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes OutputDirectory=<full-path-to output-folder>

Now just validate it from within the piperCreator application using the yellow button at upper right. When it is valid you may get a green button to run it. If not (I find that sometimes the run button remains disabled even though the piper is valid), run it from the command line:

source ./bin/runPiperFile.sh -p <your modified piper file>

If it has worked you will see an XMI file in the output folder. This is not the only way to visualize the output, but it is a simple, tested mechanism.

If it finishes with a non-zero-length XMI you have a basis for learning the system and, later on, for thinking about expanding your scope. If not, there's still something amiss with your configuration. Next time, send us the entire error trace, not just the line at which the exception message appears.

Everything is customizable, including how the piper can be invoked, but get the simple case running before you start customizing.
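If you would rather script the work-area steps than do them by hand, something along these lines might do. It is only a rough sketch using the Java standard library (Java 11 or later); it does not call any cTAKES API. The class name PiperWorkArea, its method names and the output file name MyPipeline.piper are made up for illustration, and the check for a ".xmi" extension is an assumption about how the writer names its files. The reader and writer lines are the same two lines given above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of cTAKES.
public class PiperWorkArea {

    // Creates infiles/ and outfiles/ under workArea, then writes a copy of the
    // source piper with a folder reader prepended and an XMI writer appended.
    public static Path prepare(Path sourcePiper, Path workArea) throws IOException {
        Path in = Files.createDirectories(workArea.resolve("infiles")).toAbsolutePath();
        Path out = Files.createDirectories(workArea.resolve("outfiles")).toAbsolutePath();

        List<String> lines = new ArrayList<>();
        lines.add("reader org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader InputDirectory=" + in);
        lines.addAll(Files.readAllLines(sourcePiper));
        lines.add("add org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes OutputDirectory=" + out);

        // MyPipeline.piper is an arbitrary name chosen for this sketch.
        return Files.write(workArea.resolve("MyPipeline.piper"), lines);
    }

    // Returns true if outDir holds at least one non-empty file ending in ".xmi"
    // (the extension is an assumption about the writer's output naming).
    public static boolean hasNonEmptyXmi(Path outDir) throws IOException {
        try (Stream<Path> files = Files.list(outDir)) {
            return files.anyMatch(p -> {
                try {
                    return p.getFileName().toString().endsWith(".xmi") && Files.size(p) > 0;
                } catch (IOException e) {
                    return false; // unreadable files do not count as output
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: PiperWorkArea <source piper> <work area>");
            return;
        }
        Path piper = prepare(Path.of(args[0]), Path.of(args[1]));
        System.out.println("Wrote " + piper);
    }
}
```

You would still put a note into infiles and run the generated piper with the runPiperFile script yourself; hasNonEmptyXmi only automates the "is there a non-zero-length XMI" check afterwards.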
- Peter

On Mon, Aug 12, 2019 at 3:43 PM Sébastien Boussard <bouss...@bu.edu> wrote:

> I would also like to add, I do have the dictionaries.
>
> On Mon, Aug 12, 2019 at 3:39 PM Sébastien Boussard <bouss...@bu.edu> wrote:
>
> > Thank you, everyone, for looking at this,
> > My project is to understand how to use ctakes well and make it flexible
> > for it to be used for everyone else in our lab when I leave. The first
> > thing I wanted to do was just be able to get inputs and outputs. As I
> > understand it more, I want to be able to transform it after. What are the
> > full capabilities of piper files? When would it be advantageous to just use
> > that over what I was doing before?
> >
> > Thank you,
> > Sebastien Boussard
> >
> > On Mon, Aug 12, 2019 at 8:21 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> >
> >> Hi all,
> >>
> >> I think that there are a lot of things going on here.
> >>
> >> Jeff's question is on point - do you actually have the dictionary?
> >>
> >> I think that doing all of this with code is unnecessary.
> >> - I don't see anything in the code that cannot be done in a piper file.
> >> - Piper files can set the collection reader. Use the "reader" command.
> >> For your use, that would be " reader LinesFromFileCollectionReader InputFileName=<filePath> "
> >> - Piper files can load other piper files. Use the "load" command.
> >> For your use, that would be " load DefaultFastPipeline "
> >> https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> >> So, instead of writing and debugging code, you can create a 2 line piper
> >> file and just run it using org.apache.ctakes.core.pipeline.PiperFileRunner
> >>
> >> " <java etc.> PiperFileRunner
> >> -p <pathToMyPiper>
> >> -o <pathToMyOutputDir>
> >> --user <myUmlsUsername>
> >> --pass <myUmlsPassword> "
> >>
> >> Or if you really want to run the piper from code then you can do so, but
> >> I would rely more upon the piper such as in the examples code
> >> HelloWorldPiperRunner.java
> >>
> >> I would just use a piper file. If you want to get fancy, then instead of
> >> explicitly specifying the InputFileName in the piper, use the "cli" command
> >> in the piper.
> >> " cli InputFileName=in "
> >> Then you can remove the specification from the piper command ( simplify
> >> it to " reader LinesFromFileCollectionReader " )
> >> and your PiperFileRunner would be the same as above but with "--in <filePath> " added.
> >> Then you can change the input using the command line instead of
> >> constantly editing the piper.
> >>
> >> Besides the obvious simplicity for the user of only using a piper file,
> >> it should be easier for others to assist with problems as they do not need
> >> to go through your code.
> >>
> >> I have to ask why you are using LinesFromFileCollectionReader ? It treats
> >> each line like a different document.
> >> Your first attempt points to "right_knee_arthroscopy" in the example notes.
> >> This would give you two output documents, one for each line in that
> >> file. Is that your intention?
> >>
> >>
> >> Sean
> >>
> >>
> >> ________________________________________
> >> From: Jeffrey Miller <jeff...@gmail.com>
> >> Sent: Saturday, August 10, 2019 2:36 PM
> >> To: dev@ctakes.apache.org
> >> Subject: Re: Struggling initializing [EXTERNAL]
> >>
> >> Sebastien,
> >>
> >> Just wanted to confirm that you have the sno_rx_16ab.script file
> >> in org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/
> >>
> >>
> >> Jeff
> >>
> >> On Sat, Aug 10, 2019, 2:16 PM gandhi rajan <gandhiraja...@gmail.com> wrote:
> >>
> >> > Sorry Sebastien I still don't get what you are trying to do.
> >> >
> >> > On Saturday, August 10, 2019, Sebastien Boussard <bouss...@bu.edu> wrote:
> >> >
> >> > > Hello Mr. Rajan,
> >> > > I have realized that I have sent you no context! I am currently working on
> >> > > the Process Lines Clinical Runner. Previously, I was having many errors
> >> > > with the directories. I made a link from my resources folder to the apache
> >> > > ctakes resources folder. I have no link between the source code and the user
> >> > > interface.
> >> > >
> >> > > Here is the code:
> >> > >
> >> > > import java.io.File;
> >> > > import java.io.IOException;
> >> > >
> >> > > import org.apache.ctakes.core.cr.LinesFromFileCollectionReader;
> >> > > import org.apache.ctakes.core.pipeline.EntityCollector;
> >> > > import org.apache.ctakes.core.pipeline.PipelineBuilder;
> >> > > import org.apache.ctakes.core.pipeline.PiperFileReader;
> >> > > import org.apache.ctakes.core.resource.FileLocator;
> >> > > import org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator;
> >> > > import org.apache.uima.UIMAException;
> >> > > import org.apache.log4j.Logger;
> >> > >
> >> > > final public class ClinicalProcessor {
> >> > >
> >> > >    static private final Logger LOGGER = Logger.getLogger("ClinicalProcessor");
> >> > >
> >> > >    static private final String PIPER_FILE_PATH =
> >> > >          "/Users/sboussard/Desktop/apache-ctakes-4.0.0/resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper";
> >> > >
> >> > >    static private final String INPUT_FILE_PATH =
> >> > >          "/Users/sboussard/Desktop/apache-ctakes-4.0.0/resources/org/apache/ctakes/examples/notes/right_knee_arthroscopy";
> >> > >
> >> > >    private ClinicalProcessor() {
> >> > >    }
> >> > >
> >> > >    public static void main( final String[] args ) {
> >> > >       System.out.println(PIPER_FILE_PATH);
> >> > >
> >> > >       try {
> >> > >          // Create a piper file reader, but don't load the piper yet -
> >> > >          // we want to create a reader with parameters
> >> > >          final PiperFileReader reader = new PiperFileReader();
> >> > >          final PipelineBuilder builder = reader.getBuilder();
> >> > >          // Add the Lines from File reader
> >> > >          //final File inputFile = FileLocator.locateFile( INPUT_FILE_PATH );
> >> > >          //final File inputFile = FileLocator.getFile( INPUT_FILE_PATH );
> >> > >          final File inputFile = new File("/Users/sboussard/Desktop/ClampMac_1.6.0/workspace/MyPipeline/clamp-ner/Data/Input/sample_2788.txt");
> >> > >          builder.reader( LinesFromFileCollectionReader.class,
> >> > >                LinesFromFileCollectionReader.PARAM_INPUT_FILE_NAME,
> >> > >                inputFile.getAbsolutePath() );
> >> > >          // Add the lines from the piper file
> >> > >          reader.loadPipelineFile( PIPER_FILE_PATH );
> >> > >          // Collect IdentifiedAnnotation object information for output - simple for examples
> >> > >          builder.collectEntities();
> >> > >          // Run the pipeline with specified text
> >> > >          builder.run();
> >> > >          // Log the IdentifiedAnnotation object information
> >> > >          LOGGER.info( "\n" + EntityCollector.getInstance().toString() );
> >> > >       } catch ( IOException | UIMAException multE ) {
> >> > >          LOGGER.error( multE.getMessage() );
> >> > >       }
> >> > >    }
> >> > >
> >> > > }
> >> > >
> >> > > Thank you for all your help,
> >> > > Sebastien Boussard
> >> > >
> >> > > > On Aug 10, 2019, at 3:00 AM, gandhi rajan <gandhiraja...@gmail.com> wrote:
> >> > > >
> >> > > > As far as I know, it's a more generic error. Could you please let us know
> >> > > > what action you are trying to perform and steps involved in reproducing the
> >> > > > issue.
> >> > > >
> >> > > > On Saturday, August 10, 2019, Sebastien Boussard <bouss...@bu.edu> wrote:
> >> > > >
> >> > > >> Hello,
> >> > > >> I’m an intern in the Stanford Biomedical Informatics Lab and I've been
> >> > > >> working on getting a ctakes page for a week, and I’ve been getting a lot of
> >> > > >> errors. I have been getting a failed to initialize error for the last day
> >> > > >> and a half and I can not solve it. I will send you the whole log, if you
> >> > > >> can help me out it would be greatly appreciated.
> >> > > >>
> >> > > >> log4j: reset attribute= "false".
> >> > > >> log4j: Threshold ="null".
> >> > > >> log4j: Retreiving an instance of org.apache.log4j.Logger.
> >> > > >> log4j: Setting [ProgressAppender] additivity to [false].
> >> > > >> log4j: Level value for ProgressAppender is [INFO].
> >> > > >> log4j: ProgressAppender level set to INFO
> >> > > >> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> >> > > >> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> >> > > >> log4j: Setting property [conversionPattern] to [%m].
> >> > > >> log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
> >> > > >> log4j: Retreiving an instance of org.apache.log4j.Logger.
> >> > > >> log4j: Setting [ProgressDone] additivity to [false].
> >> > > >> log4j: Level value for ProgressDone is [INFO].
> >> > > >> log4j: ProgressDone level set to INFO
> >> > > >> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> >> > > >> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> >> > > >> log4j: Setting property [conversionPattern] to [%m%n].
> >> > > >> log4j: Adding appender named [eolAppender] to category [ProgressDone].
> >> > > >> log4j: Level value for root is [INFO].
> >> > > >> log4j: root level set to INFO
> >> > > >> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> >> > > >> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> >> > > >> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
> >> > > >> log4j: Adding appender named [consoleAppender] to category [root].
> >> > > >> /Users/sboussard/Desktop/apache-ctakes-4.0.0/resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper
> >> > > >> 09 Aug 2019 11:28:50 INFO SentenceDetector - Sentence detector model file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
> >> > > >> 09 Aug 2019 11:28:50 INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> >> > > >> 09 Aug 2019 11:28:50 INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
> >> > > >> 09 Aug 2019 11:28:50 INFO POSTagger - POS tagger model file: org/apache/ctakes/postagger/models/mayo-pos.zip
> >> > > >> 09 Aug 2019 11:28:51 INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
> >> > > >> 09 Aug 2019 11:28:52 INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> >> > > >> 09 Aug 2019 11:28:52 INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
> >> > > >> 09 Aug 2019 11:28:52 INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
> >> > > >> 09 Aug 2019 11:28:52 INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> >> > > >> 09 Aug 2019 11:28:52 INFO DictionaryDescriptorParser - Parsing dictionary specifications:
> >> > > >> 09 Aug 2019 11:28:52 INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user boussard:
> >> > > >> ..09 Aug 2019 11:28:53 INFO UmlsUserApprover - UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user boussard has been validated
> >> > > >>
> >> > > >> 09 Aug 2019 11:28:53 ERROR ClinicalProcessor - Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed. (Descriptor: <unknown>)
> >> > > >>
> >> > > >
> >> > > > --
> >> > > > Regards,
> >> > > > Gandhi
> >> > > >
> >> > > > "The best way to find urself is to lose urself in the service of others !!!"
> >> > >
> >> >
> >> > --
> >> > Regards,
> >> > Gandhi
> >> >
> >> > "The best way to find urself is to lose urself in the service of others !!!"
> >> >
> >>
> >