RE: Question about the pipeline

Lingren, Todd Wed, 04 Feb 2015 06:35:25 -0800

Sean and Maite,
FWIW, I use CmdLineCpeRunner frequently. I employ it with a bash script to 
automatically create a new xml file based on the subfolder names contained in 
the target directory. So in our HPC, it spawns a new job for each subfolder 
(which may have between 5 and 2500 notes).
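
A minimal sketch of that per-subfolder idea, written in Java rather than bash and not taken from the actual script. It assumes a hand-prepared template descriptor containing a placeholder token (the hypothetical `@INPUT_DIR@`) where the input directory belongs; the class and method names are likewise made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class DescriptorPerSubfolder {

    // Hypothetical placeholder; the template is assumed to contain it
    // wherever the input directory should be filled in.
    static final String INPUT_TOKEN = "@INPUT_DIR@";

    // Writes one descriptor per immediate subfolder of targetDir into outDir
    // and returns the paths of the files that were written.
    static List<Path> writeDescriptors(Path template, Path targetDir, Path outDir)
            throws IOException {
        String xml = new String(Files.readAllBytes(template), StandardCharsets.UTF_8);
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        // The filter keeps directories only, so stray files are skipped.
        try (DirectoryStream<Path> subfolders =
                     Files.newDirectoryStream(targetDir, Files::isDirectory)) {
            for (Path sub : subfolders) {
                Path descriptor = outDir.resolve(sub.getFileName() + ".xml");
                // Plain text substitution; paths containing XML special
                // characters such as '&' would need escaping first.
                String filled = xml.replace(INPUT_TOKEN, sub.toAbsolutePath().toString());
                Files.write(descriptor, filled.getBytes(StandardCharsets.UTF_8));
                written.add(descriptor);
            }
        }
        return written;
    }
}
```

Each returned path could then be handed to a separate job as the descriptor argument.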


Todd Lingren
Biomedical Informatics
Cincinnati Children’s Hospital
todd.ling...@cchmc.org
513-803-9032


-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Tuesday, February 03, 2015 2:47 PM
To: dev@ctakes.apache.org
Subject: RE: Question about the pipeline

Hi Maite,

RunCPE is a good find, and if it fits your bill then you should use it.  But it 
(if you mean the yTex class) doesn't take input and output directories from the 
command line.  It does take the path to a CPE.xml file.  There is a cTakes 
(non-yTex) equivalent named CmdLineCpeRunner.  Either one of them should print 
a usage if you run it without arguments.  As the CmdLineCpeRunner indicates, 
you can create a cpe .xml file with the cpe gui.  Basically, start the cpe gui, 
select your input (reader), output (writer) and pipeline (ae) in the gui and 
then save the cpe descriptor (via the menubar).  You can exit the gui and run 
either one of the cmd line utilities with the path to that cpe .xml descriptor 
as the argument.  Please note: sometimes you have to explicitly type ".xml" in 
the filename when saving with the cpe gui.  If you run with the cpe gui and 
then exit it should automatically ask you if you want to save the cpe .xml 
descriptor.  Anyway, once you have the .xml file you can always edit the input 
and output paths in that file to change your run parameters.  
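
For anyone who would rather script that edit than do it by hand, here is a rough sketch using only the JDK's XML classes. It assumes the saved descriptor stores each setting as a `nameValuePair` element holding a `name` and a `string` value, and that the reader's directory parameter is called `InputDirectory`; check both against your own saved file. The class and method names are hypothetical.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CpeParamEditor {

    // Sets the <string> value of every nameValuePair whose <name> equals
    // paramName, writes the result to 'out', and returns how many pairs changed.
    static int setStringParam(File in, File out, String paramName, String newValue)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        // Assumed layout: <nameValuePair><name>..</name><value><string>..</string></value>
        NodeList pairs = doc.getElementsByTagName("nameValuePair");
        int changed = 0;
        for (int i = 0; i < pairs.getLength(); i++) {
            Element pair = (Element) pairs.item(i);
            NodeList names = pair.getElementsByTagName("name");
            NodeList strings = pair.getElementsByTagName("string");
            if (names.getLength() > 0 && strings.getLength() > 0
                    && paramName.equals(names.item(0).getTextContent().trim())) {
                strings.item(0).setTextContent(newValue);
                changed++;
            }
        }
        // Serialize the modified document to the output file.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return changed;
    }
}
```

A call such as `setStringParam(saved, copy, "InputDirectory", "/path/to/notes")` would leave the original descriptor untouched and write the modified copy next to it.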

Sean

-----Original Message-----
From: Maite Meseure Hugues [mailto:meseure.ma...@gmail.com]
Sent: Tuesday, February 03, 2015 9:01 AM
To: dev@ctakes.apache.org
Subject: Re: Question about the pipeline

Thanks a lot Sean for your detailed reply. I've also found RunCPE.java, which 
lets you set the input and output directories as arguments in the environment 
and does the same job as the CPE GUI, at least in Eclipse; I haven't managed to 
run it via the command line yet.

On Mon, Feb 2, 2015 at 7:12 PM, Finan, Sean < sean.fi...@childrens.harvard.edu> 
wrote:

> Hi Tol (and Maite),
>
> I'm not entirely certain that I understand the question, but here is 
> an attempt to help.  If I'm oversimplifying then I apologize.
>
> I think that ExampleAggregatePipeline is intended to represent a very 
> simple single-note pipeline and that custom code could be produced by 
> using it as an example.
>
> If you want to process texts in a directory, a web search will turn up 
> plenty of ways to list files in a directory and read text from 
> files.  org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader
> might be what you used in the CPE, and you can certainly peruse the 
> code and take what you need.  Or, if you decide to write a simple DIY 
> version, here is one possibility:
>
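> // Imports needed by the methods below.
> import java.io.File;
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.ArrayList;
> import java.util.Collection;
>
> // The class name is only a placeholder so that the methods compile.
> public class DirectoryNoteReader {
>
>    // List the readable regular files directly inside the directory.
>    static public Collection<File> getFilesInDir( final File directory ) {
>       final Collection<File> fileList = new ArrayList<>();
>       final File[] files = directory.listFiles();
>       if ( files == null ) {
>          System.err.println( "please check the directory "
>                + directory.getAbsolutePath() );
>          System.exit( 1 );
>       }
>       for ( final File file : files ) {
>          if ( file.isFile() && file.canRead() ) {
>             fileList.add( file );
>          }
>       }
>       return fileList;
>    }
>
>    // Read the whole file as text (platform default charset);
>    // the IOException could also be handled here instead.
>    static public String getTextInFile( final File file ) throws IOException {
>       final Path nioPath = file.toPath();
>       return new String( Files.readAllBytes( nioPath ) );
>    }
>
>    static public void main( String ... args ) throws IOException {
>       if ( args.length == 0 || args[0].isEmpty() ) {
>          System.out.println( "Enter a directory path" );
>          System.exit( 0 );
>       }
>       final Collection<File> files = getFilesInDir( new File( args[0] ) );
>       for ( File file : files ) {
>          final String note = getTextInFile( file );
>          // ---  Insert here code a' la ExampleAggregatePipeline  ---
>          // ---  swap out the writer in ExampleAggregatePipeline with the
>          //      CasIOUtil method (below)  ---
>       }
>    }
> }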
>
> I must admit that I have never directly used it, but there is an xmi 
> file writing method in org.apache.uima.fit.util.CasIOUtil named 
> writeXmi( JCas jCas, File file ).  You could give this a try and see 
> if it produces the type of output that you want.  The same utility 
> class has a writeXCas(..) method.
>
>
> If the above has absolutely nothing to do with your needs then please 
> send me a bulleted list of items, example workflow, etc. and I'll see 
> if I can be of service.
>
> Oh, and I wrote the above code freehand, so MS Outlook is adding 
> capital letters, etc.  If you cut and paste you'll need to change that
> - plus I haven't run/compiled, so there might be a typo or missed 
> exception or something.  Or it may not work (in which case I'll throw 
> in a little more effort).
>
> Sean
>
>
> -----Original Message-----
> From: Tol O. [mailto:tol...@gmail.com]
> Sent: Monday, February 02, 2015 6:56 PM
> To: dev@ctakes.apache.org
> Subject: Re: Question about the pipeline
>
> Maite Meseure Hugues <meseure.maite@...> writes:
>
> >
> > Hello all,
> >
> > Thank you for your preceding answers.
> > I have a few questions regarding the pipeline example to run cTakes 
> > programmatically.
> > I am running ExampleAggregatePipeline.java with 
> > ExampleHelloWorldAnnotator but I would like to know how I can change 
> > it to run my data, as in the CPE, where we can choose the directory of 
> > our data.
> > My second question is about the xml output generated with the CPE: 
> > can I get the same xml output using the example pipeline? And how?
> > Thanks for your time.
>
>
> I would like to ask the same question. After successfully setting up 
> CTAKES following the Developers Guide I would also like to use a 
> modified ExampleAggregatePipeline to output a CAS file identical to 
> the output obtained by the CPE or the CVD when following the Users Guide.
>
> This would be a great help for developers as a starting class to be 
> able to programmatically obtain an annotated file based on a plaintext 
> or XML input, same as through the two GUIs.
>
> Right now I am reading through the Component Use Guide to replicate 
> the CPE or the CVD tutorial with the test input, but it is a bit overwhelming.
>
> Any pointers or suggestions would be really appreciated.
>
> Tol O.
>
>


--
 Maïté Meseure Hugues
