-- You could also do this, which should move the completed files when cTAKES 
exits. It only requires one class, added at the end of the pipeline.

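import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;
import org.apache.ctakes.core.cc.AbstractJCasFileWriter;
import org.apache.uima.UimaContext;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

public class FileDoneWriter extends AbstractJCasFileWriter {
   static private final String DONE_LIST = "path/to/my/files/done/list.txt";

   // Appends the path of each processed source file to the done list.
   // getSourceFilePath is not defined here; IsFileDoneChecker in the quoted
   // message below shows one implementation that can be copied into this class.
   public void writeFile( JCas jCas,
                          String outputDir,
                          String documentId,
                          String fileName ) throws IOException {
      String sourceFilePath = getSourceFilePath( jCas );
      try ( Writer writer = new BufferedWriter( new FileWriter( DONE_LIST, true ) ) ) {
         writer.write( sourceFilePath + "\n" );
      }
   }

   // Registers a shutdown hook that moves the listed source files when the JVM
   // exits, including on Ctrl-C. It does not run on a forced kill (kill -9).
   // my/done/dir must already exist; renameTo returns false if a move fails.
   public void initialize( final UimaContext context ) throws ResourceInitializationException {
      super.initialize( context );
      Runtime.getRuntime().addShutdownHook( new Thread( () -> {
         try ( Stream<String> lines = Files.lines( Paths.get( DONE_LIST ) ) ) {
            lines.map( File::new )
                 .forEach( f -> f.renameTo( new File( "my/done/dir/" + f.getName() ) ) );
         } catch ( IOException e ) {
            e.printStackTrace();
         }
      } ) );
   }
}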
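
If you want to try the move-on-exit idea outside of cTAKES first, here is a 
minimal sketch using only java.nio. DoneFileMover and its two methods are 
hypothetical names made up for this example, not part of cTAKES. It assumes the 
done list holds one source path per line, as written by FileDoneWriter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of cTAKES: moves every file named in a done list.
final class DoneFileMover {

   // Moves each listed file that still exists into doneDir; returns how many were moved.
   static int moveListedFiles( final Path doneList, final Path doneDir ) throws IOException {
      if ( !Files.isRegularFile( doneList ) ) {
         return 0;   // Nothing has been recorded yet.
      }
      Files.createDirectories( doneDir );
      int moved = 0;
      for ( String line : Files.readAllLines( doneList ) ) {
         final String entry = line.trim();
         if ( entry.isEmpty() ) {
            continue;   // Skip blank lines in the list.
         }
         final Path source = Paths.get( entry );
         if ( !Files.isRegularFile( source ) ) {
            continue;   // Already moved by an earlier run, or never existed.
         }
         Files.move( source, doneDir.resolve( source.getFileName() ),
                     StandardCopyOption.REPLACE_EXISTING );
         moved++;
      }
      return moved;
   }

   // Registers a JVM shutdown hook that performs the move on normal exit or Ctrl-C.
   static void registerMoveOnExit( final Path doneList, final Path doneDir ) {
      Runtime.getRuntime().addShutdownHook( new Thread( () -> {
         try {
            moveListedFiles( doneList, doneDir );
         } catch ( IOException e ) {
            System.err.println( "Could not move completed files: " + e.getMessage() );
         }
      } ) );
   }
}
```

Calling DoneFileMover.registerMoveOnExit( Paths.get( "path/to/my/files/done/list.txt" ), 
Paths.get( "my/done/dir" ) ) once at startup would be the equivalent of the 
initialize method above. Files.move is used instead of File.renameTo so that a 
failed move raises an exception rather than silently returning false.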

________________________________________
From: Finan, Sean <sean.fi...@childrens.harvard.edu>
Sent: Tuesday, February 2, 2021 11:44 PM
To: dev@ctakes.apache.org; Akram
Subject: Re: cTAKES to move files after it finishes [EXTERNAL] [SUSPICIOUS]

You could add an annotator to the end of your pipeline that persistently marks 
a document as done, and then a "first" annotator that redirects anything 
already done to some "null device".

For instance:
// Put this at the end of the pipeline and it will write a list of files that 
// have been processed to the file path/to/my/files/done/list.txt
public class FileDoneWriter extends AbstractJCasFileWriter {
   public void writeFile( JCas jCas,
                          String outputDir,
                          String documentId,
                          String fileName ) throws IOException {
      String sourceFilePath = getSourceFilePath( jCas );
      try ( Writer writer = new BufferedWriter( new FileWriter( "path/to/my/files/done/list.txt", true ) ) ) {
         writer.write( sourceFilePath + "\n" );
      }
   }
}

// Put this at the very beginning of the pipeline and it will read a list of 
// files that have been processed from path/to/my/files/done/list.txt
// If the current file is in that list then the JCas should be reset and all 
// output will go to some null files.  The JCas will be empty so processing 
// should be fast.
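public class IsFileDoneChecker extends JCasAnnotator_ImplBase {
   static private final Collection<String> ALREADY_DONE = readDoneList();
   // Loaded in a helper because Files.lines throws a checked IOException.
   static private Collection<String> readDoneList() {
      final Path doneList = Paths.get( "path/to/my/files/done/list.txt" );
      if ( !Files.isRegularFile( doneList ) ) {
         return Collections.emptySet();   // First run: nothing is done yet.
      }
      try ( Stream<String> lines = Files.lines( doneList ) ) {
         return lines.collect( Collectors.toSet() );
      } catch ( IOException e ) {
         throw new UncheckedIOException( e );
      }
   }
   public void process( JCas jCas ) throws AnalysisEngineProcessException {
      String sourceFilePath = getSourceFilePath( jCas );
      if ( !sourceFilePath.isEmpty() && ALREADY_DONE.contains( sourceFilePath ) ) {
         new JCasBuilder().setDocId( "none" ).setDocPath( "path/to/nothing" ).rebuild( jCas );
      }
   }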
   protected String getSourceFilePath( final JCas jCas ) {
      final Collection<DocumentPath> documentPaths = JCasUtil.select( jCas, DocumentPath.class );
      if ( documentPaths == null || documentPaths.isEmpty() ) {
         return "";
      }
      for ( DocumentPath documentPath : documentPaths ) {
         final String path = documentPath.getDocumentPath();
         if ( path != null && !path.isEmpty() ) {
            return path;
         }
      }
      return "";
   }
}
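
The bookkeeping that the two classes above share (append a path when a file is 
finished, check the path before processing) can be tried on its own. The 
following is only a sketch under the same one-path-per-line assumption. 
DoneList, isDone and markDone are hypothetical names for this example, not 
cTAKES API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of cTAKES: a persistent set of finished source paths.
final class DoneList {
   private final Path listFile;
   private final Set<String> done = new HashSet<>();

   // Loads any paths recorded by earlier runs; a missing file means nothing is done yet.
   DoneList( final Path listFile ) throws IOException {
      this.listFile = listFile;
      if ( Files.isRegularFile( listFile ) ) {
         done.addAll( Files.readAllLines( listFile, StandardCharsets.UTF_8 ) );
      }
   }

   // True if the path was recorded in this run or a previous one.
   boolean isDone( final String sourcePath ) {
      return !sourcePath.isEmpty() && done.contains( sourcePath );
   }

   // Records the path in memory and appends it to the list file, once per path.
   void markDone( final String sourcePath ) throws IOException {
      if ( sourcePath.isEmpty() || !done.add( sourcePath ) ) {
         return;
      }
      Files.write( listFile, Collections.singletonList( sourcePath ), StandardCharsets.UTF_8,
                   StandardOpenOption.CREATE, StandardOpenOption.APPEND );
   }
}
```

In the pipeline, isDone would correspond to the check in IsFileDoneChecker and 
markDone to the write in FileDoneWriter. Appending on every call, rather than 
writing the whole list at exit, means the list survives even if the run is 
stopped abruptly.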

________________________________________
From: Akram <as...@yahoo.com.INVALID>
Sent: Tuesday, February 2, 2021 10:45 PM
To: dev@ctakes.apache.org
Subject: cTAKES to move files after it finishes [EXTERNAL]

Hi All,
The new version of cTAKES runs perfectly fine.
Sometimes I have to stop cTAKES in the middle of a run, and when I re-run it, 
it starts all over and re-processes files that were already done.
Is there a method, config option, or command that forces cTAKES to move 
finished files to another folder?
If so, how do I do it?
Thanks
