Hi!

On Tue, Apr 22, 2014 at 05:10:56PM -0400, Marshall Schor wrote:
> If you plan on running your pipeline in one JVM (rather than having it
> scaled out over multiple JVMs), you can consider using an external
> resource which would be a plain Java Set<String> of the unique covered
> text found so far.  Then, in the annotator (or annotators) that are
> adding new FeatureStructures representing the possibly duplicate
> annotation, you can first check the shared resource to see if it's
> already been annotated, and if so, skip both creating the additional
> FeatureStructure and adding it to the indexes.
> 
> Would that work for your use case?

  That's an interesting approach, thanks for the suggestion.  While I
could do it this way now, I plan to scale out my setup to multiple
machines in the future, and this solution would become inconvenient
then.  For the time being, I simply load all the FSes into a map keyed
by coveredText and then remove the duplicates.
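
  A minimal sketch of that map-based deduplication, in plain Java so it
runs without UIMA on the classpath.  The Span class and the dedupe
method are hypothetical stand-ins, not UIMA API; in a real annotator the
entries would be annotations read from the CAS, keyed by their covered
text, and the dropped duplicates would also have to be removed from the
indexes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CoveredTextDedup {
    /** Hypothetical stand-in for an annotation: offsets plus covered text. */
    static final class Span {
        final int begin;
        final int end;
        final String coveredText;

        Span(int begin, int end, String coveredText) {
            this.begin = begin;
            this.end = end;
            this.coveredText = coveredText;
        }
    }

    /**
     * Keeps the first span seen for each distinct covered text and drops
     * later ones with the same text. LinkedHashMap preserves encounter order.
     */
    static List<Span> dedupe(List<Span> spans) {
        Map<String, Span> byText = new LinkedHashMap<>();
        for (Span s : spans) {
            // putIfAbsent leaves the earlier entry in place on a repeat key.
            byText.putIfAbsent(s.coveredText, s);
        }
        return new ArrayList<>(byText.values());
    }
}
```

  Keeping the first occurrence is an arbitrary choice made for the
sketch; any other tie-breaking rule would fit the same structure.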

                                Petr "Pasky" Baudis
