Thanks Alex,

This code is for processing a clinical text data corpus stored as a Lucene index -- data that cannot be redistributed for privacy reasons. Since it's so closely related to the coref stuff, I thought it should go alongside the coreference module. But maybe it makes more sense as an external project, since it can't really function without externally created resources -- what do you think?

Tim

On Sun, 2017-10-01 at 19:54 -0400, Alexandru Zbarcea wrote:
> Hi,
>
> I was trying to do a UTest for the
> org.apache.ctakes.coreference.data.PrintMimicMarkables (recently added),
> but I couldn't find any of the existing resources that can be used for
> this. Can anyone help me pointing to a resource (Lucene index) folder.
>
> org.apache.ctakes.coreference.data.PrintMimicMarkables \
>   /home/alex/projects/apache/ctakes/ctakes-dictionary-lookup-res/target/classes/org/apache/ctakes/dictionary/lookup/rxnorm_index \
>   index.out
>
> I was trying with the following lucene folder/resource:
> ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k
>
> And also the dictionaries:
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_codes_sample
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/assertion_cue_phrase_index
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/OrangeBook
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_sample
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/drug_index
>
> Any execution looks like:
> 01 Oct 2017 19:50:19 INFO ConstituencyParser - Initializing parser...
> Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer process
> WARNING: Got Exception. (Thread Name: [CollectionReader Thread]::)
> Message:
> docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer run(820)
> WARNING: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
>     at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:152)
>     at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:115)
>     at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
>     at org.apache.ctakes.core.cr.LuceneCollectionReader.getNext(LuceneCollectionReader.java:90)
>     at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(ArtifactProducer.java:494)
>     at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(ArtifactProducer.java:711)
>
> Collection process complete called, closing file writer.
>
> I appreciate any of your help,
> Alex
