Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

Miller, Timothy Mon, 02 Oct 2017 06:46:08 -0700

Thanks Alex,
This code is for processing a clinical text corpus stored as a
Lucene index, data that cannot be redistributed for privacy reasons.
Since it's so closely related to the coreference work, I thought it
should go alongside the coreference module. But maybe it makes more
sense as an external project, since it can't really function without
externally created resources. What do you think?
Tim



On Sun, 2017-10-01 at 19:54 -0400, Alexandru Zbarcea wrote:
> Hi,
> 
> I was trying to write a unit test for the recently added
> org.apache.ctakes.coreference.data.PrintMimicMarkables, but I couldn't
> find any existing resource that can be used for this. Can anyone
> point me to a resource (Lucene index) folder?
> 
> org.apache.ctakes.coreference.data.PrintMimicMarkables \
>     /home/alex/projects/apache/ctakes/ctakes-dictionary-lookup-res/target/classes/org/apache/ctakes/dictionary/lookup/rxnorm_index \
>     index.out
> 
> I was trying with the following Lucene folder/resource:
> ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k
> 
> And also the dictionaries:
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_codes_sample
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/assertion_cue_phrase_index
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/OrangeBook
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-like_sample
> ./ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/drug_index
> 
> Every run produces output like this:
> 01 Oct 2017 19:50:19  INFO ConstituencyParser - Initializing parser...
> Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer process
> WARNING: Got Exception. (Thread Name: [CollectionReader Thread]::) Message: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> Oct 01, 2017 7:50:20 PM org.apache.uima.collection.impl.cpm.engine.ArtifactProducer run(820)
> WARNING: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
>     at org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:152)
>     at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:115)
>     at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
>     at org.apache.ctakes.core.cr.LuceneCollectionReader.getNext(LuceneCollectionReader.java:90)
>     at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(ArtifactProducer.java:494)
>     at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(ArtifactProducer.java:711)
> 
> Collection process complete called, closing file writer.
> 
> I'd appreciate any help you can give,
> Alex
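
The trace in Alex's message shows LuceneCollectionReader.getNext asking the index for docID 5000 while maxDoc is 5000. Valid IDs are therefore 0 to 4999, so the reader went one document past the end of the index_med_5k index. One plausible cause, not confirmed in this thread, is an iteration bound of `<=` instead of `<` (or a document count that doesn't match the index). The sketch below is plain Java with no Lucene dependency. `DocSource`, `ListDocSource` and `BoundedDocIterator` are hypothetical names standing in for an index reader, used only to illustrate bounds-safe iteration over document IDs; they are not cTAKES or Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not cTAKES or Lucene code): iterate over the document
 * IDs of an index-like source whose valid IDs are 0 .. maxDoc() - 1.
 */
public class BoundedDocIterator {

    /** Hypothetical stand-in for an index reader; NOT the Lucene IndexReader API. */
    interface DocSource {
        int maxDoc();
        String document(int docId);
    }

    /** In-memory source that rejects out-of-range IDs with a Lucene-style message. */
    static final class ListDocSource implements DocSource {
        private final List<String> docs;

        ListDocSource(List<String> docs) {
            this.docs = docs;
        }

        @Override
        public int maxDoc() {
            return docs.size();
        }

        @Override
        public String document(int docId) {
            if (docId < 0 || docId >= docs.size()) {
                throw new IllegalArgumentException("docID must be >= 0 and < maxDoc="
                        + docs.size() + " (got docID=" + docId + ")");
            }
            return docs.get(docId);
        }
    }

    private final DocSource source;
    private int nextDocId = 0;

    BoundedDocIterator(DocSource source) {
        this.source = source;
    }

    /** Strict '<': the last valid ID is maxDoc() - 1; '<=' would request docID == maxDoc. */
    boolean hasNext() {
        return nextDocId < source.maxDoc();
    }

    /** Returns the next document, advancing the ID only after a successful bounds check. */
    String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no document with ID " + nextDocId);
        }
        return source.document(nextDocId++);
    }

    public static void main(String[] args) {
        DocSource source = new ListDocSource(List.of("doc-a", "doc-b", "doc-c"));
        BoundedDocIterator it = new BoundedDocIterator(source);
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        System.out.println(seen); // prints [doc-a, doc-b, doc-c]
    }
}
```

A real Lucene `IndexReader.maxDoc()` also counts deleted documents (`numDocs()` excludes them), so iterating an actual index would additionally need to skip deleted IDs; this sketch leaves that out.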
