You can find all kinds of background information on the web with a search like 
"nlp tokenization".  You can look at 
org.apache.ctakes.gui.dictionary.util.TextTokenizer in the ctakes-gui module to 
see how the dictionary creator does it.  You can run .getTokenizedText( text ) 
to get a tokenized string or .getTokens( text ) to get a list of words.  
Apparently I was lazy and didn't write javadocs ...

Sean
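
A minimal sketch of the tcount consistency check discussed in this thread, for anyone populating CUI_TERMS by hand. It is not the cTAKES TextTokenizer: the class name TcountSketch and its methods are hypothetical, and the tokenize rule here simply puts spaces around every dash, ignoring the exceptions to dash separation that the real tokenizer apparently handles. For real data, prefer calling TextTokenizer itself.

```java
public class TcountSketch {

    // Hypothetical, simplified rule: surround every dash with spaces,
    // then collapse runs of whitespace. The real cTAKES tokenizer has
    // exceptions to dash separation that are NOT modeled here.
    public static String tokenize(String text) {
        return text.replace("-", " - ").trim().replaceAll("\\s+", " ");
    }

    // Counts the non-whitespace 'words' in already-tokenized text.
    public static int countTokens(String tokenizedText) {
        String trimmed = tokenizedText.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // True when the stored tcount matches the number of words in the text column.
    public static boolean isConsistent(String text, int tcount) {
        return countTokens(text) == tcount;
    }

    public static void main(String[] args) {
        String tokenized = tokenize("alpha-beta");
        // prints "alpha - beta -> tcount 3"
        System.out.println(tokenized + " -> tcount " + countTokens(tokenized));
        // Untokenized text stored with tcount 3 is a mismatch: prints "false"
        System.out.println(isConsistent("alpha-beta", 3));
    }
}
```

Running isConsistent over each (text, tcount) pair before loading the table should flag rows where the two columns disagree.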

-----Original Message-----
From: Jeff Headley [mailto:[email protected]] 
Sent: Tuesday, October 03, 2017 9:19 AM
To: [email protected]
Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

Thanks Sean. Not quite, sorry for the confusion. We keep the default dictionary 
hsqldb. We just empty the CUI_TERMS, RXNORM, PREFTERM, and TUI tables and move 
over data from a sql server db. I don't seem to recall doing anything with a 
tcount column. I'll have to check our code tonight.
That could very well be it. So maybe the old ctakes had a bug and this should 
not have been working to begin with. Got anywhere I could read about the 
tokenizing rules and calculating the tcount value? Or maybe a java class I 
could look at?

Jeff


On Tue, Oct 3, 2017 at 9:07 AM, Finan, Sean < [email protected]> 
wrote:

> Ok, let me see if I understand your current setup:
>
> Ctakes 4.0 fast lookup,
> Dictionary configuration file points to an sql server, Sql server uses 
> cui_terms  (cui, rword, rindex, tcount, text) and perhaps other 
> secondary tables ...
>
> Now that I write out the column names, I have a thought.  Is it 
> possible that for some term the number in tcount does not match the 
> number of non-whitespace 'words' in the text column?  If those numbers 
> are off then you will have problems similar to the one that you are seeing.
> If you are populating your own table you need to make sure that the 
> text is being properly tokenized.  For instance, the term "alpha-beta" 
> should have text "alpha - beta" with tcount 3.  There are some 
> exceptions to the dash-separation rule and a few oddities.
>
> Sean
>
> -----Original Message-----
> From: Jeff Headley [mailto:[email protected]]
> Sent: Tuesday, October 03, 2017 8:52 AM
> To: [email protected]
> Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> I updated our pom to use the same hsqldb version as what I saw in the 
> ctakes lib folder. The data coming in is from a SQL Server database.
>
> On Tue, Oct 3, 2017 at 8:45 AM, Finan, Sean < 
> [email protected]> wrote:
>
> > Hi Jeff,
> >
> > I don't think that a custom dictionary should cause a null pointer 
> > exception on that line unless you have an odd null character in text 
> > or something of that ilk.
> >
> > One thing that changed in ctakes 4.0 is the version of hsqldb that 
> > is being used for the dictionary database.  I don’t know if that has 
> > anything to do with your problem, but it may be causing others.
> > What is the source of your custom dictionary?  There may be a better 
> > way to populate a database.
> >
> > Sean
> >
> > -----Original Message-----
> > From: Jeff Headley [mailto:[email protected]]
> > Sent: Tuesday, October 03, 2017 12:53 AM
> > To: [email protected]
> > Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator 
> > [EXTERNAL]
> >
> > Thank you Sean. That helped to figure out what we did. Not quite 
> > sure where we went wrong but now at least we know the cause. So a 
> > long time ago in our project using ctakes, we emptied out the tables 
> > CUI_TERMS, RXNORM, PREFTERM, and TUI and then loaded them with the 
> > values we wanted. Worked great. Now in the new version the
> > /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> > engine seems to be using
> > /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
> > and that seems to be where things went sideways. If I don't mess 
> > with the db and keep the original, no errors.
> >
> > So somewhere in this if statement at line 102 in DefaultJCasTermAnnotator:
> > if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
> >               || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() ) ) {
> >
> > It's expecting to not ever have a null and I suspect we are leaving 
> > something null somewhere that really shouldn't have nulls. If it's 
> > obvious where I've gone wrong, the assistance would be appreciated.
> > Otherwise, I'll get it figured out eventually. I suspect it's 
> > possibly because we never did anything with the SNOMEDCT_US in the prior 
> > version.
> >
> > On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean < 
> > [email protected]> wrote:
> >
> > > Hi Jeff,
> > >
> > > I have no problem running on your example "DIDANOSINE, 250MG (PO 
> > > Capsule Delayed Release)" or any other text.
> > >
> > > I don't know how you are running ctakes through
> > > com.clientproject.ctakes.processors.CommandLineProcessor, so I don't
> > > know how closely the standard pipeline approximates yours.
> > >
> > > Sean
> > >
> > > -----Original Message-----
> > > From: Jeff Headley [mailto:[email protected]]
> > > Sent: Sunday, October 01, 2017 11:31 PM
> > > To: [email protected]
> > > Subject: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> > >
> > > After upgrading our project to version 4, we are getting an NPE
> > > from cTAKES. The text that was being processed was DIDANOSINE, 250MG
> > > (PO Capsule Delayed Release), though it seems to be happening to us
> > > no matter what text we submit.  The stack trace is below. Any help
> > > would be appreciated as I'm at a loss as to what we might be doing
> > > wrong if this is not a bug in cTAKES.
> > >
> > > Thank you,
> > > Jeff
> > >
> > > Oct 01, 2017 11:10:16 PM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(273)
> > > SEVERE: Exception occurred
> > > org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
> > > at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:412)
> > > at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:314)
> > > at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
> > > at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
> > > at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
> > > at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
> > > at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
> > > at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:284)
> > > at com.clientproject.ctakes.processors.CommandLineProcessor.processLine(CommandLineProcessor.java:163)
> > > at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
> > > at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> > > at com.clientproject.ctakes.processors.CommandLineProcessor.run(CommandLineProcessor.java:114)
> > > at com.clientproject.ctakes.App.main(App.java:109)
> > > Caused by: java.lang.NullPointerException
> > > at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.isTermMatch(DefaultJCasTermAnnotator.java:102)
> > > at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.findTerms(DefaultJCasTermAnnotator.java:79)
> > > at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.findTerms(AbstractJCasTermAnnotator.java:236)
> > > at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.processWindow(AbstractJCasTermAnnotator.java:219)
> > > at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.process(AbstractJCasTermAnnotator.java:156)
> > > at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
> > > at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:396)
> > > ... 12 more
> > >
> >
>
