That's great Sean. Thanks for all the help.

On Tue, Oct 3, 2017 at 9:37 AM, Finan, Sean <
[email protected]> wrote:

> You can find all kinds of background information on the web with a search
> like "nlp tokenization".  You can look at 
> org.apache.ctakes.gui.dictionary.util.TextTokenizer
> in the ctakes-gui module to see how the dictionary creator does it.  You
> can run .getTokenizedText( text ) to get a tokenized string or .getTokens(
> text ) to get a list of words.  Apparently I was lazy and didn't write
> javadocs ...
>
> Sean
>
> -----Original Message-----
> From: Jeff Headley [mailto:[email protected]]
> Sent: Tuesday, October 03, 2017 9:19 AM
> To: [email protected]
> Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> Thanks Sean. Not quite, sorry for the confusion. We keep the default
> dictionary hsqldb. We just empty the CUI_TERMS, RXNORM, PREFTERM, and TUI
> tables and move over data from a sql server db. I don't seem to recall
> doing anything with a tcount column. I'll have to check our code tonight.
> That could very well be it. So maybe the old ctakes had a bug and this
> should not have been working to begin with. Got anywhere I could read about
> the tokenizing rules and calculating the tcount value? Or maybe a java
> class I could look at?
>
> Jeff
>
>
> On Tue, Oct 3, 2017 at 9:07 AM, Finan, Sean <
> [email protected]> wrote:
>
> > Ok, let me see if I understand your current setup:
> >
> > Ctakes 4.0 fast lookup,
> > Dictionary configuration file points to an sql server, Sql server uses
> > cui_terms  (cui, rword, rindex, tcount, text) and perhaps other
> > secondary tables ...
> >
> > Now that I write out the column names, I have a thought.  Is it
> > possible that for some term the number in tcount does not match the
> > number of non-whitespace 'words' in the text column?  If those numbers
> > are off then you will have problems similar to the one that you are
> > seeing.
> > If you are populating your own table you need to make sure that the
> > text is being properly tokenized.  For instance, the term "alpha-beta"
> > should have text "alpha - beta" with tcount 3.  There are some
> > exceptions to the dash-separation rule and a few oddities.
> >
> > Sean
> >
> > -----Original Message-----
> > From: Jeff Headley [mailto:[email protected]]
> > Sent: Tuesday, October 03, 2017 8:52 AM
> > To: [email protected]
> > Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> >
> > I updated our pom to use the same hsqldb version as what I saw in the
> > ctakes lib folder. The data coming in is from a SQL Server database.
> >
> > On Tue, Oct 3, 2017 at 8:45 AM, Finan, Sean <
> > [email protected]> wrote:
> >
> > > Hi Jeff,
> > >
> > > I don't think that a custom dictionary should cause a null pointer
> > > exception on that line unless you have an odd null character in text
> > > or something of that ilk.
> > >
> > > One thing that changed in ctakes 4.0 is the version of hsqldb that
> > > is being used for the dictionary database.  I don’t know if that has
> > > anything to do with your problem, but it may be causing others.
> > > What is the source of your custom dictionary?  There may be a better
> > > way to populate a database.
> > >
> > > Sean
> > >
> > > -----Original Message-----
> > > From: Jeff Headley [mailto:[email protected]]
> > > Sent: Tuesday, October 03, 2017 12:53 AM
> > > To: [email protected]
> > > Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator
> > > [EXTERNAL]
> > >
> > > Thank you Sean. That helped to figure out what we did. Not quite
> > > sure where we went wrong but now at least we know the cause. So a
> > > long time ago in our project using ctakes, we emptied out the tables
> > > CUI_TERMS, RXNORM, PREFTERM, and TUI and then loaded them with the
> > > values we wanted. Worked great. Now in the new version the
> > > /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> > > engine seems to be using
> > > /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
> > > and that seems to be where things went sideways. If I don't mess
> > > with the db and keep the original, no errors.
> > >
> > > So somewhere in this if statement at line 102 in
> > > DefaultJCASTermAnnotator:
> > > if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
> > >               || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() ) ) {
> > >
> > > It's expecting to not ever have a null and I suspect we are leaving
> > > something null somewhere that really shouldn't have nulls. If it's
> > > obvious where I've gone wrong, the assistance would be appreciated.
> > > Otherwise, I'll get it figured out eventually. I suspect it's
> > > possibly because we never did anything with the SNOMEDCT_US in the
> > > prior version.
> > >
> > > On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean <
> > > [email protected]> wrote:
> > >
> > > > Hi Jeff,
> > > >
> > > > I have no problem running on your example "DIDANOSINE, 250MG (PO
> > > > Capsule Delayed Release)" or any other text.
> > > >
> > > > I don't know how you are running ctakes through
> > > > com.clientproject.ctakes.processors.CommandLineProcessor, so I
> > > > don't know how closely the standard pipeline approximates yours.
> > > >
> > > > Sean
> > > >
> > > > -----Original Message-----
> > > > From: Jeff Headley [mailto:[email protected]]
> > > > Sent: Sunday, October 01, 2017 11:31 PM
> > > > To: [email protected]
> > > > Subject: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> > > >
> > > > After upgrading our project to version 4, we are getting an NPE from
> > > > cTAKES. The text that was being processed was DIDANOSINE, 250MG (PO
> > > > Capsule Delayed Release), though it seems to be happening to us no
> > > > matter what text we submit.  The stack trace is below. Any help
> > > > would be appreciated as I'm at a loss as to what we might be doing
> > > > wrong if this is not a bug in cTAKES.
> > > >
> > > > Thank you,
> > > > Jeff
> > > >
> > > > Oct 01, 2017 11:10:16 PM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(273)
> > > > SEVERE: Exception occurred
> > > > org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
> > > >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:412)
> > > >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:314)
> > > >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
> > > >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
> > > >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
> > > >     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
> > > >     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
> > > >     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:284)
> > > >     at com.clientproject.ctakes.processors.CommandLineProcessor.processLine(CommandLineProcessor.java:163)
> > > >     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
> > > >     at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> > > >     at com.clientproject.ctakes.processors.CommandLineProcessor.run(CommandLineProcessor.java:114)
> > > >     at com.clientproject.ctakes.App.main(App.java:109)
> > > > Caused by: java.lang.NullPointerException
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.isTermMatch(DefaultJCasTermAnnotator.java:102)
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.findTerms(DefaultJCasTermAnnotator.java:79)
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.findTerms(AbstractJCasTermAnnotator.java:236)
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.processWindow(AbstractJCasTermAnnotator.java:219)
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.process(AbstractJCasTermAnnotator.java:156)
> > > >     at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
> > > >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:396)
> > > >     ... 12 more
> > > >
> > >
> >
>
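
The tcount rule described in the thread (the number in tcount should equal the number of non-whitespace words in the text column, so "alpha-beta" is stored as "alpha - beta" with tcount 3) can be checked before loading a custom table. What follows is only a rough sketch in plain Java. The class and method names (TcountCheck, naiveTokens, countWords, isConsistent) are hypothetical and are not part of cTAKES, and naiveTokens simply pads every dash with spaces. It ignores the exceptions and oddities Sean mentions, so org.apache.ctakes.gui.dictionary.util.TextTokenizer remains the place to look for the actual rules.

```java
import java.util.ArrayList;
import java.util.List;

public class TcountCheck {

    // Hypothetical, naive stand-in for the dictionary tokenizer: pads every
    // dash with spaces, then splits on whitespace. The real tokenizer has
    // exceptions to the dash rule that this sketch does not implement.
    public static List<String> naiveTokens(String term) {
        List<String> tokens = new ArrayList<>();
        String spaced = term.replace("-", " - ").trim();
        if (spaced.isEmpty()) {
            return tokens;
        }
        for (String token : spaced.split("\\s+")) {
            tokens.add(token);
        }
        return tokens;
    }

    // Counts the whitespace-separated words in a text column value.
    public static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // True when the stored tcount agrees with the words actually in text.
    public static boolean isConsistent(String text, int tcount) {
        return countWords(text) == tcount;
    }

    public static void main(String[] args) {
        String text = String.join(" ", naiveTokens("alpha-beta"));
        System.out.println(text + " -> " + countWords(text)); // alpha - beta -> 3
        // An untokenized text value with tcount 3 is flagged as inconsistent.
        System.out.println(isConsistent("alpha-beta", 3));    // false
    }
}
```

Running isConsistent over every row of a populated cui_terms table (text, tcount) would list the rows whose counts disagree, which is the condition Sean suspects behind the failure.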
