That's great, Sean. Thanks for all the help.

On Tue, Oct 3, 2017 at 9:37 AM, Finan, Sean <[email protected]> wrote:
> You can find all kinds of background information on the web with a search
> like "nlp tokenization". You can look at
> org.apache.ctakes.gui.dictionary.util.TextTokenizer
> in the ctakes-gui module to see how the dictionary creator does it. You
> can run .getTokenizedText( text ) to get a tokenized string or
> .getTokens( text ) to get a list of words. Apparently I was lazy and
> didn't write javadocs ...
>
> Sean
>
> -----Original Message-----
> From: Jeff Headley [mailto:[email protected]]
> Sent: Tuesday, October 03, 2017 9:19 AM
> To: [email protected]
> Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> Thanks Sean. Not quite, sorry for the confusion. We keep the default
> dictionary hsqldb. We just empty the CUI_TERMS, RXNORM, PREFTERM, and TUI
> tables and move over data from a sql server db. I don't seem to recall
> doing anything with a tcount column. I'll have to check our code tonight.
> That could very well be it. So maybe the old ctakes had a bug and this
> should not have been working to begin with. Got anywhere I could read about
> the tokenizing rules and calculating the tcount value? Or maybe a java
> class I could look at?
>
> Jeff
>
>
> On Tue, Oct 3, 2017 at 9:07 AM, Finan, Sean <[email protected]> wrote:
>
> > Ok, let me see if I understand your current setup:
> >
> > Ctakes 4.0 fast lookup,
> > Dictionary configuration file points to an sql server,
> > Sql server uses cui_terms (cui, rword, rindex, tcount, text) and
> > perhaps other secondary tables ...
> >
> > Now that I write out the column names, I have a thought. Is it
> > possible that for some term the number in tcount does not match the
> > number of non-whitespace 'words' in the text column? If those numbers
> > are off then you will have problems similar to the one that you are
> > seeing. If you are populating your own table you need to make sure
> > that the text is being properly tokenized.
> > For instance, the term "alpha-beta"
> > should have text "alpha - beta" with tcount 3. There are some
> > exceptions to the dash-separation rule and a few oddities.
> >
> > Sean
> >
> > -----Original Message-----
> > From: Jeff Headley [mailto:[email protected]]
> > Sent: Tuesday, October 03, 2017 8:52 AM
> > To: [email protected]
> > Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> >
> > I updated our pom to use the same hsqldb version as what I saw in the
> > ctakes lib folder. The data coming in is from a SQL Server database.
> >
> > On Tue, Oct 3, 2017 at 8:45 AM, Finan, Sean <[email protected]> wrote:
> >
> > > Hi Jeff,
> > >
> > > I don't think that a custom dictionary should cause a null pointer
> > > exception on that line unless you have an odd null character in text
> > > or something of that ilk.
> > >
> > > One thing that changed in ctakes 4.0 is the version of hsqldb that
> > > is being used for the dictionary database. I don’t know if that has
> > > anything to do with your problem, but it may be causing others.
> > > What is the source of your custom dictionary? There may be a better
> > > way to populate a database.
> > >
> > > Sean
> > >
> > > -----Original Message-----
> > > From: Jeff Headley [mailto:[email protected]]
> > > Sent: Tuesday, October 03, 2017 12:53 AM
> > > To: [email protected]
> > > Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> > >
> > > Thank you Sean. That helped to figure out what we did. Not quite
> > > sure where we went wrong but now at least we know the cause. So a
> > > long time ago in our project using ctakes, we emptied out the tables
> > > CUI_TERMS, RXNORM, PREFTERM, and TUI and then loaded them with the
> > > values we wanted. Worked great.
> > > Now in the new version the
> > > /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> > > engine seems to be using
> > > /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
> > > and that seems to be where things went sideways. If I don't mess
> > > with the db and keep the original, no errors.
> > >
> > > So somewhere in this if statement at line 102 in
> > > DefaultJCASTermAnnotator:
> > >     if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
> > >          || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() ) ) {
> > >
> > > It's expecting to not ever have a null and I suspect we are leaving
> > > something null somewhere that really shouldn't have nulls. If it's
> > > obvious where I've gone wrong, the assistance would be appreciated.
> > > Otherwise, I'll get it figured out eventually. I suspect it's
> > > possibly because we never did anything with the SNOMEDCT_US in the
> > > prior version.
> > >
> > > On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean <[email protected]> wrote:
> > >
> > > > Hi Jeff,
> > > >
> > > > I have no problem running on your example "DIDANOSINE, 250MG (PO
> > > > Capsule Delayed Release)" or any other text.
> > > >
> > > > I don't know how you are running ctakes through
> > > > com.clientproject.ctakes.processors.CommandLineProcessor, so I
> > > > don't know how closely the standard pipeline approximates yours.
> > > >
> > > > Sean
> > > >
> > > > -----Original Message-----
> > > > From: Jeff Headley [mailto:[email protected]]
> > > > Sent: Sunday, October 01, 2017 11:31 PM
> > > > To: [email protected]
> > > > Subject: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> > > >
> > > > After upgrading our project to version 4, we are getting a NPE
> > > > from cTAKES.
> > > > The text that was being processed was DIDANOSINE, 250MG (PO
> > > > Capsule Delayed Release), though it seems to be happening to us no
> > > > matter what text we submit. The stack trace is below. Any help
> > > > would be appreciated as I'm at a loss as to what we might be doing
> > > > wrong if this is not a bug in cTAKES.
> > > >
> > > > Thank you,
> > > > Jeff
> > > >
> > > > Oct 01, 2017 11:10:16 PM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(273)
> > > > SEVERE: Exception occurred
> > > > org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
> > > >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:412)
> > > >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:314)
> > > >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
> > > >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
> > > >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
> > > >     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
> > > >     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
> > > >     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:284)
> > > >     at com.clientproject.ctakes.processors.CommandLineProcessor.processLine(CommandLineProcessor.java:163)
> > > >     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
> > > >     at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> > > >     at com.clientproject.ctakes.processors.CommandLineProcessor.run(CommandLineProcessor.java:114)
> > > >     at com.clientproject.ctakes.App.main(App.java:109)
> > > > Caused by: java.lang.NullPointerException
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.isTermMatch(DefaultJCasTermAnnotator.java:102)
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.findTerms(DefaultJCasTermAnnotator.java:79)
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.findTerms(AbstractJCasTermAnnotator.java:236)
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.processWindow(AbstractJCasTermAnnotator.java:219)
> > > >     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.process(AbstractJCasTermAnnotator.java:156)
> > > >     at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
> > > >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:396)
> > > >     ... 12 more
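
For anyone populating CUI_TERMS by hand, the tcount rule discussed in this thread (the stored text is whitespace-tokenized, with "alpha-beta" stored as "alpha - beta" and tcount 3) might be checked with something like the sketch below. This is not the cTAKES TextTokenizer: the class TcountCheck and its methods tokenize and isConsistent are hypothetical names, and the hyphen handling is a deliberate simplification that ignores the exceptions to the dash-separation rule mentioned in the thread. For real data, the TextTokenizer class in ctakes-gui is the thing to use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES.
public class TcountCheck {

    // Simplified assumption: split on whitespace and make every hyphen its own token.
    // The real cTAKES tokenizer has exceptions to the dash rule that this ignores.
    public static List<String> tokenize(String term) {
        List<String> tokens = new ArrayList<>();
        String spaced = term.replace("-", " - ");
        for (String piece : spaced.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // True when the stored tcount equals the number of whitespace-separated
    // words in the stored text column. A null text is never consistent.
    public static boolean isConsistent(String text, int tcount) {
        if (text == null) {
            return false;
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return tcount == 0;
        }
        return trimmed.split("\\s+").length == tcount;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("alpha-beta");
        // Prints: alpha - beta -> tcount 3
        System.out.println(String.join(" ", tokens) + " -> tcount " + tokens.size());
        // Prints: true
        System.out.println(isConsistent("alpha - beta", 3));
        // Prints: false (text was not tokenized, so it has one word, not three)
        System.out.println(isConsistent("alpha-beta", 3));
    }
}
```

Running isConsistent over every row copied from the source database would flag rows whose tcount and text disagree, which is the mismatch suspected above as the cause of the NPE.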
