Hopefully the speed difference will be negligible. The conversion is made at only two points:
1. When internally storing a custom dictionary.
2. When storing discovered CUIs in the CAS.
Since custom dictionaries are only read once, #1 shouldn’t have any real impact. #2 should require one execution per unique CUI in the document, so if there are 100 CUIs per doc * 10,000,000 docs it will probably add up to a few seconds, which is minor in the grand scheme of things. However, there may be a situation that I’m missing. There shouldn’t be any impact upon accuracy, as the adjustments occur completely outside the lookup loop.
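A minimal sketch of the kind of conversion discussed in this thread, written for illustration and not taken from the cTAKES source: a CUI is split into a text prefix and a numeric part, packed into a long, and rebuilt by division and remainder, with zero-padding restoring any leading zeros. The class name CuiCodec, the methods encode and decode, the prefix table, and the fixed 7-digit width are all assumptions; only the divide step and the name digitCount come from the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a CUI <-> long codec; not the cTAKES implementation. */
final class CuiCodec {
    // Assumed width of the numeric part of a CUI, e.g. "0001234" in "C0001234".
    private static final int DIGIT_COUNT = 7;
    // 10^DIGIT_COUNT: separates the prefix index from the numeric part.
    private static final long PREFIX_MULTIPLIER = 10_000_000L;

    // Prefixes seen so far; index 0 is the standard "C" prefix.
    private static final List<String> PREFIXES =
            new ArrayList<>(Collections.singletonList("C"));

    private CuiCodec() {
    }

    /** Packs a CUI such as "C0001234" or "CTS0000012" into a single long. */
    static synchronized long encode(String cui) {
        // Everything before the first digit is treated as the prefix.
        int split = 0;
        while (split < cui.length() && !Character.isDigit(cui.charAt(split))) {
            split++;
        }
        String prefix = cui.substring(0, split);
        String digits = cui.substring(split);
        if (digits.length() != DIGIT_COUNT) {
            throw new IllegalArgumentException("Expected " + DIGIT_COUNT + " digits: " + cui);
        }
        long numeral = Long.parseLong(digits); // leading zeros are dropped here
        int prefixIndex = PREFIXES.indexOf(prefix);
        if (prefixIndex < 0) {
            // First time this alternative prefix is seen: register it.
            PREFIXES.add(prefix);
            prefixIndex = PREFIXES.size() - 1;
        }
        return prefixIndex * PREFIX_MULTIPLIER + numeral;
    }

    /** Rebuilds the CUI text from a value produced by encode. */
    static synchronized String decode(long code) {
        int prefixIndex = (int) (code / PREFIX_MULTIPLIER); // the divide step
        long numeral = code % PREFIX_MULTIPLIER;
        // Zero-pad so that numerals such as 1234 come back as "0001234".
        return PREFIXES.get(prefixIndex) + String.format("%0" + DIGIT_COUNT + "d", numeral);
    }
}
```

With this sketch, CuiCodec.decode(CuiCodec.encode("C0001234")) gives back "C0001234", and a CUI with an alternative prefix round-trips the same way. Each call does a constant amount of arithmetic plus one short string format, which fits the expectation above that one call per unique CUI per document stays cheap.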
From: britt fitch [mailto:[email protected]]
Sent: Friday, July 10, 2015 5:57 PM
To: [email protected]
Subject: Re: dictionary-look-fast fails to handle alternative CUIs

No issues so far. I think you are already handling the one edge case I could come up with, which was the numeral portion of the code starting with a 0 and that 0 being lost during the divide step, but it looks like you are inserting leading zeros into the numeral portion if needed with digitCount. I’ll definitely report back if I notice any performance change given the new logic though.

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
[email protected]

On Jul 10, 2015, at 5:31 PM, Finan, Sean <[email protected]> wrote:

Great, thanks. Any issues or concerns? Possible enhancements? Like the source, I’m open to change …

From: britt fitch [mailto:[email protected]]
Sent: Friday, July 10, 2015 5:29 PM
To: [email protected]
Subject: Re: dictionary-look-fast fails to handle alternative CUIs

Thanks, just finished testing and closed the ticket.

Britt Fitch

On Jul 9, 2015, at 3:44 PM, Finan, Sean <[email protected]> wrote:

Checked in, please give it a test and close the ticket if it fits your purposes.

From: britt fitch [mailto:[email protected]]
Sent: Thursday, July 09, 2015 3:30 PM
To: [email protected]
Subject: Re: dictionary-look-fast fails to handle alternative CUIs

Linking ticket here for completeness: https://issues.apache.org/jira/browse/CTAKES-368

Britt Fitch

On Jul 9, 2015, at 3:19 PM, britt fitch <[email protected]> wrote:

Absolutely. I’ll create it now.
