Hi all,

This is not intended behavior; it is a bug. I will check in a fix soon ...

-----Original Message-----
From: Tomasz Oliwa [mailto:ol...@uchicago.edu]
Sent: Thursday, November 12, 2015 6:53 PM
To: britt fitch; dev@ctakes.apache.org
Subject: RE: cTAKES dictionary lookup behavior question

Britt,

I observed that it also depends on what the "missed" word is.

"baby to" and "baby too" match C1305907 of "baby tooth"; however, "baby token" does not match it.

"electrolyte le" and "electrolyte lev" match C0428284 "electrolyte level", but "electrolyte dev" does not match.

It seems that a match is made if the "missed" word consists of the same characters that the word found in the fast dictionary starts with?

Is there any way to tweak or customize this behavior?

Thanks,
Tomasz

________________________________
From: britt fitch [britt.fi...@wiredinformatics.com]
Sent: Thursday, November 12, 2015 5:36 PM
To: dev@ctakes.apache.org
Subject: Re: cTAKES dictionary lookup behavior question

The rare words, given the example terms below, are "primary", "milk", and "baby".

The lookup allows for a certain number of "misses". The "baby to" hits on "baby" as the rare word. "baby to" compared to "baby tooth" is 1 "miss" and qualifies as a match. (In practice, if I recall correctly, "to" is actually discarded entirely, so the comparison is actually "baby" : "baby tooth".) Others can correct my napkin logic though.

This is a pretty common scenario when a single term ends up matching to a larger term because of the allowance of misses. For example:

"oxygen" > "oxygen therapy"
"pathology" > "pathology department", "pathology procedure"
"exercise" > "exercise pain management"

Those are just some quick examples. It depends heavily on what the ontology contains though.

Cheers,
Britt

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com

On Nov 12, 2015, at 6:27 PM, Tomasz Oliwa <ol...@uchicago.edu> wrote:

Hi,

cTAKES has a dictionary lookup behavior that I cannot explain. You can verify the queries via the cTAKES demo that has been posted here at:

http://52.27.22.206:8080/index.jsp

but it also happens with the current 3.2.2 version and the fast dictionary UMLS lookup.

SENTENCE: Took the baby to the hospital.
VB DT NN IN DT NN
|===| |======|
Event Anatomy
C1305907

It finds the "baby tooth" annotation. The only CUI texts in the default fast dictionary for C1305907 are:

C1305907|primary tooth
C1305907|milk tooth
C1305907|baby tooth

How can "baby to" trigger the "baby tooth" annotation?

Regards,
Tomasz
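
For readers who want to reason about the reports in this thread, here is a minimal Python sketch of the suspected behavior. It is a toy model only, not the cTAKES code: the names `DICTIONARY`, `tokens_match`, `lookup`, and the `prefix_bug` flag are all hypothetical. It assumes the faulty comparison treats a text token as matching whenever the dictionary token starts with it. It does not model rare-word indexing or the allowance of "misses".

```python
# Toy model only: NOT the cTAKES implementation. It mimics the behavior
# reported in this thread, where a text token appears to be accepted when
# the dictionary token merely starts with it.

# Hypothetical mini-dictionary built from the entries quoted in the thread.
DICTIONARY = {
    "C1305907": ["primary tooth", "milk tooth", "baby tooth"],
    "C0428284": ["electrolyte level"],
}


def tokens_match(text_token, term_token, prefix_bug):
    """Compare one text token with one dictionary token."""
    if prefix_bug:
        # Suspected faulty comparison: a prefix counts as a full match.
        return term_token.startswith(text_token)
    # Strict comparison: the tokens must be identical.
    return term_token == text_token


def lookup(text, prefix_bug=True):
    """Return the set of CUIs with a term that matches the text token by token."""
    text_tokens = text.lower().split()
    hits = set()
    for cui, terms in DICTIONARY.items():
        for term in terms:
            term_tokens = term.split()
            # This toy model requires the same number of tokens on both sides.
            if len(term_tokens) != len(text_tokens):
                continue
            if all(
                tokens_match(text_tok, term_tok, prefix_bug)
                for text_tok, term_tok in zip(text_tokens, term_tokens)
            ):
                hits.add(cui)
    return hits


if __name__ == "__main__":
    for query in ["baby to", "baby too", "baby token",
                  "electrolyte le", "electrolyte lev", "electrolyte dev"]:
        print(query, "->", sorted(lookup(query)))
```

Under this assumption the model reproduces the pattern described above: "baby to" and "baby too" hit C1305907, "baby token" does not, and likewise for the "electrolyte" variants. Calling `lookup(text, prefix_bug=False)` switches to exact token comparison, so only the full term "baby tooth" would match.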