Seems like some kind of simple heuristic should work. Isn't it just a case of looking at the in/out text offsets of the source text for an identified annotation and then comparing that text with the canonical text of the CUI or SNOMED ID? If the source text is just a few characters (say, fewer than 5) and the Levenshtein distance between it and the canonical text is greater than the length of the source text, you can be pretty sure you have an acronym.
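A minimal sketch of that heuristic in Python, in case it helps. The function names, the 5-character cutoff as a parameter, and the case-insensitive comparison are my own assumptions for illustration; none of this is a cTAKES API, and the offsets and canonical text are assumed to have already been pulled out of the XMI.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def looks_like_acronym(document: str, begin: int, end: int,
                       canonical: str, max_len: int = 5) -> bool:
    """Hypothetical helper: flag a short mention that is far from its canonical text.

    Assumption: comparison is case-insensitive; the original suggestion
    does not say either way.
    """
    source = document[begin:end]
    if not source or len(source) >= max_len:
        return False
    return levenshtein(source.lower(), canonical.lower()) > len(source)


note = "Patient with prior MI in 2015."
begin = note.index("MI")
print(looks_like_acronym(note, begin, begin + 2, "Myocardial Infarction"))
```

A short mention that simply matches its canonical text (for example "pain" against "Pain") has a distance of 0 and is not flagged, which is the behavior you want.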
For instance, if cTAKES finds "MI" and assigns SNOMED 22298006 or CUI C0027051 with canonical text "Myocardial Infarction", then with the in/out offsets into the text you should be able to run this heuristic.

The problem (and I see this in my work) is that many acronyms have multiple meanings. Thus, you may accurately be able to tell that your identified concept came from an acronym, but it was the wrong concept!

Peter

On Thu, May 16, 2019 at 4:31 AM Greg Silverman <[email protected]> wrote:

> Got it!
>
> Yes, I understand the formidability, given the need for disambiguation,
> etc. Was just curious if this existed.
>
> Thanks!
>
> On Wed, May 15, 2019 at 9:11 PM Finan, Sean <[email protected]> wrote:
>
> > Hi Greg,
> >
> > Ok, that gives me a great vector toward addressing your needs.
> >
> > I don't know of any ctakes components that indicate whether or not
> > discovered concepts come from acronyms, abbreviations or -replete- text
> > mentions.
> >
> > There should be something that does that. Open source ----> Any
> > champions available?
> >
> > Right now no abbreviation or metonym information is provided in the
> > standard components. If it can be extruded from source then it should be
> > provided.
> >
> > If anybody has such a component, please let us know ! This is a
> > formidable (imio) nlp problem, so call your kudos with a solution!
> >
> > Sean
> >
> > ________________________________________
> > From: Greg Silverman <[email protected]>
> > Sent: Wednesday, May 15, 2019 9:21 PM
> > To: [email protected]
> > Subject: Re: acronyms/abbreviations [EXTERNAL]
> >
> > I'm just wondering how acronyms are identified as acronyms in cTAKES (for
> > example, in MetaMap, there is an attribute in the Document annotation with
> > ids of where they are in the Utterance annotation; and in BioMedICUS, there
> > is an acronym annotation type, etc.). From examining the XMI CAS, it is not
> > obvious.
> >
> > We're extracting the desired annotations from the XMI CAS using a custom
> > Groovy client.
> >
> > Thanks!
> >
> > On Wed, May 15, 2019 at 7:43 PM Finan, Sean <[email protected]> wrote:
> >
> > > Hi Greg,
> > >
> > > What exactly do you need ?
> > >
> > > There are a lot of output components that can produce different formats
> > > containing various types of information.
> > >
> > > Do you prefer to parse ml ? Or is columnized text output ok? Does this
> > > go to a post-processing engine or a human user?
> > >
> > > Thanks,
> > >
> > > Sean
> > > ________________________________________
> > > From: Greg Silverman <[email protected]>
> > > Sent: Wednesday, May 15, 2019 7:09 PM
> > > To: [email protected]
> > > Subject: acronyms/abbreviations [EXTERNAL]
> > >
> > > How can I get these from the XMI annotations?
> > >
> > > Thanks!
> > >
> > > Greg--
> > >
> > > --
> > > Greg M. Silverman
> > > Senior Systems Developer
> > > NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> > > University of Minnesota
> > > [email protected]
> > >
> > > › evaluate-it.org ‹
