Yes, and regarding your last paragraph: This is where disambiguation comes into play. Here is one method: https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume23/montoyo05a-html/node9.html
I'm not sure how either MetaMap or BioMedICUS does disambiguation, but since both are open source, they would be potential resources.

Greg--

On Fri, May 17, 2019 at 2:17 AM Peter Abramowitsch <[email protected]> wrote:

> Seems like some kind of simple heuristic should work: isn't it just a case of looking at the in/out text offsets of the source text for an identified annotation and then comparing that with the canonical text of the CUI or SnomedID? If the source text is just a few characters (say fewer than 5) and the Levenshtein distance between it and the canonical text is greater than the length of the source text, you're pretty sure to have an acronym.
>
> For instance, if cTAKES finds "MI" and assigns SNOMED 22298006 or CUI C0027051 with canonical text "Myocardial Infarction", then with the in/out offsets into the text you should be able to run this heuristic.
>
> The problem (and I see this in my work) is that many acronyms have multiple meanings. Thus, you may accurately be able to tell that your identified concept came from an acronym, but it was the wrong concept!!
>
> Peter
>
> On Thu, May 16, 2019 at 4:31 AM Greg Silverman <[email protected]> wrote:
>
> > Got it!
> >
> > Yes, I understand the formidability, given the need for disambiguation, etc. Was just curious if this existed.
> >
> > Thanks!
> >
> > On Wed, May 15, 2019 at 9:11 PM Finan, Sean <[email protected]> wrote:
> >
> > > Hi Greg,
> > >
> > > Ok, that gives me a great vector toward addressing your needs.
> > >
> > > I don't know of any cTAKES components that indicate whether or not discovered concepts come from acronyms, abbreviations or -replete- text mentions.
> > >
> > > There should be something that does that. Open source ----> Any champions available?
> > >
> > > Right now no abbreviation or metonym information is provided in the standard components. If it can be extruded from source then it should be provided.
> > >
> > > If anybody has such a component, please let us know! This is a formidable (imio) NLP problem, so call your kudos with a solution!
> > >
> > > Sean
> > >
> > > ________________________________________
> > > From: Greg Silverman <[email protected]>
> > > Sent: Wednesday, May 15, 2019 9:21 PM
> > > To: [email protected]
> > > Subject: Re: acronyms/abbreviations [EXTERNAL]
> > >
> > > I'm just wondering how acronyms are identified as acronyms in cTAKES (for example, in MetaMap, there is an attribute in the Document annotation with ids of where they are in the Utterance annotation; and in BioMedICUS, there is an acronym annotation type, etc.). From examining the XMI CAS, it is not obvious.
> > >
> > > We're extracting the desired annotations from the XMI CAS using a custom Groovy client.
> > >
> > > Thanks!
> > >
> > > On Wed, May 15, 2019 at 7:43 PM Finan, Sean <[email protected]> wrote:
> > >
> > > > Hi Greg,
> > > >
> > > > What exactly do you need?
> > > >
> > > > There are a lot of output components that can produce different formats containing various types of information.
> > > >
> > > > Do you prefer to parse ml? Or is columnized text output ok? Does this go to a post-processing engine or a human user?
> > > >
> > > > Thanks,
> > > >
> > > > Sean
> > > >
> > > > ________________________________________
> > > > From: Greg Silverman <[email protected]>
> > > > Sent: Wednesday, May 15, 2019 7:09 PM
> > > > To: [email protected]
> > > > Subject: acronyms/abbreviations [EXTERNAL]
> > > >
> > > > How can I get these from the XMI annotations?
> > > >
> > > > Thanks!
> > > >
> > > > Greg--
> > > >
> > > > --
> > > > Greg M. Silverman
> > > > Senior Systems Developer
> > > > NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> > > > University of Minnesota
> > > > [email protected]
> > > >
> > > > › evaluate-it.org ‹

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
University of Minnesota
[email protected]

› evaluate-it.org ‹
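
For illustration, the length-plus-edit-distance heuristic Peter describes in the quoted message could be sketched roughly as follows. This is only a sketch under assumptions, not anything cTAKES provides: the names `levenshtein`, `looks_like_acronym` and `max_len` are hypothetical, the begin/end offsets and the canonical text are assumed to have already been pulled out of the XMI CAS by the client, and the comparison is case-sensitive by choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def looks_like_acronym(document: str, begin: int, end: int,
                       canonical: str, max_len: int = 5) -> bool:
    """Hypothetical helper: flag a mention as a probable acronym.

    The mention is flagged when its covered text is shorter than max_len
    characters and its edit distance to the concept's canonical text
    exceeds the length of the covered text.
    """
    covered = document[begin:end]
    return len(covered) < max_len and levenshtein(covered, canonical) > len(covered)


# Example using the "MI" case from the thread (the note text is made up).
note = "Patient has a history of MI."
begin = note.index("MI")
print(looks_like_acronym(note, begin, begin + 2, "Myocardial Infarction"))
```

As noted in the thread, a check like this only suggests that a mention is an acronym; it says nothing about whether the concept assigned to it is the right expansion, which is the disambiguation problem.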
