Hi Sean,

Thanks for your response. If you have a few moments, I have two follow-up questions:
1) Are the specific filters used for the official sno_rx_16ab codified anywhere, so that I could reproduce them?
2) Do these filters explain all the changes? For example, when I use the dictionary creator to export sno_med and rx_norm, I only get "diabetes mellitus", whereas sno_rx_16ab contains both "diabetes" and "dm". The presence of "dm" in particular makes me feel I must be missing a step or a setting somewhere.

Thanks!
Jeff

On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <[email protected]> wrote:
> Hi all,
>
> The contents of sno_rx_16ab are a dump of the UMLS 2016AB SNOMED and RxNorm terms with certain semantic types. Nothing was added, but synonyms are filtered based upon various rules. For instance, unnecessary suffixes are removed ("Wart (Finding)" -> "Wart"), really long terms are excluded ("can walk straight line with only minimal assistance"), terms with dose or form are ignored, and so forth.
>
> Some filters can be changed by adding/removing entries in the prefix/suffix/contains lists in plaintext files, or by modifying the dictionary creator code.
>
> There was no manual curation (or nothing major). As Remy mentioned, that requires a lot of attention and time. The dictionary database was not intended to be perfect, just as good as possible without major investment, and reproducible with updates to the UMLS.
>
> As the dictionary is released as a SQL database, you should be able to add and remove entries fairly easily if you are SQL savvy. I have long wanted to add a "manual edit" panel to the dictionary GUI, but haven't had the time. If anybody else would like to work on such a tool, that would be tonic.
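Sean's filtering rules can be approximated in a few lines. The following is a minimal Python sketch under stated assumptions, not the dictionary creator's actual code: the semantic-tag suffix pattern, the six-word length cut-off, and the dose/form word list are hypothetical placeholders for the real prefix/suffix/contains lists.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the creator's configurable prefix/suffix/contains
# lists; the real cTAKES lists and thresholds may differ.
SEMANTIC_TAG = re.compile(r"\s*\((finding|disorder|procedure|substance)\)$", re.IGNORECASE)
MAX_WORDS = 6  # assumed cut-off for "really long terms"
DOSE_FORM_WORDS = {"mg", "ml", "tablet", "capsule", "injection", "oral"}  # assumed


def filter_synonym(term: str) -> Optional[str]:
    """Return the cleaned synonym, or None if the term should be dropped."""
    # Strip a trailing semantic tag, e.g. "Wart (Finding)" -> "Wart".
    cleaned = SEMANTIC_TAG.sub("", term).strip()
    words = cleaned.lower().split()
    # Exclude overly long terms.
    if len(words) > MAX_WORDS:
        return None
    # Ignore terms that mention a dose (a bare number) or a dose form.
    if any(w in DOSE_FORM_WORDS or re.fullmatch(r"\d+(\.\d+)?", w) for w in words):
        return None
    return cleaned


if __name__ == "__main__":
    for t in ["Wart (Finding)",
              "can walk straight line with only minimal assistance",
              "aspirin 81 mg oral tablet",
              "diabetes mellitus"]:
        print(repr(t), "->", repr(filter_synonym(t)))
```

Rules of this kind only trim or drop existing synonyms; none of them can derive "dm" or a bare "diabetes" from "diabetes mellitus".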
>
> Sean
>
> ________________________________________
> From: Harish Kulkarni <[email protected]>
> Sent: Saturday, June 15, 2019 5:16 PM
> To: [email protected]
> Subject: Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]
>
> unsubscribe
>
> On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <[email protected]> wrote:
>
> > Yes, I agree it would be nice, because the tokenization that occurs when creating the dictionaries from the releases makes comparisons a bit tricky and is not 100% reversible. I would love to hear an answer to your quandary.
> >
> > Remy
> >
> > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <[email protected]> wrote:
> >
> > > Thanks. I was curious whether the cTAKES devs who created the sno_rx_16ab dictionary had put the differences applied to the default UMLS output into version control in some form. I imagine the additions/synonyms/abbreviations that were added manually must have been collected over time somewhere prior to merging them with the 2016AB UMLS release? I basically want to recreate the default cTAKES 4.0.0 release with an additional ontology and the latest terms. I can likely come up with a diff myself, but was wondering if this was already maintained as part of cTAKES.
> > >
> > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <[email protected]> wrote:
> > >
> > > > Yes, that's pretty much what we do too, not only to enhance the dictionary but to put in corrections because, lo and behold, there are some errors in there! As you know, an ontology is a constant curation job, and that script, under SCM, allows you to isolate those changes and, if necessary, re-apply them to new versions.
> > > >
> > > > Remy
> > > >
> > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <[email protected]> wrote:
> > > >
> > > > > Hi Jeff,
> > > > >
> > > > > As far as I know, maintaining a separate SQL script to add additional entries should work seamlessly.
> > > > >
> > > > > On Saturday, June 15, 2019, Jeffrey Miller <[email protected]> wrote:
> > > > >
> > > > > > Thanks Remy. Does anyone know if these manually curated modifications/synonyms are tracked anywhere (aside from the dictionary itself) so they can be carried forward in future dictionary updates?
> > > > > >
> > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <[email protected]> wrote:
> > > > > >
> > > > > > > From my experience, it seems pretty obvious that sno_rx_16ab is a curated dictionary based on the SNOMED 2016AB release. It does not contain the full set, but it has additional edits and synonyms that are pretty useful (including 'dm').
> > > > > > >
> > > > > > > We have had to manage those mods as an adjunct.
> > > > > > >
> > > > > > > Remy
> > > > > > >
> > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > > I have created a custom dictionary from the latest UMLS release with SNOMEDCT_US and RxNorm, and I've noticed it seems to generate a .script file with unexpected differences compared to the sno_rx_16ab file available as part of the cTAKES release.
> > > > > > > > Specifically, for diabetes, it is missing these two rows:
> > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > >
> > > > > > > > and only has this one:
> > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
> > > > > > > >
> > > > > > > > The end result is that "diabetes" is not being picked up in the test text I am running through; it requires the full 'diabetes mellitus'.
> > > > > > > >
> > > > > > > > Is there any setting on the UMLS install side or in the cTAKES dictionary creator that could account for missing alternative forms like this? I've tried downloading the 2016AB release (which I think is the one used to create the bundled sno_rx_16ab package?) and I am not getting the alternate forms in that dictionary either.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jeff
> > > > > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Gandhi
> > > > >
> > > > > "The best way to find urself is to lose urself in the service of others !!!"
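Gandhi's and Remy's approach, a separate SQL script kept under SCM and replayed after each rebuild, could look something like the sketch below. It uses Python's built-in sqlite3 only as a stand-in for the HSQLDB database cTAKES ships; the CUI_TERMS column names (CUI, RINDEX, TCOUNT, TEXT, RWORD) are assumed from the shape of the INSERT rows quoted above, and the INSERT ... SELECT ... WHERE NOT EXISTS guard is SQLite syntax that may need adapting for HSQLDB.

```python
import sqlite3

# Hypothetical version-controlled patch (e.g. a "manual_edits.sql" kept in SCM).
# Column names are assumed; check them against the real CUI_TERMS schema.
MANUAL_EDITS = """
INSERT INTO CUI_TERMS (CUI, RINDEX, TCOUNT, TEXT, RWORD)
SELECT 11849, 0, 1, 'dm', 'dm'
WHERE NOT EXISTS (SELECT 1 FROM CUI_TERMS WHERE CUI = 11849 AND TEXT = 'dm');
INSERT INTO CUI_TERMS (CUI, RINDEX, TCOUNT, TEXT, RWORD)
SELECT 11849, 0, 1, 'diabetes', 'diabetes'
WHERE NOT EXISTS (SELECT 1 FROM CUI_TERMS WHERE CUI = 11849 AND TEXT = 'diabetes');
"""


def build_generated_dictionary(conn):
    """Stand-in for a freshly generated dictionary holding only the UMLS-derived row."""
    conn.execute("CREATE TABLE CUI_TERMS (CUI BIGINT, RINDEX INTEGER, TCOUNT INTEGER, "
                 "TEXT VARCHAR(255), RWORD VARCHAR(48))")
    conn.execute("INSERT INTO CUI_TERMS VALUES (11849, 1, 2, 'diabetes mellitus', 'mellitus')")


def apply_manual_edits(conn):
    """Apply the curated patch; each statement skips rows that already exist."""
    conn.executescript(MANUAL_EDITS)
    conn.commit()


conn = sqlite3.connect(":memory:")
build_generated_dictionary(conn)
apply_manual_edits(conn)
apply_manual_edits(conn)  # re-applying the patch does not duplicate rows
rows = conn.execute(
    "SELECT TEXT, RWORD FROM CUI_TERMS WHERE CUI = 11849 ORDER BY TEXT").fetchall()
print(rows)
```

Making each statement idempotent is the key design choice: the same patch file can be re-applied to a dictionary regenerated from a newer UMLS release without creating duplicate rows.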
