Thanks Sean. I actually think I figured out what is causing the difference.
When I create the UMLS install on my machine, I only install RxNorm and
SNOMEDCT_US, so when I use the dictionaryCreator GUI, there are only those
two sources on the left. I noticed in the screenshots on the wiki page for
the dictionary creator GUI that many sources were installed, but only
SNOMEDCT_US and RxNorm were selected. So, I tried installing all of the
active UMLS set (but still only selecting RxNorm and SNOMEDCT_US in the
dictionaryCreator GUI) and it made a difference as to which terms appeared
in the final cTAKES dictionary. As an example, I now get the "DM" entry for
diabetes. I don't know why this should make a difference, but it appears
that it does.

Another odd observation related to this: in the sno_rx_16ab file, I
noticed what seems to be an error:
INSERT INTO CUI_TERMS VALUES(11849,0,2,'diabete mellitus','diabete')

The 's' is missing from "diabetes". When I created my dictionary (from the
restricted UMLS install, but still 2016ab), the cTAKES dictionary entry for
that term was correct:
INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')

When I created the dictionary from the full UMLS install tonight, that
error appeared again.
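
In case it helps anyone repeat the comparison, here is a minimal Python
sketch of how the term lists in two .script files could be diffed per CUI.
It is not part of cTAKES. The function names (parse_cui_terms, diff_terms)
are made up for illustration, and it assumes every dictionary row looks like
the INSERT INTO CUI_TERMS VALUES(...) rows quoted in this thread, with the
term text as the fourth value and any single quote inside a term doubled.
The sample rows are the ones quoted in this thread, not a full dictionary.

```python
import re
from collections import defaultdict

# Assumed row layout, taken from the rows quoted in this thread:
#   INSERT INTO CUI_TERMS VALUES(<cui>,<int>,<int>,'<term text>','<word>')
# A single quote inside a value is assumed to be escaped by doubling it.
ROW = re.compile(
    r"^INSERT INTO CUI_TERMS VALUES\("
    r"(\d+),(\d+),(\d+),"
    r"'((?:[^']|'')*)','((?:[^']|'')*)'\)\s*$"
)


def parse_cui_terms(lines):
    """Map each CUI to the set of term texts found in the given lines."""
    terms = defaultdict(set)
    for line in lines:
        match = ROW.match(line.strip())
        if match:  # every other line of the .script file is skipped
            cui = int(match.group(1))
            text = match.group(4).replace("''", "'")
            terms[cui].add(text)
    return dict(terms)


def diff_terms(left, right):
    """Return {cui: (terms only in left, terms only in right)} where they differ."""
    result = {}
    for cui in sorted(set(left) | set(right)):
        a = left.get(cui, set())
        b = right.get(cui, set())
        if a != b:
            result[cui] = (sorted(a - b), sorted(b - a))
    return result


# Rows quoted in this thread for CUI 11849.
packaged = [
    "INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')",
    "INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')",
    "INSERT INTO CUI_TERMS VALUES(11849,0,2,'diabete mellitus','diabete')",
]
rebuilt = [
    "INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')",
]

for cui, (only_left, only_right) in diff_terms(
    parse_cui_terms(packaged), parse_cui_terms(rebuilt)
).items():
    print(cui, "only in first:", only_left, "only in second:", only_right)
```

To compare real files, an open file can be passed in place of each list,
for example parse_cui_terms(open(path, encoding="utf-8")), where path points
at wherever the .script file lives on your machine.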

Jeff



On Mon, Jun 17, 2019 at 8:08 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> Thanks for doing the research.  Since the sno_rx_16ab was made 3+ years
> ago I can't swear to any of those filter sets being exactly what was used.
>
> I think that the key to working with any project is to check the
> dictionary against a project's needs.  Fill in the gaps by either editing
> the sql (.script) file or by adding a second dictionary.  In smaller
> "focus" projects I usually end up augmenting the default dictionary with a
> small custom bsv dictionary to catch any known synonyms or terms that
> aren't represented in the default.  In projects requiring larger nets I
> have built dictionaries that are horribly inclusive - 2 to 3 times the
> sno_rx_16ab.
>
> Sean
> ________________________________________
> From: Jeffrey Miller <jeff...@gmail.com>
> Sent: Monday, June 17, 2019 4:39 PM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> Thanks for following up Sean. I've looked into the links you sent along.
> There are different groups of filters and it appears that the
> dictionaryBuilder GUI is hardcoded to use the files in the "tiny"
> directory. I don't think this is the set of filters used to make
> sno_rx_16ab because the 'tiny' filter group contains "today" (today brand
> veterinary product.  310367) in "UnwantedTexts.txt", but the
> sno_rx_16ab.script file has "today" still in there. If you create a
> dictionary with the dictionary builder, it does not include that term.
>
> I thought maybe the set of files under the "default" filter directory might
> be the one used for the sno_rx_16ab package so I recompiled the
> dictionaryCreator GUI to use the "default" filter files and created a new
> snomed rxnorm dictionary from the 2016ab umls release, but the output is
> still quite different than the packaged sno_rx_16ab dictionary. From
> looking at diffs, it looks like there are a substantial number of additions
> to the sno_rx_16ab, so much so that I really must be missing something. For
> example, for CUI 12169 which describes a low sodium diet, there are about
> 27 CUI terms in sno_rx_16ab.script, but in the script generated by the
> dictionaryGUI there are only 7 (with the "tiny" or "default" filter
> groups).
>
> On Sun, Jun 16, 2019 at 3:27 PM Remy Sanouillet <re...@foreseemed.com>
> wrote:
>
> > Thanks for the clarifications, Sean. That was very enlightening. I look
> > forward to the documentation (even if it entails some suffering on your
> > part.)
> >
> > If/when you stumble on some idle time allowing you to implement the manual
> > edit panel, it would be nice to have it allow for re-partitioning the
> > ontology. As you are very aware, UMLS CUIs and SNOMED do not always have a
> > one-to-one correspondence, resulting in a CUI matching multiple SNOMEDs or
> > a SNOMED being mapped to several CUIs.
> >
> > In some cases, clinicians don't agree with that partitioning in specialized
> > contexts and the inheritance that ensues and would like to re-assign them.
> >
> > Not holding my breath, but just something to keep in mind.
> >
> >       Remy
> >
> > On Sun, Jun 16, 2019 at 7:16 AM Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi Jeff,
> > >
> > > >1) ...
> > > There are several collections of filter sets here:
> > >
> > > ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\
> > >
> > > 2) ...
> > > There is additional logic within the dictionary creator code:
> > > ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\
> > >
> > > I haven't gone through it in a really long time, and without doing so now
> > > I can't enumerate the filters.  I have family visiting, otherwise my
> > > curiosity would force me to do so and get back to you.   Honestly, it
> > > should be documented somewhere, but writing (especially technical) is
> > > pretty much my least favorite activity.
> > >
> > > Sean
> > >
> > >
> > > p.s.
> > > Please don't wait for it, but I am currently working on new dictionary
> > > code and plan to introduce that in ctakes.  Again, please don't wait for
> > > it as it is mixed in with other work and will not be available for several
> > > months (if at all).
> > >
> > >
> > > ________________________________________
> > > From: Jeffrey Miller <jeff...@gmail.com>
> > > Sent: Sunday, June 16, 2019 9:49 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Differences in dictionary built with dictionaryBuilder and
> > > sno_rx16ab from sourceforge [EXTERNAL]
> > >
> > > Hi Sean,
> > >
> > > Thanks for your response. I had two follow-up questions that would be very
> > > helpful to understand if you have a few moments:
> > >
> > > 1) Are the specific filters used in the official sno_rx_16ab codified
> > > anywhere so that I could reproduce them?
> > >
> > > 2) Do these filters explain all the changes? For example, when I use the
> > > dictionary creator to export sno_med and rx_norm, I only get "diabetes
> > > mellitus" whereas sno_rx_16ab contains both "diabetes" and "dm".
> > > Especially with the addition of "dm" it feels like I must be missing a step
> > > or a setting somewhere.
> > >
> > > Thanks!
> > > Jeff
> > >
> > > On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
> > > sean.fi...@childrens.harvard.edu> wrote:
> > >
> > > > Hi all,
> > > >
> > > > The contents of the sno_rx_16ab are a dump of the umls 2016AB snomed and
> > > > rxnorm terms with certain semantic types.  Nothing was added, but synonyms
> > > > are filtered based upon various rules.  For instance, unnecessary suffixes
> > > > are removed ("Wart (Finding)" -> "Wart"), really long terms are excluded
> > > > ("can walk straight line with only minimal assistance"), terms with dose or
> > > > form are ignored and so forth.
> > > >
> > > > Some filters can be changed by adding/removing from prefix/suffix/contains
> > > > lists in plaintext files or by modifying the dictionary creator code.
> > > >
> > > > There was no manual curation (or nothing major).  As Remy mentioned that
> > > > requires a lot of attention and time.  The dictionary database was not
> > > > intended to be perfect, just as good as possible without major investment -
> > > > and reproducible with updates to the umls.
> > > >
> > > > As the dictionary is released as a sql database, you should be able to add
> > > > and remove fairly easily if sql savvy.  I have long wanted to add a "manual
> > > > edit" panel to the dictionary gui, but haven't had the time.  If anybody
> > > > else would like to work on such a tool that would be tonic.
> > > >
> > > > Sean
> > > >
> > > >
> > > > ________________________________________
> > > > From: Harish Kulkarni <harish.m.kulka...@gmail.com>
> > > > Sent: Saturday, June 15, 2019 5:16 PM
> > > > To: dev@ctakes.apache.org
> > > > Subject: Re: Differences in dictionary built with dictionaryBuilder
> and
> > > > sno_rx16ab from sourceforge [EXTERNAL]
> > > >
> > > > unsubscribe
> > > >
> > > > On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <
> re...@foreseemed.com>
> > > > wrote:
> > > >
> > > > > Yes, I agree it would be nice because the tokenization that occurs when
> > > > > creating the dictionaries from the releases makes comparisons a bit tricky
> > > > > and is not 100% reversible. I would love to hear an answer to your
> > > > > quandary.
> > > > >
> > > > >      Remy
> > > > >
> > > > > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <jeff...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Thanks, I was curious if the cTAKES devs that created the sno_rx_16ab
> > > > > > dictionary had put the differences applied to the default UMLS output into
> > > > > > version control in some form. I imagine the
> > > > > > additions/synonyms/abbreviations that were added manually must have been
> > > > > > collected over time somewhere prior to merging them with the 2016ab UMLS
> > > > > > release? I basically want to recreate the default cTAKES 4.0.0 release with
> > > > > > an additional ontology and the latest terms. I can likely come up with a
> > > > > > diff myself but was wondering if this was already maintained as part of
> > > > > > cTAKES.
> > > > > >
> > > > > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <
> > > re...@foreseemed.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Yes, that's pretty much what we do too. Not only to enhance the dictionary,
> > > > > > > but to put in corrections because, lo and behold, there are some errors in
> > > > > > > there! As you know, an ontology is a constant curation job and that
> > > > > > > script, under SCM, allows you to isolate those changes and, if necessary,
> > > > > > > re-apply them to new versions.
> > > > > > >
> > > > > > >       Remy
> > > > > > >
> > > > > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <
> > > > gandhiraja...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jeff,
> > > > > > > >
> > > > > > > > As far as I know, maintaining a separate SQL script to add additional
> > > > > > > > entries should work seamlessly.
> > > > > > > >
> > > > > > > > On Saturday, June 15, 2019, Jeffrey Miller <
> jeff...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > > > > modifications/synonyms are tracked anywhere (aside from the dictionary
> > > > > > > > > itself) so they can be carried forward in future dictionary updates?
> > > > > > > > >
> > > > > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <
> > > > > > re...@foreseemed.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > From my experience, it seems pretty obvious that sno_rx_16ab is a curated
> > > > > > > > > > dictionary based on the SNOMED 2016AB release. It does not contain the full
> > > > > > > > > > set but it has additional edits and synonyms that are pretty useful
> > > > > > > > > > (including 'dm').
> > > > > > > > > >
> > > > > > > > > > We have had to manage those mods as an adjunct.
> > > > > > > > > >
> > > > > > > > > >       Remy
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <
> > > > > jeff...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > > I have created a custom dictionary from the latest UMLS release with
> > > > > > > > > > > SNOMEDCT_US and RxNorm and I've noticed it seems to be generating a .script
> > > > > > > > > > > file with unexpected differences as compared to the sno_rx_16ab file
> > > > > > > > > > > available as part of the cTAKES release. Specifically, for diabetes, it is
> > > > > > > > > > > missing these two rows:
> > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > > > > >
> > > > > > > > > > > and only has this one:
> > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
> > > > > > > > > > >
> > > > > > > > > > > The end result is that "diabetes" is not being picked up in the test text I
> > > > > > > > > > > am running through - it requires the full 'diabetes mellitus'.
> > > > > > > > > > >
> > > > > > > > > > > Is there any setting on the UMLS install side or the cTAKES dictionary
> > > > > > > > > > > creator that could account for missing alternative forms like this? I've
> > > > > > > > > > > tried downloading the 2016AB release (which I think is the one used to
> > > > > > > > > > > create the bundled sno_rx_16ab package?) and I am not getting the alternate
> > > > > > > > > > > forms in that dictionary either.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Jeff
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards,
> > > > > > > > Gandhi
> > > > > > > >
> > > > > > > > "The best way to find urself is to lose urself in the service
> > of
> > > > > others
> > > > > > > > !!!"
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
