Re: The 2020 UMLS dictionary and our default SNO_RX

2020-08-07 Thread Jeffrey Miller
Hi Peter,

Yes, I chose Active Subsets, and then I think I actually chose the "select
sources to exclude" option, but I don't believe that should matter. I leave
the precedence defaults alone.

Jeff

On Thu, Aug 6, 2020, 2:13 PM Peter Abramowitsch 
wrote:

> Hi Jeff
>
> You are absolutely right: when I use sno_rx with the term WBC in a simple
> context, it is not showing up as a T059. I was surprised by that.
>
> I was wrong about the term I was looking at. Here's the scenario that did
> change:
>
> Text context:
> afebrile, but has elevated WBC count;
>
> *Using sno_rx*
> canonical text:  White blood cell count increased (lab result)
> CUI: C0750426,
> location:  Leukocytes,
> location_snomed: 52501007
> range_text:  elevated WBC count,
> vocab_term: 414478003,
> vocab_type: SNOMEDCT_US
> ...other params.
>
> *Using new dict based on 2020AA*
> Missing:
>
> Reason:
> *grep elevated newdict_750426*
> INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
> INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
> *grep elevated olddict_750426*
> INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
> INSERT INTO CUI_TERMS VALUES(750426,1,3,'elevated wbc count','wbc')   <--  missing
> INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
>
> So back to your recommendation on using MMSYS
>
> You chose the ACTIVE_SUBSETS option - right?
> And on the Sources to Exclude/Include page, do you deselect all sources to
> exclude?
> Have you tweaked the precedence of subsets or do you leave the default
> order alone?
>
> Many thanks,
> Peter
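The old/new grep comparison above can be automated so that every dropped synonym for a CUI shows up at once, instead of grepping word by word. This is a minimal sketch, assuming the CUI_TERMS column order implied by the rows shown (CUI, rare-word index, token count, term text, rare word) and SQL-style quote doubling in the .script files; the script and function names are hypothetical, not part of cTAKES.

```python
import re
import sys

# Matches rows such as
#   INSERT INTO CUI_TERMS VALUES(750426,1,3,'elevated wbc count','wbc')
# Assumed column order: CUI, rare-word index, token count, term text, rare word.
CUI_TERMS_RE = re.compile(
    r"INSERT INTO CUI_TERMS VALUES\((\d+),(\d+),(\d+),"
    r"'((?:[^']|'')*)','((?:[^']|'')*)'\)"
)


def unquote(sql_text):
    """Undo SQL escaping, where a literal quote is written as two quotes."""
    return sql_text.replace("''", "'")


def cui_terms(script_text, cui):
    """Return the set of (rindex, tcount, text, rare_word) rows for one CUI."""
    rows = set()
    for m in CUI_TERMS_RE.finditer(script_text):
        if int(m.group(1)) == cui:
            rows.add((int(m.group(2)), int(m.group(3)),
                      unquote(m.group(4)), unquote(m.group(5))))
    return rows


def missing_terms(old_script, new_script, cui):
    """Rows present in the old dictionary but absent from the new one."""
    return sorted(cui_terms(old_script, cui) - cui_terms(new_script, cui))


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical file names): python diff_terms.py old.script new.script 750426
    old_path, new_path, target = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(old_path, encoding="utf-8") as f_old, \
            open(new_path, encoding="utf-8") as f_new:
        for row in missing_terms(f_old.read(), f_new.read(), target):
            print(row)
```

Run against the two dictionaries for CUI 750426, this should report the `'elevated wbc count'` row as the one that disappeared.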

Re: The 2020 UMLS dictionary and our default SNO_RX

2020-08-06 Thread Jeffrey Miller
Peter,

I have experienced similar issues with how text spans translate to
different CUIs depending on the included vocabularies as well. I had a
similar conversation with Sean on the dev forum last year I believe.

I do not believe the behavior of 'wbc' has changed. If I run the clinical
pipeline with the sno_rx_16ab dictionary, it is tagged as an
AnatomicalSiteMention. Are you seeing something different?

Jeff

On Wed, Aug 5, 2020 at 11:24 PM Peter Abramowitsch 
wrote:

> Hi Jeff
>
> I thought I did load them all, but I'll go back and check.
>
> When looking at my gene issue, the result is that the lookup (seemingly)
> arbitrarily flips between one vocabulary and another when the vocabularies
> overlap. I.e., I see that Vocab A and Vocab B both contain geneX and geneY.
> Neither of these is in SNOMED. So in my output, I get one of the genes
> associated with Vocab A and the other with Vocab B. When I remove Vocab B,
> then obviously both are associated with Vocab A, which is what I wanted.
>
> If, for you, WBC is showing up as an anatomical location rather than a
> T059, then it's probably not getting the correct SNOMED code. Wouldn't
> that be a problem for your researchers?
>
> Peter
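One way to stop the Vocab A / Vocab B flip-flopping Peter describes is a deterministic tie-break: when a CUI carries codes from several vocabularies, always report the one highest in a fixed precedence list. The sketch below is a hypothetical post-processing step, not how cTAKES chooses codes internally; `VOCAB_A` and `VOCAB_B` are placeholders mirroring the names in the email.

```python
def pick_code(codes_by_vocab, precedence):
    """Choose one (vocabulary, code) pair for a CUI deterministically.

    codes_by_vocab: {vocabulary name: source code} for a single CUI.
    precedence: vocabulary names, most preferred first.
    Vocabularies missing from `precedence` rank after all listed ones and are
    ordered alphabetically, so the choice never depends on row order.
    """
    if not codes_by_vocab:
        return None

    def rank(vocab):
        # Lower tuples win: position in the precedence list, then the name.
        position = precedence.index(vocab) if vocab in precedence else len(precedence)
        return (position, vocab)

    best = min(codes_by_vocab, key=rank)
    return best, codes_by_vocab[best]
```

With `precedence = ["VOCAB_A", "VOCAB_B"]`, geneX and geneY would both resolve to Vocab A even while Vocab B stays in the dictionary.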


Re: The 2020 UMLS dictionary and our default SNO_RX

2020-08-05 Thread Jeffrey Miller
Hi Peter,

If I create a dictionary using UMLS 2020AA with just SNOMED and RxNorm, my
cTAKES dictionary still seems to have a CUI associated with the string
'wbc' that links to the SNOMED term for Leukocyte (Cell). It is not mapping
to a lab result TUI, but rather to an anatomical site; still, it seems to be
the same CUI that 'wbc' resolves to in sno_rx_16ab. Maybe HGNC is
conflicting with that too?

Just to double-check: when you installed UMLS through MetamorphoSys, did
you install all of the available vocabularies?

Jeff
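Whether a hit becomes a lab result or an anatomical site comes down to the TUI rows stored for the CUI (e.g. `INSERT INTO TUI VALUES(3159311,47)` elsewhere in this thread, i.e. T047). The sketch below reads those rows and maps a few TUIs to mention types. The mapping is a small illustrative subset based on UMLS semantic types (T025 Cell, T047 Disease or Syndrome, T059 Laboratory Procedure); verify it against cTAKES' own TUI-to-semantic-group tables before relying on it.

```python
import re

TUI_ROW_RE = re.compile(r"INSERT INTO TUI VALUES\((\d+),(\d+)\)")

# Illustrative subset only, not the full cTAKES grouping.
TUI_TO_MENTION = {
    "T025": "AnatomicalSiteMention",   # Cell (e.g. Leukocyte)
    "T047": "DiseaseDisorderMention",  # Disease or Syndrome
    "T059": "ProcedureMention",        # Laboratory Procedure
}


def tui_code(number):
    """The .script files store TUIs as integers; 47 -> 'T047'."""
    return f"T{number:03d}"


def tuis_for_cui(script_text, cui):
    """All TUI codes stored for one CUI, sorted."""
    return sorted(tui_code(int(t)) for c, t in TUI_ROW_RE.findall(script_text)
                  if int(c) == cui)


def mention_types(script_text, cui):
    """Mention type per TUI, or 'unmapped' if outside the illustrative subset."""
    return [TUI_TO_MENTION.get(t, "unmapped") for t in tuis_for_cui(script_text, cui)]
```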

On Wed, Aug 5, 2020 at 6:52 PM Peter Abramowitsch 
wrote:

> Hi All
>
> I've been setting up a custom dictionary using UMLS with the simple goal of
> adding a comprehensive genetic vocabulary (HGNC) to the latest UMLS SNOMED
> and RXNORM vocabularies, in the hope of getting somewhere close to the
> cTakes default dictionary again.
>
> However, there are changes to concept vocabularies in UMLS 2020AA that
> affect the ability of cTakes to work well with older notes and possibly the
> note-writing practices of older physicians and labs. Some of the
> tried-and-true acronyms, such as WBC for leukocytes, RBC, and EOS
> (eosinophil count), are no longer part of SNOMED. Probably this is because
> the components of these parameters are now broken out into more granular
> types. Another possible reason is that a few of these acronyms now overlap
> the names of genes; EOS is one of them. This is just speculation.
>
> In order to have these common parameters re-included via their common lab
> acronyms, it is necessary to add another common US vocabulary such as
> HL7-V3.0 or NCI_CDISC. Of course, one can remap back into SNOMED by adding
> insert statements to the dictionary script, but that might not scale.
>
> So my point here is that if, one day, we plan to create a new cTakes
> release, and with it a new UMLS lookup, we may need to consider adding a
> third basic vocabulary to our current set of two.
>
> Thoughts?
> Peter
>
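The "remap back into SNOMED by adding insert statements" option could be scripted rather than hand-edited. This is a minimal sketch, assuming the CUI_TERMS column order seen in the grep output in this thread (CUI, rare-word index, token count, text, rare word) and plain whitespace tokenization; `cui_terms_insert` is a hypothetical helper. Picking the rare word is left to the caller, and a brand-new CUI would also need the TUI, PREFTERM, and vocabulary rows seen for CUI 3159311.

```python
def sql_quote(text):
    """Quote a string literal for an HSQLDB INSERT, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def cui_terms_insert(cui, text, rare_word):
    """Build one CUI_TERMS row for `text`, keyed on `rare_word`.

    Assumes simple whitespace tokenization; the existing rows already space
    out punctuation, e.g. 'myopia , high , with ...'.
    """
    tokens = text.lower().split()
    rare = rare_word.lower()
    if rare not in tokens:
        raise ValueError(f"rare word {rare_word!r} is not a token of {text!r}")
    return (f"INSERT INTO CUI_TERMS VALUES({cui},{tokens.index(rare)},{len(tokens)},"
            f"{sql_quote(' '.join(tokens))},{sql_quote(rare)})")
```

For example, `cui_terms_insert(750426, "elevated wbc count", "wbc")` rebuilds the row that was present in the old dictionary for CUI 750426.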


Re: RE Tuning custom dictionary recommendations

2020-08-04 Thread Jeffrey Miller
Where in the source code is this feature implemented?

On Tue, Aug 4, 2020 at 7:30 PM Peter Abramowitsch 
wrote:

> Blacklist format:
> Actually, I got it inverted. It's:
>
> semantic_code1, semantic_code2,...|text1
> semantic_code1, semantic_code2,...|text2
>
> Peter
>
> On Tue, Aug 4, 2020 at 4:16 PM Peter Abramowitsch
> wrote:
>
> > OK, thanks Jeff. I'm glad I wasn't missing something important.
> >
> > There already is a blacklist text mechanism that suppresses
> > identification of specific text by clinical domain.
> > Looking at the code, it collects entries like
> > cTakesSemanticCode,texta,textb,textc
> > NE_TYPE_ID_DRUG, jasmine, coriander, bleach
> > There's a case-sensitive list and a case-insensitive one.
> >
> > So I will try that.
> > In one of my examples, I'll say that 'bed' is not a disorder, while 'BED'
> > could be one.
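The corrected blacklist format (`semantic_code1, semantic_code2,...|text`, one text per line) with separate case-sensitive and case-insensitive lists can be modeled as below. This is only a sketch of the format as described in this thread, not the cTAKES implementation; treating the semantic codes as plain strings like `NE_TYPE_ID_DISORDER` is an assumption.

```python
from collections import defaultdict


def load_blacklist(lines, case_sensitive):
    """Parse 'code1, code2,...|text' lines into {code: set of texts}."""
    table = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        codes_part, text = line.split("|", 1)
        text = text.strip()
        if not case_sensitive:
            text = text.lower()
        for code in codes_part.split(","):
            code = code.strip()
            if code:
                table[code].add(text)
    return table


def is_suppressed(code, text, cs_table, ci_table):
    """True if `text` is blacklisted for `code` in either list."""
    return (text in cs_table.get(code, set())
            or text.lower() in ci_table.get(code, set()))
```

Putting `NE_TYPE_ID_DISORDER|bed` in the case-sensitive list would drop lowercase 'bed' as a disorder while still letting 'BED' through.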


Re: RE Tuning custom dictionary recommendations

2020-08-04 Thread Jeffrey Miller
Hi Peter,

To your question about sno_rx_16ab: I suspect that the CUI is new since
2016 or, if it existed in UMLS back then, that it was not associated with a
term in SNOMED or RxNorm at that time.

As for those solutions: if you are able to use the trunk, I know Sean said
there was a text-suppression feature; otherwise, in the past I have removed
the lines from the .script file.

I definitely think the case-sensitive acronym feature would be great.

Jeff

On Tue, Aug 4, 2020 at 3:28 PM Peter Abramowitsch 
wrote:

> Hi Jeff et al
>
> To take up the thread from a few days ago, where a simple English word such
> as bed, soft, or shop also maps to a legitimate but rarely used acronym and
> shows up in the same POS as a potentially interesting entity: what
> mechanism would you use to disambiguate?
>
> This problem only started when I constructed a SNO+RX+HGNC dictionary
> from the 2020AA UMLS dump. Adding more TUIs where a more conventional
> word sense of the target word occurs does not fix this problem.
>
> For instance, why does the sno_rx dictionary not contain this disease,
> which aliases to "bed"?
>
> ucsf_dict_v1 $ grep 3159311 *.script
> *INSERT INTO CUI_TERMS VALUES(3159311,0,1,'bed','bed')*
> INSERT INTO CUI_TERMS VALUES(3159311,5,8,'myopia , high , with nonprogressive cone dysfunction','nonprogressive')
> INSERT INTO CUI_TERMS VALUES(3159311,0,3,'bornholm eye disease','bornholm')
> INSERT INTO CUI_TERMS VALUES(3159311,5,6,'x-linked cone dysfunction syndrome with myopia','myopia')
> INSERT INTO TUI VALUES(3159311,47)
> *INSERT INTO PREFTERM VALUES(3159311,'BORNHOLM EYE DISEASE')*
> INSERT INTO SNOMEDCT_US VALUES(3159311,718718009)
>
>
> sno_rx_16ab $ grep 3159311 *.script
> nada
>
> Solutions good or evil?
>
>- Strip the relevant lines out of the dict.script file?
>- Blacklist the text?
>- Add to my stopCUI list (a little feature I added)?
>- Some other configuration I don't know about?
>  For instance, is there a CUI:ACRONYM table?
>  I'm tempted to create one. This would require the matching term to be
>  present in upper case.
>
> Peter
>
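The stopCUI list and the proposed CUI:ACRONYM table could combine into a single filter on dictionary hits: drop stop-listed CUIs outright, and accept a registered acronym only when it appears in upper case in the note. Both structures are hypothetical here (Peter's stopCUI feature is local and the acronym table does not exist yet); CUIs are written as UMLS-style strings for readability.

```python
def keep_match(cui, matched_text, acronym_table, stop_cuis):
    """Decide whether a dictionary hit should become a mention.

    acronym_table: hypothetical {cui: {acronym, ...}} (the CUI:ACRONYM idea).
    stop_cuis: set of CUIs to drop outright (the stopCUI idea).
    """
    if cui in stop_cuis:
        return False
    acronyms = {a.lower() for a in acronym_table.get(cui, set())}
    if matched_text.lower() in acronyms:
        # Registered acronym: only accept an all-caps occurrence in the note.
        return matched_text.isupper()
    return True  # ordinary synonyms are unaffected
```

With `{"C3159311": {"BED"}}`, "The wound bed was..." would no longer yield Bornholm eye disease, while an explicit "BED" still would.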


Re: With custom dictionary - over-eager resolution of acronyms [EXTERNAL]

2020-08-02 Thread Jeffrey Miller
wrong or I'm missing
> something to say that it only applies (as an acronym) if it's capitalized
>
> In sno_rx there is neither a CUI 3542022 nor a definition of "soft" as a
> solitary word, nor even a mention of ONYCHODYSPLASIA or HYPOTRICHOSIS.
>
> In any case, I would have thought that cTakes would only create an event
> mention from a term tagged as NN or in an NP slot, not from an ADJ as in
> "soft tissue".
>
> Anyway, thanks! Now I will keep poking around.
>
>
> Peter
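The expectation that only noun-tagged tokens should become mentions can be expressed as a simple filter on single-token hits. This is a hypothetical sketch using Penn Treebank tags, not existing cTAKES behavior. Note that it would drop "soft" in "soft tissue" (JJ) but not "bed" in "The wound bed was...", where "bed" really is a noun, so POS alone does not fix both examples.

```python
# Penn Treebank noun tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def keep_hit(tags):
    """tags: POS tags of the tokens covered by one dictionary hit."""
    if not tags:
        return False
    if len(tags) > 1:
        return True  # multi-word terms are left alone in this sketch
    return tags[0] in NOUN_TAGS
```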


Re: With custom dictionary - over-eager resolution of acronyms

2020-08-01 Thread Jeffrey Miller
Sorry, I meant to suggest searching for 'soft' in the dictionary file, not
'short':

grep -i ,\'soft\', *.script
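A Python equivalent of that grep can go one step further and show which preferred term each single-word hit resolves to, which is exactly the 'soft' or 'bed' to syndrome mapping being chased here. A minimal sketch, assuming the CUI_TERMS and PREFTERM row layouts shown elsewhere in this thread; `single_word_hits` is a hypothetical helper.

```python
import re

# TEXT is the fourth CUI_TERMS column; PREFTERM rows hold (CUI, preferred text).
TERM_RE = re.compile(r"INSERT INTO CUI_TERMS VALUES\((\d+),\d+,\d+,'((?:[^']|'')*)',")
PREF_RE = re.compile(r"INSERT INTO PREFTERM VALUES\((\d+),'((?:[^']|'')*)'\)")


def single_word_hits(script_text, word):
    """(CUI, preferred term) for every dictionary term that is exactly `word`."""
    prefs = {int(c): t.replace("''", "'") for c, t in PREF_RE.findall(script_text)}
    hits = []
    for cui, text in TERM_RE.findall(script_text):
        # Case-insensitive exact match, like grep -i ,'word',
        if text.replace("''", "'").lower() == word.lower():
            hits.append((int(cui), prefs.get(int(cui))))
    return hits
```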

On Sat, Aug 1, 2020 at 7:47 PM Jeffrey Miller  wrote:

> Hi Peter,
>
> To my knowledge, there isn't any drastic difference between the behavior of
> the dictionary GUI creator and the way the sno_rx dictionary was created. I
> originally thought there was, but I realized the difference was that I had
> not installed all of UMLS on my machine (just the vocabularies I was
> interested in), so I was missing synonyms. The first thing I would check:
> are you able to find a matching entry in the .script file for your ctakes
> dictionary when you do this:
>
> grep -i ,\'short\', *.script
>
> That would confirm whether or not you have a term in your dictionary made
> up only of 'short' and whether it mapped to the CUI equal to "SHORT
> STATURE, ONYCHODYSPLASIA, FACIAL DYSMORPHISM, AND HYPOTRICHOSIS SYNDROME".
> If it's not in there, something else is going on. You could do the same for
> 'bed'.
>
> If not, another thing I might check: I noticed in your prior e-mail that
> you are using the OverlapJCasTermAnnotator. I don't have much experience
> with it, and I don't think it should cause this behavior, but I wonder if
> that could be making the difference (as compared to the
> DefaultJCasTermAnnotator).
>
> Jeff
>
> On Sat, Aug 1, 2020 at 5:27 PM Peter Abramowitsch 
> wrote:
>
>>
>> Hi All
>>
>> Having created a new dictionary from the 2020AA UMLS and added Genes and
>> Receptors to the dictionary creator's default selections, I have a curious
>> problem where cTakes now assigns the most bizarre acronyms to ordinary
>> words used in POS contexts where it shouldn't find Mentions.
>>
>> Here are two examples:
>>
>> 1.   soft (in "soft tissue...")
>> becomes   "SHORT STATURE, ONYCHODYSPLASIA, FACIAL DYSMORPHISM, AND
>> HYPOTRICHOSIS SYNDROME"
>>
>> 2.   bed (in "The wound bed was...")
>> becomes  "BORNHOLM EYE DISEASE"
>>
>> I have not changed the TermConsumer type in the descriptor XML.
>>
>> Are the DictionaryCreator's defaults equivalent to the default
>> sno_rx that's delivered with the app?
>>
>> Attached is the vocab subsets list I used.
>>
>>
>> Peter
>>
>>
>>


Re: Problem trying to load a custom dictionary [EXTERNAL]

2020-07-31 Thread Jeffrey Miller
How would I go about getting edit access to the Wiki (is that the preferred
path)?

On Fri, Jul 31, 2020 at 11:08 AM Peter Abramowitsch 
wrote:

> Thank you Jeff and Gandhi for your offers of help. I'm not trying to renege
> on my offer, but as I have only done this once, I'm wondering if your
> combined experience makes it much more appropriate for one of you to write
> this documentation and for me to review it, rather than the other way
> round, especially if Jeff has actually written up the basis for the
> enhancement.
>
> However, I'm willing to give it a shot if neither of you wants to take the
> reins.
>
> Peter

Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL] [EXTERNAL]

2020-07-31 Thread Jeffrey Miller
Sean,

When I use cTAKES, I'd like to be able to refer to the version number for
reproducibility. If I run just the latest trunk (to get access to a new
feature), it is not easily referenced. How is the decision to make a new
cTAKES release made? Do you think there will be any future releases, or
would it be better to begin referring to cTAKES by svn commit rather than
by version?

Also, unrelatedly (I am not sure when this happened), the GitHub mirror
for cTAKES (https://github.com/apache/ctakes) doesn't seem to be updating.
It doesn't have dockhand, for example.

Thanks,
Jeff

On Wed, Jul 29, 2020 at 1:31 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Tomasz,
>
> As far as I know there aren't any upcoming releases planned.
>
> Sean
> 
> From: Tomasz Oliwa 
> Sent: Wednesday, July 29, 2020 1:17 PM
> To: dev@ctakes.apache.org
> Subject: Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL]
> [EXTERNAL]
>
> * External Email - Caution *
>
>
> Sean,
>
> Since you mention a new release, is there any expected time for a new
> stable cTAKES release? An up-to-date stable release for user installations
> would be appreciated, I think.
>
> Regards,
> Tomasz
>
> 
> From: Finan, Sean 
> Sent: Friday, July 24, 2020 10:45 AM
> To: dev@ctakes.apache.org
> Subject: Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL]
>
> I don't think that anybody does.  It is not in the release, not
> documented, not necessarily ready for widespread use, etc.  Everything
> associated with types List and ListEntry is new.
>
> Hopefully when ctakes 4.0.1 (should be 5.0 at this point) is released,
> these types will be much more usable.
>
> Sean
> 
> From: Peter Abramowitsch 
> Sent: Friday, July 24, 2020 10:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: Clarification regarding NegationFSM [EXTERNAL] [EXTERNAL]
>
> * External Email - Caution *
>
>
> Thanks Sean.  I didn't know about that annotator.
>
> On Fri, Jul 24, 2020, 3:51 AM Finan, Sean <
> sean.fi...@childrens.harvard.edu>
> wrote:
>
> > Hi Sreejith,
> >
> > Without seeing an example of text I can't say whether my next words will
> > help you or not.
> >
> > If you are using trunk then you should have access to two 'new'
> > annotation engines in ctakes-core.
> > ListAnnotator - Annotates formatted List Sections by detecting them
> > using Regular Expressions provided in an input File.
> > ListEntryNegator - Checks List Entries for negation, which may be
> > exhibited differently from unstructured negation.
> >
> > ListAnnotator can use any list of regular expressions in a file.  The
> > default file is in ctakes-core-res, called DefaultListRegex.bsv
> > The format for each line in the regex list is
> > NAME||LIST_REGEX||ENTRY_SEPARATOR_REGEX   where
> > NAME - name of list type.  Can be anything.
> > LIST_REGEX   - some regular expression for which a block of text will
> > match a list in its entirety.
> > ENTRY_SEPARATOR_REGEX   - some regular expression for which text within
> > the entire list will match a single list entry.
> > For instance, the List
> > Smoker Status: N
> > Drinking Status: Y
> > Pregnant: N/A
> > A -simple- line in the regex file could be
> > Colonized List||(?:^(?:[^\r\n:]+:[^\r\n:]+)+\r?\n){2,}||(?:^(?:[^\r\n:]+:[^\r\n:]+)+\r?\n)
> > Notice that each item is separated by two bar characters "||".
> >
> > The file of regular expressions can be changed using the LIST_TYPES_PATH
> > parameter.
> >
> > ListEntryNegator will iterate through each ListEntry in the cas and use a
> > regular expression to determine whether or not items in the list should
> > be negated.
> > Right now that regex is hard-coded in the class. There should probably
> > be a mechanism to override it. ": N" is not in there. Also, only
> > Disease/Disorder and Sign/Symptom mentions in the ListEntry are negated.
> > You would need to add SmokingStatusAnnotation as a negatable.
> >
> > I don't know if any of this is helpful, but I thought that I would throw
> > it out there.
> >
> > Sean
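Sean's `NAME||LIST_REGEX||ENTRY_SEPARATOR_REGEX` format can be exercised outside cTAKES to check that a pattern actually finds the list and splits its entries. This sketch uses the "Colonized List" line quoted above with Python's `re`. Two assumptions: the patterns are applied in multiline mode so `^` matches at each line start, and the ": N"/": No" negation rule is a hypothetical stand-in, not the regex hard-coded in ListEntryNegator.

```python
import re

# The example line from the DefaultListRegex.bsv-style format described above.
BSV_LINE = (r"Colonized List||(?:^(?:[^\r\n:]+:[^\r\n:]+)+\r?\n){2,}"
            r"||(?:^(?:[^\r\n:]+:[^\r\n:]+)+\r?\n)")

# Hypothetical negation rule: an entry whose value is exactly "N" or "No".
NEGATED_VALUE = re.compile(r":\s*(?:N|No)\s*$", re.IGNORECASE)


def parse_bsv_line(line):
    """Split one regex-file line into (name, list_regex, entry_regex)."""
    name, list_regex, entry_regex = line.split("||")
    # MULTILINE lets '^' match at every line start; assumed, not confirmed.
    return (name,
            re.compile(list_regex, re.MULTILINE),
            re.compile(entry_regex, re.MULTILINE))


def find_list_entries(text, line):
    """Return (list_name, entry_text, negated) for each entry of each list."""
    name, list_re, entry_re = parse_bsv_line(line)
    results = []
    for list_match in list_re.finditer(text):
        # Split the matched list block into its individual entries.
        for entry in entry_re.finditer(list_match.group(0)):
            entry_text = entry.group(0).rstrip("\r\n")
            results.append((name, entry_text, bool(NEGATED_VALUE.search(entry_text))))
    return results
```

On the Smoker/Drinking/Pregnant example, "Smoker Status: N" is flagged while "Pregnant: N/A" is not.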
> > 
> > From: Sreejith Pk 
> > Sent: Friday, July 24, 2020 4:09 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Clarification regarding NegationFSM [EXTERNAL]
> >
> > * External Email - Caution *
> >
> >
> > Hi Peter, Thanks a lot for the reply.
> >
> > Let me elaborate on the changes I have made so far. I have added
> > KuRuleBasedClassifierAnnotator to the pipeline in order to fetch
> > smoking-related keywords from the document. I have modified
> > KuRuleBasedClassifierAnnotator so that it iterates through the
> > identified tokens and checks whether each token matches any of the
> > smoking-related words configured in a keyword.txt file. The matching
> > tokens are then set as SmokerNamedEntityAnnotation and can thus be read
> > from the output XMI.
> > 

Re: Problem trying to load a custom dictionary [EXTERNAL]

2020-07-31 Thread Jeffrey Miller
I can help with this as well. I have some documentation that I have written
for myself that would probably be useful. I've tried to keep a list of
useful forum posts that contain information that could probably be more
prominently displayed on the wiki.

On Fri, Jul 31, 2020 at 10:34 AM gandhi rajan 
wrote:

> Hi Peter,
>
> We can work together on this if you are interested.
>
> On Fri, Jul 31, 2020 at 7:44 PM Peter Abramowitsch <
> pabramowit...@gmail.com>
> wrote:
>
> > I could do it while the experience is fresh, although I only know the
> > happy path and not the deeper details in this area of the suite.
> > If you want me to, let me know how to get editing privileges on the Wiki.
> >
> > Peter
> >
> > On Fri, Jul 31, 2020 at 4:28 AM Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Obviously Jeff is correct in all of his answers.  Thank you Jeff!
> > >
> > > One comment: DictionaryDescriptor is a deprecated parameter name that is
> > > picked up by the piper creator when it inspects the code. However, I am
> > > not sure why the deprecated parameter name isn't working ...
> > >
> > > The wiki needs additional and more thorough information. If anybody can
> > > volunteer to work on it, I (and future users) would really appreciate it!
> > >
> > > Thanks,
> > > Sean
> > >
> > >
> > > 
> > > From: Peter Abramowitsch 
> > > Sent: Thursday, July 30, 2020 9:02 PM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Problem trying to load a custom dictionary [EXTERNAL]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > Thanks Jeff
> > >
> > > That worked!
> > >
> > > Seems like something that should get fixed in the PiperCreator and in the
> > > documentation.
> > >
> > > With a life of assuming that every mistake is my own error, the last thing
> > > I would have expected was a generator of incorrect params.
> > >
> > > Peter
> > >
> > > On Thu, Jul 30, 2020 at 4:59 PM Jeffrey Miller 
> > wrote:
> > >
> > > > Peter,
> > > >
> > > > 1) This is loaded by cTAKES; you don't need to manually create the
> > > > database.
> > > > 2) I can't see the highlights here, but I think that file should be okay
> > > > as created by the GUI.
> > > > 3) I think the parameter name to configure your dictionary location is
> > > > LookupXml instead of DictionaryDescriptor.
> > > >
> > > > Jeff
> > > >
> > > > On Thu, Jul 30, 2020 at 6:49 PM Peter Abramowitsch <
> > > > pabramowit...@gmail.com>
> > > > wrote:
> > > >
> > > > > A couple of questions about installing a custom dictionary in
> lookup
> > > > fast.
> > > > > I hope I'm not too far off the track.
> > > > >
> > > > > I've used the dictionary creator with a UMLS install to create the
> > > > > dictionary script, prop file, and xml file in my ctakes resources
> > tree
> > > > >
> > > > > 1.  Do I have to manually run this script to execute all its SQL
> > > > statements
> > > > > into hsqldb or is this executed by the cTakes program when it
> > > encounters
> > > > > the XML descriptor?
> > > > > If manual running is needed, are there instructions on how and
> where
> > to
> > > > > load the script?
> > > > >
> > > > > 2.  The xml file generated by the dictionary creator contains lines
> > > with
> > > > > duplicate names-- see yellow highlight.  Is this  correct?
> > > > > my_dict_v1Terms
> > > > >   > > > >
> > > >
> > >
> >
> value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/
> > > > > my_dict_v1/my_dict_v1"/>
> > > > >
> > > > > 3.  Try as I might I cannot get ctakes to load anything other than
> > > > > sno_rx.   I'm using a piper file with an entry looking like
> > > > >   
> > > > >   add org.apache.ctakes.dictionary.lookup2.ae
> > > > .OverlapJCasTermAnnotator
> > > > >
> > > > >
&

Re: Problem trying to load a custom dictionary

2020-07-30 Thread Jeffrey Miller
Peter,

1) This is loaded by cTAKES, you don't need to manually create the database.
2) I can't see the highlights here, but I think that file should be okay as
created by the GUI.
3) I think the parameter name to configure your dictionary location is
LookupXml instead of DictionaryDescriptor

Jeff

On Thu, Jul 30, 2020 at 6:49 PM Peter Abramowitsch 
wrote:

> A couple of questions about installing a custom dictionary in lookup fast.
> I hope I'm not too far off the track.
>
> I've used the dictionary creator with a UMLS install to create the
> dictionary script, prop file, and xml file in my ctakes resources tree
>
> 1.  Do I have to manually run this script to execute all its SQL statements
> into hsqldb or is this executed by the cTakes program when it encounters
> the XML descriptor?
> If manual running is needed, are there instructions on how and where to
> load the script?
>
> 2.  The xml file generated by the dictionary creator contains lines with
> duplicate names -- see yellow highlight.  Is this correct?
> my_dict_v1Terms
>   value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/my_dict_v1/my_dict_v1"/>
>
> 3.  Try as I might, I cannot get ctakes to load anything other than
> sno_rx.   I'm using a piper file with an entry looking like
>
>   add org.apache.ctakes.dictionary.lookup2.ae.OverlapJCasTermAnnotator DictionaryDescriptor=org/apache/ctakes/dictionary/lookup/fast/my_dict_v1.xml
>
> Not sure if they're looked at any more but I also changed these xml files
> under desc as well.
>
>
>
> desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsOverlapLookupAnnotator.xml
>
>
> desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml
>
> But I can't even get it to fail to try to load mine.
> My log looks like this:
>
>   30 Jul 2020 15:43:37  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
>   30 Jul 2020 15:43:37  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
>   30 Jul 2020 15:43:37  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>
>
> Any suggestions?
>
> Regards,  Peter
>
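Jeff's third point explains the log above: the annotator kept reporting sno_rx_16ab.xml because it never saw a LookupXml parameter. The Java sketch below is hypothetical (it is not cTAKES's PiperFileReader, and PiperParamSketch, parseAddLine and resolveLookupXml are invented names); it only models how a component that reads one parameter name silently falls back to its default when the piper line uses another. Sean notes in this thread that DictionaryDescriptor is a deprecated name that was supposed to be picked up, so treat this as a model of the observed behavior, not of the intended one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: NOT cTAKES's PiperFileReader or its annotator code.
final class PiperParamSketch {
    // Default dictionary path, taken from the log lines quoted in this thread.
    static final String DEFAULT_LOOKUP_XML =
            "org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml";

    // Splits "add <ClassName> Key=Value ..." into a map of Key -> Value.
    static Map<String, String> parseAddLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].equals("add")) {
            throw new IllegalArgumentException("Not an add line: " + line);
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 2; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                params.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return params;
    }

    // A component that only looks up "LookupXml" never sees any other name,
    // so it quietly falls back to its default dictionary.
    static String resolveLookupXml(Map<String, String> params) {
        return params.getOrDefault("LookupXml", DEFAULT_LOOKUP_XML);
    }

    public static void main(String[] args) {
        String annotator = "add org.apache.ctakes.dictionary.lookup2.ae.OverlapJCasTermAnnotator ";
        String custom = "org/apache/ctakes/dictionary/lookup/fast/my_dict_v1.xml";
        // Falls back to sno_rx_16ab.xml, like the log above.
        System.out.println(resolveLookupXml(parseAddLine(annotator + "DictionaryDescriptor=" + custom)));
        // Picks up the custom dictionary.
        System.out.println(resolveLookupXml(parseAddLine(annotator + "LookupXml=" + custom)));
    }
}
```

In the piper file itself, the change that worked for Peter was replacing DictionaryDescriptor= with LookupXml= on the add line.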


Re: DefaultJCasTermAnnotator behavior with period and semicolon in UMLS terms [EXTERNAL]

2020-02-06 Thread Jeffrey Miller
Sean,

Thanks for the detailed answer- I will take a look and update this thread
if I find out the cause.

Jeff

On Thu, Feb 6, 2020 at 9:13 AM Finan, Sean 
wrote:

> Hi Jeff,
>
> I think that sentence splitting is possibly a cause for this behavior and
> is worth checking.
>
> You can get some quick debug output by adding a writer to the end of your
> pipeline.
>
> add pretty.plaintext.PrettyTextWriterFit SubDirectory=POS
>
> The SubDirectory= parameter is optional.
> This writer creates a file that (in part) lists output sentence by
> sentence.  So you should be able to see how the sentence splitter is
> behaving in each circumstance.
>
> If it is the Sentence Splitter then you could try using a different lookup
> window in the dictionary lookup and see if your results improve or get
> worse.  In the piper file, just insert above the Dictionary lookup addition
>
> set windowAnnotations=Section
>
> or
> set windowAnnotations=Paragraph
> if you are using a paragraph parser.
>
> Sean
>
>
> 
> From: Jeffrey Miller 
> Sent: Wednesday, February 5, 2020 12:24 PM
> To: dev@ctakes.apache.org
> Subject: DefaultJCasTermAnnotator behavior with period and semicolon in
> UMLS terms [EXTERNAL]
>
>
> Hi,
>
> I've noticed that if a term contains a period or a semicolon (for
> example, "antibody ; toxoplasma" from the sno_rx_16ab dictionary), it
> will not be found if the semicolon is attached to the first word
> ("antibody; toxoplasma"), but will be found if it is either
> "antibody ; toxoplasma" or "antibody ;toxoplasma". There is similar
> behavior with a period in the same place. My
> first instinct was that this had to do with the sentence splitter and
> sentences being the default lookup window. I found an older discussion
> about this in reference to periods in genes, but it was from a while back.
> Just curious if anyone has dealt with this issue.
>
> Thanks,
> Jeff
>
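Sean's suggestion rests on the idea that lookup happens inside window annotations (sentences by default). A minimal sketch, assuming that a term is only matched when one window covers its whole span, and assuming a splitter that ends a sentence right after "antibody;": the term then straddles two sentence windows and is missed, while a single Section window covers it. The class name, offsets and splitter behavior are illustrative assumptions, not observed cTAKES output.

```java
import java.util.List;

// Hypothetical sketch of windowed lookup; not the cTAKES implementation.
final class WindowLookupSketch {
    // A half-open character span [begin, end).
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }
    }

    // A dictionary term can only be found if one window covers its whole span.
    static boolean findable(Span term, List<Span> windows) {
        for (Span window : windows) {
            if (window.covers(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "antibody; toxoplasma";
        Span term = new Span(0, text.length());
        // Assumed splitter output that ends a "sentence" after "antibody;".
        List<Span> sentences = List.of(new Span(0, 9), new Span(10, 20));
        // A single Section (or Paragraph) window spanning the whole text.
        List<Span> sections = List.of(new Span(0, 20));
        System.out.println(findable(term, sentences)); // false
        System.out.println(findable(term, sections));  // true
    }
}
```

If switching windowAnnotations to Section or Paragraph makes the term appear, that points at the sentence splitter rather than at the dictionary.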


DefaultJCasTermAnnotator behavior with period and semicolon in UMLS terms

2020-02-05 Thread Jeffrey Miller
Hi,

I've noticed that if a term contains a period or a semicolon (for
example, "antibody ; toxoplasma" from the sno_rx_16ab dictionary), it
will not be found if the semicolon is attached to the first word
("antibody; toxoplasma"), but will be found if it is either
"antibody ; toxoplasma" or "antibody ;toxoplasma". There is similar
behavior with a period in the same place. My
first instinct was that this had to do with the sentence splitter and
sentences being the default lookup window. I found an older discussion
about this in reference to periods in genes, but it was from a while back.
Just curious if anyone has dealt with this issue.

Thanks,
Jeff


Re: Manually editing dictionary script file [EXTERNAL]

2020-01-09 Thread Jeffrey Miller
Great, thanks Sean.

On Thu, Jan 9, 2020 at 3:54 PM Finan, Sean 
wrote:

> Hi Jeff,
>
> There shouldn't be any problems doing that.
>
> And here is a secret 
> In the class DefaultTermConsumer there is the ability to read in a
> "blacklist" of terms that should be excluded.  This is in the trunk version
> of ctakes.
> If you are comfortable reading java code you can have a look and see if
> that is easier for you to use.
>
>
> Sean
>
> ____
> From: Jeffrey Miller 
> Sent: Thursday, January 9, 2020 3:32 PM
> To: dev@ctakes.apache.org
> Subject: Manually editing dictionary script file [EXTERNAL]
>
>
> Hi,
>
> Are there any issues I have to be on the lookout for if I want to remove a
> few synonyms from the CUI_TERMS table in the .script file created by the
> dictionary creator GUI? Is there any concern about corrupting the rare-term
> lookup?
>
> If there is another way to suppress certain synonyms for a CUI that would
> be fine too, but as far as I know I can only augment a dictionary with a
> BSV file, not take away a synonym.
>
> Thanks,
> Jeff
>


Manually editing dictionary script file

2020-01-09 Thread Jeffrey Miller
Hi,

Are there any issues I have to be on the lookout for if I want to remove a
few synonyms from the CUI_TERMS table in the .script file created by the
dictionary creator GUI? Is there any concern about corrupting the rare-term
lookup?

If there is another way to suppress certain synonyms for a CUI that would
be fine too, but as far as I know I can only augment a dictionary with a
BSV file, not take away a synonym.

Thanks,
Jeff
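Sean's answer is that removing rows should be safe. A minimal sketch of doing the edit programmatically instead of by hand, assuming the CUI_TERMS column order CUI, rare-word index, token count, term text, rare word (inferred from the INSERT lines quoted in this thread, not from documentation). CuiTermsFilter and its "cui|text" blacklist format are hypothetical; the DefaultTermConsumer blacklist Sean mentions on trunk is a separate option that leaves the file alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: filters CUI_TERMS rows out of a dictionary .script file.
final class CuiTermsFilter {
    // Matches rows such as:
    //   INSERT INTO CUI_TERMS VALUES(750426,1,3,'elevated wbc count','wbc')
    // Assumed columns: CUI, rare-word index, token count, term text, rare word.
    // SQL string literals double embedded single quotes, hence (?:[^']|'')*.
    private static final Pattern CUI_TERMS_ROW = Pattern.compile(
            "^INSERT INTO CUI_TERMS VALUES\\((\\d+),(\\d+),(\\d+),'((?:[^']|'')*)','((?:[^']|'')*)'\\)$");

    // Drops rows whose "cui|text" key is blacklisted; every other line is kept verbatim.
    static List<String> removeSynonyms(List<String> scriptLines, Set<String> blacklist) {
        List<String> kept = new ArrayList<>();
        for (String line : scriptLines) {
            Matcher m = CUI_TERMS_ROW.matcher(line);
            if (m.matches() && blacklist.contains(m.group(1) + "|" + m.group(4))) {
                continue; // each row is a separate INSERT, so only this synonym goes away
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> script = List.of(
                "INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')",
                "INSERT INTO CUI_TERMS VALUES(750426,1,3,'elevated wbc count','wbc')",
                "INSERT INTO PREFTERM VALUES(11849,'Diabetes Mellitus')");
        // Hypothetical blacklist entry: remove one synonym of CUI 750426.
        removeSynonyms(script, Set.of("750426|elevated wbc count")).forEach(System.out::println);
    }
}
```

For a real dictionary, read the .script file with java.nio.file.Files.readAllLines, write the filtered lines to a new file with Files.write, and keep the original as a backup.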


Re: How does cTAKES work? [EXTERNAL]

2019-12-17 Thread Jeffrey Miller
Akram,

The .xmi format that the cTAKES utilities output is an XML-serialized
version of the JCas (
https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/jcas/JCas.html),
which contains all the information that each analysis engine in your
cTAKES pipeline extracted. You can either parse the XMI yourself and output
whatever format you need, or you can run a piper file from Java code,
which will give you access to the JCas object in memory (prior to .xmi
serialization), and you can extract the data and serialize it however you
want. I think the latter is easier, but either would work.

The code for the ctakes-web-rest web service shows one way to compose and
run a cTAKES pipeline via code (there are a number of ways):
https://github.com/apache/ctakes/blob/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CtakesRestController.java#L47


I also found this code from Tim helpful in figuring out how to extract
specific information from a JCas object. The Apache UIMA documentation
would probably also be helpful:
https://github.com/tmills/merlot-negation/blob/master/src/main/java/fr/limsi/talmed/negation/BratMerlotWriter.java#L33

Jeff
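For the first option (parsing the XMI yourself), a minimal JDK-only sketch: it reads the document text from the cas:Sofa sofaString attribute and prints every textsem element that carries begin/end offsets. The namespace URIs and the element name textsem:DiseaseDisorderMention reflect typical cTAKES XMI output but are assumptions to check against your own files; SAMPLE is a hand-written fragment, not real cTAKES output, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: lists textsem annotations from a cTAKES XMI string using only the JDK.
final class XmiMentionReader {
    // Namespace URIs as they typically appear in cTAKES XMI output (verify against your files).
    static final String CAS_NS = "http:///uima/cas.ecore";
    static final String TEXTSEM_NS = "http:///org/apache/ctakes/typesystem/type/textsem.ecore";

    // A tiny hand-written XMI fragment for illustration.
    static final String SAMPLE =
            "<xmi:XMI xmlns:xmi=\"http://www.omg.org/XMI\" xmlns:cas=\"" + CAS_NS + "\""
            + " xmlns:textsem=\"" + TEXTSEM_NS + "\" xmi:version=\"2.0\">"
            + "<cas:Sofa xmi:id=\"1\" sofaNum=\"1\" sofaID=\"_InitialView\" mimeType=\"text\""
            + " sofaString=\"This patient has diabetes.\"/>"
            + "<textsem:DiseaseDisorderMention xmi:id=\"2\" sofa=\"1\" begin=\"17\" end=\"25\"/>"
            + "</xmi:XMI>";

    // Returns "TypeName: covered text" for each textsem element with begin/end offsets.
    static List<String> mentions(String xmi) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)));
            // The document text is stored in the sofaString attribute of cas:Sofa.
            Element sofa = (Element) doc.getElementsByTagNameNS(CAS_NS, "Sofa").item(0);
            String text = sofa.getAttribute("sofaString");
            List<String> found = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagNameNS(TEXTSEM_NS, "*");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                if (e.hasAttribute("begin") && e.hasAttribute("end")) {
                    int begin = Integer.parseInt(e.getAttribute("begin"));
                    int end = Integer.parseInt(e.getAttribute("end"));
                    found.add(e.getLocalName() + ": " + text.substring(begin, end));
                }
            }
            return found;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XMI", e);
        }
    }

    public static void main(String[] args) {
        mentions(SAMPLE).forEach(System.out::println); // DiseaseDisorderMention: diabetes
    }
}
```

The same loop could write HTML or marked-up text instead of printing, which is one route to the simpler output formats Akram asks about.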

On Tue, Dec 17, 2019 at 7:40 AM Akram  wrote:

>  Many thanks for answering me,
>
> cTAKES is the core of my research and I am stuck
>
> How can I generate cTAKES output without CVD? Is there a command or GUI for that?
> How can I generate other formats, such as HTML or marked text?
>
>
> The way I know is:
> There is a misunderstanding here, for sure.
> We feed CVD with text such as "This patient has diabetes and no signs for
> kidney failure".
> We also provide Run > Load AE with the pipeline we are going to use.
> Once we click on Run AE, cTAKES works and analyses the provided text.
> Then we save the results as .XMI, which can be taken to any tool, such as
> UIMA, to display the results visually.
> Am I right here?
> Thanks
>
> On Tuesday, 17 December 2019, 02:54:03 am AEDT, Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
>  Hi Akram,
>
> Gandhi has provided some good links, and I agree that you should read that
> information.
> In case you haven't found it, there is also a "quick start" manual on
> this page:  https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0
> Under "Documentation", there is a download named "A pamphlet/manual on
> cTAKES basics".
> It was meant to have an accompanying human tutor, but it does contain some
> handy information.
>
> > I can get CVD run on the binary version of cTAKES.
> -- Excellent!
> > but I have problem on the Developer version.
> -- Are you using an IDE?  There might be a maven profile listed named
> "runCVD".  You can try to compile ctakes with that profile.
> From a command line: "mvn compile -PrunCVD"
> -- Regardless, if you've already built a binary then at least you can run
> it there.  The CVD is not a ctakes product, but is bundled with uima.  So
> if you change ctakes code CVD will still remain the same.
> --  https://uima.apache.org/d/uimaj-current/tools.html#ugr.tools.cvd
>
> I think that there is a misunderstanding here:
> >When I try to Load AE on the CVD (Development Version) I get this error
> -- The CVD is meant to display output from ctakes.
> -- If you run ctakes to produce an .xmi file(s) then you can load the .xmi
> file into the CVD and view what ctakes discovered in the document.
>
> While the CVD is very good for debugging and roaming details, you can also
> produce simpler output types such as html and marked text.
> Other output types might be easier for new users, and they do not require
> running a second tool (CVD).
>
> Sean
>
> 
> From: Akram 
> Sent: Sunday, December 15, 2019 6:35 AM
> To: dev@ctakes.apache.org
> Subject: Re: How does cTAKES work? [EXTERNAL]
>
>  Thanks Gandhi
> I can get CVD run on the binary version of cTAKES.
> but I have problem on the Developer version.
> When I try to Load AE on the CVD (Development Version) I get this error
> When I try to load : AggregatePlaintextProcessor.xml
> I get Error : org.apache.uima.resource.ResourceInitializationException:
> More detailed information in the log file
>
> When I try to load : AggregatePlaintextFastUMLSProcessor.xml
> I get Error : org.apache.uima.resource.ResourceInitializationException: an
> import could not be resolved. No file with name
> "org/apache/ctakes/drugner/types/TypeSystem.xml"  was found in the class
> path or data path
> (Descriptor:file:/D:/cTAKES/ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml)
> More detailed information in the log file
>
> P.S. I changed CTAKES_HOME to D:\cTAKES which is the development folder
> that has 

Re: Relating MeasurementAnnotations to other IdentifiedAnnotations

2019-08-23 Thread Jeffrey Miller
Thank you Peter and Tim, your responses were very helpful.

On Tue, Aug 20, 2019, 5:01 PM Peter Abramowitsch 
wrote:

> Hi Jeff
>
> I've experimented with three approaches.
>
> One is with the LabValueFinder, which is included in the cTAKES release
> and looks specifically for values associated with LabMentions.  It also
> has an "eager" mode where it converts some MedicationMentions into
> LabMentions when the context seems right (O2, Sodium, etc.).  I can't say
> it works all that well, and it is not capable of many different semantic
> forms of the Name/Value association.  It is also too eager, sometimes
> creating LabMentions out of Medications when it shouldn't.
>
> Another approach was to use something like Stanford's TokensRegex, which
> allows you to construct regex-like rules where the segments are not strings
> but tokens, whose attributes (such as POS and NER) you can query.  For
> cTAKES I had to adapt a UIMA package that must have been someone's thesis
> project from the University of Nantes.
>
> Copyright 2015 - CNRS (Centre National de Recherche Scientifique)
> package fr.univnantes.lina.uima.tkregex
>
> What I have is not ready for prime time and is still very rough.  It works
> well, but only for a limited set of rules.
>
> I used it to create a vitals detector.  Here's a snippet of the rules that
> this package loads at runtime, which create an annotation called WGT
> given these matchers:
> matcher NUM: [ postag == "CD" ];
> matcher BE: [ lemma == "be" | lemma == "at"];
> matcher WT: /(?i)^wt|^weight/;
> matcher WUOM: /(?i)^kg|^lb|^pounds/;
> term "WGT": WT BE? SYM? NUM WUOM;
>
> The last approach was a home-built mechanism using the ConllDependencyNode
> collection and the RelationArguments to detect the same connection between
> certain typed pairs of Identified annotations.
>
> The problem is, I've always been in prototyping mode and never had time to
> push these methods to production-ready status.
>
> Peter
>
> On Tue, Aug 20, 2019 at 1:15 PM Jeffrey Miller  wrote:
>
> > Hi,
> >
> > Is there any configuration or component in cTAKES that can be used to
> > attribute a measurement annotation to another annotation that it applies
> > to? For example, for "2 mm incision", where we relate "2 mm" to
> > "incision"? It looks like there might be a roundabout way to find the
> > head of the span of the MeasurementAnnotation in the output of the
> > dependency parser, but I was wondering whether this has been explored
> > before. Perhaps the RelationExtractor component?
> >
> > I also have another, more general question if anyone can help: how does
> > the structure of the cTAKES type system affect how cTAKES works? I am
> > looking for a general intuition of how the structure of the type system
> > drives the larger cTAKES architecture.
> >
> > Thanks!
> > Jeff
> >
>
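Peter's WGT rule matches a pattern over tokens rather than characters. The sketch below approximates that idea in plain Java and is not the tkregex package's API: each token is mapped to one class letter, and java.util.regex runs the rule over the resulting letter string, so match offsets map straight back to token indices. Two simplifications are assumptions: the first matching class wins for each token, and SYM (which the quoted snippet does not define) is taken to be a SYM tag, ":" or "=". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java approximation of a token-level regex rule; not the tkregex package's API.
final class TokenRegexSketch {
    static final class Tok {
        final String text;
        final String lemma;
        final String pos;

        Tok(String text, String lemma, String pos) {
            this.text = text;
            this.lemma = lemma;
            this.pos = pos;
        }
    }

    // Matchers copied from Peter's rule snippet.
    private static final Pattern WT = Pattern.compile("(?i)^wt|^weight");
    private static final Pattern WUOM = Pattern.compile("(?i)^kg|^lb|^pounds");
    // term "WGT": WT BE? SYM? NUM WUOM;  written over one letter per token.
    private static final Pattern WGT = Pattern.compile("WB?S?NU");

    // Maps a token to one class letter; the first matching class wins.
    static char classify(Tok t) {
        if (WT.matcher(t.text).find()) return 'W';
        if (WUOM.matcher(t.text).find()) return 'U';
        if (t.pos.equals("CD")) return 'N';
        if (t.lemma.equals("be") || t.lemma.equals("at")) return 'B';
        // SYM is not defined in the quoted snippet; this definition is an assumption.
        if (t.pos.equals("SYM") || t.text.equals(":") || t.text.equals("=")) return 'S';
        return 'O';
    }

    // Returns the covered text of every WGT match in a tokenized sentence.
    static List<String> findWeights(List<Tok> tokens) {
        StringBuilder classes = new StringBuilder();
        for (Tok t : tokens) {
            classes.append(classify(t));
        }
        List<String> found = new ArrayList<>();
        Matcher m = WGT.matcher(classes);
        while (m.find()) {
            List<String> words = new ArrayList<>();
            for (int i = m.start(); i < m.end(); i++) {
                words.add(tokens.get(i).text);
            }
            found.add(String.join(" ", words));
        }
        return found;
    }

    public static void main(String[] args) {
        List<Tok> sentence = List.of(new Tok("Weight", "weight", "NN"), new Tok("is", "be", "VBZ"),
                new Tok("70", "70", "CD"), new Tok("kg", "kg", "NN"));
        System.out.println(findWeights(sentence)); // [Weight is 70 kg]
    }
}
```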


Relating MeasurementAnnotations to other IdentifiedAnnotations

2019-08-20 Thread Jeffrey Miller
Hi,

Is there any configuration or component in cTAKES that can be used to
attribute a measurement annotation to another annotation that it applies
to? For example, for "2 mm incision", where we relate "2 mm" to "incision"?
It looks like there might be a roundabout way to find the head of the span
of the MeasurementAnnotation in the output of the dependency parser, but I
was wondering whether this has been explored before. Perhaps the
RelationExtractor component?

I also have another, more general question if anyone can help: how does the
structure of the cTAKES type system affect how cTAKES works? I am looking
for a general intuition of how the structure of the type system drives the
larger cTAKES architecture.

Thanks!
Jeff
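One way to read the "roundabout" idea: find the token inside the measurement span whose governor lies outside the span (the span's head), then take that governor as the thing being measured. A minimal sketch over plain arrays, assuming a dependency parse in which "2" depends on "mm" and "mm" depends on "incision"; the class and method names are hypothetical, and in cTAKES the same information would come from ConllDependencyNode annotations, as Peter's reply describes.

```java
// Hypothetical sketch over plain arrays; not a cTAKES API.
final class MeasurementHeadSketch {
    // heads[i] is the index of token i's governor, or -1 for the sentence root.
    // Returns the first token in [begin, end) whose governor lies outside the span.
    static int spanHead(int[] heads, int begin, int end) {
        for (int i = begin; i < end; i++) {
            if (heads[i] < begin || heads[i] >= end) {
                return i;
            }
        }
        return -1;
    }

    // Returns the token the span attaches to (the governor of its head), or -1.
    static int attachedTo(int[] heads, int begin, int end) {
        int head = spanHead(heads, begin, end);
        return head < 0 ? -1 : heads[head];
    }

    public static void main(String[] args) {
        String[] tokens = {"2", "mm", "incision"};
        // Assumed parse: "2" -> "mm", "mm" -> "incision", "incision" is the root.
        int[] heads = {1, 2, -1};
        int target = attachedTo(heads, 0, 2); // measurement "2 mm" = tokens [0, 2)
        System.out.println(target < 0 ? "none" : tokens[target]); // incision
    }
}
```

This only works as well as the parse does; a measurement attached to the wrong governor will be related to the wrong annotation.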


Re: Struggling initializing

2019-08-10 Thread Jeffrey Miller
Sebastien,

Just wanted to confirm that you have the sno_rx_16ab.script file
in org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/


Jeff
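A quick way to run Jeff's check from Java: resolve the relative location he gives against the resources root (in Sebastien's code that would be /Users/sboussard/Desktop/apache-ctakes-4.0.0/resources) and test whether the .script file is there. DictionaryFileCheck is a hypothetical helper; it uses only java.nio.file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks that the default dictionary script is where Jeff says it should be.
final class DictionaryFileCheck {
    // Location relative to the cTAKES resources root, as given in this thread.
    static final String SCRIPT_RELATIVE_PATH =
            "org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script";

    static boolean hasDefaultScript(Path resourcesRoot) {
        return Files.isRegularFile(resourcesRoot.resolve(SCRIPT_RELATIVE_PATH));
    }

    public static void main(String[] args) {
        // Pass the resources directory as the first argument; defaults to ./resources.
        Path root = Paths.get(args.length > 0 ? args[0] : "resources");
        Path script = root.resolve(SCRIPT_RELATIVE_PATH);
        System.out.println((hasDefaultScript(root) ? "found: " : "MISSING: ") + script.toAbsolutePath());
    }
}
```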

On Sat, Aug 10, 2019, 2:16 PM gandhi rajan  wrote:

> Sorry, Sebastien, I still don't understand what you are trying to do.
>
> On Saturday, August 10, 2019, Sebastien Boussard  wrote:
>
> > Hello Mr. Rajan,
> > I have realized that I sent you no context! I am currently working on
> > the Process Lines Clinical Runner. Previously, I was having many errors
> > with the directories. I made a link from my resources folder to the
> > Apache cTAKES resources folder. I have no link between the source code
> > and the user interface.
> >
> > Here is the code:
> >
> > import java.io.File;
> > import java.io.IOException;
> >
> > import org.apache.ctakes.core.cr.LinesFromFileCollectionReader;
> > import org.apache.ctakes.core.pipeline.EntityCollector;
> > import org.apache.ctakes.core.pipeline.PipelineBuilder;
> > import org.apache.ctakes.core.pipeline.PiperFileReader;
> > import org.apache.ctakes.core.resource.FileLocator;
> > import org.apache.log4j.Logger;
> > import org.apache.uima.UIMAException;
> >
> > final public class ClinicalProcessor {
> >
> >    static private final Logger LOGGER = Logger.getLogger( "ClinicalProcessor" );
> >
> >    static private final String PIPER_FILE_PATH =
> >          "/Users/sboussard/Desktop/apache-ctakes-4.0.0/resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper";
> >
> >    static private final String INPUT_FILE_PATH =
> >          "/Users/sboussard/Desktop/apache-ctakes-4.0.0/resources/org/apache/ctakes/examples/notes/right_knee_arthroscopy";
> >
> >    private ClinicalProcessor() {
> >    }
> >
> >    public static void main( final String[] args ) {
> >       System.out.println( PIPER_FILE_PATH );
> >       try {
> >          // Create a piper file reader, but don't load the piper yet - we want to create a reader with parameters
> >          final PiperFileReader reader = new PiperFileReader();
> >          final PipelineBuilder builder = reader.getBuilder();
> >          // Add the Lines from File reader
> >          //final File inputFile = FileLocator.locateFile( INPUT_FILE_PATH );
> >          //final File inputFile = FileLocator.getFile( INPUT_FILE_PATH );
> >          final File inputFile = new File( "/Users/sboussard/Desktop/ClampMac_1.6.0/workspace/MyPipeline/clamp-ner/Data/Input/sample_2788.txt" );
> >          builder.reader( LinesFromFileCollectionReader.class,
> >                LinesFromFileCollectionReader.PARAM_INPUT_FILE_NAME, inputFile.getAbsolutePath() );
> >          // Add the lines from the piper file
> >          reader.loadPipelineFile( PIPER_FILE_PATH );
> >          // Collect IdentifiedAnnotation object information for output - simple for examples
> >          builder.collectEntities();
> >          // Run the pipeline with specified text
> >          builder.run();
> >          // Log the IdentifiedAnnotation object information
> >          LOGGER.info( "\n" + EntityCollector.getInstance().toString() );
> >       } catch ( IOException | UIMAException multE ) {
> >          LOGGER.error( multE.getMessage() );
> >       }
> >    }
> > }
> >
> > Thank you for all your help,
> > Sebastien Boussard
> >
> > > On Aug 10, 2019, at 3:00 AM, gandhi rajan 
> > wrote:
> > >
> > > As far as I know, it's a more generic error. Could you please let us know
> > > what action you are trying to perform and the steps involved in
> > > reproducing the issue?
> > >
> > > On Saturday, August 10, 2019, Sebastien Boussard 
> > wrote:
> > >
> > >> Hello,
> > >> I’m an intern in the Stanford Biomedical Informatics Lab and I've been
> > >> working on getting a ctakes page for a week, and I’ve been getting a
> > >> lot of errors. I have been getting a "failed to initialize" error for
> > >> the last day and a half and I cannot solve it. I will send you the
> > >> whole log; if you can help me out, it would be greatly appreciated.
> > >>
> > >> log4j: reset attribute= "false".
> > >> log4j: Threshold ="null".
> > >> log4j: Retreiving an instance of org.apache.log4j.Logger.
> > >> log4j: Setting [ProgressAppender] additivity to [false].
> > >> log4j: Level value for ProgressAppender is  [INFO].
> > >> log4j: ProgressAppender level set to INFO
> > >> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> > >> log4j: Parsing layout of class: 

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

2019-06-25 Thread Jeffrey Miller
Hi Sean,

Thanks for the clarification; I think that helps explain some of the
unexpected synonyms that appear in the sno_rx_16ab dictionary (for example,
DM for diabetes mellitus is coming in from another ontology, possibly
MEDCIN, that was installed as part of UMLS; it was not manually added to
sno_rx_16ab). I suspect this confusion stems from people who only installed
the subset of UMLS they were interested in, like only installing SNOMED and
RxNorm using MetamorphoSys. If you do that and compare the resulting cTAKES
dictionary to sno_rx_16ab, it will be missing many synonyms. I did
realize where "diabete mellitus" was coming from: the Consumer Health
Vocabulary (CHV, also part of UMLS), which intentionally contains common
misspellings and other term usages (see
https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CHV/).

One thing I noticed: there appears to be a reconciliation process when
the dictionary creator processes synonyms from other ontologies. It seems
to reduce the number of synonyms for a term when the text span of one term
is covered by another term in the same CUI, but the result can sometimes be
a little odd. For example, when you choose SNOMED and RxNorm but have other
ontologies available for synonyms, I think 'diabetes' (from another
ontology, MEDCIN for one, but mapped to the same CUI) ends up consuming
"diabetes mellitus", so that term does not actually appear (you can see
this in sno_rx_16ab), but "diabete mellitus" does persist (likely because
"diabetes" is not a subset of that string).

grep -i "'diabetes mellitus'" sno_rx_16ab.script
INSERT INTO PREFTERM VALUES(11849,'Diabetes Mellitus')

There are other examples of similar issues. For example, CUI 729346,
"juvenile osteochondrosis", is present in a dictionary created with only
SNOMED installed, but if you also install CHV, it does not make it into the
final dictionary; only these do:

729346|2|3|osteochondropathy - juven|juven
729346|1|2|osteochondritis juvenilis|juvenilis
729346|1|2|juvenile osteochondritis|osteochondritis

A specific example that I have run into involves HPO alone versus a
dictionary created when SNOMED was also available for synonyms. In that
case a few oddities arise. For example, "severe short stature", which is
in the HPO, does not make it into the dictionary when SNOMED is installed
alongside it using MetamorphoSys, but it is there if HPO alone is
installed.

Out of curiosity, is there a practical difference in the resulting cTAKES
dictionary if you select the Source and Target columns for one ontology
(and nothing else), versus selecting the Source and Target columns for one
ontology and just the Source column for all other installed ontologies? I
know that with the Source of all the ontologies checked, the ontology terms
all end up in the CUI_TERMS table, but since they aren't in any target
table, would the effect be the same as leaving them unchecked (the synonyms
of the unchecked ontologies would be matched when running cTAKES if they
were of the same CUI as the selected ontology)?

Thanks,
Jeff
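The reconciliation hypothesis above can be stated precisely: drop any synonym that contains another synonym of the same CUI as a run of whole tokens. The sketch below only illustrates that hypothesis; it is not the dictionary creator's code (Sean points to ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\ for the real logic), the class name is hypothetical, and the synonym list is an illustrative set drawn from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrates Jeff's hypothesis only; not the dictionary creator's actual logic.
final class SynonymSubsumptionSketch {
    // True if 'inner' appears as a contiguous run of whole tokens inside 'outer'.
    static boolean containsTokens(String outer, String inner) {
        List<String> outerTokens = Arrays.asList(outer.toLowerCase().split("\\s+"));
        List<String> innerTokens = Arrays.asList(inner.toLowerCase().split("\\s+"));
        return Collections.indexOfSubList(outerTokens, innerTokens) >= 0;
    }

    // Keeps a synonym only if no other synonym of the same CUI is contained in it.
    static List<String> reconcile(List<String> synonyms) {
        List<String> kept = new ArrayList<>();
        for (String candidate : synonyms) {
            boolean consumed = false;
            for (String other : synonyms) {
                if (!other.equalsIgnoreCase(candidate) && containsTokens(candidate, other)) {
                    consumed = true;
                    break;
                }
            }
            if (!consumed) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative synonyms for CUI 11849, taken from the discussion above.
        List<String> synonyms = List.of("diabetes", "diabetes mellitus", "diabete mellitus", "dm");
        System.out.println(reconcile(synonyms)); // [diabetes, diabete mellitus, dm]
    }
}
```

Under this rule "diabete mellitus" survives because "diabetes" is not one of its tokens, which is consistent with what shows up in sno_rx_16ab.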

On Mon, Jun 24, 2019 at 10:58 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> The dictionary creator uses the CUI set from selected sources, but
> synonyms from all available sources for CUIs in that set.
>
> I am not sure what is going on with the 's' in "diabetes".  A grep for
> "diabetes mellitus" and "diabete mellitus" in the umls mrconso file might
> have a hint.  Perhaps some code thinks that it is fixing a plural term?
>
> Sean
> 
> From: Jeffrey Miller 
> Sent: Tuesday, June 18, 2019 10:23 PM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> Thanks Sean. I actually think I figured out what is causing the difference.
> When I create the UMLS install on my machine, I only install RxNorm and
> SNOMEDCT_US, so when I use the dictionaryCreator GUI, there are only those
> two sources on the left. I noticed in the screenshots on the wiki page for
> the dictionary creator GUI that many sources were installed, but only
> SNOMEDCT_US and RxNorm were selected. So, I tried installing all of the
> active UMLS set (but still only selecting RxNorm and SNOMEDCT_US in the
> dictionaryCreator GUI) and it made a difference as to which terms appeared
> in the final cTAKES dictionary. As an example, I now get the "DM" entry for
> diabetes. I don't know why this should make a difference, but it appears
> that it does.
>
> Another odd observation related to this. In the sno_rx_2016ab file, I
> noticed there seems to be an error:
> INSERT INTO CUI_TERMS VALUES(11849,0,2,'diabete melli

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

2019-06-18 Thread Jeffrey Miller
Thanks Sean. I actually think I figured out what is causing the difference.
When I create the UMLS install on my machine, I only install RxNorm and
SNOMEDCT_US, so when I use the dictionaryCreator GUI, there are only those
two sources on the left. I noticed in the screenshots on the wiki page for
the dictionary creator GUI that many sources were installed, but only
SNOMEDCT_US and RxNorm were selected. So, I tried installing all of the
active UMLS set (but still only selecting RxNorm and SNOMEDCT_US in the
dictionaryCreator GUI) and it made a difference as to which terms appeared
in the final cTAKES dictionary. As an example, I now get the "DM" entry for
diabetes. I don't know why this should make a difference, but it appears
that it does.

Another odd observation related to this. In the sno_rx_2016ab file, I
noticed there seems to be an error:
INSERT INTO CUI_TERMS VALUES(11849,0,2,'diabete mellitus','diabete')

The 's' is missing from diabetes. When I created my dictionary (from the
restricted UMLS install, but still 2016ab) the cTAKES dictionary entry for
that term is correct:
INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')

When I created the dictionary from the full cTAKES install tonight, that
error appeared again.

Jeff
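Sean's later explanation in this thread (the dictionary creator takes its CUI set from the selected sources, but synonyms from all available sources) accounts for this observation. A minimal sketch of that two-pass idea, assuming the standard pipe-delimited MRCONSO.RRF layout (CUI in field 0, SAB in field 11, STR in field 14); it ignores language, suppression flags and the text filters discussed in this thread, the class name is hypothetical, and the rows are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Models Sean's description: CUIs from the selected sources, synonyms from every installed source.
final class CuiSetSketch {
    // 0-based positions of CUI, SAB and STR in a pipe-delimited MRCONSO.RRF line.
    static final int CUI = 0;
    static final int SAB = 11;
    static final int STR = 14;

    // Builds an MRCONSO-style line with only CUI, SAB and STR filled in (illustration only).
    static String row(String cui, String sab, String str) {
        String[] fields = new String[18];
        Arrays.fill(fields, "");
        fields[CUI] = cui;
        fields[SAB] = sab;
        fields[STR] = str;
        return String.join("|", fields) + "|";
    }

    static Map<String, Set<String>> synonyms(List<String> mrconsoLines, Set<String> selectedSources) {
        // Pass 1: the CUI set comes only from the selected sources.
        Set<String> cuis = new LinkedHashSet<>();
        for (String line : mrconsoLines) {
            String[] f = line.split("\\|", -1);
            if (selectedSources.contains(f[SAB])) {
                cuis.add(f[CUI]);
            }
        }
        // Pass 2: synonyms for those CUIs come from any installed source.
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String line : mrconsoLines) {
            String[] f = line.split("\\|", -1);
            if (cuis.contains(f[CUI])) {
                result.computeIfAbsent(f[CUI], k -> new LinkedHashSet<>()).add(f[STR].toLowerCase());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rows, not copied from a real UMLS release.
        List<String> installed = List.of(
                row("C0011849", "SNOMEDCT_US", "Diabetes mellitus"),
                row("C0011849", "CHV", "diabete mellitus"),
                row("C9999999", "CHV", "hypothetical CHV-only term"));
        System.out.println(synonyms(installed, Set.of("SNOMEDCT_US", "RXNORM")));
        // {C0011849=[diabetes mellitus, diabete mellitus]}
    }
}
```

With only SNOMEDCT_US and RXNORM installed, the CHV row would not exist at all, so the same selection yields fewer synonyms; that is the difference Jeff saw between the restricted and full installs.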



On Mon, Jun 17, 2019 at 8:08 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> Thanks for doing the research.  Since the sno_rx_16ab was made 3+ years
> ago I can't swear to any of those filter sets being exactly what was used.
>
> I think that the key to working with any project is to check the
> dictionary against a project's needs.  Fill in the gaps by either editing
> the sql (.script) file or by adding a second dictionary.  In smaller
> "focus" projects I usually end up augmenting the default dictionary with a
> small custom bsv dictionary to catch any known synonyms or terms that
> aren't represented in the default.  In projects requiring larger nets I
> have built dictionaries that are horribly inclusive - 2 to 3 times the
> sno_rx_16ab.
>
> Sean
> 
> From: Jeffrey Miller 
> Sent: Monday, June 17, 2019 4:39 PM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> Thanks for following up Sean. I've looked into the links you sent along.
> There are different groups of filters and it appears that the
> dictionaryBuilder GUI is hardcoded to use the files in the "tiny"
> directory. I don't think this is the set of filters used to make
> sno_rx_16ab because the 'tiny' filter group contains "today" (today brand
> veterinary product.  310367) in "UnwantedTexts.txt", but the
> sno_rx_16ab.script file has "today" still in there. If you create a
> dictionary with the dictionary builder, it does not include that term.
>
> I thought maybe the set of files under the "default" filter directory might
> be the one used for the sno_rx_16ab package so I recompiled the
> dictionaryCreator GUI to use the "default" filter files and created a new
> snomed rxnorm dictionary from the 2016ab umls release, but the output is
> still quite different than the packaged sno_rx_16ab dictionary. From
> looking at diffs, it looks like there are a substantial number of additions
> to the sno_rx_16ab, so much so that I really must be missing something. For
> example, for CUI 12169 which describes a low sodium diet, there are about
> 27 CUI terms in sno_rx_16ab.script, but in the script generated by the
> dictionaryGUI there are only 7 (with the "tiny" or "default" filter
> groups).
>
> On Sun, Jun 16, 2019 at 3:27 PM Remy Sanouillet 
> wrote:
>
> > Thanks for the clarifications, Sean. That was very enlightening. I look
> > forward to the documentation (even if it entails some suffering on your
> > part.)
> >
> > If/when you stumble on some idle time allowing you to implement the manual
> > edit panel, it would be nice to have it allow for re-partitioning the
> > ontology. As you are very aware, UMLS CUIs and SNOMED codes do not always
> > have a one-to-one correspondence, resulting in a CUI matching multiple
> > SNOMED codes or a SNOMED code being mapped to several CUIs.
> >
> > In some cases, clinicians don't agree with that partitioning and the
> > inheritance that ensues in specialized contexts, and would like to
> > re-assign them.
> >
> > Not holding my breath, but just something to keep in mind.
> >
> >   Remy
> >
> > On Sun, Jun 16, 2019 at 7:16 AM Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi Jeff,
> > >

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

2019-06-17 Thread Jeffrey Miller
Thanks for following up Sean. I've looked into the links you sent along.
There are different groups of filters and it appears that the
dictionaryBuilder GUI is hardcoded to use the files in the "tiny"
directory. I don't think this is the set of filters used to make
sno_rx_16ab because the 'tiny' filter group contains "today" (today brand
veterinary product.  310367) in "UnwantedTexts.txt", but the
sno_rx_16ab.script file has "today" still in there. If you create a
dictionary with the dictionary builder, it does not include that term.

I thought maybe the set of files under the "default" filter directory might
be the one used for the sno_rx_16ab package so I recompiled the
dictionaryCreator GUI to use the "default" filter files and created a new
snomed rxnorm dictionary from the 2016ab umls release, but the output is
still quite different than the packaged sno_rx_16ab dictionary. From
looking at diffs, it looks like there are a substantial number of additions
to the sno_rx_16ab, so much so that I really must be missing something. For
example, for CUI 12169 which describes a low sodium diet, there are about
27 CUI terms in sno_rx_16ab.script, but in the script generated by the
dictionaryGUI there are only 7 (with the "tiny" or "default" filter groups).
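A rough way to quantify differences like the 27-versus-7 count for CUI 12169, or to check whether a term such as "today" survived a filter group, is to diff the CUI_TERMS rows of the two .script files per CUI. The Python sketch below assumes each row sits on a single line and that the five values are CUI, rare-word index, token count, text and rare word, as the rows quoted in this thread suggest; the function names are hypothetical.

```python
import re
from collections import defaultdict

# One CUI_TERMS row as it appears in an HSQLDB .script file. Assumed columns
# (inferred from rows quoted on this list): CUI, rare-word index, token count,
# text, rare word. Quotes inside strings are escaped by doubling ('').
ROW = re.compile(
    r"^INSERT INTO CUI_TERMS VALUES\((\d+),(\d+),(\d+),"
    r"'((?:[^']|'')*)','((?:[^']|'')*)'\)$"
)


def terms_by_cui(lines):
    """Map each CUI to the set of term texts it has; other lines are ignored."""
    terms = defaultdict(set)
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            terms[int(match.group(1))].add(match.group(4).replace("''", "'"))
    return dict(terms)


def diff_dictionaries(packaged_lines, rebuilt_lines):
    """For every CUI whose texts differ, return (only_in_packaged, only_in_rebuilt)."""
    packaged = terms_by_cui(packaged_lines)
    rebuilt = terms_by_cui(rebuilt_lines)
    report = {}
    for cui in sorted(packaged.keys() | rebuilt.keys()):
        a = packaged.get(cui, set())
        b = rebuilt.get(cui, set())
        if a != b:
            report[cui] = (a - b, b - a)
    return report


def term_counts(lines):
    """Number of distinct term texts per CUI, e.g. to compare 27 against 7."""
    return {cui: len(texts) for cui, texts in terms_by_cui(lines).items()}
```

Feeding it the lines of each file (for example `open(path).read().splitlines()`) lists, per CUI, the texts that only one side carries.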

On Sun, Jun 16, 2019 at 3:27 PM Remy Sanouillet 
wrote:

> Thanks for the clarifications, Sean. That was very enlightening. I look
> forward to the documentation (even if it entails some suffering on your
> part.)
>
> If/when you stumble on some idle time allowing you to implement the manual
> edit panel, it would be nice to have it allow for re-partitioning the
> ontology. As you are very aware, UMLS CUIs and SNOMED do not always have a
> one-to-one correspondence, resulting in a CUI matching multiple SNOMED codes or
> a SNOMED being mapped to several CUIs.
>
> In some cases, clinicians don't agree with that partitioning in specialized
> contexts and the inheritance that ensues and would like to re-assign them.
>
> Not holding my breath, but just something to keep in mind.
>
>   Remy
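Before any re-partitioning, it can help to list exactly where the CUI/SNOMED mapping is not one-to-one. A minimal sketch, assuming you have already extracted (CUI, SNOMED code) pairs, for example from the CUI and CODE columns of the UMLS MRCONSO file restricted to SNOMEDCT_US; the function name is hypothetical.

```python
from collections import defaultdict


def non_one_to_one(pairs):
    """Split (cui, snomed_code) pairs into CUIs that carry several codes and
    codes that are shared by several CUIs."""
    codes_per_cui = defaultdict(set)
    cuis_per_code = defaultdict(set)
    for cui, code in pairs:
        codes_per_cui[cui].add(code)
        cuis_per_code[code].add(cui)
    multi_code = {cui: codes for cui, codes in codes_per_cui.items() if len(codes) > 1}
    shared_code = {code: cuis for code, cuis in cuis_per_code.items() if len(cuis) > 1}
    return multi_code, shared_code
```

The two dictionaries returned are the candidates a clinician would review when re-assigning concepts.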
>
> On Sun, Jun 16, 2019 at 7:16 AM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > >1) ...
> > There are several collections of filter sets here:
> > ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\
> >
> > 2) ...
> > There is additional logic within the dictionary creator code:
> > ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\
> >
> > I haven't gone through it in a really long time, and without doing so now
> > I can't enumerate the filters.  I have family visiting, otherwise my
> > curiosity would force me to do so and get back to you.   Honestly, it
> > should be documented somewhere, but writing (especially technical) is
> > pretty much my least favorite activity.
> >
> > Sean
> >
> >
> > p.s.
> > Please don't wait for it, but I am currently working on new dictionary
> > code and plan to introduce that in ctakes.  Again, please don't wait for
> > it as it is mixed in with other work and will not be available for
> > several months (if at all).
> >
> >
> > 
> > From: Jeffrey Miller 
> > Sent: Sunday, June 16, 2019 9:49 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Differences in dictionary built with dictionaryBuilder and
> > sno_rx16ab from sourceforge [EXTERNAL]
> >
> > Hi Sean,
> >
> > Thanks for your response. I had two follow-up questions that would be
> > very helpful to understand if you have a few moments:
> >
> > 1) Are the specific filters used in the official sno_rx_16ab codified
> > anywhere so that I could reproduce them?
> >
> > 2) Do these filters explain all the changes? For example, when I use the
> > dictionary creator to export sno_med and rx_norm, I only get "diabetes
> > mellitus" whereas sno_rx_16ab contains both "diabetes" and "dm".
> > Especially with the addition of "dm" it feels like I must be missing a
> > step or a setting somewhere.
> >
> > Thanks!
> > Jeff
> >
> > On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi all,
> > >
> > > The contents of the sno_rx_16ab are a dump of the umls 2016AB snomed
> > > and rxnorm terms with certain semantic types.  Nothing was added, but
> > > synonyms are filtered based upon various rules.  For instance,
> > > unnecessary suffixes are remo

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

2019-06-16 Thread Jeffrey Miller
Hi Sean,

Thanks for your response. I had two follow-up questions that would be very
helpful to understand if you have a few moments:

1) Are the specific filters used in the official sno_rx_16ab codified
anywhere so that I could reproduce them?

2) Do these filters explain all the changes? For example, when I use the
dictionary creator to export sno_med and rx_norm, I only get "diabetes
mellitus" whereas sno_rx_16ab contains both "diabetes" and "dm".
Especially with the addition of "dm" it feels like I must be missing a step
or a setting somewhere.

Thanks!
Jeff

On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi all,
>
> The contents of the sno_rx_16ab are a dump of the umls 2016AB snomed and
> rxnorm terms with certain semantic types.  Nothing was added, but synonyms
> are filtered based upon various rules.  For instance, unnecessary suffixes
> are removed ("Wart (Finding)" -> "Wart"), really long terms are excluded
> ("can walk straight line with only minimal assistance"), terms with dose or
> form are ignored and so forth.
>
> Some filters can be changed by adding/removing from prefix/suffix/contains
> lists in plaintext files or by modifying the dictionary creator code.
>
> There was no manual curation (or nothing major).  As Remy mentioned that
> requires a lot of attention and time.  The dictionary database was not
> intended to be perfect, just as good as possible without major investment -
> and reproducible with updates to the umls.
>
> As the dictionary is released as a sql database, you should be able to add
> and remove fairly easily if sql savvy.  I have long wanted to add a "manual
> edit" panel to the dictionary gui, but haven't had the time.  If anybody
> else would like to work on such a tool that would be tonic.
>
> Sean
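The kinds of synonym rules Sean lists (stripping suffixes such as "(Finding)", excluding very long terms, ignoring terms with dose or form) can be sketched as below. This is not the dictionary creator's code: the suffix list, the dose/form pattern and the six-token cutoff are illustrative assumptions, and the real lists live in the filter files and Java code he points to.

```python
import re

# Illustrative rules only; every pattern and cutoff here is assumed.
SUFFIX = re.compile(r"\s*\((finding|disorder|procedure|substance)\)$", re.IGNORECASE)
DOSE_OR_FORM = re.compile(
    r"\b\d+(\.\d+)?\s*(mg|mcg|ml)\b|\b(tablet|capsule|injection)\b",
    re.IGNORECASE,
)
MAX_TOKENS = 6  # hypothetical cutoff for "really long terms"


def filter_synonym(text):
    """Return the cleaned synonym, or None if a rule excludes it."""
    cleaned = SUFFIX.sub("", text).strip()
    if len(cleaned.split()) > MAX_TOKENS:
        return None          # really long terms are excluded
    if DOSE_OR_FORM.search(cleaned):
        return None          # terms with dose or form are ignored
    return cleaned.lower()   # rows in the .script files are lowercase
```

Adding or removing entries in such lists is the plaintext-file kind of change Sean mentions; anything structural would mean editing the creator code itself.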
>
>
> 
> From: Harish Kulkarni 
> Sent: Saturday, June 15, 2019 5:16 PM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> unsubscribe
>
> On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet 
> wrote:
>
> > Yes, I agree it would be nice because the tokenization that occurs when
> > creating the dictionaries from the releases makes comparisons a bit tricky
> > and is not 100% reversible. I would love to hear an answer to your
> > quandary.
> >
> >  Remy
> >
> > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller 
> wrote:
> >
> > > Thanks, I was curious if the cTAKES devs that created the sno_rx_16ab
> > > dictionary had put the differences applied to the default UMLS output
> > > into version control in some form. I imagine the
> > > additions/synonyms/abbreviations that were added manually must have
> > > been collected over time somewhere prior to merging them with 2016ab
> > > UMLS release? I basically want to recreate the default cTAKES 4.0.0
> > > release with an additional ontology and the latest terms. I can likely
> > > come up with a diff myself but was wondering if this was already
> > > maintained as part of cTAKES.
> > >
> > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet
> > > wrote:
> > >
> > > > Yes, that's pretty much what we do too. Not only to enhance the
> > > > dictionary, but to put in corrections because, lo and behold, there
> > > > are some errors in there! As you know, an ontology is a constant
> > > > curation job and that script, under SCM, allows you to isolate those
> > > > changes and, if necessary, re-apply them to new versions.
> > > >
> > > >   Remy
> > > >
> > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <
> gandhiraja...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jeff,
> > > > >
> > > > > As far as I know, maintaining a separate SQL script to add
> > > > > additional entries should work seamlessly.
> > > > >
> > > > > On Saturday, June 15, 2019, Jeffrey Miller 
> > wrote:
> > > > >
> > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > modifications/synonyms are tracked anywhere (aside from the
> > > > > > dictionary itself) so they can be carried forward in future
> > > > > > dictionary updates?
> > > > > >
> > > > > >

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

2019-06-15 Thread Jeffrey Miller
Thanks, I was curious if the cTAKES devs that created the sno_rx_16ab
dictionary had put the differences applied to the default UMLS output into
version control in some form. I imagine the
additions/synonyms/abbreviations that were added manually must have been
collected over time somewhere prior to merging them with 2016ab UMLS
release? I basically want to recreate the default cTAKES 4.0.0 release with
an additional ontology and the latest terms. I can likely come up with a
diff myself but was wondering if this was already maintained as part of
cTAKES.
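Following the suggestion on this thread of keeping curated changes in a separate SQL script under version control, one way to bootstrap that script is to collect the CUI_TERMS rows that a curated dictionary has and a fresh build lacks. A sketch, assuming both .script files keep one INSERT per line; the rows quoted in this thread carry no terminating semicolon, so replaying them through a generic SQL client may need one appended. The function names are hypothetical.

```python
def adjunct_rows(curated_lines, fresh_lines, table="CUI_TERMS"):
    """Rows for `table` present in the curated dictionary but not in a fresh
    build, de-duplicated and kept in their original order."""
    prefix = f"INSERT INTO {table} VALUES("
    fresh = {line.strip() for line in fresh_lines if line.strip().startswith(prefix)}
    extra, seen = [], set()
    for line in curated_lines:
        row = line.strip()
        if row.startswith(prefix) and row not in fresh and row not in seen:
            seen.add(row)
            extra.append(row)
    return extra


def write_adjunct(rows, path):
    """Save the rows, one per line, as a script to keep under version control."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(rows) + "\n")
```

Because it compares whole lines, a row that differs only in its token count or rare word shows up as an addition, which is usually what you want to review by hand anyway.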

On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet 
wrote:

> Yes, that's pretty much what we do too. Not only to enhance the dictionary,
> but to put in corrections because, lo and behold, there are some errors in
> there! As you know, an ontology is a constant curation job and that
> script, under SCM, allows you to isolate those changes and, if necessary,
> re-apply them to new versions.
>
>   Remy
>
> On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan 
> wrote:
>
> > Hi Jeff,
> >
> > As far as I know, maintaining a separate SQL script to add additional
> > entries should work seamlessly.
> >
> > On Saturday, June 15, 2019, Jeffrey Miller  wrote:
> >
> > > Thanks Remy. Does anyone know if these manually curated
> > > modifications/synonyms are tracked anywhere (aside from the dictionary
> > > itself) so they can be carried forward in future dictionary updates?
> > >
> > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet 
> > > wrote:
> > >
> > > > From my experience, it seems pretty obvious that sno_rx_16ab is a
> > > > curated dictionary based on the SNOMED 2016AB release. It does not
> > > > contain the full set but it has additional edits and synonyms that
> > > > are pretty useful (including 'dm').
> > > > We have had to manage those mods as an adjunct.
> > > >
> > > >   Remy
> > > >
> > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller 
> > > wrote:
> > > >
> > > > > Hi,
> > > > > I have created a custom dictionary from the latest UMLS release
> > > > > with SNOMEDCT_US and RxNorm and I've noticed it seems to be
> > > > > generating a .script file with unexpected differences as compared
> > > > > to the sno_rx_16ab file available as part of the cTAKES release.
> > > > > Specifically, for diabetes, it is missing these two rows:
> > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')
> > > > >
> > > > > and only has this one:
> > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
> > > > >
> > > > > The end result is that "diabetes" is not being picked up in the
> > > > > test text I am running through; it requires the full 'diabetes
> > > > > mellitus'.
> > > > >
> > > > > Is there any setting on the UMLS install side or the cTAKES
> > > > > dictionary creator that could account for missing alternative
> > > > > forms like this? I've tried downloading the 2016AB release (which I
> > > > > think is the one used to create the bundled sno_rx_16ab package?)
> > > > > and I am not getting the alternate forms in that dictionary either.
> > > > >
> > > > > Thanks,
> > > > > Jeff
> > > > >
> > > >
> > >
> >
> >
> > --
> > Regards,
> > Gandhi
> >
> > "The best way to find urself is to lose urself in the service of others
> > !!!"
> >
>


Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

2019-06-15 Thread Jeffrey Miller
Thanks Remy. Does anyone know if these manually curated
modifications/synonyms are tracked anywhere (aside from the dictionary
itself) so they can be carried forward in future dictionary updates?

On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet 
wrote:

> From my experience, it seems pretty obvious that sno_rx_16ab is a curated
> dictionary based on the SNOMED 2016AB release. It does not contain the full
> set but it has additional edits and synonyms that are pretty useful
> (including 'dm').
>
> We have had to manage those mods as an adjunct.
>
>   Remy
>
> On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller  wrote:
>
> > Hi,
> > I have created a custom dictionary from the latest UMLS release with
> > SNOMEDCT_US and RxNorm and I've noticed it seems to be generating a
> > .script file with unexpected differences as compared to the sno_rx_16ab
> > file available as part of the cTAKES release. Specifically, for
> > diabetes, it is missing these two rows:
> > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')
> >
> > and only has this one:
> > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
> >
> > The end result is that "diabetes" is not being picked up in the test
> > text I am running through; it requires the full 'diabetes mellitus'.
> >
> > Is there any setting on the UMLS install side or the cTAKES dictionary
> > creator that could account for missing alternative forms like this? I've
> > tried downloading the 2016AB release (which I think is the one used to
> > create the bundled sno_rx_16ab package?) and I am not getting the
> > alternate forms in that dictionary either.
> >
> > Thanks,
> > Jeff
> >
>


Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

2019-06-14 Thread Jeffrey Miller
Hi,
I have created a custom dictionary from the latest UMLS release with
SNOMEDCT_US and RxNorm and I've noticed it seems to be generating a .script
file with unexpected differences as compared to the sno_rx_16ab file
available as part of the cTAKES release. Specifically, for diabetes, it is
missing these two rows:
INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')

and only has this one:
INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')

The end result is that "diabetes" is not being picked up in the test text I
am running through; it requires the full 'diabetes mellitus'.
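Why a lone "diabetes" misses can be illustrated with a simplified model of rare-word lookup: each term is indexed under its rare word, and a match is confirmed by checking the surrounding tokens. This is a sketch, not the DefaultJCasTermAnnotator code; it assumes the second and third values of each row are the rare word's position within the term and the term's token count, as the quoted rows suggest.

```python
from collections import defaultdict


def build_index(rows):
    """Index (cui, rindex, tcount, text, rword) rows by their rare word.
    Assumed meaning: rindex is the rare word's position in the term and
    tcount is the term's token count."""
    index = defaultdict(list)
    for cui, rindex, tcount, text, rword in rows:
        index[rword].append((cui, rindex, tcount, text.split()))
    return index


def lookup(index, tokens):
    """Return (start, end, cui) for every term whose rare word appears in
    `tokens` and whose remaining tokens line up around it."""
    hits = []
    for i, token in enumerate(tokens):
        for cui, rindex, tcount, term_tokens in index.get(token, []):
            start, end = i - rindex, i - rindex + tcount
            if start >= 0 and end <= len(tokens) and tokens[start:end] == term_tokens:
                hits.append((start, end, cui))
    return hits
```

With only the 'diabetes mellitus' row, the sole index key for CUI 11849 is "mellitus", so a note that says just "diabetes" never reaches that entry; the extra one-token rows add "diabetes" and "dm" as keys of their own.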

Is there any setting on the UMLS install side or the cTAKES dictionary
creator that could account for missing alternative forms like this? I've
tried downloading the 2016AB release (which I think is the one used to
create the bundled sno_rx_16ab package?) and I am not getting the alternate
forms in that dictionary either.

Thanks,
Jeff


Re: MySQL web rest version unstable (with question) and note about official web rest Dockerfile

2019-05-30 Thread Jeffrey Miller
Hi Matthew,

I don't know if you've run into this issue, but one of the problems I had
when playing with the service was that the code allowing the HTTP request
to switch between Pipelines did not work when using the TS components if
different dictionaries were used in each pipeline. I think the
NER/FastLookup component only gets loaded once in memory and whichever
dictionary was used in the first pipeline is used across both.
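One mechanism that would produce this behaviour is a dictionary held in a class-level (static) field, so the first pipeline to load it wins for the whole process. The Python sketch below uses hypothetical names and only illustrates the pattern; it is not the cTAKES source.

```python
import threading


class TermLookup:
    """Hypothetical annotator that keeps its dictionary in a class-level
    (static) field: the first path loaded wins for the whole process."""

    _shared_dictionary = None
    _lock = threading.Lock()

    def __init__(self, dictionary_path):
        with TermLookup._lock:
            if TermLookup._shared_dictionary is None:
                TermLookup._shared_dictionary = self._load(dictionary_path)
        self.dictionary = TermLookup._shared_dictionary

    @staticmethod
    def _load(path):
        # Stand-in for actually reading a dictionary from `path`.
        return {"source": path}
```

Keying such a cache by dictionary path would let two pipelines in one process hold different dictionaries, at the cost of loading both into memory.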

Out of curiosity, what is the reason for using MySQL over HSQLDB? Is it to
consume less RAM?

Thanks,
Jeff



On Wed, May 29, 2019 at 9:21 PM Matthew Vita 
wrote:

> Hi Gandhi, Tim, Sean, and Community,
>
> I’ve been fixing up some of the README instructions for
> https://github.com/GoTeamEpsilon/ctakes-rest-service on my local.
> Unfortunately, it’s not working in its current state. I'm still debugging
> it - is svn co https://svn.apache.org/repos/asf/ctakes/trunk@1850060 ctakes
> still the best version of cTAKES to base web-rest on?
>
> Also, it looks like the ctakes-web-rest Dockerfile in the official
> repository is pointing to a broken Tomcat link:
>
> *“The requested URL
> /pub/software/apache/tomcat/tomcat-9/v9.0.14/bin/apache-tomcat-9.0.14.zip
> was not found on this server.”*
>
> There appear to be updated releases here:
> http://mirror.cc.columbia.edu/pub/software/apache/tomcat/tomcat-9/ - hope
> that helps.
>
>
> Talk soon,
> Matthew
>


Re: RxNorm and Orange book [EXTERNAL]

2019-05-10 Thread Jeffrey Miller
Thanks Sean,

I am just looking to duplicate the default functionality of cTAKES (with
the addition of one more dictionary from UMLS, HPO) and wanted to make sure
I had not inadvertently left out something by using a custom dictionary.


On Fri, May 10, 2019 at 10:11 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> The 4. default dictionary no longer filters rxnorm to only contain orange
> book terms.
>
> If you want to filter your rxnorm terms using orange book it is possible
> by filtering your umls db before using the dictionary creator or by culling
> rxnorm terms in the created dictionary.
>
> It might be easier to just ignore non-orange book cuis during analysis
> (post-ctakes).
>
>
> https://www.fda.gov/drugs/drug-approvals-and-databases/approved-drug-products-therapeutic-equivalence-evaluations-orange-book
>
> Sean
> ____
> From: Jeffrey Miller 
> Sent: Friday, May 10, 2019 9:46 AM
> To: dev@ctakes.apache.org
> Subject: RxNorm and Orange book [EXTERNAL]
>
> Hi,
>
> Does cTAKES 4.0.0 still make use of the Orange Book and RxNorm_Index as
> indicated in the docs here:
>
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+Dictionaries+and+Models
> ?
>
> I ask because we are building our own dictionary via the dictionary creator
> GUI and have included RxNorm from the UMLS and wasn't sure what role the
> Orange Book and index play and if I need to do something special to include
> them when not using the ctakesresources hosted on sourceforge.
>
> Thanks,
> Jeff
>
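Sean's last suggestion, ignoring non-Orange-Book CUIs after cTAKES has run, amounts to a simple post-filter. A sketch, assuming annotations have been exported as dictionaries with hypothetical "cui" and "vocab" keys; "RXNORM" is the UMLS source abbreviation for RxNorm, and how you derive the allowed CUI set from the Orange Book is left open here.

```python
def keep_allowed(annotations, allowed_cuis, vocab="RXNORM"):
    """Drop `vocab` annotations whose CUI is not in `allowed_cuis`; annotations
    from any other vocabulary pass through unchanged."""
    return [a for a in annotations
            if a["vocab"] != vocab or a["cui"] in allowed_cuis]
```

Filtering after the fact keeps the dictionary untouched, so the same output can be re-filtered against a different allowed list later.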


RxNorm and Orange book

2019-05-10 Thread Jeffrey Miller
Hi,

Does cTAKES 4.0.0 still make use of the Orange Book and RxNorm_Index as
indicated in the docs here:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+Dictionaries+and+Models
?

I ask because we are building our own dictionary via the dictionary creator
GUI and have included RxNorm from the UMLS and wasn't sure what role the
Orange Book and index play and if I need to do something special to include
them when not using the ctakesresources hosted on sourceforge.

Thanks,
Jeff


Re: Threading and cTAKES (on Spark) [EXTERNAL]

2019-03-28 Thread Jeffrey Miller
Thanks again Sean, that is all very helpful.

On Thu, Mar 28, 2019 at 4:20 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> > 1) do you think it might not crash yet produce unreliable results when
> using the components in the DefaultClinicalPipeline?
>
> -- I am pretty certain that you would get unreliable results.  I seem to
> recall attempts with the default pipeline crashing, but with a small corpus
> one could get lucky.
>
> > 2) Do you have any more information about [Spark]
>
> -- No, not really.  I don't work with it, I am just regurgitating from
> memory things read or heard.
>
> > 3) In the TS pipelines, what does the "threads" keyword ...
>
> -- "threads" specifies how many threads share a single pipeline.
> -- All annotators in this pipeline must be thread-safe.
> -- It is up to that single instance of a pipeline to be thread safe.
> "threads" does not enforce anything.
> -- "threads n" will attempt to process a maximum of n documents
> simultaneously on a pipeline.
> -- "threads n" works by running the single pipeline on n threads and
> running a single document through the pipeline on each thread.
> -- It is entirely up to the pipeline to determine the concurrency of
> processing documents.
> -- The more thread-safe annotators that don't require locking, the more
> utilized the threads will be.
>
> I hope that makes sense.
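Sean's description of "threads n" (one pipeline instance, n threads, each carrying one document through at a time, with the pipeline itself responsible for thread safety) can be modelled as below. This is a Python analogy with hypothetical names, not the piper implementation; the shared pipeline tracks how many documents are inside it at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedPipeline:
    """Hypothetical stand-in for the single pipeline instance that all
    threads share; it has to be thread-safe on its own."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0   # documents currently inside the pipeline
        self.peak = 0     # highest number seen at once

    def process(self, doc):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        try:
            return doc.upper()  # placeholder for the real annotators
        finally:
            with self._lock:
                self.active -= 1


def run_with_threads(docs, threads):
    """One pipeline, `threads` workers, one document per worker at a time."""
    pipeline = SharedPipeline()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(pipeline.process, docs))
    return results, pipeline.peak
```

The peak never exceeds the thread count, which is the "maximum of n documents simultaneously" part; how close it gets depends on how much of the pipeline has to lock.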
>
>
>
> 
> From: Jeffrey Miller 
> Sent: Thursday, March 28, 2019 3:51 PM
> To: dev@ctakes.apache.org
> Subject: Threading and cTAKES (on Spark) [EXTERNAL]
>
> Hi,
>
> I am following up on a discussion previously in the "re: ctakes web
> service" thread from this month. Apologies if I summarize anyone's comments
> incorrectly. Sean had commented that it would not be advisable to create a
> pool of pipelines and dispatch 1 per thread in the same process because the
> individual AEs have static variables and resources that would be shared
> across instances. I can comment that anecdotally, we have not seen crashes
> when doing this (but we have seen crashes when we are trying to share 1
> pipeline across > 1 thread). Nevertheless, I cannot guarantee that the
> annotations are happening correctly all the time or that we might not
> occasionally get unlucky and enter into a race condition. It also sounds
> like from Peter's comment in the previous thread,
>
> https://lists.apache.org/thread.html/93da8248b03b1c59135fb9b4030b0546a4631ec32d6f5c779d2821cc@%3Cdev.ctakes.apache.org%3E
> that a pipeline pool across multiple threads has been stable for his work.
> I have a couple of questions:
>
> 1) Does anyone else have experience with this? Sean, from your comments
> before, do you think it might not crash yet produce unreliable results when
> using the components in the DefaultClinicalPipeline?
>
> 2) Sean, you commented before
>
> > That being said, supposedly you can configure Spark to handle this by
> > keeping everything contained in a unique copy per thread.  Sort of like
> > ThreadLocal (I think), but more effective on a full-pipeline level.
>
> Do you have any more information about this? We are currently looking into
> it, and it looks like it should be possible to limit each executor (JVM) to
> a single thread, but I was wondering if you had any references to the
> ThreadLocal-style setup or knew anyone else that had tried it.
>
> 3) In the TS pipelines, what does the "threads" keyword in the piper file
> actually enforce? Is it the number of threads it will allow you to share
> the pipeline with or does it automatically create a threaded pipeline for
> you?
>
> Thanks!
> Jeff
>


Threading and cTAKES (on Spark)

2019-03-28 Thread Jeffrey Miller
Hi,

I am following up on a discussion previously in the "re: ctakes web
service" thread from this month. Apologies if I summarize anyone's comments
incorrectly. Sean had commented that it would not be advisable to create a
pool of pipelines and dispatch 1 per thread in the same process because the
individual AEs have static variables and resources that would be shared
across instances. I can comment that anecdotally, we have not seen crashes
when doing this (but we have seen crashes when we are trying to share 1
pipeline across > 1 thread). Nevertheless, I cannot guarantee that the
annotations are happening correctly all the time or that we might not
occasionally get unlucky and enter into a race condition. It also sounds
like from Peter's comment in the previous thread,
https://lists.apache.org/thread.html/93da8248b03b1c59135fb9b4030b0546a4631ec32d6f5c779d2821cc@%3Cdev.ctakes.apache.org%3E
that a pipeline pool across multiple threads has been stable for his work.
I have a couple of questions:

1) Does anyone else have experience with this? Sean, from your comments
before, do you think it might not crash yet produce unreliable results when
using the components in the DefaultClinicalPipeline?

2) Sean, you commented before

> That being said, supposedly you can configure Spark to handle this by
> keeping everything contained in a unique copy per thread.  Sort of like
> ThreadLocal (I think), but more effective on a full-pipeline level.

Do you have any more information about this? We are currently looking into
it, and it looks like it should be possible to limit each executor (JVM) to
a single thread, but I was wondering if you had any references to the
ThreadLocal-style setup or knew anyone else that had tried it.

3) In the TS pipelines, what does the "threads" keyword in the piper file
actually enforce? Is it the number of threads it will allow you to share
the pipeline with or does it automatically create a threaded pipeline for
you?

Thanks!
Jeff


Re: ctake web service [EXTERNAL]

2019-03-12 Thread Jeffrey Miller
Hi Sean,

I just wanted to follow up on this one more time with a few follow-up questions.

From Peter's description of how he is using a pool of pipelines:

>  I started from scratch to create a pipeline pool that sizes itself
> according to the memory that’s available.  Each instance contains the
> complete pipeline including the Term Annotator and a re-settable JCas
> object.   I don’t use any of the thread constructs in piper files - to not
> confuse the issue.  All of this is accessed via a web service with a multi
> threaded dispatcher (SparkJava).

and our experience with doing something similar, it seems that this does
not lead to crashing, at least not with the components in the Default
Clinical Pipeline. We did have crashes when we tried to access the same
pipeline from two threads, but that was expected. I just wanted to verify
that you have seen problems with this specific setup, that is, a pool of
pipelines where each one is accessed by only one thread at a time. It sounds
like you have from your prior messages, but I am just trying to make sure I
have not confused something.
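The pool-of-pipelines setup described above (each pipeline lent to exactly one thread at a time, pool sized from available memory) can be sketched in Python with a blocking queue. Peter built his on GenericObjectPool in Java; here a plain `queue.Queue` stands in, and `CheckedPipeline` and the memory figures are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class PipelinePool:
    """Fixed-size pool: a pipeline is lent to one thread at a time."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())   # e.g. re-read the piper file per instance

    def process(self, doc):
        pipeline = self._idle.get()     # blocks until some pipeline is free
        try:
            return pipeline.process(doc)
        finally:
            self._idle.put(pipeline)    # hand it back even if processing failed


def pool_size(available_mb, per_pipeline_mb, cpus):
    """Size the pool from free memory, capped at the CPU count (illustrative)."""
    return max(1, min(cpus, available_mb // per_pipeline_mb))


class CheckedPipeline:
    """Hypothetical pipeline that detects overlapping use by two threads."""

    def __init__(self):
        self._busy = threading.Lock()

    def process(self, doc):
        if not self._busy.acquire(blocking=False):
            return None                 # would mean the pool leaked a pipeline
        try:
            return len(doc)             # placeholder for real annotation
        finally:
            self._busy.release()


def process_all(pool, docs, threads):
    with ThreadPoolExecutor(max_workers=threads) as workers:
        return list(workers.map(pool.process, docs))
```

Note that exclusive access only protects per-instance state; static variables shared between instances in the same process are exactly what the pool cannot isolate.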

We are not that concerned with the initial startup cost of loading the
piper file multiple times. I am hesitant to use the wrapped thread-safe
components because we are concerned with compute time and I suspect that
much of the time in our pipeline is spent in the DefaultJCasTermAnnotator
and the threads would just have to wait in line.
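The "wait in line" concern can be made concrete: if an expensive, non-thread-safe step is made safe by a single lock, extra threads add no throughput for that step. The sketch below is a generic lock-based wrapper with hypothetical names, not necessarily how the cTAKES TS wrappers are built (Sean notes elsewhere in this archive that he believes the Default term annotator is already thread-safe).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LockingWrapper:
    """Generic thread-safety wrapper: one lock serializes every call into a
    component that is not thread-safe."""

    def __init__(self, component):
        self._component = component
        self._lock = threading.Lock()

    def process(self, doc):
        with self._lock:               # only one thread inside at a time
            return self._component.process(doc)


class SlowStep:
    """Hypothetical expensive, non-thread-safe step (think term lookup)."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    def process(self, doc):
        self.active += 1
        self.peak = max(self.peak, self.active)
        time.sleep(0.01)               # simulated work
        self.active -= 1
        return doc


def run_concurrently(component, docs, threads):
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(component.process, docs))
```

With four threads the wrapped step still runs at a peak of one document, which matches Sean's later point that the more thread-safe annotators that don't require locking, the more utilized the threads will be.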

With respect to your message:

>  Anyway, if I am running on a cluster (etc.) then it is a completely
> different ballgame.  When I do that I don't bother with the TS pipeline.

When you are running on a cluster do you just use multiple processes, one
pipeline per process?

Lastly, do you know what the piper file "threads" command actually enforces
(
https://github.com/apache/ctakes/blob/trunk/ctakes-clinical-pipeline-res/src/main/resources/org/apache/ctakes/clinical/pipeline/TsDefaultFastPipeline.piper#L4
)?

Thanks again for your help.
Jeff


On Sat, Mar 9, 2019 at 10:24 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> >  I assumed the TS wrappers were so you could avoid creating multiple
> pipelines and just run one instance of the pipeline with a separate JCAS
> per thread.
>
> -- Your assumption was correct.  The idea is to have only a single copy of
> any resource in memory : Dictionary, ML models (which can get big), graphs,
> etc.  The other advantage is a single initialization.  It may not seem like
> a big deal, but I have inits that take > 2 minutes, and parallel inits
> don't work too well wrt disk thrashing.  So, if I'm just running a 5 minute
> test, I'd rather run 3 threads with a single init.  And on my laptop 3
> instances of a memory hog is not really an option.  Yes, thread copies, but
> it is still more friendly.   Anyway, if I am running on a cluster (etc.)
> then it is a completely different ballgame.  When I do that I don't bother
> with the TS pipeline.
>
>
> > Do you know if this is a problem for any of the annotators in the
> default clinical pipeline
>
> -- Oh yeah.  That is why I made the TS wrappers.   A good number of the
> default AEs are not thread safe.  And really, it only takes 1 to ruin your
> day.  Resources, static variables and collections, i/o ...  And really,
> some things are not ctakes per se, but 3rd party libraries that are used by
> several -standard- AEs.
>
>
> > I'd like to really understand thread-safe with respect to core cTAKES
> components
>
> -- I don't know if anybody has done a formal writeup or anything of the
> sort.  I set out to do a deep dive into the code and refactor for TS, but
> quickly changed my mind.  See mention of 3rd parties above, though that
> certainly wasn't everything.  It was easier to write the wrappers.  Plus, I
> could rubber stamp and quickly wrap any ae that I came across for testing
> or use to be ts.
>
>
> Cheers for the curiosity,
>
> Sean
>
>
>
>
> 
> From: Jeffrey Miller 
> Sent: Saturday, March 9, 2019 12:20 PM
> To: dev@ctakes.apache.org
> Subject: Re: ctake web service [EXTERNAL]
>
> Thanks for your response, Sean. We are still working on this (and have some
> things to look into given your last response), but I will share details
> when we have it working. We are still deciding on whether to use Spark or
> Apache Beam.
>
> Just to clarify my previous confusion, I assumed the TS wrappers were so
> you could avoid creating multiple pipelines and just run one instance of
> the pipeline with a separate JCAS per thread. I thought the main motivation
> behind that would be to avoid loading >1 dictionaries into memory, for
> example. But it sounds like I was mistaken. With respect to sharing
> resources, are static var

Re: ctake web service [EXTERNAL]

2019-03-09 Thread Jeffrey Miller
Thank you Peter, that is helpful.

On Sat, Mar 9, 2019 at 3:46 PM Peter Abramowitsch 
wrote:

> I haven't made our code available, sorry.  Not sure if I can.  But from my
> description, you should find it pretty easy.  I started by extending a
> GenericObjectPool and going from there. As each instance is instantiated, I
> re-read the piper file - creating a new engine which is assigned to a Pool
> member.
>
> Peter
>
> On Sat, Mar 9, 2019 at 9:20 AM Jeffrey Miller  wrote:
>
> > Thanks for your response, Sean. We are still working on this (and have
> > some things to look into given your last response), but I will share
> > details when we have it working. We are still deciding on whether to use
> > Spark or Apache Beam.
> >
> > Just to clarify my previous confusion, I assumed the TS wrappers were so
> > you could avoid creating multiple pipelines and just run one instance of
> > the pipeline with a separate JCAS per thread. I thought the main
> > motivation behind that would be to avoid loading >1 dictionaries into
> > memory, for example. But it sounds like I was mistaken. With respect to
> > sharing resources, are static variables the main concern? Do you know if
> > this is a problem for any of the annotators in the default clinical
> > pipeline (the regular components, not the thread safe ones)? From Peter's
> > response (I am not sure if that split off into another forum thread
> > because the subject changed), it sounds like it may not be a problem? I'd
> > like to really understand thread-safe with respect to core cTAKES
> > components (with the caveat that community-created annotators could be
> > implemented in any number of ways, making it hard to declare cTAKES is
> > "thread-safe"). I'd be happy to contribute documentation back to the
> > wiki once I feel I have a solid grasp on it.
> >
> > Peter- have you made your pipeline pool code available anywhere?
> >
> > On Fri, Mar 8, 2019 at 12:49 PM Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi all,
> > >
> > > >Is there any known reason that you can't create a pipeline pool, but
> > > > keep everything in the same process?
> > > -- No, but ...
> > > > Is it safe to load multiple pipelines in the same process as long as
> > > > only one thread can access each one at a time (we plan to use this in
> > > > a Spark pipeline).
> > > -- If you are talking about oob ctakes being the process, only a single
> > > pipeline will run on multiple threads.  The threads will share
> > > resources, static variables, etc. and the  pipeline will give you
> > > terrible results and very quickly crash.  That is why I wrote the
> > > thread-safe wrappers.
> > > -- That being said, supposedly you can configure spark to handle this
> > > by keeping everything contained in a unique copy per thread.  Sort of
> > > like ThreadLocal (I think), but more effective on a full-pipeline level.
> > >
> > > > it must have reduced the DefaultJCasTermAnnotator to a singleton
> > > > object in memory.
> > > -- Yes.  The thread-safe pipeline is not meant to have siblings in the
> > > same process - the wrappers can only do so much.  That being said, I am
> > > pretty sure that the Default... is thread-safe so it doesn't actually
> > > need the wrapper.  Regardless, the rest of the pipeline would crash.
> > >
> > > Jeff, can you share information about your efforts on spark?  If we
> could
> > > get that working and in standard ctakes it would be fantastic.
> > >
> > > I hope that this information is useful.
> > >
> > > Sean
> > >
> > >
> > >
> > > 
> > > From: Jeffrey Miller 
> > > Sent: Friday, March 8, 2019 11:23 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: ctake web service [EXTERNAL]
> > >
> > > Is there any known reason that you can't create a pipeline pool, but keep
> > > everything in the same process? Is it safe to load multiple pipelines in
> > > the same process as long as only one thread can access each one at a time
> > > (we plan to use this in a Spark pipeline). One caveat I have noticed: it
> > > seems like if I use the thread-safe components to build a pipeline pool,
> > > only one dictionary for the DefaultJCasTermAnnotator can be loaded per
> > > process. For example, I was trying to take advantage

Re: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]

2019-03-09 Thread Jeffrey Miller
Sean,

I may just be missing something obvious, but having signed up for the
Confluence wiki, I don't see any Edit button on the cTAKES pages, even though
it does look like everyone has permission to edit.

On Fri, Feb 22, 2019 at 11:01 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> >Do you accept documentation contributions?
>
> Of course!! We are completely open source, documentation included.  I
> think that you should be able to edit the wiki after signing up on
> Confluence:
> https://cwiki.apache.org/confluence/signup.action
>
> Cheers,
> Sean
>
> ____
> From: Jeffrey Miller 
> Sent: Friday, February 22, 2019 10:57 AM
> To: dev@ctakes.apache.org
> Subject: Re: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]
>
> Thank you Sean, that clears it up for me. Do you accept documentation
> contributions? I might be able to document a few of the things I have
> learned along the way setting up ctakes.
>
> On Tue, Feb 19, 2019 at 12:14 PM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > The short answer: No, LVG is not in the pipeline created by the
> > DefaultFastPipeline.piper.
> >
> > Longer answer:
> > In older versions of dictionary lookup the Lexical Variant Generator
> > module (LVG) was recommended to capture lexical variants of terms.
> > However, the dictionary resource already contains variants so the LVG
> > module should not make much of a difference. When the fast lookup was new
> > several years ago, I ran a test with and without LVG on two datasets and the
> > difference was along the lines of +1-2% recall, -1% precision.
> >
> > I think that ClinicalPipelineFactory.getFastPipeline() was a copy-paste of
> > the previous .getClinicalPipeline() but with the dictionary module
> > replaced.  So, LVG is still in the pipeline created by that method.
> >
> > When I (more recently) wrote that piper file that you reference, I left out
> > LVG as the added burden didn't seem to warrant its presence.  When I say
> > burden I don't just mean speed decrease and memory footprint.  There have
> > been a lot of configuration problems with LVG on various systems which led
> > to difficulty using ctakes.
> >
> > The diagram that you reference places LVG after the dictionary lookup, and
> > after the part-of-speech tagger, while the page on LVG
> > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+LVG lists
> > those as the two modules that may benefit from its presence.  That diagram
> > is very old and should definitely be updated.  Both the diagram and the
> > page on LVG include information that precedes (does not account for) the
> > existence of the fast dictionary lookup.
> >
> > Sean
> >
> >
> > 
> > From: Jeffrey Miller 
> > Sent: Tuesday, February 19, 2019 10:53 AM
> > To: dev@ctakes.apache.org
> > Subject: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]
> >
> > Hi,
> >
> > I was wondering if the LVG Annotator is included in DefaultFastPipeline.piper
> > <https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-clinical-pipeline-res/src/main/resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper>.
> > I have tried to trace through all the includes, but I cannot find it.
> > However, when I look at the code for the
> > ClinicalPipelineFactory.getFastPipeline() it seems to be included.
> > <https://github.com/apache/ctakes/blob/513bb49ebb98c4ac63f690c7b88a82aff18947b8/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java#L98>
> > From
> > documentation in this flow diagram
> > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_download_attachments_6871

Re: ctake web service [EXTERNAL]

2019-03-09 Thread Jeffrey Miller
Thanks for your response Sean- we are still working on this (and have some
things to look into given your last response), but I will share details
when we have it working. We are still deciding on whether to use Spark or
Apache Beam.

Just to clarify my previous confusion, I assumed the TS wrappers were so
you could avoid creating multiple pipelines and just run one instance of
the pipeline with a separate JCas per thread. I thought the main motivation
behind that would be to avoid loading more than one dictionary into memory, for
example. But it sounds like I was mistaken. With respect to sharing
resources, are static variables the main concern? Do you know if this is a
problem for any of the annotators in the default clinical pipeline (the
regular components, not the thread-safe ones)? From Peter's response (I am
not sure if that split off into another forum thread because the subject
changed), it sounds like it may not be a problem? I'd like to really
understand thread safety with respect to core cTAKES components (with the
caveat that community-created annotators could be implemented in any number
of ways, making it hard to declare that cTAKES is "thread-safe"). I'd be happy
to contribute documentation back to the wiki once I feel I have a solid
grasp on it.
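To make the static-variable question concrete, here is a minimal Java sketch built around a hypothetical annotator class (StaticStateAnnotator is made up for illustration and is not a cTAKES component). A static field belongs to the class rather than to an instance, so two separately constructed "pipelines" in the same JVM still write to one shared buffer, while per-instance fields stay separate. It illustrates the general hazard only and says nothing about which real cTAKES annotators hold static state.

```java
import java.util.ArrayList;
import java.util.List;

class StaticStateDemo {

    // Hypothetical annotator used only to illustrate static vs. instance state.
    static final class StaticStateAnnotator {
        // One list for the whole class: every instance in the JVM shares it.
        static final List<String> SHARED_BUFFER = new ArrayList<>();
        // One list per instance: isolated between pipeline instances.
        final List<String> ownBuffer = new ArrayList<>();

        void process(String documentText) {
            SHARED_BUFFER.add(documentText);
            ownBuffer.add(documentText);
        }
    }

    // Two independent "pipelines" each process one document; returns the
    // number of entries in the shared static buffer afterwards.
    static int sharedEntriesAfterTwoPipelines() {
        StaticStateAnnotator.SHARED_BUFFER.clear();
        StaticStateAnnotator pipelineA = new StaticStateAnnotator();
        StaticStateAnnotator pipelineB = new StaticStateAnnotator();
        pipelineA.process("note A");
        pipelineB.process("note B");
        return StaticStateAnnotator.SHARED_BUFFER.size();
    }

    public static void main(String[] args) {
        // Each instance saw one document, but the static buffer saw both.
        System.out.println(sharedEntriesAfterTwoPipelines());
    }
}
```

If the two instances ran on different threads, the shared ArrayList would also be mutated concurrently without synchronization, which is the kind of corruption that produces bad results or crashes.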

Peter- have you made your pipeline pool code available anywhere?
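A pipeline pool of the kind Jeff describes can be sketched generically with a BlockingQueue: every pipeline is built once up front, and a worker must take one out of the queue before using it, so no pipeline is touched by two threads at once. The Pipeline interface and the upper-casing stand-in below are hypothetical placeholders, not cTAKES or UIMA types. A pool like this does nothing about static state shared between the pooled instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

class PipelinePoolSketch {

    // Hypothetical stand-in for a fully built pipeline; not a cTAKES or UIMA type.
    interface Pipeline {
        String process(String text);
    }

    // Fixed-size pool: a pipeline is used by at most one thread at a time.
    static final class PipelinePool {
        private final BlockingQueue<Pipeline> idle;

        PipelinePool(int size, Supplier<Pipeline> factory) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(factory.get()); // every pipeline is built once, up front
            }
        }

        String process(String text) throws InterruptedException {
            Pipeline pipeline = idle.take(); // blocks until some pipeline is free
            try {
                return pipeline.process(text);
            } finally {
                idle.put(pipeline); // hand it back even if processing threw
            }
        }
    }

    // Four worker threads share a pool of two pipelines.
    static List<String> runDemo() throws Exception {
        PipelinePool pool = new PipelinePool(2, () -> text -> text.toUpperCase());
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String note : List.of("note a", "note b", "note c", "note d")) {
                futures.add(workers.submit(() -> pool.process(note)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // results are collected in submission order
            }
            return results;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo());
    }
}
```

Sizing the pool trades memory (each pooled pipeline carries its own loaded resources, such as dictionaries) against throughput.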

On Fri, Mar 8, 2019 at 12:49 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi all,
>
> >Is there any known reason that you can't create a pipeline pool, but keep
> everything in the same process?
> -- No, but ...
> > Is it safe to load multiple pipelines in
> the same process as long as only one thread can access each one at a time
> (we plan to use this in a Spark pipeline).
> -- If you are talking about oob ctakes being the process, only a single
> pipeline will run on multiple threads.  The threads will share resources,
> static variables, etc. and the pipeline will give you terrible results and
> very quickly crash.  That is why I wrote the thread-safe wrappers.
> -- That being said, supposedly you can configure Spark to handle this by
> keeping everything contained in a unique copy per thread.  Sort of like
> ThreadLocal (I think), but more effective on a full-pipeline level.
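The ThreadLocal idea Sean mentions can be sketched in plain Java: ThreadLocal.withInitial gives each worker thread its own lazily built pipeline, which that thread then reuses for every document it handles. Pipeline here is a hypothetical stand-in. This shows only the per-thread-copy idea and is not a description of how Spark manages per-executor state.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLocalPipelineSketch {

    // Counts how many pipeline copies have been built.
    static final AtomicInteger BUILT = new AtomicInteger();

    // Hypothetical pipeline; each instance remembers its build number.
    static final class Pipeline {
        final int id = BUILT.incrementAndGet();

        String process(String text) {
            return "pipeline " + id + ": " + text;
        }
    }

    // Each thread builds its own copy on first use and reuses it afterwards.
    static final ThreadLocal<Pipeline> PER_THREAD = ThreadLocal.withInitial(Pipeline::new);

    // Processes `documents` notes on `threads` workers; returns the number of copies built.
    static int runDemo(int threads, int documents) throws InterruptedException {
        BUILT.set(0);
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < documents; i++) {
            String note = "note " + i;
            workers.execute(() -> PER_THREAD.get().process(note));
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        return BUILT.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(2, 8)); // one copy per worker thread
    }
}
```

With a fixed-size executor, one pipeline is built per worker thread and each copy lives as long as its thread. As with a pool, static fields are still shared across the copies.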
>
> > it must have reduced the DefaultJCasTermAnnotator to a singleton object
> in memory.
> -- Yes.  The thread-safe pipeline is not meant to have siblings in the
> same process - the wrappers can only do so much.  That being said, I am
> pretty sure that the Default... is thread-safe so it doesn't actually need
> the wrapper.  Regardless, the rest of the pipeline would crash.
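For readers wondering what a "thread-safe wrapper" can mean in general, one common approach is to serialize every call into a non-thread-safe delegate with a lock. The sketch below is a generic Java illustration using hypothetical Component and UnsafeCounter types. It is not the implementation of the cTAKES thread-safe wrappers, which may work differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SynchronizedWrapperSketch {

    // Hypothetical component contract; not a cTAKES or UIMA interface.
    interface Component {
        void process(String text);
    }

    // Not thread-safe: processed++ is an unsynchronized read-modify-write.
    static final class UnsafeCounter implements Component {
        int processed = 0;

        @Override
        public void process(String text) {
            processed++;
        }
    }

    // Generic wrapper: only one thread at a time may run the delegate.
    static final class SerializingWrapper implements Component {
        private final Component delegate;

        SerializingWrapper(Component delegate) {
            this.delegate = delegate;
        }

        @Override
        public void process(String text) {
            synchronized (delegate) { // lock on the delegate so readers can use the same monitor
                delegate.process(text);
            }
        }
    }

    // Runs threads * callsPerThread calls through the wrapper; returns the final count.
    static int runDemo(int threads, int callsPerThread) throws InterruptedException {
        UnsafeCounter counter = new UnsafeCounter();
        Component safe = new SerializingWrapper(counter);
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            workers.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    safe.process("note");
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(30, TimeUnit.SECONDS);
        synchronized (counter) { // same monitor as the writers, so the read sees every update
            return counter.processed;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(4, 10_000));
    }
}
```

Serializing calls trades throughput for correctness: the wrapped component handles one document at a time no matter how many threads share it.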
>
> Jeff, can you share information about your efforts on Spark?  If we could
> get that working and in standard ctakes it would be fantastic.
>
> I hope that this information is useful.
>
> Sean
>
>
>
> 
> From: Jeffrey Miller 
> Sent: Friday, March 8, 2019 11:23 AM
> To: dev@ctakes.apache.org
> Subject: Re: ctake web service [EXTERNAL]
>
> Is there any known reason that you can't create a pipeline pool, but keep
> everything in the same process? Is it safe to load multiple pipelines in
> the same process as long as only one thread can access each one at a time
> (we plan to use this in a Spark pipeline). One caveat I have noticed: it
> seems like if I use the thread-safe components to build a pipeline pool,
> only one dictionary for the DefaultJCasTermAnnotator can be loaded per
> process. For example, I was trying to take advantage of the ability to
> switch pipelines via a query parameter that is hinted at in the code for
> the REST service. The two pipelines used different ontology dictionaries,
> but it seemed like with the thread-safe components it must have reduced
> the DefaultJCasTermAnnotator to a singleton object in memory, because it
> only used the first dictionary instantiated. Either way, given how Sean
> described the thread-safe components working above, you probably
> wouldn't want to use them in a pipeline pool, assuming that the problems
> with threading were limited to multiple threads accessing the same pipeline
> at the same time, and not to having multiple pipelines loaded into memory,
> each accessed by only a single thread.
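The one-dictionary-per-process symptom Jeff describes is what you would see if shared instances were cached by component class alone. The Java sketch below is a hypothetical illustration of that failure mode. DictionaryAnnotator, the two cache maps, and the dictA.xml/dictB.xml names are all made up, and the sketch does not claim that the cTAKES thread-safe components are written this way. Including the configuration in the cache key keeps one shared instance per dictionary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedComponentCacheSketch {

    // Hypothetical dictionary-backed annotator; only records which dictionary it loaded.
    static final class DictionaryAnnotator {
        final String dictionaryPath;

        DictionaryAnnotator(String dictionaryPath) {
            this.dictionaryPath = dictionaryPath;
        }
    }

    // Keyed by class alone: the first configuration wins for the whole JVM.
    static final Map<String, DictionaryAnnotator> BY_CLASS = new ConcurrentHashMap<>();
    // Keyed by class plus configuration: one shared instance per dictionary.
    static final Map<String, DictionaryAnnotator> BY_CLASS_AND_CONFIG = new ConcurrentHashMap<>();

    static DictionaryAnnotator getByClass(String dictionaryPath) {
        return BY_CLASS.computeIfAbsent(DictionaryAnnotator.class.getName(),
                key -> new DictionaryAnnotator(dictionaryPath));
    }

    static DictionaryAnnotator getByClassAndConfig(String dictionaryPath) {
        String key = DictionaryAnnotator.class.getName() + "|" + dictionaryPath;
        return BY_CLASS_AND_CONFIG.computeIfAbsent(key, k -> new DictionaryAnnotator(dictionaryPath));
    }

    public static void main(String[] args) {
        // Two pipelines configured with different (hypothetical) dictionaries.
        System.out.println(getByClass("dictA.xml").dictionaryPath);
        System.out.println(getByClass("dictB.xml").dictionaryPath); // still dictA.xml
        System.out.println(getByClassAndConfig("dictA.xml").dictionaryPath);
        System.out.println(getByClassAndConfig("dictB.xml").dictionaryPath);
    }
}
```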
>
> On Fri, Mar 8, 2019 at 11:06 AM Kathy Ferro 
> wrote:
>
> > I thought about creating a queue that acts as a traffic cop.  Only the
> > traffic cop calls the WS.  I also want to test multiple WS running on
> > different ports.  The traffic cop calls whichever WS is available and keeps
> > track of WS statuses.  With all this processing going, it might kill the
> > power for blocks.
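Kathy's traffic-cop idea can be sketched as a dispatcher that owns a queue of free web-service endpoints. A request takes whichever endpoint is free, marks it busy, and puts it back when done, while a status map records each endpoint's state. The endpoint names and the BiFunction that stands in for the HTTP call are hypothetical. A real version would need timeouts and a way to re-admit failed endpoints; as written, submit blocks forever if every endpoint has failed.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiFunction;

class TrafficCopSketch {

    enum Status { IDLE, BUSY, FAILED }

    // Endpoints that are currently free, in FIFO order.
    private final BlockingQueue<String> available = new LinkedBlockingQueue<>();
    // Last known status of every endpoint, readable by monitoring code.
    final Map<String, Status> statuses = new ConcurrentHashMap<>();
    // (endpoint, text) -> result; a hypothetical stand-in for the real HTTP call.
    private final BiFunction<String, String, String> caller;

    TrafficCopSketch(List<String> endpoints, BiFunction<String, String, String> caller) {
        this.caller = caller;
        for (String endpoint : endpoints) {
            statuses.put(endpoint, Status.IDLE);
            available.add(endpoint);
        }
    }

    // Sends one document to whichever endpoint is free, waiting if none is.
    String submit(String text) throws InterruptedException {
        String endpoint = available.take();
        statuses.put(endpoint, Status.BUSY);
        String result;
        try {
            result = caller.apply(endpoint, text);
        } catch (RuntimeException e) {
            statuses.put(endpoint, Status.FAILED); // keep a failed endpoint out of rotation
            throw e;
        }
        statuses.put(endpoint, Status.IDLE);
        available.put(endpoint); // healthy endpoint goes back into rotation
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        TrafficCopSketch cop = new TrafficCopSketch(
                List.of("localhost:8081", "localhost:8082"),
                (endpoint, text) -> endpoint + " handled " + text);
        System.out.println(cop.submit("note 1"));
        System.out.println(cop.statuses);
    }
}
```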
> >
> > On Fri, Mar 8, 2019 at 10:34 AM Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >

Re: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]

2019-02-22 Thread Jeffrey Miller
Thank you Sean, that clears it up for me. Do you accept documentation
contributions? I might be able to document a few of the things I have
learned along the way setting up ctakes.

On Tue, Feb 19, 2019 at 12:14 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> The short answer: No, LVG is not in the pipeline created by the
> DefaultFastPipeline.piper.
>
> Longer answer:
> In older versions of dictionary lookup the Lexical Variant Generator
> module (LVG) was recommended to capture lexical variants of terms.
> However, the dictionary resource already contains variants so the LVG
> module should not make much of a difference. When the fast lookup was new
> several years ago, I ran a test with and without LVG on two datasets and the
> difference was along the lines of +1-2% recall, -1% precision.
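Sean's point that the dictionary already contains variants can be illustrated with a toy lookup. Every surface form is stored when the dictionary is built, so matching is just normalization plus an exact map lookup, with no variant generation at runtime. This is a hypothetical Java sketch, not the fast lookup's real data structures, and the concept ID and strings are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class VariantDictionarySketch {

    private final Map<String, String> variantToConcept = new HashMap<>();

    // Same normalization for dictionary entries and document spans:
    // trim, lower-case, collapse runs of whitespace.
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // All surface forms are stored when the dictionary is built.
    void addConcept(String conceptId, List<String> variants) {
        for (String variant : variants) {
            variantToConcept.put(normalize(variant), conceptId);
        }
    }

    // Exact match on the normalized span; nothing is generated at runtime.
    Optional<String> lookup(String span) {
        return Optional.ofNullable(variantToConcept.get(normalize(span)));
    }

    public static void main(String[] args) {
        VariantDictionarySketch dictionary = new VariantDictionarySketch();
        dictionary.addConcept("C0750426",
                List.of("elevated white blood cell count", "elevated wbc count"));
        System.out.println(dictionary.lookup("Elevated  WBC count")); // found via a stored variant
        System.out.println(dictionary.lookup("elevated wbcs count")); // no stored variant
    }
}
```

A form that was never stored, such as a plural the dictionary lacks, is the kind of miss a runtime variant generator like LVG might still recover, which fits the small recall difference Sean reports.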
>
> I think that ClinicalPipelineFactory.getFastPipeline() was a copy-paste of
> the previous .getClinicalPipeline() but with the dictionary module
> replaced.  So, LVG is still in the pipeline created by that method.
>
> When I (more recently) wrote that piper file that you reference, I left out
> LVG as the added burden didn't seem to warrant its presence.  When I say
> burden I don't just mean speed decrease and memory footprint.  There have
> been a lot of configuration problems with LVG on various systems which led
> to difficulty using ctakes.
>
> The diagram that you reference places LVG after the dictionary lookup, and
> after the part-of-speech tagger, while the page on LVG
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+LVG lists
> those as the two modules that may benefit from its presence.  That diagram
> is very old and should definitely be updated.  Both the diagram and the
> page on LVG include information that precedes (does not account for) the
> existence of the fast dictionary lookup.
>
> Sean
>
>
> 
> From: Jeffrey Miller 
> Sent: Tuesday, February 19, 2019 10:53 AM
> To: dev@ctakes.apache.org
> Subject: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]
>
> Hi,
>
> I was wondering if the LVG Annotator is included in DefaultFastPipeline.piper
> <https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-clinical-pipeline-res/src/main/resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper>.
> I have tried to trace through all the includes, but I cannot find it.
> However, when I look at the code for the
> ClinicalPipelineFactory.getFastPipeline() it seems to be included.
> <https://github.com/apache/ctakes/blob/513bb49ebb98c4ac63f690c7b88a82aff18947b8/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java#L98>
> From documentation in this flow diagram
> <https://cwiki.apache.org/confluence/download/attachments/68718172/ctakes-3.1-dependencies.png?version=1&modificationDate=1488992146000&api=v2>
> from the components documentation page
> <https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Component+Use+Guide>,
> it seems to be a recommended component for the dictionary annotator.
>
> Thanks for your help,
> Jeff
>


DefaultFastPipeline.piper and LVG Annotator

2019-02-19 Thread Jeffrey Miller
Hi,

I was wondering if the LVG Annotator is included in DefaultFastPipeline.piper.
I have tried to trace through all the includes, but I cannot find it.
However, when I look at the code for the
ClinicalPipelineFactory.getFastPipeline() it seems to be included.

From documentation in this flow diagram from the components documentation
page, it seems to be a recommended component for the dictionary annotator.

Thanks for your help,
Jeff
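Tracing piper includes by hand, as Jeff did, can be automated with a small recursive walk. The Java sketch below rests on several assumptions that should be checked against the cTAKES piper documentation: that piper files are line-oriented, that `load <name>` pulls in another piper file, that components are added by commands starting with `add` (for example `add` or `addLogged`), and that lines starting with `//` or `#` are comments. The map from piper name to file lines is hypothetical, and resolving a `load` target to an actual file on the classpath is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PiperIncludeTracer {

    // Returns component names added by `piperName` and everything it loads,
    // in the order they are encountered. `pipers` maps a piper name to its lines.
    static List<String> components(String piperName, Map<String, List<String>> pipers) {
        List<String> found = new ArrayList<>();
        collect(piperName, pipers, new HashSet<>(), found);
        return found;
    }

    private static void collect(String name, Map<String, List<String>> pipers,
                                Set<String> visited, List<String> found) {
        if (!visited.add(name)) {
            return; // already traced; also guards against include cycles
        }
        List<String> lines = pipers.get(name);
        if (lines == null) {
            throw new IllegalArgumentException("Unknown piper: " + name);
        }
        for (String raw : lines) {
            String line = raw.trim();
            // ASSUMPTION: blank lines and lines starting with // or # carry no commands.
            if (line.isEmpty() || line.startsWith("//") || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s+");
            if (parts.length < 2) {
                continue;
            }
            if (parts[0].equals("load")) {
                collect(parts[1], pipers, visited, found); // ASSUMPTION: load = include
            } else if (parts[0].startsWith("add")) {
                found.add(parts[1]); // ASSUMPTION: add* commands name a component
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical piper contents, for illustration only.
        Map<String, List<String>> pipers = Map.of(
                "TopLevel", List.of("// hypothetical top-level piper", "load Tokenizing", "add DictionaryLookup"),
                "Tokenizing", List.of("add SentenceDetector", "addLogged TokenizerAnnotator"));
        List<String> all = components("TopLevel", pipers);
        System.out.println(all);
        System.out.println("LVG present: " + all.contains("LvgAnnotator"));
    }
}
```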