Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

Peter Klügl Fri, 11 Mar 2016 07:38:12 -0800

Hi Pei,

the content of the new files is duplicated again, e.g., see
I2B2Evaluation.java


No idea what caused that...

Best,

Peter

On 11.03.2016 at 11:32, Peter Klügl wrote:
> Hi,
>
> thanks for the notes and links, Andy and Guergana. The software and
> articles are very interesting, but, as for my personal interest, we have
> our own clinical deidentification software solution at our company
> (which works well enough as far as I know). My focus is rather on
> helping out in translating the contribution from GATE/JAPE to UIMA/Ruta.
> Thus, I concentrate on the existing functionality for now.
>
> What is the final goal of the cTAKES community concerning clinical deid
> components? Will both sandbox projects be merged, what about statistical
> approaches?
>
> @Pei: there was again a problem with the patch (I also forgot to add
> some files). I attached a new one.
>
> @Azad: I am just curious about exactly which data the rules rely on. I
> think I'll find the information in the article.
> I assume that the 521 documents were used to develop the rules
> and the 269 documents to evaluate them. Did you also correct the rules
> using the second set? I need to reread the article :-)
>
> Best,
>
> Peter
>
>
> On 10.03.2016 at 23:22, andy mcmurry wrote:
>> *For cross-validation, you can evaluate de-identified notes data from the
>> i2b2 challenge:*
>> https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/data/models/
>>
>> *Methods for model generation of FeatureSet described here: *
>>
>> *Improved de-identification of physician notes through integrative modeling
>> of both public and private medical text*
>> http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-112
>>
>> Major objective of that study was to help provide external examples to
>> cross train / retrain other methods.
>>
>> hope this helps,
>> --Andy
>>
>>
>>
>> On Thu, Mar 10, 2016 at 1:27 PM, Savova, Guergana <
>> guergana.sav...@childrens.harvard.edu> wrote:
>>
>>> You can re-build the models that feed into MIST. I personally would not
>>> use the default model that MIST comes with as it is not trained on clinical
>>> data. In our previous work we found that hand-annotating about 200 docs for
>>> PHI (representative of the sample you are going to run the models on)
>>> results in building a pretty good model - in the 90's for p, r and f1.
>>> However, even with that high performance, the institution that owns the
>>> data might be still reluctant to share as it might pose a violation of
>>> HIPAA through some potential PHI leaks. In cTAKES our approach has been to
>>> de-couple the de-identification from the NLP/information extraction. If a
>>> user has the need for de-identified data, they could choose their method --
>>> manual or otherwise -- and then process through cTAKES. Our focus is the
>>> NLP/IE space, while de-identification is a blend of that plus policy....
>>>
>>> --Guergana
>>>
>>> -----Original Message-----
>>> From: Azad Dehghan [mailto:azad.dehg...@gmail.com]
>>> Sent: Thursday, March 10, 2016 4:19 PM
>>> To: dev@ctakes.apache.org
>>> Subject: RE: Combining Knowledge- and Data-driven Methods for
>>> De-identification of Clinical Narratives
>>>
>>> Thanks Guergana.
>>>
>>>> Yes, the current release of cTAKES has a module for the temporal
>>>> expressions which includes dates. The normalizer for the temporal
>>>> expressions is Steven Bethard's timenorm code.
>>> Great.
>>>
>>>> However, if you do de-identification of dates/temporal expressions, you
>>>> run the risk of creating incorrect timelines as many of the relative
>>>> temporal expressions (e.g. spring of this year, x-mas time, etc.) are
>>>> unlikely to be correctly shifted by any de-identification tool.
>>> Indeed, a reason I have not included the dates component.
>>>
>>>> One de-identification tool is MIST --
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__mist-2Ddeid.sourceforge.net_&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=5awdXn2I-hRE0-161tqFDGgmYgQQviQg360uHI4fs2s&e=
>>>> .
>>> I don't remember them doing well in the community held evaluation in 2014.
>>> Hence, cDeid :)
>>>> Guergana Savova, PhD, FACMI
>>>> Associate Professor
>>>> PI Natural Language Processing Lab
>>>> Boston Children's Hospital and Harvard Medical School
>>>> 300 Longwood Avenue
>>>> Mailstop: BCH3092
>>>> Enders 144.1
>>>> Boston, MA 02115
>>>> Tel: (617) 919-2972
>>>> Fax: (617) 730-0817
>>>> Harvard Scholar:
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__scholar.harvard.edu_guergana-5Fk-5Fsavova_biocv&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=3taiTxFp55iQUnc6A6Yemg-XzFQrRjo5QZRQeKHQ29c&e=
>>>>
>>>> -----Original Message-----
>>>> From: Azad Dehghan [mailto:azad.dehg...@gmail.com]
>>>> Sent: Thursday, March 10, 2016 3:42 PM
>>>> To: dev@ctakes.apache.org
>>>> Subject: Re: Combining Knowledge- and Data-driven Methods for
>>>> De-identification of Clinical Narratives
>>>>> This means both training data folders? I have access to the data but
>>>>> not to the challenge description.
>>>>
>>>> Yes. Is there any specific information that you are missing?
>>>>>> It would be good to incorporate/refactor (basically, GATE API needs
>>>>>> to be replaced with UIMA API to generate annotation) the two-pass
>>>>>> recognition method for cTAKES - which has a wider application on
>>>>>> longitudinal data.
>>>>>> This method is used on top of a number of NERs.
>>>>> I'll take a look.
>>>>>
>>>>> I do not know how much time I can invest this month. Let's see how
>>>>> many phases I can translate.
>>>>> I added the rules for age. Are there jape rules for creating date
>>>>> annotations?
>>>> No. I believe cTAKES has existing component(s) to capture dates?
>>>>
>>>>> After all rules are translated, they need some major refactoring.
>>>>> Jape and Ruta are quite different in some aspects.
>>>> Ok.
>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Please let me know where I can help. I will be available again in
>>>>>> April.
>>>>>> Cheers,
>>>>>> Azad
>>>>>>
>>>>>> On 10 March 2016 at 13:13, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> sorry, I was quite busy last month.
>>>>>>>
>>>>>>> I added a new patch, which needs to be applied.
>>>>>>>
>>>>>>> No new rules, but it's possible now to evaluate everything against
>>>>>>> the labelled data of the challenge.
>>>>>>>
>>>>>>> @Azad:
>>>>>>> Which documents exactly did you use to develop the rules?
>>>>>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or
>>>>>>> testing-PHI-Gold-fixed?
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> the last patch fixed almost all problems.
>>>>>>>>
>>>>>>>> I added another one that adds the csv file for the unit test and
>>>>>>>> extends svn-ignore.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I added another patch. I forgot to manually add one test file to
>>>>>>>>> version control, and there are still duplicate lines.
>>>>>>>>> I hope this patch fixes the remaining problems.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> the problems were caused by the svn client in my Eclipse. Sorry
>>>>>>>>>> for the trouble, I should have looked more closely at the complete
>>>>>>>>>> patch.
>>>>>>>>>>
>>>>>>>>>> I attached a new patch created with command-line tools, which
>>>>>>>>>> looks correct now.
>>>>>>>>>>
>>>>>>>>>> Pei, can you apply the new patch?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
>>>>>>>>>>> Thanks Pei.
>>>>>>>>>>>
>>>>>>>>>>> I fear there was again a problem with the patch. All new files
>>>>>>>>>>> are missing (and also the svn-ignore settings).
>>>>>>>>>>>
>>>>>>>>>>> Can you take a look?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
>>>>>>>>>>>> patch applied.
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Pei
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>> Hi Pei,
>>>>>>>>>>>>>
>>>>>>>>>>>>> can you commit the recent patch for us?
>>>>>>>>>>>>>
>>>>>>>>>>>>> CTAKES-384-20160120.patch
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> Sorry I was swamped recently.
>>>>>>>>>>>>>> But yeah, we can even create an extended type system to store
>>>>>>>>>>>>>> these items temporarily and add them into the main/core type
>>>>>>>>>>>>>> system afterwards.
>>>>>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed- it will
>>>>>>>>>>>>>> require much more testing.  If it works, we can upgrade it in our
>>>>>>>>>>>>>> sandbox area or create a branch if necessary.
>>>>>>>>>>>>>> —Pei
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> a new patch is attached.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @Pei:
>>>>>>>>>>>>>>> are there suitable annotation types in the cTAKES type system?
>>>>>>>>>>>>>>> Some project in cTAKES uses something like OntologyMatch... I
>>>>>>>>>>>>>>> map it to IdentifiedAnnotation right now, but there are many
>>>>>>>>>>>>>>> empty features...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @Azad:
>>>>>>>>>>>>>>> I changed the rules a bit, especially the capitalization, to
>>>>>>>>>>>>>>> match how I normally use it in Ruta. The wordlists are compiled
>>>>>>>>>>>>>>> to a trie by the maven plugin. I also added the two regexes for
>>>>>>>>>>>>>>> url and email. I extended the regex for the url. I also changed
>>>>>>>>>>>>>>> the evaluation order of some rules (with @). Feel free to add
>>>>>>>>>>>>>>> simple examples to examples.csv for the unit tests.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know if you need more information about the changes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you want help with the other rule sets? Or should we split
>>>>>>>>>>>>>>> them up?
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> great. I will integrate them in the project and in the next
>>>>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>>>>>>>>>>>>>>>> Three NERs translated and uploaded.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> PS. I will validate all NERs once we have them all completed.
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehg...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any
>>>>>>>>>>>>>>>>>> more volunteers for translating JAPE to RUTA, please get in
>>>>>>>>>>>>>>>>>> touch.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it.
>>>>>>>>>>>>>>>>>>> Unfortunately, there is just no spare time right now. I hope
>>>>>>>>>>>>>>>>>>> I will be able to provide the patches in December.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>>>>> I think ctakes-examples is probably a good starting point,
>>>>>>>>>>>>>>>>>>>> at least in terms of maven modules, etc.  I think it would
>>>>>>>>>>>>>>>>>>>> be good if we use uimaFIT style as the primary approach to
>>>>>>>>>>>>>>>>>>>> wiring components together and generate desc's as
>>>>>>>>>>>>>>>>>>>> secondary...
>>>>>>>>>>>>>>>>>>>> I think the choice of components is probably best left up
>>>>>>>>>>>>>>>>>>>> to what is actually required for the best performing
>>>>>>>>>>>>>>>>>>>> c-deid.  The output would be interesting; I'm not sure if
>>>>>>>>>>>>>>>>>>>> we should treat this as an independent preprocessing
>>>>>>>>>>>>>>>>>>>> component or part of a pipeline (in which case, we may need
>>>>>>>>>>>>>>>>>>>> to propose a change to the type system or perhaps an
>>>>>>>>>>>>>>>>>>>> alternative JCas view.  You can probably open up that
>>>>>>>>>>>>>>>>>>>> discussion to the dev group as you see fit.)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> My 2 cents...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of
>>>>>>>>>>>>>>>>>>>>> how the cTAKES community develops or what a project should
>>>>>>>>>>>>>>>>>>>>> look like?
>>>>>>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects in
>>>>>>>>>>>>>>>>>>>>> quite different manners, and I do not want to get inspired
>>>>>>>>>>>>>>>>>>>>> by "some sort of out-dated" approach in the cTAKES repo.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Are there restrictions or preferences about the
>>>>>>>>>>>>>>>>>>>>> preprocessing components that should be used and the kind
>>>>>>>>>>>>>>>>>>>>> of "output" of the project?
>>>>>>>>>>>>>>>>>>>>> Components: On which components may the components rely:
>>>>>>>>>>>>>>>>>>>>> tokenizer, ... parser, ... dict lookup?
>>>>>>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a
>>>>>>>>>>>>>>>>>>>>> single AE?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> More comments below.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid
>>>>>>>>>>>>>>>>>>>>>>> duplicate work and to coordinate the efforts ...
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if
>>>>>>>>>>>>>>>>>>>>> you want, or wait until I set up the project with ruta
>>>>>>>>>>>>>>>>>>>>> integration.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for
>>>>>>>>>>>>>>>>>>>>>>> the initial development, and if yes, is it possible to
>>>>>>>>>>>>>>>>>>>>>>> contribute it too?
>>>>>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>>>>>>>>>>>>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.i2b2.org_NLP_DataSets_Main.php&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=1Qpd4A2PgVD13w31PkkvmJf6I0PTCatCzgBgsnetPOg&s=aAEeORyMtz7NCv-6EEgiABVY_Rf6zLnJghQh2DA_CKQ&e=>
>>>>>>>>>>>>>>>>>>>>>> typically releases the data sets 12 months after a given
>>>>>>>>>>>>>>>>>>>>>> challenge; this is done on an individual basis and
>>>>>>>>>>>>>>>>>>>>>> involves a Data Use Agreement.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the
>>>>>>>>>>>>>>>>>>>>>> validation.
>>>>>>>>>>>>>>>>>>>>> Ok, I'll investigate whether we already have access to the
>>>>>>>>>>>>>>>>>>>>> dataset here.
>>>>>>>>>>>>>>>>>>>>>>> My first step would be:
>>>>>>>>>>>>>>>>>>>>>>> - set up a maven project
>>>>>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES
>>>>>>>>>>>>>>>>>>>>>>> components replacing the previous ANNIE preprocessing)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party
>>>>>>>>>>>>>>>>>>>>>>> lib jars that were included to ensure compatibility.
>>>>>>>>>>>>>>>>>>>>>>> I’ll be sure to take a look at that over the next few
>>>>>>>>>>>>>>>>>>>>>>> weeks.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> —Pei
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there should
>>>>>>>>>>>>>>>>>>>>>> not be a need to worry about the 3rd party libs.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent
>>>>>>>>>>>>>>>>>>>>>> component for the Two Pass recognition (TwoPass.java) as
>>>>>>>>>>>>>>>>>>>>>> this method has proven useful for general NER on
>>>>>>>>>>>>>>>>>>>>>> longitudinal data and is surely useful independently of
>>>>>>>>>>>>>>>>>>>>>> the deid component.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>>>>>>
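
For reference, the scores mentioned in this thread (p, r and f1 against the labelled challenge data) can be sketched as below. This is only an illustration under stated assumptions and is not the logic of I2B2Evaluation.java: the class SpanEvaluation, its Span and Scores types, and the exact-match criterion (same offsets and same PHI type) are all hypothetical names and choices made for the example.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical helper: exact-match precision/recall/F1 over PHI spans. */
public class SpanEvaluation {

    /** A PHI annotation given by character offsets and a category label. */
    public static final class Span {
        public final int begin;
        public final int end;
        public final String type;

        public Span(int begin, int end, String type) {
            this.begin = begin;
            this.end = end;
            this.type = type;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) {
                return false;
            }
            Span other = (Span) o;
            return begin == other.begin && end == other.end
                    && Objects.equals(type, other.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(begin, end, type);
        }
    }

    /** Raw counts plus the derived scores. */
    public static final class Scores {
        public final int truePositives;
        public final int falsePositives;
        public final int falseNegatives;

        Scores(int tp, int fp, int fn) {
            this.truePositives = tp;
            this.falsePositives = fp;
            this.falseNegatives = fn;
        }

        public double precision() {
            int denom = truePositives + falsePositives;
            return denom == 0 ? 0.0 : (double) truePositives / denom;
        }

        public double recall() {
            int denom = truePositives + falseNegatives;
            return denom == 0 ? 0.0 : (double) truePositives / denom;
        }

        public double f1() {
            double p = precision();
            double r = recall();
            return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
        }
    }

    /** Counts a predicted span as correct only if an identical gold span exists. */
    public static Scores evaluate(Collection<Span> gold, Collection<Span> predicted) {
        Set<Span> goldSet = new HashSet<>(gold);
        Set<Span> predictedSet = new HashSet<>(predicted);
        int tp = 0;
        for (Span p : predictedSet) {
            if (goldSet.contains(p)) {
                tp++;
            }
        }
        int fp = predictedSet.size() - tp; // predicted but not in gold
        int fn = goldSet.size() - tp;      // in gold but not predicted
        return new Scores(tp, fp, fn);
    }
}
```

Calling SpanEvaluation.evaluate(goldSpans, predictedSpans) once per document and summing the three counts before computing the scores would give micro-averaged figures; a more lenient evaluation could relax the equality check to overlapping offsets.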
