Thanks for the insight, Sean.

On Wednesday, July 17, 2019, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> Hi All,
>
> ctakes-scrubber is not in any ctakes release and it is not in the main
> repository. It never went beyond experimental and resides within the
> ctakes sandbox. https://svn.apache.org/repos/asf/ctakes/sandbox/
>
> From what I recall, scrubber does not have "real" name replacement, but
> instead de-identifies entities by removing them and inserting a tag
> indicating the type of entity. For instance: "John has a rash" ->
> "[person] has a rash". That is not verbatim, but it is the general idea.
>
> If you can get ctakes-scrubber working in your project then it would be
> pretty easy to create an engine that does nothing except replace such
> generic tags with random names, dates, institutions, etc.
>
> Sean
> ________________________________________
> From: gandhi rajan <gandhiraja...@gmail.com>
> Sent: Wednesday, July 17, 2019 12:26 PM
> To: dev@ctakes.apache.org
> Subject: Re: Synthetic replacement feature in cTAKES Scrubber [EXTERNAL]
>
> Hi Masoud, we had a similar requirement to identify patient names in
> narrative text, and I had a discussion with Sean Finan about a patient
> name identification feature in cTAKES. What he told me at the time was
> that cTAKES did not support patient name identification. Also, I'm not
> sure whether scrubber ever made it into the cTAKES codebase.
>
> Sean, please correct me if I'm wrong.
>
> On Wednesday, July 17, 2019, Masoud Rouhizadeh <m...@jhu.edu> wrote:
>
> > Dear cTAKES developer,
> > This is Masoud Rouhizadeh from JHU. I'm leading the NLP effort at the
> > Institute for Clinical and Translational Research and work on
> > enterprise-level NLP projects at Johns Hopkins Medicine. One of the major
> > goals we are targeting is de-identification of a large number of notes
> > (350M) to prepare them for search and indexing (Elasticsearch and Solr). I
> > have been in touch with Dr. Guergana Savova about cTAKES Scrubber and she
> > has been very helpful.
> >
> > One of our most desired features in the de-identification pipeline is
> > synthetic replacement (e.g. Nancy->Sally; random female first name
> > consistently replaces a female first name.). I wasn't able to find
> > information about this feature in cTAKES Scrubber. Is synthetic replacement
> > functionality part of the cTAKES Scrubber, or can it be added by
> > post-processing the output? For instance, if we know the name Nancy is
> > removed from multiple places, can we use a name dictionary to insert random
> > female first names in those places (just a thought)?
> > Overall, I wanted to emphasize that cTAKES Scrubber is one of our main
> > candidates and I'm hoping that we could find ways to collaborate.
> >
> > Thank you very much,
> > Masoud
> >
> > ----
> > Masoud Rouhizadeh, PhD
> > Faculty - Division of Health Science Informatics (DHSI)
> > NLP Lead - Institute for Clinical and Translational Research (ICTR)
> > Johns Hopkins University School of Medicine
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.cs.jhu.edu_-7Emrou_&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=aIXsCuGWJqYNNtMb1ZfvZ0gAiw57gtrpZGqLVZjn5o4&s=9mLpsY5OPs7_sAMhA60kB0PJcsttBBK6BYRN_xThZSo&e=
> >
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others !!!"

--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"
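
For reference, the post-processing idea discussed in this thread (swapping generic tags for random surrogates) could look roughly like the sketch below. It is not part of ctakes-scrubber. The "[person]" tag format follows Sean's non-verbatim example, and the surrogate pools, fill_tags and ConsistentSurrogates are all hypothetical names made up for illustration. Note that consistent replacement (Nancy always becoming the same name) needs the original value, which a tag-only output no longer carries, so the second helper assumes the original is still available at replacement time.

```python
import random
import re

# Hypothetical surrogate pools; a real pipeline would load larger dictionaries.
SURROGATES = {
    "person": ["Sally", "Maria", "Priya", "Tom"],
    "institution": ["General Hospital", "Riverside Clinic"],
    "date": ["2001-01-01", "2002-02-02"],
}

# Assumed tag format: a lowercase entity type in square brackets, e.g. "[person]".
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def fill_tags(text, rng=None):
    """Replace each known [type] tag with a random surrogate of that type."""
    if rng is None:
        rng = random.Random()

    def substitute(match):
        pool = SURROGATES.get(match.group(1))
        # Leave unknown tags untouched rather than guessing a replacement.
        return rng.choice(pool) if pool else match.group(0)

    return TAG_PATTERN.sub(substitute, text)


class ConsistentSurrogates:
    """Give each original value one fixed surrogate for the object's lifetime."""

    def __init__(self, rng=None):
        self._rng = rng if rng is not None else random.Random()
        self._seen = {}

    def surrogate_for(self, kind, original):
        # Case-insensitive key so "Nancy" and "nancy" share one surrogate.
        key = (kind, original.lower())
        if key not in self._seen:
            self._seen[key] = self._rng.choice(SURROGATES[kind])
        return self._seen[key]
```

Calling fill_tags("[person] has a rash") returns the sentence with a random name from the pool in place of the tag. With pools this small, two different originals can end up with the same surrogate; whether that matters depends on the use case.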