On 3/2/07, Will Ross <[EMAIL PROTECTED]> wrote:
> I'm looking for a tool to suppress sensitive information (e.g., HIV
> status, etc.) from free text clinical notes
Will:

In general, this falls under natural language processing (NLP) and involves specific steps such as tokenizing, chunking, part-of-speech tagging, classifying (e.g. to vocabularies or concept identifiers), matching, and anonymizing/de-identifying.

The two tools I've worked with a little in the past are:

1) GATE - General Architecture for Text Engineering, developed by the NLP group at the University of Sheffield in the UK
   http://gate.ac.uk/

2) Project DIAsDEM and the DIAsDEM workbench, developed by a great group, including Karsten Winkler, in Germany
   http://wwwiti.cs.uni-magdeburg.de/~graubitz/diasdem/

These are more or less suites of tools, each containing a number of individual components that perform the various NLP tasks in some sequence (steps can be recorded and set as a macro or batch), with customization for language-specific knowledge bases (e.g. names derived from a US Census database) or UMLS concept identifiers.

Two example projects using these kinds of tools are the caTIES application (http://caties.cabig.upmc.edu/index.html), for extraction and classification of free text pathology reports, and the RODS biosurveillance application, also from the University of Pittsburgh (http://openrods.sourceforge.net/), which takes free text and ICD-9 data for syndromic classification.

A specific NLP approach to HIV/AIDS notes, performed at Columbia using MedLEE (the Medical Language Extraction and Encoding System), is described here:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1480114

~ Stuart

--
Dr. Stuart Turner
Health Informatics Graduate Program and
Biomedical Informatics Research & Consulting Service
University of California Davis Health System
http://www.ucdmc.ucdavis.edu/informatics
http://www.ucdmc.ucdavis.edu/bircs
UCDHS-ASB
2450 48th St, Suite 2685
Sacramento, CA 95817
916.734.3857 (voice) | 916.734.3975 (fax)
916.873.4325 (cell) | stuart.turner.ucdavis (Skype)
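
P.S. To make the matching and de-identifying steps mentioned above a bit more concrete, here is a minimal Python sketch of dictionary-based suppression. It is not how GATE, DIAsDEM, or MedLEE work internally. The term list, the `suppress` function, and the `[SUPPRESSED]` mask are all hypothetical names chosen for illustration, and a real pipeline would match against a curated vocabulary such as UMLS concepts rather than a three-item list.

```python
import re

# Hypothetical term list, for illustration only. A real system would use a
# curated vocabulary (e.g. UMLS concept identifiers), not a hand-typed list.
SENSITIVE_TERMS = ["HIV", "AIDS", "hepatitis C"]

# \b word boundaries stop "HIV" from matching inside a longer token.
# The optional tail also consumes a following "positive"/"negative"
# (separated by spaces or a hyphen), so the status itself is masked too.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in SENSITIVE_TERMS) + r")\b"
    r"(?:[\s-]*(?:positive|negative)\b)?",
    re.IGNORECASE,
)


def suppress(note: str, mask: str = "[SUPPRESSED]") -> str:
    """Replace each sensitive mention in a free-text note with a mask."""
    return _PATTERN.sub(mask, note)


if __name__ == "__main__":
    print(suppress("Pt is HIV positive, on HAART."))
```

Plain string matching like this is easy to fool: because it ignores case, it would also mask the "aids" in "hearing aids", and it misses misspellings and indirect mentions (the drug regimen in the example is left untouched). That gap is the reason to use the tokenizing, tagging, and concept-classification components in the toolkits above instead of a bare pattern match.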