Hi Abhishek,

You have some interesting timing ... I can give you the xml specifications that you require if you send me the format of your dictionary.

Since you are new to the current dictionary module setup, I might also have a simpler solution for you ... A couple of days ago I checked a new module into Sandbox called ctakes-dictionary-lookup2 (how novel a name). It is a complete replacement of the current dictionary lookup module, but both can sit side-by-side in your local trunk sandbox or build.

It has an example descriptor that tells it to read a bar-separated value file (BSV) as a dictionary, storing it (indexed) in memory for fast lookup. There is an example dictionary and xml descriptor for that dictionary. It accepts 2 or 3 column files in the format CUI|Text or CUI|TUI|Text. It automatically detects the number of columns, but they must be in that order. It also does not need the text fields to be tokenized, allowing it to accept "Tumor, malignant" as well as "tumor , malignant" as it will perform the tokenization upon reading the file.

As the dictionary will be stored in-memory it should not be huge. If you do have a very large number of terms (>50k) then I recommend an hsql db. The new module will take an hsql db with the fixed field names CUI, TUI, RINDEX, TCOUNT, TEXT, RWORD. I will explain what those mean in some documentation that I plan to check into sandbox later today, but I can help you build an hsql dictionary db ...

Yesterday I checked into sandbox a project named "dictionarytool". It is source-only, but I can give you a jar if you want one. Out-of-the-box it will build various dictionaries from a UMLS download. It can build BSV, Hsql (new format) and Hsql (current format) to be used by the new or current dictionary lookup modules.

This devlist announcement is a little premature on my part. I will not get usage documentation into sandbox for a day or two, but I can send you copies as I go if you are in a hurry, or just give you xml snippets for the current module descriptors. If you send the format of your dictionary then that can be done quickly.
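
In case it helps to see the BSV format described above concretely, here is a rough Python sketch of reading 2- or 3-column lines and tokenizing the text field on load. It is only an illustration, not the module's actual code: the function names, the regex tokenizer, the lowercasing, the per-line column detection and the sample CUI/TUI values are all assumptions made for the example.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase, then split words and punctuation
    # into separate tokens. The real module may tokenize differently.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def parse_bsv_line(line):
    # Detect 2 columns (CUI|Text) or 3 columns (CUI|TUI|Text).
    # The column order is fixed, as described in the message.
    parts = [p.strip() for p in line.rstrip("\n").split("|")]
    if len(parts) == 2:
        cui, text = parts
        tui = None
    elif len(parts) == 3:
        cui, tui, text = parts
    else:
        raise ValueError("expected 2 or 3 bar-separated columns: %r" % line)
    return cui, tui, tokenize(text)


def load_bsv(lines):
    # Build an in-memory index keyed by the tokenized term.
    index = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cui, tui, tokens = parse_bsv_line(line)
        index.setdefault(tuple(tokens), []).append((cui, tui))
    return index


# Illustrative entries only; the CUI and TUI values are placeholders.
sample = [
    "C0006826|Tumor, malignant",
    "C0006826|T191|tumor , malignant",
]
index = load_bsv(sample)
```

With this sketch both sample lines end up under the same key, because "Tumor, malignant" and "tumor , malignant" tokenize identically.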

I just wanted to let you know that there is another option wrt dictionary lookup.

Sean

-----Original Message-----
From: Abhishek De [mailto:abhishek...@alumnux.com]
Sent: Friday, February 28, 2014 6:58 AM
To: dev@ctakes.apache.org
Subject: How to add a new dictionary database to cTAKES

Hi,

How do I add a new database to the cTAKES pipeline to perform lookup from? How do I specify what columns to look up and how to annotate the text with the returned hits?

I have gone through the DictionaryLookupAnnotatorDB.xml and LookupDesc_Db.xml files. However, I could not understand the meanings of the terms like "lookupField", "metaField", "maxPermutationLevel" and "exclusionTags". If I add a new database, I need to configure this xml file properly. Please guide me regarding these problems.

Thanks and Regards,
Abhishek De