Hi - My question is a little different. I'm fine with any way of solving this puzzle, whether through cTAKES, through UMLS lookups, or through lookups in other published databases. At this time, I really don't know whether it can be solved with machine learning algorithms.
Problem: I've been asked to find out whether the following is possible: "Given a pharma regulatory document (say, a searchable PDF document) related to one or more drugs, predict the corresponding 'Primary Compound ID'."

The format of a Primary Compound ID could be: <<pharma company name>>-<<numeric digits>>-<<two- or three-letter abbreviation>>.

To make the scenario easier, I'll consider the following case:

Primary Compound ID: CNTO148

This deviates from the format above. If we split this ID, CNTO represents the pharma company (Centocor Biotech, Inc). I don't know what the number 148 represents. However, CNTO148 is the pre-marketing name given during the clinical trial phases. Its actual trademark is "SIMPONI" and its International Nonproprietary Name (INN) is "Golimumab". The condition mentioned for this drug is 'Rheumatoid Arthritis'.

Question: If I can use cTAKES to identify the product as "SIMPONI" and the indication as 'Rheumatoid Arthritis', is there a way to identify or derive its 'Primary Compound ID' (in this case CNTO148, sometimes also called the 'Controlling Product') through some mechanism?

My analysis: If I query the ClinicalTrials.gov data using the drug name, I'm able to find the corresponding 'Primary Compound ID' that was used during the clinical study. But the ClinicalTrials.gov database does not have this ID for all drug products. I'm looking for a consistent way to derive the 'Primary Compound ID', if these IDs are registered anywhere.

Other questions: What do the abbreviations used in the 'Primary Compound ID' mean (the two- or three-letter abbreviation in the format defined above)? Some example abbreviations (there are many more):

* AAB
* AC
* AN
* AAA
* AAC
* AMK
* ZBR
* AER
* AEN

Is there a vocabulary where these are listed that I could study?

Thanks
Sekhar Hari | AI Program Lead | Health Sciences R&D | Asia Pacific Solutions Delivery Center
+91 814 7027 779 (C)
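
P.S. To make the ID format described above concrete, here is a minimal Python sketch of how such an ID might be split into its parts. The regular expression, the length limits, and the names parse_compound_id and candidate_ids are my own assumptions for illustration, not a published standard, and real IDs may well deviate from them.

```python
import re

# Assumed shape: <company prefix><optional separator><digits><optional -suffix>
# The separators and length limits are guesses based on the examples in this
# message (CNTO148 and the <<company>>-<<digits>>-<<abbreviation>> format).
COMPOUND_ID = re.compile(
    r"^(?P<company>[A-Z]{2,6})[-\s]?(?P<number>\d{2,8})(?:-(?P<suffix>[A-Z]{2,3}))?$"
)


def parse_compound_id(text):
    """Return the parts of a compound-ID-like string, or None if it does not match."""
    match = COMPOUND_ID.match(text.strip().upper())
    if match is None:
        return None
    return match.groupdict()


def candidate_ids(names):
    """Keep only the names that look like compound IDs under the assumed pattern."""
    return [name for name in names if parse_compound_id(name) is not None]


if __name__ == "__main__":
    print(parse_compound_id("CNTO148"))
    # {'company': 'CNTO', 'number': '148', 'suffix': None}
    print(candidate_ids(["SIMPONI", "Golimumab", "CNTO 148", "CNTO-148-AAB"]))
    # ['CNTO 148', 'CNTO-148-AAB']
```

Something like this could filter the synonyms returned by a lookup (for example, the other names listed for an intervention in ClinicalTrials.gov) down to strings that look like compound codes. It cannot derive an ID that is not registered anywhere, which is the part I am asking about.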