Hi -

My question is a little different. I'm happy with a solution that works 
through cTAKES, through UMLS lookups, or through lookups in other published 
databases. At this point I don't know whether this can be solved with machine 
learning at all.

Problem:
I've been asked to find out whether the following is possible:
"Given a pharma regulatory document (say, a searchable PDF) related to one or 
more drugs, predict the corresponding 'Primary Compound ID'."

A primary compound ID can have the format 
<<pharma company name>>-<<numeric digits>>-<<two- or three-letter abbreviation>>.

To keep the scenario simple, consider the following case:
Primary Compound ID: CNTO148.
This deviates from the format above. If we split this ID, CNTO represents the 
pharma company (Centocor Biotech, Inc.). I don't know what the number 148 
represents.
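To make the format above concrete, here is a minimal sketch that splits a candidate ID into its parts with a regular expression. The pattern is an assumption based only on the format described in this message (hyphens and the letter suffix are treated as optional so that compact IDs like CNTO148 also match); it is not a registered standard, and `parse_compound_id` and the ID `ABC-1234-AAB` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for: <company prefix>-<digits>-<2 or 3 letter suffix>.
# Hyphens and the suffix are optional so compact IDs such as CNTO148 match too.
COMPOUND_ID = re.compile(
    r"^(?P<prefix>[A-Z]+)-?(?P<number>\d+)(?:-(?P<suffix>[A-Z]{2,3}))?$"
)


class CompoundId(NamedTuple):
    prefix: str            # e.g. a company code such as CNTO
    number: str            # numeric part, kept as text to preserve leading zeros
    suffix: Optional[str]  # two- or three-letter abbreviation, if present


def parse_compound_id(text: str) -> Optional[CompoundId]:
    """Split a candidate ID into prefix, number and optional suffix, or return None."""
    m = COMPOUND_ID.match(text.strip().upper())
    if m is None:
        return None
    return CompoundId(m.group("prefix"), m.group("number"), m.group("suffix"))


print(parse_compound_id("CNTO148"))
print(parse_compound_id("ABC-1234-AAB"))  # hypothetical ID in the full format
print(parse_compound_id("SIMPONI"))       # a trademark has no digits, so None
```

This only recognises the shape of an ID; it says nothing about what the prefix, number or suffix mean, which is the open question below.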

However, CNTO148 is the pre-marketing name given during the clinical trial 
phases. Its actual trademark is "SIMPONI" and its International Non-proprietary 
Name (INN) is "Golimumab". The condition mentioned for this drug is 
'Rheumatoid Arthritis'.

Question:
If I use cTAKES to identify the product as "SIMPONI" and the indication as 
'Rheumatoid Arthritis', is there some mechanism to identify or derive its 
'Primary Compound ID' (in this case CNTO148, sometimes also called the 
'Controlling Product')?

My analysis:
If I query the ClinicalTrials.gov data by drug name, I can find the 
corresponding 'Primary Compound ID' that was used during the clinical study. 
However, this ID is not available for every drug product in the 
ClinicalTrials.gov database. I'm looking for a consistent way to derive the 
'Primary Compound ID', assuming these IDs are registered anywhere.
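The ClinicalTrials.gov lookup described above can be sketched as scanning a study record's intervention names and synonyms for ID-shaped tokens. This is a sketch under assumptions: the nested field names (`protocolSection`, `armsInterventionsModule`, `interventions`, `name`, `otherNames`) are modelled on the ClinicalTrials.gov v2 JSON and should be checked against the current API documentation, the regex is a heuristic, and `candidate_compound_ids` is a hypothetical helper. No network call is made; the record below is a hand-built example.

```python
import re
from typing import Any, Dict, List

# Heuristic: 2-6 capital letters, optional hyphen, 2-6 digits, optional -XX/-XXX.
CANDIDATE = re.compile(r"\b[A-Z]{2,6}-?\d{2,6}(?:-[A-Z]{2,3})?\b")


def candidate_compound_ids(study: Dict[str, Any]) -> List[str]:
    """Collect compound-ID-like tokens from intervention names and other names."""
    interventions = (
        study.get("protocolSection", {})
        .get("armsInterventionsModule", {})
        .get("interventions", [])
    )
    found: List[str] = []
    for item in interventions:
        names = [item.get("name", "")] + item.get("otherNames", [])
        for name in names:
            for token in CANDIDATE.findall(name.upper()):
                if token not in found:  # keep first-seen order, drop duplicates
                    found.append(token)
    return found


# Hand-built record in the assumed shape (not real API output).
study = {
    "protocolSection": {
        "armsInterventionsModule": {
            "interventions": [
                {"name": "Golimumab", "otherNames": ["CNTO148", "SIMPONI"]}
            ]
        }
    }
}
print(candidate_compound_ids(study))
```

When a sponsor never registered the development code as an intervention name or synonym, this returns an empty list, which is exactly the gap described above.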

Other questions:
What do the abbreviations used in the 'Primary Compound ID' (the two- or 
three-letter abbreviation in the format defined above) mean?
Some example abbreviations (there are many more):

* AAB
* AC
* AN
* AAA
* AAC
* AMK
* ZBR
* AER
* AEN

Is there a vocabulary where these are listed that I could study?

Thanks
Sekhar Hari | AI Program Lead | Health Sciences R&D | Asia Pacific Solutions 
Delivery Center
+91 814 7027 779 (C)
