Hi Britt,

This did come up briefly wrt NCI custom cuis, but there was no urgency so it 
got shoved onto a back burner.  I’m glad that you brought it up again as it is 
something that I’ve been wanting to enhance.

My thought at the moment is to change CuiCodeUtil (…lookup2.util.) from a 
non-instantiable utility class to a singleton.  In the singleton we could keep 
an array of String prefixes.  On each conversion of a custom cui from String 
to long, store each unique String prefix in the array and add 
[arrayIndex*10000000] to the long version of the cui.  That should allow for 
more than enough custom cui prefixes.  On conversion back to the String 
version, look up the prefix at index long/10000000 and prepend it to the 
remaining digits.  Of course, this would work just fine for single-character 
prefixes with a 7 digit cui.  Multiple-character prefixes might require an 
array of (Integer, prefix) pairs, where each integer is the number of digits 
in the cui:  NLM003 ->  3,“NLM”.
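
A rough sketch of the idea, to make the arithmetic concrete.  This is not the 
existing CuiCodeUtil; the class name PrefixedCuiCodec and the methods toCode 
and toCui are made up for illustration, and pre-registering “C” at index 0 (so 
that standard cuis keep their current long values) is an assumption on top of 
the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the prefix-array scheme; not the real CuiCodeUtil.
final class PrefixedCuiCodec {
    // 10^7: leaves room for up to 7 digits below the prefix index
    private static final long PREFIX_MULTIPLIER = 10_000_000L;
    private static final int MAX_DIGITS = 7;
    private static final PrefixedCuiCodec INSTANCE = new PrefixedCuiCodec();

    // Pairs a prefix with the number of digits that follow it, e.g. ("NLM", 3)
    private static final class PrefixEntry {
        final String prefix;
        final int digitCount;

        PrefixEntry(String prefix, int digitCount) {
            this.prefix = prefix;
            this.digitCount = digitCount;
        }
    }

    private final List<PrefixEntry> entries = new ArrayList<>();

    private PrefixedCuiCodec() {
        // Assumption: index 0 is reserved for standard "C" + 7 digit cuis
        entries.add(new PrefixEntry("C", MAX_DIGITS));
    }

    static PrefixedCuiCodec getInstance() {
        return INSTANCE;
    }

    // String -> long: prefixIndex * 10^7 + numeric part
    synchronized long toCode(String cui) {
        int split = 0;
        while (split < cui.length() && !Character.isDigit(cui.charAt(split))) {
            split++;
        }
        String prefix = cui.substring(0, split);
        String digits = cui.substring(split);
        if (digits.isEmpty() || digits.length() > MAX_DIGITS) {
            throw new IllegalArgumentException("Unsupported cui: " + cui);
        }
        // Throws NumberFormatException if a non-digit follows the first digit
        long number = Long.parseLong(digits);
        return indexOf(prefix, digits.length()) * PREFIX_MULTIPLIER + number;
    }

    // long -> String: look up the prefix by code / 10^7, zero-pad the remainder
    synchronized String toCui(long code) {
        long index = code / PREFIX_MULTIPLIER;
        if (code < 0 || index >= entries.size()) {
            throw new IllegalArgumentException("Unknown cui code: " + code);
        }
        PrefixEntry entry = entries.get((int) index);
        String format = "%0" + entry.digitCount + "d";
        return entry.prefix + String.format(Locale.ROOT, format, code % PREFIX_MULTIPLIER);
    }

    // Returns the index of the (prefix, digitCount) pair, registering it if new
    private int indexOf(String prefix, int digitCount) {
        for (int i = 0; i < entries.size(); i++) {
            PrefixEntry entry = entries.get(i);
            if (entry.digitCount == digitCount && entry.prefix.equals(prefix)) {
                return i;
            }
        }
        entries.add(new PrefixEntry(prefix, digitCount));
        return entries.size() - 1;
    }
}
```

In this sketch the long value of a custom cui depends on the order in which 
prefixes are first seen, so the codes are only stable within one run unless the 
prefix array is populated in a fixed order.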

I think that the array of pairs would work – it would just need to be carried 
through and tested.  Any thoughts?

Sean

From: britt fitch [mailto:britt.fi...@wiredinformatics.com]
Sent: Wednesday, July 08, 2015 2:23 PM
To: dev@ctakes.apache.org
Subject: dictionary-look-fast fails to handle alternative CUIs

This is largely directed to Sean but open to other feedback as well.

The current fast lookup using a BSV parses the first field as “C” and up to 7 
numerals, padding with “0” as needed to reach that length when applicable [see 
CuiCodeUtil.getCuiCode(String)].

The CUI string is then substring’d from 1 to len and parsed as a Long.

This is producing issues with other related, but separate, ontologies (MedGen) 
where the bulk of concepts use UMLS CUIs but some additional concepts were 
created by the NCBI where no CUI previously existed.
These MedGen-specific concepts are created with a prefix “CN” + 6 numerals, 
resulting in “N123456” failing to produce a Long.
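
The failure can be reproduced in isolation with a small stand-in for the 
parsing described above.  This is only a sketch of that behavior, not the 
actual CuiCodeUtil source; the class NaiveCuiParse and its methods are 
hypothetical.

```java
// Hypothetical stand-in for the described parsing; not the real CuiCodeUtil.
final class NaiveCuiParse {
    private NaiveCuiParse() {
    }

    // Drops the first character (assumed to be "C") and parses the rest as a long
    static long parse(String cui) {
        return Long.parseLong(cui.substring(1));
    }

    // True if the cui survives the naive parse, false if it throws
    static boolean parses(String cui) {
        try {
            parse(cui);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Here parse("C0000123") succeeds, while parse("CN123456") throws a 
NumberFormatException because the remaining text “N123456” is not numeric.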

I wanted Sean’s thoughts on this, and some feedback on whether others are 
running into this issue and whether the community wants a way to support 
CUI formats beyond the standard C + 7 numerals.

I’m happy to make these edits and check them in, whether that means updating the 
CuiCodeUtil class or creating an entirely new BSVConceptFactory if that’s what 
makes the most sense.

Thoughts?

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com
