Re: [gpc-informatics] #109: mapping to PCORI CDM (aka mini-sentinel data model)

Klann, Jeffrey G. Wed, 11 Jun 2014 14:19:29 -0700

Dan, sounds good, I spent some time sync’ing up with your version
(pathnames only) and here are the results of the coin toss afterward:


1. DATE and TIME fields are implicit in my version. I added them to mine,
but I left them hidden. It is more queryable if they are implicit, but I
agree it’s nice to have them in your face if this is an interoperability
widget.

2. ENROLLMENT\CHART is not a modifier in your version. This is somewhat
arbitrary, but it’s a little cleaner as a modifier I think. If it is a
fact and not a modifier, you can’t do a query in the UI for ‘all with an
enrollment start greater than 5/5/2005 who have CHART:Y during that
enrollment event’. (It’s the ‘during that enrollment event’ part you need
it to be a modifier for.)

3. DIAGNOSIS\DX_TYPE\ and PROCEDURE\PX_TYPE\ are compressed to just
DIAGNOSIS\ and PROCEDURE\ in my version, with the types underneath. This
just eliminates an unnecessary level of the tree and is easier to use.

4. PROCEDURE\DRG_TYPE\ has just DRGs under it (and now I added null
flavors), not 01\ and 02\. Perhaps I need to change this because it’s
inconsistent with everything else. But I thought it would be cleaner to
create a combined tree. Probably not for this version, though.

5. My version has ORIGDX and ORIGPX underneath Diagnoses and Procedures,
respectively (and the null flavor codes are in this subtree). Shawn
suggested this when he saw “Local Homegrown” under procedures - that’s
disappeared but this could facilitate populating the RAW_ fields. I’m
still pondering this one and how it’d work. We don’t expect sites to
actually add modifiers to their fact table about whether this was mapped
from a local code, it’s something we’d want to pick up at query time.
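
To make item 2 concrete, here is a minimal Python sketch of why the
modifier matters. It assumes a simplified fact layout in which rows that
share (patient_num, concept_cd, instance_num) belong to one enrollment
event and modifier_cd '@' marks the base fact. The codes ENROLLMENT,
CHART:Y and CHART:N and the sample rows are made up for illustration; they
are not taken from either ontology.

```python
from datetime import date

# Hypothetical, simplified observation_fact rows (assumption, not real data).
# Rows sharing (patient_num, concept_cd, instance_num) describe one event;
# modifier_cd '@' is the base fact, anything else is a modifier on it.
def fact(patient, instance, modifier, start):
    return {"patient_num": patient, "concept_cd": "ENROLLMENT",
            "instance_num": instance, "modifier_cd": modifier,
            "start_date": start}

facts = [
    fact(1, 1, "@", date(2004, 1, 1)),
    fact(1, 1, "CHART:Y", date(2004, 1, 1)),
    fact(1, 2, "@", date(2006, 3, 1)),
    fact(1, 2, "CHART:N", date(2006, 3, 1)),
    fact(2, 1, "@", date(2007, 7, 1)),
    fact(2, 1, "CHART:Y", date(2007, 7, 1)),
]

def event_key(row):
    """Identify the enrollment event a row belongs to."""
    return (row["patient_num"], row["concept_cd"], row["instance_num"])

def charted_enrollments_after(rows, after):
    """Patients with an enrollment starting after `after` that has CHART:Y
    on that same enrollment event."""
    charted = {event_key(r) for r in rows if r["modifier_cd"] == "CHART:Y"}
    return sorted({
        r["patient_num"] for r in rows
        if r["modifier_cd"] == "@"
        and r["start_date"] > after
        and event_key(r) in charted
    })

result = charted_enrollments_after(facts, date(2005, 5, 5))
print(result)
```

Patient 1 has a CHART:Y enrollment and also an enrollment that starts after
5/5/2005, but they are different events, so only patient 2 comes back. If
CHART were a separate fact, nothing would tie it to a particular enrollment
event.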

To your questions… you’re right, our basecodes don’t have to sync up for
querying. I was thinking ahead to collaborating on an ETL process and
thinking it might be helpful. The query method (which you probably know)
is full name -> dim code -> pathname (in concept dimension) -> concept_cd
(in concept dimension). The c_basecode generally equals the concept
dimension’s concept_cd.
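
A rough sketch of that lookup chain, using Python's sqlite3 with two
stripped-down tables. The column names (c_fullname, c_basecode, c_dimcode,
concept_path, concept_cd) are the usual i2b2 ones, but the table name
"ontology", the \PCORI\DEMOGRAPHIC\SEX\ paths, the SEX:F / SEX:M codes and
the prefix LIKE match are assumptions for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ontology (c_fullname TEXT, c_basecode TEXT, c_dimcode TEXT)")
con.execute(
    "CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT)")

SEX = "\\PCORI\\DEMOGRAPHIC\\SEX\\"  # hypothetical folder path

con.executemany("INSERT INTO ontology VALUES (?, ?, ?)", [
    (SEX, None, SEX),                     # folder: no basecode of its own
    (SEX + "F\\", "SEX:F", SEX + "F\\"),
    (SEX + "M\\", "SEX:M", SEX + "M\\"),
])
con.executemany("INSERT INTO concept_dimension VALUES (?, ?)", [
    (SEX + "F\\", "SEX:F"),
    (SEX + "M\\", "SEX:M"),
])

def concept_codes(fullname):
    """full name -> dim code -> concept_path prefix match -> concept_cd."""
    row = con.execute(
        "SELECT c_dimcode FROM ontology WHERE c_fullname = ?",
        (fullname,)).fetchone()
    if row is None:
        return []
    # Assumption: the dim code is matched as a path prefix. Note that '_'
    # and '%' in a real path would act as LIKE wildcards here.
    rows = con.execute(
        "SELECT concept_cd FROM concept_dimension "
        "WHERE concept_path LIKE ? ORDER BY concept_cd",
        (row[0] + "%",))
    return [cd for (cd,) in rows]

print(concept_codes(SEX))
print(concept_codes(SEX + "F\\"))
```

Note that c_basecode is never consulted in the lookup, which is why the
basecodes don't have to match between our versions for queries to work.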

I might have added spaces in manually. That’s not a good solution though,
I agree. The logic I used for capitalization could be used for spaces.
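
Something like the following would do it. This is only a sketch (the
function name label is made up): in the one-liner you tried, the
`or '_'` only kicks in for empty pieces, so the separators are dropped;
joining on a space instead puts them back.

```python
def label(name):
    """Turn an underscore-separated name into a spaced, capitalized label."""
    # Skip empty pieces so doubled underscores don't produce double spaces.
    return ' '.join(part.capitalize() for part in name.split('_') if part)

print(label("HUMPTY_DUMPTY"))
```

Note that capitalize() lowercases everything after the first letter, so
DX_TYPE becomes "Dx Type"; acronyms would still need a manual touch-up.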

Basecodes require colons only if you want to search by code type in
the UI. In that case the part before the colon is a scheme (code type to
search by) and the part after is a code. So in cases where there is no
code list (e.g., scalars) it doesn’t make a lot of sense. Then again, I
suppose it doesn’t hurt anything. We could consider prepending everything
with ‘PCORNET|’ to avoid naming conflicts with local sites’ codes...
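
As a small illustration of the scheme/code split and the prefix idea (the
helper names and the SEX:F and AGE codes are hypothetical, just for the
sketch):

```python
def split_basecode(basecode):
    """Split 'SCHEME:CODE' into (scheme, code); scheme is None if no colon."""
    scheme, sep, code = basecode.partition(":")
    if not sep:
        return (None, basecode)  # e.g. a scalar with no code list
    return (scheme, code)

def namespaced(basecode, prefix="PCORNET|"):
    """Prepend the network prefix unless it is already there."""
    return basecode if basecode.startswith(prefix) else prefix + basecode

print(split_basecode("SEX:F"))
print(split_basecode("AGE"))
print(namespaced("SEX:F"))
```

With the prefix, the scheme a user would search by becomes PCORNET|SEX
rather than SEX, which is the point: it can't collide with a site's own SEX
scheme.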

Ok, I’m out of time. I think we’re very close!
 

Jeffrey Klann, PhD
Instructor of Medicine, Harvard Medical School
Assistant in Computer Science, Massachusetts General Hospital
PhD in Research Information Systems and Computing, Partners Healthcare
ofc: 617-643-5879
jeff.kl...@mgh.harvard.edu





On 6/10/14, 11:13 AM, "Dan Connolly" <dconno...@kumc.edu> wrote:

>As close as we are, I think it would be a waste to not converge on one
>ontology. It seems to be coin tosses from here on in.
>
>That said, it's not clear to me why shared basecodes are essential for
>query compatibility. I've never seen a basecode in a query.
>
>On mapped vs. unmapped... I thought that was something in the earlier
>drafts but not in CDM v1.0. oops. I'm not sure I understand how it works.
>Care to elaborate with an example?
>
>I've been incorporating your code changes manually. I thought all
>basecodes were supposed to have colons, but I suppose it doesn't matter.
>I can take them off the scalars.
>
>On admit date and discharge date, I waffle back and forth. At first I hid
>them, assuming they're subsumed by start_date and end_date; then I
>removed the special case because I wanted to be sure it was "in my face"
>as I thought through steps b, c, and d.
>
>I'm still not clear on how you came up with the labels. I tried the code
>you added...
>
> >>> word = "HUMPTY_DUMPTY"
> >>> ''.join(x.capitalize() or '_' for x in word.split('_'))
> 'HumptyDumpty'
>
>Did you add spaces back in manually?
>
>More on steps b, c, and d separately...
>
>
>-- 
>Dan
>
>________________________________________
>From: Klann, Jeffrey G. [jeff.kl...@mgh.harvard.edu]
>Sent: Monday, June 09, 2014 10:23 PM
>To: gpc-dev@listserv.kumc.edu; Russ Waitman; Dan Connolly
>Cc: Matthew Hoag; Nathan Graham; campb...@unmc.edu; Murphy, Shawn N.
>Subject: Re: [gpc-informatics] #109: mapping to PCORI CDM (aka
>mini-sentinel data model)
>
>Hi Dan,
>
>Some comments:
>
>On A - I agree, we’re nearing a workable ontology. Per our discussion
>today, I gather that query compatibility across our networks is of utmost
>importance. So we need to synchronize our fullnames and basecodes in the
>PCORnet ontology. I started looking at differences between implementations
>tonight. There are some things I need to update that have changed (e.g.,
>race codes are apparently now prepended with a 0), but there are also some
>customizations we made that we need to figure out how to handle. For
>example, Shawn suggested a modifier for “Mapped” vs “Unmapped” rather than
>having (empty) trees for homegrown ontologies. And third, I seem to have
>removed the : from the base code if it is a scalar. I can’t remember if I
>did that in the code or manually. Any chance you can merge in the relevant
>parts of my code changes?
>
>Also, do you want to try to synchronize our ontology representations or
>just make them query-compatible? If the former, there is more work to do
>but we’d both be maintaining one ontology, which could be advantageous.
>I’ll look at this more Tues & Wed but will also send you my version of the
>ontology in a separate email in case you have time to compare.
>
>Also, I noticed in a brief glance that you still have some date fields
>explicit that I made implicit in mine (i.e., admit and discharge date) -
>any reason for that? We’re trying to prevent people from needing to modify
>their fact table if possible, so a date constraint on a fact makes more
>sense to me.
>
>Before step B, there are a couple other items on my plate. One, we want
>some demo data to convince ourselves that the ontology works. That’s in
>process - a lot of the PCORnet terms that represent things in the demo
>data are now queryable on i2b2.org through the CDM ontology, just by
>modifications to the ontology. I haven’t added facts for things not in the
>data at all yet (which you’re welcome to take on if the mood strikes you).
>Two, we’re also working on standardized mapping processes to get i2b2
>facts working with the PCORnet ontology, through adding children of
>PCORnet items and extending dimension dim codes.
>
>Step B, AFAIK, will be SQL, not SAS. I’ve heard they’re writing an adapter
>to convert their XML representation to SQL, so maybe we can use the XML
>(perhaps not programmatically - perhaps only for humans to hand-enter
>queries). Not 100% sure.
>
>Step C, I’m starting to think about the materialization of the CDM. I
>wrote some SQL today that makes a nice table of ETL operations based on
>the PCORnet ontology table - but it breaks anywhere there’s a special case
>- like: base codes under _CODETYPE paths actually represent a code, not a
>code type; implicit dates are not represented, etc. Options to handle
>this:
>1) Special cases in SQL
>2) Use other fields in the ontology, like comment
>3) Define the ETL process from your Python code, not from the metadata
>table. Possibly there is more power here, but it is not generalizable
>beyond PCORnet, which I don’t like.
>4) Use an open source ETL engine. Shawn suggested this, and I don’t know
>much about them, but I installed a couple today and they were all giant
>and overbearing. Probably more than we need, with a steep learning curve.
>But if you have experience, I’m open.
>Thoughts?
>
>Well, that got long. Hope it’s helpful.
>
>- Jeff K.
>
>
>On 6/9/14, 4:30 PM, "GPC Informatics" <d...@madmode.com> wrote:
>
>>#109: mapping to PCORI CDM (aka mini-sentinel data model)
>>--------------------------+-----------------------------------
>> Reporter:  rwaitman      |       Owner:  dconnolly
>>     Type:  enhancement   |      Status:  accepted
>> Priority:  major         |   Milestone:  initial-data-domains
>>Component:  data-sharing  |  Resolution:
>> Keywords:                |  Blocked By:  89
>> Blocking:                |
>>--------------------------+-----------------------------------
>>
>>Comment (by dconnolly):
>>
>> While Jeff and I are perhaps about done with representing the CDM in
>> i2b2, other discussions suggest this is one of the parts of getting the
>> whole thing working:
>>
>>   a. represent CDM in i2b2 #109
>>   b. approximate a popmednet query (SAS script? plain text description?)
>> by an i2b2 query
>>   c. run the query and materialize the results a la the CDM
>>   d. run the SAS code that came in via popmednet
>>      - needs SAS environment #117?
>>
>> I gather the [http://scilhs.org/2014/03/11/scilhs-query-workflow/ SCILHS
>> Query Workflow] includes parts b, c, and d.
>>
>>--
>>Ticket URL:
>><http://informatics.gpcnetwork.org/trac/Project/ticket/109#comment:14>
>>gpc-informatics <http://informatics.gpcnetwork.org/>
>>Greater Plains Network - Informatics
>

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
