Hi Dan,

Some comments:

On A - I agree, we're nearing a workable ontology. Per our discussion
today, I gather that query compatibility across our networks is of utmost
importance, so we need to synchronize our fullnames and basecodes in the
PCORnet ontology. I started looking at differences between implementations
tonight. There are some things I need to update that have changed (e.g.,
race codes are apparently now prepended with a 0), but there are also some
customizations we made that we need to figure out how to handle. For
example, Shawn suggested a modifier for "Mapped" vs. "Unmapped" rather than
having (empty) trees for homegrown ontologies. Also, I seem to have
removed the ':' from the base code if it is a scalar; I can't remember
whether I did that in the code or manually. Any chance you can merge in the
relevant parts of my code changes?
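(For reference, the scalar-basecode tweak is roughly this - a sketch only;
the function name and the example codes are mine, and the exact colon rule
is my best guess at what I did:)

```python
def normalize_basecode(basecode, is_scalar):
    """Drop the trailing ':' from a basecode when the item is a scalar
    (e.g. a numeric value) rather than a coded concept.  Illustrative
    only - the real change may live in the ontology-building code."""
    if is_scalar and basecode.endswith(":"):
        return basecode[:-1]
    return basecode

normalize_basecode("HT:", True)   # scalar: colon removed
normalize_basecode("HT:", False)  # coded concept: left alone
```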

Also, do you want to try to synchronize our ontology representations or
just make them query-compatible? The former is more work, but we'd both
be maintaining one ontology, which could be advantageous. I'll look at
this more Tues & Wed, but I will also send you my version of the ontology
in a separate email in case you have time to compare.

Also, I noticed in a brief glance that you still have some date fields
explicit that I made implicit in mine (i.e., admit and discharge date) -
any reason for that? We're trying to prevent people from needing to modify
their fact table if possible, so a date constraint on a fact makes more
sense to me.
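(To illustrate the implicit-date idea - a toy sketch against an
i2b2-style fact table; the column names follow the i2b2 star schema, but
the concept code and dates are invented:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE observation_fact
               (encounter_num INTEGER, concept_cd TEXT, start_date TEXT)""")
con.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", [
    (1, "PCORI:ENC_TYPE:IP", "2014-05-01"),  # invented concept code
    (2, "PCORI:ENC_TYPE:IP", "2014-06-05"),
])
# Implicit admit date: a date constraint on the visit fact itself,
# instead of a separate explicit admit-date element in the ontology.
june_ip = con.execute("""SELECT encounter_num FROM observation_fact
                         WHERE concept_cd = 'PCORI:ENC_TYPE:IP'
                           AND start_date BETWEEN '2014-06-01'
                                              AND '2014-06-30'""").fetchall()
```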

Before step B, there are a couple of other items on my plate. One, we want
some demo data to convince ourselves that the ontology works. That's in
process - a lot of the PCORnet terms that represent things in the demo
data are now queryable on i2b2.org through the CDM ontology, just by
modifications to the ontology. I haven't yet added facts for things not in
the data at all (which you're welcome to take on if the mood strikes you).
Two, we're also working on standardized mapping processes to get i2b2
facts working with the PCORnet ontology, by adding children of
PCORnet items and extending dimcodes.
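(Miniature version of the adding-children idea - a sketch only; the
metadata columns follow the i2b2 layout, but the paths and the local
code are invented:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE pcornet_ont
               (c_hlevel INTEGER, c_fullname TEXT, c_basecode TEXT)""")
parent = "\\PCORI\\DEMOGRAPHIC\\SEX\\F\\"
con.executemany("INSERT INTO pcornet_ont VALUES (?, ?, ?)", [
    (3, parent, "PCORI:SEX:F"),
    # Local i2b2 code grafted in as a child, so existing local facts
    # answer queries against the PCORnet parent (path/code invented).
    (4, parent + "LOCAL\\", "DEM|SEX:F"),
])
codes = con.execute("SELECT c_basecode FROM pcornet_ont "
                    "WHERE c_fullname LIKE ? ORDER BY c_hlevel",
                    (parent + "%",)).fetchall()
# The path-prefix query picks up both the parent and the local child.
```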

Step B, AFAIK, will be SQL, not SAS. I've heard they're writing an adapter
to convert their XML representation to SQL, so maybe we can use the XML
(perhaps not programmatically - perhaps only for humans to hand-enter
queries). Not 100% sure.

Step C, I'm starting to think about the materialization of the CDM. I
wrote some SQL today that makes a nice table of ETL operations based on
the PCORnet ontology table - but it breaks anywhere there's a special case,
like: base codes under _CODETYPE paths actually represent a code, not a
code type; implicit dates are not represented; etc. Options to handle this:
1) Special cases in SQL
2) Use other fields in the ontology, like comment
3) Define the ETL process from your Python code, not from the metadata
table. Possibly there is more power here, but it is not generalizable
beyond PCORnet, which I don't like.
4) Use an open source ETL engine. Shawn suggested this, and I don't know
much about them, but I installed a couple today and they were all giant
and overbearing. Probably more than we need, with a steep learning curve.
But if you have experience, I'm open.
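(For concreteness, option 1 in miniature - a sketch over a toy metadata
table; the paths are invented, and the role labels are just my shorthand
for the _CODETYPE special case:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pcornet_ont (c_fullname TEXT, c_basecode TEXT)")
con.executemany("INSERT INTO pcornet_ont VALUES (?, ?)", [
    ("\\PCORI\\DEMOGRAPHIC\\SEX\\F\\", "PCORI:SEX:F"),              # invented
    ("\\PCORI\\DIAGNOSIS\\09DX_CODETYPE\\250.00\\", "ICD9:250.00"), # invented
])
# Derive one ETL operation per ontology row; the CASE is the in-SQL
# special-casing (option 1) for _CODETYPE paths, whose base codes
# represent an actual code rather than a code type.
ops = con.execute("""SELECT c_basecode,
                            CASE WHEN c_fullname LIKE '%_CODETYPE%'
                                 THEN 'code' ELSE 'concept' END
                     FROM pcornet_ont ORDER BY c_fullname""").fetchall()
```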
Thoughts?

Well, that got long. Hope it's helpful.

- Jeff K.
 
Jeffrey Klann, PhD
Instructor of Medicine, Harvard Medical School
Assistant in Computer Science, Massachusetts General Hospital
PhD in Research Information Systems and Computing, Partners Healthcare
ofc: 617-643-5879
jeff.kl...@mgh.harvard.edu


On 6/9/14, 4:30 PM, "GPC Informatics" <d...@madmode.com> wrote:

>#109: mapping to PCORI CDM (aka mini-sentinel data model)
>--------------------------+-----------------------------------
> Reporter:  rwaitman      |       Owner:  dconnolly
>     Type:  enhancement   |      Status:  accepted
> Priority:  major         |   Milestone:  initial-data-domains
>Component:  data-sharing  |  Resolution:
> Keywords:                |  Blocked By:  89
> Blocking:                |
>--------------------------+-----------------------------------
>
>Comment (by dconnolly):
>
> While Jeff and I are perhaps about done with representing the CDM in
>i2b2,
> other discussions suggest this is one of parts of getting the whole thing
> working:
>
>   a. represent CDM in i2b2 #109
>   b. approximate a popmednet query (SAS script? plain text description?)
> by an i2b2 query
>   c. run the query and materialize the results a la the CDM
>   d. run the SAS code that came in via popmednet
>      - needs SAS environment #117?
>
> I gather the [http://scilhs.org/2014/03/11/scilhs-query-workflow/ SCILHS
> Query Workflow] includes parts b, c, and d.
>
>--
>Ticket URL: 
><http://informatics.gpcnetwork.org/trac/Project/ticket/109#comment:14>
>gpc-informatics <http://informatics.gpcnetwork.org/>
>Greater Plains Network - Informatics



The information in this e-mail is intended only for the person to whom it is
addressed. If you believe this e-mail was sent to you in error and the e-mail
contains patient information, please contact the Partners Compliance HelpLine at
http://www.partners.org/complianceline . If the e-mail was sent to you in error
but does not contain patient information, please contact the sender and properly
dispose of the e-mail.

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
