#145: transform (ETL) GPC i2b2 data to PCORNet CDM
--------------------------+----------------------------
 Reporter:  dconnolly     |       Owner:  ngraham
     Type:  enhancement   |      Status:  assigned
 Priority:  major         |   Milestone:  data-domains2
Component:  data-sharing  |  Resolution:
 Keywords:                |  Blocked By:  109
 Blocking:  160           |
--------------------------+----------------------------

Comment (by ngraham):

 Thanks to Alex at UTHSCSA (via Dan) and Dan for pointing us to the fact
 that the `c_dimcode` is what's used to by i2b2 to find the relevant
 concept codes from the concept dimension and can be different from the
 `c_fullname` which defines what we see in the concept hierarchy.

 Given that pointer, I was able to run a query at KU using the i2b2 PCORI
 hierarchy ([https://bitbucket.org/DanC/pcornet-dm pconet-dm]) that
 returned patient counts based on our local terminology.  Of course, we
 want to use GPC terminology eventually.

 I then used Dan's [wiki:PortQuery Port Query plugin] to run the query on
 Babel (which I then put in the shared folder):
 `SHARED->CDM_ETL_145->CDM_GPC_Mapping_Prototype [12-18-2014]
 [ngra...@kumc.edu]`.


 ==== Details ====
 Load the PCORI terminlology that Dan built ([https://bitbucket.org/DanC
 /pcornet-dm pconet-dm]) on the KUMC i2b2 test instance.

 On the test instance:
 {{{
 create table blueheronmetadata.pcori_terms as
   select * from blueheronmetadata.heron_terms where 1=0;
 }}}

 Export the PCORI hierarchy from Babel and import on our test instance:
 {{{
 select * from i2b2metadata.table_access where c_table_name =
 'pcori_terms'; -- for reference
 select * from i2b2metadata.pcori_terms;
 }}}

 Imported the resulting csv and updated the table_access table accordingly.

 Then, as a test,  pick a term to translate to local terminology (PCORI:
 `\\PCORI_\PCORI\DEMOGRAPHIC\HISPANIC\Y\` HERON:
 `\i2b2\Demographics\Ethnicity\Hispanic, Latino or Spanish Origin\`)

 {{{
 update BlueHeronMetadata.pcori_terms
 set c_dimcode = '\i2b2\Demographics\Ethnicity\Hispanic, Latino or Spanish
 Origin\' where c_fullname = '\PCORI\DEMOGRAPHIC\HISPANIC\Y\'
 ;
 }}}

 Now, running a query based on PCORI terminology returns patient counts at
 KU!
 {{{
 Finished Query: "CDM_ETL_Test"
 [3.3 secs]
 Compute Time: 2 secs
 Number of patients for "CDM_ETL_Test"
 patient_count: 26
 }}}

 Then, I used Dan's [wiki:PortQuery Port Query plugin] to run the query on
 Babel (which I then put in the shared folder):
 `SHARED->CDM_ETL_145->CDM_GPC_Mapping_Prototype [12-18-2014]
 [ngra...@kumc.edu]`.

 Note that one slight wrinkle is that the c_columndatatype should be "T"
 rather than '@' (thanks to Matt for helping me debug that one).  I added
 [https://bitbucket.org/DanC/pcornet-dm/issue/6/c_columndatatype-should-
 be-t-rather-than an issue to pcornet-dm].  For now, I just did:
 {{{
 update i2b2metadata.pcori_terms
 set c_columndatatype = 'T' where c_fullname =
 '\PCORI\DEMOGRAPHIC\HISPANIC\Y\'
 ;
 }}}

 Note I had to do the same thing on the KU instance.

 I'm close to being able to then build the CDM table with at least that one
 column based on [https://bitbucket.org/njgraham/pcori-annotated-data-
 
dictionary/src/1c1c0f980377bbd1adbe839edf84d8de92f5b1de/CDM_transform.sql?at=default
 CDM_transform.sql] we used for the ETL Annotated Data Dictionary.  Instead
 of using the `path_map` file to map from PCORI to HERON terms, it could
 just query the PCORI hierarchy directly.

--
Ticket URL: 
<http://informatics.gpcnetwork.org/trac/Project/ticket/145#comment:14>
gpc-informatics <http://informatics.gpcnetwork.org/>
Greater Plains Network - Informatics
_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev

Reply via email to