#145: transform (ETL) GPC i2b2 data to PCORNet CDM --------------------------+---------------------------- Reporter: dconnolly | Owner: ngraham Type: enhancement | Status: assigned Priority: major | Milestone: data-domains2 Component: data-sharing | Resolution: Keywords: | Blocked By: 109 Blocking: 160 | --------------------------+----------------------------
Comment (by ngraham): Thanks to Alex at UTHSCSA (via Dan) and Dan for pointing us to the fact that the `c_dimcode` is what's used to by i2b2 to find the relevant concept codes from the concept dimension and can be different from the `c_fullname` which defines what we see in the concept hierarchy. Given that pointer, I was able to run a query at KU using the i2b2 PCORI hierarchy ([https://bitbucket.org/DanC/pcornet-dm pconet-dm]) that returned patient counts based on our local terminology. Of course, we want to use GPC terminology eventually. I then used Dan's [wiki:PortQuery Port Query plugin] to run the query on Babel (which I then put in the shared folder): `SHARED->CDM_ETL_145->CDM_GPC_Mapping_Prototype [12-18-2014] [ngra...@kumc.edu]`. ==== Details ==== Load the PCORI terminlology that Dan built ([https://bitbucket.org/DanC /pcornet-dm pconet-dm]) on the KUMC i2b2 test instance. On the test instance: {{{ create table blueheronmetadata.pcori_terms as select * from blueheronmetadata.heron_terms where 1=0; }}} Export the PCORI hierarchy from Babel and import on our test instance: {{{ select * from i2b2metadata.table_access where c_table_name = 'pcori_terms'; -- for reference select * from i2b2metadata.pcori_terms; }}} Imported the resulting csv and updated the table_access table accordingly. Then, as a test, pick a term to translate to local terminology (PCORI: `\\PCORI_\PCORI\DEMOGRAPHIC\HISPANIC\Y\` HERON: `\i2b2\Demographics\Ethnicity\Hispanic, Latino or Spanish Origin\`) {{{ update BlueHeronMetadata.pcori_terms set c_dimcode = '\i2b2\Demographics\Ethnicity\Hispanic, Latino or Spanish Origin\' where c_fullname = '\PCORI\DEMOGRAPHIC\HISPANIC\Y\' ; }}} Now, running a query based on PCORI terminology returns patient counts at KU! {{{ Finished Query: "CDM_ETL_Test" [3.3 secs] Compute Time: 2 secs Number of patients for "CDM_ETL_Test" patient_count: 26 }}} Then, I used Dan's [wiki:PortQuery Port Query plugin] to run the query on Babel (which I then put in the shared folder): `SHARED->CDM_ETL_145->CDM_GPC_Mapping_Prototype [12-18-2014] [ngra...@kumc.edu]`. Note that one slight wrinkle is that the c_columndatatype should be "T" rather than '@' (thanks to Matt for helping me debug that one). I added [https://bitbucket.org/DanC/pcornet-dm/issue/6/c_columndatatype-should- be-t-rather-than an issue to pcornet-dm]. For now, I just did: {{{ update i2b2metadata.pcori_terms set c_columndatatype = 'T' where c_fullname = '\PCORI\DEMOGRAPHIC\HISPANIC\Y\' ; }}} Note I had to do the same thing on the KU instance. I'm close to being able to then build the CDM table with at least that one column based on [https://bitbucket.org/njgraham/pcori-annotated-data- dictionary/src/1c1c0f980377bbd1adbe839edf84d8de92f5b1de/CDM_transform.sql?at=default CDM_transform.sql] we used for the ETL Annotated Data Dictionary. Instead of using the `path_map` file to map from PCORI to HERON terms, it could just query the PCORI hierarchy directly. -- Ticket URL: <http://informatics.gpcnetwork.org/trac/Project/ticket/145#comment:14> gpc-informatics <http://informatics.gpcnetwork.org/> Greater Plains Network - Informatics _______________________________________________ Gpc-dev mailing list Gpc-dev@listserv.kumc.edu http://listserv.kumc.edu/mailman/listinfo/gpc-dev