Just remembered, as of Hackathon3 I owe everyone the "validation script" I wrote for DataBuilder (script attached). If you pass it to sqlite3 on the command line, it will spit out a bunch of tables summarizing what's in your .db file.

This is not dependent on what database you run your i2b2 on, it uses only SQLite. Here is how to invoke it on a Linux command line: *sqlite3 -column -header YOURDATABUILDERFILE.db < dbuilder_validate.sql > YOURSITENAME.txt*

Probably same syntax on MacOS shell. Hopefully some Windows user will be kind enough to furnish the equivalent for cmd.

Nothing sophisticated, but it can tell you if you are missing large numbers of EHR data elements. Hope it helps.

*Here is output from a good .db file (abridged for readability):*

tbl N           N Distinct
------------------  ---------- ----------
MODIFIER_DIMENSION  0 0
CONCEPT_DIMENSION   5419 5376
PATIENT_DIMENSION   6561 6561
OBSERVATION_FACT 420970
VARIABLE            52 52

----------


------------------------------- ----------
Patients that have observations 6561
Observations that have patients 420970

----------

Data elements redundant with selection criteria
-----------------------------------------------
9

---------------------------
Non redundant data elements
concept_path name_char
----------------------------------------------------------------------------------------------------------------------------- \i2b2\labs loinc\LP29693-6\LP29697-7\LP71800-4\54037-7\54050-0\54053-4\24323-8\ 24323-8: Comprehensive metabolic 2000 ... \i2b2\labs loinc\LP29693-6\LP29697-7\LP7834-7\24348-5\ 24348-5: Free t4 and tsh panel - serum ... \i2b2\labs loinc\LP29693-6\LP29697-7\LP71800-4\54051-8\24362-6\ 24362-6: Renal function 2000 panel - ...
... many additional lines omitted ...
TOTAL ELEMENTS 41

--------------------------------------------------------------
Non redundant data elements actually found in OBSERVATION_FACT
concept_path name_char
----------------------------------------------------------------------------------------------------------------------------- \i2b2\labs loinc\LP29693-6\LP29697-7\LP71800-4\54037-7\54050-0\54053-4\24323-8\ 24323-8: Comprehensive metabolic 2000 ... \i2b2\labs loinc\LP29693-6\LP29697-7\LP7834-7\24348-5\ 24348-5: Free t4 and tsh panel - serum ... \i2b2\labs loinc\LP29693-6\LP29697-7\LP71800-4\54051-8\24362-6\ 24362-6: Renal function 2000 panel - ...
... many additional lines omitted ...
TOTAL ELEMENTS 31


*Here is output from a site that didn't include any data elements except the ones used to select the patient set in the first place (that's what I mean by non-redundant in the report above-- non redundant with what the researchers already know because it was part of the selection criteria for the survey... the point of the i2b2 pull is to get the labs, diagnoses, procedures, and meds specified at the bottom of **GPC Obesity Query v1.3.pdf <https://informatics.gpcnetwork.org/trac/Project/raw-attachment/ticket/254/GPC%20Obesity%20Query%20v1.3.pdf> or in **obesity_data_elements_keys_names_codes.xls <https://informatics.gpcnetwork.org/trac/Project/raw-attachment/ticket/254/obesity_data_elements_keys_names_codes.xls>):*

tbl N           N Distinct
------------------  ----------  ----------
MODIFIER_DIMENSION  0           0
CONCEPT_DIMENSION   36          35
PATIENT_DIMENSION   17562       17562
OBSERVATION_FACT    2509960
VARIABLE            29          27

----------


-------------------------------  ----------
Patients that have observations  17562
Observations that have patients  2509960

----------

Data elements redundant with selection criteria
-----------------------------------------------
25

---------------------------
Non redundant data elements
concept_path    name_char
--------------  ----------
TOTAL ELEMENTS  0

--------------------------------------------------------------
Non redundant data elements actually found in OBSERVATION_FACT
concept_path    name_char
--------------  ----------
TOTAL ELEMENTS  0

Thanks.

-- Alex

--

Alex F. Bokov, Ph.D.
Deputy Chief, Clinical Informatics Research Division
UT Health Science Center at San Antonio

Attachment: dbuilder_validate.sql
Description: application/sql

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev

Reply via email to