RE: R, SQL, i2b2, and governance RE: Example (Re: Empirical Data Dictionary)

Dan Connolly Tue, 07 Oct 2014 08:25:15 -0700

By "audited" I mean: the query, who made it, and the results are stored in 
i2b2's audit tables (QT_QUERY_MASTER and friends).


The HERON governance process includes an audit report showing small patient set 
results, which should help detect intentional efforts to re-identify a patient; 
our honest broker (Tamara) reviews it in preparation for our DROC meeting each 
month.

See class I2B2SensitiveUsage in audit_usage.py 
<https://informatics.kumc.edu/work/browser/raven-j/heron_wsgi/admin_lib/audit_usage.py> 
for details.
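
For illustration only, here is a minimal sketch of the kind of "small patient 
set" report described above. It is not the logic of I2B2SensitiveUsage, which 
is not reproduced here. The tables are cut-down stand-ins for i2b2's 
QT_QUERY_MASTER, QT_QUERY_INSTANCE and QT_QUERY_RESULT_INSTANCE, built in an 
in-memory SQLite database; the threshold of 10, the sample rows, and the 
function names are all assumptions made for the example.

```python
import sqlite3

# Hypothetical cutoff; the threshold HERON actually uses is not stated here.
SMALL_SET_THRESHOLD = 10


def small_set_report(conn, threshold=SMALL_SET_THRESHOLD):
    """Return (user_id, query_name, create_date, set_size) for small result sets."""
    sql = """
        SELECT qm.user_id, qm.name, qm.create_date, qri.set_size
        FROM qt_query_master qm
        JOIN qt_query_instance qi
          ON qi.query_master_id = qm.query_master_id
        JOIN qt_query_result_instance qri
          ON qri.query_instance_id = qi.query_instance_id
        WHERE qri.set_size < ?
        ORDER BY qm.user_id, qm.create_date
    """
    return conn.execute(sql, (threshold,)).fetchall()


def demo_connection():
    """Build simplified stand-in audit tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE qt_query_master (
            query_master_id INTEGER PRIMARY KEY,
            name TEXT, user_id TEXT, create_date TEXT);
        CREATE TABLE qt_query_instance (
            query_instance_id INTEGER PRIMARY KEY,
            query_master_id INTEGER);
        CREATE TABLE qt_query_result_instance (
            result_instance_id INTEGER PRIMARY KEY,
            query_instance_id INTEGER,
            set_size INTEGER);

        INSERT INTO qt_query_master VALUES
            (1, 'Diabetes, all ages', 'alice', '2014-10-01'),
            (2, 'Rare dx, one zip code', 'bob', '2014-10-02'),
            (3, 'Rare dx, one zip, age 93', 'bob', '2014-10-03');
        INSERT INTO qt_query_instance VALUES (1, 1), (2, 2), (3, 3);
        INSERT INTO qt_query_result_instance VALUES
            (1, 1, 5200), (2, 2, 7), (3, 3, 1);
    """)
    return conn


if __name__ == "__main__":
    for row in small_set_report(demo_connection()):
        print(row)
```

A run of progressively narrower queries by the same user, as in the made-up 
rows for 'bob', is the sort of pattern a human reviewer would want to see 
grouped together, which is why the sketch orders by user and date.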

--
Dan

________________________________
From: gpc-dev-boun...@listserv.kumc.edu [gpc-dev-boun...@listserv.kumc.edu] on 
behalf of Alex Bokov [bo...@uthscsa.edu]
Sent: Tuesday, October 07, 2014 9:41 AM
To: gpc-dev@listserv.kumc.edu
Subject: Re: R, SQL, i2b2, and governance RE: Example (Re: Empirical Data 
Dictionary)

Yes, I'm fully in agreement there-- no direct, unlimited queries by 
investigators. Or under normal circumstances, us, for that matter. The only 
variable parts are the patient/visit sets.

I see the goal of the initial cohort work is learning how to generalize these 
queries so they can be automated and run without exposing the end user of the 
data to unnecessary internals. My actual cohort query is a lot less broad than 
the Empirical Data Dictionary. It will basically take an i2b2-generated 
patient/visit set and do the non-Oracle-specific equivalent of a pivot on year 
for all of them. It will also return a top-N by prevalence list of leaf 
concepts for that patient-set or visit-set.

The logic is similar whether it's implemented in R or one's favorite flavor of 
SQL (even if the actual syntax is mind-meltingly different).
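
As a rough sketch of the logic described above (not the actual cohort query), 
the year pivot can be written portably with conditional aggregation instead of 
Oracle's PIVOT clause, and the top-N list is a grouped count of distinct 
patients. The example uses Python's sqlite3 module; observation_fact follows 
the i2b2 column names patient_num, concept_cd and start_date, while the 
patient_set table, the sample rows, and the function names are hypothetical 
stand-ins. Dates are assumed to be stored as 'YYYY-MM-DD' text.

```python
import sqlite3


def year_pivot(conn, years):
    """One row per concept, one column per year, counting distinct patients."""
    # int(y) guards the interpolated year; each CASE yields NULL outside the
    # year, and COUNT(DISTINCT ...) ignores NULLs.
    cols = ", ".join(
        f"COUNT(DISTINCT CASE WHEN substr(f.start_date, 1, 4) = '{int(y)}' "
        f"THEN f.patient_num END) AS y{int(y)}"
        for y in years
    )
    sql = f"""
        SELECT f.concept_cd, {cols}
        FROM observation_fact f
        JOIN patient_set ps ON ps.patient_num = f.patient_num
        GROUP BY f.concept_cd
        ORDER BY f.concept_cd
    """
    return conn.execute(sql).fetchall()


def top_n_concepts(conn, n):
    """Top-n concepts by number of distinct patients in the patient set."""
    sql = """
        SELECT f.concept_cd, COUNT(DISTINCT f.patient_num) AS patients
        FROM observation_fact f
        JOIN patient_set ps ON ps.patient_num = f.patient_num
        GROUP BY f.concept_cd
        ORDER BY patients DESC, f.concept_cd
        LIMIT ?
    """
    return conn.execute(sql, (n,)).fetchall()


def demo_facts():
    """Made-up facts; patient 4 is deliberately outside the patient set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient_set (patient_num INTEGER PRIMARY KEY);
        CREATE TABLE observation_fact (
            patient_num INTEGER, concept_cd TEXT, start_date TEXT);
        INSERT INTO patient_set VALUES (1), (2), (3);
        INSERT INTO observation_fact VALUES
            (1, 'ICD9:250.00', '2012-03-01'),
            (1, 'ICD9:250.00', '2013-05-10'),
            (2, 'ICD9:250.00', '2013-06-11'),
            (2, 'ICD9:401.9', '2013-07-01'),
            (3, 'ICD9:401.9', '2012-01-15'),
            (4, 'ICD9:250.00', '2013-02-02'),
            (3, 'LOINC:2345-7', '2013-09-09');
    """)
    return conn


if __name__ == "__main__":
    conn = demo_facts()
    print(year_pivot(conn, [2012, 2013]))
    print(top_n_concepts(conn, 2))
```

The only input that varies between runs is the contents of patient_set, which 
matches the point that the patient/visit sets are the only variable parts.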

What is meant by audited query? The code for the generic query has been 
reviewed by a trusted party? Or the individual instance of each query is 
reviewed manually before being approved? Either sounds like a good idea.


On 10/06/2014 05:42 PM, Dan Connolly wrote:
(Please excuse the awkward top-posting format; I'm stuck with Microsoft 
Outlook.)

Perhaps we're converging... the new Data Builder 
code<https://informatics.gpcnetwork.org/trac/Project/ticket/134#comment:4> 
delivers an sqlite3 file, so you can continue to use SQL to analyze it; and if 
you like python or Java better than R for post-SQL work, that's fine too.

But note that each Data Builder result is based on an i2b2 patient set that 
came from an audited i2b2 query. We don't have governance to let investigators 
run arbitrary SQL queries on our whole clinical data warehouse and we don't 
plan to (neither KUMC HERON nor GPC). For the 3 initial cohorts, we can get 
away with ad-hoc one-off work, but for GPC work in general, we plan to use i2b2 
to do as much of the querying as we can.

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
