"Audited" meaning: the query and who made it (and the results) are stored in i2b2's audit tables (QT_QUERY_MASTER and friends).
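Here is a rough Python/SQLite sketch of what such an audit trail makes possible. It records who ran each query, when, and how many patients matched, then lists the queries whose result sets fall below a size threshold so someone can review them. The `query_audit` table, the helper names, and the threshold of 10 are hypothetical. i2b2's actual audit tables (QT_QUERY_MASTER and related tables) are laid out differently, and this is not taken from HERON's audit code.

```python
import sqlite3

# Hypothetical, simplified audit table; i2b2's real audit tables
# (QT_QUERY_MASTER and related tables) use a different schema.
SCHEMA = """
CREATE TABLE query_audit (
    query_id     INTEGER PRIMARY KEY,
    user_id      TEXT NOT NULL,
    query_name   TEXT NOT NULL,
    created      TEXT NOT NULL,    -- ISO-8601 timestamp
    result_count INTEGER NOT NULL  -- size of the resulting patient set
)
"""


def record_query(conn, user_id, query_name, created, result_count):
    """Store who ran which query, when, and how many patients it matched."""
    conn.execute(
        "INSERT INTO query_audit (user_id, query_name, created, result_count)"
        " VALUES (?, ?, ?, ?)",
        (user_id, query_name, created, result_count),
    )


def small_result_report(conn, threshold=10):
    """Return audited queries whose patient sets are smaller than threshold,
    oldest first. The default of 10 is an illustrative assumption only."""
    return conn.execute(
        "SELECT user_id, query_name, created, result_count"
        " FROM query_audit WHERE result_count < ?"
        " ORDER BY created",
        (threshold,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_query(conn, "alice", "broad cohort", "2014-10-01T09:00", 1520)
    record_query(conn, "bob", "very narrow cohort", "2014-10-02T14:30", 2)
    for row in small_result_report(conn):
        print(row)
```

The report only surfaces candidates. Deciding whether a small result set reflects a re-identification attempt still takes a human reviewer.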
The HERON governance process includes an audit report showing small patient-set results, which should detect intentional efforts to re-identify a patient. Our honest broker (Tamara) reviews it in preparation for our monthly DROC meeting. See class I2B2SensitiveUsage in audit_usage.py<https://informatics.kumc.edu/work/browser/raven-j/heron_wsgi/admin_lib/audit_usage.py> for details.

-- Dan

________________________________
From: gpc-dev-boun...@listserv.kumc.edu [gpc-dev-boun...@listserv.kumc.edu] on behalf of Alex Bokov [bo...@uthscsa.edu]
Sent: Tuesday, October 07, 2014 9:41 AM
To: gpc-dev@listserv.kumc.edu
Subject: Re: R, SQL, i2b2, and governance RE: Example (Re: Empirical Data Dictionary)

Yes, I'm fully in agreement there: no direct, unlimited queries by investigators. Or, under normal circumstances, by us, for that matter. The only variable parts are the patient/visit sets. I see the goal of the initial cohort work as learning how to generalize these queries so they can be automated and run without exposing the end user of the data to unnecessary internals.

My actual cohort query is a lot less broad than the Empirical Data Dictionary. It will basically take an i2b2-generated patient/visit set and do the non-Oracle-specific equivalent of a pivot on year for all of them. It will also return a top-N-by-prevalence list of leaf concepts for that patient set or visit set. The logic is similar whether it's implemented in R or one's favorite flavor of SQL (even if the actual syntax is mind-meltingly different).

What is meant by "audited query"? That the code for the generic query has been reviewed by a trusted party? Or that each individual instance of the query is reviewed manually before being approved? Either sounds like a good idea.

On 10/06/2014 05:42 PM, Dan Connolly wrote:
(Please excuse the awkward top-posting format; I'm stuck with Microsoft Outlook.)

Perhaps we're converging...
the new Data Builder code<https://informatics.gpcnetwork.org/trac/Project/ticket/134#comment:4> delivers an sqlite3 file, so you can continue to use SQL to analyze it. If you prefer Python or Java to R for post-SQL work, that's fine too. But note that each Data Builder result is based on an i2b2 patient set that came from an audited i2b2 query. We don't have governance to let investigators run arbitrary SQL queries on our whole clinical data warehouse, and we don't plan to (neither KUMC HERON nor GPC). For the 3 initial cohorts, we can get away with ad-hoc one-off work, but for GPC work in general, we plan to use i2b2 to do as much of the querying as we can.
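The following is a minimal Python sketch of what the cohort query described above might look like when run against an sqlite3 file like the one Data Builder produces. It assumes a simplified, hypothetical schema: an `observation` table loosely modeled on i2b2's observation_fact (patient_num, concept_cd, start_date) and a `patient_set` table holding the patient numbers from an i2b2 patient set. The real Data Builder output may be organized differently. The year pivot uses SUM(CASE ...) conditional aggregation instead of Oracle's PIVOT, and the top-N list ranks concept codes by the number of distinct patients in the set. The helper names and toy data are illustrative only.

```python
import sqlite3


def demo_db():
    """Build a tiny in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE observation (patient_num INTEGER, concept_cd TEXT, start_date TEXT);
        CREATE TABLE patient_set (patient_num INTEGER PRIMARY KEY);
        INSERT INTO patient_set VALUES (1), (2);
        INSERT INTO observation VALUES
            (1, 'DX:A', '2012-03-01'),
            (1, 'DX:B', '2013-05-10'),
            (2, 'DX:A', '2013-07-22'),
            (2, 'DX:A', '2013-08-02'),
            (3, 'DX:B', '2013-01-15');
        -- patient 3 is not in the patient set, so it is excluded below
    """)
    return conn


def year_pivot(conn, years):
    """One row per patient in the set (patients with no facts are omitted),
    one column of fact counts per requested year.

    Conditional aggregation stands in for Oracle's PIVOT; substr() on
    ISO-8601 date strings is the SQLite way to get the year here.
    """
    cols = ", ".join(
        f"SUM(CASE WHEN substr(o.start_date, 1, 4) = '{int(y)}' "
        f"THEN 1 ELSE 0 END) AS y{int(y)}"
        for y in years
    )
    sql = (
        f"SELECT o.patient_num, {cols} FROM observation o"
        " JOIN patient_set ps ON ps.patient_num = o.patient_num"
        " GROUP BY o.patient_num ORDER BY o.patient_num"
    )
    return conn.execute(sql).fetchall()


def top_concepts(conn, n):
    """Concept codes ranked by how many distinct patients in the set have them."""
    return conn.execute(
        "SELECT o.concept_cd, COUNT(DISTINCT o.patient_num) AS n_patients"
        " FROM observation o"
        " JOIN patient_set ps ON ps.patient_num = o.patient_num"
        " GROUP BY o.concept_cd"
        " ORDER BY n_patients DESC, o.concept_cd"
        " LIMIT ?",
        (n,),
    ).fetchall()


if __name__ == "__main__":
    conn = demo_db()
    print(year_pivot(conn, [2012, 2013]))
    print(top_concepts(conn, 5))
```

Only the conditional aggregation is broadly portable. The year extraction (substr on ISO date strings) and LIMIT are SQLite-flavored and would need local equivalents elsewhere, such as EXTRACT(YEAR FROM ...) and FETCH FIRST n ROWS ONLY.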
_______________________________________________ Gpc-dev mailing list Gpc-dev@listserv.kumc.edu http://listserv.kumc.edu/mailman/listinfo/gpc-dev