Yes, I'm fully in agreement there: no direct, unlimited queries by
investigators. Or, under normal circumstances, by us, for that matter.
The only variable parts are the patient/visit sets.
I see the goal of the initial cohort work is learning how to generalize
these queries so they can be automated and run without exposing the end
user of the data to unnecessary internals. My actual cohort query is a
lot less broad than the Empirical Data Dictionary. It will basically
take an i2b2-generated patient/visit set and do the non-Oracle-specific
equivalent of a pivot on year for all of them. It will also return a
top-N by prevalence list of leaf concepts for that patient-set or
visit-set.
The logic is similar whether it's implemented in R or one's favorite
flavor of SQL (even if the actual syntax is mind-meltingly different).
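
To make that concrete, here is a rough sketch of the two steps (pivot on
year, then top-N by prevalence) in Python against an in-memory sqlite3
database. Everything named here is an assumption for illustration only:
the tables patient_set and observation_fact are simplified stand-ins
rather than the real i2b2 or Data Builder schema, the functions
build_demo, pivot_by_year and top_n_concepts are made up, dates are
assumed to be ISO text, and every concept_cd is assumed to already be a
leaf concept. The pivot is done in client code so nothing depends on a
vendor-specific PIVOT clause.

```python
import sqlite3
from collections import defaultdict


def build_demo():
    """Create a tiny in-memory database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient_set (patient_num INTEGER PRIMARY KEY);
        CREATE TABLE observation_fact (
            patient_num INTEGER, concept_cd TEXT, start_date TEXT);
    """)
    conn.executemany("INSERT INTO patient_set VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?)", [
        (1, "DX:A", "2012-03-01"),
        (1, "DX:A", "2013-05-09"),
        (2, "DX:A", "2013-01-15"),
        (2, "DX:B", "2013-07-20"),
        (3, "DX:B", "2014-02-02"),
        (3, "DX:C", "2014-02-02"),
        (9, "DX:A", "2013-01-01"),  # patient 9 is outside the patient set
    ])
    return conn


def pivot_by_year(conn):
    """Return {concept_cd: {year: distinct patient count}} for the patient set."""
    rows = conn.execute("""
        SELECT f.concept_cd, substr(f.start_date, 1, 4) AS yr,
               COUNT(DISTINCT f.patient_num)
        FROM observation_fact f
        JOIN patient_set s ON s.patient_num = f.patient_num
        GROUP BY f.concept_cd, yr
    """).fetchall()
    table = defaultdict(dict)
    for concept, year, n_patients in rows:
        table[concept][year] = n_patients  # years become the pivoted columns
    return dict(table)


def top_n_concepts(conn, n):
    """Return the n most prevalent concepts as (concept_cd, patients, prevalence)."""
    (set_size,) = conn.execute("SELECT COUNT(*) FROM patient_set").fetchone()
    rows = conn.execute("""
        SELECT f.concept_cd, COUNT(DISTINCT f.patient_num) AS n_pat
        FROM observation_fact f
        JOIN patient_set s ON s.patient_num = f.patient_num
        GROUP BY f.concept_cd
        ORDER BY n_pat DESC, f.concept_cd
        LIMIT ?
    """, (n,)).fetchall()
    # Prevalence is the share of the patient set with at least one such fact.
    return [(concept, k, k / set_size) for concept, k in rows]


if __name__ == "__main__":
    conn = build_demo()
    print(pivot_by_year(conn))
    print(top_n_concepts(conn, 2))
```

The only input that varies between runs is the content of patient_set;
the query text itself stays fixed, which is the part that would lend
itself to review and automation. A visit-set version would presumably
join on an encounter identifier instead of patient_num.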
What is meant by "audited query"? That the code for the generic query
has been reviewed by a trusted party? Or that each individual instance
of a query is reviewed manually before being approved? Either sounds
like a good idea.
On 10/06/2014 05:42 PM, Dan Connolly wrote:
(Please excuse the awkward top-posting format; I'm stuck with
Microsoft Outlook.)
Perhaps we're converging... the new Data Builder code
<https://informatics.gpcnetwork.org/trac/Project/ticket/134#comment:4>
delivers an sqlite3 file, so you can continue to use SQL to analyze
it; and if you like python or Java better than R for post-SQL work,
that's fine too.
But note that each Data Builder result is based on an i2b2 patient set
that came from an *audited i2b2 query*. We don't have governance to
let investigators run arbitrary SQL queries on our whole clinical data
warehouse and we don't plan to (neither KUMC HERON nor GPC). For the 3
initial cohorts, we can get away with ad-hoc one-off work, but for GPC
work in general, we plan to use i2b2 to do as much of the querying as
we can.
_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev