> It seems like a reasonable use case!

I'm actually working on a project that does basically this (indexing and 
aggregating data abstracted from single-visit and multi-site epi studies), and 
I agree that this is a great use case. Right now we're using a relational data 
model, but I'm firmly convinced that doing this the "Right Way" would require 
the flexibility and richness of an ontology, simply because of how complex the 
data is, both in how it should be represented and in how it gets aggregated. 
I'm working with a team of systematic reviewers and epidemiologists, and so 
far pretty much every data element that we've added to the system has some 
bearing on whether and how one goes about aggregating data. That means any 
system trying to do this in a generalizable way also has to have some way of 
encoding *that* knowledge (e.g., "data points with attributes X, Y, and Z are 
aggregated thusly, whereas data points with X, Y, and A are aggregated in some 
other way"). 
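
To make the "X, Y, and Z are aggregated thusly" idea concrete, here is a 
minimal Python sketch of a rule table that maps attribute combinations to 
aggregation functions. It is purely illustrative and not a description of any 
real system: the attribute names (`measure`, `design`), the rules, and both 
aggregation functions are made up, and the sample-size-weighted pooling is a 
deliberately naive stand-in for a proper meta-analytic method.

```python
from statistics import mean


def pooled_proportion(points):
    """Naive pooling: weight each proportion by its sample size."""
    total_n = sum(p["n"] for p in points)
    return sum(p["value"] * p["n"] for p in points) / total_n


def simple_mean(points):
    """Unweighted mean of the reported values."""
    return mean(p["value"] for p in points)


# Hypothetical rule table. Each entry pairs the attribute values a data point
# must have with the function used to aggregate such points. Order matters:
# the first matching rule wins.
RULES = [
    ({"measure": "proportion", "design": "cohort"}, pooled_proportion),
    ({"measure": "mean"}, simple_mean),
]


def pick_aggregator(attrs):
    """Return the first aggregation function whose conditions all match."""
    for conditions, func in RULES:
        if all(attrs.get(key) == value for key, value in conditions.items()):
            return func
    raise LookupError(f"no aggregation rule for {attrs!r}")


def aggregate(points):
    """Aggregate data points that all carry the same attributes."""
    if not points:
        raise ValueError("nothing to aggregate")
    attrs = points[0]["attrs"]
    if any(p["attrs"] != attrs for p in points):
        raise ValueError("data points do not share the same attributes")
    return pick_aggregator(attrs)(points)
```

The point of keeping the rules as data rather than as if/else branches is that 
the reviewers' knowledge about what can be combined lives in one place. An 
ontology would presumably let you express the same thing much more richly.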

> It seems like you'd need to identify the factors of interest--e.g. disease, 
> selection criteria, research questions--and aggregate on those. Someone who 
> actually does metaanalyses would be more aware of what factors are 
> relevant/important.


In case anybody out there is thinking of doing this, here's some of what we 
store in our system (in addition to the data elements you've identified above, 
all of which are also relevant and included in our system):
        - study design (which we model using several different attributes: 
          prospective/retrospective, case-control/cohort, etc.)
        - study setting (geographic location, hospital/outpatient, some data 
          about who the study population was (military, pediatric, etc.), plus 
          a fair bit of domain-specific stuff related to the particular 
          medical topic we're working with)
        - observation time points, both fixed ("we measured the prevalence of 
          symptom X at Y days") and date ranges ("we measured the incidence of 
          disease X between Y and Z days"), sometimes reported as means with 
          standard deviations or confidence intervals instead of explicit 
          time points
        - whether a given observation was a mean, a proportion, or something 
          else; whether it came with or without confidence intervals; and 
          sometimes the sample size (sometimes broken out by treatment and 
          control group status, often for multiple treatment groups)
        - etc. etc. etc... and down the rabbit hole we go, and thus far I've 
          only talked about the different kinds of metadata we have to store, 
          not even the data itself that we wanted to aggregate! :-)
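
For anyone who thinks better in code, the metadata in the list above could be 
sketched as a handful of Python dataclasses. This is only a rough 
illustration under my own assumptions: every class and field name here 
(`StudyDesign`, `TimePoint`, `Observation`, `n_by_group`, and so on) is 
hypothetical, and it does not reflect any actual relational schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class StudyDesign:
    timing: str     # e.g. "prospective" or "retrospective"
    structure: str  # e.g. "case-control" or "cohort"


@dataclass
class TimePoint:
    """A fixed time point (start == end) or a range, in days."""
    start_day: float
    end_day: float
    sd_days: Optional[float] = None  # set when timing is reported as mean +/- SD

    @property
    def is_fixed(self) -> bool:
        return self.start_day == self.end_day


@dataclass
class Observation:
    kind: str    # e.g. "mean" or "proportion"
    value: float
    time: TimePoint
    ci: Optional[Tuple[float, float]] = None  # confidence interval, if reported
    # Sample size per group, e.g. {"control": 40, "treatment_a": 38}.
    n_by_group: Dict[str, int] = field(default_factory=dict)

    @property
    def total_n(self) -> Optional[int]:
        """Total sample size, or None when no sample sizes were reported."""
        return sum(self.n_by_group.values()) if self.n_by_group else None


# Usage: a prevalence measured at day 30, with sample sizes for two groups.
obs = Observation(
    kind="proportion",
    value=0.12,
    time=TimePoint(start_day=30, end_day=30),
    n_by_group={"control": 40, "treatment_a": 38},
)
```

Even this toy version shows the problem: almost every field is optional or has 
several shapes, which is exactly what makes a rigid schema uncomfortable.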

In spite of all of this, it's a really great domain to be working in- there's a 
ton of low-hanging data management fruit out there for systematic reviewers. 
The ones I work with (at a major evidence-based practice center that does tons 
of AHRQ and USPSTF reviews) basically live in EndNote and Microsoft Word, and 
use those as their data management platform. Coming up with tools to help them 
work more effectively is really satisfying- I wish I'd had a camera running the 
day I told them that they could all use the system to enter data simultaneously 
(instead of having to keep track of who had the EndNote file open at any given 
time). They were like kids at Christmas...

-SB
