Dear Erik and all,

(This email might appear a bit long, but it actually makes just two
points: a) a data synthesizer tool, b) the availability of realistic
subject data.)

A) Data Synthesizer Tool
I absolutely agree on the "data synthesizer" tool.

It is something I would like to do as a test case for parsing an
archetype's definition node and generating a representative object,
because in this case each and every node defined in the spec would have
to be handled.

It's not that time-consuming a task if you already have the RM builder.
The AM provides everything that is needed (for example, bounds for
primitive types and cardinality/multiplicity for other data structures;
see http://postimage.org/image/mcytss26f/), so instead of just creating
an object from the RM and attaching it in a hierarchy (perhaps just by
calling its constructor), some values would have to be generated and
attached to its fields as well.

Once the RM object is constructed, it can be serialized to anything
(XML included), and there goes a first "test base".
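
To make the idea concrete, here is a minimal Python sketch (the node
layout, field names and default bounds are assumptions of mine, not the
actual AOM/RM classes): walk a constraint tree, generate a value for
each primitive from its bounds, and respect the lower bound of each
child's occurrences.

import random

# A minimal sketch, NOT the real openEHR AOM/RM API: each "node" is a
# plain dict carrying what the AM would provide -- bounds for primitive
# types, occurrences for the children of a structure.

def synthesize(node):
    """Return a representative value for a simplified constraint node."""
    kind = node["kind"]
    if kind == "integer":
        lo, hi = node.get("bounds", (0, 100))  # assumed default bounds
        return random.randint(lo, hi)
    if kind == "real":
        lo, hi = node.get("bounds", (0.0, 1.0))
        return random.uniform(lo, hi)
    if kind == "string":
        return random.choice(node.get("allowed", ["sample-text"]))
    if kind == "structure":
        # Emit at least the lower bound of each child's occurrences so
        # the result respects the archetype's multiplicities.
        return {
            child["name"]: [synthesize(child)
                            for _ in range(child.get("occurrences", (1, 1))[0])]
            for child in node["children"]
        }
    raise ValueError(f"unhandled node kind: {kind}")

# Hypothetical example: a tiny "blood pressure" definition node.
bp = {"kind": "structure", "name": "blood_pressure", "children": [
    {"kind": "integer", "name": "systolic", "bounds": (90, 180)},
    {"kind": "integer", "name": "diastolic", "bounds": (60, 110)},
]}
print(synthesize(bp))  # e.g. {'systolic': [142], 'diastolic': [88]}

The resulting plain structure can then be serialized to XML/JSON to
seed that test base.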

From this perspective, it is absolutely essential that the XSDs are
valid (to ensure a valid structure) and also (Seref has a very good
point here) that the archetypes are valid, to ensure valid content.

B) Availability of Realistic Subject Data
As far as clinically realistic datasets are concerned, I would like to
suggest the following:

The Alzheimer's Disease Neuroimaging Initiative (ADNI) in the US is a
long-term project that longitudinally collects various clinical
parameters from subjects at various stages of the disease
(http://adni.loni.ucla.edu/).

At the moment, the dataset contains about 800 subjects. Each subject
has 4-5 sessions associated with it (usually at 6-month intervals), and
for each session a number of parameters are collected, such as MMSE
scores, ADAS-Cog scores, received medication and lab tests, as well as
imaging biomarkers (mostly MRI). A basic "demographics" section is also
available for each subject.

(To put it in the context of a visualisation, the story that these data
reveal is the progression of AD in a subject or population of subjects,
which is very interesting.)

The data are made available as CSV files (about 12 MB just for the
numerical data). An application must be made to ADNI to obtain the
data. As redistribution of the data is prohibited
(http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_DSP_Policy.pdf),
we would be working towards a tool that accepts a set of ADNI CSV files
and transforms them into a local openEHR-enabled repository.

The task here would be to create some archetypes/templates that reflect
the structure of the data shared by ADNI, and then scan the CSVs and
populate the openEHR-enabled repository (a rough sketch of the scanning
step follows).
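
As an illustration only (the column names SUBJECT_ID, VISCODE and
MMSCORE are placeholders I made up; the real ADNI headers vary between
file versions):

import csv

def rows_to_observations(csv_path):
    """Yield one flat observation dict per CSV row, keyed the way an
    MMSE archetype/template might expect (keys are hypothetical)."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            yield {
                "subject_id": row["SUBJECT_ID"],    # hypothetical header
                "session":    row["VISCODE"],       # hypothetical header
                "mmse_total": int(row["MMSCORE"]),  # hypothetical header
            }

# Each dict would then be bound to the corresponding archetype/template
# paths and committed to the repository as part of a composition.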

The CSV files are not in the best of condition: the structure has
changed from version to version, certain fields (such as dates) may
appear in a number of different formats, the terminology is not exactly
standardised, and so on.

For us (ctmnd.org) to work on these files, we have created an SQL
database and a set of scripts that sanitize and import the CSVs.
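
To give a flavour of the sanitization involved, a minimal sketch of
date normalization (the list of candidate formats is an assumption; the
actual files mix several styles):

from datetime import datetime

# Candidate formats are assumptions; extend the tuple as new variants
# turn up in the files.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%d-%b-%Y")

def normalize_date(raw):
    """Return an ISO 8601 date string, or None if nothing matches."""
    raw = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # flag for manual review rather than guessing

print(normalize_date("09/23/2011"))  # -> 2011-09-23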

I would be interested in turning this database into an openEHR-enabled
repository (whether as a set of XML files or a "proper" openEHR
database) because it could be used for a number of things (especially
for testing AQL).

If you think that this can be of help, let me know how we can proceed
with it.

Obviously, the tool can be made available to everybody; anyone can then
apply to ADNI and download the data locally.

I am not so sure about the data themselves (even if they become totally
anonymised), I will have to check; but in any case, going from "I have
nothing" to "I have a database of multi-modal data from 800 subjects
that is more realistic than test data" has got to be worth the trouble
of converting the CSVs.

Looking forward to hearing from you,
Athanasios Anastasiou
