Horst Herb wrote:
> 
> [SNIP]
> 
> Sorry, this might be a dumb question, but how would you actually data mine
> such a chaotically growing structure salad at a public health level? This is
> ultimately what we want to achieve, isn't it? Gather information on a broad
> base  to facilitate evidence based medicine, isn't it? I fail to see how
> that can be achieved without centrally defined data structures and term
> glossaries.
> 
> Horst

This is the concept that systems like OIO, Circare, and DocScope (and
now FreePM), as well as to some extent GEHR are based on. The basic code
reads meta descriptions of the information along with
display/input/processing rules specific to the information types
defined. New categories of information can be added dynamically.
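To sketch the idea (this is illustrative only, not actual OIO/Circare/DocScope code; all names here are invented): the application holds a registry of meta descriptions, and a new information type is just a new registry entry, added at runtime with no change to the application code.

```python
# Hypothetical sketch of metadata-driven information types: the
# application reads type descriptions instead of hard-coding them.
registry = {}

def define_type(name, fields, rules=None):
    """Register a new information type from its meta description."""
    registry[name] = {"fields": fields, "rules": rules or {}}

def new_record(type_name, **values):
    """Create a record, keeping only the fields the type defines."""
    meta = registry[type_name]
    return {f: values.get(f) for f in meta["fields"]}

# A site can add a category dynamically -- no application change:
define_type("blood_pressure", ["systolic", "diastolic", "taken_at"])
bp = new_record("blood_pressure", systolic=120, diastolic=80)
```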

From the point of view of collecting information from disparate sources
(e.g. Circare), this is really ideal because the information can be kept
close to its original form. From the point of view of research into
health care informatics (e.g. OIO), it permits easy changes to the
information collected and processed.

You are correct to be concerned about the difficulties mining this
information for analysis. This was a major issue in Circare, for
example. 

However, healthcare is local. The flexibility in these systems enables
them to adapt to local requirements, but it is also important to realize
that, once installed at a site for a specific purpose, these systems all
have a relatively fixed collection of information types.

The mining of the information can use the same tools as the systems
themselves (e.g. CorbaScript, Python, XSLT, and the tools used to
transfer and validate XML data). A specific study at a specific site
will need to customize its mining tools to the available data, i.e.
the mining tools will themselves reference the same meta-information. In
the end, these tools would format the information for conventional
databases and/or statistical analysis packages.
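A minimal sketch of that last step, using Python's standard library (element names and the field list are invented for illustration): the mining script consults the same meta description the system uses, pulls matching records out of the XML, and writes flat CSV rows that a conventional database or statistics package can load.

```python
# Illustrative only: flatten XML records into rows for a conventional
# database or statistical analysis package.
import csv
import io
import xml.etree.ElementTree as ET

xml_data = """<records>
  <record type="blood_pressure">
    <systolic>120</systolic><diastolic>80</diastolic>
  </record>
  <record type="blood_pressure">
    <systolic>135</systolic><diastolic>90</diastolic>
  </record>
</records>"""

# The mining tool references the same meta-information as the system;
# here a simple field list stands in for that description.
meta = {"blood_pressure": ["systolic", "diastolic"]}

def mine(xml_text, type_name):
    """Extract the fields named by the meta description, in order."""
    fields = meta[type_name]
    return [[rec.findtext(f) for f in fields]
            for rec in ET.fromstring(xml_text).iterfind("record")
            if rec.get("type") == type_name]

# Format the result as CSV for downstream analysis tools.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(meta["blood_pressure"])
writer.writerows(mine(xml_data, "blood_pressure"))
```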

At least this was the theory for Circare! It really remains to be seen
what the advantages and disadvantages are in practice and what tools are
most useful for this use case.

When you talk about scaling these tools to a national level, even a
national system has a finite number of data types. Furthermore, a
particular research analysis is not going to need to use all the data
and all the data types. Also, it's quite clear that the
schemas/archetypes/DTDs used in a particular context are not going to be
developed completely independently. Similarities between type
definitions can be used to reduce the complexity of the mining task (for
example, many data types may use the same coding scheme internally,
which can be mined with a single XSLT script).
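That shared-coding-scheme point can be shown in a few lines (sketched here in Python rather than XSLT; the element and attribute names are invented, as is the convention that every type embeds codes the same way): one generic pass extracts the codes from every record type that follows the convention.

```python
# Sketch: if many record types embed codes uniformly (a <code> element
# carrying scheme and value attributes), a single routine mines them all.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<chart>
  <diagnosis><code scheme="ICD-10" value="I10"/></diagnosis>
  <procedure><code scheme="ICD-10" value="0016070"/></procedure>
</chart>""")

def codes(root, scheme):
    # One pass covers every type sharing the coding convention.
    return [c.get("value") for c in root.iter("code")
            if c.get("scheme") == scheme]
```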

Going back to your original comments, another way of looking at this is
that these systems *do* have, in practice, a "centrally defined data
structure and term glossary"; it's just not hard-coded into the
application code, and it's defined locally.

-Brian
