Hi Rafael and everyone,
As you say, Anzo (in particular Anzo for Excel) is designed for
enterprises to curate large numbers of spreadsheets, map them to
ontologies and to existing RDF instance data, and maintain them as changes
are made to the spreadsheets or to the data in the spreadsheets. It can
be used for CSV-style "tabular" spreadsheets and also for arbitrarily
"human-oriented" spreadsheets. It can be used both in interactive modes
(where people are opening up and interacting with spreadsheets) and also
in automated batch modes.
Anzo stores the RDF data from spreadsheets in an RDF database. Anzo
includes both authenticated and unauthenticated SPARQL endpoints for
this data; Anzo can also directly publish the data as Linked Data.
Finally, Anzo gives you several ways to export RDF data from the database.
Anzo is available in several editions:
* Anzo Express Starter -- includes Anzo for Excel as above for a limited
number of users; freely available
* Anzo Express -- includes Anzo for Excel and Anzo on the Web, a
user-friendly browser-based dashboard tool for visualizing,
searching, and analyzing RDF data
* Anzo Enterprise -- includes the above plus tools to connect to data
in relational databases, to integrate unstructured data from
documents, web pages, etc., and to run rules, reasoning, and workflow
processes, along with various server-side and client-side APIs
/We also make Anzo available for free for academic use./
If you're interested in learning more about Anzo or trying it out for
some of these CDC spreadsheets or the HL7/CIMI work or for any other
purpose, please drop me a note and I'll be happy to help you out.
Lee
On 1/20/2013 7:38 PM, Rafael Richards wrote:
I am also interested in integrating healthcare data published by the
CDC. Unfortunately, it comes as nearly 200 separate spreadsheets:
http://www.cdc.gov/nchs/hus/contents2011.htm#chartbookfigures
The only thing I am aware of that is designed to keep large numbers
(potentially hundreds) of spreadsheets continuously integrated and in
sync across an enterprise, each independently curated, is Anzo by
Cambridge Semantics. Most of the other tools I am aware of do not
update the RDF model in real time as the CSV changes; they are
one-off converters, so if you have more than one spreadsheet to keep
up to date, it will be time-consuming.
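
For illustration, a one-off conversion of the kind described above
might look like the following minimal Python sketch. It is not how
Anzo or any other tool mentioned here works; the namespace
http://example.org/, the function names, and the sample rows are all
made up for the example. It also shows why this approach gets tedious:
every time a spreadsheet changes, the script has to be run again.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, chosen only for this example.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(text, key_column):
    """Turn each CSV cell into one triple: row -> column -> cell value."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}row/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            # Skip the key column itself, empty cells, and overflow cells.
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{BASE}column/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Placeholder data, not taken from any real spreadsheet.
sample = "id,label\na1,First row\na2,Second row\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

The output is plain N-Triples, which most RDF stores and visualization
tools can load.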
For one-off conversion, Google Refine is quite easy to get started
with. It has a great many data-cleaning facilities for noisy or
illogical data. With its RDF extension you get *automated* data
reconciliation against outside linked data sources of your choice,
such as DBpedia. This is a feature I have not seen in any other
conversion tool. It does not do visualization, but there are plenty
of desktop applications that do this very well.
Any suggestions for other 'pipeline' tools to keep CSV and RDF in
sync which (1) are currently maintained and (2) have sufficient
documentation and examples of importing and converting CSV to RDF?
Rafael
On Jan 20, 2013, at 12:57 PM, peter.hend...@kp.org wrote:
What are some recommended simple "probably stand alone or work on one
machine" utilities for converting spreadsheet data to RDF and then,
once that file is on disk, for visualizing it as a graph?
This would be for HL7 and CIMI where we'd be entering "clinical
models" directly into a spreadsheet, and then want to compare models
made by different people.
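
As a rough illustration of the comparison step, once two people's
models have been exported as N-Triples, a first-pass diff can be done
with plain set operations. This is only a sketch under strong
assumptions: both files use the same serialization and contain no
blank nodes, and the function names and example.org terms below are
invented for the example.

```python
def load_triples(ntriples_text):
    """Collect non-empty, non-comment lines as a set of triples."""
    triples = set()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            triples.add(line)
    return triples


def compare_models(text_a, text_b):
    """Report which triples are shared and which appear in only one model."""
    a, b = load_triples(text_a), load_triples(text_b)
    return {
        "shared": sorted(a & b),
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


# Placeholder models using a hypothetical example.org vocabulary.
model_a = """
<http://example.org/Model> <http://example.org/hasField> "code" .
<http://example.org/Model> <http://example.org/hasField> "value" .
"""
model_b = """
<http://example.org/Model> <http://example.org/hasField> "code" .
<http://example.org/Model> <http://example.org/hasField> "unit" .
"""

for group, triples in compare_models(model_a, model_b).items():
    print(group)
    for triple in triples:
        print("  " + triple)
```

A line-by-line comparison like this will miss equivalences that a
proper RDF graph comparison would catch, so it is best treated as a
quick sanity check.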