On 22/03/13 13:22, Vít Tuček wrote:
Thanks a lot!

On 22 March 2013 11:56, Lars Holm Nielsen <lars.holm.niel...@cern.ch> wrote:
Dear Vit,


On 22/03/13 08:24, Vít Tuček wrote:

Hello everyone,
  my employer decided to try to move his digital library to Invenio and
I was tasked to facilitate that process. I kindly ask for your help in
assessing the work required to do so.

Our data are stored in TEI P5 XML and we already have XSL transforms
to extract MARC XML. We were thinking of the following workflow. (Of
course any ideas how to proceed in a way better suited for Invenio are
appreciated.)

IMPORT: A daemon checks an assigned directory and if there is a zip
file it unzips it and runs the XSL transform on the resulting TEI XML
to produce the MARC XML. The MARC XML is then checked for sanity,
imported into the library and associated with the original TEI XML.
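
A minimal sketch of the unzip and sanity-check steps described above, using only
the Python standard library. The function names, and the rule that every record
needs a 245 $a title, are assumptions made for illustration; the XSL transform
itself is left out, since it would be run by whatever XSLT processor is already
in use.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

MARC_NS = "http://www.loc.gov/MARC21/slim"


def unpack_tei(zip_path, workdir):
    """Extract a zip archive and return the paths of the .xml files inside it."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(workdir)
    return sorted(Path(workdir).rglob("*.xml"))


def marcxml_problems(marcxml_text):
    """Return a list of problems; an empty list means the file looks sane."""
    try:
        root = ET.fromstring(marcxml_text)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    # iter() also matches the root itself, so a bare <record> is accepted too.
    records = list(root.iter("{%s}record" % MARC_NS))
    problems = []
    if not records:
        problems.append("no <record> elements found")

    # Hypothetical local policy: every record must carry a title in 245 $a.
    title_path = "{%s}datafield[@tag='245']/{%s}subfield[@code='a']" % (
        MARC_NS, MARC_NS)
    for number, record in enumerate(records, 1):
        if not record.findall(title_path):
            problems.append("record %d has no 245 $a title" % number)
    return problems
```

A watcher loop would call unpack_tei() on each new archive, run the transform on
the returned files, and only pass the MARC XML on when marcxml_problems() returns
an empty list.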

If you already have a MARC XML transformation, then you are nearly done :-)
There are several ways to get content into Invenio, and I think which way
you use depends mostly on how you want to do the migration. The central part
of Invenio responsible for uploading metadata is BibUpload which takes a
MARC XML file by default. Additionally we have BatchUploader which is
basically a wrapper around BibUpload, which will monitor a directory and
import MARC XML files. More details are available here
http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide
I'm glad to hear that.

EXPORT: We would like to be able to export the TEI XML (or its XSL
transforms) as collections from the web interface.


For this you would use BibFormat. You would add a new output format (
http://invenio-demo.cern.ch/help/admin/bibformat-admin-guide#addOutputFormat):
Give it a Code and content-type (note, the code is important, there's
special handling depending on the first letter, especially for x and h).

Then add an XSL format template for the output format that transforms MARC
XML to TEI XML:
http://invenio-demo.cern.ch/help/admin/bibformat-admin-guide#xslFormatTemplate

Here's an example of Dublin Core transformation:
Output format:
http://invenio-software.org/repo/invenio/tree/modules/bibformat/etc/output_formats/XD.bfo?h=maint-1.1
Format template:
http://invenio-software.org/repo/invenio/tree/modules/bibformat/etc/format_templates/OAI_DC.xsl?h=maint-1.1
The files live in etc/bibformat/format_templates and
etc/bibformat/output_formats

Once you have the transformation, each record can be exported individually,
or collectively from the search page, or by a background job.
I am not sure we are on the same page here. Let me try to explain myself better.

  The TEI XML contains much more than bibliographic data and we would
like to be able to store that in Invenio and apply some XSL transforms
to it during export. Think of handling a PDF with metadata extraction
during import and pdf2html during export.
I think we might be misunderstanding each other about what we mean by import/export.

Import: How to get your metadata and files into Invenio.
Export: Once the metadata and files are already in Invenio, how can you get them out again in another format.

For the import, it's all done through BibUpload and a MARCXML file. The MARCXML file can point to files that Invenio also needs to store (e.g. PDF documents, video images etc - see http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide#3.6). This way you get metadata and files into Invenio.
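
A minimal sketch of such a MARCXML file, built with the Python standard library.
It assumes the FFT convention from the guide linked above, where subfield $a
holds the path or URL of the file to attach; the helper names, the title and the
file path are made up for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def add_datafield(record, tag, subfields, ind1=" ", ind2=" "):
    """Append a <datafield> with the given (code, value) subfields."""
    field = ET.SubElement(
        record, "datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
    for code, value in subfields:
        subfield = ET.SubElement(field, "subfield", {"code": code})
        subfield.text = value
    return field


def build_record(title, tei_path):
    """Return MARCXML for one record with a title and one attached file."""
    collection = ET.Element("collection", {"xmlns": MARC_NS})
    record = ET.SubElement(collection, "record")
    add_datafield(record, "245", [("a", title)])
    # FFT $a: location of the file BibUpload should attach (assumed layout).
    add_datafield(record, "FFT", [("a", tei_path)])
    return ET.tostring(collection, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical title and path, for illustration only.
    print(build_record("An example title", "/data/tei/example.tei.xml"))
```

The output would be saved to a file and handed to BibUpload, e.g. with
bibupload -i records.xml for a first insert.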

There are no restrictions on which files you can associate. Once the metadata and files are stored in Invenio, Invenio has some other tools to extract references from PDFs, create thumbnails, classify documents etc. All of them, however, just read information which is inside Invenio (files or metadata), create a new MARCXML file (with possible links to files - see FFT tag in link above) and send it to BibUpload.

As far as I understand, you are mainly interested in getting data into Invenio, but not out again in a different format (i.e. export).

Does this answer your questions?

Cheers,
Lars


Best regards,
Lars


I installed Invenio locally and took a quick look around. I'm a little
bit lost in the documentation right now, but it seems that the import
is mostly a matter of proper configuration. As for export, so far it
seems to me that some scripting is needed.

Best regards,
                            Vit



--
Lars Holm Nielsen
Software Engineer

CERN, IT Department, Digital Library Technology Section
Office 513/1-014
Tel: +41 22 76 79182
Cel: +41 76 672 8927


