On 22/03/13 13:22, Vít Tuček wrote:
> Thanks a lot!
>
> On 22 March 2013 11:56, Lars Holm Nielsen <lars.holm.niel...@cern.ch> wrote:
>> Dear Vit,
>>
>> On 22/03/13 08:24, Vít Tuček wrote:
>>> Hello everyone,
>>>
>>> my employer decided to try to move his digital library to Invenio and I was tasked to facilitate that process. I kindly ask for your help in assessing the work required to do so. Our data are stored in TEI P5 XML and we already have XSL transforms to extract MARC XML. We were thinking of the following workflow. (Of course any ideas how to proceed in a way better suited for Invenio are appreciated.)
>>>
>>> IMPORT: A daemon checks an assigned directory and if there is a zip file it unzips it and runs the XSL transform on the resulting TEI XML to produce the MARC XML. The MARC XML is then checked for sanity, imported into the library and associated with the original TEI XML.
>>
>> If you already have a MARC XML transformation, then you are nearly done :-) There are several ways to get content into Invenio, and I think which way you use depends mostly on how you want to do the migration. The central part of Invenio responsible for uploading metadata is BibUpload, which takes a MARC XML file by default. Additionally we have BatchUploader, which is basically a wrapper around BibUpload that monitors a directory and imports MARC XML files. More details are available here: http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide
>
> I'm glad to hear that.
>
>>> EXPORT: We would like to be able to export the TEI XML (or its XSL transforms) as collections from the web interface.
>>
>> For this you would use BibFormat. You would add a new output format ( http://invenio-demo.cern.ch/help/admin/bibformat-admin-guide#addOutputFormat): Give it a code and content type (note, the code is important, there's special handling depending on the first letter, especially for x and h). Then add an XSL format template for the output format that transforms MARC XML to TEI XML: http://invenio-demo.cern.ch/help/admin/bibformat-admin-guide#xslFormatTemplate
>>
>> Here's an example of a Dublin Core transformation:
>> Output format: http://invenio-software.org/repo/invenio/tree/modules/bibformat/etc/output_formats/XD.bfo?h=maint-1.1
>> Format template: http://invenio-software.org/repo/invenio/tree/modules/bibformat/etc/format_templates/OAI_DC.xsl?h=maint-1.1
>>
>> The files live in etc/bibformat/format_templates and etc/bibformat/output_formats.
>>
>> Once you have the transformation, each record can be exported individually, or collectively from the search page, as well as by a background job.
>
> I am not sure we are on the same page here. Let me try to explain myself better. The TEI XML contains much more than bibliographic data and we would like to be able to store that in Invenio and apply some XSL transforms to it during export. Think of handling a PDF with metadata extraction during import and pdf2html during export.

I think we might be misunderstanding each other with what we mean by import/export.

Import: How to get your metadata and files into Invenio.
Export: Once the metadata and files are already in Invenio, how you can get them out again in another format.
For the import, it's all done through BibUpload and a MARCXML file. The MARCXML file can point to files that Invenio also needs to store (e.g. PDF documents, video images etc - see http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide#3.6). This way you get metadata and files into Invenio.
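As a rough illustration of such a file, the Python sketch below builds a minimal MARCXML record with a title and one FFT field pointing at a file to attach. It assumes that subfield `a` of the FFT field carries the file path or URL (the guide linked above lists the other subfields); the function name `build_marcxml`, the title and the path are invented for the example and are not part of Invenio.

```python
import xml.etree.ElementTree as ET

# Standard MARC 21 "slim" namespace used by MARCXML documents.
MARC_NS = "http://www.loc.gov/MARC21/slim"


def _tag(name):
    """Return the namespace-qualified element name."""
    return "{%s}%s" % (MARC_NS, name)


def _add_field(record, tag, code, value):
    """Append a datafield with blank indicators and a single subfield."""
    field = ET.SubElement(record, _tag("datafield"),
                          {"tag": tag, "ind1": " ", "ind2": " "})
    ET.SubElement(field, _tag("subfield"), {"code": code}).text = value
    return field


def build_marcxml(title, file_path):
    """Build a one-record MARCXML collection (hypothetical helper)."""
    ET.register_namespace("", MARC_NS)
    collection = ET.Element(_tag("collection"))
    record = ET.SubElement(collection, _tag("record"))
    # 245 $a is the title statement in MARC 21.
    _add_field(record, "245", "a", title)
    # Assumption: FFT $a holds the location of the file to attach.
    _add_field(record, "FFT", "a", file_path)
    return ET.tostring(collection, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only.
    print(build_marcxml("Example title", "/data/tei/example.xml"))
```

The resulting file would then be handed to BibUpload. In the scenario from this thread, the FFT path would point at the original TEI XML, so that it is stored alongside the extracted metadata.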
There are no restrictions on which files you can associate. Once the metadata and files are stored in Invenio, Invenio has some other tools to extract references from PDFs, create thumbnails, classify documents etc. All of them, however, just read information that is inside Invenio (files or metadata), create a new MARCXML file (with possible links to files - see the FFT tag in the link above) and send it to BibUpload.
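The pre-processing step proposed earlier in the thread (pick up zip files from a directory, unpack them, and turn each TEI XML file into MARCXML before it reaches BibUpload) could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names, directory layout and the `.done` renaming are hypothetical, and the XSL transform is passed in as a plain callable `transform` because the Python standard library has no XSLT processor.

```python
import zipfile
from pathlib import Path


def unpack_incoming(incoming_dir, work_dir):
    """Unzip every archive in incoming_dir; return the extracted XML paths."""
    incoming = Path(incoming_dir)
    work = Path(work_dir)
    extracted = []
    for archive in sorted(incoming.glob("*.zip")):
        target = work / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.extend(sorted(target.rglob("*.xml")))
        # Rename the archive so it is not picked up again on the next pass.
        archive.rename(archive.with_name(archive.name + ".done"))
    return extracted


def process_incoming(incoming_dir, work_dir, out_dir, transform):
    """Write one MARCXML file per extracted TEI file.

    transform is a caller-supplied function taking TEI XML text and
    returning MARCXML text (for example a wrapper around an XSLT engine).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for tei_path in unpack_incoming(incoming_dir, work_dir):
        marcxml = transform(tei_path.read_text(encoding="utf-8"))
        marc_path = out / (tei_path.stem + ".marc.xml")
        marc_path.write_text(marcxml, encoding="utf-8")
        written.append(marc_path)
    return written
```

If `out_dir` is a directory that BatchUploader monitors, the generated files would be picked up from there. Sanity checks on the MARCXML are left out of the sketch.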
As far as I understand, you are mainly interested in getting data into Invenio, but not out again in a different format (i.e. export).
Does this answer your questions?

Cheers,
Lars
>> Best regards,
>> Lars
>
>>> I installed Invenio locally and took a quick look around. I'm a little bit lost in the documentation right now, but it seems that the import is mostly a matter of proper configuration. As for export, so far it seems to me that some scripting is needed.
>>>
>>> Best regards,
>>> Vit
>>
>> --
>> Lars Holm Nielsen
>> Software Engineer
>> CERN, IT Department, Digital Library Technology Section
>> Office 513/1-014
>> Tel: +41 22 76 79182
>> Cel: +41 76 672 8927
-- Lars Holm Nielsen Software Engineer CERN, IT Department, Digital Library Technology Section Office 513/1-014 Tel: +41 22 76 79182 Cel: +41 76 672 8927