Re: TEI XML import / export

2013-05-29 Thread Tibor Simko
On Wed, 29 May 2013, Vít Tuček wrote:
> Thank you for the additional info. We use various sources, most of
> which use TEI but not all of them. The workflow I am implementing right
> now is to convert these sources to TEI and then use a lossy conversion
> TEI -> MARC with an FFT tag pointing to that TEI.

That seems exactly what I had in mind.

> I don't know what you mean by "hidden files". Could you elaborate?

The `hidden file' means that the attached TEI file would not be visible
on the UI to regular end users.  But it would be ingested and you could
use it for indexing (say) and for your output formatting procedures.
You can upload files as hidden by using the special FFT $o HIDDEN value in
your input MARCXML file.  For more information, see the BibUpload Admin
Guide.

> I'm glad to hear that the MARC restriction is being lifted in the new
> version. Is there some sort of rough estimate as to when to expect
> this version to be released as production ready?

Right now we have support for UNIMARC master files, but it is not
committed yet.  Support for the EAD archival format is on the way.
This is all thanks to the M9 project.

The facility will be committed to the bleeding-edge master branch.  The
UNIMARC commit should happen literally within weeks.  But it will
still take a few months before the facility is production-ready in the
other Invenio modules, i.e. from ingestion (what we are working on right
now) through indexing (about to start working on this) to display (this
part is easy).

For some more information, see
.

Best regards
--
Tibor Simko


Re: TEI XML import / export

2013-05-29 Thread Vít Tuček
Thank you for the additional info. We use various sources, most of which use
TEI but not all of them. The workflow I am implementing right now is to
convert these sources to TEI and then use a lossy conversion TEI -> MARC
with an FFT tag pointing to that TEI. I don't know what you mean by "hidden
files". Could you elaborate?

I'm glad to hear that the MARC restriction is being lifted in the new
version. Is there some sort of rough estimate as to when to expect this
version to be released as production ready?

Best regards,
Vit Tucek


Re: TEI XML import / export

2013-05-29 Thread Tibor Simko
On Fri, 22 Mar 2013, Vít Tuček wrote:
> The TEI XML contains much more than bibliographic data and we would
> like to be able to store that in Invenio and apply some XSL transforms
> to it during export. Think of handling a PDF with metadata extraction
> during import and pdf2html during export.

Here is some information in addition to what Lars has already provided.

If TEI XML is your master format, then you could store it alongside the
generated MARC record in Invenio, so that if your TEI->MARC conversion is
lossy rather than lossless, you could still serve the full original
information from the original TEI upon an export request.

Depending on your Invenio version, the TEI file could simply be stored
as a hidden file attached to the given MARC record.  Your export
procedure would then read and serve it.  I'm suggesting this technique
because, up to now, the master format for all records in Invenio was
MARC.
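The store-and-serve idea can be sketched in Python as below. The `Attachment` class and its `kind` label are hypothetical stand-ins for however your Invenio version exposes a record's files (no real Invenio API is used here); the sketch only shows the decision: serve the lossless TEI master when it is attached as a hidden file, and fall back to the lossy MARCXML otherwise.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional, Tuple


@dataclass
class Attachment:
    """Hypothetical stand-in for one file attached to a record."""
    path: Path
    kind: str      # label chosen at upload time, e.g. "TEI" (assumption)
    hidden: bool   # True if the file was uploaded with FFT $o HIDDEN


def pick_tei_master(attachments: Iterable[Attachment]) -> Optional[Attachment]:
    """Return the hidden TEI master copy, if the record has one."""
    for att in attachments:
        if att.hidden and att.kind == "TEI":
            return att
    return None


def export_record(marcxml: str, attachments: Iterable[Attachment]) -> Tuple[str, str]:
    """Serve the lossless TEI original when available, else the lossy MARCXML."""
    tei = pick_tei_master(attachments)
    if tei is not None:
        return "application/tei+xml", tei.path.read_text(encoding="utf-8")
    return "application/marcxml+xml", marcxml
```

Keeping the selection in its own function makes it easy to test without touching the file system and to swap in whatever file-listing call your installation provides.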

In the bleeding-edge Invenio master branch, we are lifting this
constraint and introducing the notion of a master format that may or may
not be MARC.  This may be a cleaner solution to your issue, especially
if you expect to stay with TEI master formats in the future.

Best regards
--
Tibor Simko


Re: TEI XML import / export

2013-03-25 Thread Lars Holm Nielsen

On 22/03/13 13:22, Vít Tuček wrote:

> I am not sure we are on the same page here. Let me try to explain myself better.
>
> The TEI XML contains much more than bibliographic data and we would
> like to be able to store that in Invenio and apply some XSL transforms
> to it during export. Think of handling a PDF with metadata extraction
> during import and pdf2html during export.

I think we might be misunderstanding each other about what we mean by
import/export.


Import: How to get your metadata and files into Invenio.
Export: Once the metadata and files are already in Invenio, how to get
them out again in another format.


For the import, it's all done through BibUpload and a MARCXML file. The
MARCXML file can point to files that Invenio also needs to store (e.g.
PDF documents, videos, images, etc.; see
http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide#3.6). This
way you get both metadata and files into Invenio.


There are no restrictions on which files you can associate. Once the
metadata and files are stored in Invenio, Invenio has some other tools
to extract references from PDFs, create thumbnails, classify documents,
etc. All of them, however, just read information that is inside
Invenio (files or metadata), create a new MARCXML file (with
possible links to files; see the FFT tag in the link above) and send it
to BibUpload.


As far as I understand, you are mainly interested in getting data into
Invenio, and not in getting it out again in a different format (i.e. export).


Does this answer your questions?

Cheers,
Lars




--
Lars Holm Nielsen
Software Engineer

CERN, IT Department, Digital Library Technology Section
Office 513/1-014
Tel: +41 22 76 79182
Cel: +41 76 672 8927



Re: TEI XML import / export

2013-03-22 Thread Vít Tuček
Thanks a lot!

On 22 March 2013 11:56, Lars Holm Nielsen  wrote:
> If you already have a MARC XML transformation, then you are nearly done :-)
> There are several ways to get content into Invenio, and I think which way
> you use depends mostly on how you want to do the migration. The central part
> of Invenio responsible for uploading metadata is BibUpload which takes a
> MARC XML file by default. Additionally we have BatchUploader which is
> basically a wrapper around BibUpload, which will monitor a directory and
> import MARC XML files. More details are available here
> http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide

I'm glad to hear that.

>
> EXPORT: We would like to be able to export the TEI XML (or its XSL
> transforms) as collections from the web interface.
>
>
> For this you would use BibFormat. You would add a new output format (
> http://invenio-demo.cern.ch/help/admin/bibformat-admin-guide#addOutputFormat):
> Give it a Code and content-type (note, the code is important, there's
> special handling depending on the first letter, especially for x and h).
>
> Then add an XSL format template for the output format that transforms MARC
> XML to TEI XML:
> http://invenio-demo.cern.ch/help/admin/bibformat-admin-guide#xslFormatTemplate
>
> Here's an example of Dublin Core transformation:
> Output format:
> http://invenio-software.org/repo/invenio/tree/modules/bibformat/etc/output_formats/XD.bfo?h=maint-1.1
> Format template:
> http://invenio-software.org/repo/invenio/tree/modules/bibformat/etc/format_templates/OAI_DC.xsl?h=maint-1.1
> The files live in etc/bibformat/format_templates and
> etc/bibformat/output_formats
>
> Once you have the transformation, each record can be exported individually,
> or collectively from the search page, as well as via a background job.

I am not sure we are on the same page here. Let me try to explain myself better.

The TEI XML contains much more than bibliographic data and we would
like to be able to store that in Invenio and apply some XSL transforms
to it during export. Think of handling a PDF with metadata extraction
during import and pdf2html during export.



Re: TEI XML import / export

2013-03-22 Thread Lars Holm Nielsen

Dear Vit,

On 22/03/13 08:24, Vít Tuček wrote:
> Hello everyone,
> my employer decided to try to move his digital library to Invenio and
> I was tasked to facilitate that process. I kindly ask for your help in
> assessing the work required to do so.
>
> Our data are stored in TEI P5 XML and we already have XSL transforms
> to extract MARC XML. We were thinking of the following workflow. (Of
> course, any ideas on how to proceed in a way better suited for Invenio
> are appreciated.)
>
> IMPORT: A daemon checks an assigned directory and if there is a zip
> file it unzips it and runs the XSL transform on the resulting TEI XML
> to produce the MARC XML. The MARC XML is then checked for sanity,
> imported into the library and associated with the original TEI XML.

If you already have a MARC XML transformation, then you are nearly done
:-) There are several ways to get content into Invenio, and I think
which way you use depends mostly on how you want to do the migration.
The central part of Invenio responsible for uploading metadata is
BibUpload, which takes a MARC XML file by default. Additionally, we have
BatchUploader, which is basically a wrapper around BibUpload that
monitors a directory and imports MARC XML files. More details are
available here: http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide



> EXPORT: We would like to be able to export the TEI XML (or its XSL
> transforms) as collections from the web interface.


For this you would use BibFormat. You would add a new output format
(http://invenio-demo.cern.ch/help/admin/bibformat-admin-guide#addOutputFormat).
Give it a code and a content type (note: the code is important; there is
special handling depending on its first letter, especially for x and h).


Then add an XSL format template for the output format that transforms
MARC XML to TEI XML:
http://invenio-demo.cern.ch/help/admin/bibformat-admin-guide#xslFormatTemplate
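In BibFormat this mapping belongs in the XSL format template itself; the Python below is a swapped-in illustration of the field mapping such a template would encode, not a BibFormat API. The mapping (245 $a to the TEI title, 100 $a to the TEI author) is our own assumption rather than a standard crosswalk, and the result is a header-only fragment: a complete TEI P5 document would also need a `<text>` element.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MARC = "{http://www.loc.gov/MARC21/slim}"
TEI = "{http://www.tei-c.org/ns/1.0}"


def first_subfield(record: ET.Element, tag: str, code: str) -> Optional[str]:
    """Text of the first $code subfield found in a datafield with this tag."""
    for field in record.iter(f"{MARC}datafield"):
        if field.get("tag") == tag:
            for sub in field.iter(f"{MARC}subfield"):
                if sub.get("code") == code:
                    return sub.text
    return None


def marc_to_tei_header(marcxml: str) -> ET.Element:
    """Build a minimal TEI P5 <TEI><teiHeader> from a MARCXML record."""
    record = ET.fromstring(marcxml)
    tei = ET.Element(f"{TEI}TEI")
    header = ET.SubElement(tei, f"{TEI}teiHeader")
    file_desc = ET.SubElement(header, f"{TEI}fileDesc")

    title_stmt = ET.SubElement(file_desc, f"{TEI}titleStmt")
    # Assumed mapping: 245 $a -> title, 100 $a -> author.
    ET.SubElement(title_stmt, f"{TEI}title").text = first_subfield(record, "245", "a")
    author = first_subfield(record, "100", "a")
    if author:
        ET.SubElement(title_stmt, f"{TEI}author").text = author

    # publicationStmt and sourceDesc are required children of a TEI P5 fileDesc.
    pub = ET.SubElement(file_desc, f"{TEI}publicationStmt")
    ET.SubElement(pub, f"{TEI}p").text = "Exported from Invenio"
    src = ET.SubElement(file_desc, f"{TEI}sourceDesc")
    ET.SubElement(src, f"{TEI}p").text = "Converted from a MARC record"
    return tei
```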


Here is an example of a Dublin Core transformation:
Output format: 
http://invenio-software.org/repo/invenio/tree/modules/bibformat/etc/output_formats/XD.bfo?h=maint-1.1
Format template: 
http://invenio-software.org/repo/invenio/tree/modules/bibformat/etc/format_templates/OAI_DC.xsl?h=maint-1.1
The files live in etc/bibformat/format_templates and 
etc/bibformat/output_formats


Once you have the transformation, each record can be exported
individually, or collectively from the search page, as well as via a
background job.
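For the web-interface side, the snippet below only builds the URLs one might request. It assumes Invenio 1.x URL conventions (`/record/<recid>/export/<code>` for a single record, and the `p`, `of` and `rg` search parameters for a result set) and a hypothetical output format code `xtei`, chosen to start with `x` on the assumption that this marks XML output; please verify both against your installation.

```python
from urllib.parse import urlencode

BASE_URL = "http://invenio-demo.cern.ch"  # replace with your own installation
OUTPUT_FORMAT = "xtei"  # hypothetical output format code


def record_export_url(recid: int, fmt: str = OUTPUT_FORMAT) -> str:
    """URL exporting a single record in the given output format."""
    return f"{BASE_URL}/record/{recid}/export/{fmt}"


def collection_export_url(query: str, fmt: str = OUTPUT_FORMAT,
                          per_page: int = 100) -> str:
    """URL exporting a whole search result set in the given output format."""
    params = {"p": query, "of": fmt, "rg": per_page}
    return f"{BASE_URL}/search?{urlencode(params)}"
```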


Best regards,
Lars


TEI XML import / export

2013-03-22 Thread Vít Tuček
Hello everyone,
my employer decided to try to move his digital library to Invenio and
I was tasked to facilitate that process. I kindly ask for your help in
assessing the work required to do so.

Our data are stored in TEI P5 XML and we already have XSL transforms
to extract MARC XML. We were thinking of the following workflow. (Of
course, any ideas on how to proceed in a way better suited for Invenio
are appreciated.)

IMPORT: A daemon checks an assigned directory and if there is a zip
file it unzips it and runs the XSL transform on the resulting TEI XML
to produce the MARC XML. The MARC XML is then checked for sanity,
imported into the library and associated with the original TEI XML.
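The IMPORT step above could look roughly like the following single polling pass in Python (standard library only). It is a sketch under assumptions: `tei_to_marc` stands in for the existing XSL transform, every `*.xml` file in an archive is assumed to be a TEI document, and `outbox` is assumed to be the folder BatchUploader watches, so that BibUpload does the actual ingestion. The MARCXML the transform emits would also carry the FFT field that associates each record with its original TEI file.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path
from typing import Callable, List

MARC_RECORD = "{http://www.loc.gov/MARC21/slim}record"


def is_sane_marcxml(marcxml: str) -> bool:
    """Minimal sanity check: well-formed XML holding at least one MARC record."""
    try:
        root = ET.fromstring(marcxml)
    except ET.ParseError:
        return False
    return root.tag == MARC_RECORD or root.find(MARC_RECORD) is not None


def process_inbox(inbox: Path, work: Path, outbox: Path,
                  tei_to_marc: Callable[[Path], str]) -> List[Path]:
    """One polling pass: unzip new archives, convert TEI, queue MARCXML files."""
    queued: List[Path] = []
    for archive in sorted(inbox.glob("*.zip")):
        target = work / archive.stem  # kept, so FFT paths to the TEI stay valid
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        for tei in sorted(target.rglob("*.xml")):
            marcxml = tei_to_marc(tei)  # the existing TEI -> MARC XSL transform
            if not is_sane_marcxml(marcxml):
                continue  # a real daemon would log and quarantine these
            out = outbox / f"{archive.stem}-{tei.stem}.xml"
            out.write_text(marcxml, encoding="utf-8")
            queued.append(out)
        archive.rename(archive.with_suffix(".done"))  # never pick it up twice
    return queued
```

A daemon would then call `process_inbox` in a loop with a sleep (or from cron); the sanity check is deliberately minimal and could be extended with whatever rules your MARC records must satisfy.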

EXPORT: We would like to be able to export the TEI XML (or its XSL
transforms) as collections from the web interface.

I installed Invenio locally and took a quick look around. I'm a little
bit lost in the documentation right now, but it seems that the import
is mostly a matter of proper configuration. As for export, so far it
seems to me that some scripting is needed.

Best regards,
   Vit