Re: Copyright notice per docfile

2013-05-29 Thread Theodoros Theodoropoulos

Hey, I could give you my kidney to have you revive BibAuthority!!

I've written many times to some developers and to the list regarding 
BibAuthority, but I never got a straight answer regarding its status.
Nevertheless, I based all my author tags on the very promising 
'recipe' found in https://twiki.cern.ch/twiki/bin/view/CDS/BibAuthority 
hoping that some day Invenio would get some proper authority support.


I don't know how useful it would be for licensing, but I can guarantee 
that it will make the life of MANY Invenio users way better. I remember 
such a strong request in the last IUGM. Please tell me that you're 
seriously thinking of reviving BibAuthority! (even if it's a lie...)


Best regards,
Theodoros


Re: Copyright notice per docfile

2013-05-29 Thread Tibor Simko
On Wed, 29 May 2013, Wagner, Alexander wrote:
> Point is, it can be more complex than a simple string. An auth record
> can easily hold a pdf or the like with full license description.

What I meant is that a tiny license string could be a pointer to a
knowledge base of licences containing more information, even a full
description.  Kind of like an authority record without authority.
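
A minimal sketch of the "pointer into a knowledge base" idea, in Python.
The `LICENSE_KB` dictionary, its keys and the `resolve_license` helper are
hypothetical names made up for illustration; this is not an existing
Invenio API, and a real knowledge base would be kept in one central place
rather than in a module-level dict.

```python
# Hypothetical knowledge base: short licence string -> fuller description.
# The keys and the fields ("name", "url") are illustrative assumptions.
LICENSE_KB = {
    "cc-by-sa": {
        "name": "Creative Commons Attribution-ShareAlike",
        "url": "https://creativecommons.org/licenses/by-sa/3.0/",
    },
    "all-rights-reserved": {
        "name": "All rights reserved",
        "url": None,
    },
}


def resolve_license(key):
    """Return the knowledge-base entry for a short licence string, or None."""
    if key is None:
        return None
    return LICENSE_KB.get(key.strip().lower())


# Only the tiny string is stored with the record or docfile.
stored_value = "cc-by-sa"
entry = resolve_license(stored_value)
if entry is not None:
    print(entry["name"], entry["url"])
```

With this layout the licence text is maintained once, and each record only
carries the short key.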

> As we already discussed, when I am @cern I'll give some idea what we
> do with auth records. :)

It will be a great occasion to revive BibAuthority...

Best regards
--
Tibor Simko


Re: Copyright notice per docfile

2013-05-29 Thread Wagner, Alexander
Hi!

BibAuthority.

Point is, it can be more complex than a simple string. An auth record can 
easily hold a pdf or the like with full license description.

As we already discussed, when I am @cern I'll give some idea what we do with 
auth records. :)

--
Kind regards,


Alexander Wagner

Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi

- Reply message -
From: "Tibor Simko" 
Date: Wed, May 29, 2013 16:12
Subject: Copyright notice per docfile
To: "Wagner, Alexander" 
Cc: "Theodoros Theodoropoulos" , "project-invenio-devel" 



On Wed, 29 May 2013, Alexander Wagner wrote:
> I would say that a cleaner way would be to have some link to a
> license.  Some sort of "license as authority" where I can add a link
> to a persistent ID within the system that refers to this very
> license. Makes it easier to maintain, as you need to keep the relevant
> licences only in one place and you don't need individual texts per
> record.

This points either to reviving BibAuthority, or else the license string
could be a pointer to a knowledge base of licences containing more
information.

Best regards
--
Tibor Simko




Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt




Re: Copyright notice per docfile

2013-05-29 Thread Tibor Simko
On Wed, 29 May 2013, Alexander Wagner wrote:
> I would say that a cleaner way would be to have some link to a
> license.  Some sort of "license as authority" where I can add a link
> to a persistent ID within the system that refers to this very
> license. Makes it easier to maintain, as you need to keep the relevant
> licences only in one place and you don't need individual texts per
> record.

This points either to reviving BibAuthority, or else the license string
could be a pointer to a knowledge base of licences containing more
information.

Best regards
--
Tibor Simko


Re: Increase in maximum allowed size in Document Type ID, Element Name

2013-05-29 Thread Tibor Simko
On Wed, 29 May 2013, Theodoros Theodoropoulos wrote:
> Well, I'm several months past this point of having to choose the Element
> names :)

Oftentimes I'm processing email in batch mode and sometimes the delays
tend to get terrible :(

> I ended up using ugly abbreviations, but I'm happy I didn't create any
> extra potential problems for a future migration from WebSubmit to
> WebDeposit.

Good then.  So we can spare this ticket.

> Having said that, I will be happily surprised if/when such an
> 'automatic' migration procedure to WebDeposit will be made available
> (bearing in mind all the custom elements, checks, javascript-insertion
> dummy elements, etc WebSubmit users might have introduced).

I was thinking of a simple WebSubmit-to-WebSubmit upgrade recipe.  An
automated WebSubmit-to-WebDeposit upgrade recipe will probably never
happen.  Invenio v2.0 supports both deposition modules via separate
`/submit' and `/deposit' URLs, so every Invenio site will be able to
decide e.g. to keep most submissions in the old WebSubmit system while
starting to use the WebDeposit workflows for new submissions, at its own
migration pace.

Best regards
--
Tibor Simko


Re: TEI XML import / export

2013-05-29 Thread Tibor Simko
On Wed, 29 May 2013, Vít Tuček wrote:
> Thank you for the additional info. We use various sources, most of
> which use TEI but not all of them. The workflow which I am implementing
> right now is to convert these sources to TEI and then use a lossy
> conversion TEI -> MARC with an FFT tag pointing to that TEI.

That seems exactly what I had in mind.

> I don't know what you mean by "hidden files". Could you elaborate?

The `hidden file' means that the attached TEI file would not be visible
on the UI to regular end users.  But it would be ingested and you could
use it for indexing (say) and for your output formatting procedures.
You can upload files as hidden by using the special FFT $o HIDDEN value
in your input MARCXML file.  For more information, see the BibUpload
Admin Guide.
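
As an illustration, here is a small Python sketch that builds such an input
MARCXML file with the standard library.  The record ID, the file path and
the `build_hidden_fft_record` helper are hypothetical; the use of FFT $a
for the file location is taken from the BibUpload Admin Guide as I recall
it, so please check the guide for your Invenio version before relying on it.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def build_hidden_fft_record(recid, tei_path):
    """Return MARCXML attaching tei_path to record recid as a hidden file."""
    collection = ET.Element("collection", {"xmlns": MARC_NS})
    record = ET.SubElement(collection, "record")

    # 001 identifies the existing record the file should be attached to.
    controlfield = ET.SubElement(record, "controlfield", {"tag": "001"})
    controlfield.text = str(recid)

    # FFT field: $a = location of the file, $o = flag (here HIDDEN).
    fft = ET.SubElement(
        record, "datafield", {"tag": "FFT", "ind1": " ", "ind2": " "}
    )
    ET.SubElement(fft, "subfield", {"code": "a"}).text = tei_path
    ET.SubElement(fft, "subfield", {"code": "o"}).text = "HIDDEN"

    return ET.tostring(collection, encoding="unicode")


# Hypothetical record ID and path, for illustration only.
print(build_hidden_fft_record(123, "/tmp/example.tei.xml"))
```

The resulting file would then be uploaded with BibUpload in the usual way.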

> I'm glad to hear that the MARC restriction is being lifted in the new
> version. Is there some sort of rough estimate as to when to expect
> this version to be released as production ready?

Right now we have support for UNIMARC master files, but it is not
committed yet.  The support for EAD archival formats is on the way.
This is all thanks to the M9 project.

The facility will be committed to the bleeding-edge master branch.  The
commit of UNIMARC should happen literally within weeks.  But it will
still take a few months before the facility is production-ready in other
Invenio modules, i.e. from ingestion (what we are working on right now)
through indexing (about to start working on this) to display (this part
is easy).

For some more information, see
.

Best regards
--
Tibor Simko


Re: CFG_WEBSEARCH_MAX_RECORDS_IN_GROUPS

2013-05-29 Thread Tibor Simko
On Wed, 28 Nov 2012, Alexander Wagner wrote:
> CFG_WEBSEARCH_MAX_RECORDS_IN_GROUPS = 200
>
> This should effectively limit the number of hits returned to 200,
> except if I'm superuser. First of all: this works.
>
> However, if I check search engine code, there is also a parameter for
> rg to return _all_ records in a collection. This disregards any max
> settings, and I think its existence is also sensible.
>
> But if I actually set it invenio does NOT check if I have any special
> rights, it seems. At least on our test system guest just happily
> dumped out 15.000 records in hb.

Via Web API or via CLI API?  Because unlimited `rg' queries are checked
more strongly in the web context; the thinking being that people having
login access to the box can do anything anyway, so they can be
authorised to ask for big `rg' values.
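
The intended behaviour, as described in this thread, can be sketched as
follows.  This is only an illustration under stated assumptions: the
`effective_rg` helper and its arguments are hypothetical and do not mirror
the actual search engine code; only the config variable name and the value
200 come from the message above.

```python
# Value quoted in the message above.
CFG_WEBSEARCH_MAX_RECORDS_IN_GROUPS = 200


def effective_rg(requested_rg, is_superuser=False, web_context=True):
    """Return how many records a query may actually ask for (hypothetical).

    In the web context, ordinary users are clamped to the configured
    maximum; superusers and CLI callers keep the value they requested.
    """
    if not web_context or is_superuser:
        return requested_rg
    return min(requested_rg, CFG_WEBSEARCH_MAX_RECORDS_IN_GROUPS)


print(effective_rg(15000))                     # guest on the web
print(effective_rg(15000, is_superuser=True))  # superuser on the web
print(effective_rg(15000, web_context=False))  # CLI caller
```

A guest dumping 15.000 records via the web would mean that some code path
skips the equivalent of the `min()` step.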

Best regards
--
Tibor Simko


Re: Copyright notice per docfile

2013-05-29 Thread Alexander Wagner

On 29.05.2013 10:23, Theodoros Theodoropoulos wrote:

Hi!


> > Depending on the Invenio version, the BibDoc objects have a `more info'
> > property store that basically serves as a storage for any key-value
> > combination.  We use it e.g. to store technical metadata about pictures:
> > width, height, position on page.  It could be used to store per-docfile
> > license/copyright information.
> >
> > Alternatively, in the past, there was also an approach to enrich BibDoc
> > independently with licensing information and then represent it in MARC
> > accordingly.  We did not commit this part, but we may perhaps revive it
> > with Jerome.
>
> Licensing Information per docfile is a useful thing to have in Invenio.

Agree.

> Having said that, using a separate 'field' in bibdocfile -like comment
> or description- to keep it (although one possible way to do it) will
> probably require a lot of changes in several files.


I would say that a cleaner way would be to have some link to a license.
Some sort of "license as authority" where I can add a link to a
persistent ID within the system that refers to this very license. Makes
it easier to maintain, as you need to keep the relevant licences only in
one place and you don't need individual texts per record.

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi




Re: Increase in maximum allowed size in Document Type ID, Element Name

2013-05-29 Thread Theodoros Theodoropoulos

On 29/5/2013 12:53 PM, Tibor Simko wrote:

> That said, we are however fully concentrating on developing the new
> WebDeposit module which brings many new goodies and which will
> eventually replace WebSubmit.  So, if at all possible, it would be
> simpler if you worked with the current WebSubmit field length
> limitations until WebDeposit comes?

Well, I'm several months past this point of having to choose the Element 
names :)
I ended up using ugly abbreviations, but I'm happy I didn't create any 
extra potential problems for a future migration from WebSubmit to 
WebDeposit.


Having said that, I will be happily surprised if/when such an 
'automatic' migration procedure to WebDeposit will be made available 
(bearing in mind all the custom elements, checks, javascript-insertion 
dummy elements, etc WebSubmit users might have introduced).


Cheers,
Theodoros


Re: TEI XML import / export

2013-05-29 Thread Vít Tuček
Thank you for the additional info. We use various sources, most of which
use TEI but not all of them. The workflow which I am implementing right now
is to convert these sources to TEI and then use a lossy conversion
TEI -> MARC with an FFT tag pointing to that TEI. I don't know what you
mean by "hidden files". Could you elaborate?

I'm glad to hear that the MARC restriction is being lifted in the new
version. Is there some sort of rough estimate as to when to expect this
version to be released as production ready?

Best regards,
Vit Tucek


On 29 May 2013 12:03, Tibor Simko  wrote:

> On Fri, 22 Mar 2013, Vít Tuček wrote:
> > The TEI XML contains much more than bibliographic data and we would
> > like to be able to store that in Invenio and apply some XSL transforms
> > to it during export. Think of handling a PDF with metadata extraction
> > during import and pdf2html during export.
>
> Here is some additional information to the one already provided by Lars.
>
> If TEI XML is your master format, then you could store it alongside
> generated MARC record in Invenio, so that if your TEI->MARC is not
> lossless but lossy, you could still serve the full original information
> from the original TEI upon export request.
>
> Depending on your Invenio version, the TEI file could be simply stored
> as a hidden file attached to the given MARC record.  Then your export
> procedure would read and serve it.  I'm suggesting this technique
> because up to now, the master format for all records in Invenio was
> MARC.
>
> In the bleeding edge Invenio master branch, we are releasing this
> constraint and introducing a notion of any master format that may or may
> not be MARC.  This may be a cleaner solution to address your issue,
> especially if you expect to stay with TEI master formats in the future.
>
> Best regards
> --
> Tibor Simko
>


Re: TEI XML import / export

2013-05-29 Thread Tibor Simko
On Fri, 22 Mar 2013, Vít Tuček wrote:
> The TEI XML contains much more than bibliographic data and we would
> like to be able to store that in Invenio and apply some XSL transforms
> to it during export. Think of handling a PDF with metadata extraction
> during import and pdf2html during export.

Here is some additional information to the one already provided by Lars.

If TEI XML is your master format, then you could store it alongside
generated MARC record in Invenio, so that if your TEI->MARC is not
lossless but lossy, you could still serve the full original information
from the original TEI upon export request.

Depending on your Invenio version, the TEI file could be simply stored
as a hidden file attached to the given MARC record.  Then your export
procedure would read and serve it.  I'm suggesting this technique
because up to now, the master format for all records in Invenio was
MARC.

In the bleeding edge Invenio master branch, we are releasing this
constraint and introducing a notion of any master format that may or may
not be MARC.  This may be a cleaner solution to address your issue,
especially if you expect to stay with TEI master formats in the future.

Best regards
--
Tibor Simko


Re: Increase in maximum allowed size in Document Type ID, Element Name

2013-05-29 Thread Tibor Simko
On Wed, 27 Mar 2013, Theodoros Theodoropoulos wrote:
> Currently, maximum "Document Type ID" size is 10 chars and maximum
> "Element Name" size is 15 chars. [...] Unless there is a design reason
> (ie big and 'ugly' parameters in URLs or unnecessary waste in DB
> tables), I believe that these limits are somewhat ...limiting and they
> should probably be increased.

Yes, they can be increased, one just has to pay attention to all
occurrences for potential joins and stuff, e.g. sbmDOCTYPE.sdocname and
sbmIMPLEMENT.docname.  And an upgrader recipe would have to be written.

That said, we are however fully concentrating on developing the new
WebDeposit module which brings many new goodies and which will
eventually replace WebSubmit.  So, if at all possible, it would be
simpler if you worked with the current WebSubmit field length
limitations until WebDeposit comes?

Best regards
--
Tibor Simko


Re: Unexpected keyword argument errors in invenio.err for doctypeconfiguresubmissionfunctions and doctypeconfiguresubmissionpages

2013-05-29 Thread Tibor Simko
On Wed, 29 May 2013, Theodoros Theodoropoulos wrote:
> (btw I cannot select 'known issue' for a ticket... One must probably
> have higher credentials for that :) )

When submitting a ticket, it is `defect'.  I used the term `known issue'
from the point of view of the module summary page:



Best regards
--
Tibor Simko


Re: Question about document migration to new server

2013-05-29 Thread Tibor Simko
On Wed, 29 May 2013, Theodoros Theodoropoulos wrote:
> Having said all that, it would be a nice addition for the 'future' if an
> admin would be able to export the bibdocs along with their stats.

Yes.  Can you please create a ticket?

Best regards
--
Tibor Simko


Re: Unexpected keyword argument errors in invenio.err for doctypeconfiguresubmissionfunctions and doctypeconfiguresubmissionpages

2013-05-29 Thread Theodoros Theodoropoulos

On 28/5/2013 7:50 PM, Tibor Simko wrote:

> On Mon, 15 Apr 2013, Theodoros Theodoropoulos wrote:
> > I'm pretty sure it will be reproducible in a stock Invenio site
> > too. Can you verify it so that I submit a ticket?
>
> Verified.  Please submit a new `known issue' for WebSubmit.  Thanks for
> reporting!

Done. Ticketized under http://invenio-software.org/ticket/1521
(btw I cannot select 'known issue' for a ticket... One must probably 
have higher credentials for that :) )


Best regards,
Theodoros


Re: Question about document migration to new server

2013-05-29 Thread Theodoros Theodoropoulos

On 29/5/2013 10:34 AM, Tibor Simko wrote:

> That's generally a very good way of thinking.  In this concrete use case,
> the trouble is that the Invenio upload API does not allow submitting a
> file with a given wanted docID.  (Unlike submitting a record with a
> given wanted recID, which is possible.)  So the proper answer to
> Theodoros's need would be to extend Invenio's FFT API support to allow
> forcing of docIDs, for example.  However, this would take time.  Hence
> the need for a low-level DB dump/load solution that we have been
> discussing in this thread.
>
> (BTW another `proper' solution could be simply not to reuse the old
> docIDs, but load the files anew via FFT, keeping only recIDs the same.
> The docIDs are never exposed to end users, so this might be acceptable.
> However, some massaging would have to be done a posteriori anyway, in
> order to port old logs and stats information.  Hence the first solution,
> more advantageous in my eyes.)

True. I also mentioned this more 'proper' alternative to move bibdocs, 
and I would be happy with it, if and only if the bibdoc STATS would be 
exported along with each docfile. Since this is currently not available, 
I would have to separately move the rnkDOWNLOADS and change the bibdoc 
ids from old to new. I decided that this is not a good idea (mainly 
because I would have to alter millions of lines, and also because I 
would be messing with statistics and it would be difficult for me to 
find out if something went wrong in the process). So I too believe that 
keeping the same bibdoc ids is the way to go in this particular case, 
and it will also make it easier for me to check things between old and 
new systems.
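
For reference, the rejected alternative (rewriting bibdoc ids in the
download statistics) would amount to something like the sketch below.  It
is an illustration only: the rows are plain dicts, the `id_bibdoc` key is
an assumed column name for rnkDOWNLOADS, and `remap_download_rows` is a
hypothetical helper, not Invenio code.

```python
def remap_download_rows(rows, docid_map):
    """Rewrite bibdoc ids in download-stat rows using an old -> new map.

    Returns (remapped_rows, unmapped_old_ids).  Rows whose old id has no
    entry in docid_map are left out and reported, so that a broken
    mapping is noticed instead of silently corrupting the statistics.
    """
    remapped = []
    unmapped = set()
    for row in rows:
        old_id = row["id_bibdoc"]  # assumed column name
        if old_id in docid_map:
            new_row = dict(row)    # copy, keep the input untouched
            new_row["id_bibdoc"] = docid_map[old_id]
            remapped.append(new_row)
        else:
            unmapped.add(old_id)
    return remapped, unmapped


# Tiny made-up example: two old docIDs, only one of them mapped.
old_rows = [
    {"id_bibdoc": 10, "file_format": ".pdf"},
    {"id_bibdoc": 11, "file_format": ".pdf"},
]
new_rows, missing = remap_download_rows(old_rows, {10: 501})
print(new_rows, missing)
```

Even with such a check, this has to touch every statistics row, which is
why keeping the original ids is simpler here.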


Having said all that, it would be a nice addition for the 'future' if an 
admin would be able to export the bibdocs along with their stats. Also, 
why not even be able to import them back to the system... I see possible 
issues with adding/replacing bibdocs (and what will happen to their 
statistics), but I'm just mentioning an issue that I personally needed 
once. In case you see a more generic Invenio community need for that, 
I'm sure you will find a way (or make the necessary changes in DB) to 
overcome that.


Best regards,
Theodoros


Re: Copyright notice per docfile

2013-05-29 Thread Theodoros Theodoropoulos

On 28/5/2013 8:02 PM, Tibor Simko wrote:

> Depending on the Invenio version, the BibDoc objects have a `more info'
> property store that basically serves as a storage for any key-value
> combination.  We use it e.g. to store technical metadata about pictures:
> width, height, position on page.  It could be used to store per-docfile
> license/copyright information.
>
> Alternatively, in the past, there was also an approach to enrich BibDoc
> independently with licensing information and then represent it in MARC
> accordingly.  We did not commit this part, but we may perhaps revive it
> with Jerome.

Licensing Information per docfile is a useful thing to have in Invenio. 
Having said that, using a separate 'field' in bibdocfile -like comment 
or description- to keep it (although one possible way to do it) will 
probably require a lot of changes in several files.

> Did you opt for some solution in the meantime?  It would be nice to
> settle on a technique and offer it out-of-the-box with Invenio demo
> submissions.  I think the former technique would be preferable.

I had to come up with a solution really quickly with the Invenio code that 
was available two months ago.
So, I decided to add the license in bibdocfile's comment field in the 
form: "License:by-sa". For that, I'm using a heavily modified version of 
the Upload_Files websubmit function.

I have also written/edited relevant format templates/output formats to 
'decode' this data and display the license images (currently only used 
for CC licenses and only if the comment field begins with "license:") and 
link to the detailed license url.

I'm attaching 3 small screenshots in case you want to take a sneak 
preview of the output result. The filenames are _really_ dummy (I'm now 
ashamed of them) and the general idea was to include the most peculiar 
example with a lot of attachments and file versions/file formats, etc.

The only known drawback I'm having with my approach is that the 'Files' 
tab does NOT show any Licensing information. I know this is 
important, BUT in order to implement it, I would have to make a lot of 
changes in several core Invenio functions (in definitions, return values 
etc) and I wanted to avoid that as much as possible, hoping that a more 
proper/complete solution would come at some point from you :)
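
The 'decode' step described above could look roughly like this in Python.
This is a sketch, not the actual format template code: the helper names
are hypothetical, the prefix match is assumed to be case-insensitive, and
the CC version (3.0) and the badge image URL pattern are assumptions that
are not stated anywhere in this thread.

```python
CC_VERSION = "3.0"  # assumption: the thread does not name a CC version
KNOWN_CC_CODES = {"by", "by-sa", "by-nd", "by-nc", "by-nc-sa", "by-nc-nd"}
LICENSE_PREFIX = "license:"


def parse_license_comment(comment):
    """Return the licence code from a comment like 'License:by-sa', else None."""
    if not comment:
        return None
    text = comment.strip()
    # Only comments that begin with the prefix are treated as licences.
    if not text.lower().startswith(LICENSE_PREFIX):
        return None
    code = text[len(LICENSE_PREFIX):].strip().lower()
    return code or None


def cc_links(code):
    """Return licence page and badge image URLs for a CC code, else None."""
    if code not in KNOWN_CC_CODES:
        return None
    return {
        "url": "https://creativecommons.org/licenses/%s/%s/" % (code, CC_VERSION),
        "image": "https://i.creativecommons.org/l/%s/%s/88x31.png" % (code, CC_VERSION),
    }


code = parse_license_comment("License:by-sa")
print(code, cc_links(code))
```

Comments that do not start with the prefix are left alone, so ordinary
docfile comments keep working as before.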


Best regards,
Theodoros

Re: Question about document migration to new server

2013-05-29 Thread Tibor Simko
On Wed, 29 May 2013, Alexander Wagner wrote:
> Though my colleagues in our project now, after some two years, have
> "some strong feelings" about my consistent nagging that they should
> /never ever even consider/ to do something beyond the highest level
> API (beware within the database, filesystem and the like) I think
> Theodoros post just shows nicely why.

That's generally a very good way of thinking.  In this concrete use case,
the trouble is that the Invenio upload API does not allow submitting a
file with a given wanted docID.  (Unlike submitting a record with a
given wanted recID, which is possible.)  So the proper answer to
Theodoros's need would be to extend Invenio's FFT API support to allow
forcing of docIDs, for example.  However, this would take time.  Hence
the need for a low-level DB dump/load solution that we have been
discussing in this thread.

(BTW another `proper' solution could be simply not to reuse the old
docIDs, but load the files anew via FFT, keeping only recIDs the same.
The docIDs are never exposed to end users, so this might be acceptable.
However, some massaging would have to be done a posteriori anyway, in
order to port old logs and stats information.  Hence the first solution,
more advantageous in my eyes.)

> IMHO, it would be great if all those Atlantis setup routines
> wouldn't use sql statements

Yes, fully agreed.  This is where we are going with the use of
SQLAlchemy objects and Fixtures in the next branch.

Best regards
--
Tibor Simko