Re: Please allow any indicator in any field

2014-03-31 Thread Ferran Jorba
Hello Samuele,
 
 Dear Ferran, Alexander,

 I will try to explain you how Invenio is evolving and let's see if my
 understanding is correct, and if this will satisfy your MARC needs
 (Tibor, Esteban, please correct me anytime I am wrong).

Sorry, I disagree with the expression «your MARC needs».  I think that
the correct expression would be «Marc21 compliance».
Marc21 is a common agreement among a large, world-wide library
community.  UAB is a tiny, tiny fraction of this community, and UAB
chose to change from CATMARC to Marc21 some years ago, together with all
Catalan libraries, because we were following a general, world-wide
movement towards a global standard; it was a strategic decision.  We
could perfectly well disappear and the Marc21 community wouldn't notice
at all.  We don't have any special needs.  Marc21 allows for local
fields and/or subfields, and we find some uses for them.

[...]
 Take a look at:
 https://github.com/inveniosoftware/invenio-demosite/blob/pu/invenio_demosite/recordext/fields/atlantis.cfg

 this is the default BibField configuration of Atlantis Demo
 Site. Note: this is not the default of Invenio, just Atlantis. And in
 Invenio core? Well no configuration is enforced there. It's up to
 you. You can start off Atlantis configuration and encode the whole
 MARC21 (or the part of MARC21 that you need), and Invenio will speak
 MARC21.

I still remember my first Invenio installation, back when it was called
CDSware.  I still remember the confusion of trying a large and complex
piece of software (and then, it wasn't as large as it is now) that I
didn't fully understand.  I have helped some other local institutions
interested in trying Invenio, and I have seen that they are as confused
as I was.

What you call «Atlantis Demo Site» is the starting point for those
prospective new Invenio users.  This is what they install and this is
what they expect to work.  My opinion is that you cannot expect
prospective new Invenio users to fiddle with Jinja2 templates just to
make a 245 title with indicators appear as title, be indexed as title
and be exported in whatever format as title.  The default values (what
you call «Atlantis Demo Site») should comply with Marc21 as much as
possible.  If they don't, (a) the barriers for newcomers to adopt
Invenio are unnecessarily difficult to overcome and (b) you are asking
each of them to repeat the same exercise just to load a few records and
see the result.

If you just replace the current, sub-standard records in your Atlantis
Demo Site with more realistic ones, and make the default configuration
recognize them, the goal will be accomplished.  I sincerely think that
it is not difficult.  It only requires that you are interested in making
the Invenio community grow.

Are you?

If there is some magical parameter to change it to something else like
Unimarc, all the better.  But it should be easy, trivial and clear.  Please
help newcomers, or Invenio will always be something small, exotic and
marginal in the digital libraries and repositories landscape.

Best regards,

Ferran


RE: Please allow any indicator in any field

2014-03-30 Thread Wagner, Alexander
[...]
 as possible for the job and not be hindered by
 any marc or cataloging rules that are no longer really applicable.

You miss some very crucial points here. Just two examples.

- We have billions of records in this format.
- We have thousands of people /extremely/ skilled in this format, some
  trained for years. I have colleagues who don't even think in terms
  like title or author but in 2450have to look at and 1001_.
  I do this myself if I want to refer to something unambiguously. We
  reintroduced this in our project as words were not good enough.
  So even non-librarians learned Marc, so that we were able to talk about
  the same entities efficiently.

Especially the latter point gets slightly underestimated if you take
the pure IT point of view. Anyway, just consider that if I can do
something in Marc I can easily find say 20 people in my (small)
library (~50% of its staff!) who could handle it. Take /any/ other
format and I have probably 2 or 3 left with some IT background. And I
will need to invest huge amounts of time for training to get
something similar.  But, I'll never get the same quality I get from
people trained and used to something they do for a decade or so. You
might call this unfortunate, but it is just a fact.

Thus, it could even be an economic argument. Sad to say, but
cataloguers, even skilled cataloguers, are just cheaper than IT
professionals. And they are fast at a very low error rate with the
stuff they are used to. Though I'm no cataloguer, I am myself much
faster in text marc than in MarcXML, because it's not that chatty. Also
my error rate is lower.  I'm much faster to even jot down the record
in text marc from scratch than with any fancy form-based interface.
Depending on what you have to do this is a crucial factor. If I need
to correct an error in say 1000 records it is usually more cost
effective to just give them to our cataloguers than to write a program
that finds all exceptions and handles them well. (For 1000 records
usually I don't consider programming, if it is a one time issue.)

This however requires that Marc is Marc as it should be and not only
some subset of it.

In sum: libraries are built for a slightly longer time frame than
current technology. Thus we change slower, but we also work for a
much longer time. Most likely, even IT development will get much
slower, in, say 450 years from now, when it reaches the age of our
institutions. Probably, we could then use contemporary technology ;)

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


From: Rob Atkinson [atkin...@fnal.gov]
Sent: Friday, March 28, 2014 17:08
To: Ferran Jorba; Esteban Gabancho
Cc: Wagner, Alexander; project-invenio-devel (Invenio developers mailing-list)
Subject: Re: Please allow any indicator in any field

On 03/28/2014 04:39 AM, Ferran Jorba wrote:

 the problem is more general in Invenio.  You can take Marc21 and use a
 subset of it.  Or just use a subset of the tags.  Or interpret more
 strictly, more loosely, use more local tags or subfields.  Marc21 is
 like a legal code (or a programming language, if you feel more
 comfortable with), where there is flexibility.

 CERN may use a subset of Marc21 where most of the indicators are __.
 Ok, that's CERN's library decision.  But if the Invenio developers claim
 that Invenio supports Marc21, you *must* allow other indicators there,
 and consider it valid.

Then don't say it supports MARC21.  Simple solution.
The primary goal of invenio should be to meet the needs of its original
institution(s).  If marc indicators are not necessary in the database
functions of the originating institution, then feel free to ignore them.
  Avoid getting dragged back 40 years to the days of library catalogs by
any mandates to follow every rule to the letter.  Those rules may have
made sense in 1970 but they don't always now.  And MARC development has
been under the control of library association committees, made up of
librarians, who make decisions based on cataloging rules for description
of items (as typed on paper cards) and not contemporary technology.

It is important to remember that marc was originally a U.S. Library of
Congress file format designed for large main-frame machines in a day of
top down programming, and magnetic tape reel storage.  Access was
entirely sequential which explains some of the record architecture.  The
format was intended to be used to generate paper file cards.

Modern computing should have freedom to use marc in any way that makes
it as suitable as possible for the job and not be hindered by any marc
or cataloging rules that are no longer really applicable.

Rob

RE: Please allow any indicator in any field

2014-03-30 Thread Wagner, Alexander
Hello Ferran!

[...]
  I know Marc21 reasonably well, and I don't remember now any case
  where having different indicators means something so different that
  it has to be treated differently.

 Here I would be more careful. Basically, I would treat Marc fields and
 indicators not as 3 digits plus two other funny chars but consider the
 whole bunch as a 5 character wide field designation. I think, here I'm
 in fact a bit more in line with Esteban's approach. At least if I
 understand it correctly. (Though I agree with you that one might not
 come up with a complete bibfield list, but just with a set of most
 common usages.)

But «most common usages» won't cover them all, and so, you cannot load
arbitrary records coming from unknown sources and expect Invenio to do
the expected thing with them.

I'm not sure, but I think it's basically a misunderstanding and we
generally agree. As I said, for indexing/display I perfectly agree with you.
In the definition of the fields as such, i.e. in telling Invenio what,
e.g., an author should look like on input, one could, and probably
should, be more explicit.

 Be conservative in what you do, be liberal in what you accept from
 others.

Perfectly agree.

I'd add the famous Einstein here: as simple as possible, but not
simpler. ;)

 I share some concerns about this with Ferran and Martin and some
 others, and I'm very sure it's quite a task...

I don't think it is so difficult if the code just accepts 245%% for
title, 100%% for first author, etc.  With a 10% effort we could cover
more than 95% of the cases.

Alexander, would you accept to exchange the current Invenio default
behaviour with the default I'm proposing?  Knowing that it would not be
perfect, do you think that it would be better?

I think in general, yes. As said above, I feel we perfectly agree about
how Invenio's default indexing and even display should be set up, and
there %% is, in all cases I see, better than __.

If one defines a field, let's stick to author, I would however suggest
that the definition says:

- Author should be 1001_ and stored as lastname, firstname
- alternatively 1002_ and sorted firstname lastname (note: deprecated)
- as fallback 100%% is treated as author in the index in case we have
  foreign data (note: very deprecated)
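
A minimal Python sketch of such an ordered definition follows. It is
only an illustration: the rule table and the helper names (AUTHOR_RULES,
pattern_to_regex, classify_author_field) are hypothetical and are not
part of Invenio or BibField. As elsewhere in this thread, `_` denotes a
blank indicator and `%` any single indicator value.

```python
import re

# Hypothetical rule table, tried in order of preference.
# "_" is a blank indicator, "%" accepts any single indicator value.
AUTHOR_RULES = [
    ("1001_", "preferred"),
    ("1002_", "deprecated"),
    ("100%%", "fallback"),
]


def pattern_to_regex(pattern):
    """Turn a 5-character tag+indicator pattern into a compiled regex."""
    parts = ["." if ch == "%" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def classify_author_field(designation):
    """Return the label of the first rule matching e.g. '1001_', else None."""
    for pattern, label in AUTHOR_RULES:
        if pattern_to_regex(pattern).fullmatch(designation):
            return label
    return None


print(classify_author_field("1001_"))  # matches the first, explicit rule
print(classify_author_field("1000_"))  # only the 100%% fallback matches
```

Because the rules are tried in order, the explicit definitions win and
the wildcard is only reached for foreign data.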

Your 245 example is quite telling, and people without much
library background might miss the point here a bit, simply because title
is as such quite a simple field, in the sense that it is only a string
from the IT point of view.

The point missed here, and it is really an important point, is that if
you get foreign data you /never/ get 245__ in your Marc, you'll
/always/ get indicators. So if I pull, e.g., 10.000 records from our
latest ebook package into stock Invenio, they will have no titles. I
consider this indeed a bug, not an inconvenience.

Even our modernists who consider Marc ancient should agree that data
exchange is quite important. And no, I do not know /any/ format that
can transport the richness of Marc in a standardized, accepted manner,
nor do I know any format that is used for such a host of data as
Marc21. (If you consider it ancient, please note that currently all German
library catalogues are being migrated to use it instead of our own
invention, MAB.) Hey, journal literature from the sciences is really quite
trivial. Book literature from the humanities is quite a different
story. If you don't believe it, have some fun with
http://gso.gbv.de/DB=2.1/PPNSET?PPN=741186039 and its friends.

Additionally, those indicators contain a strong meaning. I agree that
the first indicator might be considered superfluous in databases (it is
probably /not/ if you consider that you have to create a bibliography,
and this is a /very/ common request).

Disregarding the second indicator is another story. If you have a
large bibliography it isn't too sensible to use a blind string
sort. You'll get so many entries under T that you won't find your
stuff anymore. Yes, I know, these are offline bibliographies. Yes, I
hate to print a database. I feel it is completely senseless. Yes, I
have been working for several years now on converting all our peers to
accept a URL instead of 400 pages of paper.

BUT, IRL they often don't like trees and insist on silly printed
lists. Yes, ignoring "The" in sorting would be simple, but just to
add German: we have 3 definite articles and two indefinite articles, of
course they have different lengths, and if you add other languages,
well.
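
To illustrate why the second indicator matters for sorting: in Marc21
the second indicator of field 245 gives the number of nonfiling
characters to skip at the start of the title. A small Python sketch,
assuming a hypothetical helper filing_title and made-up sample titles
(nothing here is Invenio code), could look like this:

```python
DIGITS = set("0123456789")


def filing_title(title, second_indicator):
    """Drop the leading nonfiling characters counted by the 245 second indicator."""
    skip = int(second_indicator) if second_indicator in DIGITS else 0
    return title[skip:]


# (second indicator, title) pairs; sample data invented for illustration.
records = [
    ("4", "The Trial"),
    ("4", "Der Prozess"),
    ("0", "Amerika"),
    ("4", "Das Schloss"),
    ("5", "Eine kleine Nachtmusik"),
]

# Blind string sorting files everything under its article.
blind = sorted(title for _, title in records)

# Indicator-aware sorting files each title under its first significant word,
# whatever the language or length of the article.
by_filing = [
    title
    for ind2, title in sorted(
        records, key=lambda rec: filing_title(rec[1], rec[0]).casefold()
    )
]

print(blind)
print(by_filing)
```

The cataloguer has already recorded how many characters to skip, so the
code needs no per-language list of articles.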
--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi





Re: Please allow any indicator in any field

2014-03-29 Thread Ferran Jorba
Hello Rob,

  CERN may use a subset of Marc21 where most of the indicators are __.
  Ok, that's CERN's library decision.  But if the Invenio developers
  claim that Invenio supports Marc21, you *must* allow other
  indicators there, and consider it valid.
 
 Then don't say it supports MARC21.  Simple solution.

Sorry, they do:

  Invenio complies with standards such as the Open Archives
  metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its
  underlying bibliographic format.
  http://invenio-software.org/

  Flexible metadata: Standard metadata format (MARC) 
  http://invenio-software.org/wiki/General/Features

  MARC format is the standard in the library world. It is well
  established and has been used since 1960s. [...]
  http://invenio-demo.cern.ch/help/admin/howto-marc

And those reasons were the ones we used to choose Invenio over the
other alternatives quite some years ago.

 The primary goal of invenio should be to meet the needs of its
 original institution(s).

«Primary goal» should not exclude others, especially when an easy,
compatible solution exists: take any indicator as valid.  Not perfect,
but *much* better than now.

 If marc indicators are not necessary in the
 database functions of the originating institution, then feel free to
 ignore them. Avoid getting dragged back 40 years to the days of
 library catalogs by any mandates to follow every rule to the letter.
 Those rules may have made sense in 1970 but they don't always now.
 And MARC development has been under the control of library
 association committees, made up of librarians, who make decisions
 based on cataloging rules for description of items (as typed on paper
 cards) and not contemporary technology.

That's why some institutions, like CERN or FNAL, may take a subset of
Marc21.  But it turns out that there are millions, hundreds of millions
of Marc21 records out there, outside those HEP institutions, where
those indicators exist.  And if the library community has decided that
indicators are useful, please respect their decisions.  That's not
what I'm discussing here.

What I'm proposing is that, just by changing the developers' mindset to
take into account that indicators may have values other than __,
and by changing the code to use %% or whatever wildcard character is
relevant, those existing records will be recognized by any
institution testing Invenio with default values, without having to
patch it as some of us have done.  I have advised more than one
institution against Invenio for those reasons.
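
A rough Python sketch of the difference between the two defaults
follows. It assumes a simplified, hypothetical record structure (a list
of dicts) and a hypothetical helper get_subfields; it does not reproduce
Invenio's real API or storage model.

```python
def get_subfields(record, tag, ind1="%", ind2="%", code="a"):
    """Collect subfield values for a tag; "%" accepts any indicator value."""
    values = []
    for field in record:
        if field["tag"] != tag:
            continue
        if ind1 != "%" and field["ind1"] != ind1:
            continue
        if ind2 != "%" and field["ind2"] != ind2:
            continue
        values.extend(v for c, v in field["subfields"] if c == code)
    return values


# A record as it might arrive from an external source, with indicators set.
harvested = [
    {"tag": "100", "ind1": "1", "ind2": "_", "subfields": [("a", "Kafka, Franz")]},
    {"tag": "245", "ind1": "1", "ind2": "4", "subfields": [("a", "Der Prozess")]},
]

# Default that only knows 245__: the title is silently lost.
strict = get_subfields(harvested, "245", ind1="_", ind2="_")

# Default that accepts 245%%: the title is found whatever the indicators.
liberal = get_subfields(harvested, "245")

print(strict)
print(liberal)
```

Making the wildcard the default keeps strict matching available for
sites that want it, since a caller can still pass explicit indicators.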

 It is important to remember that marc was originally a U.S. Library
 of Congress file format designed for large main-frame machines in a
 day of top down programming, and magnetic tape reel storage.  Access
 was entirely sequential which explains some of the record
 architecture.  The format was intended to be used to generate paper
 file cards.

Yes.  For similar technological limitations Unix has those cryptic
abbreviations.  But both Marc and Unix, curiously both born at the
end of the 1960s, have proven much better than the alternatives, and
that is the reason they are still in use. And backwards compatibility
with the existing heritage is one of the reasons for their current value.
 
 Modern computing should have freedom to use marc in any way that
 makes it as suitable as possible for the job and not be hindered by
 any marc or cataloging rules that are no longer really applicable.

Yes, of course.  But as compatibility is not so difficult, I argue that
it should be a goal for Invenio.  Even more as the current users have
records *with* indicators.  And let the librarians decide whether those
cataloging rules are applicable or not.  Most of the times they are
right.

Best regards,

Ferran


Re: Please allow any indicator in any field

2014-03-29 Thread Ferran Jorba
Hello Alexander,

[...]
  I know Marc21 reasonably well, and I don't remember now any case
  where having different indicators means something so different that
  it has to be treated differently.
 
 Here I would be more careful. Basically, I would treat Marc fields and
 indicators not as 3 digits plus two other funny chars but consider the
 whole bunch as a 5 character wide field designation. I think, here I'm
 in fact a bit more in line with Esteban's approach. At least if I
 understand it correctly. (Though I agree with you that one might not
 come up with a complete bibfield list, but just with a set of most
 common usages.)

But «most common usages» won't cover them all, and so, you cannot load
arbitrary records coming from unknown sources and expect Invenio to do
the expected thing with them.  So, what I'm proposing is: let «any» be
the rule, and let's cover the exceptions later.  What I'm proposing is
to follow Postel's rule:

 Be conservative in what you do, be liberal in what you accept from
 others.
 http://en.wikipedia.org/wiki/Robustness_principle

[...]
  The JSON structure that we create from Marc21 (or anything else)
  contains as much information from the master format as you want
  (even the indicators).  Meaning that if your data model is well
  written it is a lossless conversion and there is a one to one
  mapping that makes possible doing Marc21 to JSON to Marc21.
 
  I hope that.
 
 +1
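
As a rough illustration of the lossless round trip described in the
quoted paragraph above, here is a short Python sketch. The tuple layout
and the function names marc_to_json and json_to_marc are assumptions
made for this example only; the real Invenio JSON data model is defined
by its configuration and may look quite different.

```python
import json


def marc_to_json(fields):
    """Serialize (tag, ind1, ind2, subfields) tuples without dropping anything."""
    return json.dumps([
        {
            "tag": tag,
            "ind1": ind1,
            "ind2": ind2,
            "subfields": [[code, value] for code, value in subfields],
        }
        for tag, ind1, ind2, subfields in fields
    ])


def json_to_marc(text):
    """Rebuild the original tuple structure from the JSON text."""
    return [
        (f["tag"], f["ind1"], f["ind2"],
         [(code, value) for code, value in f["subfields"]])
        for f in json.loads(text)
    ]


record = [
    ("100", "1", "_", [("a", "Kafka, Franz")]),
    ("245", "1", "4", [("a", "Der Prozess"), ("c", "Franz Kafka")]),
]

# The conversion is lossless only because indicators, field order and
# subfield order are all kept in the intermediate structure.
round_trip_ok = json_to_marc(marc_to_json(record)) == record
print(round_trip_ok)
```

Lists are used instead of a dict keyed by tag so that repeated fields
and subfields keep their order.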
 
 I share some concerns about this with Ferran and Martin and some
 others, and I'm very sure it's quite a task...

I don't think it is so difficult if the code just accepts 245%% for
title, 100%% for first author, etc.  With a 10% effort we could cover
more than 95% of the cases.

Alexander, would you accept to exchange the current Invenio default
behaviour with the default I'm proposing?  Knowing that it would not be
perfect, do you think that it would be better?

Best regards,

Ferran