Re: Please allow any indicator in any field

2014-03-29 Thread Ferran Jorba
Hello Alexander,

[...]
> > I know Marc21 reasonably well, and I can't recall any case where
> > having different indicators means something so different that it
> > has to be treated differently.
> 
> Here I would be more careful. Basically, I would treat Marc fields and
> indicators not as 3 digits plus two other funny chars but consider the
> whole bunch as a 5 character wide field designation. I think, here I'm
> in fact a bit more in line with Esteban's approach. At least if I
> understand it correctly. (Though I agree with you that one might not
> come up with a complete bibfield list, but just with a set of "most
> common usages".)

But «most common usages» won't cover them all, so you cannot load
arbitrary records coming from unknown sources and expect Invenio to do
the expected thing with them.  So what I'm proposing is: let «any» be
the rule, and let's cover the exceptions later.  In other words, let's
follow Postel's rule:

 Be conservative in what you do, be liberal in what you accept from
 others.
 

[...]
> >> The JSON structure that we create from Marc21 (or anything else)
> >> contains as much information from the master format as you want
> >> (even the indicators).  Meaning that if your data model is well
> >> written it is a lossless conversion and there is a one to one
> >> mapping that makes possible doing Marc21 to JSON to Marc21.
> >
> > I hope so.
> 
> +1
> 
> I share some concerns about this with Ferran and Martin and some
> others, and I'm very sure it's quite a task...

I don't think it is so difficult if the code just accepts 245%% for
title, 100%% for first author, etc.  With a 10% effort we could cover
more than 95% of the cases.
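
To make the idea concrete, here is a minimal sketch in Python.  It is
not Invenio code: the function names and the rule table are
hypothetical, and it assumes that `%` stands for exactly one arbitrary
character, so that 245%% means «tag 245 with any two indicators».

```python
def tag_matches(pattern: str, designation: str) -> bool:
    """Return True if a 5-character designation fits the pattern.

    Assumption: '%' in the pattern stands for exactly one arbitrary
    character; every other character must match literally.
    """
    if len(pattern) != 5 or len(designation) != 5:
        return False
    return all(p == "%" or p == d for p, d in zip(pattern, designation))


# Hypothetical rule table: tag plus two indicator positions -> meaning.
RULES = {
    "245%%": "title",
    "100%%": "first author",
}


def meaning_of(designation: str):
    """Look up the meaning of a field designation, or None if unknown."""
    for pattern, meaning in RULES.items():
        if tag_matches(pattern, designation):
            return meaning
    return None


print(meaning_of("24510"))  # title
print(meaning_of("245__"))  # title
print(meaning_of("1001_"))  # first author
print(meaning_of("650_0"))  # None
```

With such a table, a record carrying 24510 is treated as a title just
like one carrying 245__, and exceptions can be added later as more
specific patterns.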

Alexander, would you agree to replace the current Invenio default
behaviour with the default I'm proposing?  Knowing that it would not be
perfect, do you think that it would be better?

Best regards,

Ferran


Re: Please allow any indicator in any field

2014-03-29 Thread Ferran Jorba
Hello Rob,

> > CERN may use a subset of Marc21 where most of the indicators are __.
> > Ok, that's CERN's library decision.  But if the Invenio developers
> > claim that Invenio supports Marc21, you *must* allow other
> > indicators there, and consider it valid.
> 
> Then don't say it supports MARC21.  Simple solution.

Sorry, they do:

  Invenio complies with standards such as the Open Archives
  metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its
  underlying bibliographic format.
  

  Flexible metadata: Standard metadata format (MARC) 
  

  MARC format is the standard in the library world. It is well
  established and has been used since 1960s. [...]
  

And those reasons were the ones we used to choose Invenio over the
other alternatives quite some years ago.

> The primary goal of invenio should be to meet the needs of its
> original institution(s).

«Primary goal» should not exclude others, especially when an easy,
compatible solution exists: take any indicator as valid.  Not perfect,
but *much* better than now.

> If marc indicators are not necessary in the
> database functions of the originating institution, then feel free to
> ignore them. Avoid getting dragged back 40 years to the days of
> library catalogs by any mandates to follow every rule to the letter.
> Those rules may have made sense in 1970 but they don't always now.
> And MARC development has been under the control of library
> association committees, made up of librarians, who make decisions
> based on cataloging rules for description of items (as typed on paper
> cards) and not contemporary technology.

That's why some institutions, like CERN or FNAL, may take a subset of
Marc21.  But it turns out that there are millions, hundreds of millions
of Marc21 records out there, outside those HEP institutions, where
those indicators exist.  And if the library community has decided that
indicators are useful, please respect their decisions.  That's not
what I'm discussing here.

What I'm proposing is this: if the developers just change their
mindset, take into account that indicators may have values other than
__, and change the code to use %% or whatever wildcard character is
relevant, those existing records are going to be recognized by any
institution testing Invenio with default values, without having to
patch it as some of us have done.  I have advised more than one
institution against Invenio for those reasons.
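
As a rough illustration of the difference this makes, the following
Python sketch compares a rule set that only lists blank (__)
indicators with one that uses %% wildcards.  Everything in it is
hypothetical (sample fields, rule tables, function names); it does not
reproduce Invenio's actual configuration, and it assumes `%` matches
exactly one character.

```python
# Hypothetical sample: (tag + indicators, value) pairs as they might
# arrive from an external Marc21 source.
SAMPLE = [
    ("1001_", "Author, Some"),
    ("24510", "An example title"),
    ("245__", "Another example title"),
]

# Rule set that only knows blank indicators.
STRICT = {"100__": "first author", "245__": "title"}
# Rule set that accepts any indicator ('%' = any single character).
LIBERAL = {"100%%": "first author", "245%%": "title"}


def recognise(designation, rules):
    """Return the meaning of a designation under the rules, or None."""
    for pattern, meaning in rules.items():
        if len(pattern) == len(designation) and all(
            p == "%" or p == d for p, d in zip(pattern, designation)
        ):
            return meaning
    return None


def count_recognised(fields, rules):
    """Count how many fields the given rule set can interpret."""
    return sum(
        1 for designation, _ in fields
        if recognise(designation, rules) is not None
    )


print(count_recognised(SAMPLE, STRICT))   # 1
print(count_recognised(SAMPLE, LIBERAL))  # 3
```

In this toy sample only the 245__ field is understood by the strict
rules, while the wildcard rules understand all three fields.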

> It is important to remember that marc, was originally a U.S. Library
> of Congress file format designed for large main-frame machines in a
> day of top down programming, and magnetic tape reel storage.  Access
> was entirely sequential which explains some of the record
> architecture.  The format was intended to be used to generate paper
> file cards.

Yes.  Because of similar technological limitations, Unix has those
cryptic abbreviations.  But both Marc and Unix, curiously both born at
the end of the 1960s, have proven much better than the alternatives,
and that is the reason they are still in use.  And backwards
compatibility with the existing heritage is one of the reasons for
their current value.
 
> Modern computing should have freedom to use marc in any way that
> makes it as suitable as possible for the job and not be hindered by
> any marc or cataloging rules that are no longer really applicable.

Yes, of course.  But as compatibility is not so difficult, I argue that
it should be a goal for Invenio.  Even more as the current users have
records *with* indicators.  And let the librarians decide whether those
cataloging rules are applicable or not.  Most of the time they are
right.

Best regards,

Ferran