Re: Please allow any indicator in any field
Hello Samuele,

Dear Ferran, Alexander, I will try to explain to you how Invenio is evolving, and let's see if my understanding is correct and if this will satisfy your MARC needs (Tibor, Esteban, please correct me anytime I am wrong).

Sorry, I disagree with the expression «your MARC needs». I think the correct expression is «Marc21 compliance». Marc21 is a common agreement among a large, world-wide library community. UAB is a tiny, tiny fraction of this community, and UAB chose to change from CATMARC to Marc21 some years ago, together with all Catalan libraries, because we were following a general, world-wide movement towards a global standard; it was a strategic decision. We could perfectly well disappear and the Marc21 community wouldn't notice at all. We don't have any special needs. Marc21 allows for local fields and/or subfields, and we find some uses for them.

[...] Take a look at: https://github.com/inveniosoftware/invenio-demosite/blob/pu/invenio_demosite/recordext/fields/atlantis.cfg this is the default BibField configuration of Atlantis Demo Site. Note: this is not the default of Invenio, just Atlantis. And in Invenio core? Well no configuration is enforced there. It's up to you. You can start off Atlantis configuration and encode the whole MARC21 (or the part of MARC21 that you need), and Invenio will speak MARC21.

I still remember my first Invenio installation, back when it was called CDSware. I still remember the confusion of trying a large and complex piece of software (and back then it wasn't as large as it is now) that I didn't fully understand. I have helped some other local institutions interested in trying Invenio, and I have seen that they are as confused as I was. What you call «Atlantis Demo Site» is the starting point for those prospective new Invenio users. This is what they install and this is what they expect to work.
My opinion is that you cannot expect prospective new Invenio users to fiddle with Jinja2 templates just to make a 245 title with indicators appear as a title, be indexed as a title, and be exported as a title in whatever format. The default values (what you call «Atlantis Demo Site») should comply with Marc21 as much as possible. If they don't, (a) the barriers for newcomers adopting Invenio are unnecessarily difficult to overcome and (b) you are asking each of them to repeat the same exercise just to load a few records and see the result.

If you just correct the current, sub-standard records in your Atlantis Demo Site into more realistic ones, and make the default configuration recognize them, the goal will be accomplished. I sincerely think that it is not difficult. It is just a matter of being interested in making the Invenio community grow. Are you? If there is some magical parameter to change it to something else like Unimarc, so much the better. But it should be easy, trivial and clear. Please help newcomers, or Invenio will always be something small, exotic and marginal in the digital libraries and repositories landscape.

Best regards, Ferran
RE: Please allow any indicator in any field
as possible for the job and not be hindered by any marc or cataloging rules that are no longer really applicable.

You miss some very crucial points here. Just two examples.
- We have billions of records in this format.
- We have thousands of people /extremely/ skilled in this format, some trained for years.

I have colleagues who don't even think in terms like title or author but in 2450have to look at and 1001_. I do this myself if I want to refer to something unambiguously. We reintroduced this in our project as words were not good enough. So even non-librarians learned marc so that we were able to talk about the same entities efficiently.

Especially the latter point gets slightly underestimated if you take the pure IT point of view. Anyway, just consider that if I can do something in Marc I can easily find, say, 20 people in my (small) library (~50% of its staff!) who could handle it. Take /any/ other format and I have probably 2 or 3 left with some IT background. And I will need to invest huge amounts of time in training to get something similar. But I'll never get the same quality I get from people trained in and used to something they have done for a decade or so. You might call this unfortunate, but it is just a fact.

Thus, it could even be an economic argument. Sad to say, but cataloguers, even skilled cataloguers, are just cheaper than IT professionals. And they are fast, at a very low error rate, with the stuff they are used to. Though I'm no cataloguer, I am myself much faster in text marc than in MarcXML, because it's not that chatty. Also my error rate is lower. I'm much faster even to jot down the record in text marc from scratch than with any fancy form-based interface. Depending on what you have to do, this is a crucial factor. If I need to correct an error in, say, 1000 records, it is usually more cost effective to just give them to our cataloguers than to write a program that finds all exceptions and handles them well.
(For 1000 records I usually don't consider programming, if it is a one-time issue.) This however requires that Marc is Marc as it should be and not only some subset of it.

In sum: libraries are built for a slightly longer time frame than current technology. Thus we change more slowly, but we also work for a much longer time. Most likely, even IT development will get much slower in, say, 450 years from now, when it reaches the age of our institutions. Probably, we could then use contemporary technology ;)

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

From: Rob Atkinson [atkin...@fnal.gov]
Sent: Friday, March 28, 2014 17:08
To: Ferran Jorba; Esteban Gabancho
Cc: Wagner, Alexander; project-invenio-devel (Invenio developers mailing-list)
Subject: Re: Please allow any indicator in any field

On 03/28/2014 04:39 AM, Ferran Jorba wrote: the problem is more general in Invenio. You can take Marc21 and use a subset of it. Or just use a subset of the tags. Or interpret more strictly, more loosely, use more local tags or subfields. Marc21 is like a legal code (or a programming language, if you feel more comfortable with that), where there is flexibility. CERN may use a subset of Marc21 where most of the indicators are __. Ok, that's CERN's library decision. But if the Invenio developers claim that Invenio supports Marc21, you *must* allow other indicators there, and consider them valid.

Then don't say it supports MARC21. Simple solution. The primary goal of invenio should be to meet the needs of its original institution(s). If marc indicators are not necessary in the database functions of the originating institution, then feel free to ignore them. Avoid getting dragged back 40 years to the days of library catalogs by any mandates to follow every rule to the letter. Those rules may have made sense in 1970 but they don't always now.
And MARC development has been under the control of library association committees, made up of librarians, who make decisions based on cataloging rules for description of items (as typed on paper cards) and not contemporary technology. It is important to remember that marc was originally a U.S. Library of Congress file format designed for large main-frame machines in a day of top-down programming and magnetic tape reel storage. Access was entirely sequential, which explains some of the record architecture. The format was intended to be used to generate paper file cards. Modern computing should have freedom to use marc in any way that makes it as suitable as possible for the job and not be hindered by any marc or cataloging rules that are no longer really applicable.

Rob
RE: Please allow any indicator in any field
Hello Ferran!

[...] I know Marc21 reasonably well, and I don't remember now any case where having different indicators means something so different that it has to be treated differently.

Here I would be more careful. Basically, I would treat Marc fields and indicators not as 3 digits plus two other funny chars but consider the whole bunch as a 5 character wide field designation. I think, here I'm in fact a bit more in line with Esteban's approach. At least if I understand it correctly. (Though I agree with you that one might not come up with a complete bibfield list, but just with a set of most common usages.)

But «most common usages» won't cover them all, and so, you cannot load arbitrary records coming from unknown sources and expect Invenio to do the expected thing with them.

I'm not sure, but I think it's basically a misunderstanding and we generally agree. As I said, for indexing/display I perfectly agree with you. In the definition of the fields as such, telling invenio what in the input e.g. an author should be, one could, and probably should, be more explicit.

Be conservative in what you do, be liberal in what you accept from others.

Perfectly agree. I'd add the famous Einstein here. As simple as possible, but not simpler. ;)

I share some concerns about this with Ferran and Martin and some others, and I'm very sure it's quite a task...

I don't think it is so difficult if the code just accepts 245%% for title, 100%% for first author, etc. With a 10% effort we could cover more than 95% of the cases. Alexander, would you accept exchanging the current Invenio default behaviour for the default I'm proposing? Knowing that it would not be perfect, do you think that it would be better?

I think in general, yes. As said above, I feel we perfectly agree about how Invenio's default indexing and even display should be set up, and there %% is in all cases I see better than __.
If one defines a field, let's stick to author, I would however suggest that the definition says:
- Author should be 1001_ and stored as lastname, firstname
- alternatively 1002_ and sorted firstname lastname (note: deprecated)
- as fallback 100%% is treated as author in the index in case we have foreign data (note: very deprecated)

Your 245 example is quite telling, and people without too much library background might miss the point here a bit, simply because title as such is quite a simple field, in the sense that it is only a string from the IT point of view. The point missed here, and it is really an important point, is that if you get foreign data you /never/ get 245__ in your Marc, you'll /always/ get indicators. So stock invenio, if I pull in 10.000 records from our latest ebook package e.g., will have no titles. I consider this indeed a bug, not an inconvenience.

Even our modernists who consider Marc ancient should agree that data exchange is quite important. And no, I do not know /any/ format that can transport the richness of Marc in a standardized, accepted manner, nor do I know any format that is used for such a host of data as Marc21. (If you consider it ancient, please note that currently all German library catalogues are being migrated to use it instead of our own invention MAB.) Hey, journal literature from the sciences is really quite trivial. Book literature from the humanities is quite a different story. If you don't believe it, have some fun with http://gso.gbv.de/DB=2.1/PPNSET?PPN=741186039 and its friends.

Additionally, those indicators carry a strong meaning. I agree that the first indicator might be considered superfluous in databases (it is probably /not/ if you consider that you have to create a bibliography, and this is a /very/ common request). Disregarding the second indicator is another story. If you have a large bibliography it isn't too sensible to use a blind string sorting.
You'll get so many entries under T that you can't find your stuff anymore. Yes, I know, these are offline bibliographies. Yes, I hate to print a database. I feel it is completely senseless. Yes, I have been working for several years now on converting all our peers to accepting a URL instead of 400 pages of paper. BUT, IRL they often don't like trees and insist on silly printed lists. Yes, ignoring «The» in sorting would be simple, but just adding German gives us 3 definite articles and two indefinite articles; of course they have different lengths, and if you add other languages, well.

-- Kind regards, Alexander Wagner
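The sorting point above can be sketched in a few lines. In MARC 21, the second indicator of field 245 gives the number of nonfiling characters (0-9) to skip when filing a title, so no per-language article list is needed. The function name and the sample titles below are made up for illustration; this is not how Invenio itself sorts.

```python
def filing_title(title: str, second_indicator: str) -> str:
    """Return the part of the title that should be used for sorting."""
    # MARC 21 field 245, second indicator: number of nonfiling characters.
    # Anything that is not a single digit (blank, '_', ...) counts as 0 here.
    is_digit = len(second_indicator) == 1 and second_indicator in "0123456789"
    skip = int(second_indicator) if is_digit else 0
    return title[skip:].casefold()


# Illustrative (title, second indicator) pairs, not real catalogue records.
records = [
    ("The Hobbit", "4"),
    ("Der Zauberberg", "4"),
    ("Eine kurze Geschichte der Zeit", "5"),
    ("Annalen der Physik", "0"),
]

for title, ind2 in sorted(records, key=lambda r: filing_title(*r)):
    print(title)
```

With a blind string sort, three of these four titles would file under their articles; with the indicator they file under H, K and Z.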
Re: Please allow any indicator in any field
Hello Rob,

CERN may use a subset of Marc21 where most of the indicators are __. Ok, that's CERN's library decision. But if the Invenio developers claim that Invenio supports Marc21, you *must* allow other indicators there, and consider them valid.

Then don't say it supports MARC21. Simple solution.

Sorry, they do:

Invenio complies with standards such as the Open Archives metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic format. http://invenio-software.org/

Flexible metadata: Standard metadata format (MARC) http://invenio-software.org/wiki/General/Features

MARC format is the standard in the library world. It is well established and has been used since 1960s. [...] http://invenio-demo.cern.ch/help/admin/howto-marc

And those reasons were the ones we used to choose Invenio over the other alternatives quite some years ago.

The primary goal of invenio should be to meet the needs of its original institution(s).

«Primary goal» should not exclude others, especially when an easy, compatible solution exists: take any indicator as valid. Not perfect, but *much* better than now.

If marc indicators are not necessary in the database functions of the originating institution, then feel free to ignore them. Avoid getting dragged back 40 years to the days of library catalogs by any mandates to follow every rule to the letter. Those rules may have made sense in 1970 but they don't always now. And MARC development has been under the control of library association committees, made up of librarians, who make decisions based on cataloging rules for description of items (as typed on paper cards) and not contemporary technology.

That's why some institutions, like CERN or FNAL, may take a subset of Marc21. But it turns out that there are millions, hundreds of millions of Marc21 records out there, outside those HEP institutions, where those indicators exist. And if the library community has decided that indicators are useful, please respect their decisions.
That's not what I'm discussing here. What I'm proposing is that, just by the developers changing their minds, taking into account that indicators may have values other than __, and changing the code to use %% or whatever wildcard character is relevant, those existing records are going to be recognized by any institution testing Invenio with default values, without having to patch it as some of us have done. I have advised more than one institution against Invenio for those reasons.

It is important to remember that marc, was originally a U.S. Library of Congress file format designed for large main-frame machines in a day of top down programming, and magnetic tape reel storage. Access was entirely sequential which explains some of the record architecture. the format was intended to be used to generate paper file cards.

Yes. Because of similar technological limitations, Unix has those cryptic abbreviations. But both Marc and Unix, curiously both born at the end of the 1960s, have proven much better than the alternatives, and that is the reason they are still in use. And backwards compatibility with the existing heritage is one of the reasons for their current value.

Modern computing should have freedom to use marc in any way that makes it as suitable as possible for the job and not be hindered by any marc or cataloging rules that are no longer really applicable.

Yes, of course. But as compatibility is not so difficult, I argue that it should be a goal for Invenio. Even more so as the current users have records *with* indicators. And let the librarians decide whether those cataloging rules are applicable or not. Most of the time they are right.

Best regards, Ferran
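To make the wildcard proposal concrete, here is a minimal sketch of matching a five-character designation (tag plus two indicators) against patterns such as 245%%, with a more specific author rule tried before the catch-all one, as suggested earlier in the thread. The rule table, the function names and the returned labels are hypothetical and do not reflect Invenio's actual BibField configuration or API.

```python
from typing import Optional, Tuple

# Hypothetical rule table: (pattern, logical field, note), most specific first.
# '%' matches any single character; '_' stands for a blank indicator.
RULES = [
    ("1001_", "author", "preferred"),
    ("100%%", "author", "fallback"),
    ("245%%", "title", "any indicators"),
]


def matches(pattern: str, designator: str) -> bool:
    """True if every pattern character is '%' or equals the designator's."""
    return len(pattern) == len(designator) and all(
        p == "%" or p == d for p, d in zip(pattern, designator)
    )


def logical_field(tag: str, ind1: str, ind2: str) -> Optional[Tuple[str, str]]:
    """Map a MARC tag and its indicators to (logical field, note), if any."""
    # Normalise blank indicators to '_' so that '245  ' and '245__' agree.
    designator = tag + (ind1.strip() or "_") + (ind2.strip() or "_")
    for pattern, name, note in RULES:
        if matches(pattern, designator):
            return name, note
    return None


print(logical_field("245", "1", "0"))   # title found despite the indicators
print(matches("245__", "24510"))        # the strict pattern misses the record
```

The second call shows the behaviour being criticised: a rule written as 245__ does not recognise a title that arrives as 24510.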
Re: Please allow any indicator in any field
Hello Alexander,

[...] I know Marc21 reasonably well, and I don't remember now any case where having different indicators means something so different that it has to be treated differently.

Here I would be more careful. Basically, I would treat Marc fields and indicators not as 3 digits plus two other funny chars but consider the whole bunch as a 5 character wide field designation. I think, here I'm in fact a bit more in line with Esteban's approach. At least if I understand it correctly. (Though I agree with you that one might not come up with a complete bibfield list, but just with a set of most common usages.)

But «most common usages» won't cover them all, and so, you cannot load arbitrary records coming from unknown sources and expect Invenio to do the expected thing with them. So, what I'm proposing is: let «any» be the rule, and let's cover the exceptions later. What I'm proposing is to follow Postel's rule: Be conservative in what you do, be liberal in what you accept from others. http://en.wikipedia.org/wiki/Robustness_principle

[...] The JSON structure that we create from Marc21 (or anything else) contains as much information from the master format as you want (even the indicators). Meaning that if your data model is well written it is a lossless conversion and there is a one to one mapping that makes possible doing Marc21 to JSON to Marc21.

I hope so. +1

I share some concerns about this with Ferran and Martin and some others, and I'm very sure it's quite a task...

I don't think it is so difficult if the code just accepts 245%% for title, 100%% for first author, etc. With a 10% effort we could cover more than 95% of the cases. Alexander, would you accept exchanging the current Invenio default behaviour for the default I'm proposing? Knowing that it would not be perfect, do you think that it would be better?

Best regards, Ferran
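As an illustration of the lossless Marc21 to JSON to Marc21 round trip mentioned above, the sketch below keeps the indicators and the subfield order in the JSON form, so the field can be rebuilt exactly. The dictionary layout and function names are assumptions made for this example only; they are not Invenio's real JSON data model.

```python
import json


def field_to_json(tag, ind1, ind2, subfields):
    """Hypothetical JSON shape: indicators kept, subfields as ordered pairs."""
    return {
        "tag": tag,
        "ind1": ind1,
        "ind2": ind2,
        # A list (not a dict) preserves order and repeated subfield codes.
        "subfields": [[code, value] for code, value in subfields],
    }


def json_to_field(obj):
    """Inverse of field_to_json."""
    return (
        obj["tag"],
        obj["ind1"],
        obj["ind2"],
        [(code, value) for code, value in obj["subfields"]],
    )


# Illustrative 245 field with non-blank indicators.
original = ("245", "1", "4", [("a", "The hobbit /"), ("c", "J.R.R. Tolkien.")])

serialized = json.dumps(field_to_json(*original))
restored = json_to_field(json.loads(serialized))

print(restored == original)
```

The round trip is only lossless because nothing is thrown away on the way in; a model that stored the title string alone could not regenerate the indicators.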