Re: [CODE4LIB] marc21 and usmarc
On Tue, Jan 27, 2009 at 17:04, Eric Lease Morgan emor...@nd.edu wrote: Can somebody say MARCXML or MODS complete with a schema? Well, we can say it, and I think we *have* said it for a very long time, but it doesn't seem to change anything. Damn those words. Such solutions offer at least syntactic validation if not also semantic validation. Oh well. I would say a little bit more than "oh well" (but I don't really have to; you know how I feel :), but I would love to hear what the vendors are thinking about all this. They seem to be very, very quiet about it all (without speculating as to why ...) regards, Alex -- --- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps -- http://shelter.nu/blog/
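To make the "syntactic validation" point concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not schema validation: `xml.etree.ElementTree` cannot check a document against the MARCXML XSD, so the hypothetical `check_marcxml` function only tests well-formedness, the MARCXML namespace, and a couple of rules the schema also cares about. It assumes MARC 21-style numeric three-digit tags, which is stricter than the schema itself.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by MARCXML (the MARC 21 "slim" schema).
MARC_NS = "http://www.loc.gov/MARC21/slim"


def check_marcxml(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Not full schema validation: this only checks well-formedness plus a few
    structural rules (root element, datafield tags, indicator lengths).
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    allowed_roots = {f"{{{MARC_NS}}}record", f"{{{MARC_NS}}}collection"}
    if root.tag not in allowed_roots:
        problems.append(f"unexpected root element {root.tag}")

    # iter() walks the whole tree, so this covers single records and collections.
    for field in root.iter(f"{{{MARC_NS}}}datafield"):
        tag = field.get("tag", "")
        # Assumption: numeric three-digit tags, as used in MARC 21.
        if not (len(tag) == 3 and tag.isdigit()):
            problems.append(f"bad datafield tag {tag!r}")
        for ind in ("ind1", "ind2"):
            if len(field.get(ind, "")) != 1:
                problems.append(f"datafield {tag}: {ind} must be one character")
    return problems
```

A record whose 245 has a two-digit tag and an empty second indicator reports both problems, while a truncated document fails at the parsing stage. Real validation would load the published MARCXML schema into an XSD-aware library instead.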
Re: [CODE4LIB] marc21 and usmarc (fwd)
On Tue, Jan 27, 2009 at 17:09, Ardie Bausenbach a...@loc.gov wrote: Since that time, many other national libraries have moved from their national formats to MARC 21, including (among others) the UK, Germany, Finland, and Spain. I know of a few more, but another point worth, er, screaming about is the various AACR2 / RDA / other rule changes that aren't linked to MARC at all. I know a lot of it is covered in the MARC documentation, but there are hidden gems, like punctuation, symbols, character encodings, etc., which aren't always specified. If the library world embraced XML as a minimum, a lot could be fixed in that area (and no, MARCXML does not qualify :). Regards, Alex
Re: [CODE4LIB] marc21 and usmarc
Alexander Johannesen wrote: I would say a little bit more than "oh well" (but I don't really have to; you know how I feel :), but I would love to hear what the vendors are thinking about all this. They seem to be very, very quiet about it all (without speculating as to why ...) Because their customers are not demanding it, and they often don't have the technical expertise to understand why it matters anyway. But mainly because their customers are not demanding it. Jonathan
[CODE4LIB] djatoka now in production at Biodiversity Heritage Library
[Apologies in advance for crossposting] The Biodiversity Heritage Library (BHL) has integrated djatoka, the new open source JPEG 2000 image server developed by Ryan Chute and Herbert Van de Sompel at Los Alamos National Laboratory, into its production portal at http://www.biodiversitylibrary.org. BHL is a consortium of natural history libraries that are partnering with Internet Archive to digitize public domain scientific literature of use to individual scholars and large bioinformatics projects like the Encyclopedia of Life. To date more than 27,000 volumes have been made available in open access through Internet Archive and the BHL Portal. Here's a representative page image delivered via djatoka, chosen in honor of yesterday's celebration of the Year of the Ox: http://www.biodiversitylibrary.org/page/9370105 Since last Thursday (Jan 22), djatoka has been serving the images for the (nearly) 11 million pages available through BHL. It's scaling and performing well under our normal load of 1,500 users per day, but we wanted to give it a more robust test...which we're hoping to see with this e-mail post to various listservs. We've written a blog post with more details about our implementation infrastructure, available here: http://biodiversitylibrary.blogspot.com/2009/01/now-serving-all-page-images-via-djatoka.html or here: http://tinyurl.com/bjh72x We also wanted to make this announcement in support of djatoka and JPEG 2000, given the recent survey of JPEG 2000 implementation in libraries, available at http://digitalcommons.uconn.edu/libr_pubs/16/. In short, our experience evaluating and implementing djatoka has been extremely smooth, and we were able to drop it into our existing infrastructure and UI with a minimum of customization (as described in the post). It replaces a non-scalable, proprietary JPEG 2000 image server that until now has been the biggest bottleneck in delivering our content to users.
The lack of open source options for JPEG 2000 delivery is among the most frequently cited reasons why cultural heritage organizations have not embraced the JPEG 2000 format. Hopefully our positive experience implementing djatoka and its demonstrated use within BHL can be a stimulus for other projects to evaluate the software and/or put JPEG 2000 back in consideration as a delivery format. [And of course this message is sent out with the caveat, stated above, that we have yet to test djatoka in production under the peak load that will (hopefully) occur as this message makes its way through Inboxes around the world. It's working beautifully under normal production load and our internal testing suggests that it will perform without incident, but the only true test is a live one...] Looking forward to feedback, Chris Freeland Technical Director, Biodiversity Heritage Library Director, Bioinformatics, Missouri Botanical Garden
Re: [CODE4LIB] marc21 and usmarc
So, um, could librarians everywhere start being just a tad bit more demanding about this stuff? You know, before your profession becomes obsoleted from this planet? The basic problem is that the real value of a catalog is its consistency, and the legacy data in these fields is already too inconsistent to have much value. The good news is that despite the fact that some fields are just hopeless, many things in the catalog are decent. The consistency of structure and the quality of content in author, title, and subject fields are significantly better in a library catalog than in other sources. That is why faceting works pretty well. Where things really start falling apart is in the special-purpose fields. Between some libraries deciding these aren't worth filling in, catalogers not being aware of them, etc., they are inconsistently applied. Even among the few that are entered relatively consistently (such as 043), there is usually a better source of data (e.g. 6XX |z) somewhere else in the record. If data aren't consistent enough, it's just not useful no matter how good your system is. Plus, any system that actually knew all the MARC tag and indicator twiddling would be hideously complex. Do you really want to design a system that exploits the special tag indicating whether a piece is a Festschrift, has color illustrations, or a portrait? Do you think that exploiting the full power of the 007 would not confuse the heck out of everyone, or that a special field which exists only to discuss funding associated with the content of the piece needs special treatment, and that any work units or task numbers associated with that funding need to be stored separately? The data in the fields above are hideously inconsistent, but even in a perfect world where it is all correct, I'd argue it's still not worth screwing with, especially since there are plenty of basic aspects of the patron and staff experience that need serious work.
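Kyle's consistency argument can be checked before anyone builds features on a field. The Python sketch below uses a hypothetical in-memory record model (not any real MARC library's API) to compute how often a tag such as 043 is present across a set of records, and to pull geographic subdivisions from 6XX $z instead. The sample records are invented for illustration only.

```python
# Hypothetical record model: a list of (tag, subfields) pairs, where
# subfields maps a subfield code to the list of its values.
Record = list[tuple[str, dict[str, list[str]]]]


def coverage(records: list[Record], tag: str) -> float:
    """Fraction of records that contain at least one field with this tag."""
    if not records:
        return 0.0
    with_tag = sum(1 for rec in records if any(t == tag for t, _ in rec))
    return with_tag / len(records)


def geographic_terms(record: Record) -> list[str]:
    """Collect $z (geographic subdivision) values from 6XX subject fields."""
    return [
        value
        for tag, subfields in record
        if tag.startswith("6")
        for value in subfields.get("z", [])
    ]


# Invented sample: only the first record carries a 043 geographic area code.
sample = [
    [("043", {"a": ["n-us---"]}), ("650", {"a": ["Libraries"], "z": ["United States."]})],
    [("650", {"a": ["Cataloging"], "z": ["Germany."]})],
    [("245", {"a": ["A record with no subject headings"]})],
]
```

On this sample, `coverage(sample, "043")` is one third, yet `geographic_terms(sample[1])` still recovers "Germany." from the 650, which is the kind of fallback described above.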
Actually, I was wondering what areas MODS can't handle which MARC does, and whether we could hijack and / or change MODS to fit them (what I know of it seems a bit limiting, but through XML it's certainly extensible). Shouldn't folks start by demanding at least MODS (or XOBIS if we're *really* crazy :)? Frankly, the important stuff is there, and it would be possible to modify MODS to accommodate the things that aren't. The main reason you're stuck with MARC is that there are a lot of legacy loaders out there, so even if all transmission were done in MODS, you'd still have to convert it to MARC. There are arguments to do so, but the business case is not strong. Data providers won't send MODS until libraries demand it. Libraries won't demand it until their systems use it. Systems won't use it until libraries demand it, because that's what their data providers require. It's a vicious circle, so we're stuck with MARC. The only people who aren't happy with this arrangement are those who are trying to create something new. Many librarians who think they use MARC every day have no idea that it is a binary format that is unfriendly to eyes and machines. Just because something makes sense does not mean it will happen. The QWERTY keyboard is terrible; every modern operating system can support the technically superior Dvorak layout, yet we never switch. I'm a bit more optimistic about getting the content of the catalog record into a container better than MARC, but that will take a long time. kyle
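For readers who have only seen MARC through a cataloging client, this Python sketch shows roughly what the ISO 2709 exchange format looks like underneath: a 24-character leader, a directory of 12-character entries (tag, field length, starting offset), then the field data, all separated by control characters. It is a simplified illustration built on assumptions (UTF-8 data, well-formed input, no error handling, hypothetical helper names `build_record` and `parse_record`), not a replacement for an established MARC library.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = "\x1f"


def build_record(fields: list[tuple[str, str]]) -> bytes:
    """Serialize (tag, data) pairs into an ISO 2709-style MARC record."""
    directory, data = b"", b""
    for tag, value in fields:
        body = value.encode("utf-8") + FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset (bytes).
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)  # where the field data begins
    record_length = base_address + len(data) + 1  # +1 for the record terminator
    # Leader positions 0-4 hold the record length, 12-16 the base address.
    leader = f"{record_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + data + RECORD_TERMINATOR


def parse_record(raw: bytes) -> list[tuple[str, str]]:
    """Recover (tag, data) pairs by walking the directory."""
    leader = raw[:24].decode("ascii")
    base_address = int(leader[12:17])
    directory = raw[24:base_address - 1]  # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # Each field's length includes its terminator, so strip the last byte.
        chunk = raw[base_address + start:base_address + start + length - 1]
        fields.append((tag, chunk.decode("utf-8")))
    return fields


fields = [("001", "12345"), ("245", "10" + SUBFIELD_DELIMITER + "aExample title")]
raw = build_record(fields)
print(raw)
for tag, value in parse_record(raw):
    print(tag, value.replace(SUBFIELD_DELIMITER, "$"))
```

Printing `raw` shows a run of digits and control characters that is hard to read by eye; the loop then prints `001 12345` and `245 10$aExample title`. Every offset is a byte count, which is why software has to do the bookkeeping that a human reader cannot.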
Re: [CODE4LIB] marc21 and usmarc
On Tue, Jan 27, 2009 at 18:56, Kyle Banerjee kyle.baner...@gmail.com wrote: There are arguments to do so, but the business case is not strong. Well, I'd say the future of the library world is a good business case, and I know several people (high and low) fully aware of it, but I think it's hard to take any step in either direction that would be deemed worth it. Tough one, indeed. Data providers won't send MODS until libraries demand it. Libraries won't demand it until their systems use it. Systems won't use it until libraries demand it, because that's what their data providers require. Well, I've been yelling for vendors to get more involved for a long time, but there's a lot of blankness coming from them. I guess they're happy with the current tie to MARC (binding the libraries to them forever) until the business is gone ... It's a vicious circle, so we're stuck with MARC. The only people who aren't happy with this arrangement are those who are trying to create something new. Many librarians who think they use MARC every day have no idea that it is a binary format that is unfriendly to eyes and machines. MARC may be MAchine Readable, but not MAchine Understandable or even MAchine Usable. I had an idea some time ago to create a dummy / fake MARC record with much more in it (such as extensions and special tags that systems can react to, e.g. for validation) and pass it around the infrastructure to see what in it survives (the golden rule is to ignore what you don't understand, although I know a few MARC systems that filter out what they don't understand (!!!) because, well, these systems were mostly built back when a megabyte of storage and / or memory cost about as much as a cataloger or two. Friggin' crazies!). Anyone in? :) Regards, Alex
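Alex's probe-record experiment could be rehearsed in miniature before trying it on live systems. In this minimal Python sketch every name is hypothetical: a record is a list of (tag, value) pairs, each "system" is a function returning the record it would pass on, and a 999 field stands in for the extension data (9XX tags are reserved for local use in MARC 21). The sketch reports which fields were lost somewhere along the pipeline; it models only the two behaviors described above, ignoring versus filtering unknown fields.

```python
from collections import Counter
from typing import Callable

Record = list[tuple[str, str]]
System = Callable[[Record], Record]


def tolerant_system(record: Record) -> Record:
    """Follows the golden rule: ignores what it doesn't understand but keeps it."""
    return list(record)


def make_filtering_system(known_tags: set[str]) -> System:
    """Builds a hypothetical system that silently drops fields with unknown tags."""
    def system(record: Record) -> Record:
        return [(tag, value) for tag, value in record if tag in known_tags]
    return system


def run_probe(record: Record, pipeline: list[System]) -> tuple[Record, Record]:
    """Pass the record through each system in turn; return (survivors, lost)."""
    current = record
    for system in pipeline:
        current = system(current)
    # Multiset difference, so repeated fields are counted correctly.
    lost = list((Counter(record) - Counter(current)).elements())
    return current, lost


probe = [
    ("001", "probe-0001"),
    ("245", "10$aProbe record"),
    ("999", "  $aExtension data: please preserve"),
]
pipeline = [tolerant_system, make_filtering_system({"001", "245"})]
survivors, lost = run_probe(probe, pipeline)
print("lost:", lost)
```

Here the tolerant hop keeps everything and the filtering hop drops the 999, so the script prints `lost: [('999', '  $aExtension data: please preserve')]`. A real experiment would replace the functions with actual export and import steps.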
Re: [CODE4LIB] marc21 and usmarc
On Tue, Jan 27, 2009 at 10:50:41AM -0800, Karen Coyle wrote: I am less optimistic about MODS than Kyle. Having watched it be made, I think it's more than just a bit of a kludge, and carries forward a lot of the problems of MARC21. I also don't think that it has a strong model or philosophy behind it. I think we can do much, much better. What is stopping us is what comes up here: you can create a better record, but that doesn't mean that library systems will use it. Even so, I'm up for trying to create that better record, and I'm even up for creating one that is compatible with library cataloging practices, at least in their intent. Some of us talked about this on the exhibits floor of ALA just in the last few days. I will start by re-organizing a document I did a few years ago but that was never publicly released. I'll do a new, public version and post it, then wiki it so we can have the discussion. Also, I think that the cataloger scenarios in the DC/RDA wiki are beginning to show what one can do with the FRBR assumption behind the record. This sounds like a great idea, Karen, and I'm looking forward to seeing the document and discussing it. If a record format can demonstrate a significant leap forward then it will be adopted. Keep this list in the loop about the public version. Gabriel
Re: [CODE4LIB] marc21 and usmarc
I am less optimistic about MODS than Kyle. Having watched it be made, I think it's more than just a bit of a kludge, and carries forward a lot of the problems of MARC21. I also don't think that it has a strong model or philosophy behind it. I think we can do much, much better. I agree with Karen's characterization of how MODS has developed since its inception. The good news is that this will hopefully change soon. The newly formed MODS/MADS Editorial Committee is developing a design principles document that will help guide future versions of MODS and MADS. We'd gratefully welcome feedback on what those principles should be. The MODS list [1] is probably the best place for that discussion to take place, but some of the Committee members are on this list too, so ideas brought up here won't be lost on that group. I suspect we'll have a draft document to share on the MODS list in the next month or so, but ideas for what should be on it before then are even more valuable. :-) [1] http://listserv.loc.gov/listarch/mods.html Jenn (Chair, MODS/MADS Editorial Committee) Jenn Riley Metadata Librarian Digital Library Program Indiana University - Bloomington Wells Library W501 (812) 856-5759 www.dlib.indiana.edu Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com