Re: [CODE4LIB] marc21 and usmarc

2009-01-27 Thread Alexander Johannesen
On Tue, Jan 27, 2009 at 17:04, Eric Lease Morgan emor...@nd.edu wrote:
 Can somebody say MARCXML or MODS complete with a schema?

Well, we can say it, and I think we *have* said it for a very long
time, but it doesn't seem to change anything. Damn those words.

 Such solutions offer at least syntactic validation if not also
 semantic validation. Oh well.

I would say a little bit more than oh well (but I don't really have to;
you know how I feel :), but I would love to hear what the vendors are
thinking about all this. They seem to be very, very quiet about it
(without speculating as to why ...)
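
As a rough illustration of the syntactic validation mentioned above, here
is a minimal sketch that checks a MARCXML record for well-formedness and a
few structural rules. It uses only the Python standard library, so it is
not real XSD validation against the MARCXML schema. The sample record and
the check_record helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC 21 slim") documents.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">demo-0001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def check_record(xml_text):
    """Return a list of problems; an empty list means the checks passed.

    This is a crude structural check, not schema validation.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    leader = root.find("marc:leader", NS)
    if leader is None or len(leader.text or "") != 24:
        problems.append("leader missing or not 24 characters")
    for field in root.findall("marc:datafield", NS):
        tag = field.get("tag", "")
        if len(tag) != 3:
            problems.append(f"datafield tag {tag!r} is not 3 characters")
        if field.find("marc:subfield", NS) is None:
            problems.append(f"datafield {tag} has no subfields")
    return problems


print(check_record(SAMPLE))
```

A schema-aware validator would catch far more than this, but even a check
of this size rejects input that a binary MARC loader might accept silently.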


regards,

Alex
-- 
---
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
-- http://shelter.nu/blog/ 


Re: [CODE4LIB] marc21 and usmarc (fwd)

2009-01-27 Thread Alexander Johannesen
On Tue, Jan 27, 2009 at 17:09, Ardie Bausenbach a...@loc.gov wrote:
 Since that time, many other national libraries have moved from
 their national formats to MARC 21, including (among others),
 the UK, Germany, Finland, and Spain.

I know a few more, but another point worth, er, screaming about, is
the various AACR2 / RDA / other rule changes that aren't linked to
MARC at all. I know a lot of it is covered in MARC documentation, but
there are hidden gems, like punctuation, symbols, character encodings,
etc. which aren't always specified.

If the library world embraced XML as a minimum, a lot could be fixed in
that area (and no, MARCXML does not qualify :).


Regards,

Alex


Re: [CODE4LIB] marc21 and usmarc

2009-01-27 Thread Jonathan Rochkind

Alexander Johannesen wrote:

I would say a little bit more than oh well (but I don't really have to;
you know how I feel :), but I would love to hear what the vendors are
thinking about all this. They seem to be very, very quiet about it
(without speculating as to why ...)
  


Because their customers are not demanding it, and they often don't have 
the technical expertise to understand why it matters anyway.  But mainly 
because their customers are not demanding it.


Jonathan


[CODE4LIB] djatoka now in production at Biodiversity Heritage Library

2009-01-27 Thread Chris Freeland
[Apologies in advance for crossposting]

The Biodiversity Heritage Library (BHL) has integrated djatoka, the new
open source JPEG 2000 image server developed by Ryan Chute and Herbert
Van de Sompel at Los Alamos National Laboratory, into its production
portal at http://www.biodiversitylibrary.org.  BHL is a consortium of
natural history libraries who are partnering with Internet Archive to
digitize public domain scientific literature of use to individual
scholars and large bioinformatics projects like the Encyclopedia of
Life.  To date more than 27,000 volumes have been made available in open
access through Internet Archive and the BHL Portal.  

Here's a representative page image delivered via djatoka, chosen in
honor of yesterday's celebration of the Year of the Ox:
http://www.biodiversitylibrary.org/page/9370105

Since last Thursday (Jan 22), djatoka has been serving the images for
the (nearly) 11 million pages available through BHL.  It's scaling and
performing well under our normal load of 1,500 users per day, but we
wanted to give it a more robust test...which we're hoping to see with
this e-mail post to various listservs. We've written a blog post with
more details about our implementation and infrastructure, available here:
http://biodiversitylibrary.blogspot.com/2009/01/now-serving-all-page-images-via-djatoka.html
or here:
http://tinyurl.com/bjh72x

We also wanted to make this announcement in support of djatoka and JPEG
2000, given the recent survey of JPEG 2000 implementation in libraries,
available at http://digitalcommons.uconn.edu/libr_pubs/16/.  In short,
our experience evaluating and implementing djatoka has been extremely
smooth and we were able to drop it into our existing infrastructure and
UI with a minimum of customization (as described in the post).  It
replaces a non-scalable, proprietary JPEG 2000 image server that until
now has been the biggest bottleneck in delivering our content to users.
The lack of open source options for JPEG 2000 delivery is among
the most frequently cited reasons why cultural heritage organizations
have not embraced the JPEG 2000 format. Hopefully our positive
experience implementing djatoka and its demonstrated use within BHL can
be a stimulus for other projects to evaluate the software and/or put
JPEG 2000 back in consideration as a delivery format.

[And of course this message is sent out with the caveat, stated above,
that we have yet to test djatoka in production under the peak load that
will (hopefully) occur as this message makes its way through Inboxes
around the world.  It's working beautifully under normal production load
and our internal testing suggests that it will perform without incident,
but the only true test is a live one...]

Looking forward to feedback,

**
Chris Freeland
Technical Director, Biodiversity Heritage Library
Director, Bioinformatics, Missouri Botanical Garden


Re: [CODE4LIB] marc21 and usmarc

2009-01-27 Thread Kyle Banerjee
 So, um, could librarians everywhere start being just a tad bit more
 demanding about this stuff? You know, before your profession becomes
 obsoleted from this planet?

The basic problem is that the real value of a catalog is its
consistency, and the legacy data in these fields is already too
inconsistent to have much value.

The good news is that despite the fact that some fields are just
hopeless, many things in the catalog are decent. The consistency of
structure and the quality of content in author, title, and subject
fields is significantly better in a library catalog than it is in
other sources. That is why faceting works pretty well.

Where things really start falling apart are the special purpose
fields. Between some libraries deciding these aren't worth filling in,
catalogers not being aware of them, etc, they are inconsistently
applied. Even among the few that are entered relatively consistently (such
as 043), there is usually a better source of data (e.g. 6XX |z)
somewhere else in the record.
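
To make the 043 versus 6XX |z comparison concrete, here is a minimal
sketch that counts how many records carry each source of geographic data.
The record layout (a list of (tag, subfields) pairs) and the sample values
are assumptions made up for this example, not data from a real catalog.

```python
def geographic_coverage(records):
    """Count records that have an 043 field and records with any 6XX |z."""
    with_043 = 0
    with_6xx_z = 0
    for fields in records:
        # 043 holds a coded geographic area.
        if any(tag == "043" for tag, _ in fields):
            with_043 += 1
        # 6XX subject fields may carry a geographic subdivision in |z.
        if any(tag.startswith("6") and "z" in subs for tag, subs in fields):
            with_6xx_z += 1
    return {"records": len(records), "043": with_043, "6XX|z": with_6xx_z}


# Hypothetical records, invented for illustration only.
SAMPLE_RECORDS = [
    [("245", {"a": "Birds of Oregon"}),
     ("043", {"a": "n-us-or"}),
     ("650", {"a": "Birds", "z": "Oregon"})],
    [("245", {"a": "Ferns of Ohio"}),
     ("650", {"a": "Ferns", "z": "Ohio"})],
    [("245", {"a": "General botany"}),
     ("650", {"a": "Botany"})],
]

print(geographic_coverage(SAMPLE_RECORDS))
```

Run against a real export, a report like this would show quickly whether
the special purpose field adds anything the subject headings lack.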

If data aren't consistent enough, they're just not useful no matter how
good your system is. Plus, any system that actually knew all the MARC
tag and indicator twiddling would be hideously complex. Do you really
want to design a system that exploits the special tag indicating if a
piece is a Festschrift, has color illustrations, or a portrait? Do you
think that exploiting the full power of the 007 would not confuse the
heck out of everyone or that a special field which exists only to
discuss funding associated with the content of the piece needs special
treatment and that any work units or task numbers associated with that
funding need to be stored separately?

The data in the fields above are hideously inconsistent, but even in a
perfect world where it is all correct, I'd argue it's still not worth
screwing with -- especially since there are plenty of basic aspects of
the patron and staff experience that need serious work.

 Actually, I was wondering what areas MODS can't handle which MARC
 does, hijack and / or change MODS to fit it (what I know of it seems a
 bit limiting, but through XML certainly extensible). Shouldn't folks
 start by demanding at least MODS (or XOBIS if we're *really* crazy :)?

Frankly, the important stuff is there and it would be possible to
modify MODS to accommodate the things that aren't. The main reason
you're stuck with MARC is that there are a lot of legacy loaders out
there so even if all transmission was done in MODS, you'd still have
to convert it to MARC.
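
As a sketch of what that conversion step involves, the following maps two
MODS elements to MARC-style fields: the title to 245 |a and each subject
topic to 650 |a. It covers a tiny subset only, and a real crosswalk has
many more rules. The sample document, the output layout and the
mods_to_marc_fields helper are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by MODS version 3 documents.
MODS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical MODS record, invented for illustration only.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>An example title</title></titleInfo>
  <subject><topic>Botany</topic></subject>
  <subject><topic>Natural history</topic></subject>
</mods>"""


def mods_to_marc_fields(xml_text):
    """Return a list of (tag, subfields) pairs for a legacy MARC loader."""
    root = ET.fromstring(xml_text)
    fields = []
    title = root.findtext("mods:titleInfo/mods:title", namespaces=MODS)
    if title:
        fields.append(("245", {"a": title}))
    for topic in root.findall("mods:subject/mods:topic", MODS):
        if topic.text:
            fields.append(("650", {"a": topic.text}))
    return fields


for tag, subfields in mods_to_marc_fields(SAMPLE):
    print(tag, subfields)
```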

There are arguments to do so, but the business case is not strong.
Data providers won't send MODS until libraries demand it.
Libraries won't demand it until their systems use it. Systems won't
use it until libraries demand it because that's what their data
providers require.

It's a vicious circle, so we're stuck with MARC. The only people who
aren't happy with this arrangement are those who are trying to create
something new. Many librarians who think they use MARC every day have
no idea that it is a binary format that is unfriendly to eyes and
machines.
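
For anyone who has only seen MARC through a cataloging client, here is a
minimal sketch of the transmission format itself: a 24-character leader,
a directory of 12-character entries (tag, field length, start offset),
then the field data, all separated by control characters. The helper
names and the sample fields are made up for this example, and the code
ignores character encoding questions entirely.

```python
FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
SUBFIELD = b"\x1f"    # subfield delimiter


def build_record(fields):
    """Serialize (tag, body) pairs into one record; body is raw bytes."""
    directory = b""
    data = b""
    for tag, body in fields:
        body += FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1          # where field data begins
    total = base + len(data) + 1            # whole record incl. terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + data + RECORD_END


def parse_record(raw):
    """Read the leader and directory, then slice out each field."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop terminator
        fields.append((tag, body))
    return leader, fields


# Hypothetical fields, invented for illustration only.
FIELDS = [
    ("001", b"demo-0001"),
    ("245", b"10" + SUBFIELD + b"aAn example title"),
]

raw = build_record(FIELDS)
print(raw)
print(parse_record(raw))
```

Printing the raw bytes shows why the format is hard on the eyes: the
offsets and lengths have to be recomputed whenever a field changes.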

Just because something makes sense does not mean it will happen. The
QWERTY keyboard is terrible, every modern operating system can support
the technically superior Dvorak layout, yet we never switch. I'm a bit
more optimistic about getting the content of the catalog record into a
container better than MARC, but that will take a long time.

kyle


Re: [CODE4LIB] marc21 and usmarc

2009-01-27 Thread Alexander Johannesen
On Tue, Jan 27, 2009 at 18:56, Kyle Banerjee kyle.baner...@gmail.com wrote:
 There are arguments to do so, but the business case is not strong.

Well, I'd say the future of the library world is a good business case,
and I know several people (high and low) fully aware of it, but I
think it's hard to take any step in either direction that would be
deemed worth it. Tough one, indeed.

 That data providers won't send MODS until libraries demand it.
 Libraries won't demand it until their systems use it. Systems won't
 use it until libraries demand it because that's what their data
 providers require.

Well, I've been yelling for vendors to get more involved for a long
time, but there's a lot of blankness coming from them. I guess they're
happy with the current tie to MARC (binding the libraries to them
forever) until the business is gone ...

 It's a vicious circle, so we're stuck with MARC. The only people who
 aren't happy with this arrangement are those who are trying to create
 something new. Many librarians who think they use MARC every day
 have no idea that it is a binary format that is unfriendly to eyes and
 machines.

MARC may be MAchine Readable, but not MAchine Understandable or even
MAchine Usable.

I had an idea some time ago to create a dummy / fake MARC record with
much more to it (like extensions and special tags that systems can react
to, such as validation) and pass it around the infrastructure to see
what in it survives. The golden rule is to ignore what you don't
understand, although I know a few MARC systems that filter out what
they don't understand (!!!) because, well, these systems were mostly
built back when a megabyte of storage and / or memory had a price of
about a cataloger or two. Friggin' crazies! Anyone in? :)
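
A minimal sketch of how such a probe could be scored: send a record with
extra fields through a system and report which fields come back. The two
"systems" below are stand-in functions, and the 999 probe tag and its
content are assumptions made up for this example. A real test would
export and re-import through actual software.

```python
def survival_report(original, returned):
    """Split the original fields into those that survived and those lost."""
    kept = [field for field in original if field in returned]
    dropped = [field for field in original if field not in returned]
    return kept, dropped


# Hypothetical stand-in: a system that discards tags it does not know.
KNOWN_TAGS = {"001", "245"}


def filtering_system(fields):
    return [field for field in fields if field[0] in KNOWN_TAGS]


# Hypothetical stand-in: a system that ignores but preserves unknown tags.
def passthrough_system(fields):
    return list(fields)


# Hypothetical probe record with one extension field in a local-use tag.
PROBE = [
    ("001", "demo-0001"),
    ("245", "An example title"),
    ("999", "probe: extension data"),
]

for system in (filtering_system, passthrough_system):
    kept, dropped = survival_report(PROBE, system(PROBE))
    print(system.__name__, "dropped:", dropped)
```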


Regards,

Alex


Re: [CODE4LIB] marc21 and usmarc

2009-01-27 Thread Gabriel Farrell
On Tue, Jan 27, 2009 at 10:50:41AM -0800, Karen Coyle wrote:
 I am less optimistic about MODS than Kyle. Having watched it be made, I  
 think it's more than just a bit of a kludge, and carries forward a lot  
 of the problems of MARC21. I also don't think that it has a strong model  
 or philosophy behind it. I think we can do much, much better. What is  
 stopping us is what comes up here: you can create a better record, but  
 that doesn't mean that library systems will use it. Even so, I'm up for  
 trying to create that better record, and I'm even up for creating one  
 that is compatible with library cataloging practices, at least in their  
 intent. Some of us talked about this on the exhibits floor of ALA just  
 in the last few days.

 I will start by re-organizing a document I did a few years ago but that  
 was never publicly released. I'll do a new, public version and post it,  
 then wiki it so we can have the discussion. Also, I think that the  
 cataloger scenarios in the DC/RDA wiki are beginning to show what one  
 can do with the FRBR assumption behind the record.

This sounds like a great idea, Karen, and I'm looking forward to seeing 
the document and discussing it.  If a record format can demonstrate a   
significant leap forward then it will be adopted.   

Keep this list in the loop about the public version.

Gabriel


Re: [CODE4LIB] marc21 and usmarc

2009-01-27 Thread Riley, Jenn
 I am less optimistic about MODS than Kyle. Having watched it be made, I 
 think it's more than just a bit of a kludge, and carries forward a lot 
 of the problems of MARC21. I also don't think that it has a strong model 
 or philosophy behind it. I think we can do much, much better. 

I agree with Karen's characterization of how MODS has developed since its 
inception. The good news is that this will hopefully change soon. The newly formed 
MODS/MADS Editorial Committee is developing a design principles document that 
will help guide future versions of MODS and MADS. We'd gratefully welcome 
feedback on what those principles should be. The MODS list [1] is probably the 
best place for that discussion to take place, but some of the Committee members 
are on this list too, so ideas brought up here won't be lost on that group. I 
suspect we'll have a draft document to share on the MODS list in the next month 
or so, but ideas for what should be on it before then are even more valuable. 
:-)

[1] http://listserv.loc.gov/listarch/mods.html

Jenn 
(Chair, MODS/MADS Editorial Committee)


Jenn Riley
Metadata Librarian
Digital Library Program
Indiana University - Bloomington
Wells Library W501
(812) 856-5759
www.dlib.indiana.edu

Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com