Re: Caution Newb Toes The Dirt...

Ed Summers Thu, 13 Nov 2003 10:51:18 -0800

Morbus:

On Thu, Nov 13, 2003 at 11:57:03AM -0500, Morbus Iff wrote:
>  * which MARC output format does MARC::Record create? In various
>    of the demo oss4lib softwares I've seen, there's almost always
>    a "show as MARC" option, but I always get a cutesy rendered
>    HTML table, and not the actual ASCII record. Does MARC::Record
>    read them all? Should I really care what the output format is?
>    Is MARC never truly intended for human-reading (you'd think,
>    since it's "MAchine Readable" <g>)?


MARC::Record creates/reads MARC21 in transmission format (Z39.2). It is not
human readable, at least by normal people :) MARC::Record can also read/write
MicroLIF, which is a little-used human-readable variant. MARC::XML extends
MARC::Record to be able to read MARC encoded as XML using the Library
of Congress' XML schema. [1]
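
To give a feel for why the transmission format is not something you read by
eye, here is a minimal sketch of its layout: a 24-byte leader, a directory of
12-byte entries (tag, field length, start offset), then the field data. It is
written in plain Python rather than with MARC::Record, handles only data
fields (no 00X control fields), and the function names and the sample record
are made up for illustration.

```python
FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
SUBFIELD = b"\x1f"    # subfield delimiter

def build_record(fields):
    """Serialize [(tag, indicators, [(code, value), ...]), ...] to bytes."""
    directory = b""
    body = b""
    for tag, indicators, subfields in fields:
        data = indicators.encode()
        for code, value in subfields:
            data += SUBFIELD + code.encode() + value.encode()
        data += FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode(), len(data), len(body))
        body += data
    base = 24 + len(directory) + 1  # leader + directory + its terminator
    total = base + len(body) + 1    # plus the record terminator
    leader = b"%05dnam  22%05d   4500" % (total, base)
    return leader + directory + FIELD_END + body + RECORD_END

def parse_record(raw):
    """Inverse of build_record: walk the directory and slice out each field."""
    base = int(raw[12:17])            # base address of data, from the leader
    directory = raw[24:base - 1]      # everything between leader and terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode()
        length = int(entry[3:7])
        start = int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop terminator
        indicators = data[:2].decode()
        subfields = [(chunk[:1].decode(), chunk[1:].decode())
                     for chunk in data[2:].split(SUBFIELD)[1:]]
        fields.append((tag, indicators, subfields))
    return fields

# Illustrative sample data only.
sample = [
    ("100", "1 ", [("a", "Logan, Robert K."), ("d", "1939-")]),
    ("245", "14", [("a", "The alphabet effect")]),
]
raw = build_record(sample)
```

Printing `raw` shows the run of digits and control characters that the "show
as MARC" pages are hiding from you behind the HTML table.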

>  * There seems to a number of other ways to index things - OAI,
>    DublinCore, etc., etc. Did these arise to replace MARC due
>    to limitations in the format? Are any of them being used
>    for new, non-legacy, applications instead of MARC? I've seen
>    various converters for the different formats to and from MARC.
>    Can I "go wrong" if I ALWAYS baseline to MARC and then convert
>    to a more "futuristic" standard (like the MARC XML export,
>    DublinCore, etc., etc.).

OAI is a protocol for exchanging any kind of metadata that can be 
serialized as XML. It is particularly useful if you've got a database
of metadata about digital objects and you would like to make the data
available to harvesting applications. DublinCore was spearheaded by OCLC
basically as a streamlined, easier-to-use metadata standard for objects
on the web. You're right, there are tons of 'crosswalks' between all these
different standards. It's quite an acronym stew. The main thing to realize
about MARC is that it is an interchange format. Most applications will
rip the MARC apart to store it in a database. 
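
A crosswalk is really just a lookup table from one scheme's slots to
another's. As a rough sketch, assuming a record is held as a list of
(tag, indicators, subfields) tuples, a MARC to Dublin Core conversion could
look like the Python below. The mapping is a deliberately tiny subset chosen
for illustration, not the full published crosswalk, and `marc_to_dc` is a
made-up name.

```python
# Partial, illustrative mapping of (MARC tag, subfield code) to a
# Dublin Core element. A real crosswalk covers many more fields.
CROSSWALK = {
    ("020", "a"): "identifier",  # ISBN
    ("100", "a"): "creator",     # main entry, personal name
    ("245", "a"): "title",       # title statement
    ("260", "b"): "publisher",   # name of publisher
    ("260", "c"): "date",        # date of publication
    ("650", "a"): "subject",     # topical subject heading
}

def marc_to_dc(fields):
    """Collect mapped subfield values under their Dublin Core element."""
    dc = {}
    for tag, _indicators, subfields in fields:
        for code, value in subfields:
            element = CROSSWALK.get((tag, code))
            if element is not None:
                # Elements are repeatable, so keep a list per element.
                dc.setdefault(element, []).append(value)
    return dc
```

Anything without an entry in the table (here, for example, the 100 $d dates)
is simply dropped, which is the usual price of going from MARC to a simpler
standard.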

You will mainly find MARC useful if you want to take advantage of the 
wealth of 'cataloging' that has already been done. Cooperative cataloging
is kind of like what CDDB is to mp3 music players. Everyone collaborates to 
generate a metadata database, so that people can see titles, artist names,
etc., without having to enter the stuff themselves. If you want to build a 
database of bibliographic data you'll probably just want to grab a record
using Net::Z3950 and MARC::Record and an ISBN.

Being able to target MARC is always a good idea, but it is most likely
overkill for the application you are talking about (though perhaps not).
I'm not exactly sure I understand what you want to build.

>  * ISBN or LOC? Which should I use? ISBN gives me the benefit
>    of easy conversion to Amazon's ASIN for more metadata (and
>    even MORE metadata by linking off to iMDB for movie data),
>    but my understanding was that ISBNs aren't as unique as
>    they were supposed to be (changing from edition to edition,
>    publisher to reprint, etc., etc.).

Use the ISBN. Like you said, it's more ubiquitous.

>  * As old-timers, do you EVER have a moment where you're like
>    "man, what was that indicator number again?". Would initially-
>    hidden-but-expandable-inline-help be worth doing (for me,
>    it would)?

Oh yes. One of the reasons why DublinCore is so nice is it is human readable.
Those numeric tags and indicators are crazy. Having an annotation feature
would be very nice.

>  * As old-timers, what sort of input mechanism are you already
>    used to? Do you type tag numbers by hand, tabbing through
>    fields? When you're filling in indicators, are they two different
>    fields ("tab, type, tab, type, tab") or one ("tab, type, tab").
>    What order do you input the information? Numerical by tag?
>    Bibliographic, Subject, Physical Description, etc.?

In reality the success of cooperative cataloging means that people do very 
little record creation, mainly record editing (adding local tags). Normally
LC has already cataloged the book and made it available. Libraries still do
'original cataloging'. I used to enter records into an OCLC green screen 
where you are basically given a blank slate to write on freeform, which is 
then error checked before committing.

> Ultimately, I want the iMDB. I want to be able to find how many books I 
> have that were published in Timbuktu, how many DVDs I have that were 
> catered by Pizza Hut, and how many serials I have that showed a 30% 
> increase in page numbers during the holiday season. When I've been looking 
> around at prebuilt software, if it passes the interface/updated test, I try 
> to hunt around for the SQL schema to see how it's built. I've seen very few 
> that allow me to do this (biggest gripe: an "authors" field in the book 
> table, as opposed to an "authors" table with a book_id, same with 
> "publisher", etc.).

I hear you. Right now the closest thing we have to the iMDB is Z39.50. 
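
On the schema gripe, here is a minimal sketch of the normalized layout you
describe (a separate author table keyed by item id), using Python's sqlite3
module with an in-memory database. The table names, column names and helper
functions are all assumptions made up for this example.

```python
import sqlite3

def build_db():
    """Create an in-memory database with items and a separate author table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE item (
            id INTEGER PRIMARY KEY,
            title TEXT,
            pub_place TEXT
        );
        CREATE TABLE author (
            item_id INTEGER REFERENCES item(id),
            name TEXT
        );
    """)
    return db

def add_item(db, title, pub_place, authors):
    """Insert one item plus one author row per name; return the item id."""
    cur = db.execute(
        "INSERT INTO item (title, pub_place) VALUES (?, ?)",
        (title, pub_place),
    )
    db.executemany(
        "INSERT INTO author (item_id, name) VALUES (?, ?)",
        [(cur.lastrowid, name) for name in authors],
    )
    return cur.lastrowid

def count_published_in(db, place):
    """How many items were published in the given place?"""
    row = db.execute(
        "SELECT COUNT(*) FROM item WHERE pub_place = ?", (place,)
    ).fetchone()
    return row[0]

def titles_by_author(db, name):
    """Titles of every item that lists the given author, sorted by title."""
    rows = db.execute(
        "SELECT item.title FROM item "
        "JOIN author ON author.item_id = item.id "
        "WHERE author.name = ? ORDER BY item.title",
        (name,),
    )
    return [r[0] for r in rows]
```

With the authors in their own table, "how many books published in Timbuktu"
and "everything by this person" are both one-line queries.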

> >>     output formating? Is anyone interested in a MARC::Simple
> >>     sort of module, that would "use English"-ize all the tags
> >>     themselves ($record->author_name("Logan, Robert K.") and
> >>     $record->author_date("1939-"), which would just be wrappers
> >>     around MARC::Field and the relevant tag numbers).

You will notice that some of these already exist: author(), title(), edition(),
and publication_date(). I think it could be worthwhile to extend these. What
did you have in mind? Remember doing this for all the MARC tags is probably 
not a sane thing to do :)
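
To make the idea concrete, this is roughly the shape I would expect such
wrappers to take: a small table of friendly names pointing at tag/subfield
pairs. The sketch is in Python and is not MARC::Record's API. The class name,
the accessor names (taken from your examples) and the tuple layout are all
hypothetical.

```python
class SimpleRecord:
    """Toy record: a list of (tag, indicators, [(code, value), ...]) tuples."""

    # Friendly name -> (MARC tag, subfield code). Only a handful of entries,
    # since covering every tag this way would not be sane.
    ACCESSORS = {
        "author_name": ("100", "a"),
        "author_date": ("100", "d"),
        "title": ("245", "a"),
    }

    def __init__(self, fields=None):
        self.fields = fields if fields is not None else []

    def get(self, name):
        """Return the first matching subfield value, or None if absent."""
        tag, code = self.ACCESSORS[name]
        for field_tag, _indicators, subfields in self.fields:
            if field_tag == tag:
                for sub_code, value in subfields:
                    if sub_code == code:
                        return value
        return None

    def set(self, name, value):
        """Set the subfield, creating the field (blank indicators) if needed."""
        tag, code = self.ACCESSORS[name]
        for field_tag, _indicators, subfields in self.fields:
            if field_tag == tag:
                break
        else:
            subfields = []
            self.fields.append((tag, "  ", subfields))
        for i, (sub_code, _old) in enumerate(subfields):
            if sub_code == code:
                subfields[i] = (code, value)  # replace the existing value
                return
        subfields.append((code, value))
```

So `record.set("author_name", "Logan, Robert K.")` followed by
`record.set("author_date", "1939-")` ends up with a single 100 field carrying
$a and $d, and nobody has to remember the tag number.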

> MARC::Record was what I meant - sorry. I thought "MARC.pm" was the generic 
> name for "MARC crap in Perl" since the Sourceforge project name was 
> "marcpm". With that in mind, would a MARC::Simple of the MARC::Record 
> distribution be worth having to anyone but newbs like me? <G>

If you are talking about adding some accessor functions I would favor just
adding them to MARC::Record. Want to become a developer? It's as simple
as creating a SourceForge account and signing up for duty.

>  * I read a lot of stuff.
>  * I never throw anything away.
>  * The stuff I read is .. . . "odd", in the sense that no library
>    would probably stock it, nor would any existing librarian
>    take the time to index it in MARC, etc., etc.

You might be surprised: librarians have been creating metadata about all sorts 
of information objects for a while now. The question is whether it is available 
digitally, and whether it is in the right format. The wealth of bibliographic
information available in electronic form from institutions like LC and OCLC
is not well publicized. For an interesting and informed outsider perspective
on this, see Tim Bray's brief piece about OCLC's WorldCat. [2]

> And the pie-in-the-sky wide-eyed wonder-boy:
> 
>  * If indexing things the "right" way was easy, I'd be able to download
>    freely available MARC records with indexes for RUE MORGUE, VIDEO
>    WATCHDOG, FORTEAN TIMES, and TOYFARE, and I wouldn't have to worry
>    about building my own.

See Net::Z3950. I would be surprised if these have not been cataloged by LC, 
and they should be available from its Z39.50 server. I looked for FORTEAN TIMES
and found it.

>  * Building something that works for me, but removes all the
>    scary terms and long process for end-users who just like toys
>    or horror movies.

Yes, I think you may have something there. 

> Certainly, I realize it's a pipe dream: very few people I know who like toys
> and horror movies would ever be insane enough to index things as explicitly
> as I'd like, but I envision a roving database, where anyone can fill in
> certain bits of information, anyone can verify, improve, or correct that
> information, and it all streams down into personal databases or some
> iMDB-like master web site. Again, iMDB is the model I'm shooting for. I know
> nothing about film stock, and I don't need to: someone else can fill that
> in. But if I'm a fan of Debbie Rochon and I know she made a cameo in CRAZY
> MONKIES 7: FLINGING POO OLD STYLE, I can add that and move on.

I like the iMDB concept. It's funny that the library world pioneered cooperative
cataloging when the Web was just a glimmer in Tim Berners-Lee's eye. As 
a result, a lot of the technologies used today seem arcane and overly complex. 
I think you are talking about building the MARC killer app for the everyday
person, and I like what I'm hearing. 

//Ed

[1] http://www.loc.gov/standards/marcxml/
[2] http://www.tbray.org/ongoing/When/200x/2003/05/08/OCLC
