Quoting "Beacom, Matthew" <matthew.bea...@yale.edu>:


According to the report, 69 MARC tags occur in more than 1% of the records in WorldCat. That is quite a few more than Roy's 11, but even if Karen's data elements are counted as equivalent to MARC subfields, this is far fewer than the 1,000 data elements available to a cataloger in MARC.

So much depends on how you count things, so at the http://kcoyle.net/rda/ site I have put two MARC-related files. The first is just a list of "elements" (variable subfields) in alpha order with duplicates removed. Yes, I realize how imperfect this is, and that we will need to look beyond names to the *meaning* of elements to determine what we really have. This file does not include indicators, and sometimes indicators really do create a separate element, as when a personal name becomes "Family" based on its indicator.

That file has over 560 entries.

The next file probably needs some more thought, but it is a list of the variable field indicators and subfields, leaving in subfields that are duplicated in different fields. I removed some of the numeric subfields that didn't seem to result in actual elements (2, 3, 5, 6, 8), but could be wrong about that. I also did not include indicators that are = "Undefined". We can debate whether a personal name in an added entry is the same element as a personal name in a subject heading, and similarly for the various places where geographic names, titles, etc. are used. This is the analysis that is needed to reduce MARC21 to a cleaner set of data elements.

That file has 1421 entries.
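
For anyone who wants to see how the two ways of counting diverge, here is a minimal Python sketch. It is not the script behind the numbers above. The column names (tag, code, name) and the sample rows are assumptions made up for illustration, since the layout of the CSV file is not described here, and the sketch ignores indicators entirely.

```python
import csv
import io

# Hypothetical sample rows. The real CSV's columns are not described in
# this message, so "tag", "code" and "name" are assumed names.
SAMPLE = """tag,code,name
100,a,Personal name
700,a,Personal name
600,a,Personal name
245,a,Title
240,a,Uniform title
100,2,Source
"""

# Numeric subfields left out of the field-level list, per the message.
SKIPPED_CODES = {"2", "3", "5", "6", "8"}


def count_elements(rows):
    """Return (unique element names, field-level subfield entries)."""
    unique_names = set()
    field_level = set()
    for row in rows:
        # First list: element names only, duplicates removed.
        unique_names.add(row["name"].strip().lower())
        # Second list: keep the same subfield once per field it occurs in,
        # minus the numeric subfields listed above.
        if row["code"] not in SKIPPED_CODES:
            field_level.add((row["tag"], row["code"]))
    return len(unique_names), len(field_level)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_name, by_field = count_elements(rows)
print(by_name, by_field)
```

Matching on names alone merges the three "Personal name" rows into one element, while the field-level count keeps them apart. That is exactly the question of whether they are "the same element".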

Neither of these contains any of the fixed field elements (many of which, IMO, should replace textual elements now carried in MARC21). When I looked at the fixed fields (and this is reported at http://futurelib.pbworks.com/Data+and+Studies), I came up with this count of *unique* fixed field elements (each with multiple values):

008 - 58
007 - 55

Each one of these should become a controlled value list in a SemWeb implementation of MARC. RDA appears to have a total of 68 defined value lists, but I don't believe that those include ones defined elsewhere, such as languages, country codes, etc.

kc

p.s. linked from that same page is the file I am using for this analysis, in CSV format, if anyone else wants to play with it. I have tried to keep it up to date with MARBI proposals.


Matthew Beacom


By the way, the descriptive fields used in more than 20% of the MARC records in WorldCat are:

245  Title statement                      100%
260  Imprint statement                     96%
300  Physical description                  91%
100  Main entry - personal name            61%
650  Subject added entry - topical term    46%
500  General note                          44%
700  Added entry - personal name           28%

They answer, more or less, a few basic questions a user might have about the material: What is it called? Who made it? When was it made? How big is it? What is it about? Answers to the question, How can I get it? are usually given in the associated MARC holdings record.
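
How short the list is depends on the cutoff chosen. A small Python sketch, using only the seven percentages quoted above, shows the effect of moving the threshold. The function name and the 50% cutoff are illustrative choices, not anything taken from the OCLC report.

```python
# Percentages as quoted in this message (descriptive fields only).
FIELD_USAGE = {
    "245": ("Title statement", 100),
    "260": ("Imprint statement", 96),
    "300": ("Physical description", 91),
    "100": ("Main entry - personal name", 61),
    "650": ("Subject added entry - topical term", 46),
    "500": ("General note", 44),
    "700": ("Added entry - personal name", 28),
}


def fields_at_or_above(usage, threshold_pct):
    """Return tags whose usage is >= threshold_pct, most used first."""
    selected = [tag for tag, (_, pct) in usage.items() if pct >= threshold_pct]
    return sorted(selected, key=lambda tag: -usage[tag][1])


at_20 = fields_at_or_above(FIELD_USAGE, 20)
at_50 = fields_at_or_above(FIELD_USAGE, 50)
print(at_20)
print(at_50)
```

At a 50% cutoff only the title, imprint, physical description and personal-name main entry fields remain.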


-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Roy Tennant
Sent: Monday, May 03, 2010 2:15 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MODS and DCTERMS

I would even argue with the statement "very detailed, well over 1,000
different data elements, some well-coded data (not all)". There are only 11
(yes, eleven) MARC fields that appear in 20% or more of MARC records
currently in WorldCat[1], and at least three of those elements are control
numbers or other elements that contribute nothing to actual description. I
would say overall that we would do well to not gloat about our metadata
until we've reviewed the facts on the ground. Luckily, now we can.
Roy

[1] http://www.oclc.org/research/publications/library/2010/2010-06.pdf

On Mon, May 3, 2010 at 11:03 AM, Eric Lease Morgan <emor...@nd.edu> wrote:

On May 3, 2010, at 1:55 PM, Karen Coyle wrote:

> 1. MARC the data format -- too rigid, needs to go away
> 2. MARC21 bib data -- very detailed, well over 1,000 different data
> elements, some well-coded data (not all); unfortunately trapped in #1



The differences between the two points enumerated above, IMHO, seem to be
at the heart of the never-ending debate between computer types and
cataloger types when it comes to library metadata. The non-library computer
types don't appreciate the value of human-aided systematic description. And
the cataloger types don't understand why MARC is a really terrible bit
bucket, especially considering the current environment. All too often the
two "camps" don't know what the other is talking about. "MARC must die. Long
live MARC."

--
Eric Lease Morgan





--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
