Just as a historical note, this non-standard use of LDR/22 is likely due to 
OCLC's use of the character as a hexadecimal flag, back in the days when 
MARC records were mostly schlepped around on tapes.  They referred to it as the 
"Transaction type code".  When records were sent to OCLC for processing, 
various values of the flag indicated whether a catalog card was to be produced, 
whether the record was an update, whether the user location symbol was to be 
set, etc.  I'm sure others have used it for their own nefarious purposes as 
well.

Tim Prettyman
University of Michigan/LIT

On Apr 6, 2011, at 12:28 PM, Ford, Kevin wrote:

> Well, this brings us right up against the issue of files that adhere to their 
> specifications versus forgiving applications.  Think of browsers and HTML.  
> Suffice it to say, MARC applications are quite likely to be forgiving of 
> leader positions 20-23.  In my non-conforming MARC file and in Bill's, the 
> leader positions 20-21 ("45") seemed constant, but things could fall apart 
> for positions 22-23.  So...
> 
> I present the following (in-line and attached, to preserve tabs) in an 
> attempt to straddle the two sides of this issue: applications forgiving of 
> non-conforming files.  Should the two characters following 45 (at position 
> 20) *not* be 00, then the identification will be noted as "non-conforming."  
> We could classify this as reasonable identification but hardly ironclad 
> (indeed, simply checking to confirm that part of the first 24 positions match 
> the specification hardly constitutes a robust identification, but it's 
> something).
> 
> It will also give you a mimetype now.
> 
> Would anyone like to test it out more fully on their own files?
> 
> 
> #--------------------------------------------
> # MARC 21 Magic  (Third cut)
> 
> # Set at position 0
> 0       byte    x
> 
> # leader position 20-21 must be 45
> >20     string  45
> 
> # leader starts with 5 digits, followed by codes specific to MARC format
> >>0     regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z]  MARC Bibliographic
> !:mime  application/marc
> >>0     regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority
> !:mime  application/marc
> >>0     regex/1 (^[0-9]{5})[cdn][uvxy]  MARC Holdings
> !:mime  application/marc
> >>0     regex/1 (^[0-9]{5})[acdn][w]    MARC Classification
> !:mime  application/marc
> >>0     regex/1 (^[0-9]{5})[cdn][q]     MARC Community
> !:mime  application/marc
> 
> # leader position 22-23, should be "00" but is it?
> >>0     regex/1 (^.{21})([^0]{2})       (non-conforming)
> !:mime  application/marc
> 
> 
> If this works, I'll see about submitting this copy.  Thanks to all your 
> efforts already.
> 
> Warmly,
> 
> Kevin
> 
> --
> Library of Congress
> Network Development and MARC Standards Office
> 
> 
> 
> 
> 
> ________________________________________
> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero 
> [s...@unc.edu]
> Sent: Sunday, April 03, 2011 14:01
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] MARC magic for file
> 
> I am pretty sure that the marc4j standard reader ignores them; the tolerant
> reader definitely does. Otherwise JHU might have about two parseable records
> based on the mangled leaders that J-Rock  gets stuck with :-)
> 
> An analysis of the ~7M LC bib records from the scriblio.net data files (~
> Dec 2006) indicated that the leader has less than 8 bits of information in it
> (Shannon-Weaver definition). This excludes the initial length value, which
> is redundant given the end of record marker.
> 
> 
> The LC V'GER adds a pseudo tag 000 to its HTML view of the MARC leader.
> The final characters of the leader are "450".
> 
> Also, I object to the phrase "decent MARC tool".  Any tool capable of
> dealing with MARC as it exists cannot afford the luxury of decency :-)
> 
> [ HA: "A clear conscience?"
> BW: "Yes, Sir Humphrey."
> HA: "When did you acquire this taste for luxuries?"]
> 
> Simon
> 
> On Fri, Apr 1, 2011 at 5:16 AM, Owen Stephens <o...@ostephens.com> wrote:
> 
>> "I'm sure any decent MARC tool can deal with them, since decent MARC tools
>> are certainly going to be forgiving enough to deal with four characters that
>> apparently don't even really matter."
>> 
>> You say that, but I'm pretty sure Marc4J throws errors on MARC records where
>> these characters are incorrect
>> 
>> Owen
>> 
>> On Fri, Apr 1, 2011 at 3:51 AM, William Denton <w...@pobox.com> wrote:
>> 
>>> On 28 March 2011, Ford, Kevin wrote:
>>> 
>>>> I couldn't get Simon's MARC 21 Magic file to work.  Among other issues, I
>>>> received "line too long" errors.  But, since I've been curious about this
>>>> for some time, I figured I'd take a whack at it myself.  Try this:
>>>> 
>>> 
>>> This is very nice!  Thanks.  I tried it on a bunch of MARC files I have,
>>> and it recognized almost all of them.  A few it didn't, so I had a closer
>>> look, and they're invalid.
>>> 
>>> For example, the Internet Archive's Binghamton catalogue dump:
>>> 
>>> http://ia600307.us.archive.org/6/items/marc_binghamton_univ/
>>> 
>>> $ file -m marc.magic bgm*mrc
>>> bgm_openlib_final_0-5.mrc:         data
>>> bgm_openlib_final_10-15.mrc:       MARC Bibliographic
>>> bgm_openlib_final_15-18.mrc:       data
>>> bgm_openlib_final_5-10.mrc:        MARC Bibliographic
>>> 
>>> But why?  Aha:
>>> 
>>> $ head -c 25 bgm_openlib_final_*mrc
>>> ==> bgm_openlib_final_0-5.mrc <==
>>> 01812cas  2200457   45x00
>>> ==> bgm_openlib_final_10-15.mrc <==
>>> 01008nam  2200289ua 45000
>>> ==> bgm_openlib_final_15-18.mrc <==
>>> 01614cam    00385   45  0
>>> ==> bgm_openlib_final_5-10.mrc <==
>>> 00887nam  2200265v  45000
>>> 
>>> As you say, the leader should end with 4500 (as defined at
>>> http://www.loc.gov/marc/authority/adleader.html) but two of those files
>>> don't.  So they're not valid MARC.  I'm sure any decent MARC tool can deal
>>> with them, since decent MARC tools are certainly going to be forgiving
>>> enough to deal with four characters that apparently don't even really
>>> matter.
>>> 
>>> So on the one hand they're usable MARC but file wouldn't say so, and on the
>>> other that's a good indication that the files have failed a basic validity
>>> test.  I wonder if there are similar situations for JPEGs or MP3s.
>>> 
>>> I think you should definitely submit this for inclusion in the magic file.
>>> It would be very useful for us all!
>>> 
>>> Bill
>>> 
>>> P.S. I'd never used head -c (to show a fixed number of bytes) before.
>>> Always nice to find a new useful option to an old command.
>>> 
>>> 
>>>> #--------------------------------------------
>>>> # MARC 21 Magic  (Second cut)
>>>> 
>>>> # Set at position 0
>>>> 0       short   >0x0000
>>>> 
>>>> # leader ends with 4500
>>>> >20     string  4500
>>>> 
>>>> # leader starts with 5 digits, followed by codes specific to MARC format
>>>> >>0     regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z]  MARC Bibliographic
>>>> >>0     regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority
>>>> >>0     regex/1 (^[0-9]{5})[cdn][uvxy]  MARC Holdings
>>>> >>0     regex/1 (^[0-9]{5})[acdn][w]    MARC Classification
>>>> >>0     regex/1 (^[0-9]{5})[cdn][q]     MARC Community
>>>> 
>>> 
>>> --
>>> William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
>>> 
>> 
>> 
>> 
>> --
>> Owen Stephens
>> Owen Stephens Consulting
>> Web: http://www.ostephens.com
>> Email: o...@ostephens.com
>> 
> <marc.magic>
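
For anyone who wants to try the same leader checks outside of file(1), the
tests in the quoted "Third cut" rules can be sketched in Python.  This is an
illustration only, not code from the thread: the names LEADER_KINDS,
identify_leader and identify_file are made up here, and the conformance test
follows the prose description (positions 22-23 should be "00") rather than
reproducing the quoted regex character for character.

```python
import re

# Record-status and record-type patterns copied from the quoted "Third cut"
# magic rules.  The type byte (leader position 6) differs for each kind, so
# at most one pattern can match a given leader.
LEADER_KINDS = [
    ("MARC Bibliographic", re.compile(rb"[0-9]{5}[acdnp][^bhlnqsu-z]")),
    ("MARC Authority", re.compile(rb"[0-9]{5}[acdnosx]z")),
    ("MARC Holdings", re.compile(rb"[0-9]{5}[cdn][uvxy]")),
    ("MARC Classification", re.compile(rb"[0-9]{5}[acdn]w")),
    ("MARC Community", re.compile(rb"[0-9]{5}[cdn]q")),
]


def identify_leader(leader: bytes):
    """Return (kind, conforming) for a 24-byte leader, or None if no rule matches.

    Assumption: "conforming" means leader positions 22-23 are exactly "00".
    """
    # Leader positions 20-21 must be "45", as in the quoted rules.
    if len(leader) < 24 or leader[20:22] != b"45":
        return None
    for kind, pattern in LEADER_KINDS:
        # match() anchors at the start: five digits, status byte, type byte.
        if pattern.match(leader):
            return kind, leader[22:24] == b"00"
    return None


def identify_file(path):
    """Apply identify_leader to the first 24 bytes of the file at path."""
    with open(path, "rb") as handle:
        return identify_leader(handle.read(24))
```

Calling identify_file on each of the Binghamton files discussed above should
separate the leaders ending in 4500 from the ones that do not, since only the
first 24 bytes of each file are inspected.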
