You could also try to use the code I put in SolrMarc utilities classes ha ha ha.

- Naomi

On Mar 31, 2011, at 10:25 AM, Keith Jenkins wrote:

The Google Code regex looks like it will accept any 1-3 letters at the
start of the call number.  But LCC has no I, O, W, X, or Y
classifications.

So you might want to use something more like ^[A-HJ-NP-VZ] at the
start of the regex.

Also, there are only a few major classifications that use three
letters.  Like DJK, and several in the Ks.  I'm not sure, but there
might be others.

Keith
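[Editor's sketch, not from the thread.] The suggestions above can be combined into a rough first-pass filter. This is a minimal Python sketch, not the library-callnumber-lc regex: the pattern, the name looks_like_lcc, and the 1-4 digit limit on the class number are assumptions made for illustration. It still accepts any second and third letter, so it is looser than the real schedules, where three-letter classes are rare.

```python
import re

# Assumed pattern for illustration only (not the library-callnumber-lc regex):
#   (?!MLC)          reject anything starting with "MLC", e.g. "MLCS 83/5180 (P)"
#   [A-HJ-NP-VZ]     first letter, skipping I, O, W, X, Y (unused in LCC)
#   [A-Z]{0,2}       up to two more letters (looser than the real schedules)
#   \s*\d{1,4}       class number, assumed here to be 1-4 digits
#   (?:\.\d+)?       optional decimal extension
LCC_START = re.compile(r"^(?!MLC)[A-HJ-NP-VZ][A-Z]{0,2}\s*\d{1,4}(?:\.\d+)?")

def looks_like_lcc(call_number: str) -> bool:
    """Return True if the string starts like a classified LC call number."""
    return LCC_START.match(call_number.strip()) is not None
```

With this sketch, "QA76.73.P98" and "DJK 46" pass, while "MLCS 83/5180 (P)" and "Microfilm 19072 E" are rejected because no digits follow the first one to three capital letters.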


On Thu, Mar 31, 2011 at 1:11 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
Except now I wonder if those annoying MLCS call numbers might actually be
properly MATCHED by this regex, when I need 'em excluded. They are annoyingly
_similar_ to a classified call number. Well, one way to find out.

And the reason this matters is to try and use an LCC to map to a
'discipline' or other broad category, either directly from the LCC schedule
labels, or using a mapping like umich's:
http://www.lib.umich.edu/browse/categories/

But if it's not really an LCC at all, and you try to map it, you'll get bad
postings.
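
[Editor's sketch, not from the thread.] To make the "bad postings" problem concrete, here is a small Python illustration. The table is a partial, hand-picked subset of top-level LCC class labels, and the names LCC_CLASS_LABELS and naive_category are hypothetical. It is not umich's mapping.

```python
from typing import Optional

# Partial, illustrative subset of top-level LCC class labels.
LCC_CLASS_LABELS = {
    "H": "Social Sciences",
    "M": "Music",
    "Q": "Science",
    "R": "Medicine",
    "T": "Technology",
}

def naive_category(call_number: str) -> Optional[str]:
    """Map a call number to a broad label using only its first letter.

    Deliberately naive: it does not check that the string is really
    an LCC call number before looking it up.
    """
    s = call_number.strip()
    if not s:
        return None
    return LCC_CLASS_LABELS.get(s[0])
```

Here naive_category("QA76.73.P98") gives "Science", which is reasonable, but both "MLCS 83/5180 (P)" and "Microfilm 19072 E" come back as "Music" simply because they start with M. That is the kind of wrong posting a validity check in front of the lookup is meant to prevent.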

On 3/31/2011 1:03 PM, Jonathan Rochkind wrote:

Thanks, that looks good!

It's hosted on Google Code, but I don't think that code is anything
"Google uses", it looks like it's from our very own Bill Dueber.

On 3/31/2011 12:38 PM, Tod Olson wrote:

Check the regexp that Google uses in their call number normalization:

       http://code.google.com/p/library-callnumber-lc/wiki/Home

You may want to remove the prefix part, and allow for a fourth cutter.

The folks at UNC pointed me to this a few months ago.

-Tod

On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:

Does anyone have a good regular expression that will match all legal LC
Call Numbers from the LC Classified Schedule, but will generally not match
things that could not possibly be an LC Call Number from the LC
Classified Schedule?

In particular, I need it to NOT match an "MLC" call number, which is an
LC assigned call number that shows up in an 050 with no way to
distinguish based on indicators, but isn't actually from the LC
Schedules.  Here's an example of an "MLC" call number:

"MLCS 83/5180 (P)"

Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
exclude them just like that. But it looks like there are also OTHER
things that can show up in the 050 but aren't actually from the
classified schedule, the OCLC documentation even contains an example of
"Microfilm 19072 E".

What a mess, huh?  So, yeah, regex anyone?

[You can probably guess why I care if it's from the LC Classified
Schedule or not].

Tod Olson<t...@uchicago.edu>
Systems Librarian
University of Chicago Library

