Re: [CODE4LIB] regexp for LCC?

2011-04-01 Thread Kelley McGrath
At one point, much to my surprise, someone told me that 050 is defined for
numbers assigned by LC, not for LCC numbers per se. It doesn't really sound
like that from the current definition
(http://www.loc.gov/marc/bibliographic/bd050.html), but if you look on the
ITS page (http://www.itsmarc.com/crs/edit7592.htm), which I think is not
up-to-date, you'll see a discussion of pseudo call numbers and other forms
of LC call numbers.

As someone pointed out, only a very few classes start with three letters
(off the top of my head, a couple in D and a number in K; see
http://library.duke.edu/services/instruction/libraryguide/lcclass.html, but
there are more in K than are listed there).

The pseudo or shelf numbers I've seen most often in 050 are MLC and SD
(which unfortunately is the same as the class for forestry). Look for SD on
records for musical recordings (it used to really mess up attempts to facet
music CDs on LC class in the catalog where I used to work; there were a
few other common ones, but I've forgotten them).

Depending on what you're doing, you might try to prefer a call number in 090
if there is one. These are more likely to reflect local preference.

Looking up 090 (http://www.oclc.org/bibformats/en/0xx/090.shtm) produced
some other examples of non-LCC 050s: PAR, Newspaper, UNC, or NOT IN LC.
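
As a rough illustration of the two ideas above (screen out known non-LCC
strings, and prefer 090 over 050), here is a minimal Python sketch. The
function names and the prefix list are assumptions made up for this example.
The list holds only strings mentioned in this thread and is certainly not
exhaustive. SD is deliberately left out, because a prefix test cannot tell
the shelf number apart from the forestry class.

```python
import re

# Non-LCC strings reported in this thread as showing up in 050.
# Illustrative only; a real catalog will contain others.
PSEUDO_PATTERNS = [
    re.compile(r"^MLC", re.IGNORECASE),          # MLC, MLCS, ... shelf numbers
    re.compile(r"^PAR\b", re.IGNORECASE),
    re.compile(r"^NEWSPAPER\b", re.IGNORECASE),
    re.compile(r"^UNC\b", re.IGNORECASE),
    re.compile(r"^NOT IN LC\b", re.IGNORECASE),
    re.compile(r"^MICROFILM\b", re.IGNORECASE),
]


def is_pseudo_call_number(value):
    """Return True if the value starts with one of the known non-LCC forms."""
    text = value.strip()
    return any(pattern.search(text) for pattern in PSEUDO_PATTERNS)


def choose_call_number(local_090, lc_050):
    """Prefer the 090 value, fall back to 050, skip empty or pseudo values."""
    for candidate in (local_090, lc_050):
        if candidate and candidate.strip() and not is_pseudo_call_number(candidate):
            return candidate.strip()
    return None
```

Passing the two field values as plain strings keeps the sketch independent
of any particular MARC library.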

Good luck!

Kelley




[CODE4LIB] regexp for LCC?

2011-03-31 Thread Jonathan Rochkind
Does anyone have a good regular expression that will match all legal LC 
Call Numbers from the LC Classified Schedule, but will generally not 
match things that could not possibly be an LC Call Number from the LC 
Classified Schedule?


In particular, I need it to NOT match an MLC call number, which is an 
LC-assigned call number that shows up in an 050 with no way to 
distinguish it based on indicators, but isn't actually from the LC 
Schedules.  Here's an example of an MLC call number:


MLCS 83/5180 (P)

Hmm, maybe all MLC call numbers begin with MLC. Okay, I guess I can 
exclude them just like that. But it looks like there are also OTHER 
things that can show up in the 050 but aren't actually from the 
classified schedule; the OCLC documentation even contains an example of 
Microfilm 19072 E.


What a mess, huh?  So, yeah, regex anyone?

[You can probably guess why I care if it's from the LC Classified 
Schedule or not].


Re: [CODE4LIB] regexp for LCC?

2011-03-31 Thread Tod Olson
Check the regexp that Google uses in their call number normalization:

http://code.google.com/p/library-callnumber-lc/wiki/Home

You may want to remove the prefix part, and allow for a fourth cutter.

The folks at UNC pointed me to this a few months ago.

-Tod

Tod Olson t...@uchicago.edu
Systems Librarian
University of Chicago Library


Re: [CODE4LIB] regexp for LCC?

2011-03-31 Thread Jonathan Rochkind

Thanks, that looks good!

It's hosted on Google Code, but I don't think that code is anything 
Google uses; it looks like it's from our very own Bill Dueber.





Re: [CODE4LIB] regexp for LCC?

2011-03-31 Thread Jonathan Rochkind
Except now I wonder if those annoying MLCS call numbers might actually 
be properly MATCHED by this regex, when I need 'em excluded. They are 
annoyingly _similar_ to a classified call number. Well, one way to find out.


And the reason this matters is to try and use an LCC to map to a 
'discipline' or other broad category, either directly from the LCC 
schedule labels, or using a mapping like umich's: 
http://www.lib.umich.edu/browse/categories/


But if it's not really an LCC at all, and you try to map it, you'll get 
bad postings.
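
To make the mapping step concrete, here is a minimal Python sketch that
looks up a broad category from the first letter of the class. The labels are
paraphrased from the top-level LCC outline, and `broad_category` is a name
invented for this example; this is not the umich mapping, which is finer
grained.

```python
import re

# Top-level LCC classes, labels paraphrased from the LCC outline.
LCC_MAIN_CLASSES = {
    "A": "General Works",
    "B": "Philosophy, Psychology, Religion",
    "C": "Auxiliary Sciences of History",
    "D": "World History",
    "E": "History of the Americas",
    "F": "History of the Americas",
    "G": "Geography, Anthropology, Recreation",
    "H": "Social Sciences",
    "J": "Political Science",
    "K": "Law",
    "L": "Education",
    "M": "Music",
    "N": "Fine Arts",
    "P": "Language and Literature",
    "Q": "Science",
    "R": "Medicine",
    "S": "Agriculture",
    "T": "Technology",
    "U": "Military Science",
    "V": "Naval Science",
    "Z": "Bibliography, Library Science",
}

# One to three letters followed (after optional space) by a digit.
CLASS_PREFIX = re.compile(r"^\s*([A-Z]{1,3})\s*\d")


def broad_category(call_number):
    """Map a call number to a top-level label, or None if it has no class prefix."""
    match = CLASS_PREFIX.match(call_number.upper())
    if match is None:
        return None
    return LCC_MAIN_CLASSES.get(match.group(1)[0])
```

With this loose prefix test, "MLCS 83/5180 (P)" maps to nothing, but a
hypothetical string such as "MLC 83/5180" would be filed under Music. That is
exactly the kind of bad posting described above, so non-LCC values need to be
screened out before mapping.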





Re: [CODE4LIB] regexp for LCC?

2011-03-31 Thread Keith Jenkins
The Google Code regex looks like it will accept any 1-3 letters at the
start of the call number.  But LCC has no I, O, W, X, or Y
classifications.

So you might want to use something more like ^[A-HJ-NP-VZ] at the
start of the regex.

Also, there are only a few major classifications that use three
letters.  Like DJK, and several in the Ks.  I'm not sure, but there
might be others.
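
A minimal Python sketch of a pattern along these lines, for illustration
only. It is not the regex from library-callnumber-lc. It assumes, based on
this thread and not on a check of the schedules, that three-letter classes
occur only under D and K, that class numbers have one to four digits, and
that up to four cutters are allowed. `LCC_PATTERN` and `looks_like_lcc` are
names made up for the example.

```python
import re

LCC_PATTERN = re.compile(
    r"""
    ^\s*
    (?P<letters>
        [DK][A-Z]{2}              # three letters: assumed to occur only in D and K
      | [A-HJ-NP-VZ][A-Z]?        # one or two letters; no I, O, W, X, or Y
    )
    \s*
    (?P<number>\d{1,4})           # class number
    (?:\.(?P<decimal>\d+))?       # optional decimal extension
    (?P<cutters>(?:\s*\.?\s*[A-Z]\d+){0,4})   # up to four cutters
    (?P<rest>\s.*)?               # anything after whitespace: date, volume, ...
    $
    """,
    re.VERBOSE,
)


def looks_like_lcc(call_number):
    """Return True if the string has the general shape of an LCC call number."""
    return LCC_PATTERN.match(call_number.strip().upper()) is not None
```

Because three-letter prefixes are limited to D and K here, anything starting
with MLC is rejected as a side effect. The pattern checks shape only: it
cannot tell an SD shelf number from an SD (forestry) class number, and it
accepts two-letter combinations that are not real subclasses.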

Keith






Re: [CODE4LIB] regexp for LCC?

2011-03-31 Thread Doran, Michael D
Hi Jonathan,

Although designed for a different purpose, you might want to take a look at the 
regex in the LC call number sorting utilities on this page: 
http://rocky.uta.edu/doran/sortlc/

Note that unparsable call numbers are printed to STDERR with an error message.  
So you could run it against a list containing valid and MLC call numbers and 
see which ones end up where, refine the regexp, retry, rinse, and repeat.  If 
you make significant (or any) improvements to the regexp being used, I'd be 
delighted to incorporate them back into those LC sort utilities.
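
The test-and-refine loop described above could look roughly like this in
Python. This is a sketch, not the sortLC utilities themselves: the candidate
pattern is deliberately loose, and the names `partition_call_numbers`,
`CANDIDATE`, and `SAMPLES` are made up for the example. The sample values
come from this thread.

```python
import re
import sys


def partition_call_numbers(pattern, call_numbers):
    """Split call numbers into those the pattern matches and those it does not."""
    parsed, unparsed = [], []
    for call_number in call_numbers:
        if pattern.match(call_number):
            parsed.append(call_number)
        else:
            unparsed.append(call_number)
    return parsed, unparsed


# A deliberately loose candidate: 1-3 capital letters, optional space, digits.
CANDIDATE = re.compile(r"^[A-Z]{1,3}\s*\d+")

SAMPLES = [
    "QA76.73.P98 L88 2011",
    "MLCS 83/5180 (P)",
    "Microfilm 19072 E",
    "DJK50 .B3",
]

parsed, unparsed = partition_call_numbers(CANDIDATE, SAMPLES)

# Report the rejects on STDERR so they can be reviewed separately.
for call_number in unparsed:
    print(f"unparsable: {call_number}", file=sys.stderr)
```

Swapping in a different compiled pattern and re-running against the same
list shows which call numbers changed sides.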

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/
 



Re: [CODE4LIB] regexp for LCC?

2011-03-31 Thread Naomi Dushay
You could also try to use the code I put in the SolrMarc utilities classes,
ha ha ha.


- Naomi
