Re: [CODE4LIB] regexp for LCC?
At one point, much to my surprise, someone told me that 050 is defined for numbers assigned by LC, not for LCC numbers per se. It doesn't really sound like that from the current definition (http://www.loc.gov/marc/bibliographic/bd050.html), but if you look at the ITS page (http://www.itsmarc.com/crs/edit7592.htm), which I think is not up to date, you'll see a discussion of pseudo call numbers and other forms of LC call numbers.

As someone pointed out, only a very few classes start with three letters (off the top of my head, a couple in D and a number in K; see http://library.duke.edu/services/instruction/libraryguide/lcclass.html, though there are more in K than are listed there).

The pseudo or shelf numbers I've seen most often in 050 are MLC and SD (which unfortunately is the same as the class for forestry). Look for SD on musical recording records; it used to really mess up attempts by the catalog where I used to work to facet music CDs on LC class. There were a few other common ones, but I've forgotten them.

Depending on what you're doing, you might try to prefer a call number in 090 if there is one, since these are more likely to reflect local preference. Looking up 090 (http://www.oclc.org/bibformats/en/0xx/090.shtm) turned up some other examples of non-LCC 050s: PAR, Newspaper, UNC, and NOT IN LC.

Good luck!
Kelley

In reply to Jonathan Rochkind.
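Kelley's two practical suggestions (prefer a local 090 when there is one, and screen out known pseudo or shelf numbers in 050) could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: the prefix list holds only the examples named in this thread (MLC, PAR, Newspaper, UNC, NOT IN LC, plus the OCLC "Microfilm" example) and is not exhaustive, the SD rule assumes Leader/06 = 'j' (musical sound recording) is a usable signal for the music-CD case, and the function names are hypothetical.

```python
import re

# Pseudo/shelf-number prefixes named in this thread; illustrative, not exhaustive.
PSEUDO_PREFIX_RE = re.compile(
    r"^(?:MLC[A-Z]*|PAR|NEWSPAPER|UNC|NOT\s+IN\s+LC|MICROFILM)\b",
    re.IGNORECASE,
)


def is_pseudo_call_number(value: str, leader06: str = "") -> bool:
    """Return True if a call number looks like a shelf/pseudo number."""
    value = value.strip()
    if PSEUDO_PREFIX_RE.match(value):
        return True
    # SD is also the real LCC class for forestry, so no regex can separate them.
    # Assumption: on a musical sound recording (Leader/06 == "j"), an SD number
    # is treated as a shelf number rather than a forestry class number.
    return leader06 == "j" and value.upper().startswith("SD")


def pick_call_number(f090, f050, leader06=""):
    """Prefer the locally assigned 090; fall back to 050; skip pseudo numbers."""
    for value in (f090, f050):
        if value and not is_pseudo_call_number(value, leader06):
            return value.strip()
    return None
```

The SD rule is a trade-off: a genuine forestry number on a sound recording record would be dropped too.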
[CODE4LIB] regexp for LCC?
Does anyone have a good regular expression that will match all legal LC call numbers from the LC Classified Schedule, but will generally not match things that could not possibly be an LC call number from the schedule? In particular, I need it NOT to match an MLC call number, which is an LC-assigned call number that shows up in an 050 with no way to distinguish it by indicators, but isn't actually from the LC Schedules. Here's an example of an MLC call number:

MLCS 83/5180 (P)

Hmm, maybe all MLC call numbers begin with MLC; okay, I guess I can exclude them just like that. But it looks like there are also OTHER things that can show up in the 050 that aren't actually from the classified schedule; the OCLC documentation even contains the example "Microfilm 19072 E". What a mess, huh?

So, yeah, regex anyone? (You can probably guess why I care whether it's from the LC Classified Schedule or not.)
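One minimal approach to the question as posed, sketched in Python: check a rough class-letters-plus-class-number shape and reject MLC numbers explicitly with a negative lookahead. The pattern is an illustrative assumption, not a complete LCC grammar. It is case-sensitive on purpose, which is what rejects "Microfilm 19072 E", and it will still accept other all-caps shelf numbers that happen to be followed by digits.

```python
import re

# Rough LCC shape: 1-3 capital letters (the class), an optional space, then a
# class number with an optional decimal part, followed by whitespace, a period,
# or the end of the string. (?!MLC) rejects MLC numbers outright.
# Case-sensitive on purpose: "Microfilm" fails at the lowercase "i".
LCC_SHAPE_RE = re.compile(r"^(?!MLC)[A-Z]{1,3}\s?\d+(?:\.\d+)?(?=[\s.]|$)")


def looks_like_lcc(value: str) -> bool:
    """Return True if the value has the rough shape of an LCC call number."""
    return LCC_SHAPE_RE.match(value.strip()) is not None
```

"MLCS 83/5180 (P)" would fail this shape anyway, because four letters precede the digits; the lookahead only matters for a number written as exactly "MLC" followed by digits.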
Re: [CODE4LIB] regexp for LCC?
Check the regexp that Google uses in their call number normalization: http://code.google.com/p/library-callnumber-lc/wiki/Home

You may want to remove the prefix part, and allow for a fourth cutter. The folks at UNC pointed me to this a few months ago.

-Tod

Tod Olson t...@uchicago.edu
Systems Librarian
University of Chicago Library

In reply to Jonathan Rochkind (March 31, 2011, 11:29 AM).
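Tod's two adjustments (no prefix part, up to four cutters) might look roughly like the following Python sketch. This is not the regex from the library-callnumber-lc project, which isn't reproduced here; it is a simplified, hypothetical pattern that only illustrates the structure: class letters, class number, zero to four cutters, and an optional year.

```python
import re

# Hypothetical, simplified pattern; NOT the library-callnumber-lc regex.
# No prefix part, and up to four cutters are allowed.
LC_CALLNUM_RE = re.compile(
    r"""
    ^
    (?P<letters>[A-Z]{1,3})                # class letters, e.g. QA
    \s*
    (?P<number>\d+(?:\.\d+)?)              # class number, e.g. 76.73
    (?P<cutters>(?:\s*\.?[A-Z]\d+){0,4})   # zero to four cutters, e.g. .P98 L88
    (?:\s+(?P<year>\d{4}[a-z]?))?          # optional year, e.g. 2009
    \s*$
    """,
    re.VERBOSE,
)

# Pulls individual cutters (without the leading period) out of the cutters group.
CUTTER_RE = re.compile(r"\.?([A-Z]\d+)")


def parse_lc(value: str):
    """Split a call number into its parts, or return None if it doesn't fit."""
    m = LC_CALLNUM_RE.match(value.strip())
    if m is None:
        return None
    return {
        "letters": m.group("letters"),
        "number": m.group("number"),
        "cutters": CUTTER_RE.findall(m.group("cutters")),
        "year": m.group("year"),
    }
```

A fifth cutter makes the whole match fail, which is the behavior "allow for a fourth cutter" implies.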
Re: [CODE4LIB] regexp for LCC?
Thanks, that looks good! It's hosted on Google Code, but I don't think that code is anything Google uses; it looks like it's from our very own Bill Dueber.

In reply to Tod Olson (March 31, 2011, 12:38 PM).
Re: [CODE4LIB] regexp for LCC?
Except now I wonder whether those annoying MLCS call numbers might actually be MATCHED by this regex, when I need them excluded. They are annoyingly _similar_ to a classified call number. Well, one way to find out.

The reason this matters: I'm trying to use an LCC number to map to a 'discipline' or other broad category, either directly from the LCC schedule labels or using a mapping like umich's: http://www.lib.umich.edu/browse/categories/ But if it's not really an LCC number at all and you try to map it, you'll get bad postings.

In reply to Jonathan Rochkind (March 31, 2011, 1:03 PM).
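For the mapping use case, the key point is that anything failing the LCC check should get no category at all rather than a wrong one. Here is a minimal Python sketch, assuming a lookup on the first class letter only. The labels are abbreviated LCC main-class captions standing in for a real discipline mapping such as the umich categories, whose actual contents are not reproduced here. The function name is hypothetical.

```python
import re

# Abbreviated LCC main-class captions; a stand-in for a finer-grained
# discipline mapping. I, O, W, X, and Y are deliberately absent.
MAIN_CLASSES = {
    "A": "General Works",
    "B": "Philosophy, Psychology, Religion",
    "C": "Auxiliary Sciences of History",
    "D": "World History",
    "E": "History of the Americas",
    "F": "History of the Americas",
    "G": "Geography, Anthropology, Recreation",
    "H": "Social Sciences",
    "J": "Political Science",
    "K": "Law",
    "L": "Education",
    "M": "Music",
    "N": "Fine Arts",
    "P": "Language and Literature",
    "Q": "Science",
    "R": "Medicine",
    "S": "Agriculture",
    "T": "Technology",
    "U": "Military Science",
    "V": "Naval Science",
    "Z": "Bibliography, Library Science",
}

# Class letters must be followed (optionally after a space) by a class number.
LEADING_CLASS_RE = re.compile(r"^([A-Z]{1,3})\s?\d")


def broad_category(call_number: str):
    """Return a broad category, or None if the value doesn't look like LCC."""
    m = LEADING_CLASS_RE.match(call_number.strip())
    if m is None or m.group(1).startswith("MLC"):
        return None  # no posting is better than a bad posting
    return MAIN_CLASSES.get(m.group(1)[0])
```

SD shelf numbers on sound recordings would still be posted under Agriculture here; a shape check alone can't catch that case.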
Re: [CODE4LIB] regexp for LCC?
The Google Code regex looks like it will accept any 1-3 letters at the start of the call number. But LCC has no I, O, W, X, or Y classes, so you might want to use something more like ^[A-HJ-NP-VZ] at the start of the regex.

Also, only a few major classes use three letters, such as DJK and several in the Ks. I'm not sure, but there might be others.

Keith

In reply to Jonathan Rochkind, rochk...@jhu.edu (Thursday, March 31, 2011, 1:11 PM).
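Keith's suggestion, together with the observation that three-letter classes are rare, could be folded into the class-letter part of a pattern as in this Python sketch. Restricting three-letter classes to D and K is an assumption drawn from this thread (DJK, "several in the Ks", "a couple in D"), not a check against the full schedules; widen it if you find others. The function name is hypothetical.

```python
import re

# First letter: any LCC main class (no I, O, W, X, or Y), optional second letter.
# Three-letter classes: allowed only under D and K (assumption from the thread).
# The lookahead requires a class number right after the letters.
CLASS_LETTERS_RE = re.compile(r"^(?:[A-HJ-NP-VZ][A-Z]?|[DK][A-Z]{2})(?=\s?\d)")


def has_valid_class_letters(value: str) -> bool:
    """Return True if the value starts with plausible LCC class letters."""
    return CLASS_LETTERS_RE.match(value.strip()) is not None
```

Because M is neither D nor K, this rejects MLC-prefixed numbers without a special case. It still cannot tell an SD shelf number from a forestry class number.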
Re: [CODE4LIB] regexp for LCC?
Hi Jonathan,

Although designed for a different purpose, you might want to take a look at the regex in the LC call number sorting utilities on this page: http://rocky.uta.edu/doran/sortlc/

Note that unparsable call numbers are printed to STDERR with an error message. So you could run it against a list containing valid and MLC call numbers, see which ones end up where, then refine the regexp, retry, rinse, and repeat.

If you make significant (or any) improvements to the regexp being used, I'd be delighted to incorporate them back into those LC sort utilities.

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/

In reply to Jonathan Rochkind's message to Code for Libraries (CODE4LIB@LISTSERV.ND.EDU), Thursday, March 31, 2011, 11:29 AM.
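Michael's test loop (run a mixed list, see which values are rejected, refine, repeat) can be reproduced with a small Python harness like the one below. The candidate pattern is a placeholder assumption to swap for whatever regex you are refining. Rejections go to stderr by default, mirroring the behavior described for the sort utilities, but this code is not taken from those utilities.

```python
import re
import sys

# Placeholder candidate: a main-class letter, up to two more capitals, then a
# class number. Replace this with the regex being refined.
CANDIDATE_RE = re.compile(r"^[A-HJ-NP-VZ][A-Z]{0,2}\s?\d+(?:\.\d+)?")


def _to_stderr(message: str) -> None:
    print(message, file=sys.stderr)


def partition(call_numbers, pattern=CANDIDATE_RE, report=None):
    """Split call numbers into (accepted, rejected) and report each rejection."""
    report = report or _to_stderr
    accepted, rejected = [], []
    for cn in call_numbers:
        if pattern.match(cn.strip()):
            accepted.append(cn)
        else:
            rejected.append(cn)
            report(f"unparsable call number: {cn!r}")
    return accepted, rejected
```

A mixed list shows quickly where a candidate is too loose. This placeholder, for instance, rejects "MLCS 83/5180 (P)" but would still accept an MLC number written as MLC, a space, and digits.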
Re: [CODE4LIB] regexp for LCC?
You could also try to use the code I put in the SolrMarc utilities classes, ha ha ha.

- Naomi

In reply to Keith Jenkins (March 31, 2011, 10:25 AM).