Re: NACO Normalization and Text::Normalize
On Wed, Aug 27, 2003 at 09:15:25AM -0300, Brian Cassidy wrote:

> * normalize()
>
> inputs: either a MARC::Record object or a string. This should probably
> accept an arbitrary number of inputs so you can do
>
>     my @normrecs = normalize( @records );
>
> rather than
>
>     my @normrecs;
>     foreach my $rec ( @records ) {
>         push @normrecs, normalize( $rec );
>     }
>
> But you still could if you wanted to.
>
> Given a M::R object it would do as the rules state [1] for the
> appropriate fields in the record. Returns a M::R object.
>
> Given a string, it would apply the string normalization rules. Returns
> a string.
>
> * compare()
>
> inputs: either two M::R objects or two strings.
>
> Given two M::R objects, both are normalize()'ed. It would return false
> (or should it be true?) if, based on the rules [1], some field in $a
> matches some field in $b.
>
> Given two strings, both are again normalize()'ed and a simple "cmp" is
> performed.

I like the idea of a package MARC::Record::NACO which exports the
normalize() and compare() functions. My $.02 is that you not overload
normalize() and compare() too much, but create different functions, since
you'll have the entire MARC::Record::NACO namespace to play with!

    normalize( $string );
    normalize_record( $record, 100, 110, etc );
    compare( $string );
    compare_record( $record1, $record2, 100, 110, etc );

I know it's heresy, but when it comes to designing programs and interfaces
I've come to trust an aspect of the Unix philosophy over the Perl
philosophy.

    Unix: Make each program (function) do one thing well.
    Perl: DWIM (Do What I Mean)

I see you've got CPAN modules up there already, but if you need any help
with the test suite or anything I would be willing to help out. At any
rate, please post to the list if you end up releasing something.

//Ed
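[Editorial note: Ed's split interface could be sketched roughly as below. This is a hypothetical illustration, not code from the thread: the module name MARC::Record::NACO is the one proposed here, normalize() is assumed to exist as a string-level function, and MARC::Record's field(), subfields(), and update() methods are the real API.]

```perl
package MARC::Record::NACO;    # hypothetical module name from the thread

use strict;
use warnings;
use MARC::Record;

# Record-level wrapper, matching Ed's normalize_record( $record, 100, 110 )
# signature. Assumes a string-level normalize() per the NACO rules.
sub normalize_record {
    my ( $record, @tags ) = @_;
    for my $tag (@tags) {
        for my $field ( $record->field($tag) ) {
            for my $sf ( $field->subfields ) {
                my ( $code, $data ) = @$sf;
                # caveat: update() only changes the first subfield with a
                # given code, so repeated subfield codes need more care
                $field->update( $code => normalize($data) );
            }
        }
    }
    return $record;
}

1;
```

Keeping the record-walking logic in its own function leaves the string function doing one thing well, per the Unix point above.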
RE: NACO Normalization and Text::Normalize
From: Brian Cassidy [mailto:[EMAIL PROTECTED]
Subject: RE: NACO Normalization and Text::Normalize

> * normalize()
>
> inputs: either a MARC::Record object or a string. This should probably
> accept an arbitrary number of inputs so, you can do

> * compare()
>
> inputs: either two M::R objects or two strings.
>
> Given two M::R objects, both are normalize()'ed. It would return false
> (or should it be true?) if, based on the rules [1], some field in $a
> matches some field in $b.

You may need some additional parameters, like what tags to normalize,
since you may want to do NACO normalization on fields other than the 1XX.
For example, I currently do NACO normalization on the 1XX, 4XX, 5XX and
7XX in my Authority records. By doing that I can quickly build a hash that
allows me to find the broader, narrower, related and use-for references
for a record in the entire Authority file.

Andy.
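[Editorial note: Andy's cross-reference hash could be built along these lines. A sketch under assumptions: a string-level normalize() is assumed to exist, 'authority.mrc' is a placeholder filename, and in authority records the 1XX is the established heading while the 4XX/5XX carry the see-from and see-also tracings.]

```perl
use strict;
use warnings;
use MARC::Batch;

my %refs;    # normalized tracing => established headings it points to
my $batch = MARC::Batch->new( 'USMARC', 'authority.mrc' );

while ( my $record = $batch->next ) {
    # '1..' is MARC::Record's wildcard tag syntax for any 1XX field
    my ($heading) = $record->field('1..');
    next unless $heading;
    my $established = $heading->as_string;

    # 4XX = use-for (see-from) tracings; 5XX = broader/narrower/related
    for my $tracing ( $record->field('4..'), $record->field('5..') ) {
        push @{ $refs{ normalize( $tracing->as_string ) } }, $established;
    }
}
```

A lookup of a normalized heading in %refs then yields every established heading that references it, across the whole Authority file.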
RE: NACO Normalization and Text::Normalize
Hi Ed,

> I think this is a great idea. At first I was thinking that it would be
> nice to be able to pass your normalize() function a MARC::Record object,
> which would magically normalize all the relevant fields (like a good
> cataloger). This could be a subclass MARC::Record::NACO which adds a new
> method normalize(), or if Andy was willing could be added to the
> MARC::Record core.
>
> However, the docs [1] seem to say that it is only possible to determine
> how a field should normalize in the context of the collection of records
> that it is a part of...and that MARC::Record has no way of determining
> this, so perhaps this idea is not on target?

Okay, I think you're right that subclassing MARC::Record isn't going to
cut the mustard, since MARC::Batch would still not pick it up (thus it
isn't exactly a drop-in replacement, which would be ideal).

> If you would like to contribute your NACO normalization function to cpan
> (as I definitely think you should), and my reading of the lc docs are
> correct, then I would recommend you add a Text::NACO module. The
> Normalize part is a bit redundant because all the modules in Text do
> some kind of normalization. The package could export a function
> normalize() on demand, which you then pass a string, and get back the
> NACO normalized version. You could also add it to the Biblio namespace
> as Biblio::NACO, or MARC::NACO, but that's really your call as the
> module author :) The main thing is to get it up there somewhere.

What I'm now envisioning is a module, still called MARC::Record::NACO,
which is not a subclass, but would export two functions on demand,
normalize() and compare().

---

* normalize()

inputs: either a MARC::Record object or a string. This should probably
accept an arbitrary number of inputs so you can do

    my @normrecs = normalize( @records );

rather than

    my @normrecs;
    foreach my $rec ( @records ) {
        push @normrecs, normalize( $rec );
    }

But you still could if you wanted to.

Given a M::R object it would do as the rules state [1] for the
appropriate fields in the record. Returns a M::R object.

Given a string, it would apply the string normalization rules. Returns a
string.

* compare()

inputs: either two M::R objects or two strings.

Given two M::R objects, both are normalize()'ed. It would return false
(or should it be true?) if, based on the rules [1], some field in $a
matches some field in $b.

Given two strings, both are again normalize()'ed and a simple "cmp" is
performed.

---

It sucks that given different inputs the results returned are a bit
inconsistent. However, there's no way to say that $a > $b for a M::R (is
there? :). One might want to be able to sort normalized strings, so it
makes sense that compare()'ing two strings does a "cmp".

How's that sound?

-Brian Cassidy ( [EMAIL PROTECTED] )

[1] http://lcweb.loc.gov/catdir/pcc/naco/normrule.html

http://www.gordano.com - Messaging for educators.
Re: NACO Normalization and Text::Normalize
Hi Brian: thanks for writing,

On Mon, Aug 25, 2003 at 04:29:37PM -0300, Brian Cassidy wrote:

> As part of a previous project I was importing MARC records into an RDBMS
> structure. In order to facilitate better searching, it was suggested to
> me that I do some normalization on my data and that NACO normalization
> would be a good choice for guidelines. So, away I went and came back
> with a normalize() sub which does the trick.
>
> I now wonder if this code would have greater utility as a module on
> CPAN. And if I do decide to upload it to CPAN, perhaps a base class
> (Text::Normalize) should be created to which NACO normalization could be
> added as a subclass.

I think this is a great idea. At first I was thinking that it would be
nice to be able to pass your normalize() function a MARC::Record object,
which would magically normalize all the relevant fields (like a good
cataloger). This could be a subclass MARC::Record::NACO which adds a new
method normalize(), or if Andy was willing could be added to the
MARC::Record core.

However, the docs [1] seem to say that it is only possible to determine
how a field should normalize in the context of the collection of records
that it is a part of...and that MARC::Record has no way of determining
this, so perhaps this idea is not on target?

If you would like to contribute your NACO normalization function to cpan
(as I definitely think you should), and my reading of the lc docs is
correct, then I would recommend you add a Text::NACO module. The
Normalize part is a bit redundant because all the modules in Text do some
kind of normalization. The package could export a function normalize() on
demand, which you then pass a string, and get back the NACO normalized
version. You could also add it to the Biblio namespace as Biblio::NACO,
or MARC::NACO, but that's really your call as the module author :) The
main thing is to get it up there somewhere.

Please post to the list if you decide to upload. I'd like to add a
section to the tutorial, and to the perl4lib.perl.org website!

//Ed

[1] http://lcweb.loc.gov/catdir/pcc/naco/normrule.html
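[Editorial note: for the string half of such a module, a rough approximation of the NACO rules [1] is shown below. This is a simplification: the real specification distinguishes characters that are deleted from those converted to blank, and covers diacritics, superscripts, and subfield delimiters, none of which this sketch handles.]

```perl
use strict;
use warnings;

# Rough NACO-style string normalization: uppercase, treat punctuation
# as blanks, collapse and trim whitespace.
sub normalize {
    my ($string) = @_;
    my $norm = uc $string;         # fold to uppercase
    $norm =~ s/[^A-Z0-9&]+/ /g;    # non-retained characters become blanks
    $norm =~ s/^ +| +$//g;         # strip leading/trailing blanks
    $norm =~ s/ +/ /g;             # collapse internal runs of blanks
    return $norm;
}

print normalize('Cassidy, Brian.'), "\n";    # CASSIDY BRIAN
```

Under these simplified rules, "Cassidy, Brian." and "cassidy brian" normalize to the same key, which is exactly what makes the hashing and comparison ideas elsewhere in the thread work.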
RE: NACO Normalization and Text::Normalize
> -----Original Message-----
>
> > I now wonder if this code would have greater utility as a module on
> > CPAN.
>
> Yes, please! (You're not BRICAS on cpan.org, are you?)

Yes, I am BRICAS on CPAN...is that a bad thing? :)

> I would recommend putting it in the MARC::* namespace, since it's
> specific to MARC records -- maybe MARC::Transform::NACO or some such.
>
> A class hierarchy rooted at MARC::Transform might be useful, if (for
> example) people wanted to apply arbitrary transformations to a single
> record:
>
>     my @records = ... some MARC::Record objects ... ;
>     my @transforms = (
>         MARC::Transform::Delete9xx->new,
>         MARC::Transform::StripInitialArticles->new,
>         some_other_transforms(),
>     );
>     foreach my $t (@transforms) {
>         $t->transform($_) foreach @records;
>     }

The current behavior is to take a string in, normalize it, then output
it. There isn't necessarily a defined behavior on a MARC record. Also, as
far as "transforms" are concerned, the decode() method in
MARC::File::USMARC can take a filter sub as a second parameter.

So, I'm still not 100% sure it should be a MARC-specific module rather
than a general normalizing module. Perhaps we need to explore exactly how
a transform would interact with a MARC::Record object if we wish to go in
that direction.

-Brian Cassidy ( [EMAIL PROTECTED] )

http://www.gordano.com - Messaging for educators.
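[Editorial note: the filter hook Brian mentions works like this, as I read the MARC::File::USMARC docs: decode()'s optional second argument is a sub called once per field with the tag number and raw field data, returning true to keep that field. The filename and tag choice below are illustrative only.]

```perl
use strict;
use warnings;
use MARC::File::USMARC;

# Tag filter: keep only the 1XX, 4XX, 5XX and 7XX fields -- the tags
# Andy normalizes in his Authority records.
sub filter {
    my ( $tagno, $tagdata ) = @_;
    return ( $tagno >= 100 && $tagno < 200 )
        || ( $tagno >= 400 && $tagno < 600 )
        || ( $tagno >= 700 && $tagno < 800 );
}

local $/ = "\x1d";    # MARC transmission-format record terminator
open my $fh, '<', 'authority.mrc' or die "can't open: $!";
while ( my $raw = <$fh> ) {
    my $record = MARC::File::USMARC->decode( $raw, \&filter );
    # ... work with the trimmed record ...
}
close $fh;
```

So a per-field transform hook already half-exists; what the thread is debating is a per-record one.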
Re: NACO Normalization and Text::Normalize
On Monday, August 25, 2003, at 03:29 PM, Brian Cassidy wrote:

> The basis for this message is to get a feeling whether or not I should
> submit a module that will do NACO normalization
> (http://lcweb.loc.gov/catdir/pcc/naco/normrule.html) to CPAN.

[...]

> So, away I went and came back with normalize() sub which does the
> trick.

Fabulous! (Disclaimer: I'd never heard of NACO normalization before, but
it sounds like it could be useful -- for MARC bib records, too.)

> I now wonder if this code would have greater utility as a module on
> CPAN.

Yes, please! (You're not BRICAS on cpan.org, are you?)

> And if I do decide to upload it to CPAN, perhaps a base class
> (Text::Normalize) should be created to which NACO normalization could
> be added as a subclass.

I would recommend putting it in the MARC::* namespace, since it's
specific to MARC records -- maybe MARC::Transform::NACO or some such.

A class hierarchy rooted at MARC::Transform might be useful, if (for
example) people wanted to apply arbitrary transformations to a single
record:

    my @records = ... some MARC::Record objects ... ;
    my @transforms = (
        MARC::Transform::Delete9xx->new,
        MARC::Transform::StripInitialArticles->new,
        some_other_transforms(),
    );
    foreach my $t (@transforms) {
        $t->transform($_) foreach @records;
    }

Thanks for your hard work.

Paul.

--
Paul Hoffman :: Taubman Medical Library :: Univ. of Michigan
[EMAIL PROTECTED] :: [EMAIL PROTECTED] :: http://www.nkuitse.com/
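[Editorial note: Paul's hierarchy needs only a tiny base class to work. A hypothetical sketch, since none of these modules exist: each subclass implements transform() against a MARC::Record, and Delete9xx uses MARC::Record's real field()/delete_field() methods with the '9..' wildcard tag syntax.]

```perl
package MARC::Transform;    # hypothetical base class
use strict;
use warnings;

sub new { bless {}, shift }
sub transform { die "subclass must implement transform()" }

package MARC::Transform::Delete9xx;    # hypothetical subclass
our @ISA = ('MARC::Transform');

sub transform {
    my ( $self, $record ) = @_;
    # field('9..') returns every 9XX field; delete_field() removes each
    $record->delete_field($_) for $record->field('9..');
    return $record;
}

1;
```

With that in place, Paul's loop over @transforms works as written, and MARC::Transform::NACO would be just one more subclass.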