RE: NACO Normalization and Text::Normalize

Brian Cassidy Wed, 27 Aug 2003 12:19:46 +0000

Hi Ed,

> I think this is a great idea. At first I was thinking that it would be
> nice to be able to pass your normalize() function a MARC::Record object,
> which would magically normalize all the relevant fields (like a good
> cataloger).  This could be a subclass MARC::Record::NACO which adds a
> new method normalize(), or if Andy was willing could be added to the
> MARC::Record core.
>
> However, the docs [1] seem to say that it is only possible to determine
> how a field should normalize in the context of the collection of records
> that it is a part of...and that MARC::Record has no way of determining
> this, so perhaps this idea is not on target?

Okay, I think you're right that subclassing MARC::Record isn't going to
cut the mustard, since MARC::Batch would still not pick it up (thus it
isn't exactly a drop-in replacement, which would be ideal).

> If you would like to contribute your NACO normalization function to cpan
> (as I definitely think you should), and my reading of the lc docs are
> correct, then I would recommend you add a Text::NACO module.  The
> Normalize part is a bit redundant because all the modules in Text do some
> kind of normalization. The package could export a function normalize() on
> demand, which you then pass a string, and get back the NACO normalized
> version. You could also add it to the Biblio namespace as Biblio::NACO, or
> MARC::NACO, but that's really your call as the module author :) The main
> thing is to get it up there somewhere.

What I'm now envisioning is a module, still called MARC::Record::NACO,
which is not a subclass, but would export two functions on demand,
normalize() and compare().

---

* normalize()

inputs: either a MARC::Record object or a string. This should probably
accept an arbitrary number of inputs, so you can do

my @normrecs = normalize( @records );

rather than

my @normrecs;
foreach my $rec ( @records ) {
        push @normrecs, normalize( $rec );
}

But you still could if you wanted to.

Given a M::R object it would do as the rules state [1] for the
appropriate fields in the record. Returns a M::R object.

Given a string, it would apply the string normalization rules. Returns a
string.

* compare()

inputs: either two M::R objects or two strings.

Given two M::R objects, both are normalize()'ed. It would return false
(or should it be true?) if, based on the rules [1], some field in $a
matches some field in $b.

Given two strings, both are again normalize()'ed and a simple "cmp" is
performed.
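
To make the proposal concrete, here is a rough sketch of the string half
of both functions. The string rule in _normalize_string() is only a
placeholder I made up for illustration (uppercase, punctuation to
blanks, blanks collapsed); it is NOT the real rule set from [1]. The
helper names _normalize_string() and _normalize_record() are
hypothetical, and the record case is left as a stub.

```perl
package MARC::Record::NACO;

use strict;
use warnings;
use Scalar::Util qw(blessed);
use Exporter 'import';

# Nothing is exported by default; callers ask for what they want.
our @EXPORT_OK = qw(normalize compare);

# Placeholder string rule (an assumption, not the rules in [1]):
# uppercase, turn anything that is not A-Z, 0-9 or blank into a blank,
# collapse runs of blanks, and trim both ends.
sub _normalize_string {
    my ($string) = @_;
    $string = uc $string;
    $string =~ s/[^A-Z0-9 ]/ /g;
    $string =~ s/\s+/ /g;
    $string =~ s/^ | $//g;
    return $string;
}

# Hypothetical stub: the record case would walk the relevant fields and
# return a new MARC::Record. It is not sketched here.
sub _normalize_record {
    my ($record) = @_;
    die "record normalization is not sketched here\n";
}

# Accepts any number of inputs; each one is dispatched on its type.
sub normalize {
    my @normalized = map {
        blessed($_) && $_->isa('MARC::Record')
            ? _normalize_record($_)
            : _normalize_string($_)
    } @_;
    return wantarray ? @normalized : $normalized[0];
}

# String case only: normalize both sides, then a plain "cmp".
sub compare {
    my ($left, $right) = @_;
    return normalize($left) cmp normalize($right);
}

# Because compare() returns what "cmp" returns, it can drive sort.
my @sorted = sort { compare($a, $b) } ('Smith, John', 'Adams, Jane');

1;
```

Returning the first element in scalar context is just one possible
choice; it keeps "my $norm = normalize($string)" working alongside the
list form.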

---

It sucks that given different inputs the results returned are a bit
inconsistent. However, there's no way to say that $a > $b for a M::R (is
there? :). One might want to be able to sort normalized strings, so it
makes sense that compare()'ing two strings does a "cmp".

How's that sound?

-Brian Cassidy ( [EMAIL PROTECTED] )

[1] http://lcweb.loc.gov/catdir/pcc/naco/normrule.html

