Re: Deleting a subfield using MARC::Record

2006-05-03 Thread Edward Summers


On May 1, 2006, at 4:41 PM, Leif Andersson wrote:


+1

count can possibly be complemented or replaced with occurrence as  
suggested.

It'd be nice to be able to denote last occurrence [-1].
And I suppose the indexing should be based on ordinary perl  
subscript indexing - i.e. governed by the value of special variable $[


$field->delete_subfield( code  => $code,     # of course
                         occur => [0,2,3],   # occur or pos or whatever...
                         match => qr/pat/,   # doesn't need to be repeatable
                       );


I actually like 'pos' better than 'occur' -- but alas

$field->delete_subfield(pos => 2);

won't work because 'pos' is a perl keyword--which is why I like using  
it I suppose :-) How about:


$field->delete_subfield(position => 2);

A bit more wordy I guess, but I still like it better than occur. Nice  
tip on the use of $[ by the way! I also like Tim's suggestion to  
allow 'code' to take multiple values too:


$field->delete_subfield(code => ['a','b','c'])

So if you check out the CVS you should find this implemented. If you  
are interested in adding any tests or documentation let me know and  
I'll add you as a sf.net developer.


The current documentation for the new method reads like this:

--

   delete_subfield() allows you to remove subfields from a field:

   # delete any subfield a in the field
   $field->delete_subfield(code => 'a');

   # delete any subfield a or u in the field
   $field->delete_subfield(code => ['a', 'u']);

   If you want to only delete subfields at a particular position you can
   use the position parameter:

   # delete subfield u at the first position
   $field->delete_subfield(code => 'u', position => 0);

   # delete subfield u at first or second position
   $field->delete_subfield(code => 'u', position => [0,1]);

   You can specify a regex to delete only the subfields that match:

   # delete any subfield u that matches zombo.com
   $field->delete_subfield(code => 'u', match => qr/zombo.com/);

--

Sound ok?

//Ed


Re: Deleting a subfield using MARC::Record

2006-05-03 Thread Edward Summers

On May 3, 2006, at 6:28 AM, Edward Summers wrote:

$field->delete_subfield(pos => 2);

won't work because 'pos' is a perl keyword--


I should've tried it before I said this -- it works fine in that  
context, even though my perl syntax highlighter indicates otherwise.  
So I've changed the parameter name from 'position' to 'pos' keeping  
with Leif's original suggestion.


//Ed


Re: Deleting a subfield using MARC::Record

2006-05-03 Thread Michael Kreyche

Edward Summers wrote:


The current documentation for the new method reads like this:

--

   delete_subfield() allows you to remove subfields from a field:

   # delete any subfield a in the field
   $field->delete_subfield(code => 'a');

   # delete any subfield a or u in the field
   $field->delete_subfield(code => ['a', 'u']);

   If you want to only delete subfields at a particular position you can
   use the position parameter:

   # delete subfield u at the first position
   $field->delete_subfield(code => 'u', position => 0);

   # delete subfield u at first or second position
   $field->delete_subfield(code => 'u', position => [0,1]);


If you implemented negative indexes, it would be nice to add an example:

# delete subfield u at last position
$field->delete_subfield(code => 'u', pos => [-1]);


   You can specify a regex to delete only the subfields that match:

   # delete any subfield u that matches zombo.com
   $field->delete_subfield(code => 'u', match => qr/zombo.com/);


The term position (pos) seems a little ambiguous to me on the face 
of it. Does (code => 'u', pos => 0) mean the first subfield u (which 
is what I take it to mean) or subfield u if it's the first subfield 
(which it might sound like outside the context of this discussion)?


Mike
--
Michael Kreyche
Systems Librarian
Associate Professor
Kent State University Libraries and Media Services
http://www.personal.kent.edu/~mkreyche
330-672-1918



Re: Deleting a subfield using MARC::Record

2006-05-03 Thread Mark Jordan

Brad Baxter wrote:

On 5/3/06, Michael Kreyche [EMAIL PROTECTED] wrote:


The term position (pos) seems a little ambiguous to me on the face
of it. Does (code => 'u', pos => 0) mean the first subfield u (which
is what I take it to mean) or subfield u if it's the first subfield
(which it might sound like outside the context of this discussion)?


I had the same thought.  To me, 'occur' has a clearer meaning in
that context, the zeroth occurrence of subfield 'u', while 'pos' has
more of the ambiguity described above.  So of the two, I'd prefer
'occur', but I can live with 'pos'.  Synonyms perhaps?  (Unless someone
has a need to delete the third subfield regardless of code?  I never
have, so perhaps not.)

--
Brad


I think it should mean the zeroth occurrence of subfield 'u', since 
specifying which of a repeated group of subfields is a realistic task, 
as you say. For example, each record has two 'u's but all of the first 
ones are garbage.


Mark

--
Mark Jordan
Head of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Phone (604) 291 5753 / Fax (604) 291 3023
[EMAIL PROTECTED] / http://www.sfu.ca/~mjordan/


Re: Deleting a subfield using MARC::Record

2006-05-03 Thread Edward Summers


On May 3, 2006, at 8:55 AM, Mark Jordan wrote:
I think it should mean the zeroth occurrence of subfield 'u',  
since specifying which of a repeated group of subfields is a  
realistic task, as you say. For example, each record has two 'u's  
but all of the first ones are garbage.


Actually 'pos' as implemented will remove the subfield u if it is at  
position n in the field. So we could have occurrence too. I feel like  
I'm chasing windmills a bit. Do y'all really *need* all this  
functionality in delete_subfield() :-) I guess you do or else you  
wouldn't be so interested in asking for it.


I didn't implement the -1 behavior because I wasn't quite sure how to  
do it quickly, and it seemed like too much somehow.
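
One way the -1 case could be handled is to convert a negative occurrence number to a zero-based one before any deleting happens. This is only a sketch of that idea, not what is in CVS; normalize_occurrence is a hypothetical helper and the field layout is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: map an occurrence number that may be negative onto a
# zero-based index, given how many subfields carry the code in question.
# Returns undef when the request falls outside the field.
sub normalize_occurrence {
    my ($n, $total) = @_;
    $n += $total if $n < 0;    # -1 becomes the last occurrence
    return ($n >= 0 && $n < $total) ? $n : undef;
}

my @codes   = qw(3 u u u);                  # made-up list of subfield codes
my $total_u = grep { $_ eq 'u' } @codes;    # count of subfield u

my $last  = normalize_occurrence(-1, $total_u);
my $first = normalize_occurrence(0,  $total_u);
my $none  = normalize_occurrence(-4, $total_u);

print "last u is occurrence $last, first u is occurrence $first\n";
print "occurrence -4 is out of range\n" unless defined $none;
```

With the index normalized up front, the rest of the deletion logic would only ever need to deal with non-negative occurrence numbers.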


//Ed


Re: Deleting a subfield using MARC::Record

2006-05-03 Thread Michael Kreyche

Edward Summers wrote:


On May 3, 2006, at 8:55 AM, Mark Jordan wrote:
I think it should mean the zeroth occurrence of subfield 'u', since 
specifying which of a repeated group of subfields is a realistic task, 
as you say. For example, each record has two 'u's but all of the first 
ones are garbage.


Actually 'pos' as implemented will remove the subfield u if it is at 
position n in the field. So we could have occurrence too. I feel like 
I'm chasing windmills a bit. Do y'all really *need* all this 
functionality in delete_subfield() :-) I guess you do or else you 
wouldn't be so interested in asking for it.


Well, maybe this IS getting a little out of hand! I could live with the 
old-fashioned way myself. Being a newbie to the list I was surprised how 
fast you jumped in and provided the new functionality.


Mike
--
Michael Kreyche
Systems Librarian
Associate Professor
Kent State University Libraries and Media Services
http://www.personal.kent.edu/~mkreyche
330-672-1918



Re: Deleting a subfield using MARC::Record

2006-05-03 Thread Mark Jordan
Ed, the only problem I can see with position in the field is if a 
preceding subfield does not exist in every record. For example, in a 
given batch, most but not all records have an 856 subfield 3, followed 
by multiple subfield u's. If you ask to delete the first u using pos, 
then your target will differ depending on whether subfield 3 is 
present. If you know that you want to eliminate u's (without regard 
to what else is in the field) then your target would be easier to hit.


However, you raise a good point -- how much functionality do people 
need? Maybe some actual examples from the wild would be useful. I can 
supply some but probably not until tomorrow afternoon since I have a 
presentation to prepare for tomorrow. If other users have some examples 
of real records or use cases they might clarify the most common usage. 
I'll see what I can find tomorrow.


Mark

Edward Summers wrote:


On May 3, 2006, at 8:55 AM, Mark Jordan wrote:
I think it should mean the zeroth occurrence of subfield 'u', since 
specifying which of a repeated group of subfields is a realistic task, 
as you say. For example, each record has two 'u's but all of the first 
ones are garbage.


Actually 'pos' as implemented will remove the subfield u if it is at 
position n in the field. So we could have occurrence too. I feel like 
I'm chasing windmills a bit. Do y'all really *need* all this 
functionality in delete_subfield() :-) I guess you do or else you 
wouldn't be so interested in asking for it.


I didn't implement the -1 behavior because I wasn't quite sure how to do 
it quickly, and it seemed like too much somehow.


//Ed


--
Mark Jordan
Head of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Phone (604) 291 5753 / Fax (604) 291 3023
[EMAIL PROTECTED] / http://www.sfu.ca/~mjordan/


Re: Deleting a subfield using MARC::Record

2006-05-03 Thread Edward Summers


On May 3, 2006, at 11:25 AM, Mark Jordan wrote:

For example, in a given batch, most but not all records have an 856  
subfield 3, followed by multiple subfield u's. If you ask to delete  
the first u using pos, then your target will differ depending on  
whether subfield 3 is present. If you know that you want to  
eliminate u's (without regard to what else is in the field) then  
your target would be easier to hit.


Ok this I like, having a use case like this makes it much easier to  
decide about the API. How about we go back to occurrence and remove pos?
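
The difference between the two readings can be sketched on plain lists of [code, value] pairs. The helper names and the URLs are made up for illustration, and none of this is the MARC::Field implementation.

```perl
use strict;
use warnings;

# Two made-up 856 fields: one with a leading subfield 3, one without.
my @with_3 = (
    ['3', 'Table of contents'],
    ['u', 'http://example.org/one'],
    ['u', 'http://example.org/two'],
);
my @without_3 = (
    ['u', 'http://example.org/one'],
    ['u', 'http://example.org/two'],
);

# "Position" reading: index into the whole field, whatever the codes are.
sub index_by_position {
    my ($subfields, $code, $pos) = @_;
    return ($pos < @$subfields && $subfields->[$pos][0] eq $code) ? $pos : undef;
}

# "Occurrence" reading: index of the nth subfield that carries this code.
sub index_by_occurrence {
    my ($subfields, $code, $n) = @_;
    my $seen = 0;
    for my $i (0 .. $#$subfields) {
        next unless $subfields->[$i][0] eq $code;
        return $i if $seen++ == $n;
    }
    return undef;
}

for my $rec (\@with_3, \@without_3) {
    my $p = index_by_position($rec, 'u', 1);
    my $o = index_by_occurrence($rec, 'u', 0);
    print "position 1 -> $rec->[$p][1]; occurrence 0 -> $rec->[$o][1]\n";
}
```

Position 1 lands on the first u when subfield 3 is present and on the second u when it is not, while occurrence 0 picks the same u in both records.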




However, you raise a good point -- how much functionality do people  
need? Maybe some actual examples from the wild would be useful. I  
can supply some but probably not until tomorrow afternoon since I  
have a presentation to prepare for tomorrow. If other users have  
some examples of real records or use cases they might clarify the  
most common usage. I'll see what I can find tomorrow.


Yeah, you know if you have the interest/time it would be great if you  
could add a couple tests to the existing test file. The tests need  
not pass, but they should illustrate the use.  
Hop onto #code4lib and I can walk you through how to do this, and get  
your access set up if you are interested.


//Ed


Re: Deleting a subfield using MARC::Record

2006-05-02 Thread Brad Baxter

On 5/1/06, Edward Summers [EMAIL PROTECTED] wrote:


On May 1, 2006, at 1:24 PM, Brad Baxter wrote:
 # delete first two subfield u
 $field->delete_subfield(code => 'u', count => 2);

 I don't think I like it this way.  How would you delete just the
 second one?
 I'd rather see 'count' mean 'occurrence', so the above would mean
 delete the second subfield u.

Yeah, I like your suggestion of using 'occurrence' over 'count' here.

 And ...

 # delete second and third subfield u
 $field->delete_subfield(code => 'u', count => 2, count => 3);

This won't translate very well to passing in arguments as a hash,
since the second count will stomp on the first. Have you actually had
any real need to delete this way in the past?


Of course.  I was thinking too shallowly.  The suggestions for
occur => [1,2] would do the trick.
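
The stomping is easy to see in isolation. In this small sketch read_options is a hypothetical stand-in for any method that copies its arguments into a hash; it is not part of MARC::Field.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that reads its arguments as a hash.
sub read_options {
    my %opt = @_;
    return \%opt;
}

# The same key given twice: the later value replaces the earlier one.
my $stomped = read_options(code => 'u', count => 2, count => 3);

# An array reference keeps every value under a single key.
my $listed = read_options(code => 'u', occur => [1, 2]);

print "count: $stomped->{count}\n";
print "occur: @{ $listed->{occur} }\n";
```

Only the 3 survives under count, whereas both values remain available under occur.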

-- Brad


Re: Deleting a subfield using MARC::Record

2006-05-02 Thread Ben Soares
++

sorry, couldn't resist


Re: Deleting a subfield using MARC::Record

2006-05-02 Thread Timothy Prettyman

+1

I like Leif's proposal.  It also might be useful to allow code to  
accept multiple values.


-Tim

Timothy Prettyman
LIT/LIbrary Systems
University of Michigan

On May 1, 2006, at 5:41 PM, Leif Andersson wrote:


+1

count can possibly be complemented or replaced with occurrence as  
suggested.

It'd be nice to be able to denote last occurrence [-1].
And I suppose the indexing should be based on ordinary perl  
subscript indexing - i.e. governed by the value of special variable $[


$field->delete_subfield( code  => $code,     # of course
                         occur => [0,2,3],   # occur or pos or whatever...
                         match => qr/pat/,   # doesn't need to be repeatable
                       );

Leif
==
Leif Andersson, Systems Librarian
Stockholm University Library
SE-106 91 Stockholm
SWEDEN
Phone : +46 8 162769
Mobile: +46 70 6904281




Re: Deleting a subfield using MARC::Record

2006-05-01 Thread Mark Jordan

Edward Summers wrote:


On Apr 29, 2006, at 10:31 AM, Mark Jordan wrote:
 Maybe other people should verify the usefulness of a delete subfield 
function before anyone does anything about it, though. Would a half 
dozen +1 votes from perl4libers validate its usefulness?


Yes it would...but to get the changes out on CPAN we'd all need to 
convince Mike O'Regan who holds the keys to MARC-Record on CPAN. Right 
now the version on CPAN is not the latest version that's available in 
CVS on SourceForge. I think Mike has some performance concerns about the 
Unicode handling code in v2.0, since MARC::Record is used in some of his 
time critical applications. Hopefully we can benchmark v1.38 and v2.0 
sometime and get the latest code pushed out to CPAN if things look OK. 
Or if this is not an option there is always the possibility of creating 
a MARC::Record2 on CPAN. That would be kind of ugly for backwards 
compatibility though.


OK -- here's the call for a vote. All interested perl4lib members are 
encouraged to participate by emailing the list.


Proposal: Incorporate the functions for deleting specific subfields into 
MARC-Record v2.0, based on examples Ed supplies below.


Process: +1 = Yes for Proposal, anything else = No and must be 
accompanied by reasons not to incorporate these features into v2.0.


Deadline: Midnight (Greenwich/UTC), Thursday May 4.


Mark



At any rate if you are open to using the latest/greatest code from 
SourceForge there is nothing stopping us from getting a delete_subfield 
method working.



$field->delete_subfield('a', 0);


In the interests of flexibility how about using hash key/value pairs? 
Here's what I'm thinking:


# delete all subfield u
$field->delete_subfield(code => 'u');

# delete first two subfield u
$field->delete_subfield(code => 'u', count => 2);

# delete all subfield u that have 'zombo.com' in them
$field->delete_subfield(code => 'u', match => qr/zombo\.com/);

# delete only the first subfield u that has 'zombo.com' in it
$field->delete_subfield(code => 'u', match => qr/zombo\.com/, count => 1);

# delete any subfield with a value that matches 'zombo.com'
$field->delete_subfield(match => qr/zombo\.com/);

So effectively there is AND boolean logic between any options that are 
supplied. If this looks good I've got some code that does it and some 
tests committed in SourceForge for you to take a look at [1]. I'm open 
to suggestions on renaming parameters, etc... Here are the basics for 
checking out the code if you haven't done it before.


 cvs -d:pserver:[EMAIL PROTECTED]:/cvsroot/marcpm login

 cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/marcpm co -P marc-record


//Ed

[1] http://sourceforge.net/cvs/?group_id=1254




RE: Deleting a subfield using MARC::Record

2006-05-01 Thread Bryan Baldus
OK -- here's the call for a vote. All interested perl4lib members are 
encouraged to participate by emailing the list.

+1

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 


Re: Deleting a subfield using MARC::Record

2006-05-01 Thread Edward Summers

+1

:-)

//Ed


Re: Deleting a subfield using MARC::Record

2006-05-01 Thread Gary Bertchume

+1

//Gary

--
===
Gary Bertchume
Library Systems Analyst
Columbia University Libraries
212 854-8582
[EMAIL PROTECTED]
===


Re: Deleting a subfield using MARC::Record

2006-05-01 Thread Brad Baxter

On 5/1/06, Bryan Baldus [EMAIL PROTECTED] wrote:

On Monday, May 01, 2006 1:24 PM, Brad Baxter wrote:
On 4/30/06, Edward Summers [EMAIL PROTECTED] wrote:
[snip]
 # delete first two subfield u
 $field->delete_subfield(code => 'u', count => 2);

I don't think I like it this way.  How would you delete just
the second one?
I'd rather see 'count' mean 'occurrence', so the above would mean
delete the second subfield u.  And ...

# delete second and third subfield u
$field->delete_subfield(code => 'u', count => 2, count => 3);


When I looked at it, I also wondered about 'count' being 'position' or
something like that, to be able to note which occurrence. It might be useful
to have both--if you know it is always the 1st 2 occurrences of the
subfields, use 'count', but if you know it is the 1st and 3rd, then use
'position' or 'occurrence'.

examples:
#remove 1st 2 subfield 'u'
$field->delete_subfield(code => 'u', count => 2);

#remove 1st and 3rd subfield 'u'
$field->delete_subfield(code => 'u', occurence => (0, 2)); #or (1, 3)

#remove last subfield u
$field->delete_subfield(code => 'u', occurence => (-1));



I guess those work for me, as long as it's 'occurrence'.  :-)
Or perhaps just 'occur'.
(If we allow -1, then I agree it should start at 0.)

--
Brad


Re: Deleting a subfield using MARC::Record

2006-05-01 Thread Edward Summers


On May 1, 2006, at 1:24 PM, Brad Baxter wrote:

# delete first two subfield u
$field->delete_subfield(code => 'u', count => 2);


I don't think I like it this way.  How would you delete just the  
second one?

I'd rather see 'count' mean 'occurrence', so the above would mean
delete the second subfield u.


Yeah, I like your suggestion of using 'occurrence' over 'count' here.


And ...

# delete second and third subfield u
$field->delete_subfield(code => 'u', count => 2, count => 3);


This won't translate very well to passing in arguments as a hash,  
since the second count will stomp on the first. Have you actually had  
any real need to delete this way in the past?


//Ed


Re: Deleting a subfield using MARC::Record

2006-04-30 Thread Edward Summers


On Apr 29, 2006, at 10:31 AM, Mark Jordan wrote:
 Maybe other people should verify the usefulness of a delete  
subfield function before anyone does anything about it, though.  
Would a half dozen +1 votes from perl4libers validate its usefulness?


Yes it would...but to get the changes out on CPAN we'd all need to  
convince Mike O'Regan who holds the keys to MARC-Record on CPAN.  
Right now the version on CPAN is not the latest version that's  
available in CVS on SourceForge. I think Mike has some performance  
concerns about the Unicode handling code in v2.0, since MARC::Record  
is used in some of his time critical applications. Hopefully we can  
benchmark v1.38 and v2.0 sometime and get the latest code pushed out  
to CPAN if things look OK. Or if this is not an option there is  
always the possibility of creating a MARC::Record2 on CPAN. That  
would be kind of ugly for backwards compatibility though.


At any rate if you are open to using the latest/greatest code from  
SourceForge there is nothing stopping us from getting a  
delete_subfield method working.



$field->delete_subfield('a', 0);


In the interests of flexibility how about using hash key/value pairs?  
Here's what I'm thinking:


# delete all subfield u
$field->delete_subfield(code => 'u');

# delete first two subfield u
$field->delete_subfield(code => 'u', count => 2);

# delete all subfield u that have 'zombo.com' in them
$field->delete_subfield(code => 'u', match => qr/zombo\.com/);

# delete only the first subfield u that has 'zombo.com' in it
$field->delete_subfield(code => 'u', match => qr/zombo\.com/, count => 1);

# delete any subfield with a value that matches 'zombo.com'
$field->delete_subfield(match => qr/zombo\.com/);

So effectively there is AND boolean logic between any options that  
are supplied. If this looks good I've got some code that does it and  
some tests committed in SourceForge for you to take a look at [1].  
I'm open to suggestions on renaming parameters, etc... Here are the  
basics for checking out the code if you haven't done it before.


 cvs -d:pserver:[EMAIL PROTECTED]:/cvsroot/marcpm login

 cvs -z3 -d:pserver:[EMAIL PROTECTED]:/cvsroot/marcpm co -P marc-record


//Ed

[1] http://sourceforge.net/cvs/?group_id=1254


Re: Deleting a subfield using MARC::Record

2006-04-29 Thread Mark Jordan

Edward Summers wrote:

Deleting subfields is a bit tricky since subfields may 
repeat, and sometimes people just want to delete one of them. An 
unfortunate state of affairs perhaps. 


Yeah, I can see what you're saying, but doesn't that also apply to 
repeatable fields? If a particular subfield that is one of a repeated 
set needed to be deleted, it could be identified by a regex or by its 
order in an array (following object syntax probably not correct):


@subfields = $subject->subfields();
foreach $notwanted (@subfields) {
  if ($notwanted =~ /badsubject/) {
    $notwanted->delete_subfield();
  }
}

Mark

--
Mark Jordan
Head of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Phone (604) 291 5753 / Fax (604) 291 3023
[EMAIL PROTECTED] / http://www.sfu.ca/~mjordan/


Re: Deleting a subfield using MARC::Record

2006-04-29 Thread Edward Summers

On Apr 29, 2006, at 1:08 AM, Mark Jordan wrote:

Edward Summers wrote:

Deleting subfields is a bit tricky since subfields may repeat, and  
sometimes people just want to delete one of them. An unfortunate  
state of affairs perhaps.


Yeah, I can see what you're saying, but doesn't that also apply to  
repeatable fields?


Well yeah it does, and you're right, there is a  
MARC::Record::delete_field, isn't there? Would having something  
similar to that in MARC::Field be useful to you?

If a particular subfield that is one of a repeated set needed to be  
deleted, it could be identified by a regex or by its order in an  
array (following object syntax probably not correct):


@subfields = $subject->subfields();
foreach $notwanted (@subfields) {
  if ($notwanted =~ /badsubject/) {
    $notwanted->delete_subfield();
  }
}


That could work if subfields were objects, but they're just strings.  
It could simply delete all of them unless a second parameter is  
passed in, which would basically act like a filter:


$field->delete_subfield('a', qr/badsubject/);

An alternative of course is to play fast and loose with the Object  
model and twiddle with $field->_subfields ... this is an array that  
looks like:


['a', 'foo', 'b', 'bar']

Of course this opens you up to future failure if the internals of  
MARC::Field change at some point in the future. Which may not be all  
that likely :-)
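
To make the flat code/value layout concrete, here is a minimal sketch that filters such a list without touching MARC::Field at all. drop_subfields is a hypothetical helper, and the sample data just extends the array shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::Field): walk a flat list of
# code, value, code, value ... and drop every pair whose code equals $code
# and, when a regex is supplied, whose value also matches it.
sub drop_subfields {
    my ($pairs, $code, $match) = @_;
    my @kept;
    for (my $i = 0; $i < @$pairs; $i += 2) {
        my ($c, $v) = @{$pairs}[$i, $i + 1];
        my $hit = $c eq $code && (!defined $match || $v =~ $match);
        push @kept, $c, $v unless $hit;
    }
    return \@kept;
}

my $flat = ['a', 'foo', 'b', 'bar', 'a', 'badsubject'];

# No regex: every subfield a goes.
my $all_a_gone = drop_subfields($flat, 'a');

# With a regex acting as a filter: only the matching subfield a goes.
my $only_bad = drop_subfields($flat, 'a', qr/badsubject/);

print "@$all_a_gone\n";
print "@$only_bad\n";
```

The helper returns a new list rather than editing the original in place, so nothing here depends on the module's internals staying the same.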


//Ed


Re: Deleting a subfield using MARC::Record

2006-04-29 Thread Michael Kreyche

Edward Summers wrote:

That could work if subfields were objects, but they're just strings. It 
could simply delete all of them unless a second parameter is passed in, 
which would basically act like a filter:


$field->delete_subfield('a', qr/badsubject/);


That sounds pretty good, though I'm working with a database where I 
found a 6xx field with two identical occurrences of subfield 2. So what 
you propose would, I suppose, delete both of them and I'd have to put 
one back in. Add another parameter to specify which occurrence?


Mike


Re: Deleting a subfield using MARC::Record

2006-04-29 Thread Mark Jordan

Edward Summers wrote:

On Apr 29, 2006, at 1:08 AM, Mark Jordan wrote:

Edward Summers wrote:

Deleting subfields is a bit tricky since subfields may repeat, and 
sometimes people just want to delete one of them. An unfortunate 
state of affairs perhaps.


Yeah, I can see what you're saying, but doesn't that also apply to 
repeatable fields?


Well yeah it does, and you're right, there is a 
MARC::Record::delete_field, isn't there? Would having something similar to 
that in MARC::Field be useful to you?


Strangely enough, I've encountered two separate situations in the last 
two weeks (both on Friday afternoons... I think I've fallen into some 
kind of recursive metadata loop) where I wanted to delete just a 
specific subfield. Maybe other people should verify the usefulness of a 
delete subfield function before anyone does anything about it, though. 
Would a half dozen +1 votes from perl4libers validate its usefulness?




If a particular subfield that is one of a repeated set needed to be 
deleted, it could be identified by a regex or by its order in an array 
(following object syntax probably not correct):


@subfields = $subject->subfields();
foreach $notwanted (@subfields) {
  if ($notwanted =~ /badsubject/) {
    $notwanted->delete_subfield();
  }
}


That could work if subfields were objects, but they're just strings. It 
could simply delete all of them unless a second parameter is passed in, 
which would basically act like a filter:


$field->delete_subfield('a', qr/badsubject/);


Yeah, I knew I had the syntax wrong. The revised one you supply looks 
intuitive. Could the second parameter also accommodate the order (in an 
array) of the target subfield if the subfield repeated, for when you 
know your records are consistent enough to use it (such as when you get 
vendor records that have two 856 u subfields but you only want the 
second one)? For example:


$field->delete_subfield('a', 0);

Mark




Re: Deleting a subfield using MARC::Record

2006-04-28 Thread Edward Summers


On Apr 28, 2006, at 8:20 PM, Michael Kreyche wrote:

my $new856f = MARC::Field->new('856',$i1,$i2,@new856s);
$field->replace_with($new856f);

If there's an easier way, I'd like to know!


Creating a new field and replacing the old one with the new one is  
the way to go. Deleting subfields is a bit tricky since subfields may  
repeat, and sometimes people just want to delete one of them. An  
unfortunate state of affairs perhaps. If someone can suggest an  
easier API for doing this I'm open to suggestions :-)


//Ed