Hi Ed,
Yes, I meant that the drawback is in modifying a CPAN module locally.
Actually, I don't know if there are any undesirable side effects.
None that I know of - I have myself used this technique for almost three years
now.
The idea is that the MARC::Record object per se should be just binary.
The efforts made in the leap from 1.38 to 2.0.0 to treat this blob as an
(always well formed!) utf8 string were a mistake in my eyes.
It has resulted in at least two common problems.
1. when writing records: the leader length / corrupted utf8 problem I responded
to in my post.
2. when reading bad utf8 records: special care has to be taken so that your
whole application does not just die at that record
Almost all postings to this forum since 2.0.0 have been concerned with one of
these problems.
(exaggerating a little, but not much)
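A minimal sketch of what lies behind problem 1, using only core Perl and the
Encode module rather than the MARC modules themselves. The sample value and
the variable names are made up for illustration. It shows that length()
counts characters on a decoded string, while a MARC leader and directory
need byte counts:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical field value: "caf" plus e-acute as raw UTF-8 bytes (0xC3 0xA9).
my $raw = "caf\xC3\xA9";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $raw);

# On a decoded string, length() counts characters.
my $char_len = length($chars);

# Encoding back to UTF-8 gives the number of bytes that end up in the file.
my $byte_len = length(encode('UTF-8', $chars));

# Inside the lexical scope of "use bytes", length() counts the bytes of the
# string's internal representation instead of its characters.
my $pragma_len = do { use bytes; length($chars) };

print "characters: $char_len, bytes: $byte_len, under use bytes: $pragma_len\n";
```

Whenever the character count and the byte count differ, a record length
computed from characters comes out too short.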
Putting in "use bytes" is a shortcut. The alternative is rewriting a whole
bunch of code, which probably is aesthetically more pleasing.
But it is obviously much more work...
And by the way, the second problem can be dealt with by changing
marc_to_utf8 to this:
sub MARC::File::Encode::marc_to_utf8 {
    # do NOT check if UTF-8 is valid!
    return Encode::decode( 'UTF-8', $_[0], 0 );
}
Yes, that is also a hack!
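To show what the third argument changes, here is a small sketch that uses
only the Encode module. The sample bytes are made up, and I am assuming here
that the stock marc_to_utf8 passes a strict check value, which is what makes
an application die on a bad record:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical bad input: "abc" followed by a lone 0xE9 byte
# (Latin-1 e-acute), which is not valid UTF-8.
my $bad = "abc\xE9";

# check = 0: the malformed byte becomes U+FFFD and decoding carries on.
my $lenient = Encode::decode('UTF-8', $bad, 0);

# check = FB_CROAK: decoding dies at the malformed byte.
# A copy is passed because a non-zero check may modify its source argument.
my $copy   = $bad;
my $strict = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()) };
my $died   = defined $strict ? 0 : 1;

printf "lenient length: %d, last character: U+%04X, strict died: %s\n",
    length($lenient), ord(substr($lenient, -1)), $died ? 'yes' : 'no';
```

The lenient call trades an exception for a replacement character, so the bad
data is still bad, but the rest of the batch gets processed.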
To sum up.
I think it is a good idea to make the MARC blob a binary object, so to speak.
I don't know whether you should just apply my simple hacks to the CPAN code,
or whether a thorough rewrite of some parts of the modules is called for.
Those changes may involve some changes in coding styles in the scripts that use
MARC::Record.
But probably all you have to do is to remove all that strange code you put in
there as workarounds to the character "bugs".
And yes, I have been using MARC::Charset in combination with this technique,
without any problems that I can recall. :-)
/Leif
________________________________________
From: [email protected] [[email protected]] on behalf of Ed Summers
[[email protected]]
Sent: 12 October 2010 17:13
To: [email protected]
Subject: Re: MARC-perl: different versions yield different results
Hi Leif,
Is the downside to this approach that you are modifying a CPAN module
in place, or is it something to do with the behavior of 'use bytes'?
Would there be any undesirable side effects to adding 'use bytes' to
MARC::File::USMARC::encode on CPAN?
//Ed
On Tue, Oct 12, 2010 at 7:58 AM, Leif Andersson
<[email protected]> wrote:
> Myself I have changed one of the modules.
>
> MARC::File::USMARC
> It has a function called encode() around line 315
> I have added a "use bytes;" just before the final return. Like this:
>
> use bytes;
> return join("",$marc->leader, @$directory, END_OF_FIELD, @$fields,
> END_OF_RECORD);
>
> Changing code directly like this is a total "no-no" to many programmers.
> If you feel uncomfortable with this, there are other ways of doing the
> same thing.
> You could write a package:
>
> package MARC_Record_hack;
> use MARC::File::USMARC;
> no warnings 'redefine';
> sub MARC::File::USMARC::encode {
> my $marc = shift;
> $marc = shift if (ref($marc)||$marc) =~ /^MARC::File/;
> my ($fields,$directory,$reclen,$baseaddress) =
> MARC::File::USMARC::_build_tag_directory($marc);
> $marc->set_leader_lengths( $reclen, $baseaddress );
> # Glomp it all together
> use bytes;
> return join("",$marc->leader, @$directory, "\x1E", @$fields, "\x1D");
> }
> use warnings;
> 1;
> __END__