> Well, the problem is when the original Marc4J author took the spec at it's
> word, and actually _acted upon_ the '4' and the '5', changing file semantics
> if they were different, and throwing an exception if it was a non-digit.

At least the author actually used the values rather than checking to see if a 4 or a 5 were there. I still don't see what the point of looking for a 0 in an undefined field would be. I'm wondering what kind of nut job would write this into the standard, but that's not the author's problem.
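For anyone following along without the spec open: the bytes in question are the ISO 2709 "entry map" (leader positions 20-23), which give the widths of the pieces of each directory entry. A spec-literal parser reads those digits and uses them. Here's a minimal Python sketch (mine, not Marc4J's actual code) of what "acting upon the 4 and the 5" looks like:

```python
def parse_directory(record: bytes):
    """Read an ISO 2709 directory using the entry-map digits from the
    leader instead of assuming the MARC21-fixed values '4', '5', '0'.
    A sketch only: a real parser also has to deal with encodings, bad
    lengths, and the data portion of the record."""
    leader = record[:24].decode("ascii")
    len_of_length = int(leader[20])  # MARC21 fixes this at 4
    len_of_start = int(leader[21])   # MARC21 fixes this at 5
    len_of_impl = int(leader[22])    # MARC21 fixes this at 0
    entry_len = 3 + len_of_length + len_of_start + len_of_impl
    base_address = int(leader[12:17])
    # The directory runs from byte 24 up to the field terminator
    # that precedes the data portion.
    directory = record[24:base_address - 1].decode("ascii")
    entries = []
    for i in range(0, len(directory), entry_len):
        entry = directory[i:i + entry_len]
        tag = entry[:3]
        field_len = int(entry[3:3 + len_of_length])
        start = int(entry[3 + len_of_length:3 + len_of_length + len_of_start])
        entries.append((tag, field_len, start))
    return entries
```

The "checking to see if a 4 or 5 were there" alternative would be to just test `leader[20:24] == "4500"` up front and reject everything else, never using the values at all.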
> Do you think he got it wrong? How was he supposed to know he got it wrong,
> he wrote to the spec and took it at it's word. Are you SURE there aren't any
> Marc formats other than Marc21 out there that actually do use these bytes
> with their intended meaning, instead of fixing them?

I wouldn't call it wrong -- the spec is a logical point of departure. MARC21 derives from an ISO standard that does not use those character positions and which otherwise requires the same data layout, but the author wouldn't necessarily know that. Standards have something in common with laws: how they are used in the real world is as important as, or more important than, what is actually defined -- what's written and what's done in practice can be very different. Everyone here who has parsed catalog data or done an ILS migration knows better than to assume, even for a second, that fields are used as defined, except for the very basic stuff.

> How was the Marc4J author supposed to be sure of that, or even guess it
> might be the case, and know he'd be serving users better by ignoring the
> spec here instead of following it?

There might not have been a good way to know. With data, one thing you always want to do is ask a bunch of people who work with it all the time about anomalies in the wild. Many great works of fiction masquerade as documents that supposedly describe reality.

> Ie: I _thought_ I was writing only for Marc21, but then it turns out I've
> got to accept records from Outer Weirdistan that are a kind of legal Marc
> that actually uses those bytes for their intended meaning....

Any such MARC would be noncompliant with the ISO standard from which MARC21 hails. If you're working from the MARC21 standard and weird records are in question, there's a greater chance of choking on nonnumeric tags, as those are allowed by the ISO standard.
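To make the nonnumeric-tag point concrete: as I read the standards, ISO 2709 allows letters as well as digits in the three-character tag, while MARC21 records in the wild use three decimal digits. So a MARC21-only check will reject tags that are perfectly legal ISO 2709. A quick illustrative sketch (the names are mine, not from Marc4J or any library):

```python
import re

# ISO 2709 tags: three characters drawn from digits and letters.
# MARC21 tags in practice: three decimal digits.
ISO2709_TAG = re.compile(r"[0-9A-Za-z]{3}")
MARC21_TAG = re.compile(r"[0-9]{3}")

def tag_ok(tag: str, strict_marc21: bool = False) -> bool:
    """Return True if `tag` is acceptable. A strict MARC21 parser
    chokes on the alphabetic tags that plain ISO 2709 permits."""
    pattern = MARC21_TAG if strict_marc21 else ISO2709_TAG
    return pattern.fullmatch(tag) is not None
```

So `tag_ok("245")` passes either way, while something like `tag_ok("A01", strict_marc21=True)` is rejected even though the lenient ISO 2709 check accepts it.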
Ignoring that MARC21 would need to be redefined to take on other values, one can safely conclude that such a redefinition could only be written by totally deranged individuals. Values lower than 4 and 5 respectively would limit record length to the point that little or no data could be stored, and greater values would be completely nonsensical: the MARC record length limitation means the extra space allocated by the digits could only ever contain zeros. In any case, MARC is a legacy standard from the 1960s. The chances of new flavors emerging are dismal at best.

> Again, I realize in the actual environment we've got, this is not a luxury
> we have. But it's a fault, not a benefit, to have lots of software
> everywhere behaving in non-compliant ways and creating invalid (according to
> the spec!) data.

Creating is another matter entirely. Since we can control what we create ourselves, we make things a little better every time we make them conformant. However, we can't control what others do, and being able to read everything is useful, including stuff created using tools/processes that aren't up to scratch.

kyle