Dan, thanks for the explanation, and also for the good news that Evergreen 2.0 should handle these characters without problems :-)!

Deanna, I don't think the USMARC or SUTRS syntax is the problem, as other bib records in USMARC format have been processed correctly.

BTW, we also ran into problems with these characters in the older versions of Evergreen we experimented with earlier. One example (which still reproduces the problem) is a record from the Library of Congress (TCN 3760924; it can also be found by searching for Author "Čapek, Karel" and Title "RUR" :-). Maybe this would be a good record to try in Evergreen 2.0 as well...

Linda

On 9.9.2010 17:08, Dan Scott wrote:
On Thu, 2010-09-09 at 08:04 +0200, Linda Jansova wrote:
It seems that the use of letters such as "Ú" or "Č" at the beginning of
fields such as 100 a or 710 a may cause the problem (although this is
merely our assumption).

Is there any way to overcome this complication and get the records
indexed properly?

In Evergreen 1.6.x, as part of the indexing, all records get passed from
Perl through a server-side JavaScript routine and then back to Perl, and
that round trip seems to be a little shaky at handling Unicode. We've
found in the past that some characters need to be normalized to Unicode
NFC (composed) and some to NFD (decomposed) to pass through the
JavaScript routine safely, and some don't make it through at all
(http://markmail.org/search/?q=NFC+NFD+import+list:org.georgialibraries.list.open-ils-dev).
I suspect this is what is causing your problems.
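
For readers unfamiliar with the two normalization forms, the following minimal Python sketch uses the standard `unicodedata` module to show how the same heading is stored as different code point sequences under NFC and NFD. It is an illustration only, not Evergreen's Perl or JavaScript indexing code, and the helper name `code_points` is hypothetical.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Hypothetical helper: return the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Author name used as an example in this thread.
name = "Čapek"

nfc = unicodedata.normalize("NFC", name)  # composed: "Č" is one code point
nfd = unicodedata.normalize("NFD", name)  # decomposed: "C" + combining caron

for label, form in (("NFC", nfc), ("NFD", nfd)):
    print(label, len(form), " ".join(code_points(form)))

# The two forms render identically but are different strings, so a
# component that only copes with one form may mangle the other.
print("equal as raw strings:", nfc == nfd)
print("equal after normalizing both to NFC:",
      unicodedata.normalize("NFC", nfd) == nfc)
```

Running it should show "Č" as the single code point U+010C under NFC and as U+0043 followed by the combining caron U+030C under NFD; "Ú" behaves the same way (U+00DA versus U+0055 U+0301).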

The good news is that Evergreen 2.0 does away with the server-side
JavaScript routine and performs all of the indexing directly in the
database, so it should "just work". I'll try importing some of those
records into a 2.0 test server when I get a chance, but I'm confident that
it will work. The bad news is that Evergreen 2.0 is still only at an
alpha release stage at this point, so relief won't be available in a
stable release for some time yet.
