https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38729
--- Comment #13 from Janusz Kaczmarek <[email protected]> ---
(In reply to Marcel de Rooy from comment #12)
> (In reply to David Cook from comment #7)
> > This is an interesting one for sure, and it seems like a very real problem.
> >
> > However, I think that you need to make this feature optional with your
> > current implementation. Your patch would unexpectedly change existing
> > behaviour that people rely upon.
> >
> > For instance, there are lots of cases where your bib heading doesn't
> > perfectly match the authority (e.g. punctuation difference, minor spelling
> > difference in one of the words in the heading, whitespace difference, etc),
> > but you want it to still match and use the authorized form.
>
> Good points. Janusz, could you address that concern please?
> Changing status to reflect need for feedback.

Well, let's try.

The main aim of this patch is to distinguish between Latin-based letters that should not be equated with each other. I am aware that the English alphabet has just 26 basic letters, and maybe from an anglophone point of view 'Å', 'Ä', 'Ą', 'Á' (to name only a few letters based on the Latin letter 'A') all equal 'A'. But they do not: they are separate letters in their respective alphabets (Swedish, German, Polish, Hungarian...): https://en.wikipedia.org/wiki/Swedish_alphabet, https://en.wikipedia.org/wiki/Hungarian_alphabet, etc.

Also, IMO this is not a local issue but a general one. It will be an issue in every catalogue that collects international literature. You should definitely distinguish names like 'Jamroz', 'Jamroż' and 'Jamróz'. I agree that it will not occur in one case out of ten, but that should not be a reason to ignore the issue. Its relative rarity explains why it went undetected for so long. (BTW, this issue reveals itself more easily in large-scale catalogues with several hundred thousand or several million records.)
At the same time, when searching, we traditionally expect to find all three names ('Jamroz', 'Jamroż' and 'Jamróz') by searching without diacritical marks, i.e. for 'Jamroz' (this is especially important for those whose keyboards lack these characters). And not just in a Polish catalogue, but in every catalogue. So IMO Elasticsearch works correctly with the current default settings.

As for the inaccuracy of the record: as a librarian, I prefer to have a controlled field left unlinked rather than linked to the wrong authority record and then perhaps modified incorrectly as a consequence (e.g. Änkor linked to Ankor will become Ankor when the Ankor authority record is first edited, cf. Bug 33401).

Finally, regarding punctuation: here I am comparing the search_form, which is constructed by _get_search_heading. This method, among other things, removes the parentheses and the punctuation from the end of each subfield, and so largely standardizes the notation, bypassing some of the possible problems of inaccurate notation.
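
To illustrate the distinction between searching and linking, here is a minimal Python sketch. It is not Koha code and not the patch itself. The function names (fold_diacritics, normalize_heading, search, link_candidates) are hypothetical, normalize_heading is only a rough stand-in for the kind of cleanup the search form receives, and the case-sensitive exact comparison is an assumption made for the example.

```python
import re
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks, roughly what a diacritic-folding search filter does."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_heading(text: str) -> str:
    """Hypothetical cleanup: collapse whitespace and drop trailing punctuation.

    Diacritics are deliberately kept.
    """
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"[\s.,;:/]+$", "", text)


def search(query: str, authorities: list[str]) -> list[str]:
    """Diacritic-insensitive search: compare folded, case-folded forms."""
    wanted = fold_diacritics(normalize_heading(query)).casefold()
    return [
        a for a in authorities
        if fold_diacritics(normalize_heading(a)).casefold() == wanted
    ]


def link_candidates(bib_heading: str, authorities: list[str]) -> list[str]:
    """Linking: search broadly, then keep only hits whose normalized form is identical."""
    target = normalize_heading(bib_heading)
    return [a for a in search(bib_heading, authorities) if normalize_heading(a) == target]


authorities = ["Jamroz", "Jamroż", "Jamróz"]
print(search("Jamroz", authorities))            # all three names are found
print(link_candidates("Jamróz,", authorities))  # only the exact letter-for-letter match
print(link_candidates("Änkor", ["Ankor"]))      # no match: the heading stays unlinked
```

In this sketch the search stays as tolerant as it is today, while the trailing comma in 'Jamróz,' does not prevent linking because it is removed before the exact comparison.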
