Hi!
On 12.04.2024 03:36, David Cook via Koha-devel wrote:
Hi all,
I just wanted to share a (MariaDB) SQL report that I wrote for finding
bib records with invalid XML characters:
SELECT biblionumber
FROM biblio_metadata
WHERE metadata REGEXP
  '[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';
Newer versions of Koha strip invalid characters from the XML so that you
can fix your records. I figure this report is very valuable when coupled
with that functionality. In fact, I just advised a library today to use
them together to fix up some bad data in their catalogue.
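To see what actually needs fixing in each record, a variant of the report could also return the offending characters. This is a sketch, not part of the original report: it assumes MariaDB 10.0.5 or later (where regular expressions use PCRE and REGEXP_SUBSTR() is available), and the invalid_bytes column alias is purely illustrative. HEX() is used so that control characters become visible:

```sql
-- Sketch (assumes MariaDB >= 10.0.5 for REGEXP_SUBSTR and PCRE syntax).
-- Same XML 1.0 character class as the report above; for each matching
-- record, show the first run of invalid characters as hex bytes.
SELECT biblionumber,
       HEX(REGEXP_SUBSTR(metadata,
           '[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+'))
         AS invalid_bytes  -- illustrative alias
FROM biblio_metadata
WHERE metadata REGEXP
  '[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';
```

For example, a stray ESC control character (U+001B) would show up as 1B in the invalid_bytes column.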
--
On a related note, I’ve noticed that you can have a record with good bib
XML but invalid item XML, and you won’t notice until your record fails
to be indexed. So I’m planning on writing a report for that too.
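For the item side, a rough starting point might look like the following. This is a hypothetical sketch rather than the planned report: it assumes the invalid characters sit in a handful of text columns of the items table (the column list is illustrative, not exhaustive) and reuses the same character class:

```sql
-- Hypothetical sketch: items whose text columns contain characters that
-- are not allowed in XML 1.0. The column list is an assumption; extend it
-- to whichever item fields your library actually uses.
SELECT itemnumber, biblionumber
FROM items
WHERE CONCAT_WS(' ',
        barcode, itemcallnumber, itemnotes, itemnotes_nonpublic,
        enumchron, more_subfields_xml)
      REGEXP
  '[^\\x{0009}\\x{000A}\\x{000D}\\x{0020}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+';
```

CONCAT_WS() skips NULL arguments, so items with empty notes or no extra subfields need no special handling.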
I’m thinking it might be good to add these reports to core Koha, so that
people can find and fix their own metadata problems. What do people think?
Sounds like an excellent idea! It is somewhat similar to the "MARC
bibliographic framework test" at /cgi-bin/koha/admin/checkmarc.pl
The report could also be added to
https://wiki.koha-community.org/wiki/SQL_Reports_Library so that it is
useful right away, including for older versions of Koha.
Best regards,
Magnus
_______________________________________________
Koha-devel mailing list
Koha-devel@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/
git : https://git.koha-community.org/
bugs : https://bugs.koha-community.org/