Package: libxml-libxml-perl
Version: 2.0134+dfsg-2+b1
Severity: normal

Dear Maintainer,

I use XML::LibXML::Reader to work with files that validate against the
Library of Congress's MARCXML Schema, available here:

https://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd

That schema includes a pattern:

[\dA-Za-z!&quot;#$%&amp;'()*+,-./:;&lt;=&gt;?{}_^`~\[\]\\]{1}

or, with the XML escaping processed:

[\dA-Za-z!"#$%&'()*+,-./:;<=>?{}_^`~\[\]\\]{1}

That regex requires exactly one character, which may be any one of a long
list of allowable characters. Note that three of the characters require
escaping because they would otherwise have meaning in the regex itself: the
two square brackets [ and ], and the backslash \.

An online XML Schema validator that I found with a quick search:

https://www.liquid-technologies.com/online-xsd-validator

accepts those three characters as valid. The problem is that
XML::LibXML::Reader rejects them.

I wrote a simple test script, validate.pl:

-------------------
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;
use XML::LibXML::Reader;

my ($xml_path, $xsd_path) = @ARGV;
my %parameters = (
    'location' => $xml_path,
    'Schema'   => XML::LibXML::Schema->new(string => scalar(read_file($xsd_path))),
);
my $reader = XML::LibXML::Reader->new(%parameters);
while ($reader->read()) {}
print "Finished reading; document must be valid.\n";
-------------------

Along with a basic XML Schema file, test.xsd:

-------------------
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:attribute name="code">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[\dA-Za-z!&quot;#$%&amp;'()*+,-./:;&lt;=&gt;?{}_^`~\[\]\\]{1}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>
-------------------

and a VERY basic XML file, test.xml:

-------------------
<root code="["/>
-------------------

Running:

$ perl validate.pl test.xml test.xsd

results in:

test.xml:1: Schemas validity error : Element 'root', attribute 'code': [facet 'pattern'] The value '[' is not accepted by the pattern '[\dA-Za-z!"#$%&'()*+,-./:;<=>?{}_^`~\[\]\\]{1}'.

I believe that the value should in fact match that pattern. The online
schema validator mentioned above validates this pair of files.

If you replace the data in the "code" attribute with any of the other
allowed characters, validation passes. It fails only for the three
characters that are escaped in the pattern.

-- System Information:
Debian Release: 11.7
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.10.0-21-amd64 (SMP w/8 CPU threads)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libxml-libxml-perl depends on:
ii  libc6                         2.31-13+deb11u6
ii  libxml-namespacesupport-perl  1.12-1.1
ii  libxml-sax-perl               1.02+dfsg-1
ii  libxml2                       2.9.10+dfsg-6.7+deb11u4
ii  perl                          5.32.1-4+deb11u2
ii  perl-base [perlapi-5.32.0]    5.32.1-4+deb11u2

libxml-libxml-perl recommends no packages.

libxml-libxml-perl suggests no packages.

-- no debconf information
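
As a rough cross-check of the expectation described in this report, the sketch below runs the same character class through Perl's own regex engine instead of libxml2. Perl's regex dialect is not the XSD one, so this is only suggestive: it assumes that the escapes in this particular class mean the same thing in both dialects, and it adds \A and \z to imitate XSD's implicit anchoring. The accepts() helper and the sample list are illustrative names, not part of the schema or of XML::LibXML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the schema, with XML escaping already processed.
# A non-interpolating heredoc keeps every backslash, "$" and "'" literal.
my $class = <<'END';
[\dA-Za-z!"#$%&'()*+,-./:;<=>?{}_^`~\[\]\\]{1}
END
chomp $class;

# XSD patterns must match the whole value, so anchor both ends.
my $pattern = qr/\A(?:$class)\z/;

# Hypothetical helper: true if the value matches the anchored pattern.
sub accepts {
    my ($value) = @_;
    return $value =~ $pattern ? 1 : 0;
}

# The three escaped characters, plus a few controls for comparison.
for my $value ('[', ']', '\\', 'A', '7', '@') {
    printf "%-3s %s\n", $value, accepts($value) ? 'match' : 'no match';
}
```

Under that assumption, '[', ']' and '\' are all reported as matches here, which agrees with the online validator rather than with the error from XML::LibXML::Reader. '@' is included as a control that is not in the class.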