Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString
On Mon, Mar 30, 2015 at 01:02:49PM +0200, Ján Tomko wrote: These cannot be represented in XML. We have been stripping them, but only if the string had characters that needed escaping: ' Extend the strcspn check to include control codes, and strip them even if we don't do any escaping. https://bugzilla.redhat.com/show_bug.cgi?id=1184131 https://bugzilla.redhat.com/show_bug.cgi?id=1066564 --- src/util/virbuffer.c | 14 +++--- tests/virbuftest.c | 49 + 2 files changed, 60 insertions(+), 3 deletions(-) ACK, according to XML documentation characters below 0x20 except 0x09, 0x0A and 0x0D are forbidden. Pavel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString
On 03/30/2015 05:02 AM, Ján Tomko wrote: These cannot be represented in XML. Yes they can, via entities. DV would know for sure, but I think that #x01; is the entity for the C byte '\1'. We have been stripping them, but only if the string had characters that needed escaping: ' Extend the strcspn check to include control codes, and strip them even if we don't do any escaping. NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them. https://bugzilla.redhat.com/show_bug.cgi?id=1184131 https://bugzilla.redhat.com/show_bug.cgi?id=1066564 --- src/util/virbuffer.c | 14 +++--- tests/virbuftest.c | 49 + 2 files changed, 60 insertions(+), 3 deletions(-) As there are real bugs that will be fixed once we use the correct entities, I'm looking forward to v2. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString
On Mon, Mar 30, 2015 at 07:06:45AM -0600, Eric Blake wrote: On 03/30/2015 05:02 AM, Ján Tomko wrote: These cannot be represented in XML. Yes they can, via entities. DV would know for sure, but I think that #x01; is the entity for the C byte '\1'. Only in XML 1.1. For XML 1.0: http://www.w3.org/TR/xml/#dt-charref Well-formedness constraint: Legal Character Characters referred to using character references MUST match the production for Char. Which is: http://www.w3.org/TR/xml/#NT-Char [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x1-#x10]/* any Unicode character, excluding the surrogate blocks, FFFE, and . */ Both libvirt and virt-xml-validate choke on those entities error: (domain_definition):2: xmlParseCharRef: invalid xmlChar value 1 namef21#x01;/name DV was the one who wrote the code to skip over control characters in commit b36f453a581f27a4a43558978724a52df32045bb (v0.3.0~1) new function virBufferEscapeString() to format a string while escaping its content for XML, and apply it to a couple of obvious places, should fix bug #206653 It was the optimization in commit 0af02cb2e8d8192958735880e135ab69beb437c5 (v0.8.6~57) buf: Simplify virBufferEscapeString which broke this for strings that do have control codes, but not escapable characters. We have been stripping them, but only if the string had characters that needed escaping: ' Extend the strcspn check to include control codes, and strip them even if we don't do any escaping. NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them. https://bugzilla.redhat.com/show_bug.cgi?id=1184131 https://bugzilla.redhat.com/show_bug.cgi?id=1066564 --- src/util/virbuffer.c | 14 +++--- tests/virbuftest.c | 49 + 2 files changed, 60 insertions(+), 3 deletions(-) As there are real bugs that will be fixed once we use the correct entities, I'm looking forward to v2. This fixes the real bug of libvirt generating unparsable XML, which breaks creation of any VMs in virt-manager. Jan signature.asc Description: Digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString
On 03/30/2015 09:50 AM, Daniel Veillard wrote: NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them. you can't escape them with a CharRef for sure http://www.w3.org/TR/REC-xml/#wf-Legalchar Characters referred to using character references must match the production for Char. That time Ján is right :-) Ouch. Then how do we represent the name of a storage volume, when the file system allows arbitrary bytes including control characters, in the volume name, but where we are restricted to only using valid XML? Do we just silently ignore such files as impossible volumes that libvirt cannot manage? (I'd rather omit such a volume from the list in the pool, than silently munge its name into something incorrect) -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString
On 03/30/2015 12:56 PM, Eric Blake wrote: On 03/30/2015 09:50 AM, Daniel Veillard wrote: NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them. you can't escape them with a CharRef for sure http://www.w3.org/TR/REC-xml/#wf-Legalchar Characters referred to using character references must match the production for Char. That time Ján is right :-) Ouch. Then how do we represent the name of a storage volume, when the file system allows arbitrary bytes including control characters, in the volume name, but where we are restricted to only using valid XML? Do we just silently ignore such files as impossible volumes that libvirt cannot manage? (I'd rather omit such a volume from the list in the pool, than silently munge its name into something incorrect) I'd say just make a follow up patch/bz to reject unrepresentable filenames before they hit the XML. It's a much less serious problem IMO - Cole -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString
On Mon, Mar 30, 2015 at 10:56:11AM -0600, Eric Blake wrote: On 03/30/2015 09:50 AM, Daniel Veillard wrote: NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them. you can't escape them with a CharRef for sure http://www.w3.org/TR/REC-xml/#wf-Legalchar Characters referred to using character references must match the production for Char. That time Ján is right :-) Ouch. Then how do we represent the name of a storage volume, when the file system allows arbitrary bytes including control characters, in the volume name, but where we are restricted to only using valid XML? Do we just silently ignore such files as impossible volumes that libvirt cannot manage? (I'd rather omit such a volume from the list in the pool, than silently munge its name into something incorrect) Since if such an invalid CharRef were to hit libxml2 you would get a parser error and no result. So you can safely assume nobody ever has experienced those. Then you can try to push an additional patch doing a libvirt escaping but of only those problematic characters prior to the encoding in the XML. Then escape them back when reading from the XML to libvirt internals. This should not affect any deployed instance since they would be unparseable if that was the case. I would suggest using the same charref escaping but before passing to XML, e.g. real path: /foo\3bar libvirt encoded: /foo#3;bar XML encoded: /fooamp;#3;bar you also need to catch and give him special status real path: /foobar libvirt encoded: /foo#38;bar XML encoded: /fooamp;#38;bar after libvirt parsing you end up with /foo#3;bar and each time you see #numericsequence; you translate that to the equivalent UTF-8 character. Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x1-#x10] As a first approach, I would suggest just detecting bytes 1-8 0xB-0x1F and giving them the treatment, the probability of hitting surrogates in UTF-8 filesnames seems low enough that the patch should work in general. Whether using /foo#3;bar vs. /foo#0x3;bar is a matter of taste you only need to handle one IMHO. Add a little regression tests with all the lower caracter and use in the path and I think you're covered. Sounds too late for 1.2.14 though, Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veill...@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/ -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString
On Mon, Mar 30, 2015 at 07:06:45AM -0600, Eric Blake wrote: On 03/30/2015 05:02 AM, Ján Tomko wrote: These cannot be represented in XML. Yes they can, via entities. DV would know for sure, but I think that #x01; is the entity for the C byte '\1'. no they can't :-) A character must match prod Char, even when using a CharRef http://www.w3.org/TR/REC-xml/#NT-Char [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x1-#x10]/* any Unicode character, excluding the surrogate blocks, FFFE, and . */ We have been stripping them, but only if the string had characters that needed escaping: ' Extend the strcspn check to include control codes, and strip them even if we don't do any escaping. NACK. Stripping control codes from a volume name represents the wrong name. We need to escape the problematic bytes, rather than strip them. you can't escape them with a CharRef for sure http://www.w3.org/TR/REC-xml/#wf-Legalchar Characters referred to using character references must match the production for Char. That time Ján is right :-) Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veill...@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/ -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list