Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString

2015-03-30 Thread Pavel Hrdina
On Mon, Mar 30, 2015 at 01:02:49PM +0200, Ján Tomko wrote:
 These cannot be represented in XML.
 
 We have been stripping them, but only if the string had
 characters that needed escaping: '
 
 Extend the strcspn check to include control codes, and strip
 them even if we don't do any escaping.
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1184131
 https://bugzilla.redhat.com/show_bug.cgi?id=1066564
 ---
  src/util/virbuffer.c | 14 +++---
  tests/virbuftest.c   | 49 +
  2 files changed, 60 insertions(+), 3 deletions(-)
 

ACK, according to XML documentation characters below 0x20 except 0x09, 0x0A and
0x0D are forbidden.

Pavel

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString

2015-03-30 Thread Eric Blake
On 03/30/2015 05:02 AM, Ján Tomko wrote:
 These cannot be represented in XML.

Yes they can, via entities.  DV would know for sure, but I think that
#x01; is the entity for the C byte '\1'.

 
 We have been stripping them, but only if the string had
 characters that needed escaping: '
 
 Extend the strcspn check to include control codes, and strip
 them even if we don't do any escaping.

NACK.  Stripping control codes from a volume name represents the wrong
name.  We need to escape the problematic bytes, rather than strip them.

 
 https://bugzilla.redhat.com/show_bug.cgi?id=1184131
 https://bugzilla.redhat.com/show_bug.cgi?id=1066564
 ---
  src/util/virbuffer.c | 14 +++---
  tests/virbuftest.c   | 49 +
  2 files changed, 60 insertions(+), 3 deletions(-)

As there are real bugs that will be fixed once we use the correct
entities, I'm looking forward to v2.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString

2015-03-30 Thread Ján Tomko
On Mon, Mar 30, 2015 at 07:06:45AM -0600, Eric Blake wrote:
 On 03/30/2015 05:02 AM, Ján Tomko wrote:
  These cannot be represented in XML.
 
 Yes they can, via entities.  DV would know for sure, but I think that
 #x01; is the entity for the C byte '\1'.
 

Only in XML 1.1.

For XML 1.0: http://www.w3.org/TR/xml/#dt-charref
  Well-formedness constraint: Legal Character

  Characters referred to using character references MUST match the
  production for Char.

Which is: http://www.w3.org/TR/xml/#NT-Char

  [2] Char   ::=  #x9 | #xA | #xD | [#x20-#xD7FF] |
  [#xE000-#xFFFD] | [#x1-#x10]/* any Unicode character,
  excluding the surrogate blocks, FFFE, and . */

Both libvirt and virt-xml-validate choke on those entities
error: (domain_definition):2: xmlParseCharRef: invalid xmlChar value 1
  namef21#x01;/name

DV was the one who wrote the code to skip over control characters in
commit b36f453a581f27a4a43558978724a52df32045bb (v0.3.0~1)
  new function virBufferEscapeString() to format a string while
  escaping its content for XML, and apply it to a couple of
  obvious places, should fix bug #206653

It was the optimization in
commit 0af02cb2e8d8192958735880e135ab69beb437c5 (v0.8.6~57)
buf: Simplify virBufferEscapeString
which broke this for strings that do have control codes,
but not escapable characters.

  
  We have been stripping them, but only if the string had
  characters that needed escaping: '
  
  Extend the strcspn check to include control codes, and strip
  them even if we don't do any escaping.
 
 NACK.  Stripping control codes from a volume name represents the wrong
 name.  We need to escape the problematic bytes, rather than strip them.
 
  
  https://bugzilla.redhat.com/show_bug.cgi?id=1184131
  https://bugzilla.redhat.com/show_bug.cgi?id=1066564
  ---
   src/util/virbuffer.c | 14 +++---
   tests/virbuftest.c   | 49 +
   2 files changed, 60 insertions(+), 3 deletions(-)
 
 As there are real bugs that will be fixed once we use the correct
 entities, I'm looking forward to v2.

This fixes the real bug of libvirt generating unparsable XML, which breaks
creation of any VMs in virt-manager.

Jan


signature.asc
Description: Digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString

2015-03-30 Thread Eric Blake
On 03/30/2015 09:50 AM, Daniel Veillard wrote:

 NACK.  Stripping control codes from a volume name represents the wrong
 name.  We need to escape the problematic bytes, rather than strip them.
 
   you can't escape them with a CharRef for sure
 
 http://www.w3.org/TR/REC-xml/#wf-Legalchar
 Characters referred to using character references must match the
 production for Char.
 
   That time Ján  is right :-)

Ouch.  Then how do we represent the name of a storage volume, when the
file system allows arbitrary bytes including control characters, in the
volume name, but where we are restricted to only using valid XML?  Do we
just silently ignore such files as impossible volumes that libvirt
cannot manage?  (I'd rather omit such a volume from the list in the
pool, than silently munge its name into something incorrect)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString

2015-03-30 Thread Cole Robinson
On 03/30/2015 12:56 PM, Eric Blake wrote:
 On 03/30/2015 09:50 AM, Daniel Veillard wrote:
 
 NACK.  Stripping control codes from a volume name represents the wrong
 name.  We need to escape the problematic bytes, rather than strip them.

   you can't escape them with a CharRef for sure

 http://www.w3.org/TR/REC-xml/#wf-Legalchar
 Characters referred to using character references must match the
 production for Char.

   That time Ján  is right :-)
 
 Ouch.  Then how do we represent the name of a storage volume, when the
 file system allows arbitrary bytes including control characters, in the
 volume name, but where we are restricted to only using valid XML?  Do we
 just silently ignore such files as impossible volumes that libvirt
 cannot manage?  (I'd rather omit such a volume from the list in the
 pool, than silently munge its name into something incorrect)

I'd say just make a follow up patch/bz to reject unrepresentable filenames
before they hit the XML. It's a much less serious problem IMO

- Cole



--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString

2015-03-30 Thread Daniel Veillard
On Mon, Mar 30, 2015 at 10:56:11AM -0600, Eric Blake wrote:
 On 03/30/2015 09:50 AM, Daniel Veillard wrote:
 
  NACK.  Stripping control codes from a volume name represents the wrong
  name.  We need to escape the problematic bytes, rather than strip them.
  
you can't escape them with a CharRef for sure
  
  http://www.w3.org/TR/REC-xml/#wf-Legalchar
  Characters referred to using character references must match the
  production for Char.
  
That time Ján  is right :-)
 
 Ouch.  Then how do we represent the name of a storage volume, when the
 file system allows arbitrary bytes including control characters, in the
 volume name, but where we are restricted to only using valid XML?  Do we
 just silently ignore such files as impossible volumes that libvirt
 cannot manage?  (I'd rather omit such a volume from the list in the
 pool, than silently munge its name into something incorrect)

 Since if such an invalid CharRef were to hit libxml2 you would
get a parser error and no result. So you can safely assume nobody
ever has experienced those. Then you can try to push an additional
patch doing a libvirt escaping but of only those problematic characters
prior to the encoding in the XML. Then escape them back when reading
from the XML to libvirt internals. This should not affect any deployed
instance since they would be unparseable if that was the case.
I would suggest using the same charref escaping but before passing to
XML, e.g.

real path:   /foo\3bar
libvirt encoded: /foo#3;bar
XML encoded: /fooamp;#3;bar

you also need to catch  and give him special status

real path:   /foobar
libvirt encoded: /foo#38;bar
XML encoded: /fooamp;#38;bar

after libvirt parsing you end up with /foo#3;bar
and each time you see #numericsequence; you translate that to
the equivalent UTF-8 character.

Char  ::=  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | 
[#x1-#x10]

As a first approach, I would suggest just detecting bytes
1-8 0xB-0x1F and giving them the treatment, the probability of hitting
surrogates in UTF-8 filesnames seems low enough that the patch
should work in general.

Whether using /foo#3;bar vs. /foo#0x3;bar is a matter of taste
you only need to handle one IMHO.

Add a little regression tests with all the lower caracter and
 use in the path and I think you're covered.

Sounds too late for 1.2.14 though,

Daniel

-- 
Daniel Veillard  | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Strip control codes in virBufferEscapeString

2015-03-30 Thread Daniel Veillard
On Mon, Mar 30, 2015 at 07:06:45AM -0600, Eric Blake wrote:
 On 03/30/2015 05:02 AM, Ján Tomko wrote:
  These cannot be represented in XML.
 
 Yes they can, via entities.  DV would know for sure, but I think that
 #x01; is the entity for the C byte '\1'.

  no they can't :-)
A character must match prod Char, even when using a CharRef

  http://www.w3.org/TR/REC-xml/#NT-Char


[2] Char   ::=  #x9 | #xA | #xD | [#x20-#xD7FF] |
[#xE000-#xFFFD] | [#x1-#x10]/* any Unicode character,
excluding the surrogate blocks, FFFE, and . */

 
  
  We have been stripping them, but only if the string had
  characters that needed escaping: '
  
  Extend the strcspn check to include control codes, and strip
  them even if we don't do any escaping.
 
 NACK.  Stripping control codes from a volume name represents the wrong
 name.  We need to escape the problematic bytes, rather than strip them.

  you can't escape them with a CharRef for sure

http://www.w3.org/TR/REC-xml/#wf-Legalchar
Characters referred to using character references must match the
production for Char.

  That time Ján  is right :-)

Daniel

-- 
Daniel Veillard  | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list