Petr Viktorin wrote:
On 02/27/2012 05:10 PM, Rob Crittenden wrote:
Rob Crittenden wrote:
Simo Sorce wrote:
On Mon, 2012-02-27 at 09:44 -0500, Rob Crittenden wrote:
We are pretty trusting that the data coming out of LDAP matches its
schema but it is possible to stuff non-printable characters into most
attributes.

I've added a sanity checker that keeps such a value as a python str
type (treated as binary internally). This will result in a
base64-encoded blob being returned to the client.

Shouldn't you try to parse it as a unicode string and catch TypeError
to know when to return it as binary?

Simo.


What we do now is the equivalent of unicode(chr(0)) which returns
u'\x00' and is why we are failing now.
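
A minimal Python 3 sketch of why decode-and-catch alone does not flag this case. The thread's code is Python 2, where the call is unicode(chr(0)); the helper name decodes_cleanly is hypothetical and only for illustration:

```python
def decodes_cleanly(raw, encoding="utf-8"):
    """Return True if raw (bytes) decodes without raising UnicodeDecodeError."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

# A NUL byte is valid UTF-8, so decoding succeeds even though the
# result is not printable text.
print(decodes_cleanly(b"\x00"))      # True
print(decodes_cleanly(b"\xff\xfe"))  # False: not valid UTF-8
```

So a successful decode only says the bytes are well-formed in the encoding, not that the value is safe to display.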

I believe there is a unicode category module, we might be able to use
that if there is a category that defines non-printable characters.

rob

Like this:

import unicodedata

def contains_non_printable(val):
    for c in val:
        if unicodedata.category(unicode(c)) == 'Cc':
            return True
    return False

This wouldn't have the exclusion of tab, CR and LF like using ord() but
is probably more correct.
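
To make the tab/CR/LF point concrete, here is a rough Python 3 equivalent (no unicode() call needed there). The name contains_control_char is hypothetical; the only library call is unicodedata.category:

```python
import unicodedata

def contains_control_char(text):
    """Return True if any character in text is in Unicode category 'Cc'."""
    return any(unicodedata.category(c) == "Cc" for c in text)

# Tab, LF and CR are category 'Cc' as well, so they are flagged
# just like NUL; ordinary letters are not.
for ch in ("\t", "\n", "\r", "\x00", "a"):
    print(repr(ch), unicodedata.category(ch))

print(contains_control_char("plain text"))  # False
print(contains_control_char("two\nlines"))  # True
```

A multi-line value would therefore be treated as binary under this check, which may or may not be what is wanted.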

rob

_______________________________________________
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel

If you're protecting the XML-RPC calls, it'd probably be better to look
at the XML spec directly: http://www.w3.org/TR/xml/#charsets

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]

I'd say this is a good set for CLI as well.

And you can trap invalid UTF-8 sequences by catching the
UnicodeDecodeError from decode().
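
Roughly, the two checks could be combined as in the Python 3 sketch below. This is an illustration under assumptions, not the IPA code: the names INVALID_XML_CHAR and to_text_or_none are made up, and UTF-8 is assumed as the source encoding.

```python
import re

# Negation of the XML 1.0 Char production: anything this matches is
# NOT a legal XML character.
INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def to_text_or_none(raw, encoding="utf-8"):
    """Decode raw bytes; return None if undecodable or not XML-safe."""
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        # Invalid byte sequence for the encoding: treat as binary.
        return None
    if INVALID_XML_CHAR.search(text):
        # Decodes fine but contains a character XML does not allow.
        return None
    return text

print(to_text_or_none(b"hello\tworld\n"))  # tab and LF are allowed
print(to_text_or_none(b"a\x00b"))          # None: NUL is not a legal Char
print(to_text_or_none(b"\xff"))            # None: not valid UTF-8
```

A caller would keep the original bytes (and send them base64 encoded) whenever None comes back.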


Replaced my ord function with a regex that looks for invalid characters. I'm now catching that exception too and leaving the value as a str type.

rob
From 32ee6038c484ee75a34300d061555ce870773635 Mon Sep 17 00:00:00 2001
From: Rob Crittenden <rcrit...@redhat.com>
Date: Sun, 26 Feb 2012 15:08:31 -0500
Subject: [PATCH] Detect non-printable characters in strings when decoding.

We use the LDAP schema to decide whether a value should be treated
as binary or not. This doesn't account for a user that somehow manages
to get binary data stuffed into a non-binary attribute though.

This has the potential to break either XML-RPC or the client trying to
display binary data as a string.

Internally anything that is a python str type is considered binary and
unicode is considered a string.  This patch looks at a string before
decoding it, potentially into a unicode value (what we consider a plain
string).

https://fedorahosted.org/freeipa/ticket/2131
---
 ipalib/encoder.py |   30 +++++++++++++++++++++++++++---
 1 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/ipalib/encoder.py b/ipalib/encoder.py
index 8d59bd3..14580b4 100644
--- a/ipalib/encoder.py
+++ b/ipalib/encoder.py
@@ -21,6 +21,11 @@ Encoding capabilities.
 """
 
 from decimal import Decimal
+import re
+from ipapython.ipa_log_manager import *
+
+# Declaring globally so we only have to compile this once
+non_printable = re.compile(u'[\x00-\x08\x0b\x0c\x0e-\x1F\uD800-\uDFFF\uFFFE\uFFFF]')
 
 class EncoderSettings(object):
     """
@@ -65,6 +70,19 @@ class Encoder(object):
             return val
         return self.decode(val)
 
+    def contains_non_printable(self, val):
+        """
+        The XML spec defines the following characters as allowed:
+         #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
+
+        The regular expression for the inverse of this is smaller so use that
+        to find those not allowed.
+
+        Returns True if any matches are found, False if the string is ok.
+        """
+        matches = non_printable.search(val)
+        return matches is not None
+
     def encode(self, var):
         """
         Encode any python built-in python type variable into `self.encode_to`.
@@ -130,9 +148,15 @@ class Encoder(object):
         if isinstance(var, unicode):
             return var
         elif isinstance(var, str):
-            return self.encoder_settings.decode_postprocessor(
-                var.decode(self.encoder_settings.decode_from)
-            )
+            if self.contains_non_printable(var):
+                return var
+            try:
+                return self.encoder_settings.decode_postprocessor(
+                    var.decode(self.encoder_settings.decode_from)
+                )
+            except UnicodeDecodeError, e:
+                root_logger.error('Error decoding Unicode string %s: %s' % (var, str(e)))
+                return var
         elif isinstance(var, (bool, float, Decimal, int, long)):
             return var
         elif isinstance(var, list):
-- 
1.7.6
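
A note on the regular-expression check, as a minimal Python 3 sketch rather than the patch's own code: re.match only tests at the beginning of the string, whereas re.search scans the whole value. The pattern below is just the control-character part of non_printable, and the sample value is made up.

```python
import re

# Control-character subset of the patch's pattern, for illustration only.
non_printable = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

value = "abc\x00def"  # NUL in the middle of the value

print(non_printable.match(value) is not None)   # False: only checks position 0
print(non_printable.search(value) is not None)  # True: scans the whole string
```

For a "does this value contain any bad character" test, search is the call that covers the whole string.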
