NSString bug with test and really dodgy patch.

Chris Ball Tue, 02 Oct 2012 14:10:28 -0700

I ran into some problems the other day with clang's address sanitizer getting
very cross with me and I eventually chased it down to [NSString initWithFormat:
arguments:]  specifically GSFormat.m blithely calls strlen on my input string
even if I have specified a precision, this is not good given that I'm not
dealing with null terminated strings for the most part.  I expect the same bug
affects stringWithFormat and probably other stuff as well but I haven't bothered
to look into what all uses the GSFormat stuff too closely.



Here is a little test driver showing the problem (you will need to be
running clang's address sanitizer or electric fence or something for an error
to trigger since the nature of the bug is that it is looking at memory that
doesn't belong to it. 

------------------SNIP SNIP-------------------------------
#import <Foundation/Foundation.h>

#define BYTES 50
#define STRING
"12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"

void sub( const id arg, ...);

int main( void )
{
    NSAutoreleasePool *pool  = [[NSAutoreleasePool alloc] init];
    char *bob;

    bob = malloc( BYTES );
    memcpy( bob, STRING, BYTES );

    sub( @"%.*s\n", BYTES , bob );

    [pool release];
}


void sub( const id arg, ...)
{
    va_list ap;
    NSString *str;

    va_start(ap, arg);
    str=[[NSString alloc] initWithFormat:arg arguments:ap];
    va_end(ap);

    NSLog(@"%@", str );

    [str release];
}
------------------------SNIP SNIP---------------------------------


The fix I put together pretty much says that if a precision is specified, use
that for the length otherwise feel free to call strlen(), if I'm reading the
code right, I think this should work fine albeit with a slight potential
performance loss when there is a unicode string with a precision less than a
quarter of the string's length (and doubtless a trivial performance increase the
rest of the time since it isn't iterating a string for no reason).  Having said
all this I leave it to someone with a better familiarity with this stuff to
vet it because I only took a fairly shallow look at the code to get where I am.

-----------------------PATCHY PATCH-------------------------------
--- gnustep-base-1.24.0/Source/GSFormat.m
2011-12-15 01:42:39.000000000 -0800 +++ GSFormat.m  2012-10-02
13:12:26.778053030 -0700 @@ -1759,59 +1759,35 @@
                byteEncoding = GSPrivateIsByteEncoding(enc);
              }
 
-           len = strlen(str);  // Number of bytes to convert.
-           blen = len;         // Size of unichar output buffer.
-
-           if (prec != -1)
+        if ( prec == -1 )
+          {
+            len = strlen(str); // Number of bytes to convert.
+            blen = len;                // Size of unichar output buffer.
+          } 
+        else
              {
-               if (prec < len)
-                 {
-                   /* We don't neeed an output buffer bigger than the
-                    * precision specifies.
-                    */
-                   blen = prec;
-                 }
-               if (byteEncoding == YES)
-                 {
-                   /* Where the external encoding is one byte per character,
-                    * we know we don't need to convert more bytes than the
-                    * precision required for output.
-                    */
-                   if (prec < len)
-                     {
-                       len = prec;
-                     }
-                 }
-               else if (prec * 4 < len)
-                 {
-                   /* We assume no multibyte encoding is going to use more
-                    * than the maximum four bytes used by utf-8 for any
-                    * unicode code point.  So we do not need to convert
-                    * more than four times the precision.
-                    */
-                   len = prec * 4;
-                 }
-             }
+            /* We don't need an output buffer bigger than the
+             * precision specifies.
+             */
+            len = prec;
+            blen = prec;
 
-           /* Allocate dynamically an array which definitely is long
-            * enough for the unichar version.
-            */
-           if (blen < 8192 || ((string = (unichar *)
-             NSZoneMalloc(s->_zone, blen * sizeof(unichar))) == NULL))
-             string = (unichar *) alloca (blen * sizeof(unichar));
-           else
-             string_malloced = 1;
+            /* Allocate dynamically an array which definitely is long
+             * enough for the unichar version.
+             */
+            if (blen < 8192 || ((string = (unichar *)
+              NSZoneMalloc(s->_zone, blen * sizeof(unichar))) == NULL))
+              string = (unichar *) alloca (blen * sizeof(unichar));
+            else
+              string_malloced = 1;
 
-           GSToUnicode(&string, &blen, (unsigned char*)str, len, enc, 0, 0);
+            GSToUnicode(&string, &blen, (unsigned char*)str, len, enc, 0, 0);
 
-           /* get the number of unichars produced and truncate to the
-            * required output precision if necessary.
-            */
-           len = blen;
-           if (prec != -1 && prec < len)
-             {
-               len = prec;
-             }
+            /* get the number of unichars produced and truncate to the
+             * required output precision if necessary.
+             */
+            len = blen;
+          }
          }
        else
          {
-----------------------PATCHY PATCH-------------------------------


        Chris.

_______________________________________________
Gnustep-dev mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gnustep-dev

NSString bug with test and really dodgy patch.

Reply via email to