Re: Home-brewed code is 100X faster than -[NSScanner scanDecimal:] ??

2011-02-04 Thread Jerry Krinock
Sorry, in the last message I posted some stupid code which was written too late 
last night.  The -scanJSONNumber:accurately: implementation should be simply 
this:

- (BOOL)scanJSONNumber:(NSNumber**)number
            accurately:(BOOL)accurately {
    BOOL result = NO ;

    if (!accurately) {
        double daDouble ;
        result = [self scanDouble:&daDouble] ;
        // number may be NULL when the caller only wants to scan past
        if (result && number) {
            *number = [NSNumber numberWithDouble:daDouble] ;
        }
    }
    else {
        NSDecimal decimal ;
        // This local autorelease pool is quite necessary when parsing
        // strings of 500K characters that are dense with numbers.
        // Otherwise, memory allocations go into the gigabytes.
        NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init] ;
        result = [self scanDecimal:&decimal] ;
        [pool release] ;
        if (result && number) {
            *number = [NSDecimalNumber decimalNumberWithDecimal:decimal] ;
        }
    }

    return result ;
}

___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: Home-brewed code is 100X faster than -[NSScanner scanDecimal:] ??

2011-02-04 Thread Jerry Krinock

On 2011 Feb 03, at 13:57, glenn andreas wrote:

> Why not just use NSScanner's scanDouble: instead of trying to scan a string 
> that you think is valid and then convert to a double?

Well, because in running my tests I'd observed that using -scanDecimal: 
degraded performance even when it tried and failed to scan a non-number string, 
but I forgot that in real life you usually know enough about the structure of 
the string you're scanning that you rarely do this.

I've since realized that this thing is difficult to optimize in the general 
case.  What I'm actually trying to do is improve the performance of Blake 
Seely's BSJSONAdditions category on NSScanner, where it scans the JSON string 
representing a number value.  Blake used -scanDecimal:, which is bulletproof, 
but is sometimes way slow (read on), and few applications need 38 decimal 
places of accuracy.  The JSON standard allows "e/E" notation to be used in 
number strings, but does not allow for localized decimal separators; i.e. it's 
always ".".  And, in Blake's design, the method will not execute for valid JSON 
if there is not a valid number string at the scan location.  In my current 
project, the number strings are all integers, although many are 64-bit 
integers.  In some cases (accounting/finance??), one may need the 
38-decimal-digit precision of NSDecimal.
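
As a rough illustration of why the 64-bit integers matter here, the plain C sketch below (which also compiles unchanged inside an Objective-C file) checks whether an integer string survives a trip through a double.  The helper name survives_double is made up for this example and is not part of NSScanner or BSJSONAdditions; strtoll and strtod merely stand in for whatever integer and floating-point parsing a scanner would do.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of any Cocoa API.
   Returns 1 if the decimal integer in text has the same value whether it
   is parsed as a 64-bit integer or as a double, and 0 otherwise. */
int survives_double(const char *text) {
    long long exact = strtoll(text, NULL, 10);
    double approx = strtod(text, NULL);

    /* Converting an out-of-range double to long long is undefined, so
       reject anything outside [-2^63, 2^63) before casting. */
    if (approx >= 9223372036854775808.0 || approx < -9223372036854775808.0) {
        return 0;
    }
    return (long long)approx == exact;
}
```

Integers up to 2^53 (9007199254740992) come back unchanged, but 9007199254740993 does not, because a double carries only 53 bits of significand.  A project with integers that large would presumably want them scanned as integers rather than as doubles.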

So, here's my last shot in case anyone is following this.  Probably there are 
still corner cases I haven't thought of, but anyhow, this works for me.  With 
the parameters as hard-coded below, the speedup of "not-accurately" vs. 
"scanDecimal" case is about 120x.  Anyone needing to optimize the performance 
of number-string scanning can adjust the parameters to fit their project.

One sure conclusion is that, if you plan to scan long strings, hundreds of KB, 
you will greatly improve performance, and possibly eliminate crashes, by 
surrounding -scanDecimal: with its own local autorelease pool.

Another conclusion is that the speedup is highly sensitive to the length of the 
string, which is controlled by the 'strexponent' in main().  For strings of 100 
characters or less, there is not much speed improvement.  However, for longer 
strings the speed improvement is approximately proportional to 2^strexponent, 
which is the length of the string.  This makes no sense to me, since, in my 
mind, -scanDecimal: should begin scanning at the scanLocation, scan whatever 
number string is there, and then stop.  It should not even need to know the 
characters before or after.  But probably I'm overlooking some unforeseen  
requirement of some corner case.

Thanks again for the feedback on this.

#import <Foundation/Foundation.h>

@interface NSScanner (Speedy)

/*!
 @brief  Scans for a string representing a number value, returning it
 by reference if found, and offering an option to improve performance
 if "double precision" is sufficient.
 
 @details  The string to be scanned may be either a plain decimal
 number string such as "-123.456", or a string using scientific
 "e" notation such as "-123e10".
 
 The performance improvement when using accurately:NO is typically
 120x when scanning a string of 60K characters, and autoreleased
 memory allocations are also greatly reduced.  The improvement is
 proportional to the string length in this regime.
 
 This method is optimized assuming that the decimal point in the
 scanned numbers is the dot ".", and never localized to ",".
 
 Invoke this method with NULL as number to simply scan past a string
 representation of a number.
 
 @param  number  Upon return, contains the scanned value as either an
 NSNumber or NSDecimalNumber value.

 @param  accurately  If YES, the 'number' pointed to upon return will
 be an NSDecimalNumber, with 38 decimal digits of precision.  If NO,
 the 'number' pointed to upon return will be an NSNumber whose value
 is only as accurate as the 'binary64' variant defined in the IEEE
 Standard for Floating-Point Arithmetic (IEEE Standard 754).
 
 @result   YES if the receiver scans anything, otherwise NO.
 */
- (BOOL)scanJSONNumber:(NSNumber**)number
accurately:(BOOL)accurately ;

@end

@implementation NSScanner (Speedy)

- (BOOL)scanJSONNumber:(NSNumber**)number
            accurately:(BOOL)accurately {
    BOOL result = NO ;

    if (!accurately) {
        NSCharacterSet* integerSet = [NSCharacterSet
            characterSetWithCharactersInString:@"0123456789+-"] ;
        NSCharacterSet* floatSet = [NSCharacterSet
            characterSetWithCharactersInString:@"0123456789+-eE."] ;
        NSInteger scanLocation = [self scanLocation] ;
        [self scanCharactersFromSet:integerSet
                         intoString:NULL] ;
        NSInteger nextNonInteger = [self scanLocation] ;
        [self setScanLocation:scanLocation] ;
        [self scanCharactersFromSet:floatSet
                         intoString:NULL] ;
        NSInteger nextNonFloat = [self scanLocation] ;
        [self setScanLocation:scanLocation] ;

        BOOL isInteger = (nextNonFlo

Re: Home-brewed code is 100X faster than -[NSScanner scanDecimal:] ??

2011-02-03 Thread glenn andreas

On Feb 3, 2011, at 3:05 PM, Jerry Krinock wrote:

> Thanks, Glenn, and thanks, Matt, for the explanation.
> 
> So I added a parameter to my method which one can use to get the performance 
> improvement only if they are willing to compromise accuracy and parsing of 
> numbers written in "e" notation.  Also, I chose a name which is less likely 
> to be used by Apple.
> 

Why not just use NSScanner's scanDouble: instead of trying to scan a string 
that you think is valid and then convert to a double?

For example, if the input string is @"123.45-67.89" you'll end up scanning the 
entire string (whereas scanDecimal: or scanDouble: will only scan up through the 
"5").

Using scanDouble: will give you back support for scientific notation and 
consistent handling of "unexpected" strings, and all the other subtleties that 
can occur with floating point parsing (for example, your code also misses 
handling things like "+123").
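
The two examples above can be reproduced in plain C (which also compiles inside an Objective-C file).  This is only an analogy, not what NSScanner does internally: strspn stands in for -scanCharactersFromSet:intoString: with the home-brew character set, and strtod stands in for -scanDouble:.  The helper names charset_span and parsed_span are made up for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the leading run of characters drawn from
   the home-brew set "0123456789-.", analogous to a character-set scan. */
size_t charset_span(const char *s) {
    return strspn(s, "0123456789-.");
}

/* Hypothetical helper: number of leading characters that a real
   floating-point parser (strtod) accepts as one number. */
size_t parsed_span(const char *s) {
    char *end = NULL;
    (void)strtod(s, &end);
    return (size_t)(end - s);
}
```

For "123.45-67.89" the character-set approach swallows all 12 characters while the parser stops after the 6 characters of "123.45".  For "+123" the character-set approach accepts nothing, while the parser accepts all 4 characters.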

Glenn Andreas  gandr...@gandreas.com 
The most merciful thing in the world ... is the inability of the human mind to 
correlate all its contents - HPL



Re: Home-brewed code is 100X faster than -[NSScanner scanDecimal:] ??

2011-02-03 Thread Jerry Krinock
Thanks, Glenn, and thanks, Matt, for the explanation.

So I added a parameter to my method which one can use to get the performance 
improvement only if they are willing to compromise accuracy and parsing of 
numbers written in "e" notation.  Also, I chose a name which is less likely to 
be used by Apple.

@interface NSScanner (BSSpeedy)

/*!
 @brief  Scans for a string representing a number value, returning the
 value by reference if found, and offering an option to improve performance
 if scanning of strings representing numbers in "e" notation is not
 required, and if double-precision accuracy is sufficient.
 
 @details  The performance improvement when using accurately:NO is typically
 100x, and autoreleased memory allocations are also greatly reduced.
 
 Invoke this method with NULL as number to simply scan past a string
 representing a number.
 
 @param  number  Upon return, contains the scanned value as either an
 NSNumber or NSDecimalNumber value.
 @param  accurately  If YES, the 'number' pointed to upon return will be
 an NSDecimalNumber, and strings representing numbers in "e" notation
 (i.e. a significand, followed by 'e' or 'E', followed by an exponent)
 will be properly scanned.
 If NO, the 'number' pointed to upon return will be an NSNumber whose value
 will represent the input string only to double precision, and strings
 representing numbers in "e" notation will not be scanned properly; the
 significand and the exponent will be scanned as separate numbers.
 
 @result   YES if the receiver finds a string that it can scan to a number
 value, otherwise NO.
 */
- (BOOL)scanBSNumber:(NSNumber**)number
  accurately:(BOOL)accurately ;

@end

@implementation NSScanner (BSSpeedy)

- (BOOL)scanBSNumber:(NSNumber**)number
          accurately:(BOOL)accurately {
    NSString* string = nil ;
    BOOL isDecimal ;

    if (accurately) {
        NSDecimal decimal ;
        isDecimal = [self scanDecimal:&decimal];
        // number may be NULL when the caller only wants to scan past
        if (isDecimal && number) {
            *number = [NSDecimalNumber decimalNumberWithDecimal:decimal] ;
        }
    }
    else {
        NSCharacterSet* decimalSet ;
        decimalSet = [NSCharacterSet
            characterSetWithCharactersInString:@"0123456789-."] ;
        isDecimal = [self scanCharactersFromSet:decimalSet
                                     intoString:&string] ;
        if (isDecimal && number) {
            double value ;
            value = [string doubleValue] ;  // See Note, below
            *number = [NSNumber numberWithDouble:value] ;
        }
    }

    return isDecimal ;

    /*
     Note.  At first, I'd written a more complicated implementation which
     checked if the number was an integer and if so used -numberWithInteger
     instead of -numberWithDouble, but then found that performance of the
     current implementation here is the same.  I suppose this is because
     we've been doing floating-point calculations in hardware for quite a
     few years now.
     */
}

@end



Re: Home-brewed code is 100X faster than -[NSScanner scanDecimal:] ??

2011-02-03 Thread glenn andreas

On Feb 3, 2011, at 10:35 AM, Jerry Krinock wrote:

> -[NSScanner scanDecimal:] takes an average of 4 milliseconds to scan a short 
> string of decimal digits, which means tens of seconds for thousands of scans, 
> which is unacceptable for my application.  Also, excessive memory allocations 
> require a local autorelease pool around each invocation.
> 
> Surprisingly, I was able to fix both problems by replacing -scanDecimal: with 
> a rather bone-headed home-brew implementation, using 
> -scanCharactersFromSet:intoString: instead.  The home-brew implementation 
> runs 100 to 150 times faster.  How can this be?
> 

Your scanner doesn't correctly handle all valid decimals.

First, you don't handle scientific notation.

More importantly, you do not handle the full range of NSDecimal, which is 
documented to be 38 _DECIMAL_ digits.  This is important because scanning 
0.1 into a double will result in loss of accuracy (since 0.1 can't be expressed 
exactly in binary), whereas NSDecimal will be able to handle it correctly.  
For that matter, the 52 bits of mantissa in a double give only approximately 16 
decimal digits (nowhere near the 38 decimal digits of NSDecimal).
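
The precision point can be seen in a few lines of plain C (valid in an Objective-C file as well).  This is only a sketch: format_with_20_digits is a made-up helper name, and DBL_DIG from <float.h> is the number of decimal digits a double is guaranteed to preserve.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: writes x with 20 significant digits, more than a
   double can hold, so that the binary rounding error becomes visible.
   Returns whatever snprintf returns. */
int format_with_20_digits(double x, char *buf, size_t size) {
    return snprintf(buf, size, "%.20g", x);
}

/* Number of decimal digits a double is guaranteed to round-trip. */
int guaranteed_double_digits(void) {
    return DBL_DIG;
}
```

Formatting 0.1 this way shows nonzero digits appearing after a long run of zeros instead of a clean "0.1", and DBL_DIG is 15 on IEEE 754 binary64 platforms, in line with the "approximately 16 decimal digits" estimate.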




Glenn Andreas  gandr...@gandreas.com 
The most merciful thing in the world ... is the inability of the human mind to 
correlate all its contents - HPL



Re: Home-brewed code is 100X faster than -[NSScanner scanDecimal:] ??

2011-02-03 Thread Matt Gough
Probably because you aren't comparing like-for-like; did you try comparing it 
with [NSScanner scanDouble:] instead?

Your code wouldn't cope with the HUGE range of numbers that scanDecimal could 
be required to cope with, nor the variety of formats that NSDecimalNumber could 
scan.

Do you really need NSDecimalNumber support? Seems that doubles are fine for 
your needs.

Matt Gough


On 3 Feb 2011, at 16:35, Jerry Krinock wrote:

> -[NSScanner scanDecimal:] takes an average of 4 milliseconds to scan a short 
> string of decimal digits, which means tens of seconds for thousands of scans, 
> which is unacceptable for my application.  Also, excessive memory allocations 
> require a local autorelease pool around each invocation.
> 
> Surprisingly, I was able to fix both problems by replacing -scanDecimal: with 
> a rather bone-headed home-brew implementation, using 
> -scanCharactersFromSet:intoString: instead.  The home-brew implementation 
> runs 100 to 150 times faster.  How can this be?
> 
> #import <Foundation/Foundation.h>
> 
> @interface NSScanner (Speedy)
> 
> - (BOOL)scanDecimalNumber:(NSNumber**)number ;
> 
> @end
> 
> @implementation NSScanner (Speedy)
> 
> /*
> Bone-headed home-brew implementation
> */
> - (BOOL)scanDecimalNumber:(NSNumber**)number {
>NSString* string = nil ;
>NSCharacterSet* decimalSet ;
>decimalSet = [NSCharacterSet 
> characterSetWithCharactersInString:@"0123456789-."] ;
>BOOL isDecimal = [self scanCharactersFromSet:decimalSet
>  intoString:&string] ;
>if (isDecimal) {
>double value ;
>value = [string doubleValue] ;  // See Note, below
>*number = [NSNumber numberWithDouble:value] ;
>}
> 
>return isDecimal ;
> 
>/*
> Note.  At first, I'd written a more complicated implementation which 
> checked
> if the number was an integer and if so used -numberWithInteger instead of
> -numberWithDouble, but then found that performance of the current
> implementation here is the same.  I suppose this is because we've been
> doing floating-point calculations in hardware for quite a few years now.
> */
> }
> 
> @end
> 
> NSTimeInterval TestScanner(
>   NSScanner *scanner,
>   BOOL homeBrew,
>   NSInteger exponent) {
>// We use a local autorelease pool, otherwise -scanDecimal: starts
>// using gigabytes of virtual memory, which makes both the HomeBrew
>// and scanDecimal methods take longer, and slow our Mac.
>NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init] ;
> 
>[scanner setScanLocation:0] ;
>double checksum = 0.0 ;
>NSDate* date = [NSDate date] ;
>while (![scanner isAtEnd]) {
>NSNumber* number = nil ;
>BOOL didScanNumber ;
>if (homeBrew) {
>didScanNumber = [scanner scanDecimalNumber:&number] ;
>}
>else {
>NSDecimal decimal;
>didScanNumber = [scanner scanDecimal:&decimal];
>if (didScanNumber) {
>number = [NSDecimalNumber decimalNumberWithDecimal:decimal];
>}
>}
>if (didScanNumber) {
>double value = [number doubleValue] ;
>checksum += value ;
>}
>else {
>[scanner setScanLocation:[scanner scanLocation] + 1] ;
>}
>}
> 
>/* The template includes four numbers: 1, -2, 3.5, -1.5.  These
> numbers sum to 1.  Therefore when we add together the sums
> of all the scanned numbers, the "checksum" should be equal
> to the number of times that the template appears in the
> scanner's string, which is 2^exponent. */
>BOOL checksumOK = checksum == pow(2,exponent) ;
> 
>NSTimeInterval elapsed = -[date timeIntervalSinceNow] ;
>NSLog(@"%@:  time = %9.3e seconds   checksum = %0.16f (%@)",
>  homeBrew ? @"Using Home-Brew" : @"Using -scanDecimal:",
>  elapsed,
>  checksum,
>  checksumOK ? @"OK" : @"Error!") ;
>[pool release] ;
> 
>return elapsed ;
> }
> 
> #define HOME_BREW YES
> #define SCAN_DECIMAL NO
> 
> int main(int argc, char *argv[]) {
>NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init] ;
> 
>NSString* template = @"+int:1 -int:-2 +float:3.5 -float:-1.5 " ;
>NSMutableString* s = [template mutableCopy] ;
>NSInteger i ;
>NSInteger exponent = 9 ;
>// Repeat the string 2^exponent times
>for (i=0; i<exponent; i++) {
>[s appendString:s] ;
>}
> 
>NSScanner* scanner = [[NSScanner alloc] initWithString:s] ;
> 
>NSTimeInterval elapsedScanDecimal = 0.0 ;
>NSTimeInterval elapsedHomeBrew = 0.0 ;
> 
>NSInteger j ;
>for (j=0; j<5; j++) {
>elapsedScanDecimal += TestScanner(scanner, SCAN_DECIMAL, exponent) ;
>elapsedHomeBrew += TestScanner(scanner, HOME_BREW, exponent) ;
>}
>NSLog (@"Speed Improvement HomeBrew/scanDecimal: %0.1f",
>   elapsedScanDecimal/elapsedHomeBrew)

Home-brewed code is 100X faster than -[NSScanner scanDecimal:] ??

2011-02-03 Thread Jerry Krinock
-[NSScanner scanDecimal:] takes an average of 4 milliseconds to scan a short 
string of decimal digits, which means tens of seconds for thousands of scans, 
which is unacceptable for my application.  Also, excessive memory allocations 
require a local autorelease pool around each invocation.

Surprisingly, I was able to fix both problems by replacing -scanDecimal: with a 
rather bone-headed home-brew implementation, using 
-scanCharactersFromSet:intoString: instead.  The home-brew implementation runs 
100 to 150 times faster.  How can this be?

#import <Foundation/Foundation.h>

@interface NSScanner (Speedy)

- (BOOL)scanDecimalNumber:(NSNumber**)number ;

@end

@implementation NSScanner (Speedy)

/*
 Bone-headed home-brew implementation
 */
- (BOOL)scanDecimalNumber:(NSNumber**)number {
    NSString* string = nil ;
    NSCharacterSet* decimalSet ;
    decimalSet = [NSCharacterSet
        characterSetWithCharactersInString:@"0123456789-."] ;
    BOOL isDecimal = [self scanCharactersFromSet:decimalSet
                                      intoString:&string] ;
    if (isDecimal) {
        double value ;
        value = [string doubleValue] ;  // See Note, below
        *number = [NSNumber numberWithDouble:value] ;
    }

    return isDecimal ;

    /*
     Note.  At first, I'd written a more complicated implementation which
     checked if the number was an integer and if so used -numberWithInteger
     instead of -numberWithDouble, but then found that performance of the
     current implementation here is the same.  I suppose this is because
     we've been doing floating-point calculations in hardware for quite a
     few years now.
     */
}

@end

NSTimeInterval TestScanner(
   NSScanner *scanner,
   BOOL homeBrew,
   NSInteger exponent) {
    // We use a local autorelease pool, otherwise -scanDecimal: starts
    // using gigabytes of virtual memory, which makes both the HomeBrew
    // and scanDecimal methods take longer, and slow our Mac.
    NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init] ;

    [scanner setScanLocation:0] ;
    double checksum = 0.0 ;
    NSDate* date = [NSDate date] ;
    while (![scanner isAtEnd]) {
        NSNumber* number = nil ;
        BOOL didScanNumber ;
        if (homeBrew) {
            didScanNumber = [scanner scanDecimalNumber:&number] ;
        }
        else {
            NSDecimal decimal;
            didScanNumber = [scanner scanDecimal:&decimal];
            if (didScanNumber) {
                number = [NSDecimalNumber decimalNumberWithDecimal:decimal];
            }
        }
        if (didScanNumber) {
            double value = [number doubleValue] ;
            checksum += value ;
        }
        else {
            [scanner setScanLocation:[scanner scanLocation] + 1] ;
        }
    }

    /* The template includes four numbers: 1, -2, 3.5, -1.5.  These
     numbers sum to 1.  Therefore when we add together the sums
     of all the scanned numbers, the "checksum" should be equal
     to the number of times that the template appears in the
     scanner's string, which is 2^exponent. */
    BOOL checksumOK = checksum == pow(2,exponent) ;

    NSTimeInterval elapsed = -[date timeIntervalSinceNow] ;
    NSLog(@"%@:  time = %9.3e seconds   checksum = %0.16f (%@)",
          homeBrew ? @"Using Home-Brew" : @"Using -scanDecimal:",
          elapsed,
          checksum,
          checksumOK ? @"OK" : @"Error!") ;
    [pool release] ;

    return elapsed ;
}

#define HOME_BREW YES
#define SCAN_DECIMAL NO

int main(int argc, char *argv[]) {
    NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init] ;

    NSString* template = @"+int:1 -int:-2 +float:3.5 -float:-1.5 " ;
    NSMutableString* s = [template mutableCopy] ;
    NSInteger i ;
    NSInteger exponent = 9 ;
    // Repeat the string 2^exponent times
    for (i=0; i<exponent; i++) {
        [s appendString:s] ;
    }