Using LZW or a similar general-purpose compressor is likely to shrink the
file substantially more than removing duplicate words does, if file size is
what you're after. Of course, you'd have to re-expand it to use it.

The killer here, I would guess, is the use of -[NSArray indexOfObject:]. For 
each word it has to perform a string-by-string linear search until it finds a 
match, so the loop as a whole is quadratic in the number of unique words. 
Instead, if you keep each word in an NSMutableSet (which uses hashing 
internally), you can test for membership in roughly constant time.
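
As a rough sketch of that first change only (the function name 
UniqueWordsInOrder is made up for illustration, and the punctuation trimming 
from the original is left out), a set can track what has been seen while an 
array keeps the first-seen order:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not from the original post: returns the lowercased
// words of srcString with duplicates removed, in first-seen order.
static NSArray *UniqueWordsInOrder(NSString *srcString) {
    NSMutableSet *seen = [NSMutableSet set];          // hashed membership test
    NSMutableArray *result = [NSMutableArray array];  // preserves word order

    for (NSString *word in [srcString componentsSeparatedByString:@" "]) {
        NSString *key = [word lowercaseString];
        if ([key length] == 0) {
            continue;  // empty strings come from runs of consecutive spaces
        }
        if (![seen containsObject:key]) {
            [seen addObject:key];
            [result addObject:key];
        }
    }
    return result;
}
```

This still splits the whole string up front, so it only addresses the lookup 
cost, not the memory used by the intermediate array.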

Also, using -componentsSeparatedByString: to get an array of words is simple, 
but it is going to be costly in time and space, because every word, duplicates 
included, is held in memory at once. Instead you could parse and index the 
text as you go using NSScanner, so that an array of words is never made: as 
you scan each word, add it to the set (a set keeps only a single instance of 
equal objects). At the end of the scan, the set contains all the unique words. 
I would suggest returning that set rather than putting it back together as a 
string. As a set it will be more useful for membership testing, and you can 
easily convert it to a string or array as you need.
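
Something along these lines is what I have in mind. Treat it as a minimal 
sketch: the function name UniqueWordSet is invented, and it assumes a "word" 
is any run of letters and digits, so "don't" would come out as "don" and "t". 
(The quoted code trims alphanumericCharacterSet from each word, which looks 
inverted to me; I'm assuming the intent was to strip punctuation.)

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the unique, lowercased words of srcString
// without building an intermediate array of every word.
static NSSet *UniqueWordSet(NSString *srcString) {
    NSMutableSet *words = [NSMutableSet set];
    NSCharacterSet *wordChars = [NSCharacterSet alphanumericCharacterSet];
    NSCharacterSet *separators = [wordChars invertedSet];

    NSScanner *scanner = [NSScanner scannerWithString:srcString];
    [scanner setCharactersToBeSkipped:nil];  // separators are handled below

    while (![scanner isAtEnd]) {
        // Skip any run of non-word characters (spaces, punctuation, newlines).
        [scanner scanCharactersFromSet:separators intoString:NULL];

        // Read the next run of word characters, if there is one.
        NSString *word = nil;
        if ([scanner scanCharactersFromSet:wordChars intoString:&word]) {
            [words addObject:[word lowercaseString]];  // duplicates are ignored
        }
    }
    return words;
}
```

If you do need a string for the index file, 
[[set allObjects] componentsJoinedByString:@" "] will produce one, though the 
word order of a set is unspecified.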

--Graham






On 10/02/2011, at 2:04 PM, Brad Stone wrote:

> I made this code to remove any duplicate words from a large group of text.
> The result is stored in an index file so the text doesn't need to make sense.
> I'm removing the duplicates to save space in the index file.  I was
> wondering if anyone had a suggestion for a more efficient way to
> accomplish this.  I'm guessing the separations and joins are taking up
> memory and slowing things down (even though I'm not positive about that).
> Using this code reduced the index file size from 4.7MB to 2.7MB.
> 
> Thanks
> 
> - (NSString *)abstractText:(NSString *)srcString {
>       NSMutableArray *resultArray = [[NSMutableArray alloc] init];
>       NSArray *textArray = [srcString componentsSeparatedByString:@" "];
>       for (NSString *s in textArray) {
>               
>               s = [s stringByTrimmingCharactersInSet:[NSCharacterSet alphanumericCharacterSet]];
>               s = [s lowercaseString];
>               
>               if ([resultArray indexOfObject:s] == NSNotFound) {
>                       [resultArray addObject:s];
>               }
>       }
>       
>       NSString *resultString = nil;
>       if ([resultArray count] > 0) {
>               resultString = [resultArray componentsJoinedByString:@" "];
>       } else {
>               resultString = srcString;
>       }
>       return resultString;
> }
> _______________________________________________
> 
> Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
> 
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
> 
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/cocoa-dev/graham.cox%40bigpond.com
> 
> This email sent to graham....@bigpond.com
