Using LZW or similar compression is likely to give you substantially better file compression, if that's what you're after. Of course you'd have to re-expand it to use it.
The killer here, I would guess, is the use of -[NSArray indexOfObject:]: it has to perform a string-by-string linear search until it finds a match, and it does that once for every word in the text. Instead, if you keep each word in an NSMutableSet (which uses hashing internally), you can test for membership in roughly constant time.

Also, using -componentsSeparatedByString: to get an array of words is simple, but it is going to be a killer on time and space. Instead you could parse and index the text as you go using NSScanner, so that an array of words is never made: as you scan each word, add it to the set (a set only ever keeps a single instance of each object). At the end of the scan, the set contains all the unique words.

I would suggest returning that set rather than putting it back together as a string. As a set it will be more useful for membership testing, and you can easily convert it to a string or array as you need.

--Graham

On 10/02/2011, at 2:04 PM, Brad Stone wrote:

> I made this code to remove any duplicate words from a large group of text.
> The result is stored in an index file so the text doesn't need to make sense.
> I'm removing the duplicates to save space in the index file. I was
> wondering if anyone had a suggestion for a more efficient way to
> accomplish this. I'm guessing the separations and joins are taking up
> memory and slowing things down (even though I'm not positive about that).
> Using this code reduced the index file size from 4.7MB to 2.7MB.
>
> Thanks
>
> - (NSString *)abstractText:(NSString *)srcString {
>     NSMutableArray *resultArray = [[NSMutableArray alloc] init];
>     NSArray *textArray = [srcString componentsSeparatedByString:@" "];
>     for (NSString *s in textArray) {
>
>         s = [s stringByTrimmingCharactersInSet:[NSCharacterSet alphanumericCharacterSet]];
>         s = [s lowercaseString];
>
>         if ([resultArray indexOfObject:s] == NSNotFound) {
>             [resultArray addObject:s];
>         }
>     }
>
>     NSString *resultString = nil;
>     if ([resultArray count] > 0) {
>         resultString = [resultArray componentsJoinedByString:@" "];
>     } else {
>         resultString = srcString;
>     }
>     return resultString;
> }
> _______________________________________________
>
> Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/cocoa-dev/graham.cox%40bigpond.com
>
> This email sent to graham....@bigpond.com
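
For reference, the NSScanner-plus-set approach suggested in the reply might look roughly like the sketch below. It is only an illustration, not code from the thread: the function name UniqueWordsInString is hypothetical, and it assumes a "word" is any run of alphanumeric characters, so punctuation (including apostrophes) acts as a separator. That differs from the quoted method, which splits on single spaces only.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original thread): collects the unique,
// lowercased words of srcString into a set while scanning, so that no
// intermediate array of words is ever built.
static NSSet *UniqueWordsInString(NSString *srcString)
{
    NSMutableSet *words = [NSMutableSet set];

    // Assumption: a word is a run of alphanumerics; everything else separates words.
    NSCharacterSet *wordChars = [NSCharacterSet alphanumericCharacterSet];
    NSCharacterSet *separators = [wordChars invertedSet];

    NSScanner *scanner = [NSScanner scannerWithString:srcString];
    [scanner setCharactersToBeSkipped:separators];

    NSString *word = nil;
    while (![scanner isAtEnd]) {
        if ([scanner scanCharactersFromSet:wordChars intoString:&word]) {
            // Adding a word that is already present leaves the set unchanged.
            [words addObject:[word lowercaseString]];
        } else {
            break; // defensive: never loop without making progress
        }
    }
    return words;
}
```

Membership can then be tested with [set containsObject:@"cat"], and if a string is still wanted for the index file, [[set allObjects] componentsJoinedByString:@" "] produces one. Note that the order of the words in that string is unspecified.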