I spoke too soon. On a second look, I found that it was possible to make the
Match strings point to a single object. I committed this optimization in r4964
and verified that it introduces no regressions.
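
For context, the behavior this has to preserve can be sketched in plain Ruby.
This is only an illustration of the Ruby-level semantics, not of the MacRuby
internals or of what r4964 changes: the names `source` and `matches` are made
up for the example, and it relies only on the standard `$~` variable and
`MatchData#string`.

```ruby
# Hypothetical input; any string with a few words will do.
source = "foo bar baz"

# Collect the match data that is visible inside each block call.
matches = []
source.scan(/\w+/) { matches << $~ }

# Every yield exposes its own match, and each match can hand back
# the string it was matched against.
matched_words   = matches.map { |m| m[0] }
matched_sources = matches.map { |m| m.string }

puts matched_words.inspect
puts matched_sources.uniq.inspect
```

If every one of those match objects carries its own copy of the string, the
cost grows with the number of matches, which is where sharing the string helps.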
Before:
$ time /usr/local/bin/macruby -e "text=File.read('/tmp/foo.txt');
freq=Hash.new(0); text.scan(/\w+/) {}"
real 0m2.430s
user 0m1.628s
sys 0m1.030s
After :)
$ time ./miniruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0);
text.scan(/\w+/) {}"
real 0m0.121s
user 0m0.100s
sys 0m0.015s
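
To compare the block and non-block forms on another build, something along
these lines should do. It is a rough sketch using the standard Benchmark
library; the sample text and the two helper method names are placeholders I
made up, not what was used for the timings above.

```ruby
require 'benchmark'

# Placeholder text; the timings above read /tmp/foo.txt instead.
text = "the quick brown fox jumps over the lazy dog " * 2_000

# Test 1 style: #scan yields each word to a block.
def count_with_block(text)
  freq = Hash.new(0)
  text.scan(/\w+/) { |word| freq[word] += 1 }
  freq
end

# Test 2 style: #scan builds an array first, then we iterate over it.
def count_with_array(text)
  freq = Hash.new(0)
  text.scan(/\w+/).each { |word| freq[word] += 1 }
  freq
end

block_freq = nil
array_freq = nil
block_time = Benchmark.realtime { block_freq = count_with_block(text) }
array_time = Benchmark.realtime { array_freq = count_with_array(text) }

puts format("scan with block: %.3fs", block_time)
puts format("scan.each:       %.3fs", array_time)
```

Both forms should produce the same frequency table, so only the timings differ.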
Laurent
On Dec 1, 2010, at 2:46 PM, Laurent Sansonetti wrote:
> Hi Yasu,
>
> I ran your tests in Shark. Tests 1 and 3 are significantly slower because
> #scan and #gsub are called with a block, which means MacRuby has to create a
> new Match object for every yield, to conform to the Ruby specs. Each Match
> object contains a copy of the original string.
>
> MacRuby has a slow memory allocator (much slower than the original Ruby), so
> one must be careful not to allocate too many objects. This is something we
> are working on. Unfortunately, MacRuby doesn't fully control the object
> allocator, as it resides in the libauto library (the Objective-C garbage
> collector).
>
> In your case, I recommend using the method in Test 2, which is to not pass a
> block.
>
> It is possible that we can reduce memory usage when doing regexps in MacRuby.
> However, after a quick look at the source code, I am not sure anything can be
> done for 0.8 :(
>
> Laurent
>
> On Dec 1, 2010, at 9:46 AM, Yasu Imao wrote:
>
>> Hello,
>>
>> I'm rewriting an app for text analysis in MacRuby, which I originally wrote
>> in RubyCocoa. But I encountered a serious performance issue in MacRuby,
>> which is related to processing text using regular expressions.
>>
>> I'm wondering if this will be taken care of in the near future (or already
>> done in 0.8?).
>>
>> Below are my simple tests. The first two are essentially the same, with a
>> slightly different approach. Both simply count the frequency of each word.
>> I want to use the first approach for other kinds of processing, not for
>> counting word frequencies. The third one tests the speed of String#gsub with
>> a regular expression. I felt String#gsub was slow in my app, so I just
>> wanted to test how slow it is compared to RubyCocoa.
>>
>>
>> Test 1 - scan-block
>>
>> freq = Hash.new(0)
>> text.scan(/\w+/) do |word|
>>   freq[word] += 1
>> end
>>
>>
>> Test 2 - scan array.each
>>
>> freq = Hash.new(0)
>> text.scan(/\w+/).each do |word|
>>   freq[word] += 1
>> end
>>
>>
>> Test 3 - gsub upcase
>>
>> text.gsub!(/\w+/){|x| x.upcase}
>>
>>
>> The results are in seconds. The original text is in English with 8154
>> words. Each process was repeated 10 times to calculate processing times.
>> Each test was run 3 times.
>>
>> Ruby 1.8.7    Test1 - scan-block:        0.542,  0.502,  0.518
>> Ruby 1.8.7    Test2 - scan array.each:   0.399,  0.392,  0.399
>> Ruby 1.8.7    Test3 - gsub upcase:       0.384,  0.349,  0.390
>>
>> MacRuby 0.7.1 Test1 - scan-block:       27.612, 27.707, 27.453
>> MacRuby 0.7.1 Test2 - scan array.each:   3.556,  3.616,  3.554
>> MacRuby 0.7.1 Test3 - gsub upcase:      27.613, 26.826, 27.327
>>
>>
>> Thanks,
>> Yasu
>> _______________________________________________
>> MacRuby-devel mailing list
>> [email protected]
>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel