Sorry, it's 5:00am here and needless to say it's waaaay past my bedtime and
I'm making mistakes.
The comparison should have been between both ruby versions .... ugh.
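
For anyone who wants to run that comparison, a plain buffered-IO counterpart
of the mmap script quoted below might look roughly like the following. This
is only a sketch under assumptions: `top_words` is a hypothetical helper name,
`File.foreach` stands in for `Mmap#each_line`, the file names are the ones
from the quoted script, and it is untimed. It is not the earlier ruby script
from this thread.

```ruby
# Hypothetical helper: same counting logic as the quoted mmap script,
# but reading through Ruby's ordinary buffered IO (File.foreach).
def top_words(words_path, stopwords_path, limit = 20)
  stopwords = {}
  File.foreach(stopwords_path) { |line| stopwords[line.strip] = 1 }

  # Hash.new(0) makes missing keys start at zero, so no has_key? check is needed.
  count = Hash.new(0)
  File.foreach(words_path) do |line|
    word = line.strip
    count[word] += 1 unless stopwords.key?(word)
  end

  # Same comparator as the quoted script: descending by count.
  count.sort { |a, b| b[1] <=> a[1] }.take(limit)
end
```

Calling `top_words('words.txt', 'stopwords.txt')` and printing each pair as
"word -> count" should give the same kind of listing as the quoted script;
words with equal counts may come out in a different order.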

I'll let you play though.  Have a great night.

On Sat, Jan 15, 2022 at 4:57 AM Paul Procacci <pproca...@gmail.com> wrote:

> Hey John,
>
> One more follow up and then it's bedtime for me.  I wanted to further this
> discussion just a little bit more by porting the mmap solution that I
> applied to perl over to ruby.  Now all of a sudden, ruby is much, much
> faster.  My ruby source code follows:
>
> Goodnight!
>
> # ruby -W0 ./doit.rb | md5
> 786be54356a5832dcd1148c18de71fc8
> # perl ./doit2.pl | md5
> 786be54356a5832dcd1148c18de71fc8
>
>
> # truss -c ruby -W0 ./doit.rb
> <!-- snip -->
>                       ------------- ------- -------
>                         0.014111502    1855     260
>
> # truss -c perl ./doit2.pl
> <!-- snip -->
>                       ------------- ------- -------
>                         0.049820267     777      52
>
>
>
> -------------------------------------
> require 'mmap';
>
> stopwords = {}
> mmap_s = Mmap.new('stopwords.txt')
> mmap_s.advise(Mmap::MADV_SEQUENTIAL)
> mmap_s.each_line do |s|
>   s.strip!
>   stopwords[s] = 1
> end
>
> count = {}
> mmap_c = Mmap.new('words.txt')
> mmap_c.advise(Mmap::MADV_SEQUENTIAL)
> mmap_c.each_line do |s|
>   s.strip!
>   unless stopwords.has_key?(s)
>     if count.has_key?(s)
>       count[s] += 1
>     else
>       count[s] = 1
>     end
>   end
> end
>
> z = count.sort {|a1,a2| a2[1]<=>a1[1]}
> z.take(20).each do |s| puts "#{s[0]} -> #{s[1]}" end
>
> On Sat, Jan 15, 2022 at 3:48 AM Paul Procacci <pproca...@gmail.com> wrote:
>
>> Hey John,
>>
>> On Sat, Jan 15, 2022 at 3:04 AM Jon Smart <j...@smartown.nl> wrote:
>>
>>>
>>> Hello Paul
>>>
>>> Do you mean by undef $/ and with <$fh> we can read the file into memory
>>> at one time?
>>>
>>
>> In most cases the short answer is yes.
>> I have a quibble with your wording, however, given the 'geek' that I am.
>> 'At one time' .... not quite.  In your example there were over 4000 read(2)
>> syscalls made to the operating system, for instance.  This wouldn't have
>> been 'at one time'.  ;)
>>
>>
>>> Yes that would be faster b/c we don't need to read file by each line,
>>> which increases the disk IO.
>>>
>>>
>> It actually doesn't make it faster.
>> Perl buffers its reads, as do all modern programming languages.  If you
>> ask perl to give you 10 bytes it certainly will, but what you don't know is
>> that perl has really read up to 8192 bytes.  It only gave you what you
>> asked for and the rest is sitting in perl's buffers.
>> To put this another way, you can put 8192 newline characters in a file
>> and read this file line by line.  This doesn't equate to 8192 separate
>> read(2) syscalls ... it's just 1 read syscall.  It will be neither faster
>> nor slower.
>>
>>
>>
>>> Another questions:
>>> 1. what's the "truss" command?
>>>
>>
>> truss is akin to strace.  If you're on linux, you can install strace and
>> get much the same kind of utility.
>> It allows you to trace system calls and see how much of a given program's
>> time is spent waiting on the kernel and/or how often it asks the kernel
>> to do something.
>>
>>> 2. what's the syntax "<:mmap"?
>>>
>> mmap is a method of mapping a file (among other things) into memory on
>> an on-demand basis.
>> Given the example you provided, this is actually where the speed up comes
>> from.  This is because my version removes the 4000+ read(2) syscalls in
>> favor of just 2 mmap(2) syscalls.
>>
>>> Thank you.
>>
>>
>> ~Paul
>>
>
>
> --
> __________________
>
> :(){ :|:& };:
>
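
To illustrate the buffering point from the quoted reply above (8192 newlines
read line by line is still one read, not 8192), here is a toy model in ruby.
It is a sketch, not how perl or ruby implement their buffers internally:
`CountingLineReader` is a hypothetical class, the 8192-byte chunk size is
taken from the figure quoted above, and it counts its own `sysread` calls
rather than tracing real syscalls the way truss or strace would.

```ruby
require 'tempfile'

# Toy buffered line reader (hypothetical, for illustration only): it pulls
# data in 8192-byte chunks with sysread and serves lines out of its own
# buffer, counting how many reads actually returned data.
class CountingLineReader
  CHUNK = 8192

  attr_reader :data_reads

  def initialize(io)
    @io = io
    @buffer = String.new
    @data_reads = 0
    @eof = false
  end

  # Returns the next line (including its newline), or nil at end of input.
  def gets
    fill until @eof || @buffer.include?("\n")
    return nil if @buffer.empty?

    idx = @buffer.index("\n")
    idx ? @buffer.slice!(0..idx) : @buffer.slice!(0..-1)
  end

  private

  # One trip to the underlying IO; an EOFError marks the end of input.
  def fill
    @buffer << @io.sysread(CHUNK)
    @data_reads += 1
  rescue EOFError
    @eof = true
  end
end

# Build a file holding exactly 8192 newline characters, then read it line by line.
Tempfile.create('newlines') do |f|
  f.write("\n" * 8192)
  f.flush
  File.open(f.path, 'rb') do |io|
    reader = CountingLineReader.new(io)
    lines = 0
    lines += 1 while reader.gets
    puts "lines=#{lines} data_reads=#{reader.data_reads}"
  end
end
```

The reader hands back 8192 lines from a single chunk-sized read; the one
further attempt only discovers end of file and is not counted. A real trace
would likewise show a final read returning nothing.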


-- 
__________________

:(){ :|:& };:
