Re: [julia-users] Re: TinySegmenter benchmark
Finally, I compared with the C++ and Go versions using TinySegmenterMaker. https://github.com/shogo82148/TinySegmenterMaker/pull/10

The resulting times (in seconds, for 100 loops over a text file) are as follows:

Ruby    C++   Perl  Node.js  Go     Python  Julia
132.98  48    134   105.31   10.50  111.85  11.70

After my blog post, ikawaha optimized the Go version using the same approach we did, and Go is now faster than Julia.
Re: [julia-users] Re: TinySegmenter benchmark
I'm kind of surprised that C++ is so slow. I would imagine that anything you can do performance-wise in Go or Julia, you ought to be able to do in C++. Any idea what's going on there?
Re: [julia-users] Re: TinySegmenter benchmark
Note that the Go version went one step further and packed the tuples into 64-bit integers where possible. We could do the same thing, though at this point we seem to be hitting the point of diminishing returns.
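For concreteness, here is a minimal sketch (hypothetical names, not the actual TinySegmenter.jl or Go code) of what packing a pair of Chars into a single 64-bit integer key could look like in Julia. Unicode codepoints fit in 21 bits, so two of them fit comfortably in a UInt64:

```julia
# Sketch only: pack two Chars into one UInt64 so the hash table
# can key on a single integer instead of a Tuple{Char,Char}.
pack(a::Char, b::Char) = (UInt64(a) << 32) | UInt64(b)

scores = Dict{UInt64,Int}()      # instead of Dict{Tuple{Char,Char},Int}
scores[pack('日', '本')] = 10

get(scores, pack('日', '本'), 0) # lookup hashes one integer
```

The shift amount and dictionary layout here are illustrative; the actual Go change packed keys in whatever way fit its tables.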
Re: [julia-users] Re: TinySegmenter benchmark
The C++ version is basically transcribed from the JavaScript version and constructs tons of temporary strings. The key improvement we made (and Go subsequently adopted) is to use tuples of Char as the hash-table keys instead. I'm pretty pleased that Julia is within 10% of the Go version, and also that Go benefited from our optimization work.
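As a minimal illustration (made-up scores, not the real trained model weights), keying the score table on a Tuple{Char,Char} means each lookup hashes a stack-allocated tuple rather than building and discarding a temporary String:

```julia
# Hypothetical bigram scores; the real tables come from the trained model.
const BIGRAM = Dict(('あ', 'い') => 126, ('い', 'う') => -32)

# Tuple key: no temporary String allocated per lookup.
score(a::Char, b::Char) = get(BIGRAM, (a, b), 0)

# The slow pattern this replaces (a fresh String per lookup):
# get(BIGRAM_STR, string(a, b), 0)
```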
Re: [julia-users] Re: TinySegmenter benchmark
Masahiro Nakagawa, a.k.a. repeatedly, pointed out my mistakes in the benchmark, so I re-benchmarked:

Node.js  Python2  Python3  Julia  Ruby
9.62     93.08    23.94    1.46   19.44

- the loop count for Python was 10 times smaller than for the other languages
- repeatedly optimized the Ruby implementation
- changed the loop count from 100 to 10

repeatedly also benchmarked a D version; I will try it after updating to El Capitan :)
http://repeatedly.github.io/ja/2015/10/tinysegmenter-benchmark-and-d/
[julia-users] Re: TinySegmenter benchmark
Thanks to Steven's great help, I learned a lot about optimizing string operations in Julia.

Finally, I wrote up this episode on my blog (in Japanese only, sorry).
http://chezou.hatenablog.com/entry/2015/10/21/234317

-- chezou

On Wednesday, October 21, 2015 at 2:49:56 AM UTC+9, Steven G. Johnson wrote:
>
> I thought people might be interested in this cross-language benchmark of a realistic application:
>
> https://github.com/chezou/TinySegmenter.jl/issues/8
>
> TinySegmenter is an algorithm for breaking Japanese text into words, and it has been ported by several authors to different programming languages. Michiaki Ariga (@chezou) ported it to Julia, and after optimizing it a bit with me he ran some benchmarks comparing the performance to the different TinySegmenter ports. The resulting times (in seconds) for different languages were:
>
> JavaScript  Python2  Python3  Julia  Ruby
> 121.04      92.85    29.64    12.36  (933+)
>
> The algorithm basically consists of looping over the characters in a string, plugging tuples of consecutive characters into a dictionary of "scores", and spitting out a word break when the score exceeds a threshold. The biggest speedup in optimizing the Julia code came from using tuples of Char (characters) rather than concatenating the chars into strings (which avoids the need to create and then discard lots of temporary strings by exploiting Julia's fast tuples).
>
> The Julia implementation is also different from the others in that it is the only one that operates completely in-place on the text, without allocating large temporary arrays of characters and character categories, and returns SubStrings rather than copies of the words. This sped things up only slightly, but saves a lot of memory for a large text.
>
> --SGJ
>
> PS. Also, Julia's ability to explicitly type dictionaries caught a bug in the original implementation, where the author had missed the fact that the グ character is actually formed by two codepoints in Unicode.
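The loop Steven describes can be sketched as a toy segmenter (made-up pair scores and threshold, not the trained TinySegmenter model); note how it returns SubString views into the original text rather than copies:

```julia
# Toy sketch: score each adjacent character pair and cut the text
# whenever the score exceeds a threshold, yielding SubString views.
const PAIR_SCORE = Dict(('a', ' ') => 100, (' ', 'b') => -50)  # made-up

function toy_segment(text::String; threshold::Int = 0)
    words = SubString{String}[]
    start = firstindex(text)
    prev = '\0'
    for i in eachindex(text)
        c = text[i]
        if i != firstindex(text) && get(PAIR_SCORE, (prev, c), -1) > threshold
            # Emit a view of the word so far; no bytes are copied.
            push!(words, SubString(text, start, prevind(text, i)))
            start = i
        end
        prev = c
    end
    push!(words, SubString(text, start, lastindex(text)))
    return words
end

toy_segment("a b")  # cuts after 'a', since the ('a', ' ') score > 0
```

Using eachindex/prevind keeps the loop correct for variable-width UTF-8 text, which is why the real implementation can operate in-place.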
Re: [julia-users] Re: TinySegmenter benchmark
That's an excellent performance comparison case study! Nice work, Chezou, and nice blog post (the Google translation is pretty readable).
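The two-codepoint issue with グ mentioned in Steven's PS is easy to demonstrate: in composed (NFC) form グ is the single codepoint U+30B0, but in decomposed (NFD) form it is ク (U+30AF) followed by the combining dakuten (U+3099), which can never match a single-Char dictionary key:

```julia
# グ in composed (NFC) form is one codepoint, U+30B0;
# in decomposed (NFD) form it is ク (U+30AF) + combining dakuten (U+3099).
composed   = "\u30B0"
decomposed = "\u30AF\u3099"

length(composed)        # 1 codepoint: fits in a Char key
length(decomposed)      # 2 codepoints: cannot be a single Char
composed == decomposed  # false: codepoint-wise they differ
```

This is the kind of mismatch an explicitly typed Dict{Char,...} surfaces at construction time instead of silently missing at lookup time.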
Re: [julia-users] Re: TinySegmenter benchmark
Very readable indeed, and I am always happy to see more NLP code in Julia! Keep up the good work!

Pontus