== Quote from Caligo (iteronve...@gmail.com)'s article
> This is the page that would require your attention:
> http://unthought.net/c++/c_vs_c++.html
>
> I'm going to ignore the C version because it's ugly and uses a hash. I'm
> also going to ignore the fastest C++ version because it uses a digital trie
> (it's very fast but extremely memory hungry; the complexity is constant over
> the size of the input and linear over the length of the word being searched
> for). I just wanted to focus on the language and the std library and not
> have to implement a data structure.
>
> Here is the C++ code:
>
> #include <unordered_set>
> #include <string>
> #include <iostream>
> #include <stdio.h>
>
> int main(int argc, char* argv[]){
>     using namespace std;
>     char buf[8192];
>     string word;
>     unordered_set<string> wordcount;
>     while( scanf("%s", buf) != EOF ) wordcount.insert(buf);
>     cout << "Words: " << wordcount.size() << endl;
>     return 0;
> }
>
> For D I pretty much used the example from TDPL. As far as I can tell, the
> associative array used is closer to std::map (or maybe std::unordered_map?)
> than std::unordered_set, but I don't know of any other data structures in D
> for this (I'm still learning).
>
> Here is the D code:
>
> import std.stdio;
> import std.string;
>
> void main(){
>     size_t[string] dictionary;
>     foreach(line; stdin.byLine()){
>         foreach(word; splitter(strip(line))){
>             if(word in dictionary) continue;
>             dictionary[word.idup] = 1;
>         }
>     }
>     writeln("Words: ", dictionary.length);
> }
>
> Here are the measurements (average of 3 runs):
>
> C++
> ===
> Data size: 990K with 23K unique words
> real    0m0.055s
> user    0m0.046s
> sys     0m0.000s
>
> Data size: 9.7M with 23K unique words
> real    0m0.492s
> user    0m0.470s
> sys     0m0.013s
>
> Data size: 5.1M with 65K unique words
> real    0m0.298s
> user    0m0.277s
> sys     0m0.013s
>
> Data size: 51M with 65K unique words
> real    0m2.589s
> user    0m2.533s
> sys     0m0.070s
>
> DMD D 2.051
> ===
> Data size: 990K with 23K unique words
> real    0m0.064s
> user    0m0.053s
> sys     0m0.006s
>
> Data size: 9.7M with 23K unique words
> real    0m0.513s
> user    0m0.487s
> sys     0m0.013s
>
> Data size: 5.1M with 65K unique words
> real    0m0.305s
> user    0m0.287s
> sys     0m0.007s
>
> Data size: 51M with 65K unique words
> real    0m2.683s
> user    0m2.590s
> sys     0m0.103s
>
> GDC D 2.051
> ===
> Data size: 990K with 23K unique words
> real    0m0.146s
> user    0m0.140s
> sys     0m0.000s
>
> Data size: 9.7M with 23K unique words
> Segmentation fault
>
> Data size: 5.1M with 65K unique words
> Segmentation fault
>
> Data size: 51M with 65K unique words
> Segmentation fault
>
> GDC fails for some reason with a large number of unique words and/or large
> data. Also, GDC doesn't always give correct results; the word count is
> usually off by a few hundred.
>
> D and C++ are very close. Without scanf() C++ is almost twice as slow.
> Also, using std::unordered_set, the performance almost doubles.
>
> I'm interested to see a better D version than the one I posted.
>
> P.S.
> No flame wars please.
System details, compiler flags, and the test data you used would be helpful. Otherwise it's hard to be sure what you mean by "doesn't always give correct results". :~)
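
For what it's worth, here is the kind of variant I'd try first. It's only an untested, unbenchmarked sketch, and it assumes a newer Phobos than the 2.051 you tested, since byLineCopy isn't in that release. The associative array is used purely as a set, and because byLineCopy gives each line as its own string, the words can be kept as slices of that line and the per-word .idup goes away:

import std.stdio;
import std.algorithm : splitter;
import std.string : strip;

void main(){
    bool[string] words;                        // associative array used as a set; the value is ignored
    foreach(line; stdin.byLineCopy()){         // each line arrives as a freshly allocated string
        foreach(word; splitter(strip(line)))   // lazy whitespace split; each word is a slice of the line
            words[word] = true;                // no per-word .idup needed
    }
    writeln("Words: ", words.length);
}

Whether that actually beats your version is exactly the kind of thing that needs your test data (and flags) to answer.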