This is the page I'm referring to: http://unthought.net/c++/c_vs_c++.html
I'm going to ignore the C version because it's ugly and does its own hashing. I'm also going to ignore the fastest C++ version because it uses a digital trie (it's very fast but extremely memory hungry; lookups are constant in the size of the input and linear in the length of the word being searched for). I just wanted to focus on the language and the standard library, not implement a data structure myself.

Here is the C++ code:

#include <unordered_set>
#include <string>
#include <iostream>
#include <stdio.h>

int main(int argc, char* argv[]){
    using namespace std;
    char buf[8192];
    string word;
    unordered_set<string> wordcount;
    // scanf() reads whitespace-delimited words; the set keeps only unique ones
    while( scanf("%s", buf) != EOF )
        wordcount.insert(buf);
    cout << "Words: " << wordcount.size() << endl;
    return 0;
}

For D I pretty much used the example from TDPL. As far as I can tell, the associative array used here is closer to std::map (or maybe std::unordered_map?) than to std::unordered_set, but I don't know of any other data structure in D for this (I'm still learning; there's a set-style sketch at the end of this post). Here is the D code:

import std.stdio;
import std.string;

void main(){
    size_t[string] dictionary;
    foreach(line; stdin.byLine()){
        foreach(word; splitter(strip(line))){
            if(word in dictionary) continue;
            // byLine() reuses its buffer, so the key has to be duplicated
            dictionary[word.idup] = 1;
        }
    }
    writeln("Words: ", dictionary.length);
}

Here are the measurements (average of 3 runs):

C++
===

Data size: 990K with 23K unique words
real    0m0.055s
user    0m0.046s
sys     0m0.000s

Data size: 9.7M with 23K unique words
real    0m0.492s
user    0m0.470s
sys     0m0.013s

Data size: 5.1M with 65K unique words
real    0m0.298s
user    0m0.277s
sys     0m0.013s

Data size: 51M with 65K unique words
real    0m2.589s
user    0m2.533s
sys     0m0.070s

DMD D 2.051
===

Data size: 990K with 23K unique words
real    0m0.064s
user    0m0.053s
sys     0m0.006s

Data size: 9.7M with 23K unique words
real    0m0.513s
user    0m0.487s
sys     0m0.013s

Data size: 5.1M with 65K unique words
real    0m0.305s
user    0m0.287s
sys     0m0.007s

Data size: 51M with 65K unique words
real    0m2.683s
user    0m2.590s
sys     0m0.103s

GDC D 2.051
===

Data size: 990K with 23K unique words
real    0m0.146s
user    0m0.140s
sys     0m0.000s

Data size: 9.7M with 23K unique words
Segmentation fault

Data size: 5.1M with 65K unique words
Segmentation fault

Data size: 51M with 65K unique words
Segmentation fault

GDC fails for some reason with a large number of unique words and/or large input. Also, GDC doesn't always give correct results; the word count is usually off by a few hundred.

D and C++ are very close. Without scanf() (reading through iostreams instead; see the cin-based sketch at the end of this post) the C++ version is almost twice as slow, and using std::unordered_set instead of std::set almost doubles the performance. I'm interested to see a better D version than the one I posted.

P.S. No flame wars please.
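Here is the set-style D sketch mentioned above. Since I don't know of a dedicated set container in Phobos, it just (ab)uses the built-in associative array with a throwaway bool value, so only the keys matter. It follows the same skeleton as my version; I haven't benchmarked it, and whether it's any faster than the size_t[string] version I can't say.

import std.stdio;
import std.string;

void main(){
    // The associative array stands in for a set: the bool value is a dummy,
    // only key membership matters.
    bool[string] seen;
    foreach(line; stdin.byLine()){
        foreach(word; splitter(strip(line))){
            if(word in seen) continue;
            seen[word.idup] = true; // byLine() reuses its buffer, hence idup
        }
    }
    writeln("Words: ", seen.length);
}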
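And this is roughly what I mean by the "without scanf()" variant of the C++ program: reading words through cin instead. It's only a sketch of the idea; the timings above are all for the scanf() version.

#include <unordered_set>
#include <string>
#include <iostream>

int main(){
    using namespace std;
    // Same idea as the scanf() version, but words come in through iostreams.
    // ios::sync_with_stdio(false) usually narrows the gap to scanf().
    string word;
    unordered_set<string> words;
    while( cin >> word )
        words.insert(word);
    cout << "Words: " << words.size() << endl;
    return 0;
}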