Try this rather small C++ program...it will more than likely be a LOT faster
than anything you could do in Hadoop.  Most of the 20 seconds you're seeing is
fixed job-submission and JVM startup overhead, not actual processing, and a
1.5MB file is nowhere near big enough to amortize that.  Hadoop is not the
hammer for every nail.  Too many people think that any "cluster" solution will
automagically scale their problem...'tain't true.

I'd appreciate hearing your results with this.

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " [filename]" << endl;
        return -1;
    }
    ifstream in(argv[1]);
    if (!in) {
        perror(argv[1]);
        return -1;
    }
    string str;
    int n = 0;
    // operator>> returns the stream, which tests false once extraction
    // fails at end of file, so every whitespace-delimited word is counted
    // exactly once.  (The old eof() check dropped the last word when the
    // file had no trailing whitespace.)
    while (in >> str) {
        ++n;
        //cout << str << endl;
    }
    in.close();
    cout << n << " words" << endl;
    return 0;
}
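
If you save that as, say, wordcount.cpp (the file name is just an example),
something like "g++ -O2 -o wordcount wordcount.cpp" should build it, and
"./wordcount yourfile.txt" prints the count.  Running it under the shell's
time command against the same 1.5MB file would give you a fair comparison
with the Hadoop job.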

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate



________________________________________
From: Igor Bubkin [igb...@gmail.com]
Sent: Tuesday, February 01, 2011 2:19 AM
To: common-iss...@hadoop.apache.org
Cc: common-user@hadoop.apache.org
Subject: EXTERNAL:How to speed up of Map/Reduce job?

Hello everybody

I have a problem. I installed Hadoop on a 2-node cluster and ran the WordCount
example. It takes about 20 seconds to process a 1.5 MB text file. We want to
use Map/Reduce in real time (interactively, driven by users' requests). A user
can't wait 20 seconds for a response; that is too long. Is it possible to
reduce the Map/Reduce job time, or am I misunderstanding something?

BR,
Igor Babkin, Mifors.com
