Re: How to speed up a Map/Reduce job?

2011-02-03 Thread madhu phatak
Most Hadoop use cases involve processing large volumes of data. In real-time
applications, the data provided by the user is relatively small, and in that
case Hadoop is not advisable.
On Tue, Feb 1, 2011 at 10:01 PM, Black, Michael (IS)  wrote:

> Try this rather small C++ program...it will more than likely be a LOT
> faster than anything you could do in Hadoop.  Hadoop is not the hammer for
> every nail.  Too many people think that any "cluster" solution will
> automagically scale their problem...tain't true.
>
> I'd appreciate hearing your results with this.
>
> #include <cstdio>
> #include <fstream>
> #include <iostream>
> #include <string>
>
> using namespace std;
>
> int main(int argc, char *argv[])
> {
>if (argc < 2) {
>cerr << "Usage: " << argv[0] << " [filename]" << endl;
>return -1;
>}
>ifstream in(argv[1]);
>if (!in) {
>perror(argv[1]);
>return -1;
>}
>    string str;
>    int n = 0;
>    // stream extraction fails at end of file, so every word is counted
>    while (in >> str) {
>        ++n;
>        // cout << str << endl;   // uncomment to echo each word
>    }
>in.close();
>cout << n << " words" << endl;
>return 0;
> }
>
> Michael D. Black
> Senior Scientist
> NG Information Systems
> Advanced Analytics Directorate
>
>
>
> 
> From: Igor Bubkin [igb...@gmail.com]
> Sent: Tuesday, February 01, 2011 2:19 AM
> To: common-iss...@hadoop.apache.org
> Cc: common-user@hadoop.apache.org
> Subject: EXTERNAL: How to speed up a Map/Reduce job?
>
> Hello everybody
>
> I have a problem. I installed Hadoop on a 2-node cluster and ran the WordCount
> example. It takes about 20 seconds to process a 1.5 MB text file. We want to
> use Map/Reduce in real time (interactively, in response to user requests). A
> user can't wait 20 seconds for a request; that is too long. Is it possible to
> reduce the time of a Map/Reduce job? Or maybe I misunderstand something?
>
> BR,
> Igor Babkin, Mifors.com


RE: How to speed up a Map/Reduce job?

2011-02-01 Thread Black, Michael (IS)
Try this rather small C++ program...it will more than likely be a LOT faster
than anything you could do in Hadoop.  Hadoop is not the hammer for every nail.
Too many people think that any "cluster" solution will automagically scale
their problem...tain't true.

I'd appreciate hearing your results with this.

#include <cstdio>    // perror
#include <fstream>   // ifstream
#include <iostream>  // cout, cerr
#include <string>    // string

using namespace std;

int main(int argc, char *argv[])
{
    // Expect exactly one argument: the file whose words should be counted.
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " [filename]" << endl;
        return -1;
    }
    ifstream in(argv[1]);
    if (!in) {
        perror(argv[1]);   // report why the file could not be opened
        return -1;
    }
    string str;
    int n = 0;
    // Stream extraction fails at end of file, so every word is counted.
    while (in >> str) {
        ++n;
        // cout << str << endl;   // uncomment to echo each word
    }
    in.close();
    cout << n << " words" << endl;
    return 0;
}
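
(To try it: something along the lines of "g++ -O2 -o wordcount wordcount.cpp"
and then "time ./wordcount yourfile.txt" should do; the file names here are
just placeholders.)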

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: Igor Bubkin [igb...@gmail.com]
Sent: Tuesday, February 01, 2011 2:19 AM
To: common-iss...@hadoop.apache.org
Cc: common-user@hadoop.apache.org
Subject: EXTERNAL: How to speed up a Map/Reduce job?

Hello everybody

I have a problem. I installed Hadoop on a 2-node cluster and ran the WordCount
example. It takes about 20 seconds to process a 1.5 MB text file. We want to
use Map/Reduce in real time (interactively, in response to user requests). A user
can't wait 20 seconds for a request; that is too long. Is it possible to reduce
the time of a Map/Reduce job? Or maybe I misunderstand something?

BR,
Igor Babkin, Mifors.com

Re: How to speed up a Map/Reduce job?

2011-02-01 Thread Steve Loughran

On 01/02/11 08:19, Igor Bubkin wrote:

Hello everybody

I have a problem. I installed Hadoop on a 2-node cluster and ran the WordCount
example. It takes about 20 seconds to process a 1.5 MB text file. We want to
use Map/Reduce in real time (interactively, in response to user requests). A user
can't wait 20 seconds for a request; that is too long. Is it possible to reduce
the time of a Map/Reduce job? Or maybe I misunderstand something?


1. I'd expect a minimum ~30s query time due to the way work gets queued
and dispatched, JVM startup costs, etc. There is no way to eliminate this
in Hadoop's current architecture.


2. 1.5 MB is a very small file; I'm currently recommending a block size of
512 MB in new clusters for various reasons. Data of this size is just too
small to bother distributing. Load it up into memory and analyse it locally
(see the sketch below). Things like Apache CouchDB also support MapReduce.
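
As a rough sketch of "load it into memory and analyse it locally", a minimal
single-machine word-frequency counter could look like the following (this
assumes a C++11 compiler; std::map would work equally well on older ones, and
the output format is just illustrative):

// Local, in-memory word-frequency count -- no cluster required.
#include <cstdio>        // perror
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " [filename]" << std::endl;
        return -1;
    }
    std::ifstream in(argv[1]);
    if (!in) {
        perror(argv[1]);
        return -1;
    }
    std::unordered_map<std::string, long> counts;   // word -> occurrences
    std::string word;
    while (in >> word) {
        ++counts[word];   // the whole "map" and "reduce" in a single pass
    }
    // Emit each word and its count, tab separated (order is unspecified).
    for (const auto &kv : counts) {
        std::cout << kv.first << '\t' << kv.second << '\n';
    }
    return 0;
}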


Hadoop is not designed for clusters of fewer than about 10 machines (not
enough redundancy of storage), or for small datasets. If your problems
aren't big enough, use different tools, because Hadoop contains design
decisions and overheads that only make sense once your data is measured
in GB and your filesystem in tens to thousands of terabytes.


RE: How to speed up a Map/Reduce job?

2011-02-01 Thread praveen.peddi
Hi Igor,
I am not sure Hadoop is designed for real-time requests; I have a feeling
that you are trying to use Hadoop in a way it is not designed for. In my
experience, a Hadoop cluster will be much slower than "local" Hadoop mode
when processing a small dataset, because cluster mode always carries the
extra overhead of task and job management.
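
One hedged illustration: with the 0.20.x/1.x examples jar, the WordCount
example can usually be forced into local mode through the generic options
(this assumes the job goes through GenericOptionsParser/ToolRunner; the jar
name and paths below are only placeholders):

hadoop jar hadoop-examples.jar wordcount \
  -D mapred.job.tracker=local -D fs.default.name=file:/// \
  /local/input/dir /local/output/dir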

Praveen

From: ext Igor Bubkin [igb...@gmail.com]
Sent: Tuesday, February 01, 2011 3:19 AM
To: common-iss...@hadoop.apache.org
Cc: common-user@hadoop.apache.org
Subject: How to speed up a Map/Reduce job?

Hello everybody

I have a problem. I installed Hadoop on a 2-node cluster and ran the WordCount
example. It takes about 20 seconds to process a 1.5 MB text file. We want to
use Map/Reduce in real time (interactively, in response to user requests). A user
can't wait 20 seconds for a request; that is too long. Is it possible to reduce
the time of a Map/Reduce job? Or maybe I misunderstand something?

BR,
Igor Babkin, Mifors.com


Re: How to speed up a Map/Reduce job?

2011-02-01 Thread li ping
Hadoop is designed for non-real-time applications, but you can tune its
configuration parameters to reduce job execution time.

I found an article via a Google search; I hope you can find some useful
information in it:
http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation
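
As one concrete (and hedged) example of such a parameter on the 0.20.x/1.x
line, mapred.job.reuse.jvm.num.tasks lets a job reuse task JVMs instead of
starting a new JVM per task, which helps with the JVM startup overhead
mentioned earlier in the thread. A minimal mapred-site.xml sketch (the value
is illustrative, not a recommendation):

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <!-- -1 means a JVM may be reused for an unlimited number of a job's tasks -->
  <value>-1</value>
</property>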

On Tue, Feb 1, 2011 at 4:19 PM, Igor Bubkin  wrote:

> Hello everybody
>
> I have a problem. I installed Hadoop on a 2-node cluster and ran the WordCount
> example. It takes about 20 seconds to process a 1.5 MB text file. We want to
> use Map/Reduce in real time (interactively, in response to user requests). A
> user can't wait 20 seconds for a request; that is too long. Is it possible to
> reduce the time of a Map/Reduce job? Or maybe I misunderstand something?
>
> BR,
> Igor Babkin, Mifors.com
>



-- 
-李平