As was suggested, create your own input and put it into HDFS. You can generate
it on your local disk and copy it into HDFS with a single command. Create a list
of 1000 random "words", pick from that list randomly a few million times,
and write the result into HDFS as one file or several files of 64 MB
or more.
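
(A minimal sketch of such a generator in Java; the file name, sample count,
and the HDFS path in the trailing comment are placeholders I made up, not
anything from this thread.)

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

// Builds a vocabulary of 1000 random "words", then samples from it roughly
// ten million times so the output file comes out well above 64 MB.
public class RandomWords {
    public static void main(String[] args) throws IOException {
        Random rand = new Random();
        String[] vocab = new String[1000];
        for (int i = 0; i < vocab.length; i++) {
            StringBuilder w = new StringBuilder();
            int len = 5 + rand.nextInt(8);              // word length 5 to 12
            for (int j = 0; j < len; j++) {
                w.append((char) ('a' + rand.nextInt(26)));
            }
            vocab[i] = w.toString();
        }
        FileWriter out = new FileWriter("random-words.txt");
        for (long i = 0; i < 10000000L; i++) {          // ~10 million samples
            out.write(vocab[rand.nextInt(vocab.length)]);
            out.write(i % 10 == 9 ? '\n' : ' ');        // ten words per line
        }
        out.close();
        // Then copy it into HDFS with something like:
        //   hadoop fs -put random-words.txt input/
    }
}

Run it a few times with different output names if you want several files.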
That should do it. But jobs that are not CPU intensive and whose data fits
in RAM will finish faster on 1 machine than on 4. The benefit starts when
you have more data than fits in RAM. M/R gives you a tool for gathering
values by key and processing them in batches, where each set of values
that corresponds to a key can hopefully fit in some RAM. Usually the point
is not to make things faster, but to make them possible at all.
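
(To make that concrete, a minimal reducer sketch against the 0.20
org.apache.hadoop.mapreduce API; the class name and the word-count types are
illustrative, not something from this thread.)

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The framework groups every value that shares a key before calling reduce(),
// so each call sees one key plus the batch of values gathered for it.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();              // process this key's batch of values
        }
        context.write(key, new IntWritable(sum));
    }
}

That grouping (and the shuffle/sort behind it) is what M/R buys you once the
data no longer fits comfortably on one machine.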


On Apr 18, 2011, at 10:41 PM, praveenesh kumar wrote:

> Thank you guys for clearing my glasses.. now I can see the clean picture :-)
> So how can I test my cluster... Can anyone suggest any scenario, or have any
> data set or any website where I can get a dataset in this range ??
> 
> Thanks,
> Praveenesh
> 
> On Tue, Apr 19, 2011 at 11:03 AM, Mehmet Tepedelenlioglu <mehmets...@gmail.com> wrote:
> 
>> For such small input, the only way you would see speed gains would be if
>> your job was dominated by CPU time, not I/O. Since word count is mostly an
>> I/O problem and your input size is quite small, you are seeing similar run
>> times. 3 computers are better than 1 only if you need them.
>> 
>> On Apr 18, 2011, at 10:06 PM, praveenesh kumar wrote:
>> 
>>> The input was 3 plain text files..
>>> 
>>> 1 file was around 665 KB and the other 2 files were around 1.5 MB each..
>>> 
>>> Thanks,
>>> Praveenesh
>>> 
>>> 
>>> 
>>> On Tue, Apr 19, 2011 at 10:27 AM, real great.. <greatness.hardn...@gmail.com> wrote:
>>> 
>>>> What's your input size?
>>>> 
>>>> On Tue, Apr 19, 2011 at 10:21 AM, praveenesh kumar <praveen...@gmail.com> wrote:
>>>> 
>>>>> Hello everyone,
>>>>> 
>>>>> I am new to Hadoop...
>>>>> I set up a Hadoop cluster of 4 Ubuntu systems (Hadoop 0.20.2),
>>>>> and I am running the well-known word count (Gutenberg) example to test
>>>>> how fast my Hadoop is working..
>>>>> 
>>>>> But whenever I run the wordcount example, I am not able to see much
>>>>> difference in processing time..
>>>>> On a single node the wordcount takes a certain time, and on the cluster
>>>>> of 4 systems it also takes almost the same time..
>>>>> 
>>>>> Am I doing anything wrong here ??
>>>>> Can anyone explain to me why this is happening, and how can I make
>>>>> maximum use of my cluster ??
>>>>> 
>>>>> Thanks.
>>>>> Praveenesh
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Regards,
>>>> R.V.
>>>> 
>> 
>> 
