As was suggested, create your own input and put it into HDFS. You can create it on your local disk and copy it into HDFS with a single command. Create a list of 1000 random "words", pick from that list randomly a few million times, and write the result into HDFS as one or more files of 64 MB or more. That should do it. But jobs that are not CPU intensive and whose data fits in RAM will finish faster on 1 machine than on 4. The benefit starts when you have more data than fits in RAM. M/R gives you a tool for gathering values by key and processing them in batches, where each set of values corresponding to a key can hopefully fit in RAM. Usually the point is not to make things faster, but to make them possible at all.
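
A minimal sketch of that kind of generator, assuming Python is available on the machine you submit jobs from; the word length, file count, output file names, and HDFS path below are just placeholders, not anything from this thread:

# Generate ~1000 random "words", then write a few million random picks
# to local files of roughly 64 MB each (one HDFS block or more per file).
import random
import string

words = [''.join(random.choice(string.ascii_lowercase) for _ in range(8))
         for _ in range(1000)]

TARGET_BYTES = 64 * 1024 * 1024   # aim for at least one full block per file
for n in range(4):                # 4 files -> roughly 256 MB of input in total
    written = 0
    with open('random_words_%d.txt' % n, 'w') as f:
        while written < TARGET_BYTES:
            line = ' '.join(random.choice(words) for _ in range(10)) + '\n'
            f.write(line)
            written += len(line)

# Then copy the files into HDFS, e.g. (path is a placeholder):
#   hadoop fs -put random_words_*.txt /user/praveenesh/wordcount-input/

Run wordcount against that input directory and the difference between 1 node and 4 should start to show up, because the job now has more data than comfortably fits in memory on a single box.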
On Apr 18, 2011, at 10:41 PM, praveenesh kumar wrote:

> Thank you guys for clearing my glasses.. now I can see the clean picture :-)
> So how can I test my cluster... Can anyone suggest any scenario or have any
> data set or any website where I can get dataset of this range ??
>
> Thanks,
> Praveenesh
>
> On Tue, Apr 19, 2011 at 11:03 AM, Mehmet Tepedelenlioglu <
> mehmets...@gmail.com> wrote:
>
>> For such small input, the only way you would see speed gains would be if
>> your job was dominated by cpu time, and not i/o. Since word-count is mostly
>> an i/o problem and your input size is quite small, you are seeing similar
>> run times. 3 computers is better than 1 only if you need them.
>>
>> On Apr 18, 2011, at 10:06 PM, praveenesh kumar wrote:
>>
>>> The input were 3 plain text files..
>>>
>>> 1 file was around 665 KB and other 2 files were around 1.5 MB each..
>>>
>>> Thanks,
>>> Praveeenesh
>>>
>>> On Tue, Apr 19, 2011 at 10:27 AM, real great.. <
>>> greatness.hardn...@gmail.com> wrote:
>>>
>>>> Whats your input size?
>>>>
>>>> On Tue, Apr 19, 2011 at 10:21 AM, praveenesh kumar <
>>>> praveen...@gmail.com> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I am new to hadoop...
>>>>> I set up a hadoop cluster of 4 ubuntu systems. (Hadoop 0.20.2)
>>>>> and I am running the well known word count (gutenberg) example to test
>>>>> how fast my hadoop is working..
>>>>>
>>>>> But whenever I am running wordcount example.. I am not able to see any
>>>>> much processing time difference..
>>>>> On single node the wordcount is taking the same time.. and on cluster of
>>>>> 4 systems also it is taking almost the same time..
>>>>>
>>>>> Am I doing anything wrong here ??
>>>>> Can anyone explain me why its happening.. and how can I make maximum use
>>>>> of my cluster ??
>>>>>
>>>>> Thanks.
>>>>> Praveenesh
>>>>
>>>> --
>>>> Regards,
>>>> R.V.