I don't know that your load-in speed is going to dramatically
increase. There are a number of parameters that adjust aspects of
MapReduce, but HDFS more or less works out of the box. You should run
some monitoring on your nodes (ganglia, nagios) or check out what
they're doing with top, iotop and iftop to see where you're
experiencing bottlenecks.
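For illustration, a couple of knobs people commonly tune live in
hadoop-site.xml on each node. The values below are only placeholders,
not recommendations; what helps depends on where your bottleneck
actually is:

```xml
<!-- hadoop-site.xml: illustrative values only, tune for your cluster -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- map slots per TaskTracker; often set near the core count -->
</property>
<property>
  <name>dfs.block.size</name>
  <value>134217728</value> <!-- 128 MB blocks can help large sequential loads -->
</property>
```

Remember these are read at daemon startup, so restart the TaskTrackers
and DataNodes after changing them.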
- Aaron

On Thu, Aug 6, 2009 at 11:41 AM, Zeev Milin<zeevm...@gmail.com> wrote:
> Thanks Aaron,
>
> I changed the settings in hadoop-site.xml file on all the machines. BTW,
> some settings are only reflected on the job level when I change the
> hadoop-default file, not sure why hadoop-site is being ignored (ex:
> mapred.tasktracker.map.tasks.maximum).
>
> The files I am trying to load are fairly small (~4MB on average). The
> configuration of each machine is: 2 dual cores (Xeon, 2.33Ghz), 8GB ram and
> a local SCSI hard drive. (total of 6 nodes)
>
> I will look into the article you mentioned. I understand that loading the
> files is going to be slow; I was just wondering why the machines are mostly
> idle when more maps could be run in parallel. The number of running maps
> is always 6.
>
> Another option is to load one 20GB file, but currently the speed is fairly
> slow in my opinion: 1GB in 1.5min. What kind of tuning can be done to
> speed up the load into HDFS? If you have any recommendations for specific
> parameters that might help, that would be great.
>
> Thanks,
> Zeev
>
