On Sat, Oct 30, 2010 at 1:06 AM, Shrinivasan T <tshriniva...@gmail.com> wrote:

> Friends.
>
> I have to work on some huge text files, around 2GB to 10GB each.
> I have to read some content in those files and rewrite it,
> mostly cutting, copying, pasting and editing content in random areas.
>
> sed, awk and similar utilities cannot be used, as the data cannot be
> matched by a regex and manual operation/verification is a must.
>
> My machines have 2GB of RAM.
> Opening a file in vim takes a long time, and every action takes 20-30 min.
>
> Is there any way to speed this up?
> I have many computers that are idle.
>
> How can I pool the RAM of those idle machines into a cluster, so that I
> can work on those huge files easily?
> Is it possible?
>
> Are there any other ways?
>
>
Hi Shrini,

If you have a number of idle machines, you could use Apache Hadoop:
http://hadoop.apache.org/. The nice thing is that it is open source and runs on
commodity hardware.

HDFS + MapReduce is a very effective solution for processing large files (e.g. log
processing, web index building, distributed grep, inverted indexes,
distributed sort ...).

HDFS - distributes data.
MapReduce - distributes the application.

The Hadoop Distributed File System (HDFS) is designed to handle large files
(multi-GB) with sequential read/write operations. Each file is automatically
broken into chunks (blocks), which are stored across multiple data nodes as
local OS files.
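
To make the chunking idea concrete, here is a rough Python sketch (not Hadoop
code) of how a file could be cut into fixed-size blocks and spread over data
nodes. The function name, the block size and the round-robin placement are my
own assumptions for illustration; real HDFS also replicates each block and
chooses nodes with its own placement policy.

```python
def plan_blocks(file_size, block_size, nodes):
    """Return a list of (block_index, offset, length, node) tuples.

    Illustrative only: blocks are assigned round-robin to the given
    node names, with no replication.
    """
    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        plan.append((index, offset, length, nodes[index % len(nodes)]))
        offset += length
        index += 1
    return plan
```

For example, plan_blocks(10 * 1024**3, 64 * 1024**2, ["node1", "node2", "node3"])
would describe a 10GB file as 160 blocks of 64MB spread over three
(hypothetical) machines.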

MapReduce is a programming model for efficient distributed computing, and
Hadoop provides an API for writing applications in this model. Applications
written in this style are automatically parallelized and executed on a large
cluster of commodity machines. The run-time system takes care of the details of
partitioning the input data, scheduling the program's execution across a set
of machines, handling machine failures, and managing the required
inter-machine communication. This allows programmers without any experience
with parallel and distributed systems to easily use the resources of a
large distributed system.
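
As a rough illustration of the model, here is a small single-process Python
sketch of a "distributed grep" written in map/shuffle/reduce style. It does not
use the Hadoop API; all function names are hypothetical, and the chunks are
plain in-memory lists standing in for the file blocks. On a real cluster, the
framework would run the map step on many machines in parallel and do the
shuffle for you.

```python
from collections import defaultdict


def map_grep(chunk_id, lines, pattern):
    """Map step: emit (pattern, (chunk_id, line_no)) for each matching line."""
    for line_no, line in enumerate(lines, start=1):
        if pattern in line:
            yield pattern, (chunk_id, line_no)


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_grep(key, values):
    """Reduce step: collect the match locations for a key in sorted order."""
    return key, sorted(values)


def run_job(chunks, pattern):
    """Run map over every chunk, then shuffle and reduce the results."""
    pairs = []
    for chunk_id, lines in chunks.items():
        pairs.extend(map_grep(chunk_id, lines, pattern))
    return dict(reduce_grep(k, v) for k, v in shuffle(pairs).items())
```

Calling run_job({0: ["error a", "ok"], 1: ["ok", "error b"]}, "error") gives the
chunk and line number of every line containing "error".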

Thanks & Rg
Mohan L
_______________________________________________
ILUGC Mailing List:
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc
