On Sat, Oct 30, 2010 at 1:06 AM, Shrinivasan T <tshriniva...@gmail.com> wrote:
> Friends.
>
> I have to work on some huge text files that are around 2GB - 10GB.
> Have to read some content on those files and have to rewrite them.
> Mostly have to cut/copy/paste/edit the contents in random areas.
>
> sed, awk kind of utilities can not be used as the data is not under
> regex and the manual operation/verification is must.
>
> I have a 2GB RAM machines.
> Opening in vim takes much time and every action takes 20-30 min.
>
> Is there any possibilities to speed up the progress?
> I have many computers that are idle.
>
> How can I create a cluster of RAMs from those idle machines, so that I
> can work on those huge files easily?
> Is it possible?
>
> Are there any other ways?
>

Hi Shrini,

If you have a number of idle machines, you may use Apache Hadoop:
http://hadoop.apache.org/. The beauty is that it is open source and runs on
commodity hardware. HDFS + MapReduce is a very effective solution for
processing large files (e.g. log processing, web index building, distributed
grep, inverted index, distributed sort ...).

HDFS - distributes the data.
MapReduce - distributes the application.

The Hadoop Distributed File System (HDFS) is designed to handle large files
(multi-GB) with sequential read/write operations. Each file is automatically
broken into chunks, which are stored across multiple data nodes as local OS
files.

MapReduce is a programming model for efficient distributed computing. It
provides an API for writing applications. Applications written in this style
are automatically parallelized and executed on a large cluster of commodity
machines. The run-time system takes care of the details of partitioning the
input data, scheduling the program's execution across a set of machines,
handling machine failures, and managing the required inter-machine
communication. This allows programmers without any experience with parallel
and distributed systems to easily utilize the resources of a large
distributed system.
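
To make the model above a bit more concrete, here is a minimal sketch of a "distributed grep" written in plain Python. It runs in a single process and does not use Hadoop or its API at all; the function names (split_into_chunks, map_grep, reduce_matches, run_job) and the sample data are made up for illustration. The chunking step only loosely mimics how HDFS splits a file, and on a real cluster each chunk would be handled by a different machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# A chunk is a list of (line_number, text) pairs.
Chunk = List[Tuple[int, str]]


def split_into_chunks(lines: List[str], chunk_size: int) -> List[Chunk]:
    """Number the lines from 1 and cut them into fixed-size chunks.

    Assumption: this only imitates the idea of HDFS breaking a file
    into pieces; real HDFS blocks are byte ranges, not line groups.
    """
    numbered = list(enumerate(lines, start=1))
    return [numbered[i:i + chunk_size] for i in range(0, len(numbered), chunk_size)]


def map_grep(chunk: Chunk, needle: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (needle, line_number) for every matching line."""
    for line_no, text in chunk:
        if needle in text:
            yield needle, line_no


def reduce_matches(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Reduce step: group the emitted line numbers by key and sort them."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, line_no in pairs:
        grouped[key].append(line_no)
    return {key: sorted(values) for key, values in grouped.items()}


def run_job(lines: List[str], needle: str, chunk_size: int = 2) -> Dict[str, List[int]]:
    """Run map over every chunk, then reduce the combined output."""
    chunks = split_into_chunks(lines, chunk_size)
    mapped = [pair for chunk in chunks for pair in map_grep(chunk, needle)]
    return reduce_matches(mapped)


if __name__ == "__main__":
    sample = ["error: disk full", "ok", "ok", "error: timeout", "ok"]
    print(run_job(sample, "error"))  # prints {'error': [1, 4]}
```

The point of the split is that map_grep only ever sees one chunk, so the chunks can be processed independently; only reduce_matches needs to see the combined results.
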

Thanks & Regards,
Mohan L
_______________________________________________
ILUGC Mailing List:
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc