There are additional offshoots of Hadoop that specifically address real-time needs, such as Spark, S4 and HStreaming.
Most real-time-ish applications, however, come with a 100% uptime guarantee. Put simply, a system that is down and is going to take tens to hundreds of minutes to come back is going to miss a lot of real-time windows. As such, you may need to investigate derivatives of Hadoop that explicitly support high availability.

On Sat, Sep 3, 2011 at 11:38 PM, Jacques <whs...@gmail.com> wrote:

> It is hard to reply to an article that you don't actually reference,
> but I'll do my best. Also, you don't define real-time, so I'll
> consider it as being something that would come back within 1-2
> seconds (e.g. an end user on a web site is waiting for the info).
>
> >>Can you please tell me why Hadoop is said not to be used for Real
> >>time processing of data?
>
> There are two different parts to the core Hadoop project. Both of
> these are focused more on batch processing by themselves as opposed
> to real-time workflows.
>
> 1. HDFS, a distributed file system that is good at safely managing a
>    large quantity of very large files. Generally speaking, HDFS is a
>    write-once file system. You can't modify the middle of a file
>    after it is written. You also can't append to the end of a file
>    without a special version of Hadoop. Also, you can't tail a file
>    directly as it is being written. As such, it would be hard to use
>    it directly to create a real-time workflow.
>
> 2. MapReduce is a distributed computing framework. It is used to
>    process those large files held on HDFS. Because of the design of
>    MapReduce, jobs usually take at least 10 seconds and typically
>    much longer. This would also mean you're looking at batch
>    processing large quantities of data in some non-real-time period.
>
> HBase is a separate sub-project from the Hadoop project proper. It is
> built specifically to handle real-time loads. You can insert a row
> and get it back immediately.
>
> >I was thinking we can replace the DB with Hadoop...I do not see any
> >issue?
>
> HBase can replace many of the functions of existing databases but
> should be used primarily when you need the massive scale it can
> provide. Compared to a traditional RDBMS (MySQL, PostgreSQL, etc.),
> you have to give up things like transactions and SQL with HBase. The
> schema design is very different and generally your application must
> be built with this in mind. You should probably spend some time with
> the HBase book (http://hbase.apache.org/book.html) and looking at
> your current applications to determine what kinds of things you would
> need to do. Many people actually use HBase in parallel with a
> traditional RDBMS, leveraging the strengths of each.
>
> Good luck!
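
To put rough numbers on the downtime point: if "real-time" means a response within 1-2 seconds, as assumed in the quoted message, then an outage of 10 to 100 minutes spans hundreds to thousands of those windows. The sketch below is only back-of-the-envelope arithmetic; the function name `missed_windows` is made up for illustration, and it assumes one request window after another with no gaps.

```python
def missed_windows(outage_minutes: float, window_seconds: float) -> int:
    """Count the whole response windows that fall inside one outage.

    Assumes back-to-back windows of fixed length (an illustrative
    simplification, not a measurement of any real system).
    """
    return int(outage_minutes * 60 // window_seconds)


if __name__ == "__main__":
    # Outage lengths and window sizes taken from the figures in this thread.
    for outage in (10, 100):
        for window in (1, 2):
            count = missed_windows(outage, window)
            print(f"{outage} min outage, {window} s window: {count} windows missed")
```

Even the mildest case here (a 10 minute outage with a 2 second window) covers 300 consecutive windows, which is why high availability matters more for this kind of workload than for batch jobs.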