HBase has entered Debian (unstable)
Hi,

HBase 0.20.4 has entered Debian unstable, should slide into testing after the usual 14-day period, and will therefore most likely be included in the upcoming Debian Squeeze:

http://packages.debian.org/source/sid/hbase

Please note that this packaging effort is still very much work in progress and not yet suitable for production use. However, the aim is to have a rock-solid, stable HBase in squeeze+1, and correspondingly in Debian testing within the next months. Meanwhile, the HBase package in Debian can raise HBase's visibility and lower the entrance barrier. So if somebody wants to try out HBase (on Debian), it is as easy as:

aptitude install zookeeperd hbase-masterd

In other news: ZooKeeper is in Debian testing as of today.

Best regards,
Thomas Koch, http://www.koch.ro
How to recover hadoop cluster using SecondaryNameNode ?
Hi all,

Is it enough to recover a Hadoop cluster by just copying the metadata from the SecondaryNameNode to the new master node, or do I need to do anything else? Thanks for any help.

--
Best Regards

Jeff Zhang
Re: How to recover hadoop cluster using SecondaryNameNode ?
I suggest you take a look at the AvatarNode, which runs a standby avatar that can be activated if the primary avatar fails.

On Thursday, May 13, 2010, Jeff Zhang zjf...@gmail.com wrote:

Hi all, is it enough to recover a Hadoop cluster by just copying the metadata from the SecondaryNameNode to the new master node, or do I need to do anything else? Thanks for any help. -- Best Regards, Jeff Zhang
Re: Setting up a second cluster and getting a weird issue
Yes, in this deployment, I'm attempting to share the Hadoop files via NFS. The log and pid directories are local.

Thanks!

--Andrew

On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:

These 4 nodes share NFS?

On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen andrew-lists-had...@ucsfcti.org wrote:

I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:

2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)

There are 4 slaves, and sometimes 1 or 2 have the error, but the specific nodes change. Sometimes it's slave1, sometimes it's slave4, etc. Any thoughts?

Thanks!

--Andrew

--
Best Regards

Jeff Zhang
How to change join output separator
Hi,

I am running a map-side join. My input looks like this:

file1.txt
---
a|deer
b|dog

file2.txt
---
a|veg
b|nveg

I am getting output like:

a|[deer,veg]
b|[dog,nveg]

I don't want those square brackets, and the field separator should be | (pipe) instead of a comma. Please guide me on how to achieve this.

Thanks,
Dhana

--
View this message in context: http://old.nabble.com/How-to-change-join-output-separator-tp28547855p28547855.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
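The bracketed form most likely comes from TupleWritable's default toString(). One possible approach, assuming the old 0.20 mapred API with CompositeInputFormat and Text join keys (the class name below is made up for illustration, not taken from the original job), is to flatten the tuple yourself in the mapper:

// Illustrative sketch only; assumes the old "mapred" API and Text join keys.
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.join.TupleWritable;

public class PipeJoinMapper extends MapReduceBase
    implements Mapper<Text, TupleWritable, Text, Text> {

  private final Text out = new Text();

  public void map(Text key, TupleWritable tuple,
                  OutputCollector<Text, Text> collector, Reporter reporter)
      throws IOException {
    StringBuilder joined = new StringBuilder();
    for (int i = 0; i < tuple.size(); i++) {
      if (i > 0) joined.append('|');        // pipe instead of comma
      joined.append(tuple.get(i).toString());
    }
    out.set(joined.toString());             // e.g. "deer|veg", no brackets
    collector.collect(key, out);
  }
}

If the separator between the key and the value also needs to be a pipe, mapred.textoutputformat.separator controls that for TextOutputFormat (the default is a tab).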
Re: How to recover hadoop cluster using SecondaryNameNode ?
You can use the copy of the fsimage and the edit log from the SNN to recover. Remember that it will be (roughly) an hour old. The process for recovery is to copy the fsimage and edit log to a new machine, place them in the dfs.name.dir/current directory, and start all the daemons. It's worth practicing this type of procedure before trying it on a production cluster. More importantly, it's worth practicing this *before* you need it on a production cluster.

On Thu, May 13, 2010 at 5:01 AM, Jeff Zhang zjf...@gmail.com wrote:

Hi all, is it enough to recover a Hadoop cluster by just copying the metadata from the SecondaryNameNode to the new master node, or do I need to do anything else? Thanks for any help. -- Best Regards, Jeff Zhang

--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
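As a rough sketch of those steps, with made-up host names and paths (substitute your actual fs.checkpoint.dir and dfs.name.dir):

# Copy the SecondaryNameNode's latest checkpoint into the new namenode's
# dfs.name.dir/current, then start the daemons. Hosts and paths are examples.
scp secondarynn:/hadoop/checkpoint/current/fsimage /hadoop/name/current/
scp secondarynn:/hadoop/checkpoint/current/edits   /hadoop/name/current/

# On the new master:
bin/start-dfs.sh
bin/start-mapred.sh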
Re: How to recover hadoop cluster using SecondaryNameNode ?
This is a good time to remind folks that the namenode can write to multiple directories, including one over a network filesystem or SAN, so that you always have a fresh copy. :)

On May 13, 2010, at 8:05 AM, Eric Sammer wrote:

You can use the copy of the fsimage and the edit log from the SNN to recover. Remember that it will be (roughly) an hour old. The process for recovery is to copy the fsimage and edit log to a new machine, place them in the dfs.name.dir/current directory, and start all the daemons. It's worth practicing this type of procedure before trying it on a production cluster. More importantly, it's worth practicing this *before* you need it on a production cluster.

On Thu, May 13, 2010 at 5:01 AM, Jeff Zhang zjf...@gmail.com wrote:

Hi all, is it enough to recover a Hadoop cluster by just copying the metadata from the SecondaryNameNode to the new master node, or do I need to do anything else? Thanks for any help. -- Best Regards, Jeff Zhang

--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
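For reference, dfs.name.dir takes a comma-separated list, so a minimal hdfs-site.xml sketch of that setup (both paths below are placeholders) would be:

<!-- Illustrative only: the namenode writes its image and edits to every
     directory listed, so losing the local disk still leaves a copy on NFS. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/name,/mnt/namenode-backup/dfs/name</value>
</property>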
Re: How to recover hadoop cluster using SecondaryNameNode ?
This code hasn't been committed to any branch yet and doesn't appear to have undergone any review outside of Facebook. :(

On May 13, 2010, at 6:18 AM, Ted Yu wrote:

I suggest you take a look at the AvatarNode, which runs a standby avatar that can be activated if the primary avatar fails.

On Thursday, May 13, 2010, Jeff Zhang zjf...@gmail.com wrote:

Hi all, is it enough to recover a Hadoop cluster by just copying the metadata from the SecondaryNameNode to the new master node, or do I need to do anything else? Thanks for any help. -- Best Regards, Jeff Zhang
Re: How to recover hadoop cluster using SecondaryNameNode ?
On Thu, May 13, 2010 at 1:32 PM, Allen Wittenauer awittena...@linkedin.com wrote:

This code hasn't been committed to any branch yet and doesn't appear to have undergone any review outside of Facebook. :(

On May 13, 2010, at 6:18 AM, Ted Yu wrote:

I suggest you take a look at the AvatarNode, which runs a standby avatar that can be activated if the primary avatar fails.

On Thursday, May 13, 2010, Jeff Zhang zjf...@gmail.com wrote:

Hi all, is it enough to recover a Hadoop cluster by just copying the metadata from the SecondaryNameNode to the new master node, or do I need to do anything else? Thanks for any help. -- Best Regards, Jeff Zhang

Wow, that AvatarNode was really flying below the radar! First I heard of it. Awesome!
Build a indexing and search service with Hadoop
Hi,

I currently have an indexing and search service that receives documents to be indexed and search requests through XML-RPC. The server uses the Lucene search engine and stores its index on the local file system. The documents to be indexed are usually academic papers, so I'm not trying to index large-scale data sets in a single job, although the indexes may become very large as more documents are received.

Now we are trying to parallelize the search and indexing by distributing the index into shards on a cluster. I've been studying Hadoop, but it's not clear yet how to implement the system. The basic design is:

1. We receive a document through XML-RPC and it should be indexed in one shard on the cluster.
2. We receive a query request through XML-RPC; the query must be executed over all shards and then the hits should be returned in an XML response.

My initial idea is:

1. Indexing - The document received is used as the input of one map. This function would index the document in the local shard using our custom library built on top of Lucene. There is no reduce.
2. Search - The query received is used as the input of the map function. This function would search the local shard using our custom library and emit the hits. The reduce function would group the hits from all shards.

Is it possible to implement this using the Hadoop MapReduce framework, by implementing custom InputFormats and OutputFormats? Or should I use the Hadoop RPC layer? Is there any documentation about this? Any suggestions?

Thanks,
Aécio Santos.

--
Instituto Federal de Ciência, Educação e Tecnologia do Piauí
Laboratório de Pesquisa em Sistemas de Informação
Teresina - Piauí - Brazil
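A rough sketch of the search half of that design, under several assumptions that are not from the original post (old 0.20 mapred API, Lucene 3.0, a stored "id" field and an indexed "contents" field, one local shard path per input record, and the query string passed through the JobConf as "shardsearch.query"):

// Rough sketch only; all class, field, and property names are illustrative.
import java.io.File;
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ShardSearch {

  // Each map task searches one local Lucene shard and emits (query, "id score").
  public static class SearchMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    private String queryString;

    public void configure(JobConf conf) {
      queryString = conf.get("shardsearch.query");
    }

    public void map(LongWritable offset, Text shardPath,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      IndexSearcher searcher =
          new IndexSearcher(FSDirectory.open(new File(shardPath.toString())));
      try {
        QueryParser parser = new QueryParser(Version.LUCENE_30, "contents",
            new StandardAnalyzer(Version.LUCENE_30));
        TopDocs hits = searcher.search(parser.parse(queryString), 10);
        for (ScoreDoc sd : hits.scoreDocs) {
          out.collect(new Text(queryString),
              new Text(searcher.doc(sd.doc).get("id") + "\t" + sd.score));
        }
      } catch (ParseException e) {
        throw new IOException("could not parse query: " + queryString);
      } finally {
        searcher.close();
      }
    }
  }

  // A single reducer groups the hits from all shards under the query key.
  public static class MergeReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text query, Iterator<Text> hits,
                       OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      StringBuilder merged = new StringBuilder();
      while (hits.hasNext()) {
        if (merged.length() > 0) merged.append(", ");
        merged.append(hits.next().toString());
      }
      out.collect(query, new Text(merged.toString()));
    }
  }
}

Note that each query pays the full MapReduce job-startup latency, typically measured in seconds, so a scheme like this fits batch queries better than interactive search.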
[ANNOUNCE] hamake-2.0b
After more than one year since the previous release, I am proud to announce a new version of HAMAKE. Based on our experience of using it, we rewrote it in Java and added support for Amazon EMR. We also streamlined the XML syntax and updated and improved the documentation. Please visit http://code.google.com/p/hamake/ to learn more and to download the new version.

Brief description:

Most non-trivial data processing scenarios with Hadoop typically require more than one MapReduce job. Usually such processing is data-driven, with the data funneled through a sequence of jobs. The processing model can be presented in terms of dataflow programming: it can be expressed as a directed graph with datasets as nodes. Each edge indicates a dependency between two or more datasets and is associated with a processing instruction (a Hadoop MapReduce job, a Pig Latin script, or an external command) which produces one dataset from the others. Using fuzzy timestamps as a way to detect when a dataset needs to be updated, we can calculate a sequence in which the tasks need to be executed to bring all datasets up to date. Jobs for updating independent datasets can be executed concurrently, taking advantage of your Hadoop cluster's full capacity. The dependency graph may even contain cycles, leading to dependency loops, which can be resolved using dataset versioning.

These ideas inspired the creation of the HAMAKE utility. We tried to emphasize data and to allow the developer to express their goals in terms of dataflow (versus workflow). The data dependency graph is expressed using just two dataflow instructions, fold and foreach, providing a clear processing model, similar to MapReduce but at the dataset level. Another design goal was to create a simple-to-use utility that developers can start using right away without complex installation or extensive learning.

Key Features:
* Lightweight utility - no need for complex installation
* Based on a dataflow programming model
* Easy learning curve
* Supports Amazon Elastic MapReduce
* Allows running MapReduce jobs as well as Pig Latin scripts

Sincerely,
Vadim Zaliva
Re: Build a indexing and search service with Hadoop
On Thu, May 13, 2010 at 2:41 PM, Aécio aecio.sola...@gmail.com wrote:

Hi, I currently have an indexing and search service that receives documents to be indexed and search requests through XML-RPC. The server uses the Lucene search engine and stores its index on the local file system. The documents to be indexed are usually academic papers, so I'm not trying to index large-scale data sets in a single job, although the indexes may become very large as more documents are received. Now we are trying to parallelize the search and indexing by distributing the index into shards on a cluster. I've been studying Hadoop, but it's not clear yet how to implement the system. The basic design is: 1. We receive a document through XML-RPC and it should be indexed in one shard on the cluster. 2. We receive a query request through XML-RPC; the query must be executed over all shards and then the hits should be returned in an XML response. My initial idea is: 1. Indexing - The document received is used as the input of one map. This function would index the document in the local shard using our custom library built on top of Lucene. There is no reduce. 2. Search - The query received is used as the input of the map function. This function would search the local shard using our custom library and emit the hits. The reduce function would group the hits from all shards. Is it possible to implement this using the Hadoop MapReduce framework, by implementing custom InputFormats and OutputFormats? Or should I use the Hadoop RPC layer? Is there any documentation about this? Any suggestions? Thanks, Aécio Santos. -- Instituto Federal de Ciência, Educação e Tecnologia do Piauí, Laboratório de Pesquisa em Sistemas de Informação, Teresina - Piauí - Brazil

You can implement some of what you have described. Take a look at these if you have not already; they are concrete implementations for building and searching distributed indexes:

http://lucene.apache.org/nutch/
http://katta.sourceforge.net/ (not Hadoop based, but still pretty cool)
http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/
Seattle Hadoop/NoSQL: Facebook, more Discussion. Thurs May 27th
We've heard your feedback from the last meetup: we're having fewer speakers and more discussion. Yay!

http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/

We're expecting:
1. Facebook will talk about Hive (a SQL-like language for MapReduce)
2. OpsCode will talk about cluster management with Chef
3. Then we'll break up into groups and have casual Hadoop/NoSQL-related discussions and Q&A with several experts, so you can learn more!

Also, stay tuned for news on a FREE Seattle Hadoop Community Training day in late July. We're going to get some fantastic people, and you'll get hands-on experience with the whole Hadoop ecosystem.

When: Thursday, May 27, 2010, 6:45 PM
Where: Amazon SLU, Von Vorst Building, 426 Terry Ave N, Seattle, WA 98109

--
Bradford Stephens, Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: How to recover hadoop cluster using SecondaryNameNode ?
Allen,

Do you mean HDFS-976 (https://issues.apache.org/jira/browse/HDFS-976) and HDFS-966 (https://issues.apache.org/jira/browse/HDFS-966)?

On Fri, May 14, 2010 at 1:32 AM, Allen Wittenauer awittena...@linkedin.com wrote:

This code hasn't been committed to any branch yet and doesn't appear to have undergone any review outside of Facebook. :(

On May 13, 2010, at 6:18 AM, Ted Yu wrote:

I suggest you take a look at the AvatarNode, which runs a standby avatar that can be activated if the primary avatar fails.

On Thursday, May 13, 2010, Jeff Zhang zjf...@gmail.com wrote:

Hi all, is it enough to recover a Hadoop cluster by just copying the metadata from the SecondaryNameNode to the new master node, or do I need to do anything else? Thanks for any help. -- Best Regards, Jeff Zhang

--
Best Regards

Jeff Zhang
Re: Setting up a second cluster and getting a weird issue
It is not suggested to deploy Hadoop on NFS; there will be conflicts between the data nodes, because over NFS they share the same file system namespace.

On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen and...@ucsfcti.org wrote:

Yes, in this deployment, I'm attempting to share the Hadoop files via NFS. The log and pid directories are local.

Thanks!

--Andrew

On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:

These 4 nodes share NFS?

On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen andrew-lists-had...@ucsfcti.org wrote:

I'm working on bringing up a second test cluster and am getting these intermittent errors on the DataNodes:

2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
        at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
        at org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)

There are 4 slaves, and sometimes 1 or 2 have the error, but the specific nodes change. Sometimes it's slave1, sometimes it's slave4, etc. Any thoughts?

Thanks!

--Andrew

--
Best Regards

Jeff Zhang

--
Best Regards

Jeff Zhang
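If the DataNode storage directories sit on the shared NFS mount (the missing /srv/hadoop/dfs/1/current/VERSION suggests they might), the nodes end up overwriting each other's storage metadata, which would also explain why the failing node changes from run to run. A sketch of pointing the storage at node-local disks in hdfs-site.xml (the second path is invented for illustration):

<!-- Illustrative only: dfs.data.dir should point at disks local to each
     DataNode, not at a directory all nodes share over NFS. -->
<property>
  <name>dfs.data.dir</name>
  <value>/srv/hadoop/dfs/1,/srv/hadoop/dfs/2</value>
</property>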
HDFS-630 patch for Hadoop v0.20
Hello all,

I am trying to install HBase, and while going through the requirements (link below), it asked me to apply the HDFS-630 patch. The latest two patches are for Hadoop 0.21, but I am using version 0.20. For this version, should I apply Todd Lipcon's patch at https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt? Would this be the right patch to apply? The directory structures have changed from 0.20 to 0.21.

http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements

Thank you.

Regards,
Raghava.
Re: HDFS-630 patch for Hadoop v0.20
Hi Raghava,

Yes, that's a patch targeted at 0.20, but I'm not certain whether it applies on the vanilla 0.20 code or not. If you'd like a version of Hadoop that already has it applied and tested, I'd recommend using Cloudera's CDH2.

-Todd

On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:

Hello all, I am trying to install HBase, and while going through the requirements (link below), it asked me to apply the HDFS-630 patch. The latest two patches are for Hadoop 0.21, but I am using version 0.20. For this version, should I apply Todd Lipcon's patch at https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt? Would this be the right patch to apply? The directory structures have changed from 0.20 to 0.21. http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements Thank you. Regards, Raghava.

--
Todd Lipcon
Software Engineer, Cloudera
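One way to check whether it applies cleanly is a dry run against a pristine 0.20 tree; the commands below are only a sketch (the -p strip level and the build target may need adjusting):

# Sketch only: verify the JIRA patch applies before actually changing anything.
cd hadoop-0.20.2
wget https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt
patch -p0 --dry-run < hdfs-630-0.20.txt   # report what would fail, change nothing
patch -p0 < hdfs-630-0.20.txt             # apply once the dry run is clean
ant jar                                   # rebuild (target name may vary)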
Re: HDFS-630 patch for Hadoop v0.20
Hello Todd,

Thank you for the reply. In the cluster I use here, Apache Hadoop is installed, so I have to use that. I am trying out HBase on my laptop first. Even if I install CDH2, it won't be useful, because on the cluster I have to work with Apache Hadoop. Since version 0.21 is still in development, shouldn't there be an HDFS-630 patch for the current stable release of Hadoop?

Regards,
Raghava.

On Thu, May 13, 2010 at 11:50 PM, Todd Lipcon t...@cloudera.com wrote:

Hi Raghava, yes, that's a patch targeted at 0.20, but I'm not certain whether it applies on the vanilla 0.20 code or not. If you'd like a version of Hadoop that already has it applied and tested, I'd recommend using Cloudera's CDH2. -Todd

On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:

Hello all, I am trying to install HBase, and while going through the requirements (link below), it asked me to apply the HDFS-630 patch. The latest two patches are for Hadoop 0.21, but I am using version 0.20. For this version, should I apply Todd Lipcon's patch at https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt? Would this be the right patch to apply? The directory structures have changed from 0.20 to 0.21. http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements Thank you. Regards, Raghava.

--
Todd Lipcon
Software Engineer, Cloudera
Re: HDFS-630 patch for Hadoop v0.20
On Thu, May 13, 2010 at 8:56 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:

Hello Todd, thank you for the reply. In the cluster I use here, Apache Hadoop is installed, so I have to use that. I am trying out HBase on my laptop first. Even if I install CDH2, it won't be useful, because on the cluster I have to work with Apache Hadoop. Since version 0.21 is still in development, shouldn't there be an HDFS-630 patch for the current stable release of Hadoop?

No, it was not considered for release in Hadoop 0.20.X because it breaks wire compatibility, and though I've done a workaround to avoid issues stemming from that, it would be unlikely to pass a backport vote.

-Todd

On Thu, May 13, 2010 at 11:50 PM, Todd Lipcon t...@cloudera.com wrote:

Hi Raghava, yes, that's a patch targeted at 0.20, but I'm not certain whether it applies on the vanilla 0.20 code or not. If you'd like a version of Hadoop that already has it applied and tested, I'd recommend using Cloudera's CDH2. -Todd

On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:

Hello all, I am trying to install HBase, and while going through the requirements (link below), it asked me to apply the HDFS-630 patch. The latest two patches are for Hadoop 0.21, but I am using version 0.20. For this version, should I apply Todd Lipcon's patch at https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt? Would this be the right patch to apply? The directory structures have changed from 0.20 to 0.21. http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements Thank you. Regards, Raghava.

--
Todd Lipcon
Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
Re: HDFS-630 patch for Hadoop v0.20
Hello Todd,

Oh, then isn't that a bit contradictory to the instructions on the HBase overview page?

http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements

It says that the current version of HBase works only with 0.20.X and asks users to apply the HDFS-630 patch, but that patch is not available for 0.20.X. Is there any workaround for this? Is that patch really required?

Thank you.

Regards,
Raghava.

On Fri, May 14, 2010 at 12:03 AM, Todd Lipcon t...@cloudera.com wrote:

On Thu, May 13, 2010 at 8:56 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:

Hello Todd, thank you for the reply. In the cluster I use here, Apache Hadoop is installed, so I have to use that. I am trying out HBase on my laptop first. Even if I install CDH2, it won't be useful, because on the cluster I have to work with Apache Hadoop. Since version 0.21 is still in development, shouldn't there be an HDFS-630 patch for the current stable release of Hadoop?

No, it was not considered for release in Hadoop 0.20.X because it breaks wire compatibility, and though I've done a workaround to avoid issues stemming from that, it would be unlikely to pass a backport vote. -Todd

On Thu, May 13, 2010 at 11:50 PM, Todd Lipcon t...@cloudera.com wrote:

Hi Raghava, yes, that's a patch targeted at 0.20, but I'm not certain whether it applies on the vanilla 0.20 code or not. If you'd like a version of Hadoop that already has it applied and tested, I'd recommend using Cloudera's CDH2. -Todd

On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:

Hello all, I am trying to install HBase, and while going through the requirements (link below), it asked me to apply the HDFS-630 patch. The latest two patches are for Hadoop 0.21, but I am using version 0.20. For this version, should I apply Todd Lipcon's patch at https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt? Would this be the right patch to apply? The directory structures have changed from 0.20 to 0.21. http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements Thank you. Regards, Raghava.

--
Todd Lipcon
Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
Re: HDFS-630 patch for Hadoop v0.20
Oops, sorry. By "really required" I meant: would that problem arise only in special situations, or is the patch required for normal operation as well?

Regards,
Raghava.

On Fri, May 14, 2010 at 12:11 AM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:

Hello Todd, oh, then isn't that a bit contradictory to the instructions on the HBase overview page? http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements It says that the current version of HBase works only with 0.20.X and asks users to apply the HDFS-630 patch, but that patch is not available for 0.20.X. Is there any workaround for this? Is that patch really required? Thank you. Regards, Raghava.

On Fri, May 14, 2010 at 12:03 AM, Todd Lipcon t...@cloudera.com wrote:

On Thu, May 13, 2010 at 8:56 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:

Hello Todd, thank you for the reply. In the cluster I use here, Apache Hadoop is installed, so I have to use that. I am trying out HBase on my laptop first. Even if I install CDH2, it won't be useful, because on the cluster I have to work with Apache Hadoop. Since version 0.21 is still in development, shouldn't there be an HDFS-630 patch for the current stable release of Hadoop?

No, it was not considered for release in Hadoop 0.20.X because it breaks wire compatibility, and though I've done a workaround to avoid issues stemming from that, it would be unlikely to pass a backport vote. -Todd

On Thu, May 13, 2010 at 11:50 PM, Todd Lipcon t...@cloudera.com wrote:

Hi Raghava, yes, that's a patch targeted at 0.20, but I'm not certain whether it applies on the vanilla 0.20 code or not. If you'd like a version of Hadoop that already has it applied and tested, I'd recommend using Cloudera's CDH2. -Todd

On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:

Hello all, I am trying to install HBase, and while going through the requirements (link below), it asked me to apply the HDFS-630 patch. The latest two patches are for Hadoop 0.21, but I am using version 0.20. For this version, should I apply Todd Lipcon's patch at https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt? Would this be the right patch to apply? The directory structures have changed from 0.20 to 0.21. http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements Thank you. Regards, Raghava.

--
Todd Lipcon
Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera