HBase has entered Debian (unstable)

2010-05-13 Thread Thomas Koch
Hi,

HBase 0.20.4 has entered Debian unstable, should slide into testing after the
usual 14-day period, and will therefore most likely be included in the upcoming
Debian Squeeze.

http://packages.debian.org/source/sid/hbase

Please note that this packaging effort is still very much a work in progress
and not yet suitable for production use. However, the aim is to have a
rock-solid, stable HBase in squeeze+1, i.e. in Debian testing, within the next
few months. In the meantime, the HBase package in Debian can raise HBase's
visibility and lower the barrier to entry.

So if somebody wants to try out HBase (on Debian), it is as easy as:

aptitude install zookeeperd hbase-masterd

In other news: zookeeper is in Debian testing as of today.

Best regards,

Thomas Koch, http://www.koch.ro


How to recover hadoop cluster using SecondaryNameNode ?

2010-05-13 Thread Jeff Zhang
Hi all,

I wonder, is it enough to recover a Hadoop cluster by just copying the
metadata from the SecondaryNameNode to a new master node? Or do I need to
do anything else?
Thanks for any help.



-- 
Best Regards

Jeff Zhang


Re: How to recover hadoop cluster using SecondaryNameNode ?

2010-05-13 Thread Ted Yu
I suggest you take a look at AvatarNode, which runs a standby avatar that
can be activated if the primary avatar fails.


On Thursday, May 13, 2010, Jeff Zhang zjf...@gmail.com wrote:
 Hi all,

 I wonder, is it enough to recover a Hadoop cluster by just copying the
 metadata from the SecondaryNameNode to a new master node? Or do I need to
 do anything else?
 Thanks for any help.



 --
 Best Regards

 Jeff Zhang



Re: Setting up a second cluster and getting a weird issue

2010-05-13 Thread Andrew Nguyen
Yes, in this deployment, I'm attempting to share the hadoop files via NFS.  The 
log and pid directories are local.

Thanks!

--Andrew

On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:

 These 4 nodes share NFS ?
 
 
 On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
 andrew-lists-had...@ucsfcti.org wrote:
 I'm working on bringing up a second test cluster and am getting these 
 intermittent errors on the DataNodes:
 
 2010-05-12 17:17:15,094 ERROR 
 org.apache.hadoop.hdfs.server.datanode.DataNode: 
 java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such 
 file or directory)
at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at 
 org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
at 
 org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
at 
 org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
 
 
 There are 4 slaves and sometimes 1 or 2 have the error but the specific 
 nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
 
 Any thoughts?
 
 Thanks!
 
 --Andrew
 
 
 
 -- 
 Best Regards
 
 Jeff Zhang



How to change join output separator

2010-05-13 Thread Carbon Rock

Hi,

I am running a map-side join.  My input looks like this.

file1.txt
---
a|deer
b|dog

file2.txt
---
a|veg
b|nveg

I am getting output like

a|[deer,veg]
b|[dog,nveg]

I don't want those square brackets, and the field separator should be | (pipe)
instead of a comma.

Please guide me on how to achieve this.

Thanks,
Dhana

-- 
View this message in context: 
http://old.nabble.com/How-to-change-join-output-separator-tp28547855p28547855.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
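
A minimal sketch of one way to get a|deer|veg instead of a|[deer,veg], assuming
the old org.apache.hadoop.mapred join API (CompositeInputFormat handing the map
function a TupleWritable with Text keys): format the tuple by hand in the mapper
instead of relying on TupleWritable.toString(), which is where the brackets and
commas come from. The class name below is illustrative only.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.join.TupleWritable;

public class PipeJoinMapper extends MapReduceBase
    implements Mapper<Text, TupleWritable, Text, Text> {

  public void map(Text key, TupleWritable tuple,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    // Build "deer|veg" ourselves rather than using TupleWritable.toString().
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < tuple.size(); i++) {
      if (!tuple.has(i)) continue;          // a slot can be empty in outer joins
      if (sb.length() > 0) sb.append('|');
      sb.append(tuple.get(i).toString());
    }
    out.collect(key, new Text(sb.toString()));
  }
}

In the job driver, something like conf.set("mapred.textoutputformat.separator",
"|") should, on 0.20, make TextOutputFormat separate key and value with a pipe
rather than a tab.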



Re: How to recover hadoop cluster using SecondaryNameNode ?

2010-05-13 Thread Eric Sammer
You can use the copy of fsimage and the editlog from the SNN to
recover. Remember that it will be (roughly) an hour old. The process
for recovery is to copy the fsimage and editlog to a new machine,
place them in the dfs.name.dir/current directory, and start all the
daemons. It's worth practicing this type of procedure before trying it
on a production cluster. More importantly, it's worth practicing this
*before* you need it on a production cluster.
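
A minimal sketch of that procedure, assuming the replacement machine already has
the same Hadoop version and configuration; the hostname and paths below are
placeholders, and the exact set of files under the checkpoint directory can vary
by version:

# On the replacement namenode host.
NAME_DIR=/data/dfs/name                 # must match dfs.name.dir in hdfs-site.xml
mkdir -p "$NAME_DIR/current"

# Pull the latest checkpoint from the SecondaryNameNode's fs.checkpoint.dir
# (by default ${hadoop.tmp.dir}/dfs/namesecondary).
scp secondarynn:/data/dfs/namesecondary/current/fsimage "$NAME_DIR/current/"
scp secondarynn:/data/dfs/namesecondary/current/edits   "$NAME_DIR/current/"
scp secondarynn:/data/dfs/namesecondary/current/fstime  "$NAME_DIR/current/"
scp secondarynn:/data/dfs/namesecondary/current/VERSION "$NAME_DIR/current/"

# Start the daemons and check the namenode log / web UI before trusting the cluster.
bin/start-dfs.sh

On 0.20 an alternative is to start the namenode with bin/hadoop namenode
-importCheckpoint, which loads the image from fs.checkpoint.dir instead of
copying the files by hand.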

On Thu, May 13, 2010 at 5:01 AM, Jeff Zhang zjf...@gmail.com wrote:
 Hi all,

 I wonder, is it enough to recover a Hadoop cluster by just copying the
 metadata from the SecondaryNameNode to a new master node? Or do I need to
 do anything else?
 Thanks for any help.



 --
 Best Regards

 Jeff Zhang




-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com


Re: How to recover hadoop cluster using SecondaryNameNode ?

2010-05-13 Thread Allen Wittenauer

This is a good time to remind folks that the namenode can write to multiple
directories, including one over a network filesystem or SAN, so that you always
have a fresh copy. :)
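
For reference, a hedged hdfs-site.xml snippet showing what that looks like; the
paths are examples only. The namenode writes its image and edit log to every
directory in the list, so losing the local disk still leaves the copy on the
NFS or SAN mount:

<property>
  <name>dfs.name.dir</name>
  <!-- comma-separated list: one local directory plus one on an NFS/SAN mount -->
  <value>/data/dfs/name,/mnt/san/hadoop/dfs/name</value>
</property>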

On May 13, 2010, at 8:05 AM, Eric Sammer wrote:

 You can use the copy of fsimage and the editlog from the SNN to
 recover. Remember that it will be (roughly) an hour old. The process
 for recovery is to copy the fsimage and editlog to a new machine,
 place them in the dfs.name.dir/current directory, and start all the
 daemons. It's worth practicing this type of procedure before trying it
 on a production cluster. More importantly, it's worth practicing this
 *before* you need it on a production cluster.
 
 On Thu, May 13, 2010 at 5:01 AM, Jeff Zhang zjf...@gmail.com wrote:
 Hi all,
 
 I wonder, is it enough to recover a Hadoop cluster by just copying the
 metadata from the SecondaryNameNode to a new master node? Or do I need to
 do anything else?
 Thanks for any help.
 
 
 
 --
 Best Regards
 
 Jeff Zhang
 
 
 
 
 -- 
 Eric Sammer
 phone: +1-917-287-2675
 twitter: esammer
 data: www.cloudera.com



Re: How to recover hadoop cluster using SecondaryNameNode ?

2010-05-13 Thread Allen Wittenauer

This code hasn't been committed to any branch yet and doesn't appear to have 
undergone any review outside of Facebook. :(

On May 13, 2010, at 6:18 AM, Ted Yu wrote:

 I suggest you take a look at AvatarNode, which runs a standby avatar that
 can be activated if the primary avatar fails.
 
 
 On Thursday, May 13, 2010, Jeff Zhang zjf...@gmail.com wrote:
 Hi all,
 
 I wonder, is it enough to recover a Hadoop cluster by just copying the
 metadata from the SecondaryNameNode to a new master node? Or do I need to
 do anything else?
 Thanks for any help.
 
 
 
 --
 Best Regards
 
 Jeff Zhang
 



Re: How to recover hadoop cluster using SecondaryNameNode ?

2010-05-13 Thread Edward Capriolo
On Thu, May 13, 2010 at 1:32 PM, Allen Wittenauer
awittena...@linkedin.comwrote:


 This code hasn't been committed to any branch yet and doesn't appear to
 have undergone any review outside of Facebook. :(

 On May 13, 2010, at 6:18 AM, Ted Yu wrote:

  I suggest you take a look at AvatarNode, which runs a standby avatar that
  can be activated if the primary avatar fails.
 
 
  On Thursday, May 13, 2010, Jeff Zhang zjf...@gmail.com wrote:
  Hi all,
 
  I wonder, is it enough to recover a Hadoop cluster by just copying the
  metadata from the SecondaryNameNode to a new master node? Or do I need to
  do anything else?
  Thanks for any help.
 
 
 
  --
  Best Regards
 
  Jeff Zhang
 

Wow, that AvatarNode was really flying below the radar! This is the first I've
heard of it. Awesome!


Build an indexing and search service with Hadoop

2010-05-13 Thread Aécio
Hi,

Currently I have an indexing and search service that receives documents to be
indexed and search requests through XML-RPC. The server uses the Lucene search
engine and stores its index on the local file system. The documents to be
indexed are usually academic papers, so I'm not trying to index large-scale
data sets in a single job, although the indexes may become very large as more
documents are received.

Now we are trying to parallelize the search and indexing by distributing the
index into shards on a cluster. I've been studying Hadoop, but it's not yet
clear to me how to implement the system.

The basic design is:
1. We receive a document through XML-RPC, and it should be indexed in one
shard on the cluster.
2. We receive a query request through XML-RPC; the query must be executed
over all shards, and the hits should be returned in an XML response.


My initial idea is:
1. Indexing
- The document received is used as the input of one map task. This function
would index the document into the local shard using our custom library built
on top of Lucene. There is no reduce.

2. Search
- The query received is used as the input of the map function. This function
would run the query against the local shard using our custom library and
emit the hits. The reduce function would group the hits from all shards.


Is it possible to implement this using the Hadoop MapReduce framework,
by implementing custom InputFormats and OutputFormats?
Or should I use the Hadoop RPC layer? Is there any documentation about it?
Any suggestions?

Thanks,
Aécio Santos.

-- 
Instituto Federal de Ciência, Educação e Tecnologia do Piauí
Laboratório de Pesquisa em Sistemas de Informação
Teresina - Piauí - Brazil
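
To make the search half of that idea concrete, here is a rough sketch of a map
task that runs each incoming query line against a local Lucene shard and emits
the hits, using the old org.apache.hadoop.mapred API and the Lucene 3.x API of
the time. The shard path, index field, analyzer, and class names are
placeholders, and the XML-RPC front end is ignored entirely:

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ShardSearchMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private IndexSearcher searcher;

  public void configure(JobConf conf) {
    try {
      // Each map task opens the Lucene shard stored on its own node.
      String shard = conf.get("search.local.shard.dir", "/data/shards/shard0");
      searcher = new IndexSearcher(FSDirectory.open(new File(shard)), true);
    } catch (IOException e) {
      throw new RuntimeException("cannot open local shard", e);
    }
  }

  public void map(LongWritable offset, Text queryLine,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    try {
      Query q = new QueryParser(Version.LUCENE_30, "body",
          new StandardAnalyzer(Version.LUCENE_30)).parse(queryLine.toString());
      TopDocs top = searcher.search(q, 10);
      for (ScoreDoc hit : top.scoreDocs) {
        // Key = the query, value = local doc id and score; the reducer
        // would merge and rank the hits from all shards per query.
        out.collect(queryLine, new Text(hit.doc + "\t" + hit.score));
      }
    } catch (Exception e) {
      throw new IOException(e);
    }
  }
}

Note that MapReduce job startup latency is high for interactive queries, which
is one reason dedicated index-serving systems such as Katta (mentioned in a
reply below) exist.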


[ANNOUNCE] hamake-2.0b

2010-05-13 Thread kroko...@gmail.com
After more than a year since the previous release, I am proud to announce
a new version of HAMAKE. Based on our experience using it, we rewrote it in
Java and added support for Amazon EMR. We also streamlined the XML syntax
and updated and improved the documentation. Please visit
http://code.google.com/p/hamake/ to learn more and to download the new
version.

Brief description:

Most non-trivial data processing scenarios with Hadoop typically
require more than one MapReduce job. Usually such processing is
data-driven, with the data funneled through a sequence of jobs. The
processing model could be presented in terms of dataflow programming.
It could be expressed as a directed graph, with datasets as nodes.
Each edge indicates a dependency between two or more datasets and is
associated with a processing instruction (Hadoop MapReduce job, PIG
Latin script or an external command), which produces one dataset from
the others. Using fuzzy timestamps as a way to detect when a dataset
needs to be updated, we can calculate a sequence in which the tasks
need to be executed to bring all datasets up to date. Jobs for
updating independent datasets could be executed concurrently, taking
advantage of your Hadoop cluster's full capacity. The dependency graph
may even contain cycles, leading to dependency loops which could be
resolved using dataset versioning.

These ideas inspired the creation of the HAMAKE utility. We tried to
emphasize data and to allow developers to express their goals in terms of
dataflow (versus workflow). The data dependency graph is expressed using
just two dataflow instructions, fold and foreach, providing a clear
processing model, similar to MapReduce, but at the dataset level. Another
design goal was to create a simple-to-use utility that developers can start
using right away, without complex installation or extensive learning.

Key Features

* Lightweight utility - no need for complex installation
* Based on a dataflow programming model
* Easy learning curve
* Supports Amazon Elastic MapReduce
* Can run MapReduce jobs as well as Pig Latin scripts

Sincerely,
Vadim Zaliva


Re: Build an indexing and search service with Hadoop

2010-05-13 Thread Edward Capriolo
On Thu, May 13, 2010 at 2:41 PM, Aécio aecio.sola...@gmail.com wrote:

 Hi,

 Currently I have an indexing and search service that receives documents to
 be indexed and search requests through XML-RPC. The server uses the Lucene
 search engine and stores its index on the local file system. The documents
 to be indexed are usually academic papers, so I'm not trying to index
 large-scale data sets in a single job, although the indexes may become very
 large as more documents are received.

 Now we are trying to parallelize the search and indexing by distributing
 the index into shards on a cluster. I've been studying Hadoop, but it's not
 yet clear to me how to implement the system.

 The basic design is:
 1. We receive a document through XML-RPC, and it should be indexed in one
 shard on the cluster.
 2. We receive a query request through XML-RPC; the query must be executed
 over all shards, and the hits should be returned in an XML response.


 My initial idea is:
 1. Indexing
 - The document received is used as the input of one map task. This function
 would index the document into the local shard using our custom library
 built on top of Lucene. There is no reduce.

 2. Search
 - The query received is used as the input of the map function. This
 function would run the query against the local shard using our custom
 library and emit the hits. The reduce function would group the hits from
 all shards.


 Is it possible to implement this using the Hadoop MapReduce framework,
 by implementing custom InputFormats and OutputFormats?
 Or should I use the Hadoop RPC layer? Is there any documentation about it?
 Any suggestions?

 Thanks,
 Aécio Santos.

 --
 Instituto Federal de Ciência, Educação e Tecnologia do Piauí
 Laboratório de Pesquisa em Sistemas de Informação
 Teresina - Piauí - Brazil


You can implement some of what you have described.

Take a look at these if you have not already; they are concrete
implementations for building and searching distributed indexes.

http://lucene.apache.org/nutch/
http://katta.sourceforge.net/

(not hadoop based but still pretty cool)
http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/


Seattle Hadoop/NoSQL: Facebook, more Discussion. Thurs May 27th

2010-05-13 Thread Bradford Stephens
We've heard your feedback from the last meetup: we're having fewer
speakers and more discussion. Yay!
http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/

We're expecting:

1. Facebook will talk about Hive (a SQL-like language for MapReduce)
2. OpsCode will talk about cluster management with Chef
3. Then we'll break up into groups and have casual Hadoop/NoSQL-related
discussions and Q&A with several experts, so you can learn more!

Also, stay tuned for news on a FREE Seattle Hadoop Community
Training day in late July. We're going to get some fantastic people,
and you'll get hands-on experience with the whole Hadoop ecosystem.

When: Thursday, May 27, 2010 6:45 PM

Where:
Amazon SLU, Von Vorst Building
426 Terry Ave N
Seattle, WA 98109
9044153009

-- 
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Re: How to recover hadoop cluster using SecondaryNameNode ?

2010-05-13 Thread Jeff Zhang
Allen,

Do you mean HDFS-976 (https://issues.apache.org/jira/browse/HDFS-976) and
HDFS-966 (https://issues.apache.org/jira/browse/HDFS-966)?



On Fri, May 14, 2010 at 1:32 AM, Allen Wittenauer awittena...@linkedin.com
wrote:

 This code hasn't been committed to any branch yet and doesn't appear to
have undergone any review outside of Facebook. :(

 On May 13, 2010, at 6:18 AM, Ted Yu wrote:

 I suggest you take a look at AvatarNode, which runs a standby avatar that
 can be activated if the primary avatar fails.


 On Thursday, May 13, 2010, Jeff Zhang zjf...@gmail.com wrote:
 Hi all,

 I wonder, is it enough to recover a Hadoop cluster by just copying the
 metadata from the SecondaryNameNode to a new master node? Or do I need to
 do anything else?
 Thanks for any help.



 --
 Best Regards

 Jeff Zhang






-- 
Best Regards

Jeff Zhang


Re: Setting up a second cluster and getting a weird issue

2010-05-13 Thread Jeff Zhang
It is not recommended to deploy Hadoop on NFS; there will be conflicts
between data nodes, because over NFS they would share the same file system
namespace.
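
Concretely: sharing the Hadoop binaries and configuration over NFS can work,
but dfs.data.dir (and hadoop.tmp.dir) must point at disks local to each
datanode, not at a path every node sees in common. A hedged hdfs-site.xml
example, reusing the path from the error above:

<property>
  <name>dfs.data.dir</name>
  <!-- local disks on each datanode; must not be a directory shared over NFS -->
  <value>/srv/hadoop/dfs/1,/srv/hadoop/dfs/2</value>
</property>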



On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen and...@ucsfcti.org wrote:

 Yes, in this deployment, I'm attempting to share the hadoop files via NFS.  
 The log and pid directories are local.

 Thanks!

 --Andrew

 On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:

  These 4 nodes share NFS ?
 
 
  On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen
  andrew-lists-had...@ucsfcti.org wrote:
  I'm working on bringing up a second test cluster and am getting these 
  intermittent errors on the DataNodes:
 
  2010-05-12 17:17:15,094 ERROR 
  org.apache.hadoop.hdfs.server.datanode.DataNode: 
  java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such 
  file or directory)
         at java.io.RandomAccessFile.open(Native Method)
          at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
         at 
  org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:249)
         at 
  org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.write(Storage.java:243)
         at 
  org.apache.hadoop.hdfs.server.common.Storage.writeAll(Storage.java:689)
         at 
  org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:560)
         at 
  org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1230)
         at 
  org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1273)
         at 
  org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1394)
 
 
  There are 4 slaves and sometimes 1 or 2 have the error but the specific 
  nodes change.  Sometimes it's slave1, sometimes it's slave4, etc.
 
  Any thoughts?
 
  Thanks!
 
  --Andrew
 
 
 
  --
  Best Regards
 
  Jeff Zhang




--
Best Regards

Jeff Zhang


HDFS-630 patch for Hadoop v0.20

2010-05-13 Thread Raghava Mutharaju
Hello all,

  I am trying to install HBase, and while going through the requirements
(link below) it asked me to apply the HDFS-630 patch. The latest two patches
are for Hadoop 0.21, but I am using version 0.20. For this version, should I
apply Todd Lipcon's patch at
https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt ?
Would this be the right patch to apply? The directory structures have
changed from 0.20 to 0.21.

http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements


Thank you.

Regards,
Raghava.
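
For reference, a hedged sketch of trying that patch against a 0.20 source tree;
patches of this era were usually svn diffs taken from the project root, so -p0
is the usual strip level, and a dry run shows quickly whether it applies to
your exact tree:

cd hadoop-0.20.2                      # your unpacked 0.20 source tree
wget https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt
patch -p0 --dry-run < hdfs-630-0.20.txt   # check for rejects without changing anything
patch -p0 < hdfs-630-0.20.txt
ant jar                               # rebuild the core jar with the Ant build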


Re: HDFS-630 patch for Hadoop v0.20

2010-05-13 Thread Todd Lipcon
Hi Raghava,

Yes, that's a patch targeted at 0.20, but I'm not certain whether it applies
on the vanilla 0.20 code or not. If you'd like a version of Hadoop that
already has it applied and tested, I'd recommend using Cloudera's CDH2.

-Todd

On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju 
m.vijayaragh...@gmail.com wrote:

 Hello all,

 I am trying to install HBase and while going through the requirements
 (link below), it asked me to apply HDFS-630 patch. The latest 2 patches are
 for Hadoop 0.21. I am using version 0.20. For this version, should I apply
 Todd Lipcon's patch at
 https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt
 .
 Would this be the right patch to apply? The directory structures have
 changed from 0.20 to 0.21.


 http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements


 Thank you.

 Regards,
 Raghava.




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HDFS-630 patch for Hadoop v0.20

2010-05-13 Thread Raghava Mutharaju
Hello Todd,

Thank you for the reply. On the cluster I use here, Apache Hadoop is
installed, so I have to use that. I am trying out HBase on my laptop first,
but even if I install CDH2 there, it won't help, because on the cluster I
have to work with Apache Hadoop. Since version 0.21 is still in development,
shouldn't there be an HDFS-630 patch for the current stable release of
Hadoop?

Regards,
Raghava.

On Thu, May 13, 2010 at 11:50 PM, Todd Lipcon t...@cloudera.com wrote:

 Hi Raghava,

 Yes, that's a patch targeted at 0.20, but I'm not certain whether it
 applies
 on the vanilla 0.20 code or not. If you'd like a version of Hadoop that
 already has it applied and tested, I'd recommend using Cloudera's CDH2.

 -Todd

 On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju 
 m.vijayaragh...@gmail.com wrote:

  Hello all,
 
  I am trying to install HBase and while going through the requirements
  (link below), it asked me to apply HDFS-630 patch. The latest 2 patches
 are
  for Hadoop 0.21. I am using version 0.20. For this version, should I
 apply
  Todd Lipcon's patch at
 
 https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt
  .
  Would this be the right patch to apply? The directory structures have
  changed from 0.20 to 0.21.
 
 
 
 http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements
 
 
  Thank you.
 
  Regards,
  Raghava.
 



 --
 Todd Lipcon
 Software Engineer, Cloudera



Re: HDFS-630 patch for Hadoop v0.20

2010-05-13 Thread Todd Lipcon
On Thu, May 13, 2010 at 8:56 PM, Raghava Mutharaju 
m.vijayaragh...@gmail.com wrote:

 Hello Todd,

Thank you for the reply. In the cluster I use here, apache Hadoop is
 installed. So I have to use that. I am trying out HBase on my laptop first.
 Even though I install CDH2, it won't be useful because on the cluster, I
 have to work with apache Hadoop. Since version 0.21 is still in
 development,
 there should be a HDFS-630 patch for the current stable release of Hadoop
 isn't it?


No, it was not considered for release in Hadoop 0.20.X because it breaks
wire compatibility, and though I've done a workaround to avoid issues
stemming from that, it would be unlikely to pass a backport vote.

-Todd

On Thu, May 13, 2010 at 11:50 PM, Todd Lipcon t...@cloudera.com wrote:

  Hi Raghava,
 
  Yes, that's a patch targeted at 0.20, but I'm not certain whether it
  applies
  on the vanilla 0.20 code or not. If you'd like a version of Hadoop that
  already has it applied and tested, I'd recommend using Cloudera's CDH2.
 
  -Todd
 
  On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju 
  m.vijayaragh...@gmail.com wrote:
 
   Hello all,
  
   I am trying to install HBase and while going through the
 requirements
   (link below), it asked me to apply HDFS-630 patch. The latest 2 patches
  are
   for Hadoop 0.21. I am using version 0.20. For this version, should I
  apply
   Todd Lipcon's patch at
  
 
 https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt
   .
   Would this be the right patch to apply? The directory structures have
   changed from 0.20 to 0.21.
  
  
  
 
 http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements
  
  
   Thank you.
  
   Regards,
   Raghava.
  
 
 
 
  --
  Todd Lipcon
  Software Engineer, Cloudera
 




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HDFS-630 patch for Hadoop v0.20

2010-05-13 Thread Raghava Mutharaju
Hello Todd,

Oh, then isn't that a bit contradictory to the instructions on the HBase
overview page?

http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements

It says that the current version of HBase works only with 0.20.X and asks
users to apply the HDFS-630 patch, but that patch is not available for
0.20.X. Is there any workaround for this? Is that patch really required?

Thank you.

Regards,
Raghava.

On Fri, May 14, 2010 at 12:03 AM, Todd Lipcon t...@cloudera.com wrote:

 On Thu, May 13, 2010 at 8:56 PM, Raghava Mutharaju 
 m.vijayaragh...@gmail.com wrote:

  Hello Todd,
 
 Thank you for the reply. In the cluster I use here, apache Hadoop
 is
  installed. So I have to use that. I am trying out HBase on my laptop
 first.
  Even though I install CDH2, it won't be useful because on the cluster, I
  have to work with apache Hadoop. Since version 0.21 is still in
  development,
  there should be a HDFS-630 patch for the current stable release of Hadoop
  isn't it?
 

 No, it was not considered for release in Hadoop 0.20.X because it breaks
 wire compatibility, and though I've done a workaround to avoid issues
 stemming from that, it would be unlikely to pass a backport vote.

 -Todd

 On Thu, May 13, 2010 at 11:50 PM, Todd Lipcon t...@cloudera.com wrote:
 
   Hi Raghava,
  
   Yes, that's a patch targeted at 0.20, but I'm not certain whether it
   applies
   on the vanilla 0.20 code or not. If you'd like a version of Hadoop that
   already has it applied and tested, I'd recommend using Cloudera's CDH2.
  
   -Todd
  
   On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju 
   m.vijayaragh...@gmail.com wrote:
  
Hello all,
   
I am trying to install HBase and while going through the
  requirements
(link below), it asked me to apply HDFS-630 patch. The latest 2
 patches
   are
for Hadoop 0.21. I am using version 0.20. For this version, should I
   apply
Todd Lipcon's patch at
   
  
 
 https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt
.
Would this be the right patch to apply? The directory structures have
changed from 0.20 to 0.21.
   
   
   
  
 
 http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements
   
   
Thank you.
   
Regards,
Raghava.
   
  
  
  
   --
   Todd Lipcon
   Software Engineer, Cloudera
  
 



 --
 Todd Lipcon
 Software Engineer, Cloudera



Re: HDFS-630 patch for Hadoop v0.20

2010-05-13 Thread Raghava Mutharaju
Oops, sorry. By "really required" I meant: would that problem arise only in
special situations, or is the patch required for normal operation as well?

Regards,
Raghava.

On Fri, May 14, 2010 at 12:11 AM, Raghava Mutharaju 
m.vijayaragh...@gmail.com wrote:

 Hello Todd,

 Oh, then isn't it a bit contradictory to the instructions on HBase overview
 page.



 http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements

 It says that the current version of HBase works only with 0.20.X and asks
 users to apply the patch HDFS-630 but that patch is not available for
 0.20.X. Is there any work around for this? Is that patch really required?

 Thank you.

 Regards,
 Raghava.


 On Fri, May 14, 2010 at 12:03 AM, Todd Lipcon t...@cloudera.com wrote:

 On Thu, May 13, 2010 at 8:56 PM, Raghava Mutharaju 
 m.vijayaragh...@gmail.com wrote:

  Hello Todd,
 
 Thank you for the reply. In the cluster I use here, apache Hadoop
 is
  installed. So I have to use that. I am trying out HBase on my laptop
 first.
  Even though I install CDH2, it won't be useful because on the cluster, I
  have to work with apache Hadoop. Since version 0.21 is still in
  development,
  there should be a HDFS-630 patch for the current stable release of
 Hadoop
  isn't it?
 

 No, it was not considered for release in Hadoop 0.20.X because it breaks
 wire compatibility, and though I've done a workaround to avoid issues
 stemming from that, it would be unlikely to pass a backport vote.

 -Todd

 On Thu, May 13, 2010 at 11:50 PM, Todd Lipcon t...@cloudera.com wrote:
 
   Hi Raghava,
  
   Yes, that's a patch targeted at 0.20, but I'm not certain whether it
   applies
   on the vanilla 0.20 code or not. If you'd like a version of Hadoop
 that
   already has it applied and tested, I'd recommend using Cloudera's
 CDH2.
  
   -Todd
  
   On Thu, May 13, 2010 at 7:59 PM, Raghava Mutharaju 
   m.vijayaragh...@gmail.com wrote:
  
Hello all,
   
I am trying to install HBase and while going through the
  requirements
(link below), it asked me to apply HDFS-630 patch. The latest 2
 patches
   are
for Hadoop 0.21. I am using version 0.20. For this version, should I
   apply
Todd Lipcon's patch at
   
  
 
 https://issues.apache.org/jira/secure/attachment/12430230/hdfs-630-0.20.txt
.
Would this be the right patch to apply? The directory structures
 have
changed from 0.20 to 0.21.
   
   
   
  
 
 http://hadoop.apache.org/hbase/docs/current/api/overview-summary.html#requirements
   
   
Thank you.
   
Regards,
Raghava.
   
  
  
  
   --
   Todd Lipcon
   Software Engineer, Cloudera
  
 



 --
 Todd Lipcon
 Software Engineer, Cloudera