[ANN] HBase 0.20.0-alpha available for download

2009-06-16 Thread stack
An alpha version of HBase 0.20.0 is available for download at:

  http://people.apache.org/~stack/hbase-0.20.0-alpha/

We are making this release available to preview what is coming in HBase
0.20.0.  In short, 0.20.0 is about performance and high availability.  Also,
a new, richer API has been added and the old one deprecated.  Here is a list
of almost 300 issues addressed so far in 0.20.0: http://tinyurl.com/ntvheo

This alpha release contains known bugs.  See http://tinyurl.com/kvfsft for
the current list.  In particular, this alpha release is without a migration
script to bring your 0.19.x era data forward to work on hbase 0.20.0.  A
working, well-tested migration script will be in place before we cut the
first HBase 0.20.0 release candidate some time in the next week or so.

After download, please take the time to review the 0.20.0 'Getting Started'
also available here:
http://people.apache.org/~stack/hbase-0.20.0-alpha/docs/api/overview-summary.html#overview_description.
HBase 0.20.0 has new dependencies, in particular it now depends on
ZooKeeper.  With ZooKeeper in the mix a few core HBase configurations have
been removed and replaced with ZooKeeper configurations instead.
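For those updating configurations, a minimal hbase-site.xml fragment of the kind 0.20.0 expects; the hostnames are placeholders, and the values shown are illustrative only:

```
<configuration>
  <!-- Comma-separated list of hosts running the ZooKeeper quorum. -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <!-- Port clients use to connect to the quorum. -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```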

Also of note, HBase 0.20.0 will include Stargate, an improved REST
connector for HBase.  The old, bundled REST connector will be deprecated.
Stargate is implemented using the Jersey framework.  It includes protobuf
encoding support, has caching proxy awareness, supports batching for
scanners and updates, and in general has the goal of enabling Web scale
storage systems (a la S3) backed by HBase.  Currently it's only available
on github, http://github.com/macdiesel/stargate/tree/master.  It will be
added to a new contrib directory before we cut a release candidate.

Please let us know if you have difficulty with the install, if you find the
documentation missing, or if you trip over bugs hbasing.

Yours,
The HBasistas


[ANN] hbase-0.19.3 available for download

2009-05-27 Thread stack
HBase 0.19.3 is now available for download:
http://www.apache.org/dyn/closer.cgi/hadoop/hbase/

This release addresses 14 issues found since the release of 0.19.2.
See the release notes for details: http://tinyurl.com/qcd4dg

We recommend that all users upgrade to this version of HBase.

Thanks to all who made this release possible.

Yours,
The HBase Team


Re: Suggestions for making writing faster? DFSClient waiting while writing chunk

2009-05-12 Thread stack
On Mon, May 11, 2009 at 9:43 PM, Raghu Angadi rang...@yahoo-inc.com wrote:

 stack wrote:

 Thanks Raghu:

 Here is where it gets stuck:  [...]


 Is that where it normally stuck? That implies it is spending unusually long
 time at the end of writing a block, which should not be the case.


I studied the datanode as you suggested.  This sent me back to the client
application and indeed, we were spending time finalizing blocks because
block size had been set way down in the application.  Write-rate is
reasonable again.
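The fix above -- restoring a sane block size -- matters because each block carries fixed costs (pipeline setup, finalization at lastPacketInBlock). A back-of-envelope sketch in plain Java (illustrative only, not HBase or HDFS code):

```java
// Sketch: why an undersized block size slows writing. Each HDFS block
// pays fixed per-block costs, so the number of blocks a file spans is
// a rough proxy for that overhead.
public class BlockOverhead {
    // Number of blocks a file of fileSize bytes occupies at blockSize.
    static long blocksFor(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;  // ceiling division
    }

    public static void main(String[] args) {
        long gig = 1024L * 1024 * 1024;
        long oneMb = 1024L * 1024;
        // A 1 GB write finalizes a block 1024 times at 1 MB blocks,
        // but only 16 times at default-sized 64 MB blocks.
        System.out.println(blocksFor(gig, oneMb));       // 1024
        System.out.println(blocksFor(gig, 64 * oneMb));  // 16
    }
}
```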

Thanks for the pointers Raghu,
St.Ack


Re: Suggestions for making writing faster? DFSClient waiting while writing chunk

2009-05-11 Thread stack
Thanks Raghu:

Here is where it gets stuck:

DataStreamer for file
/hbasetrunk2/.logs/aa0-000-13.u.powerset.com_1241988169615_60021/hlog.dat.1242020985471
block blk_-1659539029802462400_12649 daemon prio=10 tid=0x7f10ac00
nid=0x660 in Object.wait() [0x43a33000..0x43a33c80]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2322)
- locked 0x7f10e2b0c588 (a java.util.LinkedList)

Which is the wait in the below in the middle of DataStream.run:

  // Is this block full?
  if (one.lastPacketInBlock) {
    synchronized (ackQueue) {
      while (!hasError && ackQueue.size() != 0 && clientRunning) {
        try {
          ackQueue.wait();   // wait for acks to arrive from datanodes
        } catch (InterruptedException e) {
        }
      }
    }
  }

Sounds like, if we set the replication down from 3 to 2, it should write a
little faster.
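Raghu's point below -- that the pipeline runs at the pace of its slowest member -- is why cutting replication can help: it removes one chance of hitting a slow node. A toy model in plain Java (the throughput numbers are made up):

```java
import java.util.Arrays;

// Sketch (illustrative, not DFSClient code): a write pipeline moves at
// the pace of its slowest member, so replication 2 vs 3 changes how
// many chances there are of including a slow node.
public class PipelineModel {
    // Effective throughput of a pipeline is the minimum node throughput.
    static double pipelineThroughput(double... nodeThroughputMBs) {
        return Arrays.stream(nodeThroughputMBs).min().orElse(0.0);
    }

    public static void main(String[] args) {
        // Hypothetical per-datanode throughputs in MB/s.
        double threeWay = pipelineThroughput(60.0, 55.0, 20.0); // 20.0
        double twoWay   = pipelineThroughput(60.0, 55.0);       // 55.0
        System.out.println(threeWay + " vs " + twoWay);
    }
}
```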

Regards increasing the size of the ackQueue, are you thinking of maxPackets?
Currently it's hardcoded at 80 -- a queue of 5MB (packets are 64k).  You
thinking I should experiment with that?  I suppose that won't help much with
getting my writes onto the datanode.  Maybe I should be digging on the
datanode side to figure out why it's slow getting back to the client?

Thanks,
St.Ack




On Sun, May 10, 2009 at 7:49 PM, Raghu Angadi rang...@yahoo-inc.com wrote:


 It should not be waiting unnecessarily. But the client has to, if any of
 the datanodes in the pipeline is not able to receive the data as fast as
 the client is writing. IOW writing goes as fast as the slowest of the nodes
 involved in the pipeline (1 client and 3 datanodes).

 But based on what your case is, you probably could benefit by increasing
 the buffer (number of unacked packets) -- it would depend on where the
 DataStreamer thread is blocked.

 Raghu.


 stack wrote:

 Writing a file, our application spends a load of time here:

at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at

 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2964)
- locked 0x7f11054c2b68 (a java.util.LinkedList)
- locked 0x7f11054c24c0 (a
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at

 org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
at
 org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
- locked 0x7f11054c24c0 (a
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
 org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
- locked 0x7f11054c24c0 (a
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
 org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
at
 org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
- locked 0x7f11054c24c0 (a
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at

 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
- locked 0x7f1105694f28 (a
 org.apache.hadoop.fs.FSDataOutputStream)
at
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1020)
- locked 0x7f1105694e98 (a
 org.apache.hadoop.io.SequenceFile$Writer)
at
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:984)

 Here is the code from around line 2964 in writeChunk.

    // If queue is full, then wait till we can create enough space
    while (!closed && dataQueue.size() + ackQueue.size() > maxPackets) {
      try {
        dataQueue.wait();
      } catch (InterruptedException e) {
      }
    }

 The queue of packets is full and we're waiting for it to be cleared.

 Any suggestions for how I might get the DataStreamer to act more promptly
 clearing the packet queue?

 This is the hadoop 0.20 branch.  It's a small cluster but relatively lightly
 loaded (so says ganglia).

 Thanks,
 St.Ack





Suggestions for making writing faster? DFSClient waiting while writing chunk

2009-05-10 Thread stack
Writing a file, our application spends a load of time here:

at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2964)
- locked 0x7f11054c2b68 (a java.util.LinkedList)
- locked 0x7f11054c24c0 (a
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
at
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
- locked 0x7f11054c24c0 (a
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
- locked 0x7f11054c24c0 (a
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
- locked 0x7f11054c24c0 (a
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
- locked 0x7f1105694f28 (a
org.apache.hadoop.fs.FSDataOutputStream)
at
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1020)
- locked 0x7f1105694e98 (a
org.apache.hadoop.io.SequenceFile$Writer)
at
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:984)

Here is the code from around line 2964 in writeChunk.

    // If queue is full, then wait till we can create enough space
    while (!closed && dataQueue.size() + ackQueue.size() > maxPackets) {
      try {
        dataQueue.wait();
      } catch (InterruptedException e) {
      }
    }

The queue of packets is full and we're waiting for it to be cleared.

Any suggestions for how I might get the DataStreamer to act more promptly
clearing the packet queue?

This is the hadoop 0.20 branch.  It's a small cluster but relatively lightly
loaded (so says ganglia).
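The wait in writeChunk above is ordinary bounded-queue backpressure: the writer blocks once outstanding packets reach maxPackets and resumes as acks drain them. A self-contained sketch of the same mechanism with java.util.concurrent (not DFSClient code; maxPackets shrunk to 4 here for illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the backpressure writeChunk hits: a producer blocks once
// the queue holds maxPackets entries, until a consumer (standing in
// for the DataStreamer/acks) makes room.
public class BoundedWriteQueue {
    public static void main(String[] args) throws InterruptedException {
        final int maxPackets = 4;  // stands in for the hardcoded 80
        BlockingQueue<byte[]> dataQueue = new ArrayBlockingQueue<>(maxPackets);

        // Stand-in for the DataStreamer: drains packets as "acks" arrive.
        Thread streamer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    dataQueue.take();   // ship packet, wait for ack
                    Thread.sleep(5);    // a slow datanode
                }
            } catch (InterruptedException ignored) { }
        });
        streamer.start();

        // Stand-in for writeChunk: put() blocks while the queue is full,
        // which is the wait seen in the stack trace above.
        for (int i = 0; i < 10; i++) {
            dataQueue.put(new byte[64 * 1024]);  // one 64k packet
        }
        streamer.join();
        System.out.println("all packets acked");
    }
}
```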

Thanks,
St.Ack


[ANN] hbase-0.19.2 available for download

2009-05-09 Thread stack
HBase 0.19.2 is now available for download

 http://hadoop.apache.org/hbase/releases.html

17 issues have been fixed since hbase 0.19.1.   Release notes are available
here: http://tinyurl.com/p3x2bn

Thanks to all who contributed to this release.

At your service,
The HBase Team


Re: ApacheCon EU 2009 this week

2009-03-23 Thread stack
Riding on the below's coat-tails, there'll also be a talk on HBase on
Wednesday at 3pm.
Thanks,
St.Ack

On Mon, Mar 23, 2009 at 11:06 AM, Owen O'Malley omal...@apache.org wrote:

  ApacheCon EU 2009 is in Amsterdam this week, with a lot of talks on
 Hadoop. There are also going to be a lot of the committers there, including
 Doug Cutting. There is no word yet whether he is bringing the original
 Hadoop as seen in the NY Times.

 This year the live video streaming includes the Hadoop track. The video of
 the keynote and lunch talks are free, but there is a charge for the Hadoop
 track.

 The Hadoop track this year includes:

* Opening Keynote - Data Management in the Cloud - Raghu Ramakrishnan
* Introduction to Hadoop - Owen O'Malley
* Hadoop Map-Reduce: Tuning and Debugging - Arun Murthy
* Pig: Making Hadoop Easy - Olga Natkovich
* Running Hadoop in the Cloud - Tom White
* Configuring Hadoop for Grid Services - Allen Wittenauer
* Dynamic Hadoop Clusters - Steve Loughran

 -- Owen




Re: RDF store over HDFS/HBase

2009-03-23 Thread stack
Philip:

Anywhere we can go to learn more about the effort?  What can we do in HBase
to make the project more likely to succeed?

Thank you,
St.Ack

On Mon, Mar 23, 2009 at 5:05 PM, Philip M. White p...@qnan.org wrote:

 On Mon, Mar 23, 2009 at 04:07:01PM -0700, Amandeep Khurana wrote:
  Has anyone explored using HDFS/HBase as the underlying storage for an RDF
  store? Most solutions (all are single node) that I have found till now
 scale
  up only to a couple of billion rows in the Triple store. Wondering how
  Hadoop could be leveraged here...

 Amandeep, the Semantic Web Research Lab of the University of Texas at
 Dallas is working on this.  We expect to have an implementation of this
 for Jena by summer.

 --
 Philip



Re: Connection problem during data import into hbase

2009-02-21 Thread stack
The table exists before you start the MR job?

When you say 'midway through the job', are you using TableOutputFormat to
insert into your table?

Which version of hbase?

St.Ack

On Fri, Feb 20, 2009 at 9:55 PM, Amandeep Khurana ama...@gmail.com wrote:

 I don't know if this is related or not, but it seems to be. After this map
 reduce job, I tried to count the number of entries in the table in hbase
 through the shell. It failed with the following error:

 hbase(main):002:0> count 'in_table'
 NativeException: java.lang.NullPointerException: null
from java.lang.String:-1:in `init'
from org/apache/hadoop/hbase/util/Bytes.java:92:in `toString'
from org/apache/hadoop/hbase/client/RetriesExhaustedException.java:50:in
 `getMessage'
from org/apache/hadoop/hbase/client/RetriesExhaustedException.java:40:in
 `init'
from org/apache/hadoop/hbase/client/HConnectionManager.java:841:in
 `getRegionServerWithRetries'
from org/apache/hadoop/hbase/client/MetaScanner.java:56:in `metaScan'
from org/apache/hadoop/hbase/client/MetaScanner.java:30:in `metaScan'
from org/apache/hadoop/hbase/client/HConnectionManager.java:411:in
 `getHTableDescriptor'
from org/apache/hadoop/hbase/client/HTable.java:219:in
 `getTableDescriptor'
from sun.reflect.NativeMethodAccessorImpl:-2:in `invoke0'
from sun.reflect.NativeMethodAccessorImpl:-1:in `invoke'
from sun.reflect.DelegatingMethodAccessorImpl:-1:in `invoke'
from java.lang.reflect.Method:-1:in `invoke'
from org/jruby/javasupport/JavaMethod.java:250:in
 `invokeWithExceptionHandling'
from org/jruby/javasupport/JavaMethod.java:219:in `invoke'
from org/jruby/javasupport/JavaClass.java:416:in `execute'
 ... 145 levels...
from org/jruby/internal/runtime/methods/DynamicMethod.java:74:in `call'
from org/jruby/internal/runtime/methods/CompiledMethod.java:48:in `call'
from org/jruby/runtime/CallSite.java:123:in `cacheAndCall'
from org/jruby/runtime/CallSite.java:298:in `call'
from

 ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:429:in
 `__file__'
from

 ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in
 `__file__'
from

 ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in
 `load'
from org/jruby/Ruby.java:512:in `runScript'
from org/jruby/Ruby.java:432:in `runNormally'
from org/jruby/Ruby.java:312:in `runFromMain'
from org/jruby/Main.java:144:in `run'
from org/jruby/Main.java:89:in `run'
from org/jruby/Main.java:80:in `main'
from /hadoop/install/hbase/bin/../bin/HBase.rb:444:in `count'
from /hadoop/install/hbase/bin/../bin/hirb.rb:348:in `count'
from (hbase):3:in `binding'


 Amandeep Khurana
 Computer Science Graduate Student
 University of California, Santa Cruz


 On Fri, Feb 20, 2009 at 9:46 PM, Amandeep Khurana ama...@gmail.com
 wrote:

  Here's what it throws on the console:
 
  09/02/20 21:45:29 INFO mapred.JobClient: Task Id :
  attempt_200902201300_0019_m_06_0, Status : FAILED
  java.io.IOException: table is null
  at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:33)
  at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:1)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
  at org.apache.hadoop.mapred.Child.main(Child.java:155)
 
  attempt_200902201300_0019_m_06_0:
  org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
 trying
  to locate root region
  attempt_200902201300_0019_m_06_0:   at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:768)
  attempt_200902201300_0019_m_06_0:   at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:448)
  attempt_200902201300_0019_m_06_0:   at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
  attempt_200902201300_0019_m_06_0:   at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
  attempt_200902201300_0019_m_06_0:   at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:457)
  attempt_200902201300_0019_m_06_0:   at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:430)
  attempt_200902201300_0019_m_06_0:   at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:557)
  attempt_200902201300_0019_m_06_0:   at
 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:461)
  attempt_200902201300_0019_m_06_0:   at
 
 

Re: Connection problem during data import into hbase

2009-02-21 Thread stack
It looks like the regionserver hosting root crashed:

org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
to locate root region

How many servers you running?

You made similar config. to that reported by Larry Compton in a mail from
earlier today?  (See FAQ and Troubleshooting page for more on his listed
configs.)

St.Ack


On Sat, Feb 21, 2009 at 1:01 AM, Amandeep Khurana ama...@gmail.com wrote:

 Yes, the table exists before I start the job.

 I am not using TableOutputFormat. I picked up the sample code from the docs
 and am using it.

 Here's the job conf:

 JobConf conf = new JobConf(getConf(), IN_TABLE_IMPORT.class);
 FileInputFormat.setInputPaths(conf, new Path("import_data"));
 conf.setMapperClass(MapClass.class);
 conf.setNumReduceTasks(0);
 conf.setOutputFormat(NullOutputFormat.class);
 JobClient.runJob(conf);

 Interestingly, the hbase shell isn't working now either. It's giving errors
 even when I give the command 'list'...



 Amandeep Khurana
 Computer Science Graduate Student
 University of California, Santa Cruz


 On Sat, Feb 21, 2009 at 12:10 AM, stack st...@duboce.net wrote:

  The table exists before you start the MR job?
 
  When you say 'midway through the job', are you using TableOutputFormat to
  insert into your table?
 
  Which version of hbase?
 
  St.Ack
 
  On Fri, Feb 20, 2009 at 9:55 PM, Amandeep Khurana ama...@gmail.com
  wrote:
 
   I don't know if this is related or not, but it seems to be. After this
  map reduce job, I tried to count the number of entries in the table in
  hbase through the shell. It failed with the following error:
  
    hbase(main):002:0> count 'in_table'
   NativeException: java.lang.NullPointerException: null
  from java.lang.String:-1:in `init'
  from org/apache/hadoop/hbase/util/Bytes.java:92:in `toString'
  from
  org/apache/hadoop/hbase/client/RetriesExhaustedException.java:50:in
   `getMessage'
  from
  org/apache/hadoop/hbase/client/RetriesExhaustedException.java:40:in
   `init'
  from org/apache/hadoop/hbase/client/HConnectionManager.java:841:in
   `getRegionServerWithRetries'
  from org/apache/hadoop/hbase/client/MetaScanner.java:56:in
 `metaScan'
  from org/apache/hadoop/hbase/client/MetaScanner.java:30:in
 `metaScan'
  from org/apache/hadoop/hbase/client/HConnectionManager.java:411:in
   `getHTableDescriptor'
  from org/apache/hadoop/hbase/client/HTable.java:219:in
   `getTableDescriptor'
  from sun.reflect.NativeMethodAccessorImpl:-2:in `invoke0'
  from sun.reflect.NativeMethodAccessorImpl:-1:in `invoke'
  from sun.reflect.DelegatingMethodAccessorImpl:-1:in `invoke'
  from java.lang.reflect.Method:-1:in `invoke'
  from org/jruby/javasupport/JavaMethod.java:250:in
   `invokeWithExceptionHandling'
  from org/jruby/javasupport/JavaMethod.java:219:in `invoke'
  from org/jruby/javasupport/JavaClass.java:416:in `execute'
   ... 145 levels...
  from org/jruby/internal/runtime/methods/DynamicMethod.java:74:in
  `call'
  from org/jruby/internal/runtime/methods/CompiledMethod.java:48:in
  `call'
  from org/jruby/runtime/CallSite.java:123:in `cacheAndCall'
  from org/jruby/runtime/CallSite.java:298:in `call'
  from
  
  
 
 ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:429:in
   `__file__'
  from
  
  
 
 ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in
   `__file__'
  from
  
  
 
 ruby/hadoop/install/hbase_minus_0_dot_19_dot_0/bin//hadoop/install/hbase/bin/../bin/hirb.rb:-1:in
   `load'
  from org/jruby/Ruby.java:512:in `runScript'
  from org/jruby/Ruby.java:432:in `runNormally'
  from org/jruby/Ruby.java:312:in `runFromMain'
  from org/jruby/Main.java:144:in `run'
  from org/jruby/Main.java:89:in `run'
  from org/jruby/Main.java:80:in `main'
  from /hadoop/install/hbase/bin/../bin/HBase.rb:444:in `count'
  from /hadoop/install/hbase/bin/../bin/hirb.rb:348:in `count'
  from (hbase):3:in `binding'
  
  
   Amandeep Khurana
   Computer Science Graduate Student
   University of California, Santa Cruz
  
  
   On Fri, Feb 20, 2009 at 9:46 PM, Amandeep Khurana ama...@gmail.com
   wrote:
  
Here's what it throws on the console:
   
09/02/20 21:45:29 INFO mapred.JobClient: Task Id :
attempt_200902201300_0019_m_06_0, Status : FAILED
java.io.IOException: table is null
at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:33)
at IN_TABLE_IMPORT$MapClass.map(IN_TABLE_IMPORT.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
   
attempt_200902201300_0019_m_06_0:
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
   trying
to locate root region

ANN: hbase-0.19.0 release available for download

2009-01-21 Thread stack

HBase 0.19.0 is now available for download

 http://hadoop.apache.org/hbase/releases.html

Thanks to all who contributed to this release.  185 issues have been 
fixed since hbase 0.18.0.   Release notes are available here: 
http://tinyurl.com/8xmyx9


At your service,
The HBase Team



Re: Hung in DFSClient$DFSOutputStream.writeChunk

2008-11-21 Thread stack
:37:29,530 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:
/XX.XX.45.128:50020. Already tried 3 time(s).
2008-11-21 16:37:30,540 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /XX.XX.45.128:50020. Already tried 4 time(s).
2008-11-21 16:37:31,550 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /XX.XX.45.128:50020. Already tried 5 time(s).
2008-11-21 16:37:32,560 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /XX.XX.45.128:50020. Already tried 6 time(s).
2008-11-21 16:37:33,570 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /XX.XX.45.128:50020. Already tried 7 time(s).
2008-11-21 16:37:34,580 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /XX.XX.45.128:50020. Already tried 8 time(s).
2008-11-21 16:37:35,590 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: /XX.XX.45.128:50020. Already tried 9 time(s).
2008-11-21 16:37:35,591 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery
for block blk_8300709359495898650_29813788 failed because recovery from
primary datanode XX.XX.45.128:50010 failed 6 times. Aborting...

St.Ack

...

On Thu, Nov 20, 2008 at 1:51 PM, stack [EMAIL PROTECTED] wrote:

 stack wrote:

 Over in hbase-space, we trigger a hang in DFSOutputStream.writeChunk.
  Input appreciated.


 Pardon me.  The above should have read, ...we sometimes trigger.

 The below stack traces are from hadoop-0.18.2.

 Other ill-documented instances of the hang can be found over in HBASE-667.

 Thanks,
 St.Ack




 Here are the two pertinent extracts from the hbase regionserver thread
 dump:

 IPC Server handler 9 on 60020 daemon prio=10 tid=0x7fef1c3f0400
 nid=0x7470 waiting for monitor entry
 [0x42d18000..0x42d189f0]
  java.lang.Thread.State: BLOCKED (on object monitor)
   at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2486)

   - waiting to lock 0x7fef38ecc138 (a java.util.LinkedList)
   - locked 0x7fef38ecbdb8 (a
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at
 org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)

   at
 org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
   - locked 0x7fef38ecbdb8 (a
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at
 org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
   - locked 0x7fef38ecbdb8 (a
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
   - locked 0x7fef38ecbdb8 (a
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)

   at java.io.DataOutputStream.write(DataOutputStream.java:107)
   - locked 0x7fef38e09fc0 (a
 org.apache.hadoop.fs.FSDataOutputStream)
   at
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1016)
   - locked 0x7fef38e09f30 (a
 org.apache.hadoop.io.SequenceFile$Writer)
   at
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:980)
   - locked 0x7fef38e09f30 (a
 org.apache.hadoop.io.SequenceFile$Writer)
   at org.apache.hadoop.hbase.regionserver.HLog.doWrite(HLog.java:461)
   at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:421)
   - locked 0x7fef29ad9588 (a java.lang.Integer)
   at
 org.apache.hadoop.hbase.regionserver.HRegion.update(HRegion.java:1676)
   at
 org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1439)
   at
 org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1378)
   at
 org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1184)

   at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
   at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

   at java.lang.reflect.Method.invoke(Method.java:616)
   at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:622)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)


 Here is trace from accompanying DataStreamer:


 DataStreamer for file
 /hbase/log_72.34.249.212_1225407466779_60020/hlog.dat.1227075571390 block
 blk_-7436808403424765554_553837 daemon prio=10 tid=0x01c84c00
 nid=0x7125 in Object.wait() [0x409b3000..0x409b3d70]
  java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   at java.lang.Object.wait(Object.java:502)
   at org.apache.hadoop.ipc.Client.call(Client.java:709)
   - locked 0x7fef39520bb8 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
   at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343

Hung in DFSClient$DFSOutputStream.writeChunk

2008-11-20 Thread stack
Over in hbase-space, we trigger a hang in DFSOutputStream.writeChunk.  
Input appreciated.


Here are the two pertinent extracts from the hbase regionserver thread dump:

IPC Server handler 9 on 60020 daemon prio=10 tid=0x7fef1c3f0400 
nid=0x7470 waiting for monitor entry 
[0x42d18000..0x42d189f0]

  java.lang.Thread.State: BLOCKED (on object monitor)
   at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2486)

   - waiting to lock 0x7fef38ecc138 (a java.util.LinkedList)
   - locked 0x7fef38ecbdb8 (a 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at 
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
   at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
   - locked 0x7fef38ecbdb8 (a 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
   - locked 0x7fef38ecbdb8 (a 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream)

   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
   - locked 0x7fef38ecbdb8 (a 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)

   at java.io.DataOutputStream.write(DataOutputStream.java:107)
   - locked 0x7fef38e09fc0 (a 
org.apache.hadoop.fs.FSDataOutputStream)
   at 
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1016)
   - locked 0x7fef38e09f30 (a 
org.apache.hadoop.io.SequenceFile$Writer)
   at 
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:980)
   - locked 0x7fef38e09f30 (a 
org.apache.hadoop.io.SequenceFile$Writer)

   at org.apache.hadoop.hbase.regionserver.HLog.doWrite(HLog.java:461)
   at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:421)
   - locked 0x7fef29ad9588 (a java.lang.Integer)
   at 
org.apache.hadoop.hbase.regionserver.HRegion.update(HRegion.java:1676)
   at 
org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1439)
   at 
org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1378)
   at 
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1184)

   at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

   at java.lang.reflect.Method.invoke(Method.java:616)
   at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:622)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)


Here is trace from accompanying DataStreamer:


DataStreamer for file 
/hbase/log_72.34.249.212_1225407466779_60020/hlog.dat.1227075571390 
block blk_-7436808403424765554_553837 daemon prio=10 
tid=0x01c84c00 nid=0x7125 in Object.wait() 
[0x409b3000..0x409b3d70]

  java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   at java.lang.Object.wait(Object.java:502)
   at org.apache.hadoop.ipc.Client.call(Client.java:709)
   - locked 0x7fef39520bb8 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
   at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
   at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
   at 
org.apache.hadoop.dfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:139)
   at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2185)
   at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
   at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

   - locked 0x7fef38ecc138 (a java.util.LinkedList)

Repeated thread dumpings show that we never move on from this state.

Looking at where we're stuck in DataStreamer, we're down in 
processDatanodeError trying to setup an RPC proxy with the 'least' 
datanode as our new 'primary'.


2178     // Tell the primary datanode to do error recovery
2179     // by stamping appropriate generation stamps.
2180     //
2181     Block newBlock = null;
2182     ClientDatanodeProtocol primary = null;
2183     try {
2184       // Pick the least datanode as the primary datanode to avoid deadlock.
2185       primary = createClientDatanodeProtocolProxy(
2186           Collections.min(Arrays.asList(newnodes)), conf);
2187       newBlock = primary.recoverBlock(block, newnodes);
2188     } catch (IOException e) {


If I read this right, the problem is that since RPC doesn't time out, we
never return from createClientDatanodeProtocolProxy.  (At the time, a
machine vanished from
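One way to avoid wedging forever on a call that never returns is to bound it with a timeout. A hedged sketch in plain JDK code -- illustrative only; this is not how hadoop's RPC layer worked at the time, which is exactly the problem described above:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: bound a potentially never-returning call (like the proxy
// setup to a vanished datanode) with a Future timeout.
public class BoundedCall {
    static <T> T callWithTimeout(Callable<T> call, long millis)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // get() throws TimeoutException instead of blocking forever.
            return pool.submit(call).get(millis, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();  // interrupt any straggling call
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast call returns; a hung one would throw TimeoutException
        // instead of wedging the DataStreamer.
        System.out.println(callWithTimeout(() -> "proxy-ready", 1000));
    }
}
```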

Re: Question on opening file info from namenode in DFSClient

2008-11-09 Thread stack

Taeho Kang wrote:

Hi, thanks for your reply Dhruba,

One of my co-workers is writing a BigTable-like application that could be
used for online, near-real-time services.
Can your co-worker be convinced to instead spend his time helping along
the ongoing bigtable-like efforts?

I think HBase developers would have run into similar issues as well.
  
In hbase, we open the file once and keep it open.  The file is shared
amongst all clients.


St.Ack


[ANN] hbase 0.18.0 available

2008-09-21 Thread stack
HBase 0.18.0 fixes 57 issues [1] since the HBase 0.2.0 release.  New 
features include experimental transaction support and client-side 
exposure of row locks.


HBase 0.18.0 runs on Hadoop 0.18.0.

With this release, HBase major+minor version now echoes that of the 
Hadoop core version it depends on.  See FAQ #18 and #19 for more on 
HBase versioning [2].


Thanks to all who contributed to this release.

Yours,
The HBase Team

1. http://tinyurl.com/4zl9ch
2. http://wiki.apache.org/hadoop/Hbase/FAQ#18


[ANN] hbase-0.2.0 release

2008-08-09 Thread stack
The HBase 0.2.0 release includes 291 changes [1]. New features include a
richer API, a new ruby irb-based shell, an improved UI, and many
improvements to overall stability.  To download, visit [4].

HBase 0.2.0 is not backward compatible with the HBase 0.1 API (see [2] for an
overview of the changes). To migrate your 0.1-era HBase data to 0.2, see the
Migration Guide [3] http://wiki.apache.org/hadoop/Hbase/HowToMigrate.
HBase 0.2.0 runs on Hadoop 0.17.x. To run 0.2.0 on hadoop 0.18.x, replace
the hadoop 0.17.1 jars under $HBASE_HOME/lib with their 0.18.x equivalents
and then recompile.

Thanks to all who contributed to this release.

Yours,
The HBase Team

1.
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&pid=12310753&fixfor=12312955
2. http://wiki.apache.org/hadoop/Hbase/Plan-0.2/APIChanges
3. http://wiki.apache.org/hadoop/Hbase/HowToMigrate
4. http://www.apache.org/dyn/closer.cgi/hadoop/hbase/


[ANN] hbase-0.1.3 release

2008-06-27 Thread stack
hbase-0.1.3 is available for download: 
http://www.apache.org/dyn/closer.cgi/hadoop/hbase/


hbase-0.1.3 resolves 20 issues [1] including fixes for a regionserver 
deadlock, non-splitting in the presence of regions of multiple families 
under load, unreliable iteration of vintage cells, and improved 
robustness around recovery from crashes.  Also bundles mild performance 
improvements.


We recommend that all users upgrade to this latest version.

Thanks to all who contributed to this release.

hbase-0.1.3 runs on hadoop-0.16.x.

Yours,
The HBase Team

[1] 
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&pid=12310753&fixfor=12313169 



Re: Hadoop fsck displays open files as corrupt.

2008-05-22 Thread stack

The first case sounds like HADOOP-2703.
St.Ack


Martin Schaaf wrote:

Hi,

we wrote a program that uses a Writer to append keys and values to a file.
If we do an fsck during this writing, the opened files are reported as
corrupt and the file size is zero until they are closed. On the other
hand, if we copy a file from the local fs to the hadoop fs, the size
constantly increases and the files aren't displayed as corrupt. So my
question: is this the expected behaviour? What is the difference between
these two operations?

Thanks in advance for your help
martin
  




[ANN] REMINDER HBase User Group Meeting 3: Tomorrow night, May 20th, in San Francisco

2008-05-19 Thread stack

See http://upcoming.yahoo.com/event/672690/?ps=6
Thanks,
St.Ack


[ANN] hbase-0.1.2 release now available for download

2008-05-14 Thread stack
hbase-0.1.2 resolves 27 issues including critical fixes for 'missing' 
edits and unreliable onlining/offlining of tables.  We recommend that 
all users upgrade to this latest version.


To download, please go to http://hadoop.apache.org/hbase/releases.html.

Thanks to all who contributed to this release.

Yours,
The HBase team.


[ANN] HUG3 -- The Third HBase User Group Meeting, 20th May in San Francisco

2008-05-13 Thread stack

See http://upcoming.yahoo.com/event/672690
Thanks,
St.Ack


Re: Hadoop Permissions Question - [Fwd: Hbase on hadoop]

2008-05-09 Thread stack

[EMAIL PROTECTED] wrote:

The stack trace is good enough.  HMaster does 
DistributedFileSystem.setSafeMode(...), which requires superuser privilege.

Nicholas
  


HBase won't start if HDFS is in safe mode.  HADOOP-3066, committed to 
hadoop-0.17, made it so querying if hdfs is in 'safe mode' no longer 
requires superuser privilege.

St.Ack





- Original Message 
From: Rick Hangartner [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Friday, May 9, 2008 11:51:55 AM
Subject: Re: Hadoop Permissions Question - [Fwd: Hbase on hadoop]

Hi Nicholas,

I was the original poster of this question.  Thanks for your  
response.  (And thanks for elevating attention to this Stack).


Am I missing something, or is one implication of how HDFS derives  
privileges from the Linux filesystem that the HBase master must be run  
on the same machine as the Hadoop HDFS (what part of it?) if one wants  
to use the HDFS permissions system, or that right now we must run  
without permissions?


Here's most of the full Java trace for the exception, which might be  
helpful in determining why superuser privilege is required to run  
HMaster.  Unfortunately log4j appears to have chopped off the last 6  
entries.  (This is from the hbase log.)


Thanks for the help.

2008-05-08 10:13:28,670 ERROR org.apache.hadoop.hbase.HMaster: Can not  
start master

java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
   at org.apache.hadoop.hbase.HMaster.doMain(HMaster.java:3312)
   at org.apache.hadoop.hbase.HMaster.main(HMaster.java:3346)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Superuser privilege is required
   at org.apache.hadoop.dfs.FSNamesystem.checkSuperuserPrivilege(FSNamesystem.java:4020)
   at org.apache.hadoop.dfs.FSNamesystem.setSafeMode(FSNamesystem.java:3794)
   at org.apache.hadoop.dfs.NameNode.setSafeMode(NameNode.java:473)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)

   at org.apache.hadoop.ipc.Client.call(Client.java:512)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
   at org.apache.hadoop.dfs.$Proxy0.setSafeMode(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
   at org.apache.hadoop.dfs.$Proxy0.setSafeMode(Unknown Source)
   at org.apache.hadoop.dfs.DFSClient.setSafeMode(DFSClient.java:486)
   at org.apache.hadoop.dfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:257)
   at org.apache.hadoop.hbase.HMaster.init(HMaster.java:893)
   at org.apache.hadoop.hbase.HMaster.init(HMaster.java:859)
   ... 6 more

On May 9, 2008, at 11:34 AM, [EMAIL PROTECTED] wrote:

  

Hi Stack,


One question this raises is whether the hbase:hbase user and group are  
being derived from the Linux file system user and group, or whether  
they are the hdfs user and group?
  
HDFS currently does not manage user and group information.  User and  
group in HDFS are derived from the underlying OS (Linux in your  
case) user and group.



Otherwise, how can we indicate that the hbase user is in the HDFS  
group 'supergroup'?
  
In the Hadoop conf, the property dfs.permissions.supergroup specifies  
the super-user group; the default value is 'supergroup'.  The  
administrator should set this property to a dedicated group in the  
underlying OS for the HDFS superuser.  For example, you could create a  
group 'hdfs-superuser' in Linux, set dfs.permissions.supergroup to  
'hdfs-superuser', and add 'hdfs-superuser' to hbase's group list.  
Then, hbase becomes an HDFS superuser.
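Nicholas's recipe above boils down to one Hadoop property plus ordinary OS group administration. A minimal sketch of the configuration side follows; the group name 'hdfs-superuser' is an example rather than a required value, and the file name assumes the 0.17-era hadoop-site.xml:

```xml
<!-- hadoop-site.xml: name the HDFS super-user group (example value). -->
<property>
  <name>dfs.permissions.supergroup</name>
  <value>hdfs-superuser</value>
</property>
```

On the OS side, the hbase user would then be added to that group (e.g. via usermod -a -G hdfs-superuser hbase) so that the master's setSafeMode call passes the superuser check.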


I don't know why superuser privilege is required to run HMaster.  I  
might be able to tell if a complete stack trace is given.


Nicholas



- Original Message 
From: stack [EMAIL PROTECTED]
To: [EMAIL

Re: How can I use counters in Hadoop

2008-04-15 Thread stack
https://issues.apache.org/jira/browse/HBASE-559 has an example. Ignore 
the HBase stuff.  What's important is the enum at the head of the MR job 
class, the calls to Reporter inside the tasks, and the properties file -- 
both how it's named and that it ends up in the generated job jar.
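The pattern Stack points at is: declare an enum of counter names, bump them via Reporter inside the tasks, and read the aggregated totals from the job client after the job (counters are aggregated by the framework and read back via RunningJob.getCounters(), not reliably from inside another task). Here is a Hadoop-free sketch of that enum-counter pattern, with all names invented for illustration:

```java
import java.util.EnumMap;
import java.util.Map;

public class CounterSketch {
    // Mirrors the enum declared at the head of an MR job class.
    enum MyCounters { MAP_INPUT_ROWS, BAD_ROWS }

    // Stands in for the framework's counter store behind
    // Reporter.incrCounter(Enum, long).
    static final Map<MyCounters, Long> counters =
        new EnumMap<>(MyCounters.class);

    static void incrCounter(MyCounters c, long amount) {
        counters.merge(c, amount, Long::sum);
    }

    public static void main(String[] args) {
        // In a real job these calls happen inside map()/reduce().
        for (int i = 0; i < 5; i++) incrCounter(MyCounters.MAP_INPUT_ROWS, 1);
        incrCounter(MyCounters.BAD_ROWS, 2);
        // The client reads totals after the job completes.
        System.out.println(counters.get(MyCounters.MAP_INPUT_ROWS));
        System.out.println(counters.get(MyCounters.BAD_ROWS));
    }
}
```

This is only the shape of the technique; in an actual job the increments go through the Reporter handed to each task, and the totals come from the JobClient side.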

St.Ack


CloudyEye wrote:

Hi, I am a newbie to Hadoop. I would be thankful if you could help me.

I've read that I can use the Reporter class to increase counters, this way:
reporter.incrCounter(Enum key, long amount);

How can I get the values of those counters?

My aim is to count the total inputs to the mappers; then I want to override
the public void configure(JobConf job) {} method of the reducer to get the
counter value before reducing any key/values.

Regards,
  




Re: single node Hbase

2008-03-17 Thread stack
Try our 'getting started': 
http://hadoop.apache.org/hbase/docs/current/api/index.html.

St.Ack


Peter W. wrote:

Hello,

Are there any Hadoop documentation resources showing
how to run the current version of Hbase on a single node?

Thanks,

Peter W.




Re: if can not close the connection to HBase using HTable ...?

2008-03-14 Thread stack

There is no close on HTable because there are no 'resources' to release.

St.Ack
P.S. HBase has its own mailing lists.  See 
http://hadoop.apache.org/hbase//mailing_lists.html



ma qiang wrote:

Hi all,
 If I cannot close the connection to HBase using HTable, and the
 object is instead set to null, will the resources of this connection
 be released?

 The code is as below:

 public class MyMap extends MapReduceBase implements Mapper {
   private HTable connection;

   public MyMap() {
     connection = new HTable(new HBaseConfiguration(), new Text("HBaseTest"));
   }

   public void map(...) {
     ..
     connection = null;  // I couldn't use connection.close();
   }
 }
  




Re: Does HBase have a index?

2008-03-06 Thread stack

Bin:

FYI, there is now an HBase mailing list: see 
http://hadoop.apache.org/hbase/mailing_lists.html#Developers. Your 
questions (and Ma's on 'connection to HBase using HTable') would sit 
better there.


St.Ack


Bin YANG wrote:

Dear colleagues,

I have a question on HBase's index implementation.

How does HBase find the data for a given row key? Does it use an index
like a database, or a hash function?
I suppose that a hash function mapping row keys to physical addresses
would be more efficient.

As we know, a big table in HBase is stored as several small tables,
each of which stores the attributes in one column family.
So each row may be stored in several small tables.
Does a hash function map a row key to many physical addresses, with
each physical address corresponding to a small table that contains the
row key?

Does anybody have an idea on how to create an index on another attribute?

Best,
Bin YANG
  




Re: MR with HBase

2008-02-05 Thread stack

Have you seen this page: http://wiki.apache.org/hadoop/Hbase/MapReduce?

Also, hbase has its own user list now: [EMAIL PROTECTED]  
HBase questions sit better over there.


Yours,
St.Ack

Peeyush Bishnoi wrote:

Hello all,

Can anyone tell me how the map/reduce framework is used to process/read
data from HBase tables? A detailed description is needed.

Thank you,

---
Peeyush 

  




Re: What should the open file limit be for hbase

2008-01-28 Thread stack

Hey Marc:

You are still seeing 'too many open files'?  What does your schema look 
like?  I added to http://wiki.apache.org/hadoop/Hbase/FAQ#5 a rough 
formula for counting how many open mapfiles a running regionserver has.


Currently, your only recourse is upping the ulimit.   Addressing this 
scaling barrier will be a focus of the next hbase release.


St.Ack



Marc Harris wrote:

I have seen that hbase can cause 'too many open files' errors. I increased
my limit to 10240 (10 times the previous limit) but still get errors.

Is there a recommended value that I should set my open files limit to?
Is there something else I can do to reduce the number of files, perhaps
with some other trade-off?

Thanks
- Marc


  




Re: Backing up hbase (or maybe making a check-point)

2008-01-23 Thread stack

Marc Harris wrote:

Is it just a matter of
shutting down the hbase and hadoop servers, and then copying the dfs
and mapred folders somewhere else, and then moving them back into
place if I need to revert to that state? Or are there other files that
need to be copied too?

  
You need to shut down hbase so it will dump what's in memory out to the 
filesystem.  Then it should just be a matter of copying the 
hbase.rootdir elsewhere.  If that doesn't work, it's a bug.


St.Ack