Adding new disks to an Hadoop Cluster

2011-05-10 Thread Pete Haidinyak

Hi all,
   When you add a disk to a Hadoop data node do you have to bounce the  
node (restart mapreduce and dfs) before Hadoop can use the new disk?


Thanks

-Pete



Re: Adding new disks to an Hadoop Cluster

2011-05-10 Thread lohit
Yes, you have to bounce the datanode so that it can start using the disk. Also
note that you have to tell the datanode to use this disk via the dfs.data.dir config
parameter in hdfs-site.xml. The same goes for the tasktracker: if you want it to use
this disk for its temp output, you have to tell it via mapred-site.xml
(a sketch of both settings follows below).
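
For illustration only, here is roughly what those settings might look like. The mount paths are made up, any existing directories on the node must stay in the comma-separated list, and mapred.local.dir is named as an assumption since the reply above only points at mapred-site.xml:

<!-- hdfs-site.xml: append the new disk to the datanode's storage list -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/disk1/dfs/data,/data/disk2/dfs/data</value>
</property>

<!-- mapred-site.xml: local/temp space for the tasktracker -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/disk1/mapred/local,/data/disk2/mapred/local</value>
</property>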

2011/5/9 Pete Haidinyak javam...@cox.net

 Hi all,
   When you add a disk to a Hadoop data node do you have to bounce the node
 (restart mapreduce and dfs) before Hadoop can use the new disk?

 Thanks

 -Pete




-- 
Have a Nice Day!
Lohit


Re: A question about client

2011-05-10 Thread Gaojinchao
Hbase version: 0.90.2 .
I merged patches:
HBASE-3773  Set ZK max connections much higher in 0.90
HBASE-3771  All jsp pages don't clean their HBA
HBASE-3783  hbase-0.90.2.jar exists in hbase root and in 'lib/'
HBASE-3756  Can't move META or ROOT from shell
HBASE-3744  createTable blocks until all regions are out of transition
HBASE-3712  HTable.close() doesn't shutdown thread pool
HBASE-3750  HTablePool.putTable() should call 
tableFactory.releaseHTableInterface() for discarded table
HBASE-3722  A lot of data is lost when name node crashed
HBASE-3800  If HMaster is started after NN without starting DN in Hbase 
090.2 then HMaster is not able to start due to AlreadyCreatedException for 
/hbase/hbase.version
HBASE-3749  Master can't exit when open port failed

-----Original Message-----
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: May 10, 2011 1:17
To: user@hbase.apache.org
Subject: Re: A question about client

TreeMap isn't concurrent and it seems it was used that way? I know you
guys are testing a bunch of different things at the same time so which
HBase version and which patches were you using when you got that?

Thx,

J-D

On Mon, May 9, 2011 at 5:22 AM, Gaojinchao gaojinc...@huawei.com wrote:
    I used YCSB to put data and it threw an exception.
    Who can give me some suggestions?

   Hbase Code:
      // Cut the cache so that we only get the part that could contain
      // regions that match our key
      SoftValueSortedMap<byte[], HRegionLocation> matchingRegions =
        tableLocations.headMap(row);

      // if that portion of the map is empty, then we're done. otherwise,
      // we need to examine the cached location to verify that it is
      // a match by end key as well.
      if (!matchingRegions.isEmpty()) {
        HRegionLocation possibleRegion =
          matchingRegions.get(matchingRegions.lastKey());

    ycsb client log:

    [java] begin StatusThread run
     [java] java.util.NoSuchElementException
     [java]     at java.util.TreeMap.key(TreeMap.java:1206)
     [java]     at java.util.TreeMap$NavigableSubMap.lastKey(TreeMap.java:1435)
     [java]     at 
 org.apache.hadoop.hbase.util.SoftValueSortedMap.lastKey(SoftValueSortedMap.java:131)
     [java]     at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getCachedLocation(HConnectionManager.java:841)
     [java]     at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:664)
     [java]     at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
     [java]     at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1114)
     [java]     at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1234)
     [java]     at 
 org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:819)
     [java]     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:675)
     [java]     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:665)
     [java]     at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source)
     [java]     at com.yahoo.ycsb.db.HBaseClient.insert(Unknown Source)
     [java]     at com.yahoo.ycsb.DBWrapper.insert(Unknown Source)
     [java]     at com.yahoo.ycsb.workloads.MyWorkload.doInsert(Unknown Source)
     [java]     at com.yahoo.ycsb.ClientThread.run(Unknown Source)



Re: Hmaster is OutOfMemory

2011-05-10 Thread Gaojinchao
If the cluster has 100K regions and you restart it, the Master will need a lot of
memory.


-----Original Message-----
From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
Sent: May 10, 2011 13:58
To: user@hbase.apache.org
Subject: Re: Hmaster is OutOfMemory

2011/5/9 Gaojinchao gaojinc...@huawei.com:
 Hbase version : 0.90.3RC0

 It happened when creating a table with regions.
 I find that the master needs a lot of memory at startup when the cluster has 100K regions.

Do you need to have 100k regions in the cluster Gao?  Or, you are just
testing how we do w/ 100k regions?


 It seems like it's the ZK ClientCnxn.

 It seems the master's region assignment needs improvement.


 top -c | grep 5834
 5834 root  20   0 8875m 7.9g  11m S2 50.5  33:53.19 
 /opt/jdk1.6.0_22/bin/java -Xmx8192m -ea -XX:+UseConcMarkSweepGC 
 -XX:+CMSIncrementalMode


You probably don't need CMSIncrementalMode if your hardware has >= 4 CPUs.

Where do you see heap used in the below?  I just see stats on your
heap config and a snapshot of what is currently in use.  Seems to be
5G of your 8G heap (~60%).  If you do a full GC, does this go down?

In 0.90.x, the HBase Master keeps an 'image' of the cluster in HMaster
RAM.  I'd doubt this takes up 5G but I haven't measured it, so perhaps
it could.  Is this a problem for you Gao?  You do have 100k regions.

St.Ack

 Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize  = 8589934592 (8192.0MB)
   NewSize  = 21757952 (20.75MB)
   MaxNewSize   = 174456832 (166.375MB)
   OldSize  = 65404928 (62.375MB)
   NewRatio = 7
   SurvivorRatio= 8
   PermSize = 21757952 (20.75MB)
   MaxPermSize  = 88080384 (84.0MB)

 Heap Usage:
 New Generation (Eden + 1 Survivor Space):
   capacity = 100335616 (95.6875MB)
   used = 47094720 (44.91302490234375MB)
   free = 53240896 (50.77447509765625MB)
   46.93719127612671% used
 Eden Space:
   capacity = 89194496 (85.0625MB)
   used = 35953600 (34.28802490234375MB)
   free = 53240896 (50.77447509765625MB)
   40.30921369856723% used
 From Space:
   capacity = 11141120 (10.625MB)
   used = 11141120 (10.625MB)
   free = 0 (0.0MB)
   100.0% used
 To Space:
   capacity = 11141120 (10.625MB)
   used = 0 (0.0MB)
   free = 11141120 (10.625MB)
   0.0% used
 concurrent mark-sweep generation:
   capacity = 8415477760 (8025.625MB)
   used = 5107249280 (4870.6524658203125MB)
   free = 3308228480 (3154.9725341796875MB)
   60.68876213155128% used
 Perm Generation:
   capacity = 31199232 (29.75390625MB)
   used = 18681784 (17.81633758544922MB)
   free = 12517448 (11.937568664550781MB)
   59.87898676480241% used


 -----Original Message-----
 From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans
 Sent: May 10, 2011 1:20
 To: user@hbase.apache.org
 Subject: Re: Hmaster is OutOfMemory

 It looks like the master entered a GC loop of death (since there are a
 lot of "We slept 76166ms" messages) and finally died. Was it splitting
 logs? Did you get a heap dump? Did you inspect it and can you tell
 what was using all that space?

 Thx,

 J-D

 2011/5/8 Gaojinchao gaojinc...@huawei.com:
 Hbase version 0.90.2:
 HMaster has 8G memory. It seems like that's not enough? Why does it need so much
 memory? (50K regions)

 Another issue: the log message is wrong. It says see
 http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9 but it should say see
 http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A8

 Hmaster logs:

 2011-05-06 19:31:09,924 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:2-0x12fc3a17c070022 Creating (or updating) unassigned node for 
 2f19f33ae3f21ac4cb681f1662767d0c with OFFLINE state
 2011-05-06 19:31:09,924 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
 76166ms instead of 6ms, this is likely due to a long garbage collecting 
 pause and it's usually bad, see 
 http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9
 2011-05-06 19:31:09,924 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
 16697ms instead of 1000ms, this is likely due to a long garbage collecting 
 pause and it's usually bad, see 
 http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9
 2011-05-06 19:31:09,932 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=ufdr,211007,1304669377398.696f124cc6ff82302f735c8413c6ac0b. 
 state=CLOSED, ts=1304681364406
 2011-05-06 19:31:09,932 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:2-0x12fc3a17c070022 Creating (or updating) unassigned node for 
 696f124cc6ff82302f735c8413c6ac0b with OFFLINE state
 2011-05-06 19:31:22,942 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
 ufdr,071415,1304668656420.aa026fbb27a25b0fe54039c00108dad6. on 
 157-5-100-9,20020,1304678135900
 2011-05-06 19:31:22,942 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
 event for 7a75bac2028fba1529075225a3755c4c; deleting unassigned node
 2011-05-06 19:31:22,942 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 

HBase filtered scan problem

2011-05-10 Thread Stefan Comanita
Hi all, 

I want to do a scan over a number of rows, each row having multiple columns, and
I want to filter out some of these columns based on their values. For example, if
I have the following rows:

plainRow:col:value1    column=T:19, timestamp=19, value=
plainRow:col:value1    column=T:2,  timestamp=2,  value=U
plainRow:col:value1    column=T:3,  timestamp=3,  value=U
plainRow:col:value1    column=T:4,  timestamp=4,  value=

and

secondRow:col:value1   column=T:1, timestamp=1, value=
secondRow:col:value1   column=T:2, timestamp=2, value=
secondRow:col:value1   column=T:3, timestamp=3, value=U
secondRow:col:value1   column=T:4, timestamp=4, value=


and I want to select all the rows, but only with the columns that don't have the
value "U", something like:

plainRow:col:value1    column=T:19, timestamp=19, value=
plainRow:col:value1    column=T:4,  timestamp=4,  value=
secondRow:col:value1   column=T:1,  timestamp=1,  value=
secondRow:col:value1   column=T:2,  timestamp=2,  value=
secondRow:col:value1   column=T:4,  timestamp=4,  value=

and to achieve this, I try the following:

Scan scan = new Scan();

scan.setStartRow(stringToBytes(rowIdentifier));
scan.setStopRow(stringToBytes(rowIdentifier + Constants.MAX_CHAR));
scan.addFamily(Constants.TERM_VECT_COLUMN_FAMILY);

if (includeFilter) {
    Filter filter = new ValueFilter(CompareOp.EQUAL,
        new BinaryComparator(stringToBytes("U")));
    scan.setFilter(filter);
}

and if I execute this scan I get the rows with the columns having the value "U",
which is correct. But when I set CompareOp.NOT_EQUAL and expect to get the other
columns, it doesn't work the way I want: it gives me back all the rows,
including the ones which have the value "U". The same happens when I use:
Filter filter = new ValueFilter(CompareOp.EQUAL, new
BinaryComparator(stringToBytes("")));

I should mention that the columns have the values "U" and "" (the empty string), and that
I also saw the same behaviour with the RegexStringComparator and SubstringComparator.

Any idea would be very much appreciated, sorry for the long mail, thank you.

Stefan Comanita

Mapping Object-HBase data Framework!

2011-05-10 Thread Kobla Gbenyo

Hello,

I am new to this list and have started testing HBase. I downloaded and installed
HBase successfully and now I am looking for a framework which can help
me perform CRUD operations (create, read, update and delete). Through
my research I found JDO, but I could not find much support for it. Are there
any other frameworks to perform my CRUD operations, or is there more
support for JDO with HBase? (For information, I am using Maven for building.)


Cheers,

--
Kobla.




Re: Error of Got error in response to OP_READ_BLOCK for file

2011-05-10 Thread Jean-Daniel Cryans
Data cannot be corrupted at all, since the files in HDFS are immutable
and CRC'ed (unless you are able to lose all 3 copies of every block).

Corruption would happen at the metadata level, where the .META.
table, which contains the regions for the tables, would lose rows. This
is a likely scenario if the region server holding that region dies of
GC, since the Hadoop version you are using along with HBase 0.20.6 doesn't
support appends, meaning that the write-ahead log would be missing
data that, obviously, cannot be replayed.

The best advice I can give you is to upgrade.

J-D

On Tue, May 10, 2011 at 5:44 AM, Stanley Xu wenhao...@gmail.com wrote:
 Thanks J-D. I'm a little more confused: it looks like when we have a corrupt
 HBase table or some inconsistent data, we get lots of messages like
 that. But even if the HBase table is fine, we still get some lines of
 messages like that.

 How could I identify whether it comes from corruption in the data or just some
 mis-hit in the scenario you mentioned?



 On Tue, May 10, 2011 at 6:23 AM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 Very often the "cannot open filename" happens when the region in
 question was reopened somewhere else and that region was compacted. As
 to why it was reassigned, most of the time it's because of garbage
 collections taking too long. The master log should have all the
 required evidence, and the region server should print some "slept for
 Xms" (where X is some number of ms) messages before everything goes
 bad.

 Here are some general tips on debugging problems in HBase
 http://hbase.apache.org/book/trouble.html

 J-D

 On Sat, May 7, 2011 at 2:10 AM, Stanley Xu wenhao...@gmail.com wrote:
  Dear all,
 
  We were using HBase 0.20.6 in our environment, and it is pretty stable in
  the last couple of month, but we met some reliability issue from last
 week.
  Our situation is very like the following link.
 
 http://search-hadoop.com/m/UJW6Efw4UW/Got+error+in+response+to+OP_READ_BLOCK+for+filesubj=HBase+fail+over+reliability+issues
 
  When we use a hbase client to connect to the hbase table, it looks stuck
  there. And we can find the logs like
 
  WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.24.166.74:50010
  for file /hbase/users/73382377/data/312780071564432169 for block
  -4841840178880951849:java.io.IOException: Got error in response to
  OP_READ_BLOCK for file /hbase/users/73382377/data/312780071564432169 for
  block -4841840178880951849
 
  INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 40 on 60020, call
  get([B@25f907b4, row=963aba6c5f351f5655abdc9db82a4cbd, maxVersions=1,
  timeRange=[0,9223372036854775807), families={(family=data, columns=ALL})
  from 10.24.117.100:2365: error: java.io.IOException: Cannot open filename
  /hbase/users/73382377/data/312780071564432169
  java.io.IOException: Cannot open filename
  /hbase/users/73382377/data/312780071564432169
 
 
  WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
  10.24.166.74:50010, storageID=DS-14401423-10.24.166.74-50010-1270741415211,
  infoPort=50075, ipcPort=50020):
  Got exception while serving blk_-4841840178880951849_50277 to /10.25.119.113:
  java.io.IOException: Block blk_-4841840178880951849_50277 is not valid.
 
  in the server side.
 
  And if we do a flush and then a major compaction on the .META., the
  problem just went away, but will appear again some time later.
 
  At first we guess it might be the problem of xceiver. So we set the
 xceiver
  to 4096 as the link here.
  http://ccgtech.blogspot.com/2010/02/hadoop-hdfs-deceived-by-xciever.html
 
  But we still get the same problem. It looks that a restart of the whole
  HBase cluster will fix the problem for a while, but actually we could not
  say always trying to restart the server.
 
  I am waiting online, will really appreciate any message.
 
 
  Best wishes,
  Stanley Xu
 




Re: Mapping Object-HBase data Framework!

2011-05-10 Thread Jean-Daniel Cryans
Most users I know rolled out their own since it doesn't require a very
big layer on top of HBase (since it's all simple queries) and it's
tailored to their own environment.

For JDO there's DataNucleus that supports HBase.
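
FWIW, a minimal sketch of what such a hand-rolled layer might look like against the plain 0.90.x client API; the UserDao class, the "users" table and the "info" family below are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class UserDao {
  private static final byte[] CF = Bytes.toBytes("info");  // hypothetical column family
  private final HTable table;

  public UserDao(Configuration conf) throws Exception {
    this.table = new HTable(conf, "users");                // hypothetical table name
  }

  // Create: a Put on the row key.
  public void create(String id, String name) throws Exception {
    Put put = new Put(Bytes.toBytes(id));
    put.add(CF, Bytes.toBytes("name"), Bytes.toBytes(name));
    table.put(put);
  }

  // Read: a Get on the row key.
  public String read(String id) throws Exception {
    Result r = table.get(new Get(Bytes.toBytes(id)));
    byte[] v = r.getValue(CF, Bytes.toBytes("name"));
    return v == null ? null : Bytes.toString(v);
  }

  // Update: in HBase this is just another Put on the same row/column.
  public void update(String id, String name) throws Exception {
    create(id, name);
  }

  // Delete: a Delete on the row key.
  public void delete(String id) throws Exception {
    table.delete(new Delete(Bytes.toBytes(id)));
  }
}

Usage would be roughly new UserDao(HBaseConfiguration.create()), with one
instance per thread since HTable isn't thread-safe.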

J-D

On Tue, May 10, 2011 at 2:34 AM, Kobla Gbenyo ko...@riastudio.fr wrote:
 Hello,

 I am new at this list and I start testing HBase. I download and install
 HBase successfully and now I am looking for a framework which can help me
 performing CRUD operations (create, read, update and delete). Through my
 research, I found JDO but I do not find more support on it. There are any
 other frameworks to perform my CRUD operations or are there more supports on
 JDO for HBASE? (for information, I am using maven for building)

 Cheers,

 --
 Kobla.





Re: A question about client

2011-05-10 Thread Jean-Daniel Cryans
Are you running a modified YCSB by any chance? Because last time I
looked at that code it didn't share the HTables between threads and it
looks like it's doing something like that.

Looking deeper at the code, the NoSuchElementException is thrown
because the map is empty. This is what that code looks like:

  if (!matchingRegions.isEmpty()) {
HRegionLocation possibleRegion =
  matchingRegions.get(matchingRegions.lastKey());

So to me it seems that the only way you would get this exception is if
someone emptied the map between the isEmpty call and lastKey which
shouldn't happen if HTables aren't shared.

The only other way it seems it could happen, and it's a stretch, is
that since the regions are kept in a SoftValueSortedMap then the GC
would have removed the elements you needed exactly between those two
lines...  Is it easy for you to recreate the issue?
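
For reference, a rough sketch of the usual pattern to avoid sharing one HTable
across threads: either check tables in and out of an HTablePool per operation
(shown below), or simply create a new HTable(conf, ...) inside each thread. The
"usertable" name and the toy puts are just for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PerThreadTables {
  public static void main(String[] args) {
    final Configuration conf = HBaseConfiguration.create();
    // One pool shared by all threads; each thread checks a table out and back in,
    // so no single HTable instance is ever used by two threads at once.
    final HTablePool pool = new HTablePool(conf, 10);

    Runnable worker = new Runnable() {
      public void run() {
        try {
          HTableInterface table = pool.getTable("usertable");   // hypothetical table
          try {
            Put put = new Put(Bytes.toBytes(Thread.currentThread().getName()));
            put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            table.put(put);
          } finally {
            pool.putTable(table);   // return it; never hand this instance to another thread
          }
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    };

    for (int i = 0; i < 4; i++) {
      new Thread(worker).start();
    }
  }
}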

Thx a bunch,

J-D

On Mon, May 9, 2011 at 11:34 PM, Gaojinchao gaojinc...@huawei.com wrote:
 Hbase version: 0.90.2 .
 I merged patches:
 HBASE-3773  Set ZK max connections much higher in 0.90
 HBASE-3771  All jsp pages don't clean their HBA
 HBASE-3783  hbase-0.90.2.jar exists in hbase root and in 'lib/'
 HBASE-3756  Can't move META or ROOT from shell
 HBASE-3744  createTable blocks until all regions are out of transition
 HBASE-3712  HTable.close() doesn't shutdown thread pool
 HBASE-3750  HTablePool.putTable() should call 
 tableFactory.releaseHTableInterface() for discarded table
 HBASE-3722  A lot of data is lost when name node crashed
 HBASE-3800  If HMaster is started after NN without starting DN in Hbase 
 090.2 then HMaster is not able to start due to AlreadyCreatedException for 
 /hbase/hbase.version
 HBASE-3749  Master can't exit when open port failed

 -----Original Message-----
 From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans
 Sent: May 10, 2011 1:17
 To: user@hbase.apache.org
 Subject: Re: A question about client

 TreeMap isn't concurrent and it seems it was used that way? I know you
 guys are testing a bunch of different things at the same time so which
 HBase version and which patches were you using when you got that?

 Thx,

 J-D

 On Mon, May 9, 2011 at 5:22 AM, Gaojinchao gaojinc...@huawei.com wrote:
I used ycsb to put data and threw exception.
Who can give me some suggestion?

   Hbase Code:
  // Cut the cache so that we only get the part that could contain
  // regions that match our key
  SoftValueSortedMap<byte[], HRegionLocation> matchingRegions =
tableLocations.headMap(row);

  // if that portion of the map is empty, then we're done. otherwise,
  // we need to examine the cached location to verify that it is
  // a match by end key as well.
  if (!matchingRegions.isEmpty()) {
HRegionLocation possibleRegion =
  matchingRegions.get(matchingRegions.lastKey());

ycsb client log:

[java] begin StatusThread run
 [java] java.util.NoSuchElementException
 [java] at java.util.TreeMap.key(TreeMap.java:1206)
 [java] at 
 java.util.TreeMap$NavigableSubMap.lastKey(TreeMap.java:1435)
 [java] at 
 org.apache.hadoop.hbase.util.SoftValueSortedMap.lastKey(SoftValueSortedMap.java:131)
 [java] at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getCachedLocation(HConnectionManager.java:841)
 [java] at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:664)
 [java] at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
 [java] at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1114)
 [java] at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1234)
 [java] at 
 org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:819)
 [java] at 
 org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:675)
 [java] at org.apache.hadoop.hbase.client.HTable.put(HTable.java:665)
 [java] at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source)
 [java] at com.yahoo.ycsb.db.HBaseClient.insert(Unknown Source)
 [java] at com.yahoo.ycsb.DBWrapper.insert(Unknown Source)
 [java] at com.yahoo.ycsb.workloads.MyWorkload.doInsert(Unknown 
 Source)
 [java] at com.yahoo.ycsb.ClientThread.run(Unknown Source)




Re: Error of Got error in response to OP_READ_BLOCK for file

2011-05-10 Thread Stanley Xu
Thanks J-D. We are using Hadoop 0.20.2 with quite a few patches. Could
you please tell me which patches the WAL requires? Do we need all the
patches in branch-0.20-append? I thought we had only applied the patch that
adds support for the append function.

Thanks.

On Wed, May 11, 2011 at 12:50 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 Data cannot be corrupted at all, since the files in HDFS are immutable
 and CRC'ed (unless you are able to lose all 3 copies of every block).

 Corruption would happen at the metadata level, whereas the .META.
 table which contains the regions for the tables would lose rows. This
 is a likely scenario if the region server holding that region dies of
 GC since the hadoop version you are using along hbase 0.20.6 doesn't
 support appends, meaning that the write-ahead log would be missing
 data that, obviously, cannot be replayed.

 The best advice I can give you is to upgrade.

 J-D

 On Tue, May 10, 2011 at 5:44 AM, Stanley Xu wenhao...@gmail.com wrote:
  Thanks J-D. A little more confused that is it looks when we have a
 corrupt
  hbase table or some inconsistency data, we will got lots of message like
  that. But if the hbase table is proper, we will also get some lines of
  messages like that.
 
  How could I identify if it comes from a corruption in data or just some
  mis-hit in the scenario you mentioned?
 
 
 
  On Tue, May 10, 2011 at 6:23 AM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:
 
  Very often the cannot open filename happens when the region in
  question was reopened somewhere else and that region was compacted. As
  to why it was reassigned, most of the time it's because of garbage
  collections taking too long. The master log should have all the
  required evidence, and the region server should print some slept for
  Xms (where X is some number of ms) messages before everything goes
  bad.
 
  Here are some general tips on debugging problems in HBase
  http://hbase.apache.org/book/trouble.html
 
  J-D
 
  On Sat, May 7, 2011 at 2:10 AM, Stanley Xu wenhao...@gmail.com wrote:
   Dear all,
  
   We were using HBase 0.20.6 in our environment, and it is pretty stable
 in
   the last couple of month, but we met some reliability issue from last
  week.
   Our situation is very like the following link.
  
 
 http://search-hadoop.com/m/UJW6Efw4UW/Got+error+in+response+to+OP_READ_BLOCK+for+filesubj=HBase+fail+over+reliability+issues
  
   When we use a hbase client to connect to the hbase table, it looks
 stuck
   there. And we can find the logs like
  
   WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.24.166.74:50010
   for file /hbase/users/73382377/data/312780071564432169 for block
   -4841840178880951849:java.io.IOException: Got error in response to
   OP_READ_BLOCK for file /hbase/users/73382377/data/312780071564432169 for
   block -4841840178880951849
  
   INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 40 on 60020, call
   get([B@25f907b4, row=963aba6c5f351f5655abdc9db82a4cbd, maxVersions=1,
   timeRange=[0,9223372036854775807), families={(family=data, columns=ALL})
   from 10.24.117.100:2365: error: java.io.IOException: Cannot open filename
   /hbase/users/73382377/data/312780071564432169
   java.io.IOException: Cannot open filename
   /hbase/users/73382377/data/312780071564432169
  
  
   WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
   10.24.166.74:50010, storageID=DS-14401423-10.24.166.74-50010-1270741415211,
   infoPort=50075, ipcPort=50020):
   Got exception while serving blk_-4841840178880951849_50277 to /10.25.119.113:
   java.io.IOException: Block blk_-4841840178880951849_50277 is not valid.
  
   in the server side.
  
   And if we do a flush and then a major compaction on the .META., the
   problem just went away, but will appear again some time later.
  
   At first we guess it might be the problem of xceiver. So we set the
  xceiver
   to 4096 as the link here.
  
 http://ccgtech.blogspot.com/2010/02/hadoop-hdfs-deceived-by-xciever.html
  
   But we still get the same problem. It looks that a restart of the
 whole
   HBase cluster will fix the problem for a while, but actually we could
 not
   say always trying to restart the server.
  
   I am waiting online, will really appreciate any message.
  
  
   Best wishes,
   Stanley Xu