[jira] [Created] (HBASE-3907) make it easier to add per-CF metrics; add some key per-CF metrics to start with

2011-05-20 Thread Kannan Muthukkaruppan (JIRA)
make it easier to add per-CF metrics; add some key per-CF metrics to start with
---

 Key: HBASE-3907
 URL: https://issues.apache.org/jira/browse/HBASE-3907
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan


Add the plumbing needed to add various types of per-ColumnFamily metrics. And to 
start with, add a bunch of per-CF metrics such as:

1) Blocks read, cache hit, avg time of read for a column family.
2) Similar stats for compaction related reads.
3) Stats for meta block reads per CF
4) Bloom Filter stats per CF
etc.
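The "plumbing" half of this could look something like the following sketch (plain Java; the class and method names are illustrative, not HBase's actual metrics classes): a registry keyed by column family and metric name, so any read path can bump a per-CF counter without pre-declaring it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative per-CF metrics registry: counters are created lazily the
// first time a (column family, metric) pair is incremented.
class PerCfMetrics {
    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    void inc(String cf, String metric, long delta) {
        counters.computeIfAbsent(cf + "." + metric, k -> new AtomicLong())
                .addAndGet(delta);
    }

    long get(String cf, String metric) {
        AtomicLong c = counters.get(cf + "." + metric);
        return c == null ? 0L : c.get();
    }
}
```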


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3907) make it easier to add per-CF metrics; add some key per-CF metrics to start with

2011-05-20 Thread Kannan Muthukkaruppan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-3907:
-

Description: 
Add the plumbing needed to add various types of per-ColumnFamily metrics. And to 
start with, add a bunch of per-CF metrics such as:

1) Blocks read, cache hit, avg time of read for a column family.
2) Similar stats for compaction related reads.
3) Stats for meta block reads per CF
4) Bloom Filter stats per CF
etc.


  was:
Add plumbing need to add various types of per ColumnFamily metrics. And to 
start with add a bunch per-CF metrics such as:

1) Blocks read, cache hit, avg time of read for a column family.
2) Similar stats for compaction related reads.
3) Stats for meta block reads per CF
4) Bloom Filter stats per CF
etc.



 make it easier to add per-CF metrics; add some key per-CF metrics to start 
 with
 ---

 Key: HBASE-3907
 URL: https://issues.apache.org/jira/browse/HBASE-3907
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan

 Add the plumbing needed to add various types of per-ColumnFamily metrics. And to 
 start with, add a bunch of per-CF metrics such as:
 1) Blocks read, cache hit, avg time of read for a column family.
 2) Similar stats for compaction related reads.
 3) Stats for meta block reads per CF
 4) Bloom Filter stats per CF
 etc.



[jira] [Updated] (HBASE-3906) When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions); it has a risk of OOME.

2011-05-20 Thread jian zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian zhang updated HBASE-3906:
--

Affects Version/s: 0.90.3

 When HMaster is running, there are a lot of RegionLoad instances (far more 
 than the number of regions); it has a risk of OOME.
 --

 Key: HBASE-3906
 URL: https://issues.apache.org/jira/browse/HBASE-3906
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.2, 0.90.3
 Environment: 1 HMaster, 4 RegionServers, about 100,000 regions.
Reporter: jian zhang
 Fix For: 0.90.4

   Original Estimate: 168h
  Remaining Estimate: 168h

 1. Start the HBase cluster.
 2. After HMaster finishes region assignment, use jmap to dump the memory of 
 HMaster.
 3. Use MAT to analyze the dump file; there are too many RegionLoad 
 instances, and they occupy more than 3 GB of memory.



[jira] [Updated] (HBASE-3906) When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions); it has a risk of OOME.

2011-05-20 Thread jian zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian zhang updated HBASE-3906:
--

Attachment: HBASE-3906.patch

 When HMaster is running, there are a lot of RegionLoad instances (far more 
 than the number of regions); it has a risk of OOME.
 --

 Key: HBASE-3906
 URL: https://issues.apache.org/jira/browse/HBASE-3906
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.2, 0.90.3
 Environment: 1 HMaster, 4 RegionServers, about 100,000 regions.
Reporter: jian zhang
 Fix For: 0.90.4

 Attachments: HBASE-3906.patch

   Original Estimate: 168h
  Remaining Estimate: 168h




[jira] [Updated] (HBASE-3892) Table can't disable

2011-05-20 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-3892:
--

Attachment: AssignmentManager_90.patch

 Table can't disable
 ---

 Key: HBASE-3892
 URL: https://issues.apache.org/jira/browse/HBASE-3892
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: gaojinchao
 Fix For: 0.90.4

 Attachments: AssignmentManager_90.patch, Hmaster_0.90.patch


 In TimeoutMonitor: 
 if the node exists and its state is RS_ZK_REGION_CLOSED,
 we should send the zk message again when the region close times out;
 otherwise, some messages may be lost.
 I see. It seems like a bug. This is my analysis.
 // disable table and master sent Close message to region server, Region state 
 was set PENDING_CLOSE
 2011-05-08 17:44:25,745 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, 
 usedHeap=4097, maxHeap=8175) for region 
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
 2011-05-08 17:44:45,530 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:45:45,542 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 // received splitting message and cleared Region state (PENDING_CLOSE)
 2011-05-08 17:46:45,303 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 
 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, 
 load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175)
 2011-05-08 17:46:45,538 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:47:45,548 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:48:45,545 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:49:46,108 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:50:46,105 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:51:46,117 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:52:46,112 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 

[jira] [Updated] (HBASE-3892) Table can't disable

2011-05-20 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-3892:
--

Attachment: (was: Hmaster_0.90.patch)

 Table can't disable
 ---

 Key: HBASE-3892
 URL: https://issues.apache.org/jira/browse/HBASE-3892
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: gaojinchao
 Fix For: 0.90.4

 Attachments: AssignmentManager_90.patch



[jira] [Commented] (HBASE-3892) Table can't disable

2011-05-20 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036724#comment-13036724
 ] 

gaojinchao commented on HBASE-3892:
---

I am not familiar with the zk API and have been learning it. I have made a patch again. 
I want to use the API setData(ZooKeeperWatcher zkw, String znode,
  byte [] data). It seems dangerous for parallel operation. 
I want to verify it more carefully next week.
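The concern about setData() under parallel operation is essentially a compare-and-set problem. Below is a minimal sketch of the version-checked update pattern that ZooKeeper's setData(path, data, version) provides, modeled with a plain in-memory stub rather than a live ZooKeeper (the class and method names are illustrative, not actual HBase or ZooKeeper code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for a znode: writes carry the version the writer last
// read, and the store rejects the write if the version has since moved.
// This is what makes a concurrent update safe rather than last-writer-wins.
class VersionedNode {
    private byte[] data = new byte[0];
    private final AtomicInteger version = new AtomicInteger(0);

    synchronized int getVersion() { return version.get(); }
    synchronized byte[] getData() { return data.clone(); }

    /** Mirrors setData(path, data, expectedVersion): fail on version mismatch. */
    synchronized boolean setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != version.get()) {
            return false;                 // a concurrent writer got there first
        }
        data = newData.clone();
        version.incrementAndGet();
        return true;
    }
}
```

A caller that loses the race can re-read the node (picking up the new version and data) and decide whether its update still applies.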

 Table can't disable
 ---

 Key: HBASE-3892
 URL: https://issues.apache.org/jira/browse/HBASE-3892
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: gaojinchao
 Fix For: 0.90.4

 Attachments: AssignmentManager_90.patch



[jira] [Commented] (HBASE-3903) A successful write to client write-buffer may be lost or not visible

2011-05-20 Thread Tallat (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036749#comment-13036749
 ] 

Tallat commented on HBASE-3903:
---

+1 on the patch, but I would suggest a couple of other things:

1) For clarity, we can mention the same thing in section 10.1.2, "WriteBuffer 
and Batch Methods", in <a href="book.html#client">client architecture</a>.

2) IMHO, the documentation at http://hbase.apache.org/acid-semantics.html has 
some weak points that need clarification, for example:

  (a) Visibility: "When a client receives a success response for any 
mutation, that mutation is immediately visible to both that client and any 
client with whom it later communicates through side channels."

  Here, what exactly is a side channel? 

  (b) Durability: "All reasonable failure scenarios will not affect any 
of the guarantees of this document." 

Here, what is a reasonable failure scenario?

Thanks.

 A successful write to client write-buffer may be lost or not visible
 

 Key: HBASE-3903
 URL: https://issues.apache.org/jira/browse/HBASE-3903
 Project: HBase
  Issue Type: Bug
  Components: documentation
 Environment: Any.
Reporter: Tallat
Assignee: Doug Meil
Priority: Minor
  Labels: documentation
 Attachments: acid-semantics_HBASE_3903.xml.patch


 A client can write to a client-side 'write buffer' if it is enabled via 
 hTable.setAutoFlush(false). Now, assume a client puts value v under key k. 
 Two wrong things can happen, violating the ACID semantics of HBase given 
 at: http://hbase.apache.org/acid-semantics.html
 1) Say the client fails immediately after the put succeeds. In this case, the 
 put will be lost, violating the durability property:
 "Any operation that returns a success code (e.g. does not throw an 
 exception) will be made durable."
  
 2) Say the client issues a read for k immediately after writing k. The put 
 will be stored in the client-side write buffer, while the read will go to the 
 region server, returning an older value instead of v, violating the 
 visibility property:
 "When a client receives a success response for any mutation, that mutation 
 is immediately visible to both that client and any client with whom it later 
 communicates through side channels."
 Thanks,
 Tallat
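The visibility hazard described above can be sketched with a toy client (an illustrative stub, not the HBase client API): put() "succeeds" by buffering locally, while get() goes to the server, so a read immediately after a buffered write misses it until flush().

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a client-side write buffer (autoFlush disabled):
// put() returns without touching the "server", so reads bypass it.
class BufferedClient {
    private final Map<String, String> serverStore = new HashMap<>(); // stands in for the region server
    private final Map<String, String> writeBuffer = new HashMap<>(); // client-side buffer

    void put(String key, String value) { writeBuffer.put(key, value); } // "succeeds" locally only
    String get(String key) { return serverStore.get(key); }             // reads go to the server
    void flush() { serverStore.putAll(writeBuffer); writeBuffer.clear(); }
}
```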



[jira] [Commented] (HBASE-3903) A successful write to client write-buffer may be lost or not visible

2011-05-20 Thread Doug Meil (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036826#comment-13036826
 ] 

Doug Meil commented on HBASE-3903:
--

I'll add a reference to acid-semantics in the client write-buffer section.

I think the other questions should be split off into a different ticket.


 A successful write to client write-buffer may be lost or not visible
 

 Key: HBASE-3903
 URL: https://issues.apache.org/jira/browse/HBASE-3903
 Project: HBase
  Issue Type: Bug
  Components: documentation
 Environment: Any.
Reporter: Tallat
Assignee: Doug Meil
Priority: Minor
  Labels: documentation
 Attachments: acid-semantics_HBASE_3903.xml.patch





[jira] [Created] (HBASE-3908) TableSplit not implementing hashCode problem

2011-05-20 Thread Daniel Iancu (JIRA)
TableSplit not implementing hashCode problem
--

 Key: HBASE-3908
 URL: https://issues.apache.org/jira/browse/HBASE-3908
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.1
Reporter: Daniel Iancu



Reported by Lucian Iordache on the hbase-user mailing list; will attach the patch ASAP.
---

Hi guys,

I've just found a problem with the class TableSplit. It implements equals
but does not implement hashCode, as it should.
I discovered it by trying to use a HashSet of TableSplits, and I
noticed that some duplicate splits were added to the set.

The only option I have for now is to extend TableSplit and use the
subclass.
I use the Cloudera HBase CDH3u0 version.

Do you know about this problem? Should I open a Jira issue for that, or it
already exists?

Thanks,
Lucian
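The equals/hashCode contract failure described above can be reproduced with a minimal sketch (illustrative classes, not the actual TableSplit): a class overriding equals() but not hashCode() lets "duplicates" into a HashSet, while a subclass adding a hashCode() consistent with equals() deduplicates correctly.

```java
// Minimal illustration: overriding equals() without hashCode() breaks
// hash-based collections, because two equal instances keep their distinct
// identity hash codes and usually land in different buckets.
class SplitLike {
    final String startRow;
    SplitLike(String startRow) { this.startRow = startRow; }

    @Override
    public boolean equals(Object o) {
        return o instanceof SplitLike && ((SplitLike) o).startRow.equals(startRow);
    }
    // hashCode() deliberately NOT overridden.
}

// The fix: a hashCode() derived from the same fields equals() compares.
class FixedSplitLike extends SplitLike {
    FixedSplitLike(String startRow) { super(startRow); }

    @Override
    public int hashCode() { return startRow.hashCode(); }
}
```

With SplitLike, a HashSet typically ends up holding two "equal" elements; with FixedSplitLike, the second add is correctly treated as a duplicate.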




[jira] [Updated] (HBASE-3883) book.xml / added something in schema design and FAQ about not being able to change rowkeys

2011-05-20 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3883:
-

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to TRUNK.  Thanks for the patch, Doug.

 book.xml / added something in schema design and FAQ about not being able to 
 change rowkeys
 --

 Key: HBASE-3883
 URL: https://issues.apache.org/jira/browse/HBASE-3883
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Fix For: 0.92.0

 Attachments: book_HBASE_3883.xml.patch


 This question has come up enough times in the dist-list to warrant inclusion 
 in the book.
 Added small entry in schema design and in FAQ (referencing schema design).



[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036948#comment-13036948
 ] 

stack commented on HBASE-3904:
--

@Ted What issue are you trying to fix?  Thanks.

 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor

 This function, as per the javadoc, is supposed to return true iff all the 
 regions in the table are available. But if the table is still being created, 
 this function may return inconsistent results (for example, when a table with 
 a large number of split keys is created). 



[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036947#comment-13036947
 ] 

stack commented on HBASE-3904:
--

bq. From what I read in the isTableAvailable function, the Metascanvisitor 
ensures that if there is at least one region not assigned, then the function 
will return false.

That, and at least one region must be assigned (where 'assigned' means a non-null 
server column, which is far from a definitive test of assignedness).

bq. This isn't enough since the createTable function in master assigns one 
region after another. (Refer to HMAster.createTable(final HRegionInfo [] 
newRegions, boolean sync))

Yes, it adds regions one at a time to .META. but then uses the bulk assign 
engine (this was a recent addition by Ted -- do you have this?)

bq. Hence there might be a case when all regions are indeed fully assigned in 
META but it is just that the master is yet to populate META with the rest of 
the regions.

Is this so?  We add the regions to .META. before we assign.  On add to .META. 
they will have an empty server field so isTableAssigned should be returning 
false. 

I wonder if this check inside in HBaseAdmin#isTableAssigned is 'off':

{code}
  if (value == null) {
available.set(false);
return false;
  }
{code}

Maybe the value is 'empty', a zero-length byte array.  Should we check for that?  
Perhaps this is why you got ...inconsistent responses from isTableAvailable.
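If the value can indeed come back as an empty byte array rather than null, the check would need to treat both as "no server assigned". A hedged sketch of that broadened test (the helper name is illustrative, not actual HBase code):

```java
// Hypothetical helper: treat a missing (null) server column and an empty
// (zero-length) one the same way when deciding whether a region is assigned.
class ServerColumnCheck {
    static boolean serverColumnPresent(byte[] value) {
        return value != null && value.length > 0;
    }
}
```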

bq. Therefor for isTableAvailable to work correctly with 
createTable(splitkeys), the master will have to populate all the regions in 
meta first before assigning them.

Unless I'm reading it wrong, this is what it *is* doing.   Something else is up 
(maybe the above check?).





 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor




[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036959#comment-13036959
 ] 

Ted Yu commented on HBASE-3904:
---

My proposal is based on the observation that Vidhyashankar (and other users) 
used a loop to check for table availability.
This is equivalent to calling the newly introduced createTableSync() method, 
with which there is no need to write such a loop.

bq. Hence there might be a case when all regions are indeed fully assigned in 
META but it is just that the master is yet to populate META with the rest of 
the regions.

What Vidhyashankar meant was that the existing entries for the table in .META. 
carried server information, but there were more regions to be assigned by 
Master which weren't in .META. yet.

 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor




[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036968#comment-13036968
 ] 

Ted Yu commented on HBASE-3904:
---

Looking at MetaEditor.addRegionToMeta() which is called by 
HMaster.createTable():
{code}
  public static void addRegionToMeta(CatalogTracker catalogTracker,
  HRegionInfo regionInfo)
  throws IOException {
Put put = new Put(regionInfo.getRegionName());
put.add(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER,
Writables.getBytes(regionInfo));
{code}
Server info was initially omitted.

 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor

 This function as per the java doc is supposed to return true iff all the 
 regions in the table are available. But if the table is still being created 
 this function may return inconsistent results (For example, when a table with 
 a large number of split keys is created). 



[jira] [Commented] (HBASE-2938) Add Thread-Local Behavior To HTable Pool

2011-05-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036999#comment-13036999
 ] 

stack commented on HBASE-2938:
--

@Karthick Does TestMasterObserver fail for you?  It fails w/ your patch in 
place.  Can you take a look?  Otherwise all tests pass (except currently 
distributed splitting, but that's not your patch).

 Add Thread-Local Behavior To HTable Pool
 

 Key: HBASE-2938
 URL: https://issues.apache.org/jira/browse/HBASE-2938
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
 Attachments: HBASE-2938-V2.patch, HBASE-2938.patch


   It is a well-documented fact that the HBase table client (viz., HTable) is 
 not thread-safe. Hence, the recommendation has been to use a HTablePool or a 
 ThreadLocal to manage access to tables. The downside of the latter is that it 
 (a) requires the user to reinvent the wheel in terms of mapping table names 
 to tables and (b) forces the user to maintain the thread-local objects. 
 Ideally, it would be nice if we could make the HTablePool handle thread-local 
 objects as well. That way, it not only becomes the one stop shop for all 
 client-side tables, but also insulates the user from the ThreadLocal object.
   
   Here, we propose a way to generalize the HTablePool so that the underlying 
 pool type is either reusable or thread-local. To make this possible, we 
 introduce the concept of a SharedMap, which essentially maps a key to a 
 collection of values, the elements of which are managed by a pool. In effect, 
 that collection acts as a shared pool of resources, access to which is 
 closely controlled as dictated by the particular semantics of the pool.
  Furthermore, to simplify the construction of HTablePools, we added a couple 
 of parameters (viz. hbase.client.htable.pool.type and 
 hbase.client.hbase.pool.size) to control the default behavior of a 
 HTablePool.
   
   In case the size of the pool is set to a non-zero positive number, that is 
 used to cap the number of resources that a pool may contain for any given 
 key. A size of Integer#MAX_VALUE is interpreted to mean an unbounded pool.

Currently, the SharedMap supports the following types of pools:
* A ThreadLocalPool, which represents a pool that builds on the 
 ThreadLocal class. It essentially binds the resource to the thread from which 
 it is accessed.
* A ReusablePool, which represents a pool that builds on the LinkedList 
 class. It essentially allows resources to be checked out, at which point it 
 is (temporarily) removed from the pool. When the resource is no longer 
 required, it should be returned to the pool in order to be reused.
* A RoundRobinPool, which represents a pool that stores its resources in 
 an ArrayList. It load-balances access to its resources by returning a 
 different resource every time a given key is looked up.
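As a rough sketch (assumed and simplified; the actual SharedMap/HTablePool API will differ), the reusable and thread-local flavors boil down to:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

public class Pools {
    // ReusablePool: resources are checked out (removed from the pool)
    // and must be returned when no longer needed.
    public static class ReusablePool<T> {
        private final Deque<T> pool = new ArrayDeque<>();
        private final Supplier<T> factory;
        public ReusablePool(Supplier<T> factory) { this.factory = factory; }
        public T checkOut() {
            T t = pool.poll();
            return t != null ? t : factory.get();  // create on miss
        }
        public void checkIn(T t) { pool.push(t); } // return for reuse
    }

    // ThreadLocalPool: each thread is bound to its own resource,
    // which is never shared across threads.
    public static class ThreadLocalPool<T> {
        private final ThreadLocal<T> local;
        public ThreadLocalPool(Supplier<T> factory) {
            this.local = ThreadLocal.withInitial(factory);
        }
        public T get() { return local.get(); }
    }
}
```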
   



[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances(far greater than the regions),it has risk of OOME.

2011-05-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037000#comment-13037000
 ] 

Ted Yu commented on HBASE-3906:
---

The patch wouldn't apply to trunk, where the heartbeat has been removed.

 When HMaster is running,there are a lot of RegionLoad instances(far greater 
 than the regions),it has risk of OOME.
 --

 Key: HBASE-3906
 URL: https://issues.apache.org/jira/browse/HBASE-3906
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.2, 0.90.3
 Environment: 1 hmaster,4 regionserver,about 100,000 regions.
Reporter: jian zhang
 Fix For: 0.90.4

 Attachments: HBASE-3906.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 1. Start hbase cluster;
 2. After hmaster finishes region assignment, use jmap to dump the memory of 
 hmaster;
 3. Use MAT to analyse the dump file; there are too many RegionLoad 
 instances, and these instances occupy more than 3G of memory;



[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances(far greater than the regions),it has risk of OOME.

2011-05-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037011#comment-13037011
 ] 

stack commented on HBASE-3906:
--

@Ted I think the patch is for branch only. It has the problem.  I don't believe 
TRUNK does.

@Jian This should work though it's ugly; i.e. refreshing an HServerInfo instance 
(Do we need to keep load in the Map of regions?  What about clearing the load 
from the HSI we add to the Map of regions to HSI?  Would that work?  Or is this 
Map used for balancing?).  Does your patch work for you?  Any issues w/ the new 
synchronize blocks?

 When HMaster is running,there are a lot of RegionLoad instances(far greater 
 than the regions),it has risk of OOME.
 --

 Key: HBASE-3906
 URL: https://issues.apache.org/jira/browse/HBASE-3906
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.2, 0.90.3
 Environment: 1 hmaster,4 regionserver,about 100,000 regions.
Reporter: jian zhang
 Fix For: 0.90.4

 Attachments: HBASE-3906.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 1. Start hbase cluster;
 2. After hmaster finishes region assignment, use jmap to dump the memory of 
 hmaster;
 3. Use MAT to analyse the dump file; there are too many RegionLoad 
 instances, and these instances occupy more than 3G of memory;



[jira] [Commented] (HBASE-2938) Add Thread-Local Behavior To HTable Pool

2011-05-20 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037022#comment-13037022
 ] 

Karthick Sankarachary commented on HBASE-2938:
--

Yes, that test does pass for me (this is after rebasing):

{code}
Running org.apache.hadoop.hbase.coprocessor.TestMasterObserver
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.47 sec
{code}

Can you attach your target/surefire-reports/*TestMasterObserver*.txt files?

 Add Thread-Local Behavior To HTable Pool
 

 Key: HBASE-2938
 URL: https://issues.apache.org/jira/browse/HBASE-2938
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
 Attachments: HBASE-2938-V2.patch, HBASE-2938.patch


   It is a well-documented fact that the HBase table client (viz., HTable) is 
 not thread-safe. Hence, the recommendation has been to use a HTablePool or a 
 ThreadLocal to manage access to tables. The downside of the latter is that it 
 (a) requires the user to reinvent the wheel in terms of mapping table names 
 to tables and (b) forces the user to maintain the thread-local objects. 
 Ideally, it would be nice if we could make the HTablePool handle thread-local 
 objects as well. That way, it not only becomes the one stop shop for all 
 client-side tables, but also insulates the user from the ThreadLocal object.
   
   Here, we propose a way to generalize the HTablePool so that the underlying 
 pool type is either reusable or thread-local. To make this possible, we 
 introduce the concept of a SharedMap, which essentially maps a key to a 
 collection of values, the elements of which are managed by a pool. In effect, 
 that collection acts as a shared pool of resources, access to which is 
 closely controlled as dictated by the particular semantics of the pool.
  Furthermore, to simplify the construction of HTablePools, we added a couple 
 of parameters (viz. hbase.client.htable.pool.type and 
 hbase.client.hbase.pool.size) to control the default behavior of a 
 HTablePool.
   
   In case the size of the pool is set to a non-zero positive number, that is 
 used to cap the number of resources that a pool may contain for any given 
 key. A size of Integer#MAX_VALUE is interpreted to mean an unbounded pool.

Currently, the SharedMap supports the following types of pools:
* A ThreadLocalPool, which represents a pool that builds on the 
 ThreadLocal class. It essentially binds the resource to the thread from which 
 it is accessed.
* A ReusablePool, which represents a pool that builds on the LinkedList 
 class. It essentially allows resources to be checked out, at which point it 
 is (temporarily) removed from the pool. When the resource is no longer 
 required, it should be returned to the pool in order to be reused.
* A RoundRobinPool, which represents a pool that stores its resources in 
 an ArrayList. It load-balances access to its resources by returning a 
 different resource every time a given key is looked up.
   



[jira] [Moved] (HBASE-3909) Add dynamic config

2011-05-20 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack moved HADOOP-7315 to HBASE-3909:
--

Issue Type: Bug  (was: Improvement)
   Key: HBASE-3909  (was: HADOOP-7315)
   Project: HBase  (was: Hadoop Common)

 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack

 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no harm in this having its own issue.  
 Ted started a conversation on this topic up on dev and Todd suggested we 
 look at how Hadoop did it over in HADOOP-7001



[jira] [Created] (HBASE-3910) acid-semantics.html - clarify some of the concepts

2011-05-20 Thread Doug Meil (JIRA)
acid-semantics.html - clarify some of the concepts
--

 Key: HBASE-3910
 URL: https://issues.apache.org/jira/browse/HBASE-3910
 Project: HBase
  Issue Type: Bug
  Components: documentation
 Environment: Any.
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor


A client can do a write to a client-side 'write buffer' if enabled via 
hTable.setAutoFlush(false). Now, assume a client puts value v under key k. Two 
wrong things can happen, violating the ACID semantics of HBase given at: 
http://hbase.apache.org/acid-semantics.html

1) Say the client fails immediately after the put succeeds. In this case, the 
put will be lost, violating the durability property:

{quote}Any operation that returns a success code (eg does not throw an 
exception) will be made durable.{quote}
 
2) Say the client issues a read for k immediately after writing k. The put will 
be stored in the client-side write buffer, while the read will go to the region 
server, returning an older value instead of v, violating the visibility 
property:

{quote}
When a client receives a success response for any mutation, that mutation
is immediately visible to both that client and any client with whom it later
communicates through side channels.
{quote}
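The visibility problem can be simulated without a cluster. {{BufferedTable}} below is a toy stand-in for an HTable with autoFlush disabled, not the real client:

```java
import java.util.HashMap;
import java.util.Map;

public class BufferedTable {
    private final Map<String, String> server = new HashMap<>(); // region server state
    private final Map<String, String> buffer = new HashMap<>(); // client write buffer

    // autoFlush=false: the put only lands in the client-side buffer.
    public void put(String k, String v) { buffer.put(k, v); }

    // flushCommits(): buffered writes finally reach the server.
    public void flush() { server.putAll(buffer); buffer.clear(); }

    // Reads always go to the server, so they miss unflushed writes.
    public String get(String k) { return server.get(k); }
}
```

A read of k between put() and flush() returns the old value, which is exactly the violation described above; a client crash in that window loses the put entirely.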

Thanks,
Tallat





[jira] [Updated] (HBASE-3903) A successful write to client write-buffer may be lost or not visible

2011-05-20 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3903:
-

Attachment: book_HBASE_3903.xml.patch

 A successful write to client write-buffer may be lost or not visible
 

 Key: HBASE-3903
 URL: https://issues.apache.org/jira/browse/HBASE-3903
 Project: HBase
  Issue Type: Bug
  Components: documentation
 Environment: Any.
Reporter: Tallat
Assignee: Doug Meil
Priority: Minor
  Labels: documentation
 Attachments: acid-semantics_HBASE_3903.xml.patch, 
 book_HBASE_3903.xml.patch


 A client can do a write to a client-side 'write buffer' if enabled via 
 hTable.setAutoFlush(false). Now, assume a client puts value v under key k. 
 Two wrong things can happen, violating the ACID semantics of HBase given 
 at: http://hbase.apache.org/acid-semantics.html
 1) Say the client fails immediately after the put succeeds. In this case, the 
 put will be lost, violating the durability property:
 {quote}Any operation that returns a success code (eg does not throw an 
 exception) will be made durable.{quote}
  
 2) Say the client issues a read for k immediately after writing k. The put 
 will be stored in the client-side write buffer, while the read will go to the 
 region server, returning an older value instead of v, violating the 
 visibility property:
 {quote}
 When a client receives a success response for any mutation, that mutation
 is immediately visible to both that client and any client with whom it later
 communicates through side channels.
 {quote}
 Thanks,
 Tallat



[jira] [Updated] (HBASE-3910) acid-semantics.html - clarify some of the concepts

2011-05-20 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3910:
-

Description: 

Inspired from HBASE-3903 regarding the acid-semantics page.

What's a side-channel?

{quote}
When a client receives a success response for any mutation, that mutation
is immediately visible to both that client and any client with whom it later
communicates through side channels.
{quote}

Thanks,
Tallat



  was:
A client can do a write to a client-side 'write buffer' if enabled via 
hTable.setAutoFlush(false). Now, assume a client puts value v under key k. Two 
wrong things can happen, violating the ACID semantics of HBase given at: 
http://hbase.apache.org/acid-semantics.html

1) Say the client fails immediately after the put succeeds. In this case, the 
put will be lost, violating the durability property:

{quote}Any operation that returns a success code (eg does not throw an 
exception) will be made durable.{quote}
 
2) Say the client issues a read for k immediately after writing k. The put will 
be stored in the client-side write buffer, while the read will go to the region 
server, returning an older value instead of v, violating the visibility 
property:

{quote}
When a client receives a success response for any mutation, that mutation
is immediately visible to both that client and any client with whom it later
communicates through side channels.
{quote}

Thanks,
Tallat



 Issue Type: Improvement  (was: Bug)

 acid-semantics.html - clarify some of the concepts
 --

 Key: HBASE-3910
 URL: https://issues.apache.org/jira/browse/HBASE-3910
 Project: HBase
  Issue Type: Improvement
  Components: documentation
 Environment: Any.
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
  Labels: documentation

 Inspired from HBASE-3903 regarding the acid-semantics page.
 What's a side-channel?
 {quote}
 When a client receives a success response for any mutation, that mutation
 is immediately visible to both that client and any client with whom it later
 communicates through side channels.
 {quote}
 Thanks,
 Tallat



[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client

2011-05-20 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037079#comment-13037079
 ] 

jirapos...@reviews.apache.org commented on HBASE-2937:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/755/
---

(Updated 2011-05-20 20:49:57.345063)


Review request for hbase.


Changes
---

Retry {{ServerCallable#call}} in the case of non-{{SocketTimeoutException}}s, 
but only if we have spent less time than the operation timeout.


Summary
---

Thanks to HBASE-3154, users now have the ability to specify a timeout for 
client-side RPC calls. However, it doesn't go far enough in terms of how low 
that timeout can go. Set the RPC timeout to too low a value and you run the 
risk of timing out on calls to the meta tables, which are preconditions to 
calling the {{HRegionInterface}} proxy.

Given that, I believe the motivation at work in HBASE-2937 still holds true. In 
this patch, I add an operation-level timeout, configurable through 
hbase.client.operation.timeout, which will override the value specified by 
hbase.rpc.timeout, if any, within the scope of the {{ServerCallable#call}} 
method. In other words, the operation-level timeout does not apply to calls to 
the meta tables. 

Furthermore, the patch treats an RPC timeout as a non-fatal event, in that it 
will not cause the {{HBaseClient#Connection}} instance to be closed. Last but 
not least, users will also have the ability to set the operation timeout on 
the {{HTable}} on the fly.


This addresses bug HBASE-2937.
https://issues.apache.org/jira/browse/HBASE-2937


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java e9e3694 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java b26f41e 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 61e151a 
  src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 6f22123 
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 470e741 
  src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java dbb57d9 
  src/main/java/org/apache/hadoop/hbase/util/PoolMap.java 354d49a 

Diff: https://reviews.apache.org/r/755/diff


Testing
---

mvn test


Thanks,

Karthick



 Facilitate Timeouts In HBase Client
 ---

 Key: HBASE-2937
 URL: https://issues.apache.org/jira/browse/HBASE-2937
 Project: HBase
  Issue Type: New Feature
  Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-2937.patch, HBASE-2937.patch


 Currently, there is no way to force an operation on the HBase client (viz. 
 HTable) to time out if a certain amount of time has elapsed.  In other words, 
 all invocations on the HTable class are veritable blocking calls, which will 
 not return until a response (successful or otherwise) is received. 
 In general, there are two ways to handle timeouts:  (a) call the operation in 
 a separate thread, until it returns a response or the wait on the thread 
 times out and (b) have the underlying socket unblock the operation if the 
 read times out.  The downside of the former approach is that it consumes more 
 resources in terms of threads and callables. 
 Here, we describe a way to specify and handle timeouts on the HTable client, 
 which relies on the latter approach (i.e., socket timeouts). Right now, the 
 HBaseClient sets the socket timeout to the value of the ipc.ping.interval 
 parameter, which is also how long it waits before pinging the server in case 
 of a failure. The goal is to allow clients to set that timeout on the fly 
 through HTable. Rather than adding an optional timeout argument to every 
 HTable operation, we chose to make it a property of HTable which effectively 
 applies to every method that involves a remote operation.
 In order to propagate the timeout  from HTable to HBaseClient, we replaced 
 all occurrences of ServerCallable in HTable with an extension called 
 ClientCallable, which sets the timeout on the region server interface, once 
 it has been instantiated, through the HConnection object. The latter, in 
 turn, asks HBaseRPC to pass that timeout to the corresponding Invoker, so 
 that it may inject the timeout at the time the invocation is made on the 
 region server proxy. Right before the request is sent to the server, we set 
 the timeout specified by the client on the underlying socket.
 In conclusion, this patch will afford clients the option of performing an 
 HBase operation until it completes or a specified timeout elapses. Note that 
 a timeout of zero 

[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client

2011-05-20 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037078#comment-13037078
 ] 

jirapos...@reviews.apache.org commented on HBASE-2937:
--



bq.  On 2011-05-19 06:11:23, Michael Stack wrote:
bq.   This seems like a bunch of functionality for a relatively small change.  
Nice one Karthick.  A few questions in the below.
bq.  
bq.  Karthick Sankarachary wrote:
bq.  Yes, it does seem like a big change for a relatively small feature, 
but an important one nevertheless. The complexity stems from the fact the scope 
of the operation timeout has to be limited to the {{ServerCallable#call}} 
method. 
bq.  
bq.  By way of motivation, if you run the TestFromClientSide test with the 
following patch (which sets the hbase.rpc.timeout to 10ms), you'll see that 
39 out of the 44 test cases will fail.
bq.  
bq.   24 --- 
a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   25 +++ 
b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   26 @@ -94,6 +94,8 @@ public class TestFromClientSide {
bq.   27@BeforeClass
bq.   28public static void setUpBeforeClass() throws Exception {
bq.   29  TEST_UTIL.startMiniCluster(3);
bq.   30 +
TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 10);
bq.   32}
bq.  
bq.  On the other hand, if you run it with the default hbase.rpc.timeout 
but a hbase.client.operation.timeout set to 10ms, then you should see the 
test pass.
bq.  
bq.   24 --- 
a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   25 +++ 
b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   26 @@ -94,6 +94,8 @@ public class TestFromClientSide {
bq.   27@BeforeClass
bq.   28public static void setUpBeforeClass() throws Exception {
bq.   29  TEST_UTIL.startMiniCluster(3);
bq.   30 +
TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 6);
bq.   31 +
TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT, 
10);
bq.   32}
bq.
bq.  
bq.  Michael Stack wrote:
bq.  Actually I was saying the opposite.  I'm surprised at how little code 
had to change to make this fix.
bq.  
bq.  So, I don't recall if there is good documentation in this patch on the 
difference between hbase.rpc.timeout and hbase.client.operation.timeout?  
If not, we need it.
bq.  
bq.  Does the TestFromClientSide complete in shorter time if I set a 
hbase.client.operation.timeout of 10ms?

There's comments in {{HConstants}} for both of those configuration properties. 
Is there another place where we should document them?

The test completes in more or less the same time, regardless of whether or not 
the hbase.client.operation.timeout is set to 10ms. I guess that's because the 
test server is running locally, which is probably why the test cases don't 
time out.


bq.  On 2011-05-19 06:11:23, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 
106
bq.   https://reviews.apache.org/r/755/diff/1/?file=19383#file19383line106
bq.  
bq.   Are there other exceptions you think we should rethrow?  Connection 
Exception?
bq.  
bq.  Karthick Sankarachary wrote:
bq.  How about we do what HBaseClient does, which is wrap the 
SocketTimeoutException inside another one, along with a context-specific error 
message?
bq.  
bq.  Michael Stack wrote:
bq.  I was more wondering if there were exceptions we should treat like 
SocketTimeoutException?

The other kinds of exceptions we might expect {{HBaseClient}} to throw include 
{{ConnectException}} and {{IOException}}. We could treat them similarly, but 
only if we have already spent more time than the operation timeout. If not, 
then we could retry the call, this time using a lower operation timeout. To 
take an example, if the operation timeout is 50ms, and a {{ConnectException}} 
occurs 10ms after the call, then we could retry the call with a 40ms operation 
timeout. What do you think?
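The retry arithmetic described above (a 50ms budget, a failure at 10ms, a retry with 40ms) reduces to tracking a deadline; this is a sketch under that assumption, not the actual patch:

```java
public class OperationBudget {
    private final long deadlineMs;

    // Fix the deadline once, when the operation starts.
    public OperationBudget(long nowMs, long operationTimeoutMs) {
        this.deadlineMs = nowMs + operationTimeoutMs;
    }

    // Timeout to use for the next attempt; <= 0 means the budget is
    // spent and the exception should be rethrown rather than retried.
    public long remaining(long nowMs) {
        return deadlineMs - nowMs;
    }
}
```

Each retry would pass remaining() down as the new RPC timeout, so the retries can never exceed the overall operation timeout.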


- Karthick


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/755/#review683
---


On 2011-05-20 20:49:57, Karthick Sankarachary wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/755/
bq.  ---
bq.  
bq.  (Updated 2011-05-20 20:49:57)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Thanks to HBASE-3154, 

[jira] [Updated] (HBASE-2077) NullPointerException with an open scanner that expired causing an immediate region server shutdown

2011-05-20 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2077:
-

Attachment: 2077-suggestion.txt

Here is a suggestion where we remove the lease from leases while we are 
processing a request; then, on the way out, in a finally block, we renew the lease.
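The suggestion has roughly this shape; {{LeaseJuggle}} is a toy model, not the real Leases class:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LeaseJuggle {
    final Map<Long, Long> leases = new ConcurrentHashMap<>(); // scannerId -> expiry

    // Remove the lease while serving the request so it cannot expire
    // mid-call, then re-add (renew) it on the way out in a finally block.
    public String next(long scannerId, long leasePeriodMs) {
        Long lease = leases.remove(scannerId);
        if (lease == null) throw new IllegalStateException("unknown scanner");
        try {
            return "row";  // stand-in for the actual scan work
        } finally {
            leases.put(scannerId, System.currentTimeMillis() + leasePeriodMs);
        }
    }
}
```

While the request is in flight the lease simply isn't in the set, so the expiration thread can never see (and null out) a scanner that is being used.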

 NullPointerException with an open scanner that expired causing an immediate 
 region server shutdown
 --

 Key: HBASE-2077
 URL: https://issues.apache.org/jira/browse/HBASE-2077
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.2, 0.20.3
 Environment: Hadoop 0.20.0, Mac OS X, Java 6
Reporter: Sam Pullara
Assignee: Sam Pullara
Priority: Critical
 Fix For: 0.92.0

 Attachments: 2077-suggestion.txt, HBASE-2077-3.patch, 
 HBASE-2077-redux.patch, 
 [Bug_HBASE-2077]_Fixes_a_very_rare_race_condition_between_lease_expiration_and_renewal.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 2009-12-29 18:05:55,432 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
 -4250070597157694417 lease expired
 2009-12-29 18:05:55,443 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
   at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
   at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
   at java.util.PriorityQueue.poll(PriorityQueue.java:523)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944)
   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 2009-12-29 18:05:55,446 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 7 on 55260, call next(-4250070597157694417, 1) from 
 192.168.1.90:54011: error: java.io.IOException: java.lang.NullPointerException
 java.io.IOException: java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:869)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:859)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1965)
   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
   at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
   at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
   at java.util.PriorityQueue.poll(PriorityQueue.java:523)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944)
   ... 5 more
 2009-12-29 18:05:55,447 WARN org.apache.hadoop.ipc.HBaseServer: IPC 

[jira] [Commented] (HBASE-3894) Thread contention over row locks set monitor

2011-05-20 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037086#comment-13037086
 ] 

Jean-Daniel Cryans commented on HBASE-3894:
---

I gave the latest patch a spin on my laptop using two PE randomWrite 1 (to 
generate lock contention); my CPU profiling doesn't show any slowness related 
to the locking, and the memory profiling shows that ~10k CountDownLatches only 
account for ~300KB. Also, since they are short-lived, they get cleared up almost 
right away.

I would be +1 on committing if Dave tried it out on his cluster.
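The approach under test, replacing a single shared monitor over the row-lock set with short-lived per-row latches, can be sketched in plain java.util.concurrent terms. This is an illustrative reconstruction only, not the attached patch; the class and method names are made up:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Sketch of per-row locking without one shared monitor: each row key maps
// to a latch, so contenders for different rows never block each other.
public class RowLockSketch {
    private final ConcurrentHashMap<String, CountDownLatch> locks =
        new ConcurrentHashMap<>();

    /** Block until this thread owns the lock for the given row. */
    public void lockRow(String row) throws InterruptedException {
        while (true) {
            CountDownLatch mine = new CountDownLatch(1);
            CountDownLatch existing = locks.putIfAbsent(row, mine);
            if (existing == null) {
                return; // we installed our latch: lock acquired
            }
            existing.await(); // wait for the current holder, then retry
        }
    }

    /** Release the lock and wake any waiters. */
    public void unlockRow(String row) {
        CountDownLatch latch = locks.remove(row);
        if (latch != null) {
            latch.countDown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RowLockSketch l = new RowLockSketch();
        l.lockRow("row1");
        l.unlockRow("row1");
        l.lockRow("row1"); // re-acquirable after release
        l.unlockRow("row1");
        System.out.println("ok");
    }
}
```

The latches are created per acquisition and dropped from the map on release, which matches the observation above that they stay short-lived.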

 Thread contention over row locks set monitor
 

 Key: HBASE-3894
 URL: https://issues.apache.org/jira/browse/HBASE-3894
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Dave Latham
Priority: Blocker
 Fix For: 0.90.4

 Attachments: concurrentRowLocks-2.patch, 
 concurrentRowLocks-trunk.patch, 
 regionserver_rowLock_set_contention.threads.txt


 HRegion maintains a set of row locks.  Whenever any thread attempts to lock 
 or release a row it needs to acquire the monitor on that set.  We've been 
 encountering cases with 30 handler threads all contending for that monitor, 
 blocking progress on the region server.  Clients time out and retry, making it 
 worse, and the region server stops responding to new clients almost entirely.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client

2011-05-20 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037097#comment-13037097
 ] 

jirapos...@reviews.apache.org commented on HBASE-2937:
--



bq.  On 2011-05-19 06:11:23, Michael Stack wrote:
bq.   This seems like a bunch of functionality for a relatively small change.  
Nice one Karthick.  A few questions in the below.
bq.  
bq.  Karthick Sankarachary wrote:
bq.  Yes, it does seem like a big change for a relatively small feature, 
but an important one nevertheless. The complexity stems from the fact the scope 
of the operation timeout has to be limited to the {{ServerCallable#call}} 
method. 
bq.  
bq.  By way of motivation, if you run the TestFromClientSide test with the 
following patch (which sets the hbase.rpc.timeout to 10ms), you'll see that 
39 out of the 44 test cases will fail.
bq.  
bq.   24 --- 
a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   25 +++ 
b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   26 @@ -94,6 +94,8 @@ public class TestFromClientSide {
bq.   27@BeforeClass
bq.   28public static void setUpBeforeClass() throws Exception {
bq.   29  TEST_UTIL.startMiniCluster(3);
bq.   30 +
TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 10);
bq.   32}
bq.  
bq.  On the other hand, if you run it with the default hbase.rpc.timeout 
but a hbase.client.operation.timeout set to 10ms, then you should see the 
test pass.
bq.  
bq.   24 --- 
a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   25 +++ 
b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   26 @@ -94,6 +94,8 @@ public class TestFromClientSide {
bq.   27@BeforeClass
bq.   28public static void setUpBeforeClass() throws Exception {
bq.   29  TEST_UTIL.startMiniCluster(3);
bq.   30 +
TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 6);
bq.   31 +
TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT, 
10);
bq.   32}
bq.
bq.  
bq.  Michael Stack wrote:
bq.  Actually I was saying the opposite.  I'm surprised at how little code 
had to change to make this fix.
bq.  
bq.  So, I don't recall if there is good documentation in this patch on the 
difference between hbase.rpc.timeout and hbase.client.operation.timeout?  
If not, we need it.
bq.  
bq.  Does the TestFromClientSide complete in shorter time if I set a 
hbase.client.operation.timeout of 10ms?
bq.  
bq.  Karthick Sankarachary wrote:
bq.  There's comments in {{HConstants}} for both of those configuration 
properties. Is there another place where we should document them?
bq.  
bq.  The test completes in more or less the same time, regardless of 
whether or not the hbase.client.operation.timeout is set to 10ms. I guess 
that's because the test server is running locally, which is probably why the 
test cases don't timeout.

So, high-level, IIUC, this patch will allow setting shorter operation timeouts. 
 You'll have to do it by setting hbase.client.operation.timeout in the 
Configuration the HTable uses.  Is that right? I see the default is MAX_INT for 
hbase.client.operation.timeout.  Does that mean the hbase.rpc.timeout 
prevails?  If hbase.client.operation.timeout timeouts we retry?  Is that 
right, the configured amount of times?

Sort-of-related, shorter timeouts make it more critical that we do a better job 
server-side keeping account of when an operation arrives and making sure it 
does not go through if by the time it comes out of the RPC queue, so much time 
has elapsed, the client has gone away (We don't want operations completing on 
the server if no client to reply to).
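The server-side bookkeeping described here, recording when a call arrives and skipping it if it has out-waited its client in the RPC queue, could look something like the following hypothetical sketch (the constant and method names are not HBase's):

```java
// Sketch of dropping stale calls: record arrival time, and before executing,
// check whether the call sat in the queue longer than the client would wait.
public class StaleCallCheck {
    static final long CLIENT_TIMEOUT_MS = 100; // assumed client-side budget

    /** True if the call should still run, false if the client has given up. */
    static boolean shouldExecute(long arrivalMs, long nowMs) {
        return (nowMs - arrivalMs) < CLIENT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println(shouldExecute(0, 50));  // fresh call: true
        System.out.println(shouldExecute(0, 500)); // client long gone: false
    }
}
```

With shorter operation timeouts, the win is that the server no longer burns handler time completing work nobody is waiting for.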


bq.  On 2011-05-19 06:11:23, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 
106
bq.   https://reviews.apache.org/r/755/diff/1/?file=19383#file19383line106
bq.  
bq.   Are there other exceptions you think we should rethrow?  Connection 
Exception?
bq.  
bq.  Karthick Sankarachary wrote:
bq.  How about we do what HBaseClient does, which is wrap the 
SocketTimeoutException inside another one, along with a context-specific error 
message?
bq.  
bq.  Michael Stack wrote:
bq.  I was more wondering if there were exceptions we should treat like 
SocketTimeoutException?
bq.  
bq.  Karthick Sankarachary wrote:
bq.  The other kinds of exceptions we might expect {{HBaseClient}} to throw 
include {{ConnectException}} and {{IOException}}. We could treat them 
similarly, but only if we have already spent more time than the operation 
timeout. If not, then we could retry the call, this time using a lower 
operation 

[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client

2011-05-20 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037108#comment-13037108
 ] 

jirapos...@reviews.apache.org commented on HBASE-2937:
--



bq.  On 2011-05-19 06:11:23, Michael Stack wrote:
bq.   This seems like a bunch of functionality for a relatively small change.  
Nice one Karthick.  A few questions in the below.
bq.  
bq.  Karthick Sankarachary wrote:
bq.  Yes, it does seem like a big change for a relatively small feature, 
but an important one nevertheless. The complexity stems from the fact the scope 
of the operation timeout has to be limited to the {{ServerCallable#call}} 
method. 
bq.  
bq.  By way of motivation, if you run the TestFromClientSide test with the 
following patch (which sets the hbase.rpc.timeout to 10ms), you'll see that 
39 out of the 44 test cases will fail.
bq.  
bq.   24 --- 
a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   25 +++ 
b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   26 @@ -94,6 +94,8 @@ public class TestFromClientSide {
bq.   27@BeforeClass
bq.   28public static void setUpBeforeClass() throws Exception {
bq.   29  TEST_UTIL.startMiniCluster(3);
bq.   30 +
TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 10);
bq.   32}
bq.  
bq.  On the other hand, if you run it with the default hbase.rpc.timeout 
but a hbase.client.operation.timeout set to 10ms, then you should see the 
test pass.
bq.  
bq.   24 --- 
a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   25 +++ 
b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
bq.   26 @@ -94,6 +94,8 @@ public class TestFromClientSide {
bq.   27@BeforeClass
bq.   28public static void setUpBeforeClass() throws Exception {
bq.   29  TEST_UTIL.startMiniCluster(3);
bq.   30 +
TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 6);
bq.   31 +
TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT, 
10);
bq.   32}
bq.
bq.  
bq.  Michael Stack wrote:
bq.  Actually I was saying the opposite.  I'm surprised at how little code 
had to change to make this fix.
bq.  
bq.  So, I don't recall if there is good documentation in this patch on the 
difference between hbase.rpc.timeout and hbase.client.operation.timeout?  
If not, we need it.
bq.  
bq.  Does the TestFromClientSide complete in shorter time if I set a 
hbase.client.operation.timeout of 10ms?
bq.  
bq.  Karthick Sankarachary wrote:
bq.  There's comments in {{HConstants}} for both of those configuration 
properties. Is there another place where we should document them?
bq.  
bq.  The test completes in more or less the same time, regardless of 
whether or not the hbase.client.operation.timeout is set to 10ms. I guess 
that's because the test server is running locally, which is probably why the 
test cases don't timeout.
bq.  
bq.  Michael Stack wrote:
bq.  So, high-level, IIUC, this patch will allow setting shorter operation 
timeouts.  You'll have to do it by setting hbase.client.operation.timeout in 
the Configuration the HTable uses.  Is that right? I see the default is MAX_INT 
for hbase.client.operation.timeout.  Does that mean the hbase.rpc.timeout 
prevails?  If hbase.client.operation.timeout timeouts we retry?  Is that 
right, the configured amount of times?
bq.  
bq.  Sort-of-related, shorter timeouts make it more critical that we do a 
better job server-side keeping account of when an operation arrives and making 
sure it does not go through if by the time it comes out of the RPC queue, so 
much time has elapsed, the client has gone away (We don't want operations 
completing on the server if no client to reply to).

bq. So, high-level, IIUC, this patch will allow setting shorter operation 
timeouts.  You'll have to do it by setting hbase.client.operation.timeout in 
the Configuration the HTable uses.  Is that right? I see the default is MAX_INT 
for hbase.client.operation.timeout.  Does that mean the hbase.rpc.timeout 
prevails?  

Yes, to both of the above questions.

bq. If hbase.client.operation.timeout timeouts we retry?  Is that right, the 
configured amount of times?

Actually, no we don't retry, as that would kind of defeat the purpose of the 
operation timeout, in my opinion. Note that if we were to retry we would have 
to pause (for at least 1000 ms by default). If the client does not have the 
luxury of spending say 10ms on a {{HTable}} operation, then it will probably 
not want to pause either, which rules out retries.
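The fail-fast behavior argued for here, time out the single attempt and do not pause-and-retry, can be illustrated with a generic wrapper. This is an assumption-laden sketch, not the actual ServerCallable change:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of an operation timeout with no retries: if the caller can only
// afford ~10 ms, pausing >= 1000 ms between retries makes no sense, so the
// wrapper gives up immediately when the budget is exhausted.
public class OperationTimeoutSketch {
    static <T> T callWithTimeout(Callable<T> op, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(op).get(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow(); // no retry: abandon the attempt on timeout
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast operation completes within its budget.
        System.out.println(callWithTimeout(() -> 42, 100));
        // A slow operation exceeds the budget and times out.
        try {
            callWithTimeout(() -> { Thread.sleep(500); return 0; }, 10);
        } catch (TimeoutException e) {
            System.out.println("timed out, not retrying");
        }
    }
}
```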

bq. Sort-of-related, shorter timeouts make it more critical that we do a better 
job 

[jira] [Commented] (HBASE-3909) Add dynamic config

2011-05-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037141#comment-13037141
 ] 

Ted Yu commented on HBASE-3909:
---

I went over HADOOP-7001.5.patch
We have the following decisions to make:
1. HADOOP-7001 is in trunk only. Are we going to pull the interface/base 
class/util class over to hbase ?
2. ReconfigurationServlet would be convenient for admin to use. Are we going to 
support reloading conf from hbase shell ?
3. HADOOP-7001 provides fine-grained property reconfig through 
reconfigurePropertyImpl() calls. Shall we also provide coarse-grained property 
reconfig mechanism ? e.g. we can notify AssignmentManager of the properties it 
uses whose values have just changed. This mechanism is also related to 
getReconfigurableProperties(). I think HMaster, AssignmentManager, etc would 
all extend ReconfigurableBase.
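The coarse-grained mechanism suggested in point 3 could be prototyped roughly as follows. All names here are hypothetical, not from HADOOP-7001:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of coarse-grained reconfiguration: a component registers the
// property names it uses and receives the changed key/value pairs in one
// notification, instead of one reconfigurePropertyImpl() call per property.
interface Reconfigurable {
    Set<String> getReconfigurableProperties();
    void onPropertiesChanged(Map<String, String> changed);
}

public class ConfNotifier {
    private final List<Reconfigurable> listeners = new ArrayList<>();

    public void register(Reconfigurable r) { listeners.add(r); }

    /** Push only the properties each listener declared interest in. */
    public void publish(Map<String, String> changed) {
        for (Reconfigurable r : listeners) {
            Map<String, String> relevant = new HashMap<>(changed);
            relevant.keySet().retainAll(r.getReconfigurableProperties());
            if (!relevant.isEmpty()) {
                r.onPropertiesChanged(relevant);
            }
        }
    }

    public static void main(String[] args) {
        ConfNotifier n = new ConfNotifier();
        n.register(new Reconfigurable() {
            public Set<String> getReconfigurableProperties() {
                return Set.of("hbase.regions.slop");
            }
            public void onPropertiesChanged(Map<String, String> changed) {
                System.out.println("AssignmentManager sees: " + changed);
            }
        });
        n.publish(Map.of("hbase.regions.slop", "0.3", "unrelated.key", "x"));
    }
}
```

Under this shape, HMaster, AssignmentManager, etc. would each implement the interface and be notified only about properties they declared.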

 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack

 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no hard this having its own issue.  
 Ted started a conversation on this topic up on dev and Todd suggested we 
 lookd at how Hadoop did it over in HADOOP-7001

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037150#comment-13037150
 ] 

Ted Yu commented on HBASE-3904:
---

I have run tests related to table creation and availability checking. 
Namely this code in LoadIncrementalHFiles:
{code}
while (!conn.isTableAvailable(table.getTableName()) &&
       (ctr < TABLE_CREATE_MAX_RETRIES)) {
{code}
TestHFileOutputFormat, TestLoadIncrementalHFiles and TestAdmin.

Please outline what more test(s) should be devised.
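The quoted loop amounts to a bounded poll on table availability. A self-contained sketch, with a stand-in predicate replacing HConnection.isTableAvailable():

```java
import java.util.function.BooleanSupplier;

// Sketch of a bounded availability poll: keep checking until the predicate
// returns true or the retry cap is reached, pausing between checks.
public class AvailabilityPoll {
    static boolean waitForTable(BooleanSupplier isAvailable,
                                int maxRetries, long pauseMs)
            throws InterruptedException {
        int ctr = 0;
        while (!isAvailable.getAsBoolean() && ctr < maxRetries) {
            Thread.sleep(pauseMs);
            ctr++;
        }
        return isAvailable.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        // Becomes available on the third check.
        int[] calls = {0};
        boolean ok = waitForTable(() -> ++calls[0] >= 3, 10, 1);
        System.out.println(ok);
    }
}
```

The retry cap matters because, per this issue, the predicate can return true prematurely; a caller that also bounds total wait time fails fast instead of looping forever.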

 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor
 Attachments: 3904.txt


 This function as per the java doc is supposed to return true iff all the 
 regions in the table are available. But if the table is still being created 
 this function may return inconsistent results (For example, when a table with 
 a large number of split keys is created). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-20 Thread Vidhyashankar Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037163#comment-13037163
 ] 

Vidhyashankar Venkataraman commented on HBASE-3904:
---

Ok, I tested your patch with the code attached below:

And I get the following output:
Caught Socket timeout.. Mostly caused by a slow region assignment by master!
11/05/20 23:26:00 INFO zookeeper.ZooKeeper: Initiating client connection, 
connectString=b3110640.yst.yahoo.net:44481,b3110600.yst.yahoo.net:44481,b3110560.yst.yahoo.net:44481,b3110520.yst.yahoo.net:44481,b3110680.yst.yahoo.net:44481
 sessionTimeout=18 watcher=hconnection
11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Opening socket connection to 
server b3110560.yst.yahoo.net/67.195.55.234:44481
11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Socket connection established to 
b3110560.yst.yahoo.net/67.195.55.234:44481, initiating session
11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Session establishment complete on 
server b3110560.yst.yahoo.net/67.195.55.234:44481, sessionid = 
0x12ff6d3911179e8, negotiated timeout = 18
Table test-v6 not yet available... Sleeping for 5 more minutes... Expected 
#regions = 17933
Table is probably available!! : test-v6 Available? true
Table test-v6 may not be available... Double checking: Sleeping for 5 minutes 
more...
Table test-v6: Expected # Regions = 17933 Actual number = 4744
Table test-v6 may not be available... Double checking: Sleeping for 5 minutes 
more...

And it is still trying to assign.


 1.  The good: Notice that tableAvailable got out of the loop because it was 
true and it also printed true in the following print message. This has never 
happened without the patch.
 2.  The doubtful part:  isTableAvailable still doesn't return back when all 
regions are online as we see in the subsequent output.

Can you let me know what your patch intended to do?

Thank you
Vidhya

THE CODE:


    try {
      hbAdmin.createTableAsync(htd, keysArray.toArray(new byte[0][0]));
    } catch (java.net.SocketTimeoutException e) {
      System.err.println("Caught Socket timeout.. " +
                         "Mostly caused by a slow region assignment by master!");
    }

    HTable table = new HTable(tableName);
    HConnection conn = table.getConnection();
    do {
      System.out.println("Table " + tableName + " not yet available... " +
                         "Sleeping for 5 more minutes... Expected #regions = " +
                         (keysArray.size() + 1));
      Thread.sleep(300000);
    } while (!conn.isTableAvailable(table.getTableName()));

    System.err.println("Table is probably available!! : " +
                       tableName +
                       " Available? " +
                       conn.isTableAvailable(table.getTableName()));

    Map<HRegionInfo, HServerAddress> regionList = null;
    do {
      System.out.println("Table " + tableName + " may not be available... " +
                         "Double checking: Sleeping for 5 minutes more...");
      Thread.sleep(300000);
      regionList = table.getRegionsInfo();
      System.out.println("Table " + tableName + ": Expected # Regions = " +
                         (keysArray.size() + 1) +
                         " Actual number = " +
                         ((regionList != null) ? regionList.size() : 0));
    } while ((regionList == null) ||
             (regionList.size() != (keysArray.size() + 1)));



On 5/20/11 4:19 PM, Ted Yu (JIRA) j...@apache.org wrote:



[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037150#comment-13037150
 ]

Ted Yu commented on HBASE-3904:
---

I have run tests related to table creation and availability checking.
Namely this code in LoadIncrementalHFiles:
{code}
while (!conn.isTableAvailable(table.getTableName()) &&
       (ctr < TABLE_CREATE_MAX_RETRIES)) {
{code}
TestHFileOutputFormat, TestLoadIncrementalHFiles and TestAdmin.

Please outline what more test(s) should be devised.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor
 Attachments: 3904.txt


 This function as per the java doc is supposed to return true iff all the 
 regions in the table are available. But if the table is still being created 
 this function may return inconsistent results (For example, when a table with 
 a large number of split keys is 

[jira] [Updated] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-20 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3904:
-

Comment: was deleted

(was: Ok, I tested your patch with the code attached below:

And I get the following output:
Caught Socket timeout.. Mostly caused by a slow region assignment by master!
11/05/20 23:26:00 INFO zookeeper.ZooKeeper: Initiating client connection, 
connectString=b3110640.yst.yahoo.net:44481,b3110600.yst.yahoo.net:44481,b3110560.yst.yahoo.net:44481,b3110520.yst.yahoo.net:44481,b3110680.yst.yahoo.net:44481
 sessionTimeout=18 watcher=hconnection
11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Opening socket connection to 
server b3110560.yst.yahoo.net/67.195.55.234:44481
11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Socket connection established to 
b3110560.yst.yahoo.net/67.195.55.234:44481, initiating session
11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Session establishment complete on 
server b3110560.yst.yahoo.net/67.195.55.234:44481, sessionid = 
0x12ff6d3911179e8, negotiated timeout = 18
Table test-v6 not yet available... Sleeping for 5 more minutes... Expected 
#regions = 17933
Table is probably available!! : test-v6 Available? true
Table test-v6 may not be available... Double checking: Sleeping for 5 minutes 
more...
Table test-v6: Expected # Regions = 17933 Actual number = 4744
Table test-v6 may not be available... Double checking: Sleeping for 5 minutes 
more...

And it is still trying to assign.


 1.  The good: Notice that tableAvailable got out of the loop because it was 
true and it also printed true in the following print message. This has never 
happened without the patch.
 2.  The doubtful part:  isTableAvailable still doesn't return back when all 
regions are online as we see in the subsequent output.

Can you let me know what your patch intended to do?

Thank you
Vidhya

THE CODE:


    try {
      hbAdmin.createTableAsync(htd, keysArray.toArray(new byte[0][0]));
    } catch (java.net.SocketTimeoutException e) {
      System.err.println("Caught Socket timeout.. " +
                         "Mostly caused by a slow region assignment by master!");
    }

    HTable table = new HTable(tableName);
    HConnection conn = table.getConnection();
    do {
      System.out.println("Table " + tableName + " not yet available... " +
                         "Sleeping for 5 more minutes... Expected #regions = " +
                         (keysArray.size() + 1));
      Thread.sleep(300000);
    } while (!conn.isTableAvailable(table.getTableName()));

    System.err.println("Table is probably available!! : " +
                       tableName +
                       " Available? " +
                       conn.isTableAvailable(table.getTableName()));

    Map<HRegionInfo, HServerAddress> regionList = null;
    do {
      System.out.println("Table " + tableName + " may not be available... " +
                         "Double checking: Sleeping for 5 minutes more...");
      Thread.sleep(300000);
      regionList = table.getRegionsInfo();
      System.out.println("Table " + tableName + ": Expected # Regions = " +
                         (keysArray.size() + 1) +
                         " Actual number = " +
                         ((regionList != null) ? regionList.size() : 0));
    } while ((regionList == null) ||
             (regionList.size() != (keysArray.size() + 1)));



On 5/20/11 4:19 PM, Ted Yu (JIRA) j...@apache.org wrote:



[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037150#comment-13037150
 ]

Ted Yu commented on HBASE-3904:
---

I have run tests related to table creation and availability checking.
Namely this code in LoadIncrementalHFiles:
{code}
while (!conn.isTableAvailable(table.getTableName()) &&
       (ctr < TABLE_CREATE_MAX_RETRIES)) {
{code}
TestHFileOutputFormat, TestLoadIncrementalHFiles and TestAdmin.

Please outline what more test(s) should be devised.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

)

 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor
 Attachments: 3904.txt


 This function as per the java doc is supposed to return true iff all the 
 regions in the table are available. But if the table is still being created 
 this function may return inconsistent results (For example, when a table with 
 a large number of split keys is created). 

--
This message is automatically generated by JIRA.
For 

[jira] [Commented] (HBASE-3883) book.xml / added something in schema design and FAQ about not being able to change rowkeys

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037169#comment-13037169
 ] 

Hudson commented on HBASE-3883:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 book.xml / added something in schema design and FAQ about not being able to 
 change rowkeys
 --

 Key: HBASE-3883
 URL: https://issues.apache.org/jira/browse/HBASE-3883
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Fix For: 0.92.0

 Attachments: book_HBASE_3883.xml.patch


 This question has come up enough times in the dist-list to warrant inclusion 
 in the book.
 Added small entry in schema design and in FAQ (referencing schema design).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3826) Minor compaction needs to check if still over compactionThreshold after compacting

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037170#comment-13037170
 ] 

Hudson commented on HBASE-3826:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 Minor compaction needs to check if still over compactionThreshold after 
 compacting
 --

 Key: HBASE-3826
 URL: https://issues.apache.org/jira/browse/HBASE-3826
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.90.1
 Environment: hbase-0.90.1
 hbase-0.90.1-cdh3u0
Reporter: Schubert Zhang
Assignee: Nicolas Spiegelberg
  Labels: compaction
 Fix For: 0.92.0

 Attachments: HBASE-3826.patch, HBASE-3826_0.92.patch


 I have a busy region, and there are 43 StoreFiles (compactionThreshold=8) in 
 this region.
 Now, I stopped the client and stopped putting new data into it. I expect 
 these StoreFiles to be compacted later.
  
 But, almost one day later, these 43 StoreFiles are still there.
 (Note: in my hbase instance, I disabled the major compaction.)
  
 It seems minor compactions are not started continuously to compact the 
 remaining storefiles.
 And I checked the code; it is true.
 -
 After more testing, an obvious issue/problem is that the completion of a minor 
 compaction does not check whether the current storefiles need more minor compaction.
  
 I think this may be a bug or leak.
  
 Try this test:
  
 1. Put a lot of data into a region, so that 30 storefiles accumulate because 
 the background compactions cannot keep up with the fast puts. 
 (hbase.hstore.compactionThreshold=8, hbase.hstore.compaction.max=12)
  
 2. Then stop put.
  
 3. Then, these 30 storefiles are still there for a long time, (no automatic 
 minor compaction)
  
 4. Submit a compaction on this region; then only 12 files are compacted, and 
 now we have 19 storefiles. The minor compaction stopped.
  
 I think, when a minor compaction completes, it should check whether the number 
 of storefiles is still large; if so, another minor compaction should start 
 immediately.
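The fix the reporter suggests, re-checking the storefile count after each minor compaction, can be sketched with plain numbers standing in for StoreFiles. This is illustrative only, not HBase's compaction code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of looping minor compactions: each round merges at most
// `compactionMax` files into one, and the loop repeats while the store is
// still at or over `compactionThreshold` files.
public class CompactionLoopSketch {
    static int compactUntilUnderThreshold(List<Long> storeFiles,
                                          int threshold, int compactionMax) {
        int rounds = 0;
        while (storeFiles.size() >= threshold) {
            int n = Math.min(compactionMax, storeFiles.size());
            long merged = 0;
            for (int i = 0; i < n; i++) {
                merged += storeFiles.remove(0); // merge n files into one
            }
            storeFiles.add(merged);
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        // The scenario from the report: 30 files, threshold 8, max 12.
        List<Long> files = new ArrayList<>();
        for (int i = 0; i < 30; i++) files.add(1L);
        int rounds = compactUntilUnderThreshold(files, 8, 12);
        // 30 -> 19 -> 8 -> 1: three rounds instead of stopping at 19 files.
        System.out.println(rounds + " rounds, " + files.size() + " files left");
    }
}
```

Without the loop, the reported behavior is exactly the first round: 30 files shrink to 19, still over the threshold, and compaction stops.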

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3902) Add Bytes.toBigDecimal and Bytes.toBytes(BigDecimal)

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037171#comment-13037171
 ] 

Hudson commented on HBASE-3902:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 Add Bytes.toBigDecimal and Bytes.toBytes(BigDecimal) 
 -

 Key: HBASE-3902
 URL: https://issues.apache.org/jira/browse/HBASE-3902
 Project: HBase
  Issue Type: Improvement
  Components: util
Affects Versions: 0.90.1, 0.90.2
Reporter: Vaibhav Puranik
 Fix For: 0.90.4

 Attachments: big-decimal-methods-patch.txt


 Bytes.toBigDecimal and Bytes.toBytes(BigDecimal) were removed in 0.90.x. Please 
 add them back. We have data encoded using these methods. I don't think the 
 BigDecimal class has getBytes/toBytes methods. And even if it had them, if its 
 logic for encoding into bytes is different, it wouldn't work with the existing 
 data. I am sure that a lot of people might face this issue. 
 I will submit the patch in a day or two.
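One possible encoding is shown below, only to illustrate why the exact byte layout matters: data written with the old Bytes methods can only be read back by code using the same layout. This sketch is not necessarily the layout HBase used:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Sketch of a BigDecimal <-> byte[] round trip: a 4-byte scale followed by
// the unscaled value's two's-complement bytes.
public class BigDecimalBytes {
    static byte[] toBytes(BigDecimal v) {
        byte[] unscaled = v.unscaledValue().toByteArray();
        return ByteBuffer.allocate(4 + unscaled.length)
                .putInt(v.scale())
                .put(unscaled)
                .array();
    }

    static BigDecimal toBigDecimal(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        int scale = buf.getInt();
        byte[] unscaled = new byte[buf.remaining()];
        buf.get(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("123.456");
        System.out.println(toBigDecimal(toBytes(v))); // round-trips exactly
    }
}
```

A decoder with a different layout (say, scale last, or a string encoding) would silently misread every value written this way, which is the reporter's compatibility concern.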

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037168#comment-13037168
 ] 

Hudson commented on HBASE-3691:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 Add compressor support for 'snappy', google's compressor
 

 Key: HBASE-3691
 URL: https://issues.apache.org/jira/browse/HBASE-3691
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: hbase-snappy-3691-trunk-002.patch, 
 hbase-snappy-3691-trunk-003.patch, hbase-snappy-3691-trunk-004.patch, 
 hbase-snappy-3691-trunk.patch


 http://code.google.com/p/snappy/ is apache licensed.
 bq. Snappy is a compression/decompression library. It does not aim for 
 maximum compression, or compatibility with any other compression library; 
 instead, it aims for very high speeds and reasonable compression. For 
 instance, compared to the fastest mode of zlib, Snappy is an order of 
 magnitude faster for most inputs, but the resulting compressed files are 
 anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 
 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses 
 at about 500 MB/sec or more.
 bq. Snappy is widely used inside Google, in everything from BigTable and 
 MapReduce to our internal RPC systems. (Snappy has previously been referred 
 to as Zippy in some presentations and the likes.)
 Lets get it in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3905) HBaseAdmin.createTableAsync() should check for invalid split keys.

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037167#comment-13037167
 ] 

Hudson commented on HBASE-3905:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 HBaseAdmin.createTableAsync() should check for invalid split keys.
 --

 Key: HBASE-3905
 URL: https://issues.apache.org/jira/browse/HBASE-3905
 Project: HBase
  Issue Type: Bug
  Environment: Considering this function is open to users, it should 
  validate the split key array. For example, I tried 
  creating a table with keys that had duplicate entries. The master (sometimes) 
  crashed with a KeeperException.
 2011-05-14 01:23:33,196 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unexpected ZK exception creating/setting node OFFLINE 
 org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
 BadVersion for /hbase/unassigned/39c3c2f26c777f9d2da8076d9b058c9b
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)  
   
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)   
  
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:708)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.createOrForceNodeOffline(ZKAssign.java:248)
 
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:936)
 
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:887)
 
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:729)
 
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:709)
 
 at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:805)   
  
 at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:773)   
  
 at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:740)   
  
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)   
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036)
 2011-05-14 01:23:33,197 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
 And just before exiting:
 2011-05-14 01:23:34,048 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 Responder, call createTable({BLAH BLAH BLAH}, [[B@244e3ce5) from 
 67.195.46.34:36335: output error
Reporter: Vidhyashankar Venkataraman
Assignee: Ted Yu
Priority: Minor
 Fix For: 0.90.4

 Attachments: 3905.txt




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3901) Update documentation for ImportTsv to reflect recent features

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037172#comment-13037172
 ] 

Hudson commented on HBASE-3901:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 Update documentation for ImportTsv to reflect recent features
 -

 Key: HBASE-3901
 URL: https://issues.apache.org/jira/browse/HBASE-3901
 Project: HBase
  Issue Type: Improvement
Reporter: Bill Graham
Assignee: Bill Graham
 Fix For: 0.92.0

 Attachments: HBASE-3901_1.patch


 HBASE-3880 added new features to ImportTsv. Here's a patch to update 
 documentation for these and other recent features.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3820) Splitlog() executed while the namenode was in safemode may cause data-loss

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037175#comment-13037175
 ] 

Hudson commented on HBASE-3820:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 Splitlog() executed while the namenode was in safemode may cause data-loss
 --

 Key: HBASE-3820
 URL: https://issues.apache.org/jira/browse/HBASE-3820
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.2
Reporter: Jieshan Bean
 Fix For: 0.90.4

 Attachments: HBASE-3820-90-V3.patch, HBASE-3820-MFSFix-90-V2.patch, 
 HBASE-3820-MFSFix-90.patch


 I found this problem when the namenode went into safe mode for reasons that 
 are still unclear.
 There's one patch about this problem:
{code}
try {
  HLogSplitter splitter = HLogSplitter.createLogSplitter(
      conf, rootdir, logDir, oldLogDir, this.fs);
  try {
    splitter.splitLog();
  } catch (OrphanHLogAfterSplitException e) {
    LOG.warn("Retrying splitting because of:", e);
    // An HLogSplitter instance can only be used once.  Get new instance.
    splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
        oldLogDir, this.fs);
    splitter.splitLog();
  }
  splitTime = splitter.getTime();
  splitLogSize = splitter.getSize();
} catch (IOException e) {
  checkFileSystem();
  LOG.error("Failed splitting " + logDir.toString(), e);
  master.abort("Shutting down HBase cluster: Failed splitting hlog files...", e);
} finally {
  this.splitLogLock.unlock();
}
{code}
 That patch does help to some extent, e.g. when the namenode process exits or 
 is killed, but it does not consider the namenode safe-mode exception.
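As an aside, the retry logic above relies on HLogSplitter being single-use: on an OrphanHLogAfterSplitException the code constructs a fresh splitter and retries exactly once. A self-contained sketch of that pattern (the generic Worker/factory names are mine, not HBase's):

```java
import java.io.IOException;
import java.util.function.Supplier;

public class RetryOnceSketch {
    // Stand-in for a single-use worker like HLogSplitter.
    interface Worker { void run() throws IOException; }

    // Run the worker; on a recoverable failure, build a fresh
    // instance and retry exactly once (mirroring the splitLog() code above).
    static void runWithOneRetry(Supplier<Worker> factory) throws IOException {
        Worker w = factory.get();
        try {
            w.run();
        } catch (IOException e) {
            // A worker instance can only be used once: get a new instance.
            factory.get().run();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] attempts = {0};
        runWithOneRetry(() -> () -> {
            if (++attempts[0] == 1) throw new IOException("orphan log after split");
        });
        System.out.println(attempts[0]);  // 2
    }
}
```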
I think the root cause is the checkFileSystem() method.
It provides a way to check whether HDFS is working normally (reads and writes 
 both succeed), and that is probably the original purpose of the method. 
 This is how the method is implemented:
{code}
DistributedFileSystem dfs = (DistributedFileSystem) fs;
try {
  if (dfs.exists(new Path("/"))) {
    return;
  }
} catch (IOException e) {
  exception = RemoteExceptionHandler.checkIOException(e);
}
{code}
I have checked the HDFS code and learned that while the namenode is in safe 
 mode, dfs.exists(new Path("/")) returns true, because the file system still 
 provides read-only service. So this method only checks that HDFS is readable, 
 which I think is not sufficient.
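To illustrate why the read-only probe is insufficient: in safe mode HDFS still serves reads but rejects writes, so only a check that attempts a write can detect the condition. A minimal self-contained sketch (StubFs is a hypothetical stand-in, not the real DistributedFileSystem API):

```java
import java.io.IOException;

public class SafeModeCheckDemo {
    // Hypothetical model of HDFS behavior in and out of safe mode.
    static class StubFs {
        final boolean safeMode;
        StubFs(boolean safeMode) { this.safeMode = safeMode; }

        // Reads are still served in safe mode.
        boolean exists(String path) { return true; }

        // Writes are rejected while in safe mode.
        void create(String path) throws IOException {
            if (safeMode) {
                throw new IOException("Cannot create " + path + ": Name node is in safe mode");
            }
        }
    }

    // Read-only check, analogous to checkFileSystem(): passes even in safe mode.
    static boolean readCheck(StubFs fs) {
        return fs.exists("/");
    }

    // Stricter check: also verify that a write succeeds.
    static boolean writeCheck(StubFs fs) {
        try {
            fs.create("/.fscheck");
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        StubFs inSafeMode = new StubFs(true);
        System.out.println(readCheck(inSafeMode));   // true  -> safe mode goes undetected
        System.out.println(writeCheck(inSafeMode));  // false -> safe mode detected
    }
}
```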
 


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3881) Add disable balancer in graceful_stop.sh script

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037174#comment-13037174
 ] 

Hudson commented on HBASE-3881:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 Add disable balancer in graceful_stop.sh script
 ---

 Key: HBASE-3881
 URL: https://issues.apache.org/jira/browse/HBASE-3881
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.90.4

 Attachments: balancer.txt


 If balancer is on when graceful_stop.sh runs, it can get messy.  Add disable 
 of balancer to the script.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2938) Add Thread-Local Behavior To HTable Pool

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037177#comment-13037177
 ] 

Hudson commented on HBASE-2938:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])
HBASE-2938 Add Thread-Local Behavior To HTable Pool

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/PoolMap.java
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestHTablePool.java


 Add Thread-Local Behavior To HTable Pool
 

 Key: HBASE-2938
 URL: https://issues.apache.org/jira/browse/HBASE-2938
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
 Fix For: 0.92.0

 Attachments: HBASE-2938-V2.patch, HBASE-2938.patch


   It is a well-documented fact that the HBase table client (viz., HTable) is 
 not thread-safe. Hence, the recommendation has been to use a HTablePool or a 
 ThreadLocal to manage access to tables. The downside of the latter is that it 
 (a) requires the user to reinvent the wheel in terms of mapping table names 
 to tables and (b) forces the user to maintain the thread-local objects. 
 Ideally, it would be nice if we could make the HTablePool handle thread-local 
 objects as well. That way, it not only becomes the one stop shop for all 
 client-side tables, but also insulates the user from the ThreadLocal object.
   
   Here, we propose a way to generalize the HTablePool so that the underlying 
 pool type is either reusable or thread-local. To make this possible, we 
 introduce the concept of a SharedMap, which essentially maps a key to a 
 collection of values, the elements of which are managed by a pool. In effect, 
 that collection acts as a shared pool of resources, access to which is 
 closely controlled as dictated by the particular semantics of the pool.
  Furthermore, to simplify the construction of HTablePools, we added a couple 
 of parameters (viz. hbase.client.htable.pool.type and 
 hbase.client.hbase.pool.size) to control the default behavior of a 
 HTablePool.
   
   In case the size of the pool is set to a non-zero positive number, that is 
 used to cap the number of resources that a pool may contain for any given 
 key. A size of Integer#MAX_VALUE is interpreted to mean an unbounded pool.

Currently, the SharedMap supports the following types of pools:
* A ThreadLocalPool, which represents a pool that builds on the 
 ThreadLocal class. It essentially binds the resource to the thread from which 
 it is accessed.
* A ReusablePool, which represents a pool that builds on the LinkedList 
 class. It essentially allows resources to be checked out, at which point they 
 are (temporarily) removed from the pool. When a resource is no longer 
 required, it should be returned to the pool in order to be reused.
* A RoundRobinPool, which represents a pool that stores its resources in 
 an ArrayList. It load-balances access to its resources by returning a 
 different resource every time a given key is looked up.
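The three pool types can be sketched in plain Java as follows; this is a simplified model of the proposal, not the actual PoolMap patch (interface and class names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

public class PoolSketch {
    interface Pool<R> { R get(); void put(R resource); }

    // Binds one resource to each accessing thread.
    static class ThreadLocalPool<R> implements Pool<R> {
        private final ThreadLocal<R> local;
        ThreadLocalPool(Supplier<R> factory) { this.local = ThreadLocal.withInitial(factory); }
        public R get() { return local.get(); }
        public void put(R resource) { /* nothing to do: resource stays bound to the thread */ }
    }

    // Check-out/check-in semantics: a resource leaves the pool while in use.
    static class ReusablePool<R> implements Pool<R> {
        private final Deque<R> free = new ArrayDeque<>();
        private final Supplier<R> factory;
        private final int maxSize;
        ReusablePool(Supplier<R> factory, int maxSize) { this.factory = factory; this.maxSize = maxSize; }
        public synchronized R get() { return free.isEmpty() ? factory.get() : free.pop(); }
        public synchronized void put(R resource) {
            if (free.size() < maxSize) free.push(resource);  // cap the pool size
        }
    }

    // Round-robins over a fixed set of shared resources on every lookup.
    static class RoundRobinPool<R> implements Pool<R> {
        private final List<R> resources = new ArrayList<>();
        private int next = 0;
        RoundRobinPool(Supplier<R> factory, int size) {
            for (int i = 0; i < size; i++) resources.add(factory.get());
        }
        public synchronized R get() {
            R r = resources.get(next);
            next = (next + 1) % resources.size();
            return r;
        }
        public void put(R resource) { /* resources are shared, never removed */ }
    }
}
```

A key-to-pool map layered on top of these (one pool per table name) then gives the HTablePool behavior described above.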
   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3898) TestSplitTransactionOnCluster broke in TRUNK

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037178#comment-13037178
 ] 

Hudson commented on HBASE-3898:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 TestSplitTransactionOnCluster broke in TRUNK
 

 Key: HBASE-3898
 URL: https://issues.apache.org/jira/browse/HBASE-3898
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Attachments: 3898.txt


 It hangs for 15 minutes.  I see a NPE trying to split a region.  The splitKey 
 passed is null.  Looks to be by-product of recent compaction refactorings.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3888) book.xml - filled in architecture 'daemon' section

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037176#comment-13037176
 ] 

Hudson commented on HBASE-3888:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 book.xml - filled in architecture 'daemon' section 
 ---

 Key: HBASE-3888
 URL: https://issues.apache.org/jira/browse/HBASE-3888
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Fix For: 0.92.0

 Attachments: book_HBASE_3888.xml.patch


 The 'daemon' section in architecture has been empty for a while.  
 Filled in an overview of what HMaster and HRegionServer do, with a brief 
 overview of what their functional interfaces look like, along with a short 
 description of their background processes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3874) ServerShutdownHandler fails on NPE if a plan has a random region assignment

2011-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037173#comment-13037173
 ] 

Hudson commented on HBASE-3874:
---

Integrated in HBase-TRUNK #1930 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1930/])


 ServerShutdownHandler fails on NPE if a plan has a random region assignment
 ---

 Key: HBASE-3874
 URL: https://issues.apache.org/jira/browse/HBASE-3874
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.4

 Attachments: HBASE-3874-trunk.patch, HBASE-3874.patch


 By chance, we were able to revert the ulimit on one of our clusters to 1024 
 and it started dying non-stop on "Too many open files". Now the bad thing is 
 that some region servers weren't completely ServerShutdownHandler'd because 
 they failed on:
 {quote}
 2011-05-07 00:04:46,203 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
 Caught throwable while processing event M_SERVER_SHUTDOWN
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:1804)
   at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:101)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {quote}
 Reading the code, it seems the NPE is in the if statement:
 {code}
 Map.Entry<String, RegionPlan> e = i.next();
 if (e.getValue().getDestination().equals(hsi)) {
   // Use iterator's remove else we'll get CME
   i.remove();
 }
 {code}
 Which means that the destination (HSI) is null. Looking through the code, it 
 seems we instantiate a RegionPlan with a null HSI when it's a random 
 assignment. 
 It means that if there's a random assignment going on while a node dies then 
 this issue might happen.
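One possible null-safe form of that removal loop, sketched with minimal stand-in types (the actual fix in the attached patch may differ):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class NullSafePlanRemoval {
    // Minimal stand-ins for the HBase types, for illustration only.
    static class ServerInfo { final String name; ServerInfo(String n) { name = n; } }
    static class RegionPlan {
        private final ServerInfo destination;  // null for a random assignment
        RegionPlan(ServerInfo destination) { this.destination = destination; }
        ServerInfo getDestination() { return destination; }
    }

    // Remove every plan whose destination is the dead server; plans with a
    // null destination (random assignment) are skipped, avoiding the NPE.
    static void removePlansFor(Map<String, RegionPlan> plans, ServerInfo hsi) {
        for (Iterator<Map.Entry<String, RegionPlan>> i = plans.entrySet().iterator(); i.hasNext();) {
            Map.Entry<String, RegionPlan> e = i.next();
            ServerInfo dest = e.getValue().getDestination();
            if (dest != null && dest.equals(hsi)) {
                i.remove();  // use the iterator's remove to avoid a ConcurrentModificationException
            }
        }
    }

    public static void main(String[] args) {
        ServerInfo dead = new ServerInfo("rs1");
        Map<String, RegionPlan> plans = new HashMap<>();
        plans.put("regionA", new RegionPlan(dead));
        plans.put("regionB", new RegionPlan(null));  // random assignment: would NPE in the original code
        removePlansFor(plans, dead);
        System.out.println(plans.keySet());  // [regionB]
    }
}
```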
 Initially I thought that this could mean data loss, but the logs are already 
 split so it's just the reassignment that doesn't happen (still bad).
 Also it left the master with dead server being processed, so for two days the 
 balancer didn't run failing on:
 bq. org.apache.hadoop.hbase.master.HMaster: Not running balancer because 
 processing dead regionserver(s): []
 And the reason why the array is empty is because we are running 0.90.3 which 
 removes the RS from the dead list if it comes back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

2011-05-20 Thread Joey Echeverria (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joey Echeverria updated HBASE-1316:
---

Attachment: HBASE-1316-1.patch
zookeeper-native-Linux-amd64-64.tgz
zookeeper-native-headers.tgz

I've got a partial patch ready. The build relies on native-maven-plugin to 
build the native code. This plugin pulls native dependencies as maven 
artifacts. To make this work, I packaged up the zookeeper header files and the 
static library compiled for x86-64 Linux.

In order to test the patch you need to install the artifacts into your local 
maven repository. I've included a simple install.sh to do this for you. We'll 
need to upload these artifacts somewhere, along with other supported 
OSes/architectures in the future.

I did attempt to make both the build and runtime code work if you're not on a 
supported platform, but I haven't extensively tested it.

At this point, the patch just adds support for interacting with zookeeper via 
the native code. The interaction is very limited, currently only creating 
ephemeral nodes is supported. One thing I did do was add a callback for the 
native code to notify Java when its session gets expired.

Right now, I'm generating my own session expiration event to send to the Java 
zookeeper connection. I think this will allow the region server to shutdown if 
the native session expires. It should look just like an expiration of the Java 
session.

Things that are not yet implemented:

# The region server hasn't been modified to use the native code at all.
# I haven't modified the packaging part of the build. I'm not sure how we'll 
want the build to generate versions of the native library for multiple 
platforms.

Let me know if you think this is on the right track or if anything needs a big 
rethink.

 ZooKeeper: use native threads to avoid GC stalls (JNI integration)
 --

 Key: HBASE-1316
 URL: https://issues.apache.org/jira/browse/HBASE-1316
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.20.0
Reporter: Andrew Purtell
Assignee: Berk D. Demir
 Attachments: HBASE-1316-1.patch, zk_wrapper.tar.gz, 
 zookeeper-native-Linux-amd64-64.tgz, zookeeper-native-headers.tgz


 From Joey Echeverria up on hbase-users@:
 We've used zookeeper in a write-heavy project we've been working on and 
 experienced issues similar to what you described. After several days of 
 debugging, we discovered that our issue was garbage collection. There was no 
 way to guarantee we wouldn't have long pauses especially since our 
 environment was the worst case for garbage collection, millions of tiny, 
 short lived objects. I suspect HBase sees similar work loads frequently, if 
 it's not constantly. With anything shorter than a 30 second session time out, 
 we got session expiration events extremely frequently. We needed to use 60 
 seconds for any real confidence that an ephemeral node disappearing meant 
 something was unavailable.
 We really wanted quick recovery so we ended up writing a light-weight wrapper 
 around the C API and used swig to auto-generate a JNI interface. It's not 
 perfect, but since we switched to this method we've never seen a session 
 expiration event and ephemeral nodes only disappear when there are network 
 issues or a machine/process goes down.
 I don't know if it's worth doing the same kind of thing for HBase as it adds 
 some unnecessary native code, but it's a solution that I found works.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037218#comment-13037218
 ] 

Ted Yu commented on HBASE-3904:
---

From Vidhyashankar's test:
{code}
Table test-v6 not yet available... Sleeping for 5 more minutes... Expected 
#regions = 17933
Table is probably available!! : test-v6 Available? true
Table test-v6may not be available... Double checking: Sleeping for 5 minutes 
more...
Table test-v6: Expected # Regions = 17933 Actual number = 4744
{code}
We can see that after conn.isTableAvailable() returned true, there were still 
at least 13189 regions that were not assigned - not reaching .META.

I think we should implement createTableSync() as I proposed earlier.
We can ask users to call table.getRegionsInfo(), but that is not convenient, 
and getRegionsInfo() is marked deprecated.
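A createTableSync() could be approximated by polling the observed region count until it reaches the expected count or a timeout expires. A sketch under that assumption (the regionCount supplier stands in for counting the table's rows in .META.):

```java
import java.util.function.IntSupplier;

public class CreateTableSyncSketch {
    // Poll until the observed region count reaches the expected count, or time out.
    static boolean waitForAllRegions(IntSupplier regionCount, int expected,
                                     long timeoutMillis, long pollMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (regionCount.getAsInt() >= expected) {
                return true;  // every region has reached .META.
            }
            Thread.sleep(pollMillis);
        }
        return false;  // timed out with regions still unassigned
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate assignment progress: the count climbs toward the expected total.
        int[] assigned = {0};
        IntSupplier counter = () -> (assigned[0] += 5);
        System.out.println(waitForAllRegions(counter, 17, 1000, 10));  // true
    }
}
```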

 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor
 Attachments: 3904.txt


 This function, as per the javadoc, is supposed to return true iff all the 
 regions in the table are available. But if the table is still being created, 
 this function may return inconsistent results (for example, when a table with 
 a large number of split keys is created).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3909) Add dynamic config

2011-05-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037255#comment-13037255
 ] 

Andrew Purtell commented on HBASE-3909:
---

Given that Hadoop does not require ZooKeeper but we do anyway, I wonder if it 
makes more sense to go our own route and host all of the configuration in the 
ZooKeeper namespace. It would then be possible to make one edit (committed 
into ZK) and have watches on all processes automatically pull it.

The access controller on HBASE-3025 uses this approach for ACLs. Upon cold boot 
they are loaded from META into znodes. Then all processes open watches on the 
znode(s). Upon update, the znode is updated, firing the watchers, propagating 
the change cluster wide.

For supporting dynamic configuration, the first process up could populate 
znode(s) from Configuration; otherwise if the znodes exist configuration would 
be read from there. Whenever the znode(s) are updated, the changes can be 
applied to running state by the watcher.
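The propagation scheme described above can be modeled in a few lines; this sketch uses a stub in place of ZooKeeper and ignores that real ZK watches are one-shot and must be re-registered (class and path names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DynamicConfigSketch {
    interface Watcher { void nodeDataChanged(String path); }

    // Stand-in for a config znode; every update fires all registered watchers.
    static class ConfigNode {
        private final Map<String, String> data = new HashMap<>();
        private final List<Watcher> watchers = new ArrayList<>();
        void watch(Watcher w) { watchers.add(w); }
        void set(String key, String value) {
            data.put(key, value);
            for (Watcher w : watchers) w.nodeDataChanged("/hbase/conf/" + key);
        }
        String get(String key) { return data.get(key); }
    }

    // A process keeps a local cache and refreshes it when its watch fires.
    static class ConfProcess implements Watcher {
        private final ConfigNode node;
        final Map<String, String> cache = new HashMap<>();
        ConfProcess(ConfigNode node) { this.node = node; node.watch(this); }
        public void nodeDataChanged(String path) {
            String key = path.substring(path.lastIndexOf('/') + 1);
            cache.put(key, node.get(key));  // pull the updated value
        }
    }

    public static void main(String[] args) {
        ConfigNode conf = new ConfigNode();
        ConfProcess master = new ConfProcess(conf);
        ConfProcess regionServer = new ConfProcess(conf);
        conf.set("hbase.balancer.period", "300000");  // one edit ...
        // ... and every watching process sees the new value:
        System.out.println(master.cache.get("hbase.balancer.period"));        // 300000
        System.out.println(regionServer.cache.get("hbase.balancer.period"));  // 300000
    }
}
```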

How/if the updated configuration should be written back to the config xml files 
on local disk may be a subject of debate.

 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack

 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but there's no harm in this having its own 
 issue. Ted started a conversation on this topic up on dev, and Todd suggested 
 we look at how Hadoop did it over in HADOOP-7001

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances(far greater than the regions),it has risk of OOME.

2011-05-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037257#comment-13037257
 ] 

Andrew Purtell commented on HBASE-3906:
---

How many of those 3G of objects on the heap are live?

 When HMaster is running,there are a lot of RegionLoad instances(far greater 
 than the regions),it has risk of OOME.
 --

 Key: HBASE-3906
 URL: https://issues.apache.org/jira/browse/HBASE-3906
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.2, 0.90.3
 Environment: 1 hmaster,4 regionserver,about 100,000 regions.
Reporter: jian zhang
 Fix For: 0.90.4

 Attachments: HBASE-3906.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 1. Start the HBase cluster.
 2. After the HMaster finishes region assignment, use jmap to dump the memory 
 of the HMaster.
 3. Use MAT to analyze the dump file; there are too many RegionLoad instances, 
 and these instances occupy more than 3G of memory.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-2077) NullPointerException with an open scanner that expired causing an immediate region server shutdown

2011-05-20 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2077:
-

Attachment: 2077-v4.txt

Ahemm.. this is a version that actually works (TestFromClientSide is a good 
test for this change).

 NullPointerException with an open scanner that expired causing an immediate 
 region server shutdown
 --

 Key: HBASE-2077
 URL: https://issues.apache.org/jira/browse/HBASE-2077
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.2, 0.20.3
 Environment: Hadoop 0.20.0, Mac OS X, Java 6
Reporter: Sam Pullara
Assignee: Sam Pullara
Priority: Critical
 Fix For: 0.92.0

 Attachments: 2077-suggestion.txt, 2077-v4.txt, HBASE-2077-3.patch, 
 HBASE-2077-redux.patch, 
 [Bug_HBASE-2077]_Fixes_a_very_rare_race_condition_between_lease_expiration_and_renewal.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 2009-12-29 18:05:55,432 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
 -4250070597157694417 lease expired
 2009-12-29 18:05:55,443 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
   at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
   at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
   at java.util.PriorityQueue.poll(PriorityQueue.java:523)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944)
   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 2009-12-29 18:05:55,446 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 7 on 55260, call next(-4250070597157694417, 1) from 
 192.168.1.90:54011: error: java.io.IOException: java.lang.NullPointerException
 java.io.IOException: java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:869)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:859)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1965)
   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
   at 
 java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
   at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
   at java.util.PriorityQueue.poll(PriorityQueue.java:523)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944)
   ... 5 more
 2009-12-29 18:05:55,447 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 Responder, call 

[jira] [Commented] (HBASE-3909) Add dynamic config

2011-05-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037265#comment-13037265
 ] 

Todd Lipcon commented on HBASE-3909:


I'm always skeptical of the suggestion to store configuration in ZooKeeper. 
Here's my reasoning:

- we already require at least one piece of configuration in the client itself 
in order to connect to ZooKeeper (ie the ZK quorum info and session timeouts, 
etc)
- operations teams are very good at managing text-based configuration files 
with tools like puppet, cfengine, etc. It's also easy to version-control these 
kinds of configs, add <!-- comments -->, etc. Moving to ZK makes these tasks 
more difficult -- we'd need lots of tooling, etc.
- If we keep both the text-based and ZK-based, it's easy to accidentally change 
something in ZK but forget to update in text, so it would revert on next 
restart.
- we currently have the somewhat nice property that nothing in ZK is critical 
- even if the ZK cluster is completely wiped out, we don't lose any info. This 
would be a change.

 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack

 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but there's no harm in this having its own 
 issue. Ted started a conversation on this topic up on dev, and Todd suggested 
 we look at how Hadoop did it over in HADOOP-7001

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3909) Add dynamic config

2011-05-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037266#comment-13037266
 ] 

Ted Yu commented on HBASE-3909:
---

+1 on Todd's comment.

 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack

 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but there's no harm in this having its own 
 issue. Ted started a conversation on this topic up on dev, and Todd suggested 
 we look at how Hadoop did it over in HADOOP-7001

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira