[jira] [Created] (HBASE-3907) make it easier to add per-CF metrics; add some key per-CF metrics to start with
make it easier to add per-CF metrics; add some key per-CF metrics to start with --- Key: HBASE-3907 URL: https://issues.apache.org/jira/browse/HBASE-3907 Project: HBase Issue Type: Improvement Reporter: Kannan Muthukkaruppan Assignee: Kannan Muthukkaruppan Add the plumbing needed to add various types of per-ColumnFamily metrics, and to start with add a number of per-CF metrics such as: 1) Blocks read, cache hits, and average read time for a column family. 2) Similar stats for compaction-related reads. 3) Stats for meta block reads per CF. 4) Bloom filter stats per CF, etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3907) make it easier to add per-CF metrics; add some key per-CF metrics to start with
[ https://issues.apache.org/jira/browse/HBASE-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kannan Muthukkaruppan updated HBASE-3907: - Description: Add plumbing needed to add various types of per ColumnFamily metrics. And to start with add a bunch per-CF metrics such as: 1) Blocks read, cache hit, avg time of read for a column family. 2) Similar stats for compaction related reads. 3) Stats for meta block reads per CF 4) Bloom Filter stats per CF etc. was: Add plumbing need to add various types of per ColumnFamily metrics. And to start with add a bunch per-CF metrics such as: 1) Blocks read, cache hit, avg time of read for a column family. 2) Similar stats for compaction related reads. 3) Stats for meta block reads per CF 4) Bloom Filter stats per CF etc. make it easier to add per-CF metrics; add some key per-CF metrics to start with --- Key: HBASE-3907 URL: https://issues.apache.org/jira/browse/HBASE-3907 Project: HBase Issue Type: Improvement Reporter: Kannan Muthukkaruppan Assignee: Kannan Muthukkaruppan Add plumbing needed to add various types of per ColumnFamily metrics. And to start with add a bunch per-CF metrics such as: 1) Blocks read, cache hit, avg time of read for a column family. 2) Similar stats for compaction related reads. 3) Stats for meta block reads per CF 4) Bloom Filter stats per CF etc.
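The kind of plumbing the ticket asks for can be sketched as a concurrent counter map keyed by column family and metric name. All class and method names below are hypothetical illustrations, not the actual HBase metrics API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of per-column-family metric plumbing: one counter map
// keyed by "columnFamily.metricName". Read paths, compaction paths, etc.
// would call increment() with the relevant CF and metric name.
public class PerCFMetrics {
    private static final Map<String, AtomicLong> COUNTERS = new ConcurrentHashMap<>();

    private static String key(String cf, String metric) {
        return cf + "." + metric;
    }

    // Bump a per-CF counter, creating it lazily on first use.
    public static void increment(String cf, String metric, long delta) {
        COUNTERS.computeIfAbsent(key(cf, metric), k -> new AtomicLong())
                .addAndGet(delta);
    }

    // Read back a counter; absent counters report zero.
    public static long get(String cf, String metric) {
        AtomicLong v = COUNTERS.get(key(cf, metric));
        return v == null ? 0L : v.get();
    }
}
```

A read path would then do something like `PerCFMetrics.increment("info", "blocksRead", 1)` per block read, and a metrics reporter would periodically walk the map.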
[jira] [Updated] (HBASE-3906) When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions), risking OOME.
[ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian zhang updated HBASE-3906: -- Affects Version/s: 0.90.3 When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions), risking OOME. -- Key: HBASE-3906 URL: https://issues.apache.org/jira/browse/HBASE-3906 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.2, 0.90.3 Environment: 1 hmaster, 4 regionservers, about 100,000 regions. Reporter: jian zhang Fix For: 0.90.4 Original Estimate: 168h Remaining Estimate: 168h 1. Start the hbase cluster; 2. After hmaster finishes region assignment, use jmap to dump the memory of hmaster; 3. Use MAT to analyse the dump file: there are too many RegionLoad instances, and these instances occupy more than 3G of memory.
[jira] [Updated] (HBASE-3906) When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions), risking OOME.
[ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian zhang updated HBASE-3906: -- Attachment: HBASE-3906.patch When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions), risking OOME. -- Key: HBASE-3906 URL: https://issues.apache.org/jira/browse/HBASE-3906 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.2, 0.90.3 Environment: 1 hmaster, 4 regionservers, about 100,000 regions. Reporter: jian zhang Fix For: 0.90.4 Attachments: HBASE-3906.patch Original Estimate: 168h Remaining Estimate: 168h 1. Start the hbase cluster; 2. After hmaster finishes region assignment, use jmap to dump the memory of hmaster; 3. Use MAT to analyse the dump file: there are too many RegionLoad instances, and these instances occupy more than 3G of memory.
[jira] [Updated] (HBASE-3892) Table can't be disabled
[ https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-3892: -- Attachment: AssignmentManager_90.patch Table can't be disabled --- Key: HBASE-3892 URL: https://issues.apache.org/jira/browse/HBASE-3892 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: gaojinchao Fix For: 0.90.4 Attachments: AssignmentManager_90.patch, Hmaster_0.90.patch In TimeoutMonitor: if the node exists and its state is RS_ZK_REGION_CLOSED, we should send the ZK message again when the region close times out; in this case some messages may be lost. It looks like a bug. This is my analysis. // disable table: the master sent a CLOSE message to the region server, and the region state was set to PENDING_CLOSE 2011-05-08 17:44:25,745 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175) for region ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66. 2011-05-08 17:44:45,530 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters; ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66. from C4C4.site,60020,1304820199467 // received the split message and cleared the region state (PENDING_CLOSE) 2011-05-08 17:46:45,303 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175) (the same REGION_SPLIT message was then received about once a minute from 17:45 through 17:52; the repeated log entries are elided here)
[jira] [Updated] (HBASE-3892) Table can't be disabled
[ https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-3892: -- Attachment: (was: Hmaster_0.90.patch) Table can't be disabled --- Key: HBASE-3892 URL: https://issues.apache.org/jira/browse/HBASE-3892 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: gaojinchao Fix For: 0.90.4 Attachments: AssignmentManager_90.patch In TimeoutMonitor: if the node exists and its state is RS_ZK_REGION_CLOSED, we should send the ZK message again when the region close times out; in this case some messages may be lost. It looks like a bug. This is my analysis. // disable table: the master sent a CLOSE message to the region server, and the region state was set to PENDING_CLOSE 2011-05-08 17:44:25,745 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175) for region ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66. 2011-05-08 17:44:45,530 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters; ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66. from C4C4.site,60020,1304820199467 // received the split message and cleared the region state (PENDING_CLOSE) 2011-05-08 17:46:45,303 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175) (the same REGION_SPLIT message was then received about once a minute from 17:45 through 17:52; the repeated log entries are elided here)
[jira] [Commented] (HBASE-3892) Table can't be disabled
[ https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036724#comment-13036724 ] gaojinchao commented on HBASE-3892: --- I am not familiar with the ZK API and have been learning it. I have made a patch again. I want to use the API setData(ZooKeeperWatcher zkw, String znode, byte [] data). It seems dangerous for parallel operation. I want to verify more carefully next week. Table can't be disabled --- Key: HBASE-3892 URL: https://issues.apache.org/jira/browse/HBASE-3892 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: gaojinchao Fix For: 0.90.4 Attachments: AssignmentManager_90.patch In TimeoutMonitor: if the node exists and its state is RS_ZK_REGION_CLOSED, we should send the ZK message again when the region close times out; in this case some messages may be lost. It looks like a bug. This is my analysis. // disable table: the master sent a CLOSE message to the region server, and the region state was set to PENDING_CLOSE 2011-05-08 17:44:25,745 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175) for region ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66. 2011-05-08 17:44:45,530 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.: Daughters; ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62., ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66. from C4C4.site,60020,1304820199467 // received the split message and cleared the region state (PENDING_CLOSE) 2011-05-08 17:46:45,303 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175) (the same REGION_SPLIT message was then received about once a minute from 17:45 through 17:52; the repeated log entries are elided here)
[jira] [Commented] (HBASE-3903) A successful write to client write-buffer may be lost or not visible
[ https://issues.apache.org/jira/browse/HBASE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036749#comment-13036749 ] Tallat commented on HBASE-3903: --- +1 on the patch, but I would suggest a couple of other things: 1) We can mention the same thing in section 10.1.2, "WriteBuffer and Batch Methods", for clarity, in the client architecture section (book.html#client). 2) IMHO, the documentation at http://hbase.apache.org/acid-semantics.html has some weak points that need clarification, for example: (a) Visibility: "When a client receives a success response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels." Here, what exactly is a side channel? (b) Durability: "All reasonable failure scenarios will not affect any of the guarantees of this document." Here, what is a reasonable failure scenario? Thanks. A successful write to client write-buffer may be lost or not visible Key: HBASE-3903 URL: https://issues.apache.org/jira/browse/HBASE-3903 Project: HBase Issue Type: Bug Components: documentation Environment: Any. Reporter: Tallat Assignee: Doug Meil Priority: Minor Labels: documentation Attachments: acid-semantics_HBASE_3903.xml.patch A client can write to a client-side 'write buffer' if enabled via hTable.setAutoFlush(false). Now, assume a client puts value v under key k. Two wrong things can happen, violating the ACID semantics of HBase given at http://hbase.apache.org/acid-semantics.html: 1) Say the client fails immediately after the put succeeds. In this case, the put will be lost, violating the durability property: "Any operation that returns a success code (eg does not throw an exception) will be made durable." 2) Say the client issues a read for k immediately after writing k. The put will be stored in the client-side write buffer, while the read will go to the region server, returning an older value instead of v, violating the visibility property: "When a client receives a success response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels." Thanks, Tallat
[jira] [Commented] (HBASE-3903) A successful write to client write-buffer may be lost or not visible
[ https://issues.apache.org/jira/browse/HBASE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036826#comment-13036826 ] Doug Meil commented on HBASE-3903: -- I'll add a reference to acid-semantics in the client write-buffer section. I think the other questions should be split off into a different ticket. A successful write to client write-buffer may be lost or not visible Key: HBASE-3903 URL: https://issues.apache.org/jira/browse/HBASE-3903 Project: HBase Issue Type: Bug Components: documentation Environment: Any. Reporter: Tallat Assignee: Doug Meil Priority: Minor Labels: documentation Attachments: acid-semantics_HBASE_3903.xml.patch A client can write to a client-side 'write buffer' if enabled via hTable.setAutoFlush(false). Now, assume a client puts value v under key k. Two wrong things can happen, violating the ACID semantics of HBase given at http://hbase.apache.org/acid-semantics.html: 1) Say the client fails immediately after the put succeeds. In this case, the put will be lost, violating the durability property: "Any operation that returns a success code (eg does not throw an exception) will be made durable." 2) Say the client issues a read for k immediately after writing k. The put will be stored in the client-side write buffer, while the read will go to the region server, returning an older value instead of v, violating the visibility property: "When a client receives a success response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels." Thanks, Tallat
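The read-after-write anomaly Tallat describes can be modeled with a small toy class. This is a sketch of the failure mode only, not the HBase client API: with auto-flush off, puts sit in a client-side buffer while reads go straight to the server, so a read can miss the client's own recent write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a client with a write buffer in front of a "server" map.
public class WriteBufferModel {
    private final Map<String, String> server = new HashMap<>();   // simulated region server state
    private final List<String[]> writeBuffer = new ArrayList<>(); // unflushed puts

    // put() only appends to the client-side buffer (autoFlush == false).
    public void put(String key, String value) {
        writeBuffer.add(new String[] { key, value });
    }

    // get() bypasses the client buffer and reads server state directly,
    // mirroring how reads are served by the region server.
    public String get(String key) {
        return server.get(key);
    }

    // flush() pushes buffered puts to the server, as flushCommits() would.
    public void flush() {
        for (String[] kv : writeBuffer) server.put(kv[0], kv[1]);
        writeBuffer.clear();
    }
}
```

In this model a `get("k")` issued right after `put("k", "v")` returns the stale server value until `flush()` runs, which is exactly the visibility gap the ticket documents.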
[jira] [Created] (HBASE-3908) TableSplit not implementing hashCode problem
TableSplit not implementing hashCode problem -- Key: HBASE-3908 URL: https://issues.apache.org/jira/browse/HBASE-3908 Project: HBase Issue Type: Bug Components: mapred, mapreduce Affects Versions: 0.90.1 Reporter: Daniel Iancu Reported by Lucian Iordache on the hbase-user mailing list. Will attach the patch ASAP. --- Hi guys, I've just found a problem with the class TableSplit. It implements equals but does not also implement hashCode, as it should. I discovered it by trying to use a HashSet of TableSplits, and noticed that some duplicate splits were added to the set. The only option I have for now is to extend TableSplit and use the subclass. I use the Cloudera HBase cdh3u0 version. Do you know about this problem? Should I open a Jira issue for it, or does one already exist? Thanks, Lucian
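Lucian's report can be reproduced with any class that overrides equals but not hashCode. A minimal, self-contained illustration of the bug class (not the actual TableSplit code):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// equals is overridden but hashCode is not, so two equal instances fall back
// to identity hash codes, land in different HashSet buckets, and the set ends
// up holding "duplicates" that compare equal.
public class SplitLike {
    private final String startRow;

    public SplitLike(String startRow) { this.startRow = startRow; }

    @Override
    public boolean equals(Object o) {
        return o instanceof SplitLike
                && Objects.equals(startRow, ((SplitLike) o).startRow);
    }
    // hashCode deliberately NOT overridden -- the bug under discussion.

    // Count distinct elements as a HashSet sees them.
    public static int distinctCount(String... rows) {
        Set<SplitLike> set = new HashSet<>();
        for (String r : rows) set.add(new SplitLike(r));
        return set.size();
    }
}
```

With the bug present, `distinctCount("a", "a")` reports 2; adding a `hashCode()` derived from `startRow` (per the `Object.hashCode` contract) makes it report 1.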
[jira] [Updated] (HBASE-3883) book.xml / added something in schema design and FAQ about not being able to change rowkeys
[ https://issues.apache.org/jira/browse/HBASE-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3883: - Resolution: Fixed Fix Version/s: 0.92.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to TRUNK. Thanks for the patch Doug. book.xml / added something in schema design and FAQ about not being able to change rowkeys -- Key: HBASE-3883 URL: https://issues.apache.org/jira/browse/HBASE-3883 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Fix For: 0.92.0 Attachments: book_HBASE_3883.xml.patch This question has come up enough times on the dist-list to warrant inclusion in the book. Added a small entry in schema design and in the FAQ (referencing schema design).
[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.
[ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036948#comment-13036948 ] stack commented on HBASE-3904: -- @Ted What issue are you trying to fix? Thanks. HConnection.isTableAvailable returns true even with not all regions available. -- Key: HBASE-3904 URL: https://issues.apache.org/jira/browse/HBASE-3904 Project: HBase Issue Type: Bug Components: client Reporter: Vidhyashankar Venkataraman Priority: Minor According to its javadoc, this function is supposed to return true iff all the regions in the table are available. But if the table is still being created, it may return inconsistent results (for example, when a table with a large number of split keys is created).
[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.
[ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036947#comment-13036947 ] stack commented on HBASE-3904: -- bq. From what I read in the isTableAvailable function, the Metascanvisitor ensures that if there is at least one region not assigned, then the function will return false. That, and at least one region must be assigned (where 'assigned' is a non-null server column, which is far from a definitive test of assignedness). bq. This isn't enough since the createTable function in master assigns one region after another. (Refer to HMaster.createTable(final HRegionInfo [] newRegions, boolean sync)) Yes, it adds regions one at a time to .META., but then uses the bulk assign engine (this was a recent addition by Ted -- do you have this?) bq. Hence there might be a case when all regions are indeed fully assigned in META but it is just that the master is yet to populate META with the rest of the regions. Is this so? We add the regions to .META. before we assign. On add to .META. they will have an empty server field, so isTableAssigned should be returning false. I wonder if this check inside HBaseAdmin#isTableAssigned is 'off': {code} if (value == null) { available.set(false); return false; } {code} Maybe the value is 'empty', a zero-length byte array. We should check for that? Perhaps this is why you got inconsistent responses from isTableAvailable. bq. Therefor for isTableAvailable to work correctly with createTable(splitkeys), the master will have to populate all the regions in meta first before assigning them. Unless I'm reading it wrong, this is what it *is* doing. Something else is up (maybe the above check?). HConnection.isTableAvailable returns true even with not all regions available. -- Key: HBASE-3904 URL: https://issues.apache.org/jira/browse/HBASE-3904 Project: HBase Issue Type: Bug Components: client Reporter: Vidhyashankar Venkataraman Priority: Minor According to its javadoc, this function is supposed to return true iff all the regions in the table are available. But if the table is still being created, it may return inconsistent results (for example, when a table with a large number of split keys is created).
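The null-versus-empty distinction stack raises above can be made concrete: if the .META. server column comes back as a zero-length byte array rather than null, a null-only check treats the region as "assigned". The helper names below are hypothetical illustrations, not the HBaseAdmin code:

```java
// Contrast the check quoted in the comment with a version that also
// treats an empty server column as "not assigned".
public class ServerColumnCheck {
    // The suspected-off check: only null means "not assigned".
    public static boolean looksAssigned(byte[] value) {
        return value != null;
    }

    // The suggested fix: an empty value also means "not assigned".
    public static boolean looksAssignedFixed(byte[] value) {
        return value != null && value.length > 0;
    }
}
```

Under this reading, a freshly added region row with an empty server field would make `looksAssigned` report true, which would explain the inconsistent `isTableAvailable` results described in the ticket.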
[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.
[ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036959#comment-13036959 ] Ted Yu commented on HBASE-3904: --- My proposal is based on the observation that Vidhyashankar (and other users) used a loop to check for table availability. This is equivalent to calling the newly introduced createTableSync() method where there is no need to write the loop above. bq. Hence there might be a case when all regions are indeed fully assigned in META but it is just that the master is yet to populate META with the rest of the regions. What Vidhyashankar meant was that the existing entries for the table in .META. carried server information, but there were more regions to be assigned by Master which weren't in .META. yet. HConnection.isTableAvailable returns true even with not all regions available. -- Key: HBASE-3904 URL: https://issues.apache.org/jira/browse/HBASE-3904 Project: HBase Issue Type: Bug Components: client Reporter: Vidhyashankar Venkataraman Priority: Minor This function as per the java doc is supposed to return true iff all the regions in the table are available. But if the table is still being created this function may return inconsistent results (For example, when a table with a large number of split keys is created).
[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.
[ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036968#comment-13036968 ] Ted Yu commented on HBASE-3904: --- Looking at MetaEditor.addRegionToMeta() which is called by HMaster.createTable(): {code} public static void addRegionToMeta(CatalogTracker catalogTracker, HRegionInfo regionInfo) throws IOException { Put put = new Put(regionInfo.getRegionName()); put.add(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER, Writables.getBytes(regionInfo)); {code} Server info was initially omitted. HConnection.isTableAvailable returns true even with not all regions available. -- Key: HBASE-3904 URL: https://issues.apache.org/jira/browse/HBASE-3904 Project: HBase Issue Type: Bug Components: client Reporter: Vidhyashankar Venkataraman Priority: Minor This function as per the java doc is supposed to return true iff all the regions in the table are available. But if the table is still being created this function may return inconsistent results (For example, when a table with a large number of split keys is created).
[jira] [Commented] (HBASE-2938) Add Thread-Local Behavior To HTable Pool
[ https://issues.apache.org/jira/browse/HBASE-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036999#comment-13036999 ] stack commented on HBASE-2938: -- @Karthick Does TestMasterObserver fail for you? It fails w/ your patch in place. Can you take a look? Otherwise all tests pass (except currently distributed splitting, but that's not your patch). Add Thread-Local Behavior To HTable Pool Key: HBASE-2938 URL: https://issues.apache.org/jira/browse/HBASE-2938 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.89.20100621 Reporter: Karthick Sankarachary Attachments: HBASE-2938-V2.patch, HBASE-2938.patch It is a well-documented fact that the HBase table client (viz., HTable) is not thread-safe. Hence, the recommendation has been to use an HTablePool or a ThreadLocal to manage access to tables. The downside of the latter is that it (a) requires the user to reinvent the wheel in terms of mapping table names to tables and (b) forces the user to maintain the thread-local objects. Ideally, it would be nice if we could make the HTablePool handle thread-local objects as well. That way, it not only becomes the one-stop shop for all client-side tables, but also insulates the user from the ThreadLocal object. Here, we propose a way to generalize the HTablePool so that the underlying pool type is either reusable or thread-local. To make this possible, we introduce the concept of a SharedMap, which essentially maps a key to a collection of values, the elements of which are managed by a pool. In effect, that collection acts as a shared pool of resources, access to which is closely controlled as dictated by the particular semantics of the pool. Furthermore, to simplify the construction of HTablePools, we added a couple of parameters (viz. hbase.client.htable.pool.type and hbase.client.hbase.pool.size) to control the default behavior of an HTablePool.
In case the size of the pool is set to a positive number, that value is used to cap the number of resources that a pool may contain for any given key. A size of Integer#MAX_VALUE is interpreted to mean an unbounded pool. Currently, the SharedMap supports the following types of pools: * A ThreadLocalPool, which represents a pool that builds on the ThreadLocal class. It essentially binds the resource to the thread from which it is accessed. * A ReusablePool, which represents a pool that builds on the LinkedList class. It essentially allows resources to be checked out, at which point they are (temporarily) removed from the pool. When a resource is no longer required, it should be returned to the pool in order to be reused. * A RoundRobinPool, which represents a pool that stores its resources in an ArrayList. It load-balances access to its resources by returning a different resource every time a given key is looked up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
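The checked-out vs. thread-local semantics described above can be sketched as follows. This is a hypothetical, simplified model for illustration only; the names (Pool, ThreadLocalPool, ReusablePool, PoolSketch) are stand-ins, not the actual SharedMap/PoolMap code from the HBASE-2938 patch:

```java
import java.util.LinkedList;
import java.util.Queue;

// Hypothetical, simplified sketch of the pool semantics described above;
// not the actual SharedMap/PoolMap code from the HBASE-2938 patch.
interface Pool<R> {
    R get();        // obtain a resource, or null if none is pooled
    void put(R r);  // return a resource to the pool
}

// Binds one resource to each accessing thread, as a ThreadLocalPool would.
class ThreadLocalPool<R> implements Pool<R> {
    private final ThreadLocal<R> local = new ThreadLocal<R>();
    public R get() { return local.get(); }
    public void put(R r) { local.set(r); }
}

// Check-out/check-in semantics backed by a LinkedList, as a ReusablePool would.
class ReusablePool<R> implements Pool<R> {
    private final Queue<R> resources = new LinkedList<R>();
    private final int maxSize;  // cap on pooled resources for a given key
    ReusablePool(int maxSize) { this.maxSize = maxSize; }
    public synchronized R get() { return resources.poll(); }  // removed while checked out
    public synchronized void put(R r) {
        if (resources.size() < maxSize) resources.add(r);
    }
}

public class PoolSketch {
    public static void main(String[] args) {
        Pool<String> pool = new ReusablePool<String>(2);
        pool.put("table-a");
        String r = pool.get();           // checked out: removed from the pool
        System.out.println(r);           // table-a
        System.out.println(pool.get());  // null -- nothing pooled until r is returned
        pool.put(r);                     // returned for reuse
        System.out.println(pool.get());  // table-a again
    }
}
```

The key contrast: a ReusablePool hands the same resource to whichever caller checks it out next, while a ThreadLocalPool keeps one resource per thread, so no check-in/check-out discipline is needed.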
[jira] [Commented] (HBASE-3906) When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions), posing a risk of OOME.
[ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037000#comment-13037000 ] Ted Yu commented on HBASE-3906: --- The patch wouldn't apply to trunk, where the heartbeat has been removed. When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions), posing a risk of OOME. -- Key: HBASE-3906 URL: https://issues.apache.org/jira/browse/HBASE-3906 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.2, 0.90.3 Environment: 1 HMaster, 4 region servers, about 100,000 regions. Reporter: jian zhang Fix For: 0.90.4 Attachments: HBASE-3906.patch Original Estimate: 168h Remaining Estimate: 168h 1. Start the HBase cluster; 2. After HMaster finishes region assignment, use jmap to dump the memory of HMaster; 3. Use MAT to analyse the dump file; there are too many RegionLoad instances, and these instances occupy more than 3G of memory; -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3906) When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions), posing a risk of OOME.
[ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037011#comment-13037011 ] stack commented on HBASE-3906: -- @Ted I think the patch is for branch only. It has the problem. I don't believe TRUNK does. @Jian This should work though it's ugly; i.e. refreshing an HServerInfo instance (Do we need to keep load in the Map of regions? What about clearing the load from the HSI we add to the Map of regions to HSI? Would that work? Or is this Map used for balancing?). Does your patch work for you? Any issues w/ the new synchronized blocks? When HMaster is running, there are a lot of RegionLoad instances (far more than the number of regions), posing a risk of OOME. -- Key: HBASE-3906 URL: https://issues.apache.org/jira/browse/HBASE-3906 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.2, 0.90.3 Environment: 1 HMaster, 4 region servers, about 100,000 regions. Reporter: jian zhang Fix For: 0.90.4 Attachments: HBASE-3906.patch Original Estimate: 168h Remaining Estimate: 168h 1. Start the HBase cluster; 2. After HMaster finishes region assignment, use jmap to dump the memory of HMaster; 3. Use MAT to analyse the dump file; there are too many RegionLoad instances, and these instances occupy more than 3G of memory; -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2938) Add Thread-Local Behavior To HTable Pool
[ https://issues.apache.org/jira/browse/HBASE-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037022#comment-13037022 ] Karthick Sankarachary commented on HBASE-2938: -- Yes, that test does pass for me (this is after rebasing): {code} Running org.apache.hadoop.hbase.coprocessor.TestMasterObserver Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.47 sec {code} Can you attach your target/surefire-reports/*TestMasterObserver*.txt files? Add Thread-Local Behavior To HTable Pool Key: HBASE-2938 URL: https://issues.apache.org/jira/browse/HBASE-2938 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.89.20100621 Reporter: Karthick Sankarachary Attachments: HBASE-2938-V2.patch, HBASE-2938.patch It is a well-documented fact that the HBase table client (viz., HTable) is not thread-safe. Hence, the recommendation has been to use an HTablePool or a ThreadLocal to manage access to tables. The downside of the latter is that it (a) requires the user to reinvent the wheel in terms of mapping table names to tables and (b) forces the user to maintain the thread-local objects. Ideally, it would be nice if we could make the HTablePool handle thread-local objects as well. That way, it not only becomes the one-stop shop for all client-side tables, but also insulates the user from the ThreadLocal object. Here, we propose a way to generalize the HTablePool so that the underlying pool type is either reusable or thread-local. To make this possible, we introduce the concept of a SharedMap, which essentially maps a key to a collection of values, the elements of which are managed by a pool. In effect, that collection acts as a shared pool of resources, access to which is closely controlled as dictated by the particular semantics of the pool. Furthermore, to simplify the construction of HTablePools, we added a couple of parameters (viz. 
hbase.client.htable.pool.type and hbase.client.hbase.pool.size) to control the default behavior of a HTablePool. In case the size of the pool is set to a positive number, that value is used to cap the number of resources that a pool may contain for any given key. A size of Integer#MAX_VALUE is interpreted to mean an unbounded pool. Currently, the SharedMap supports the following types of pools: * A ThreadLocalPool, which represents a pool that builds on the ThreadLocal class. It essentially binds the resource to the thread from which it is accessed. * A ReusablePool, which represents a pool that builds on the LinkedList class. It essentially allows resources to be checked out, at which point they are (temporarily) removed from the pool. When a resource is no longer required, it should be returned to the pool in order to be reused. * A RoundRobinPool, which represents a pool that stores its resources in an ArrayList. It load-balances access to its resources by returning a different resource every time a given key is looked up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack moved HADOOP-7315 to HBASE-3909: -- Issue Type: Bug (was: Improvement) Key: HBASE-3909 (was: HADOOP-7315) Project: HBase (was: Hadoop Common) Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but there's no harm in this having its own issue. Ted started a conversation on this topic up on dev, and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-3910) acid-semantics.html - clarify some of the concepts
acid-semantics.html - clarify some of the concepts -- Key: HBASE-3910 URL: https://issues.apache.org/jira/browse/HBASE-3910 Project: HBase Issue Type: Bug Components: documentation Environment: Any. Reporter: Doug Meil Assignee: Doug Meil Priority: Minor A client can do a write to a client-side 'write buffer' if enabled via hTable.setAutoFlush(false). Now, assume a client puts value v under key k. Two wrong things can happen, violating the ACID semantics of HBase given at: http://hbase.apache.org/acid-semantics.html 1) Say the client fails immediately after the put succeeds. In this case, the put will be lost, violating the durability property: {quote}Any operation that returns a success code (eg does not throw an exception) will be made durable.{quote} 2) Say the client issues a read for k immediately after writing k. The put will be stored in the client-side write buffer, while the read will go to the region server, returning an older value instead of v, violating the visibility property: {quote}When a client receives a success response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels.{quote} Thanks, Tallat -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3903) A successful write to client write-buffer may be lost or not visible
[ https://issues.apache.org/jira/browse/HBASE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3903: - Attachment: book_HBASE_3903.xml.patch A successful write to client write-buffer may be lost or not visible Key: HBASE-3903 URL: https://issues.apache.org/jira/browse/HBASE-3903 Project: HBase Issue Type: Bug Components: documentation Environment: Any. Reporter: Tallat Assignee: Doug Meil Priority: Minor Labels: documentation Attachments: acid-semantics_HBASE_3903.xml.patch, book_HBASE_3903.xml.patch A client can do a write to a client-side 'write buffer' if enabled via hTable.setAutoFlush(false). Now, assume a client puts value v under key k. Two wrong things can happen, violating the ACID semantics of HBase given at: http://hbase.apache.org/acid-semantics.html 1) Say the client fails immediately after the put succeeds. In this case, the put will be lost, violating the durability property: {quote}Any operation that returns a success code (eg does not throw an exception) will be made durable.{quote} 2) Say the client issues a read for k immediately after writing k. The put will be stored in the client-side write buffer, while the read will go to the region server, returning an older value instead of v, violating the visibility property: {quote}When a client receives a success response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels.{quote} Thanks, Tallat -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
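The stale-read scenario described in this issue can be illustrated with a toy model of the client-side write buffer. This is not HBase code; WriteBufferModel and its fields are hypothetical stand-ins for the HTable buffer and the region server's store:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the client-side write buffer described above -- not actual
// HBase code. It shows why a read issued after a buffered put can return
// a stale value: reads go to the server, buffered puts do not (yet).
public class WriteBufferModel {
    private final Map<String, String> regionServer = new HashMap<>(); // server-side store
    private final List<String[]> writeBuffer = new ArrayList<>();     // client-side buffer
    private boolean autoFlush = true;

    public void setAutoFlush(boolean autoFlush) { this.autoFlush = autoFlush; }

    public void put(String key, String value) {
        writeBuffer.add(new String[] { key, value });
        if (autoFlush) flushCommits();  // with autoFlush the put reaches the server at once
    }

    public void flushCommits() {
        for (String[] kv : writeBuffer) regionServer.put(kv[0], kv[1]);
        writeBuffer.clear();
    }

    // Reads always go to the region server, never to the client-side buffer.
    public String get(String key) { return regionServer.get(key); }

    public static void main(String[] args) {
        WriteBufferModel table = new WriteBufferModel();
        table.setAutoFlush(false);
        table.put("k", "v");
        System.out.println(table.get("k"));  // null -- stale read: the put is still buffered
        table.flushCommits();
        System.out.println(table.get("k"));  // v -- visible only after the flush
    }
}
```

A client crash between put and flushCommits loses the buffered writes in the same way, which is the durability concern raised in point 1.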
[jira] [Updated] (HBASE-3910) acid-semantics.html - clarify some of the concepts
[ https://issues.apache.org/jira/browse/HBASE-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3910: - Description: Inspired by HBASE-3903 regarding the acid-semantics page. What's a side-channel? {quote}When a client receives a success response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels.{quote} Thanks, Tallat was: A client can do a write to a client-side 'write buffer' if enabled via hTable.setAutoFlush(false). Now, assume a client puts value v under key k. Two wrong things can happen, violating the ACID semantics of HBase given at: http://hbase.apache.org/acid-semantics.html 1) Say the client fails immediately after the put succeeds. In this case, the put will be lost, violating the durability property: {quote}Any operation that returns a success code (eg does not throw an exception) will be made durable.{quote} 2) Say the client issues a read for k immediately after writing k. The put will be stored in the client-side write buffer, while the read will go to the region server, returning an older value instead of v, violating the visibility property: {quote}When a client receives a success response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels.{quote} Thanks, Tallat Issue Type: Improvement (was: Bug) acid-semantics.html - clarify some of the concepts -- Key: HBASE-3910 URL: https://issues.apache.org/jira/browse/HBASE-3910 Project: HBase Issue Type: Improvement Components: documentation Environment: Any. Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Labels: documentation Inspired by HBASE-3903 regarding the acid-semantics page. What's a side-channel? 
{quote}When a client receives a success response for any mutation, that mutation is immediately visible to both that client and any client with whom it later communicates through side channels.{quote} Thanks, Tallat -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client
[ https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037079#comment-13037079 ] jirapos...@reviews.apache.org commented on HBASE-2937: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/755/ --- (Updated 2011-05-20 20:49:57.345063) Review request for hbase. Changes --- Retry {{ServerCallable#call}} in the case of non-{{SocketTimeoutException}}s, but only if we spent less time than the operation timeout. Summary --- Thanks to HBASE-3154, users now have the ability to specify a timeout for client-side RPC calls. However, it doesn't go far enough in terms of how low that timeout can go. Set the RPC timeout to too low a value and you run the risk of timing out on calls to the meta tables, which are preconditions to calling the {{HRegionInterface}} proxy. Given that, I believe the motivation at work in HBASE-2937 still holds true. In this patch, I add an operation-level timeout, configurable through hbase.client.operation.timeout, which will override the value specified by hbase.rpc.timeout, if any, within the scope of the {{ServerCallable#call}} method. In other words, the operation-level timeout does not apply to calls to the meta tables. Furthermore, the patch treats an RPC timeout as a non-fatal event, in that it will not cause the {{HBaseClient#Connection}} instance to be closed. Last but not least, users will also have the ability to set the operation timeout on the {{HTable}} on the fly. This addresses bug HBASE-2937. 
https://issues.apache.org/jira/browse/HBASE-2937 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/HConstants.java e9e3694 src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java b26f41e src/main/java/org/apache/hadoop/hbase/client/HTable.java 61e151a src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 6f22123 src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 470e741 src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java dbb57d9 src/main/java/org/apache/hadoop/hbase/util/PoolMap.java 354d49a Diff: https://reviews.apache.org/r/755/diff Testing --- mvn test Thanks, Karthick Facilitate Timeouts In HBase Client --- Key: HBASE-2937 URL: https://issues.apache.org/jira/browse/HBASE-2937 Project: HBase Issue Type: New Feature Components: client Affects Versions: 0.89.20100621 Reporter: Karthick Sankarachary Assignee: Karthick Sankarachary Priority: Critical Fix For: 0.92.0 Attachments: HBASE-2937.patch, HBASE-2937.patch Currently, there is no way to force an operation on the HBase client (viz. HTable) to time out if a certain amount of time has elapsed. In other words, all invocations on the HTable class are veritable blocking calls, which will not return until a response (successful or otherwise) is received. In general, there are two ways to handle timeouts: (a) call the operation in a separate thread, until it returns a response or the wait on the thread times out and (b) have the underlying socket unblock the operation if the read times out. The downside of the former approach is that it consumes more resources in terms of threads and callables. Here, we describe a way to specify and handle timeouts on the HTable client, which relies on the latter approach (i.e., socket timeouts). Right now, the HBaseClient sets the socket timeout to the value of the ipc.ping.interval parameter, which is also how long it waits before pinging the server in case of a failure. 
The goal is to allow clients to set that timeout on the fly through HTable. Rather than adding an optional timeout argument to every HTable operation, we chose to make it a property of HTable which effectively applies to every method that involves a remote operation. In order to propagate the timeout from HTable to HBaseClient, we replaced all occurrences of ServerCallable in HTable with an extension called ClientCallable, which sets the timeout on the region server interface, once it has been instantiated, through the HConnection object. The latter, in turn, asks HBaseRPC to pass that timeout to the corresponding Invoker, so that it may inject the timeout at the time the invocation is made on the region server proxy. Right before the request is sent to the server, we set the timeout specified by the client on the underlying socket. In conclusion, this patch will afford clients the option of performing an HBase operation until it completes or a specified timeout elapses. Note that a timeout of zero
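The retry idea from the review ("retry {{ServerCallable#call}} in the case of non-SocketTimeoutExceptions, but only if we spent less time than the operation timeout") can be sketched like this. This is a simplified, hypothetical version for illustration; the names are not the actual patch code:

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hedged sketch of the retry-with-remaining-budget idea discussed in the
// HBASE-2937 review thread; simplified, not the actual ServerCallable#call.
public class OperationTimeoutSketch {
    public static <T> T callWithOperationTimeout(Callable<T> call, long operationTimeoutMs)
            throws Exception {
        long deadline = System.currentTimeMillis() + operationTimeoutMs;
        while (true) {
            try {
                return call.call();
            } catch (SocketTimeoutException e) {
                throw e;  // an RPC timeout is final: the whole budget was spent waiting
            } catch (Exception e) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) throw e;  // operation budget exhausted: give up
                // otherwise loop and retry; the smaller remaining budget still applies
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] attempts = {0};
        // A call that fails twice with a transient ConnectException, then succeeds.
        String result = callWithOperationTimeout(() -> {
            if (++attempts[0] < 3) throw new java.net.ConnectException("transient");
            return "ok";
        }, 1000);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

This matches the example in the thread: with a 50ms operation timeout and a ConnectException 10ms in, the retry proceeds under the remaining 40ms budget.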
[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client
[ https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037078#comment-13037078 ] jirapos...@reviews.apache.org commented on HBASE-2937: -- bq. On 2011-05-19 06:11:23, Michael Stack wrote: bq. This seems like a bunch of functionality for a relatively small change. Nice one Karthick. A few questions in the below. bq. bq. Karthick Sankarachary wrote: bq. Yes, it does seem like a big change for a relatively small feature, but an important one nevertheless. The complexity stems from the fact the scope of the operation timeout has to be limited to the {{ServerCallable#call}} method. bq. bq. By way of motivation, if you run the TestFromClientSide test with the following patch (which sets the hbase.rpc.timeout to 10ms), you'll see that 39 out of the 44 test cases will fail. bq. bq. 24 --- a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 25 +++ b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 26 @@ -94,6 +94,8 @@ public class TestFromClientSide { bq. 27@BeforeClass bq. 28public static void setUpBeforeClass() throws Exception { bq. 29 TEST_UTIL.startMiniCluster(3); bq. 30 + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 10); bq. 32} bq. bq. On the other hand, if you run it with the default hbase.rpc.timeout but a hbase.client.operation.timeout set to 10ms, then you should see the test pass. bq. bq. 24 --- a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 25 +++ b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 26 @@ -94,6 +94,8 @@ public class TestFromClientSide { bq. 27@BeforeClass bq. 28public static void setUpBeforeClass() throws Exception { bq. 29 TEST_UTIL.startMiniCluster(3); bq. 30 + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 6); bq. 31 + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT, 10); bq. 32} bq. bq. 
bq. Michael Stack wrote: bq. Actually I was saying the opposite. I'm surprised at how little code had to change to make this fix. bq. bq. So, I don't recall if there is good documentation in this patch on the difference between hbase.rpc.timeout and hbase.client.operation.timeout? If not, we need it. bq. bq. Does the TestFromClientSide complete in shorter time if I set a hbase.client.operation.timeout of 10ms? There's comments in {{HConstants}} for both of those configuration properties. Is there another place where we should document them? The test completes in more or less the same time, regardless of whether or not the hbase.client.operation.timeout is set to 10ms. I guess that's because the test server is running locally, which is probably why the test cases don't timeout. bq. On 2011-05-19 06:11:23, Michael Stack wrote: bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 106 bq. https://reviews.apache.org/r/755/diff/1/?file=19383#file19383line106 bq. bq. Are there other exceptions you think we should rethrow? Connection Exception? bq. bq. Karthick Sankarachary wrote: bq. How about we do what HBaseClient does, which is wrap the SocketTimeoutException inside another one, along with a context-specific error message? bq. bq. Michael Stack wrote: bq. I was more wondering if there were exceptions we should treat like SocketTimeoutException? The other kinds of exceptions we might expect {{HBaseClient}} to throw include {{ConnectException}} and {{IOException}}. We could treat them similarly, but only if we have already spent more time than the operation timeout. If not, then we could retry the call, this time using a lower operation timeout. To take an example, if the operation timeout is 50ms, and a {{ConnectException}} occurs 10ms after the call, then we could retry the call with a 40ms operation timeout. What do you think? - Karthick --- This is an automatically generated e-mail. 
To reply, visit: https://reviews.apache.org/r/755/#review683 --- On 2011-05-20 20:49:57, Karthick Sankarachary wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/755/ bq. --- bq. bq. (Updated 2011-05-20 20:49:57) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. Thanks to HBASE-3154,
[jira] [Updated] (HBASE-2077) NullPointerException with an open scanner that expired causing an immediate region server shutdown
[ https://issues.apache.org/jira/browse/HBASE-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-2077: - Attachment: 2077-suggestion.txt Here is a suggestion where we remove lease from leases while we are processing a request then on the way out in a finally we renew lease. NullPointerException with an open scanner that expired causing an immediate region server shutdown -- Key: HBASE-2077 URL: https://issues.apache.org/jira/browse/HBASE-2077 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.2, 0.20.3 Environment: Hadoop 0.20.0, Mac OS X, Java 6 Reporter: Sam Pullara Assignee: Sam Pullara Priority: Critical Fix For: 0.92.0 Attachments: 2077-suggestion.txt, HBASE-2077-3.patch, HBASE-2077-redux.patch, [Bug_HBASE-2077]_Fixes_a_very_rare_race_condition_between_lease_expiration_and_renewal.patch Original Estimate: 1h Remaining Estimate: 1h 2009-12-29 18:05:55,432 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -4250070597157694417 lease expired 2009-12-29 18:05:55,443 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117) at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641) at java.util.PriorityQueue.siftDown(PriorityQueue.java:612) at java.util.PriorityQueue.poll(PriorityQueue.java:523) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719) at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944) at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) 2009-12-29 18:05:55,446 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 55260, call next(-4250070597157694417, 1) from 192.168.1.90:54011: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:869) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:859) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1965) at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117) at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641) at java.util.PriorityQueue.siftDown(PriorityQueue.java:612) at java.util.PriorityQueue.poll(PriorityQueue.java:523) at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944) ... 5 more 2009-12-29 18:05:55,447 WARN org.apache.hadoop.ipc.HBaseServer: IPC
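The suggestion attached above (remove the lease from leases while processing a request, then renew it in a finally on the way out, so the expiry thread cannot cancel an in-flight scanner) might look roughly like this. The Leases API here is a hypothetical simplification, not the real HBase classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the 2077-suggestion pattern: take the scanner's lease
// out of the lease set for the duration of the request, renew in finally.
// Hypothetical simplified API, not the actual HBase Leases/HRegionServer code.
public class LeaseSketch {
    private final Map<Long, Long> leases = new HashMap<>(); // scannerId -> expiry time

    public synchronized void addLease(long id, long ttlMs) {
        leases.put(id, System.currentTimeMillis() + ttlMs);
    }

    public synchronized boolean removeLease(long id) {
        return leases.remove(id) != null;
    }

    public String next(long scannerId) {
        // Remove the lease so the expiry thread cannot cancel it mid-request,
        // which is the race behind the NullPointerException in this issue.
        if (!removeLease(scannerId)) throw new IllegalStateException("lease expired");
        try {
            return "row";  // stand-in for the actual scan work
        } finally {
            addLease(scannerId, 60_000);  // renew the lease on the way out
        }
    }

    public static void main(String[] args) {
        LeaseSketch server = new LeaseSketch();
        server.addLease(42L, 60_000);
        System.out.println(server.next(42L));
        System.out.println(server.next(42L));  // lease was renewed, so this still works
    }
}
```

The point of the pattern is that expiration can only happen between requests, never while a scanner is actively being used.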
[jira] [Commented] (HBASE-3894) Thread contention over row locks set monitor
[ https://issues.apache.org/jira/browse/HBASE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037086#comment-13037086 ] Jean-Daniel Cryans commented on HBASE-3894: --- I gave the latest patch a spin on my laptop using two PE randomWrite 1 (to generate lock contention); my CPU profiling doesn't see any slowness related to the locking, and the memory profiling shows that ~10k CountDownLatches account for only ~300KB. Also, since they are short-lived, they get cleared up almost right away. I would be +1 on committing if Dave tried it out on his cluster. Thread contention over row locks set monitor Key: HBASE-3894 URL: https://issues.apache.org/jira/browse/HBASE-3894 Project: HBase Issue Type: Bug Affects Versions: 0.90.2 Reporter: Dave Latham Priority: Blocker Fix For: 0.90.4 Attachments: concurrentRowLocks-2.patch, concurrentRowLocks-trunk.patch, regionserver_rowLock_set_contention.threads.txt HRegion maintains a set of row locks. Whenever any thread attempts to lock or release a row, it needs to acquire the monitor on that set. We've been encountering cases with 30 handler threads all contending for that monitor, blocking progress on the region server. Clients time out and retry, making it worse, and the region server stops responding to new clients almost entirely. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
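Judging by the patch names (concurrentRowLocks) and the CountDownLatch numbers above, the approach under test replaces the single synchronized row-lock set with per-row latches in a concurrent map. A hedged sketch of that technique, simplified and not the actual HRegion change:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Hedged sketch of row locking via a ConcurrentHashMap of short-lived
// CountDownLatches, so lockers don't all contend on one set's monitor.
// Simplified illustration; not the actual HBASE-3894 patch.
public class RowLocks {
    private final ConcurrentMap<String, CountDownLatch> locks = new ConcurrentHashMap<>();

    public void lockRow(String row) throws InterruptedException {
        CountDownLatch mine = new CountDownLatch(1);
        while (true) {
            CountDownLatch existing = locks.putIfAbsent(row, mine);
            if (existing == null) return;  // we now hold the row lock
            existing.await();              // wait for the current holder, then retry
        }
    }

    public void unlockRow(String row) {
        CountDownLatch mine = locks.remove(row);  // latch is short-lived: freed here
        if (mine != null) mine.countDown();       // wake any waiters so they can retry
    }

    public static void main(String[] args) throws InterruptedException {
        RowLocks rl = new RowLocks();
        rl.lockRow("row1");
        rl.unlockRow("row1");
        rl.lockRow("row1");  // reacquirable after release
        rl.unlockRow("row1");
        System.out.println("locked and unlocked without a shared monitor");
    }
}
```

Contention is now per-row (threads only wait on the latch of the specific row they want), which is consistent with the observation that ~10k latches are cheap and get cleared almost immediately.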
[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client
[ https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037097#comment-13037097 ] jirapos...@reviews.apache.org commented on HBASE-2937: -- bq. On 2011-05-19 06:11:23, Michael Stack wrote: bq. This seems like a bunch of functionality for a relatively small change. Nice one Karthick. A few questions in the below. bq. bq. Karthick Sankarachary wrote: bq. Yes, it does seem like a big change for a relatively small feature, but an important one nevertheless. The complexity stems from the fact the scope of the operation timeout has to be limited to the {{ServerCallable#call}} method. bq. bq. By way of motivation, if you run the TestFromClientSide test with the following patch (which sets the hbase.rpc.timeout to 10ms), you'll see that 39 out of the 44 test cases will fail. bq. bq. 24 --- a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 25 +++ b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 26 @@ -94,6 +94,8 @@ public class TestFromClientSide { bq. 27@BeforeClass bq. 28public static void setUpBeforeClass() throws Exception { bq. 29 TEST_UTIL.startMiniCluster(3); bq. 30 + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 10); bq. 32} bq. bq. On the other hand, if you run it with the default hbase.rpc.timeout but a hbase.client.operation.timeout set to 10ms, then you should see the test pass. bq. bq. 24 --- a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 25 +++ b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 26 @@ -94,6 +94,8 @@ public class TestFromClientSide { bq. 27@BeforeClass bq. 28public static void setUpBeforeClass() throws Exception { bq. 29 TEST_UTIL.startMiniCluster(3); bq. 30 + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 6); bq. 31 + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT, 10); bq. 32} bq. bq. 
bq. Michael Stack wrote: bq. Actually I was saying the opposite. I'm surprised at how little code had to change to make this fix. bq. bq. So, I don't recall if there is good documentation in this patch on the difference between hbase.rpc.timeout and hbase.client.operation.timeout? If not, we need it. bq. bq. Does the TestFromClientSide complete in shorter time if I set a hbase.client.operation.timeout of 10ms? bq. bq. Karthick Sankarachary wrote: bq. There's comments in {{HConstants}} for both of those configuration properties. Is there another place where we should document them? bq. bq. The test completes in more or less the same time, regardless of whether or not the hbase.client.operation.timeout is set to 10ms. I guess that's because the test server is running locally, which is probably why the test cases don't timeout. So, high-level, IIUC, this patch will allow setting shorter operation timeouts. You'll have to do it by setting hbase.client.operation.timeout in the Configuration the HTable uses. Is that right? I see the default is MAX_INT for hbase.client.operation.timeout. Does that mean the hbase.rpc.timeout prevails? If hbase.client.operation.timeout timeouts we retry? Is that right, the configured amount of times? Sort-of-related, shorter timeouts make it more critical that we do a better job server-side keeping account of when an operation arrives and making sure it does not go through if by the time it comes out of the RPC queue, so much time has elapsed, the client has gone away (We don't want operations completing on the server if no client to reply to). bq. On 2011-05-19 06:11:23, Michael Stack wrote: bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 106 bq. https://reviews.apache.org/r/755/diff/1/?file=19383#file19383line106 bq. bq. Are there other exceptions you think we should rethrow? Connection Exception? bq. bq. Karthick Sankarachary wrote: bq. 
How about we do what HBaseClient does, which is wrap the SocketTimeoutException inside another one, along with a context-specific error message? bq. bq. Michael Stack wrote: bq. I was more wondering if there were exceptions we should treat like SocketTimeoutException? bq. bq. Karthick Sankarachary wrote: bq. The other kinds of exceptions we might expect {{HBaseClient}} to throw include {{ConnectException}} and {{IOException}}. We could treat them similarly, but only if we have already spent more time than the operation timeout. If not, then we could retry the call, this time using a lower operation
[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client
[ https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037108#comment-13037108 ] jirapos...@reviews.apache.org commented on HBASE-2937: -- bq. On 2011-05-19 06:11:23, Michael Stack wrote: bq. This seems like a bunch of functionality for a relatively small change. Nice one Karthick. A few questions in the below. bq. bq. Karthick Sankarachary wrote: bq. Yes, it does seem like a big change for a relatively small feature, but an important one nevertheless. The complexity stems from the fact the scope of the operation timeout has to be limited to the {{ServerCallable#call}} method. bq. bq. By way of motivation, if you run the TestFromClientSide test with the following patch (which sets the hbase.rpc.timeout to 10ms), you'll see that 39 out of the 44 test cases will fail. bq. bq. 24 --- a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 25 +++ b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 26 @@ -94,6 +94,8 @@ public class TestFromClientSide { bq. 27@BeforeClass bq. 28public static void setUpBeforeClass() throws Exception { bq. 29 TEST_UTIL.startMiniCluster(3); bq. 30 + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 10); bq. 32} bq. bq. On the other hand, if you run it with the default hbase.rpc.timeout but a hbase.client.operation.timeout set to 10ms, then you should see the test pass. bq. bq. 24 --- a/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 25 +++ b/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java bq. 26 @@ -94,6 +94,8 @@ public class TestFromClientSide { bq. 27@BeforeClass bq. 28public static void setUpBeforeClass() throws Exception { bq. 29 TEST_UTIL.startMiniCluster(3); bq. 30 + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 6); bq. 31 + TEST_UTIL.getConfiguration().setInt(HConstants.HBASE_CLIENT_OPERATION_TIMEOUT, 10); bq. 32} bq. bq. 
bq. Michael Stack wrote: bq. Actually I was saying the opposite. I'm surprised at how little code had to change to make this fix. bq. bq. So, I don't recall if there is good documentation in this patch on the difference between hbase.rpc.timeout and hbase.client.operation.timeout? If not, we need it. bq. bq. Does the TestFromClientSide complete in shorter time if I set a hbase.client.operation.timeout of 10ms? bq. bq. Karthick Sankarachary wrote: bq. There's comments in {{HConstants}} for both of those configuration properties. Is there another place where we should document them? bq. bq. The test completes in more or less the same time, regardless of whether or not the hbase.client.operation.timeout is set to 10ms. I guess that's because the test server is running locally, which is probably why the test cases don't timeout. bq. bq. Michael Stack wrote: bq. So, high-level, IIUC, this patch will allow setting shorter operation timeouts. You'll have to do it by setting hbase.client.operation.timeout in the Configuration the HTable uses. Is that right? I see the default is MAX_INT for hbase.client.operation.timeout. Does that mean the hbase.rpc.timeout prevails? If hbase.client.operation.timeout timeouts we retry? Is that right, the configured amount of times? bq. bq. Sort-of-related, shorter timeouts make it more critical that we do a better job server-side keeping account of when an operation arrives and making sure it does not go through if by the time it comes out of the RPC queue, so much time has elapsed, the client has gone away (We don't want operations completing on the server if no client to reply to). bq. So, high-level, IIUC, this patch will allow setting shorter operation timeouts. You'll have to do it by setting hbase.client.operation.timeout in the Configuration the HTable uses. Is that right? I see the default is MAX_INT for hbase.client.operation.timeout. Does that mean the hbase.rpc.timeout prevails? Yes, to both of the above questions. bq. 
If hbase.client.operation.timeout timeouts we retry? Is that right, the configured amount of times? Actually, no we don't retry, as that would kind of defeat the purpose of the operation timeout, in my opinion. Note that if we were to retry we would have to pause (for at least 1000 ms by default). If the client does not have the luxury of spending say 10ms on a {{HTable}} operation, then it will probably not want to pause either, which rules out retries. bq. Sort-of-related, shorter timeouts make it more critical that we do a better job
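The interplay debated in this thread — a per-attempt `hbase.rpc.timeout` versus an overall `hbase.client.operation.timeout` that, once spent, rules out further retries (since retrying implies pausing) — can be sketched in plain Java. The names and structure below are illustrative only, not the actual `ServerCallable` code from the HBASE-2937 patch:

```java
import java.util.function.Supplier;

// Illustrative sketch: each attempt may fail (e.g. on an RPC-level timeout),
// and we only retry while the overall operation budget, including the pause
// before the retry, has not been exhausted. Hypothetical names throughout.
public class OperationTimeoutSketch {
    public static <T> T callWithOperationTimeout(Supplier<T> attempt,
                                                 long operationTimeoutMs,
                                                 int maxRetries,
                                                 long pauseMs) {
        long deadline = System.currentTimeMillis() + operationTimeoutMs;
        RuntimeException last = null;
        for (int tries = 0; tries <= maxRetries; tries++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                last = e;
                // Retrying implies pausing first; if that would blow the
                // operation deadline, give up now rather than retry.
                if (System.currentTimeMillis() + pauseMs >= deadline) {
                    throw new RuntimeException("operation timeout exceeded", e);
                }
                try {
                    Thread.sleep(pauseMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw last;
                }
            }
        }
        throw last;
    }
}
```

With the default operation timeout of Integer.MAX_VALUE the deadline is effectively never reached, so the per-attempt timeout and the configured retry count prevail, which matches the behavior described above.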
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037141#comment-13037141 ] Ted Yu commented on HBASE-3909: --- I went over HADOOP-7001.5.patch. We have the following decisions to make:
1. HADOOP-7001 is in trunk only. Are we going to pull the interface/base class/util class over to hbase?
2. ReconfigurationServlet would be convenient for admins to use. Are we going to support reloading conf from the hbase shell?
3. HADOOP-7001 provides fine-grained property reconfig through reconfigurePropertyImpl() calls. Shall we also provide a coarse-grained property reconfig mechanism? E.g. we can notify AssignmentManager of the properties it uses whose values have just changed. This mechanism is also related to getReconfigurableProperties().
I think HMaster, AssignmentManager, etc. would all extend ReconfigurableBase.
Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev, and Todd suggested we look at how Hadoop did it over in HADOOP-7001. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
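The HADOOP-7001 mechanism referenced above pairs a list of reconfigurable properties with a per-property callback. A rough Java sketch of that shape, using hypothetical names rather than the actual Hadoop or HBase classes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of a HADOOP-7001-style reconfigurable base class. Names are
// hypothetical; a real HMaster/AssignmentManager integration would extend
// something like this and override the two abstract hooks.
public abstract class ReconfigurableBaseSketch {
    private final Map<String, String> conf = new HashMap<>();

    // Subclasses declare which properties may change at runtime.
    public abstract Set<String> getReconfigurableProperties();

    // Subclasses react to a single property change (fine-grained hook).
    protected abstract void reconfigurePropertyImpl(String property, String newValue);

    public final void reconfigureProperty(String property, String newValue) {
        if (!getReconfigurableProperties().contains(property)) {
            throw new IllegalArgumentException(property + " is not reconfigurable");
        }
        conf.put(property, newValue);
        reconfigurePropertyImpl(property, newValue);
    }

    public String get(String property) {
        return conf.get(property);
    }
}
```

The coarse-grained variant Ted mentions could be layered on top by batching several reconfigureProperty() calls and notifying a component once at the end.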
[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.
[ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037150#comment-13037150 ] Ted Yu commented on HBASE-3904: --- I have run tests related to table creation and availability checking. Namely this code in LoadIncrementalHFiles: {code} while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MAX_RETRIES)) { {code} TestHFileOutputFormat, TestLoadIncrementalHFiles and TestAdmin. Please outline what more test(s) should be devised. HConnection.isTableAvailable returns true even with not all regions available. -- Key: HBASE-3904 URL: https://issues.apache.org/jira/browse/HBASE-3904 Project: HBase Issue Type: Bug Components: client Reporter: Vidhyashankar Venkataraman Priority: Minor Attachments: 3904.txt This function, as per the javadoc, is supposed to return true iff all the regions in the table are available. But if the table is still being created, this function may return inconsistent results (for example, when a table with a large number of split keys is created). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
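The quoted loop is a bounded poll on table availability. A self-contained version of the same pattern, with HConnection#isTableAvailable abstracted behind a small interface since a sketch cannot depend on a live cluster:

```java
// Sketch of the bounded availability poll used in LoadIncrementalHFiles;
// AvailabilityCheck stands in for HConnection#isTableAvailable.
public class TableWaitSketch {
    public interface AvailabilityCheck {
        boolean isTableAvailable();
    }

    public static boolean waitForTable(AvailabilityCheck check,
                                       int maxRetries,
                                       long pauseMs) {
        int ctr = 0;
        // Mirrors: while (!conn.isTableAvailable(name) && (ctr < TABLE_CREATE_MAX_RETRIES))
        while (!check.isTableAvailable() && (ctr < maxRetries)) {
            ctr++;
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return check.isTableAvailable();
    }
}
```

The bug under discussion is that the check itself can report true prematurely; the bound only keeps the caller from waiting forever, it cannot make the check more accurate.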
[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.
[ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037163#comment-13037163 ] Vidhyashankar Venkataraman commented on HBASE-3904: --- Ok, I tested your patch with the code attached below: And I get the following output: Caught Socket timeout.. Mostly caused by a slow region assignment by master! 11/05/20 23:26:00 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=b3110640.yst.yahoo.net:44481,b3110600.yst.yahoo.net:44481,b3110560.yst.yahoo.net:44481,b3110520.yst.yahoo.net:44481,b3110680.yst.yahoo.net:44481 sessionTimeout=18 watcher=hconnection 11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Opening socket connection to server b3110560.yst.yahoo.net/67.195.55.234:44481 11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Socket connection established to b3110560.yst.yahoo.net/67.195.55.234:44481, initiating session 11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Session establishment complete on server b3110560.yst.yahoo.net/67.195.55.234:44481, sessionid = 0x12ff6d3911179e8, negotiated timeout = 18 Table test-v6 not yet available... Sleeping for 5 more minutes... Expected #regions = 17933 Table is probably available!! : test-v6 Available? true Table test-v6 may not be available... Double checking: Sleeping for 5 minutes more... Table test-v6: Expected # Regions = 17933 Actual number = 4744 Table test-v6 may not be available... Double checking: Sleeping for 5 minutes more... And it is still trying to assign. 1. The good: Notice that tableAvailable got out of the loop because it was true and it also printed true in the following print message. This has never happened without the patch. 2. The doubtful part: isTableAvailable still doesn't return back when all regions are online as we see in the subsequent output. Can you let me know what your patch intended to do? 
Thank you
Vidhya

THE CODE:

try {
  hbAdmin.createTableAsync(htd, keysArray.toArray(new byte[0][0]));
} catch (java.net.SocketTimeoutException e) {
  System.err.println("Caught Socket timeout.. " +
      "Mostly caused by a slow region assignment by master!");
}
HTable table = new HTable(tableName);
HConnection conn = table.getConnection();
do {
  System.out.println("Table " + tableName + " not yet available... " +
      "Sleeping for 5 more minutes... Expected #regions = " + (keysArray.size()+1));
  Thread.sleep(30);
} while (!conn.isTableAvailable(table.getTableName()));
System.err.println("Table is probably available!! : " + tableName +
    " Available? " + conn.isTableAvailable(table.getTableName()));
Map<HRegionInfo, HServerAddress> regionList = null;
do {
  System.out.println("Table " + tableName + " may not be available... " +
      "Double checking: Sleeping for 5 minutes more...");
  Thread.sleep(30);
  regionList = table.getRegionsInfo();
  System.out.println("Table " + tableName + ": Expected # Regions = " + (keysArray.size()+1) +
      " Actual number = " + ((regionList != null) ? regionList.size() : 0));
} while ((regionList == null) || (regionList.size() != (keysArray.size()+1)));

On 5/20/11 4:19 PM, Ted Yu (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037150#comment-13037150 ] Ted Yu commented on HBASE-3904: --- I have run tests related to table creation and availability checking. Namely this code in LoadIncrementalHFiles: {code} while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MAX_RETRIES)) { {code} TestHFileOutputFormat, TestLoadIncrementalHFiles and TestAdmin. Please outline what more test(s) should be devised. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira HConnection.isTableAvailable returns true even with not all regions available. 
-- Key: HBASE-3904 URL: https://issues.apache.org/jira/browse/HBASE-3904 Project: HBase Issue Type: Bug Components: client Reporter: Vidhyashankar Venkataraman Priority: Minor Attachments: 3904.txt This function as per the java doc is supposed to return true iff all the regions in the table are available. But if the table is still being created this function may return inconsistent results (For example, when a table with a large number of split keys is
[jira] [Updated] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.
[ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3904: - Comment: was deleted (was: Ok, I tested your patch with the code attached below: And I get the following output: Caught Socket timeout.. Mostly caused by a slow region assignment by master! 11/05/20 23:26:00 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=b3110640.yst.yahoo.net:44481,b3110600.yst.yahoo.net:44481,b3110560.yst.yahoo.net:44481,b3110520.yst.yahoo.net:44481,b3110680.yst.yahoo.net:44481 sessionTimeout=18 watcher=hconnection 11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Opening socket connection to server b3110560.yst.yahoo.net/67.195.55.234:44481 11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Socket connection established to b3110560.yst.yahoo.net/67.195.55.234:44481, initiating session 11/05/20 23:26:00 INFO zookeeper.ClientCnxn: Session establishment complete on server b3110560.yst.yahoo.net/67.195.55.234:44481, sessionid = 0x12ff6d3911179e8, negotiated timeout = 18 Table test-v6 not yet available... Sleeping for 5 more minutes... Expected #regions = 17933 Table is probably available!! : test-v6 Available? true Table test-v6 may not be available... Double checking: Sleeping for 5 minutes more... Table test-v6: Expected # Regions = 17933 Actual number = 4744 Table test-v6 may not be available... Double checking: Sleeping for 5 minutes more... And it is still trying to assign. 1. The good: Notice that tableAvailable got out of the loop because it was true and it also printed true in the following print message. This has never happened without the patch. 2. The doubtful part: isTableAvailable still doesn't return back when all regions are online as we see in the subsequent output. Can you let me know what your patch intended to do? 
Thank you
Vidhya

THE CODE:

try {
  hbAdmin.createTableAsync(htd, keysArray.toArray(new byte[0][0]));
} catch (java.net.SocketTimeoutException e) {
  System.err.println("Caught Socket timeout.. " +
      "Mostly caused by a slow region assignment by master!");
}
HTable table = new HTable(tableName);
HConnection conn = table.getConnection();
do {
  System.out.println("Table " + tableName + " not yet available... " +
      "Sleeping for 5 more minutes... Expected #regions = " + (keysArray.size()+1));
  Thread.sleep(30);
} while (!conn.isTableAvailable(table.getTableName()));
System.err.println("Table is probably available!! : " + tableName +
    " Available? " + conn.isTableAvailable(table.getTableName()));
Map<HRegionInfo, HServerAddress> regionList = null;
do {
  System.out.println("Table " + tableName + " may not be available... " +
      "Double checking: Sleeping for 5 minutes more...");
  Thread.sleep(30);
  regionList = table.getRegionsInfo();
  System.out.println("Table " + tableName + ": Expected # Regions = " + (keysArray.size()+1) +
      " Actual number = " + ((regionList != null) ? regionList.size() : 0));
} while ((regionList == null) || (regionList.size() != (keysArray.size()+1)));

On 5/20/11 4:19 PM, Ted Yu (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037150#comment-13037150 ] Ted Yu commented on HBASE-3904: --- I have run tests related to table creation and availability checking. Namely this code in LoadIncrementalHFiles: {code} while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MAX_RETRIES)) { {code} TestHFileOutputFormat, TestLoadIncrementalHFiles and TestAdmin. Please outline what more test(s) should be devised. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira ) HConnection.isTableAvailable returns true even with not all regions available. 
-- Key: HBASE-3904 URL: https://issues.apache.org/jira/browse/HBASE-3904 Project: HBase Issue Type: Bug Components: client Reporter: Vidhyashankar Venkataraman Priority: Minor Attachments: 3904.txt This function as per the java doc is supposed to return true iff all the regions in the table are available. But if the table is still being created this function may return inconsistent results (For example, when a table with a large number of split keys is created). -- This message is automatically generated by JIRA. For
[jira] [Commented] (HBASE-3883) book.xml / added something in schema design and FAQ about not being able to change rowkeys
[ https://issues.apache.org/jira/browse/HBASE-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037169#comment-13037169 ] Hudson commented on HBASE-3883: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) book.xml / added something in schema design and FAQ about not being able to change rowkeys -- Key: HBASE-3883 URL: https://issues.apache.org/jira/browse/HBASE-3883 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Fix For: 0.92.0 Attachments: book_HBASE_3883.xml.patch This question has come up enough times in the dist-list to warrant inclusion in the book. Added small entry in schema design and in FAQ (referencing schema design). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3826) Minor compaction needs to check if still over compactionThreshold after compacting
[ https://issues.apache.org/jira/browse/HBASE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037170#comment-13037170 ] Hudson commented on HBASE-3826: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) Minor compaction needs to check if still over compactionThreshold after compacting -- Key: HBASE-3826 URL: https://issues.apache.org/jira/browse/HBASE-3826 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.90.1 Environment: hbase-0.90.1 hbase-0.90.1-cdh3u0 Reporter: Schubert Zhang Assignee: Nicolas Spiegelberg Labels: compaction Fix For: 0.92.0 Attachments: HBASE-3826.patch, HBASE-3826_0.92.patch I have a busy region, and there are 43 StoreFiles (compactionThreshold=8) in this region. I stopped the client and stopped putting new data into it, expecting these StoreFiles to be compacted later. But almost one day later, these 43 StoreFiles are still there. (Note: in my hbase instance, I disabled major compaction.) It seems minor compaction is not started continuously to compact the remaining storefiles. I checked the code, and it is true. - After more testing, an obvious issue is that the completion of a minor compaction does not check whether the current storefiles need further minor compaction. I think this may be a bug or leak. Try this test:
1. Put a lot of data into a region until 30 storefiles accumulate, because the backend compaction cannot catch up with the fast puts. (hbase.hstore.compactionThreshold=8, hbase.hstore.compaction.max=12)
2. Then stop putting.
3. These 30 storefiles then stay there for a long time (no automatic minor compaction).
4. Submit a compaction on this region; then only 12 files are compacted, so we now have 19 storefiles. Minor compaction stops there.
I think when a minor compaction completes, it should check whether the number of storefiles is still over the threshold; if so, another minor compaction should start continuously. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
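The fix suggested above — after a minor compaction completes, re-check the storefile count and request another compaction while still over threshold — can be modeled as a simple loop. The numbers and method names are illustrative only, not the HBASE-3826 patch itself:

```java
// Toy model of "keep minor-compacting until under threshold". Each pass
// merges up to maxFilesPerCompaction storefiles into one, matching the
// report's scenario (43 files, threshold 8, max 12 files per compaction).
public class CompactionLoopSketch {
    public static int compactUntilUnderThreshold(int storefiles,
                                                 int threshold,
                                                 int maxFilesPerCompaction) {
        int passes = 0;
        // Without this loop, a single pass on 43 files leaves 43 - 12 + 1 = 32
        // files and nothing triggers the next pass.
        while (storefiles > threshold) {
            int merged = Math.min(storefiles, maxFilesPerCompaction);
            storefiles = storefiles - merged + 1;  // merged files become one
            passes++;
        }
        return passes;
    }
}
```

For the reported 43-file region this converges in a handful of passes instead of stalling after the first one.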
[jira] [Commented] (HBASE-3902) Add Bytes.toBigDecimal and Bytes.toBytes(BigDecimal)
[ https://issues.apache.org/jira/browse/HBASE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037171#comment-13037171 ] Hudson commented on HBASE-3902: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) Add Bytes.toBigDecimal and Bytes.toBytes(BigDecimal) - Key: HBASE-3902 URL: https://issues.apache.org/jira/browse/HBASE-3902 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.90.1, 0.90.2 Reporter: Vaibhav Puranik Fix For: 0.90.4 Attachments: big-decimal-methods-patch.txt Bytes.toBigDecimal and Bytes.toBytes(BigDecimal) were removed in 0.90.x. Please add them back. We have data encoded using these methods. I don't think the BigDecimal class has getBytes/toBytes methods, and even if it did, if the logic of encoding it into bytes were different, it wouldn't work with the existing data. I am sure a lot of people might face this issue. I will submit the patch in a day or two. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
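One plausible byte layout for such a Bytes.toBytes(BigDecimal)/Bytes.toBigDecimal pair — a 4-byte scale followed by the unscaled value's two's-complement bytes — is sketched below. This layout is an assumption for illustration; anyone with existing data would need to verify it against the original removed methods before relying on it:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Sketch: serialize a BigDecimal as [4-byte scale][unscaled two's-complement
// bytes]. Assumed layout for illustration, not necessarily the wire format of
// the removed 0.20-era Bytes methods.
public class BigDecimalBytes {
    public static byte[] toBytes(BigDecimal d) {
        byte[] unscaled = d.unscaledValue().toByteArray();
        return ByteBuffer.allocate(4 + unscaled.length)
                .putInt(d.scale())
                .put(unscaled)
                .array();
    }

    public static BigDecimal toBigDecimal(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int scale = buf.getInt();
        byte[] unscaled = new byte[buf.remaining()];
        buf.get(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }
}
```

The round trip preserves scale and unscaled value exactly, which is what BigDecimal.equals() requires.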
[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor
[ https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037168#comment-13037168 ] Hudson commented on HBASE-3691: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) Add compressor support for 'snappy', google's compressor Key: HBASE-3691 URL: https://issues.apache.org/jira/browse/HBASE-3691 Project: HBase Issue Type: Task Reporter: stack Priority: Critical Fix For: 0.92.0 Attachments: hbase-snappy-3691-trunk-002.patch, hbase-snappy-3691-trunk-003.patch, hbase-snappy-3691-trunk-004.patch, hbase-snappy-3691-trunk.patch http://code.google.com/p/snappy/ is apache licensed. bq. Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. bq. Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as Zippy in some presentations and the likes.) Lets get it in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3905) HBaseAdmin.createTableAsync() should check for invalid split keys.
[ https://issues.apache.org/jira/browse/HBASE-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037167#comment-13037167 ] Hudson commented on HBASE-3905: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) HBaseAdmin.createTableAsync() should check for invalid split keys. -- Key: HBASE-3905 URL: https://issues.apache.org/jira/browse/HBASE-3905 Project: HBase Issue Type: Bug Environment: Considering this function is open to users, this function should validate the split key array. For example, I had tried creating a table with keys that had duplicate entries. The master (sometimes) crashed with a KeeperException. 2011-05-14 01:23:33,196 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating/setting node OFFLINE org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/39c3c2f26c777f9d2da8076d9b058c9b at org.apache.zookeeper.KeeperException.create(KeeperException.java:106) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038) at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:708) at org.apache.hadoop.hbase.zookeeper.ZKAssign.createOrForceNodeOffline(ZKAssign.java:248) at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:936) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:887) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:729) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:709) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:805) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:773) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:740) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036) 2011-05-14 01:23:33,197 INFO org.apache.hadoop.hbase.master.HMaster: Aborting And just before exiting: 2011-05-14 01:23:34,048 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call createTable({BLAH BLAH BLAH}, [[B@244e3ce5) from 67.195.46.34:36335: output error Reporter: Vidhyashankar Venkataraman Assignee: Ted Yu Priority: Minor Fix For: 0.90.4 Attachments: 3905.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
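The validation the issue asks for — rejecting a split-key array containing duplicates (the case that produced the BadVersion KeeperException above) before the async create ever reaches the master — comes down to an unsigned lexicographic sort plus a scan for repeats and empty keys. A standalone sketch, not the attached 3905.txt patch:

```java
import java.util.Arrays;

// Sketch of client-side split-key validation: no empty keys, no duplicates.
// Unsigned lexicographic comparison mirrors how HBase orders row keys.
public class SplitKeyValidator {
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void validate(byte[][] splitKeys) {
        byte[][] sorted = splitKeys.clone();
        Arrays.sort(sorted, SplitKeyValidator::compareUnsigned);
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i].length == 0) {
                throw new IllegalArgumentException("empty split key at index " + i);
            }
            if (i > 0 && compareUnsigned(sorted[i - 1], sorted[i]) == 0) {
                throw new IllegalArgumentException("duplicate split key");
            }
        }
    }
}
```

Failing fast on the client keeps a user error from ever turning into master-side ZooKeeper state corruption.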
[jira] [Commented] (HBASE-3901) Update documentation for ImportTsv to reflect recent features
[ https://issues.apache.org/jira/browse/HBASE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037172#comment-13037172 ] Hudson commented on HBASE-3901: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) Update documentation for ImportTsv to reflect recent features - Key: HBASE-3901 URL: https://issues.apache.org/jira/browse/HBASE-3901 Project: HBase Issue Type: Improvement Reporter: Bill Graham Assignee: Bill Graham Fix For: 0.92.0 Attachments: HBASE-3901_1.patch HBASE-3880 added new features to ImportTsv. Here's a patch to update documentation for these and other recent features. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3820) Splitlog() executed while the namenode was in safemode may cause data-loss
[ https://issues.apache.org/jira/browse/HBASE-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037175#comment-13037175 ] Hudson commented on HBASE-3820: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) Splitlog() executed while the namenode was in safemode may cause data-loss -- Key: HBASE-3820 URL: https://issues.apache.org/jira/browse/HBASE-3820 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: Jieshan Bean Fix For: 0.90.4 Attachments: HBASE-3820-90-V3.patch, HBASE-3820-MFSFix-90-V2.patch, HBASE-3820-MFSFix-90.patch I found this problem when the namenode went into safemode due to some unclear reasons. There's one patch about this problem:

try {
  HLogSplitter splitter = HLogSplitter.createLogSplitter(
      conf, rootdir, logDir, oldLogDir, this.fs);
  try {
    splitter.splitLog();
  } catch (OrphanHLogAfterSplitException e) {
    LOG.warn("Retrying splitting because of:", e);
    // An HLogSplitter instance can only be used once. Get new instance.
    splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir, oldLogDir, this.fs);
    splitter.splitLog();
  }
  splitTime = splitter.getTime();
  splitLogSize = splitter.getSize();
} catch (IOException e) {
  checkFileSystem();
  LOG.error("Failed splitting " + logDir.toString(), e);
  master.abort("Shutting down HBase cluster: Failed splitting hlog files...", e);
} finally {
  this.splitLogLock.unlock();
}

It really did help to some extent when the namenode process exited or was killed, but it does not consider the namenode safemode case. I think the root cause is the checkFileSystem() method. It is meant to check whether HDFS works normally (reads and writes succeed), and that was probably its original purpose.
This is how the method is implemented:

DistributedFileSystem dfs = (DistributedFileSystem) fs;
try {
  if (dfs.exists(new Path("/"))) {
    return;
  }
} catch (IOException e) {
  exception = RemoteExceptionHandler.checkIOException(e);
}

I have checked the HDFS code and learned that while the namenode is in safemode, dfs.exists(new Path("/")) returns true, because the file system can still provide read-only service. So this method only checks that the DFS can be read, which I think is not a sufficient check. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
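The core of the report — an exists("/")-style probe still succeeds in safemode because reads keep working while writes fail — can be modeled with a tiny interface. The interface is illustrative; a real fix would probe the write path (or query safemode directly) on the actual DistributedFileSystem:

```java
// Minimal model of why a read-only health check passes in safemode:
// safemode keeps reads working while writes fail. The Dfs interface is a
// stand-in for the real filesystem, used here only for illustration.
public class SafemodeCheckSketch {
    public interface Dfs {
        boolean canRead();
        boolean canWrite();
    }

    // The checkFileSystem() described above effectively tests only this:
    public static boolean readOnlyCheck(Dfs dfs) {
        return dfs.canRead();
    }

    // A check that would actually catch safemode must cover writes too:
    public static boolean readWriteCheck(Dfs dfs) {
        return dfs.canRead() && dfs.canWrite();
    }
}
```

Under this model, a safemode namenode reads fine but cannot be written, so the read-only check reports healthy while the read-write check correctly does not.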
[jira] [Commented] (HBASE-3881) Add disable balancer in graceful_stop.sh script
[ https://issues.apache.org/jira/browse/HBASE-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037174#comment-13037174 ] Hudson commented on HBASE-3881: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) Add disable balancer in graceful_stop.sh script --- Key: HBASE-3881 URL: https://issues.apache.org/jira/browse/HBASE-3881 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.90.4 Attachments: balancer.txt If balancer is on when graceful_stop.sh runs, it can get messy. Add disable of balancer to the script. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2938) Add Thread-Local Behavior To HTable Pool
[ https://issues.apache.org/jira/browse/HBASE-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037177#comment-13037177 ] Hudson commented on HBASE-2938: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) HBASE-2938 Add Thread-Local Behavior To HTable Pool stack : Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/PoolMap.java
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestHTablePool.java
Add Thread-Local Behavior To HTable Pool Key: HBASE-2938 URL: https://issues.apache.org/jira/browse/HBASE-2938 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.89.20100621 Reporter: Karthick Sankarachary Assignee: Karthick Sankarachary Fix For: 0.92.0 Attachments: HBASE-2938-V2.patch, HBASE-2938.patch It is a well-documented fact that the HBase table client (viz., HTable) is not thread-safe. Hence, the recommendation has been to use a HTablePool or a ThreadLocal to manage access to tables. The downside of the latter is that it (a) requires the user to reinvent the wheel in terms of mapping table names to tables and (b) forces the user to maintain the thread-local objects. Ideally, it would be nice if we could make the HTablePool handle thread-local objects as well. That way, it not only becomes the one-stop shop for all client-side tables, but also insulates the user from the ThreadLocal object. Here, we propose a way to generalize the HTablePool so that the underlying pool type is either reusable or thread-local. To make this possible, we introduce the concept of a SharedMap, which essentially maps a key to a collection of values, the elements of which are managed by a pool. In effect, that collection acts as a shared pool of resources, access to which is closely controlled as dictated by the particular semantics of the pool. 
Furthermore, to simplify the construction of HTablePools, we added a couple of parameters (viz. hbase.client.htable.pool.type and hbase.client.hbase.pool.size) to control the default behavior of an HTablePool. If the size of the pool is set to a non-zero positive number, it is used to cap the number of resources that a pool may contain for any given key. A size of Integer#MAX_VALUE is interpreted to mean an unbounded pool. Currently, the SharedMap supports the following types of pools: * A ThreadLocalPool, which represents a pool that builds on the ThreadLocal class. It essentially binds the resource to the thread from which it is accessed. * A ReusablePool, which represents a pool that builds on the LinkedList class. It essentially allows resources to be checked out, at which point they are (temporarily) removed from the pool. When a resource is no longer required, it should be returned to the pool in order to be reused. * A RoundRobinPool, which represents a pool that stores its resources in an ArrayList. It load-balances access to its resources by returning a different resource every time a given key is looked up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
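The pool flavors described above can be sketched with plain JDK types. The class and method names below are illustrative only; they are not the actual PoolMap/SharedMap API from the HBASE-2938 patch:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Illustrative key -> pooled-resource map; a real pool would also
// synchronize access to each per-key deque and enforce a size cap.
public class SimplePoolMap<V> {
    public enum PoolType { THREAD_LOCAL, REUSABLE }

    private final PoolType type;
    private final ConcurrentMap<String, Deque<V>> reusable = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, ThreadLocal<V>> threadLocal = new ConcurrentHashMap<>();

    public SimplePoolMap(PoolType type) { this.type = type; }

    /** Check a resource out for the given key, creating one if the pool is empty. */
    public V get(String key, Supplier<V> factory) {
        if (type == PoolType.THREAD_LOCAL) {
            // Each thread gets (and keeps) its own resource for this key.
            return threadLocal
                .computeIfAbsent(key, k -> ThreadLocal.withInitial(factory))
                .get();
        }
        // REUSABLE: take a checked-in resource out of the deque, or create a fresh one.
        V v = reusable.computeIfAbsent(key, k -> new ArrayDeque<>()).pollFirst();
        return v != null ? v : factory.get();
    }

    /** Return a reusable resource to the pool; a no-op for thread-local pools. */
    public void put(String key, V value) {
        if (type == PoolType.REUSABLE) {
            reusable.computeIfAbsent(key, k -> new ArrayDeque<>()).addFirst(value);
        }
    }
}
```

A RoundRobinPool along these lines would keep its resources in a list and rotate an index on each lookup, and the proposed size cap would simply drop returned resources once a deque reaches the configured bound.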
[jira] [Commented] (HBASE-3898) TestSplitTransactionOnCluster broke in TRUNK
[ https://issues.apache.org/jira/browse/HBASE-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037178#comment-13037178 ] Hudson commented on HBASE-3898: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) TestSplitTransactionOnCluster broke in TRUNK Key: HBASE-3898 URL: https://issues.apache.org/jira/browse/HBASE-3898 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Attachments: 3898.txt It hangs for 15 minutes. I see an NPE trying to split a region. The splitKey passed is null. Looks to be a by-product of recent compaction refactorings. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3888) book.xml - filled in architecture 'daemon' section
[ https://issues.apache.org/jira/browse/HBASE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037176#comment-13037176 ] Hudson commented on HBASE-3888: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) book.xml - filled in architecture 'daemon' section --- Key: HBASE-3888 URL: https://issues.apache.org/jira/browse/HBASE-3888 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Fix For: 0.92.0 Attachments: book_HBASE_3888.xml.patch The 'daemon' section in architecture has been empty for a while. Filled in an overview of what HMaster and HRegionServer do, with a brief overview of what their functional interfaces look like, along with a short description of their background processes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3874) ServerShutdownHandler fails on NPE if a plan has a random region assignment
[ https://issues.apache.org/jira/browse/HBASE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037173#comment-13037173 ] Hudson commented on HBASE-3874: --- Integrated in HBase-TRUNK #1930 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1930/]) ServerShutdownHandler fails on NPE if a plan has a random region assignment --- Key: HBASE-3874 URL: https://issues.apache.org/jira/browse/HBASE-3874 Project: HBase Issue Type: Bug Affects Versions: 0.90.2 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.90.4 Attachments: HBASE-3874-trunk.patch, HBASE-3874.patch By chance, we were able to revert the ulimit on one of our clusters to 1024 and it started dying non-stop on Too many open files. Now the bad thing is that some region servers weren't completely ServerShutdownHandler'd because they failed on: {quote} 2011-05-07 00:04:46,203 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:1804) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:101) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} Reading the code, it seems the NPE is in the if statement: {code} Map.Entry<String, RegionPlan> e = i.next(); if (e.getValue().getDestination().equals(hsi)) { // Use iterator's remove else we'll get CME i.remove(); } {code} Which means that the destination (HSI) is null. Looking through the code, it seems we instantiate a RegionPlan with a null HSI when it's a random assignment. 
It means that if there's a random assignment going on while a node dies, then this issue might happen. Initially I thought that this could mean data loss, but the logs are already split so it's just the reassignment that doesn't happen (still bad). Also, it left the master with the dead server being processed, so for two days the balancer didn't run, failing on: bq. org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [] And the reason the array is empty is that we are running 0.90.3, which removes the RS from the dead list if it comes back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
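The loop quoted above dereferences getDestination() unconditionally, which NPEs on random-assignment plans. A null-guarded version of the same removal looks like the following; RegionPlan here is a minimal stand-in for illustration, not the real HBase class:

```java
import java.util.Iterator;
import java.util.Map;

public class PlanCleanup {
    // Stand-in for the real org.apache.hadoop.hbase.master.RegionPlan.
    static class RegionPlan {
        private final String destination; // null for a "random assignment" plan
        RegionPlan(String destination) { this.destination = destination; }
        String getDestination() { return destination; }
    }

    /** Remove every plan destined for the dead server, skipping null-destination plans. */
    static int removePlansFor(Map<String, RegionPlan> plans, String deadServer) {
        int removed = 0;
        for (Iterator<Map.Entry<String, RegionPlan>> i = plans.entrySet().iterator(); i.hasNext();) {
            Map.Entry<String, RegionPlan> e = i.next();
            String dest = e.getValue().getDestination();
            // Guard against null: random assignments have no destination yet.
            if (dest != null && dest.equals(deadServer)) {
                i.remove(); // use the iterator's remove else we'd get a CME
                removed++;
            }
        }
        return removed;
    }
}
```

The one-line alternative of `deadServer.equals(dest)` would also avoid the NPE, but the explicit null check makes the random-assignment case visible.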
[jira] [Updated] (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)
[ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joey Echeverria updated HBASE-1316: --- Attachment: HBASE-1316-1.patch zookeeper-native-Linux-amd64-64.tgz zookeeper-native-headers.tgz I've got a partial patch ready. The build relies on native-maven-plugin to build the native code. This plugin pulls native dependencies as maven artifacts. To make this work, I packaged up the zookeeper header files and the static library compiled for x86-64 Linux. In order to test the patch you need to install the artifacts into your local maven repository. I've included a simple install.sh to do this for you. We'll need to upload these artifacts somewhere, along with other supported OSes/architectures in the future. I did attempt to make both the build and runtime code work if you're not on a supported platform, but I haven't extensively tested it. At this point, the patch just adds support for interacting with zookeeper via the native code. The interaction is very limited, currently only creating ephemeral nodes is supported. One thing I did do was add a callback for the native code to notify Java when its session expires. Right now, I'm generating my own session expiration event to send to the Java zookeeper connection. I think this will allow the region server to shut down if the native session expires. It should look just like an expiration of the Java session. Things that are not yet implemented: # The region server hasn't been modified to use the native code at all. # I haven't modified the packaging part of the build. I'm not sure how we'll want the build to generate versions of the native library for multiple platforms. Let me know if you think this is on the right track or if anything needs a big rethink. 
ZooKeeper: use native threads to avoid GC stalls (JNI integration) -- Key: HBASE-1316 URL: https://issues.apache.org/jira/browse/HBASE-1316 Project: HBase Issue Type: Improvement Affects Versions: 0.20.0 Reporter: Andrew Purtell Assignee: Berk D. Demir Attachments: HBASE-1316-1.patch, zk_wrapper.tar.gz, zookeeper-native-Linux-amd64-64.tgz, zookeeper-native-headers.tgz From Joey Echeverria up on hbase-users@: We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses, especially since our environment was the worst case for garbage collection: millions of tiny, short-lived objects. I suspect HBase sees similar work loads frequently, if not constantly. With anything shorter than a 30-second session timeout, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable. We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down. I don't know if it's worth doing the same kind of thing for HBase as it adds some unnecessary native code, but it's a solution that I found works. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.
[ https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037218#comment-13037218 ] Ted Yu commented on HBASE-3904: --- From Vidhyashankar's test: {code} Table test-v6 not yet available... Sleeping for 5 more minutes... Expected #regions = 17933 Table is probably available!! : test-v6 Available? true Table test-v6 may not be available... Double checking: Sleeping for 5 minutes more... Table test-v6: Expected # Regions = 17933 Actual number = 4744 {code} We can see that after conn.isTableAvailable() returned true, there were still at least 13189 regions that were not assigned - not reaching .META. I think we should implement createTableSync() as I proposed earlier. We can ask the user to call table.getRegionsInfo() but that is not convenient, and getRegionsInfo() is marked deprecated. HConnection.isTableAvailable returns true even with not all regions available. -- Key: HBASE-3904 URL: https://issues.apache.org/jira/browse/HBASE-3904 Project: HBase Issue Type: Bug Components: client Reporter: Vidhyashankar Venkataraman Priority: Minor Attachments: 3904.txt This function, as per the javadoc, is supposed to return true iff all the regions in the table are available. But if the table is still being created, this function may return inconsistent results (for example, when a table with a large number of split keys is created). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
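The createTableSync() idea proposed above amounts to polling until the observed region count reaches the expected one. A minimal, library-free sketch of that loop follows; the regionCount supplier is hypothetical and in real HBase code would wrap a scan of .META. for assigned regions:

```java
import java.util.function.IntSupplier;

// Illustrative polling loop for a synchronous "table fully available" check.
public class TableWait {
    /**
     * Poll until regionCount reports at least expectedRegions, or maxAttempts
     * polls have elapsed. Returns true iff the table became fully available.
     */
    static boolean waitForAllRegions(IntSupplier regionCount, int expectedRegions,
                                     int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (regionCount.getAsInt() >= expectedRegions) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis); // back off between .META. polls
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false; // give up if interrupted while waiting
            }
        }
        return false;
    }
}
```

The key difference from isTableAvailable() as described in the report is that this returns only once the count actually matches, rather than reporting availability while assignment is still in flight.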
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037255#comment-13037255 ] Andrew Purtell commented on HBASE-3909: --- Given that Hadoop does not require ZooKeeper, but we do anyway, I wonder if it makes more sense to go our own route and host all of the configuration in the ZooKeeper namespace. It would therefore be possible to make one edit (committed into ZK) and watches on all processes would automatically pull it. The access controller on HBASE-3025 uses this approach for ACLs. Upon cold boot they are loaded from META into znodes. Then all processes open watches on the znode(s). Upon update, the znode is updated, firing the watchers, propagating the change cluster wide. For supporting dynamic configuration, the first process up could populate znode(s) from Configuration; otherwise if the znodes exist configuration would be read from there. Whenever the znode(s) are updated, the changes can be applied to running state by the watcher. How/if the updated configuration should be written back to the config xml files on local disk may be a subject of debate. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
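The edit-once/propagate-everywhere cycle Andrew describes can be modeled in-process, purely for illustration (no real ZooKeeper client here; an actual implementation would use org.apache.zookeeper.ZooKeeper with getData/exists watches, re-registering each watch after it fires):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// In-memory stand-in for a config znode: one set() here notifies every
// registered "watcher", mimicking how a ZK watch would push a config
// change to all masters and region servers cluster-wide.
public class ConfigNode {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final CopyOnWriteArrayList<Consumer<Map.Entry<String, String>>> watchers =
        new CopyOnWriteArrayList<>();

    /** Register a process's watcher; called once per interested daemon. */
    public void watch(Consumer<Map.Entry<String, String>> watcher) {
        watchers.add(watcher);
    }

    /** One committed edit fires every watcher. */
    public void set(String key, String value) {
        data.put(key, value);
        for (Consumer<Map.Entry<String, String>> w : watchers) {
            w.accept(Map.entry(key, value));
        }
    }

    public String get(String key) { return data.get(key); }
}
```

Note that real ZooKeeper watches are one-shot: a watcher that wants further updates must re-read the znode and set a new watch inside its callback, which this toy model glosses over.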
[jira] [Commented] (HBASE-3906) When HMaster is running, there are a lot of RegionLoad instances (far greater than the number of regions); it has a risk of OOME.
[ https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037257#comment-13037257 ] Andrew Purtell commented on HBASE-3906: --- How many of those 3G of objects on the heap are live? When HMaster is running, there are a lot of RegionLoad instances (far greater than the number of regions); it has a risk of OOME. -- Key: HBASE-3906 URL: https://issues.apache.org/jira/browse/HBASE-3906 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.2, 0.90.3 Environment: 1 hmaster, 4 regionservers, about 100,000 regions. Reporter: jian zhang Fix For: 0.90.4 Attachments: HBASE-3906.patch Original Estimate: 168h Remaining Estimate: 168h 1. Start hbase cluster; 2. After hmaster finishes region assignment, use jmap to dump the memory of hmaster; 3. Use MAT to analyse the dump file; there are too many RegionLoad instances, and these instances occupy more than 3G of memory; -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-2077) NullPointerException with an open scanner that expired causing an immediate region server shutdown
[ https://issues.apache.org/jira/browse/HBASE-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-2077: - Attachment: 2077-v4.txt Ahemm.. this is a version that actually works (TestFromClientSide is a good test for this change). NullPointerException with an open scanner that expired causing an immediate region server shutdown -- Key: HBASE-2077 URL: https://issues.apache.org/jira/browse/HBASE-2077 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.2, 0.20.3 Environment: Hadoop 0.20.0, Mac OS X, Java 6 Reporter: Sam Pullara Assignee: Sam Pullara Priority: Critical Fix For: 0.92.0 Attachments: 2077-suggestion.txt, 2077-v4.txt, HBASE-2077-3.patch, HBASE-2077-redux.patch, [Bug_HBASE-2077]_Fixes_a_very_rare_race_condition_between_lease_expiration_and_renewal.patch Original Estimate: 1h Remaining Estimate: 1h 2009-12-29 18:05:55,432 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -4250070597157694417 lease expired 2009-12-29 18:05:55,443 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.NullPointerException at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117) at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641) at java.util.PriorityQueue.siftDown(PriorityQueue.java:612) at java.util.PriorityQueue.poll(PriorityQueue.java:523) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719) at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944) at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) 2009-12-29 18:05:55,446 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 55260, call next(-4250070597157694417, 1) from 192.168.1.90:54011: error: java.io.IOException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:869) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:859) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1965) at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1310) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:136) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127) at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117) at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641) at java.util.PriorityQueue.siftDown(PriorityQueue.java:612) at java.util.PriorityQueue.poll(PriorityQueue.java:523) at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:113) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.nextInternal(HRegion.java:1776) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.next(HRegion.java:1719) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1944) ... 5 more 2009-12-29 18:05:55,447 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037265#comment-13037265 ] Todd Lipcon commented on HBASE-3909: --- I'm always skeptical of the suggestion to store configuration in ZooKeeper. Here's my reasoning: - we already require at least one piece of configuration in the client itself in order to connect to ZooKeeper (i.e. the ZK quorum info and session timeouts, etc) - operations teams are very good at managing text-based configuration files with tools like puppet, cfengine, etc. It's also easy to version-control these kinds of configs, add <!-- comments -->, etc. Moving to ZK makes these tasks more difficult -- we'd need lots of tooling, etc. - If we keep both the text-based and ZK-based, it's easy to accidentally change something in ZK but forget to update in text, so it would revert on next restart. - we currently have the somewhat nice property that nothing in ZK is critical - even if the ZK cluster is completely wiped out, we don't lose any info. This would be a change. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037266#comment-13037266 ] Ted Yu commented on HBASE-3909: --- +1 on Todd's comment. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: Bug Reporter: stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira