[jira] [Commented] (HBASE-1730) Near-instantaneous online schema and table state updates
[ https://issues.apache.org/jira/browse/HBASE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083966#comment-13083966 ] nileema shingte commented on HBASE-1730: Hi Ted, Thank you for reviewing the code! I have incorporated them and posted it on the review board here: https://reviews.apache.org/r/1479/ Near-instantaneous online schema and table state updates Key: HBASE-1730 URL: https://issues.apache.org/jira/browse/HBASE-1730 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 1730-v2.patch, 1730-v3.patch, 1730.patch, HBASE-1730.patch We should not need to take a table offline to update HCD or HTD. One option for that is putting HTDs and HCDs up into ZK, with mirror on disk catalog tables to be used only for cold init scenarios, as discussed on IRC. In this scheme, regionservers hosting regions of a table would watch permanent nodes in ZK associated with that table for schema updates and take appropriate actions out of the watcher. In effect, schema updates become another item in the ToDo list. {{/hbase/tables/table-name/schema}} Must be associated with a write locking scheme also handled with ZK primitives to avoid situations where one concurrent update clobbers another. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy
[ https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083965#comment-13083965 ] stack commented on HBASE-4015: --
bq. Timeout monitor DOES NOT preempt a znode to OFFLINE if in PENDING_OPEN state.
Ok. I think I understand now. The addition of the new state breaks the move to OPENING because the check for a previous OFFLINE state will fail... so the RS will not proceed with the open. But in fig (iii) in your doc you check that the previous state is REALLOCATE? How is this case different from fig (i), where you check for OFFLINE? Won't your code have to check for both REALLOCATE and OFFLINE, with the presence of either meaning it's ok to proceed to OPENING (and then aren't REALLOCATE and OFFLINE the 'same' state, because the presence of either will mean proceed to OPENING)?
I suppose the presence of the RS name will help. If it's the 'same' name, then we can proceed to OPENING, and so what if OFFLINE was hijacked and became a REALLOCATE. If they are not the same, then we'd abort the open. So, why not just add the machine name to OFFLINE? Then we don't need the REALLOCATE state? (Ideally the master would tell the regionserver the version of the znode to expect when it goes to move the znode to OPENING, but that looks hard to pass from the master over to the RS EventHandlers.)
So, figuring out how to deal with the timeout of regions in PENDING_OPEN is one aspect of this issue, right? The verification of state over in the timeout monitor before acting is another aspect?
You are working on TRUNK, Ram? (I believe it acts a little differently from 0.90 because of recent work done in here.)
Good stuff Ram. Thanks for digging into this.
Refactor the TimeoutMonitor to make it less racy Key: HBASE-4015 URL: https://issues.apache.org/jira/browse/HBASE-4015 Project: HBase Issue Type: Sub-task Affects Versions: 0.90.3 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.92.0 Attachments: HBASE-4015_1_trunk.patch, Timeoutmonitor with state diagrams.pdf
The current implementation of the TimeoutMonitor acts like a race condition generator, mostly making things worse rather than better. It does its own thing for a while without caring about what's happening in the rest of the master. The first thing that needs to happen is that the regions should not be processed in one big batch, because that can sometimes take minutes (meanwhile a region that timed out opening might have opened, and it will then be reassigned by the TimeoutMonitor, generating the never-ending PENDING_OPEN situation). Those operations should also be done more atomically, although I'm not sure how to do it in a scalable way in this case.
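stack's question in this thread (why not carry the region server name in the OFFLINE node and let only the named server proceed to OPENING) can be sketched as a guarded state transition. This is a minimal in-memory sketch, not HBase code: the class `RegionNodeSketch`, its methods, and the server names are hypothetical stand-ins for the unassigned znode and the RS open handler.

```java
import java.util.Objects;

/** Hypothetical in-memory stand-in for the data kept in an unassigned znode. */
final class RegionNodeSketch {
    enum State { OFFLINE, OPENING }

    private State state = State.OFFLINE;
    private String assignedServer; // server the master currently expects to open the region

    RegionNodeSketch(String assignedServer) {
        this.assignedServer = assignedServer;
    }

    /** Master side: the timeout monitor forces the node back to OFFLINE for a (possibly new) server. */
    synchronized void forceOffline(String newServer) {
        state = State.OFFLINE;
        assignedServer = newServer;
    }

    /**
     * Region server side: move OFFLINE -> OPENING only if the node is still OFFLINE
     * and names the caller. A server that lost the assignment gets false and aborts its open.
     */
    synchronized boolean tryMoveToOpening(String callerServer) {
        if (state != State.OFFLINE || !Objects.equals(assignedServer, callerServer)) {
            return false;
        }
        state = State.OPENING;
        return true;
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        RegionNodeSketch node = new RegionNodeSketch("rs1");
        node.forceOffline("rs2"); // the timeout monitor reassigns the region to rs2
        System.out.println("rs1 may open: " + node.tryMoveToOpening("rs1"));
        System.out.println("rs2 may open: " + node.tryMoveToOpening("rs2"));
    }
}
```

In this sketch the server name alone decides ownership. It does not distinguish two assignments to the same server, which is the gap the znode version discussed later in the thread would close.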
[jira] [Commented] (HBASE-1730) Near-instantaneous online schema and table state updates
[ https://issues.apache.org/jira/browse/HBASE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083967#comment-13083967 ] jirapos...@reviews.apache.org commented on HBASE-1730: --
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1479/
Review request for Dhruba Borthakur, Ted Yu, Michael Stack, and Jonathan Gray.
Summary
When the master receives an alter table call (addColumn, modifyColumn, deleteColumn, modifyTable), it updates the .tableinfo and then closes all the regions of that table. The patch includes:
1. Changes to reopen the regions when any of the above operations are performed.
2. Best effort is made to preserve the locality of regions by assigning each region a region plan before closing it.
3. Throttling logic that ensures that only a configurable number of regions are closed per region server at a time.
4. alter command in the hbase shell will block until all the regions are updated, providing a status x/y regions updated every second.
5. alter_async command that works exactly like alter, except that it does not block for completion or provide the status.
6. alter_status table_name which is a sync call and blocks to provide the x/y regions updated status per second until all regions are updated.
7. Modification in the unit test for enabling alter without disabling the table.
This addresses bug HBASE-1730. https://issues.apache.org/jira/browse/HBASE-1730
Diffs
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java f151c77
src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 13c8b8c
src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c0aa024
src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 49d1e7c
src/main/java/org/apache/hadoop/hbase/master/BulkReOpen.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/HMaster.java 8beeb68
src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 57c1140
src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java ae43837
src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java 09891aa
src/main/ruby/hbase/admin.rb 4460d6e
src/main/ruby/shell.rb 1ec330f
src/main/ruby/shell/commands/alter.rb 1dd43ad
src/main/ruby/shell/commands/alter_async.rb PRE-CREATION
src/main/ruby/shell/commands/alter_status.rb PRE-CREATION
src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java a32dc99
Diff: https://reviews.apache.org/r/1479/diff
Testing
I am putting this up for initial review. I have tested the functionality in a pseudo distributed mode. Need to run unit tests.
Thanks, Nileema
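Item 3 of the review summary (only a configurable number of regions closed per region server at a time) could look roughly like the following. This is a sketch under assumptions, not the patch's BulkReOpen logic: `ReopenThrottleSketch`, `tryStartClose` and `reopened` are hypothetical names, and the limit is passed to the constructor rather than read from HBase configuration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical per-server throttle for close/reopen operations. */
final class ReopenThrottleSketch {
    private final int maxPerServer;
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    ReopenThrottleSketch(int maxPerServer) {
        if (maxPerServer < 1) {
            throw new IllegalArgumentException("maxPerServer must be >= 1");
        }
        this.maxPerServer = maxPerServer;
    }

    private Semaphore permitsFor(String server) {
        // One semaphore per region server, created lazily with the configured limit.
        return permits.computeIfAbsent(server, s -> new Semaphore(maxPerServer));
    }

    /** Returns true if another region on this server may be closed now; never blocks. */
    boolean tryStartClose(String server) {
        return permitsFor(server).tryAcquire();
    }

    /** Call once per successful tryStartClose, after the region has reopened. */
    void reopened(String server) {
        permitsFor(server).release();
    }

    public static void main(String[] args) {
        ReopenThrottleSketch throttle = new ReopenThrottleSketch(2);
        System.out.println(throttle.tryStartClose("rs1"));
        System.out.println(throttle.tryStartClose("rs1"));
        System.out.println(throttle.tryStartClose("rs1")); // limit reached for rs1
        throttle.reopened("rs1");
        System.out.println(throttle.tryStartClose("rs1"));
    }
}
```

A caller would walk the table's regions, skip (and later retry) any region whose server has no permit left, and release the permit when the reopen is confirmed. The sketch does not guard against `reopened` being called without a matching `tryStartClose`.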
[jira] [Commented] (HBASE-4193) Enhance RPC debug logging to provide more details on call contents
[ https://issues.apache.org/jira/browse/HBASE-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083982#comment-13083982 ] Hudson commented on HBASE-4193: --
Integrated in HBase-TRUNK #2111 (See [https://builds.apache.org/job/HBase-TRUNK/2111/]) HBASE-4193 Enhance RPC debug logging with details on call contents
garyh : Files :
* /hbase/trunk/conf/log4j.properties
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Objects.java
Enhance RPC debug logging to provide more details on call contents -- Key: HBASE-4193 URL: https://issues.apache.org/jira/browse/HBASE-4193 Project: HBase Issue Type: Improvement Components: ipc Reporter: Gary Helmling Assignee: Gary Helmling Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4193.patch, HBASE-4193_final.patch
The current HBaseServer debug logging, while verbose, doesn't provide much information on the actual contents of RPC calls being handled. This makes it difficult to diagnose why some calls may take much longer to process than others. Having more information on the size of client calls, and the contents of those calls (especially in the case of batch or multi operations), would provide a lot more context for tracking down issues.
[jira] [Resolved] (HBASE-4186) No region is added to regionsInTransitionInRS
[ https://issues.apache.org/jira/browse/HBASE-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-4186. --- Resolution: Fixed Hadoop Flags: [Reviewed] No region is added to regionsInTransitionInRS - Key: HBASE-4186 URL: https://issues.apache.org/jira/browse/HBASE-4186 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.90.5 Attachments: 4186.txt We have a skip list set called regionsInTransitionInRS (introduced in HBASE-3741) where we try to maintain a list to know the currently processing regions for closing and opening. In open region handler we are trying to throw an error if the regions are in transition on that RS when we get an open call for the same region. But we are not adding the region into the set anywhere. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3741) Make HRegionServer aware of the regions it's opening/closing
[ https://issues.apache.org/jira/browse/HBASE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-3741: -- Fix Version/s: (was: 0.90.5) 0.90.3 Make HRegionServer aware of the regions it's opening/closing Key: HBASE-3741 URL: https://issues.apache.org/jira/browse/HBASE-3741 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Blocker Fix For: 0.90.3 Attachments: HBASE-3741-rsfix-v2.patch, HBASE-3741-rsfix-v3.patch, HBASE-3741-rsfix.patch, HBASE-3741-trunk.patch This is a serious issue about a race between regions being opened and closed in region servers. We had this situation where the master tried to unassign a region for balancing, failed, force unassigned it, force assigned it somewhere else, failed to open it on another region server (took too long), and then reassigned it back to the original region server. A few seconds later, the region server processed the first closed and the region was left unassigned. This is from the master log: {quote} 11-04-05 15:11:17,758 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=sv4borg42,60020,1300920459477, load=(requests=187, regions=574, usedHeap=3918, maxHeap=6973) for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=PENDING_CLOSE, ts=1302041477758 2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 ... 
2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=CLOSED, ts=1302041685733 2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x42ec2cece810b68 Creating (or updating) unassigned node for 1470298961 with OFFLINE state ... 2011-04-05 15:14:45,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961; plan=hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961, src=sv4borg42,60020,1300920459477, dest=sv4borg40,60020,1302041218196 2011-04-05 15:14:45,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 to sv4borg40,60020,1302041218196 2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=PENDING_OPEN, ts=1302041700944 2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=PENDING_OPEN, ts=1302041700944 ... 
2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 so generated a random one; hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961, src=, dest=sv4borg42,60020,1300920459477; 19 (online=19, exclude=null) available servers 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 to sv4borg42,60020,1300920459477 2011-04-05 15:15:40,951 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:6-0x42ec2cece810b68 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected,
[jira] [Commented] (HBASE-1730) Near-instantaneous online schema and table state updates
[ https://issues.apache.org/jira/browse/HBASE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083990#comment-13083990 ] jirapos...@reviews.apache.org commented on HBASE-1730: --
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1479/#review1416
Please test this patch using a real cluster.
src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java https://reviews.apache.org/r/1479/#comment3273
Pair implements Serializable which is handled specially by HbaseObjectWritable (see line 350 below). Why is this needed?
- Ted
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084003#comment-13084003 ] jirapos...@reviews.apache.org commented on HBASE-4027: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1214/ --- (Updated 2011-08-12 08:41:37.017483) Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, Jonathan Gray, and Li Pi. Changes --- fixed test failure, as ted yu reported. Summary --- Review request - I apparently can't edit tlipcon's earlier posting of my diff, so creating a new one. This addresses bug HBase-4027. https://issues.apache.org/jira/browse/HBase-4027 Diffs (updated) - conf/hbase-env.sh 2d55d27 src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCache.java 2d4002c src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java 097dc50 src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java 1338453 src/main/java/org/apache/hadoop/hbase/io/hfile/SimpleBlockCache.java 886c31d src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/io/hfile/slab/Slab.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 7a917da src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 7b7bf73 src/main/java/org/apache/hadoop/hbase/util/DirectMemoryUtils.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/io/hfile/HFileBlockCacheTestUtils.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/io/hfile/SingleSizeCacheTestUtils.java PRE-CREATION 
src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlab.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 4387170 Diff: https://reviews.apache.org/r/1214/diff Testing --- Ran benchmarks against it in HBase standalone mode. Wrote test cases for all classes, multithreaded test cases exist for the cache. Thanks, Li Enable direct byte buffers LruBlockCache Key: HBASE-4027 URL: https://issues.apache.org/jira/browse/HBASE-4027 Project: HBase Issue Type: Improvement Reporter: Jason Rutherglen Assignee: Li Pi Priority: Minor Attachments: 4027-v5.diff, 4027v7.diff, HBase-4027 (1).pdf, HBase-4027.pdf, HBase4027v8.diff, HBase4027v9.diff, hbase-4027-v10.5.diff, hbase-4027-v10.diff, hbase-4027v10.6.diff, hbase-4027v6.diff, hbase4027v11.5.diff, hbase4027v11.diff, slabcachepatch.diff, slabcachepatchv2.diff, slabcachepatchv3.1.diff, slabcachepatchv3.2.diff, slabcachepatchv3.diff, slabcachepatchv4.5.diff, slabcachepatchv4.diff Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an documented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
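The description's starting point, direct byte buffers allocated outside the Java heap, can be illustrated with a small fixed-size-block slab. This is only a sketch under assumptions: `DirectSlabSketch` and its methods are hypothetical and are not the patch's `Slab` or `SingleSizeCache` classes. It relies solely on the standard `ByteBuffer.allocateDirect` and `slice` calls.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical slab: one off-heap allocation carved into equally sized blocks. */
final class DirectSlabSketch {
    private final ByteBuffer slab;
    private final Deque<ByteBuffer> freeList = new ArrayDeque<>();

    DirectSlabSketch(int blockCount, int blockSize) {
        // A single direct allocation; its memory lives outside the garbage-collected heap.
        this.slab = ByteBuffer.allocateDirect(blockCount * blockSize);
        for (int i = 0; i < blockCount; i++) {
            slab.limit((i + 1) * blockSize);
            slab.position(i * blockSize);
            freeList.push(slab.slice()); // view over bytes [i * blockSize, (i + 1) * blockSize)
        }
    }

    /** Hands out a free block, or null when every block is in use. */
    synchronized ByteBuffer alloc() {
        ByteBuffer block = freeList.poll();
        if (block != null) {
            block.clear(); // reset position and limit; contents are not zeroed
        }
        return block;
    }

    /** Returns a block to the slab so that it can be reused. */
    synchronized void release(ByteBuffer block) {
        freeList.push(block);
    }

    synchronized int freeBlocks() {
        return freeList.size();
    }

    public static void main(String[] args) {
        DirectSlabSketch slab = new DirectSlabSketch(4, 1024);
        ByteBuffer block = slab.alloc();
        System.out.println("direct=" + block.isDirect() + " free=" + slab.freeBlocks());
        slab.release(block);
    }
}
```

Reusing blocks from one long-lived allocation sidesteps the question of when individual direct buffers get freed. Releasing the slab's memory itself is outside what this sketch covers.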
[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy
[ https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084007#comment-13084007 ] ramkrishna.s.vasudevan commented on HBASE-4015: --
bq. You are working on TRUNK Ram?
Yes, Stack.
bq. Won't your code have to check for both REALLOCATE and OFFLINE and the presence of either mean its ok to proceed to OPENING (and then aren't REALLOCATE and OFFLINE the 'same' state because the presence of either will mean proceed to OPENING?).
Yes, this is what my patch does. Why do we do the same operation for both states? Previously, if the state had changed to anything other than OFFLINE while moving to OPENING, we aborted. Now there is an additional state which says it is ok to go to OPENING if you find the node in RE_ALLOCATE and the server name in the node is the same as your RS address. This avoids a region getting hijacked unnecessarily even though the RS was doing its work correctly.
bq. So, why not just add machine name to OFFLINE? Then we don't need REALLOCATE state?
As you have already said, currently no version is passed from master to RS. That's why a new state is needed. If that had been possible, OFFLINE with the version passed by the master would have been sufficient.
bq. So, figuring how to do deal with timeout of regions in PENDING_OPEN is one aspect of this issue, right? The verification of state over in timeout monitor before acting is another aspect?
Yes Stack, we have covered both these aspects and also the points raised by JD: taking action on timeout immediately, and a mechanism for both master and RS to know what happened as part of the timeout, so that whoever wins the race succeeds.
bq. (I believe it acts a little differently from 0.90 because of recent work done in here).
Regarding the timeout monitor, the one major change is that the CLOSING state node is now created by the master itself, whereas in 0.90 it was created by the RS. Apart from this I have not found any big difference so far.
As part of HBASE-4083 we have introduced return types from OpenRegionHandler, which take care of scenarios where the master's change to RE_ALLOCATE races with the RS moving to OPENED.
[jira] [Commented] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084062#comment-13084062 ] ramkrishna.s.vasudevan commented on HBASE-4175: --
@Ted, Currently, as you mentioned, FSUtils.createTableDescriptor() does not throw IOE. So I will make it throw IOE. Regarding the case where the table already exists, there is a check
{code}
if (fs.exists(tableInfoPath) && fs.getFileStatus(tableInfoPath).getLen() > 0) {
  LOG.info("TableInfo already exists.. Skipping creation");
}
{code}
So I think we need not add any FSTableDescriptor.get(). Please suggest. Also, what should the behaviour be if the table already exists? As you had already asked, do we need to forcefully create? For that we need to introduce a new API for forceful creation. In my current patch I am planning to return true or false, and if an IOE happens it will be thrown to the caller.
Fix FSUtils.createTableDescriptor() --- Key: HBASE-4175 URL: https://issues.apache.org/jira/browse/HBASE-4175 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: ramkrishna.s.vasudevan
Currently createTableDescriptor() doesn't return anything. The caller wouldn't know whether the descriptor is created or not. See exception handling:
{code}
} catch(IOException ioe) {
  LOG.info("IOException while trying to create tableInfo in HDFS", ioe);
}
{code}
We should return a boolean. If the table descriptor exists already, maybe we should deserialize from hdfs and compare with htableDescriptor argument. If they differ, I am not sure what the proper action would be. Maybe we can add a boolean argument, force, to createTableDescriptor(). When force is true, existing table descriptor would be overwritten.
[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy
[ https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084060#comment-13084060 ] ramkrishna.s.vasudevan commented on HBASE-4015: --
@Stack, I was looking at the possibility of using the OFFLINE state. A few thoughts:
- We need to change the behaviour in all the cases in the timeout monitor to preempt the node to OFFLINE with the RS name.
- Before changing to OFFLINE, see what the state is in the RS. If it is still OFFLINE/OPENING, change it to OFFLINE + server name.
- After changing it to OFFLINE, get the latest version and pass it from the master to the RS, which in turn passes it to the OpenRegionHandler.
- This is needed on the transition from OFFLINE to OPENING, to tell whether the current transition is for the timeout call or whether a previous OFFLINE to OPENING transition did not happen.
- The server name is also necessary to avoid processing of the transition by an RS that is no longer the owner of the znode.
- Even in the normal assign flow we need to add, along with OFFLINE, the server name of the RS that will process the unassigned node.
These are the high-level changes we would need to make in the current patch if we want to avoid the new state.
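The version hand-off discussed in this thread (the master passes the znode version to the RS, and the RS moves OFFLINE to OPENING only if that version is still current) is a compare-and-set on a versioned node. ZooKeeper's `setData(path, data, version)` behaves this way and fails when the expected version does not match. The sketch below only imitates that in memory; `VersionedNodeSketch` and the state strings are hypothetical, and no ZooKeeper client is involved.

```java
/** Hypothetical in-memory imitation of a znode with a data version. */
final class VersionedNodeSketch {
    private String data;
    private int version; // bumped on every successful write

    VersionedNodeSketch(String data) {
        this.data = data;
    }

    synchronized String data() {
        return data;
    }

    synchronized int version() {
        return version;
    }

    /** Master side: unconditional write; returns the new version to hand to the RS. */
    synchronized int forceSet(String newData) {
        data = newData;
        return ++version;
    }

    /** RS side: write only if nobody has written since expectedVersion was handed out. */
    synchronized boolean setIfVersion(String newData, int expectedVersion) {
        if (version != expectedVersion) {
            return false; // node changed underneath us (for example, a timeout reassignment)
        }
        data = newData;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedNodeSketch node = new VersionedNodeSketch("OFFLINE");
        int first = node.forceSet("OFFLINE rs1");  // original assignment
        int second = node.forceSet("OFFLINE rs1"); // timeout monitor re-issues it
        System.out.println("stale open accepted: " + node.setIfVersion("OPENING rs1", first));
        System.out.println("current open accepted: " + node.setIfVersion("OPENING rs1", second));
    }
}
```

Unlike a server-name check alone, the version distinguishes two assignments of the same region to the same server, which is the case raised in the fourth point of the list above.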
[jira] [Updated] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4175: -- Attachment: HBASE-4175.patch
[jira] [Commented] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084124#comment-13084124 ]

Ted Yu commented on HBASE-4175:
-------------------------------

We should add a boolean parameter, force, to FSUtils.createTableDescriptor(). If the table already exists and the force parameter is false, FSUtils.createTableDescriptor() can simply return false.

Fix FSUtils.createTableDescriptor()
Key: HBASE-4175
URL: https://issues.apache.org/jira/browse/HBASE-4175
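A minimal sketch of the proposed contract, assuming an in-memory map in place of HDFS. DescriptorStore and read are hypothetical names; only the boolean return value and the force parameter come from the discussion above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for FSUtils: a map plays the role of the tableinfo files in HDFS.
class DescriptorStore {
  private final ConcurrentMap<String, String> tableInfo = new ConcurrentHashMap<>();

  /**
   * Stores the serialized descriptor for a table.
   * Returns true if it was written, false if one already existed and force was false.
   */
  boolean createTableDescriptor(String tableName, String descriptor, boolean force) {
    if (force) {
      tableInfo.put(tableName, descriptor); // overwrite whatever is there
      return true;
    }
    // putIfAbsent returns null only when no descriptor was present before
    return tableInfo.putIfAbsent(tableName, descriptor) == null;
  }

  String read(String tableName) {
    return tableInfo.get(tableName);
  }
}
```

The caller can then tell "created" apart from "already there", which the current void method does not allow.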
[jira] [Commented] (HBASE-4195) Possible unconsistency in a memstore read after a reseek, possible performance improvement
[ https://issues.apache.org/jira/browse/HBASE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084160#comment-13084160 ]

nkeywal commented on HBASE-4195:
--------------------------------

The issue with the implementation that calls only seek is that we can see writes in progress. From my understanding, that should not be the case (and if it is allowed, then there is an issue in the test case itself). The error is this assert: Assert.assertEquals("i=" + i, expectedCount, result.size());, which is different from the one mentioned in HBASE-3855.

If I change the reseek implementation to something that does not call seek at all, like:
{noformat}
public boolean reseek(KeyValue key) {
  while (kvsetNextRow != null && comparator.compare(kvsetNextRow, key) < 0) {
    kvsetNextRow = getNext(kvsetIt);
  }
  while (snapshotNextRow != null && comparator.compare(snapshotNextRow, key) < 0) {
    snapshotNextRow = getNext(snapshotIt);
  }
  numIterReseek = 0;
  return (kvsetNextRow != null || snapshotNextRow != null);
}
{noformat}
then the whole test works fine. So it seems the issue really comes from using seek. I think the current implementation has the same issue. Maybe we don't see it often (or at all) because seek is rarely called, for the reasons mentioned in points 2 and 3 of the analysis in the issue description.

Can someone confirm that we should not see partial writes in this case?
Possible unconsistency in a memstore read after a reseek, possible performance improvement
Key: HBASE-4195
URL: https://issues.apache.org/jira/browse/HBASE-4195
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.4
Environment: all
Reporter: nkeywal
Priority: Critical

This follows the discussion around HBASE-3855, and the random errors (20% failure on trunk) on the unit test org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting.

I saw some points related to numIterReseek, used in MemStoreScanner#getNext (line 690):
{noformat}
679   protected KeyValue getNext(Iterator<KeyValue> it) {
680     KeyValue ret = null;
681     long readPoint = ReadWriteConsistencyControl.getThreadReadPoint();
682     //DebugPrint.println( " MS@" + hashCode() + ": threadpoint = " + readPoint);
683
684     while (ret == null && it.hasNext()) {
685       KeyValue v = it.next();
686       if (v.getMemstoreTS() <= readPoint) {
687         // keep it.
688         ret = v;
689       }
690       numIterReseek--;
691       if (numIterReseek == 0) {
692         break;
693       }
694     }
695     return ret;
696   }
{noformat}
This function is called by seek, reseek, and next. The numIterReseek counter is only useful for reseek.

There are some issues. I am not totally sure they are the root cause of the test case error, but they could partly explain the randomness of the error, and one point is definitely a bug.

1) In getNext, numIterReseek is decreased, then compared to zero. The seek function sets numIterReseek to zero before calling getNext. This means the value will actually be negative, hence the test will always fail, and the loop will continue. It is the expected behaviour, but it's quite smart.

2) In reseek, numIterReseek is not set between the loops on the two iterators. If numIterReseek is equal to zero after the loop on the first one, the loop on the second one will never call seek, as numIterReseek will be negative.

3) Still in reseek, the test to call seek is (kvsetNextRow == null && numIterReseek == 0). In other words, if kvsetNextRow is not null when numIterReseek equals zero, numIterReseek will become negative at the next iteration and seek will never be called.

4) You can have side effects if reseek ends with numIterReseek > 0: the following calls to the next function will decrease numIterReseek to zero, and getNext will break instead of continuing the loop. As a result, later calls to next() may or may not return null, depending on how the default value for numIterReseek is configured.

To check whether the issue comes from point 4, you can set numIterReseek to zero before returning in reseek:
{noformat}
  numIterReseek = 0;
  return (kvsetNextRow != null || snapshotNextRow != null);
}
{noformat}
In my environment, on trunk, it seems to work, but as the failure is random I am not really sure. I also had to modify the test (I added a loop) to make it fail more often; the original test was working quite well here.

It has to be confirmed that this totally fixes (it could be partial or unrelated) org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting before implementing a
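Points 1 and 4 of the description can be reproduced with a small stand-alone model. This is a sketch, not the MemStoreScanner code: CountdownScan is a hypothetical class, integers stand in for KeyValues, and an element counts as visible when it is less than or equal to readPoint.

```java
import java.util.Iterator;

// Simplified model of the decrement-then-compare loop in the quoted getNext.
class CountdownScan {
  int numIterReseek; // same name as the field discussed above

  // Returns the first visible element, or null. Stops early only when the
  // counter reaches exactly zero after being decremented.
  Integer getNext(Iterator<Integer> it, int readPoint) {
    Integer ret = null;
    while (ret == null && it.hasNext()) {
      Integer v = it.next();
      if (v <= readPoint) {
        ret = v; // visible at this read point
      }
      numIterReseek--;
      if (numIterReseek == 0) {
        break;
      }
    }
    return ret;
  }
}
```

Starting from 0 the counter only goes negative, so the scan runs to the first visible element (point 1). Starting from a leftover positive value, the scan can stop before reaching a visible element and return null (point 4).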
[jira] [Commented] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084269#comment-13084269 ]

Lars Hofhansl commented on HBASE-4197:
--------------------------------------

I attached a minimal patch that makes it work for me. I am not happy with the patch, though, for two reasons:
1. isFilterDone() now needs to be public.
2. If the regionserver can only ever deal with RegionScanners, maybe all the interfaces in coprocessors should also take RegionScanner instead.

RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
Key: HBASE-4197
URL: https://issues.apache.org/jira/browse/HBASE-4197
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Attachments: 4197.txt

Returning just an InternalScanner from RegionObserver.{pre|post}OpenScanner leads to the following exception when using the scanner.

java.io.IOException: InternalScanner implementation is expected to be HRegion.RegionScanner.
  at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2023)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:616)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:314)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1225)

The problem is in HRegionServer.next(...):

  InternalScanner s = this.scanners.get(scannerName);
  ...
  // Call coprocessor. Get region info from scanner.
  HRegion region = null;
  if (s instanceof HRegion.RegionScanner) {
    HRegion.RegionScanner rs = (HRegion.RegionScanner) s;
    region = getRegion(rs.getRegionName().getRegionName());
  } else {
    throw new IOException("InternalScanner implementation is expected " +
        "to be HRegion.RegionScanner.");
  }

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084311#comment-13084311 ]

jirapos...@reviews.apache.org commented on HBASE-4027:
------------------------------------------------------

bq. On 2011-08-12 17:26:40, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java, lines 101-103
bq. https://reviews.apache.org/r/1214/diff/8/?file=31771#file31771line101
bq.
bq. this can race against getBlock() though:
bq.
bq. Thread A: backingMap.get(key) returns object
bq. Thread B: put() returns same object
bq. Thread B: free(object)
bq. Thread A: use object. boom?
bq.
bq. putIfAbsent shouldn't be any slower than put, may as well make use of it

Ah, gotcha! I see it now. Fixed.

- Li

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1214/#review1423
-----------------------------------------------------------

On 2011-08-12 08:41:37, Li Pi wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1214/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-08-12 08:41:37)
bq.
bq. Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, Jonathan Gray, and Li Pi.
bq.
bq. Summary
bq. -------
bq.
bq. Review request - I apparently can't edit tlipcon's earlier posting of my diff, so creating a new one.
bq.
bq. This addresses bug HBase-4027.
bq. https://issues.apache.org/jira/browse/HBase-4027
bq.
bq. Diffs
bq. -----
bq.
bq.   conf/hbase-env.sh 2d55d27
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCache.java 2d4002c
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/CacheStats.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java 097dc50
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java 1338453
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/SimpleBlockCache.java 886c31d
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/slab/Slab.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 7a917da
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 7b7bf73
bq.   src/main/java/org/apache/hadoop/hbase/util/DirectMemoryUtils.java PRE-CREATION
bq.   src/test/java/org/apache/hadoop/hbase/io/hfile/HFileBlockCacheTestUtils.java PRE-CREATION
bq.   src/test/java/org/apache/hadoop/hbase/io/hfile/SingleSizeCacheTestUtils.java PRE-CREATION
bq.   src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java PRE-CREATION
bq.   src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlab.java PRE-CREATION
bq.   src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java PRE-CREATION
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 4387170
bq.
bq. Diff: https://reviews.apache.org/r/1214/diff
bq.
bq. Testing
bq. -------
bq.
bq. Ran benchmarks against it in HBase standalone mode. Wrote test cases for all classes, multithreaded test cases exist for the cache.
bq.
bq. Thanks,
bq.
bq. Li

Enable direct byte buffers LruBlockCache
Key: HBASE-4027
URL: https://issues.apache.org/jira/browse/HBASE-4027
Project: HBase
Issue Type: Improvement
Reporter: Jason Rutherglen
Assignee: Li Pi
Priority: Minor
Attachments: 4027-v5.diff, 4027v7.diff, HBase-4027 (1).pdf, HBase-4027.pdf, HBase4027v8.diff, HBase4027v9.diff, hbase-4027-v10.5.diff, hbase-4027-v10.diff, hbase-4027v10.6.diff, hbase-4027v6.diff, hbase4027v11.5.diff, hbase4027v11.diff, slabcachepatch.diff, slabcachepatchv2.diff, slabcachepatchv3.1.diff, slabcachepatchv3.2.diff, slabcachepatchv3.diff, slabcachepatchv4.5.diff, slabcachepatchv4.diff

Java offers the creation of direct byte buffers which are allocated outside of the heap. They need to be manually free'd, which can be accomplished using an documented {{clean}} method. The feature will be optional. After implementing, we can benchmark for differences in speed and garbage collection observances.

--
This message is automatically generated
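The race Todd describes, and the putIfAbsent suggestion, can be illustrated with a small sketch. This is not the SingleSizeCache code: TinyBlockCache, cacheBlock and recycle are hypothetical names, and "recycling" a buffer is reduced to a counter.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache: an entry that readers may already hold is never replaced or recycled on insert.
class TinyBlockCache {
  private final ConcurrentMap<String, ByteBuffer> backingMap = new ConcurrentHashMap<>();
  private final AtomicInteger recycled = new AtomicInteger();

  /** Returns true if this block was stored, false if another block was already cached for the key. */
  boolean cacheBlock(String key, ByteBuffer block) {
    ByteBuffer existing = backingMap.putIfAbsent(key, block);
    if (existing != null) {
      // Lost the race: recycle our own buffer, which was never published,
      // instead of the one a concurrent getBlock() caller may be using.
      recycle(block);
      return false;
    }
    return true;
  }

  ByteBuffer getBlock(String key) {
    return backingMap.get(key);
  }

  int recycledCount() {
    return recycled.get();
  }

  private void recycle(ByteBuffer buffer) {
    buffer.clear(); // stand-in for handing the buffer back to a slab
    recycled.incrementAndGet();
  }
}
```

With a plain put(), the previous value would be returned and freed while Thread A could still be reading it. Eviction would presumably need its own protection; this sketch covers only the insert path.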
[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy
[ https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084333#comment-13084333 ]

Jonathan Gray commented on HBASE-4015:
--------------------------------------

Sorry I'm a little late to this discussion but I like the idea of not adding a new state. Instead, we can just pass the znode version number in the RPC to the regionservers. Or encode the servername in the znode.

Refactor the TimeoutMonitor to make it less racy
Key: HBASE-4015
URL: https://issues.apache.org/jira/browse/HBASE-4015
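The version-number variant can be sketched as a compare-and-set on an in-memory node. VersionedZNode and its methods are hypothetical; a real implementation would presumably lean on ZooKeeper's versioned setData, which rejects a write whose expected version does not match.

```java
// Hypothetical in-memory model of a znode whose version increases on every write.
class VersionedZNode {
  private String state;
  private int version;

  VersionedZNode(String initialState) {
    this.state = initialState;
    this.version = 0;
  }

  // Master side: unconditional write. Returns the new version, to be sent to the RS in the RPC.
  synchronized int forceState(String newState) {
    state = newState;
    return ++version;
  }

  // RS side: the transition succeeds only if nobody has touched the node since expectedVersion.
  synchronized boolean transition(String expectedState, int expectedVersion, String newState) {
    if (!state.equals(expectedState) || version != expectedVersion) {
      return false;
    }
    state = newState;
    version++;
    return true;
  }

  synchronized String state() {
    return state;
  }
}
```

A region server still holding the version from before the TimeoutMonitor re-forced OFFLINE then fails its OFFLINE to OPENING transition, while the server given the new version succeeds.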
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084358#comment-13084358 ]

jirapos...@reviews.apache.org commented on HBASE-4027:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1214/
-----------------------------------------------------------

(Updated 2011-08-12 20:26:21.230751)

Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, Jonathan Gray, and Li Pi.

Changes
-------

Fixed another broken test case (didn't reset buffer position before doing compare) and fixed race.

This addresses bug HBase-4027.
https://issues.apache.org/jira/browse/HBase-4027

Diff: https://reviews.apache.org/r/1214/diff

Thanks,

Li

Enable direct byte buffers LruBlockCache
Key: HBASE-4027
URL: https://issues.apache.org/jira/browse/HBASE-4027
[jira] [Updated] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4027:
------------------------
Attachment: hbase4027v11.6.diff

Fixed broken test and race condition.

Enable direct byte buffers LruBlockCache
Key: HBASE-4027
URL: https://issues.apache.org/jira/browse/HBASE-4027
[jira] [Commented] (HBASE-4196) TableRecordReader may skip first row of region
[ https://issues.apache.org/jira/browse/HBASE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084421#comment-13084421 ]

Ted Yu commented on HBASE-4196:
-------------------------------

Patch looks good. There are two TableRecordReaderImpl.java files, one under mapred and one under mapreduce. Both of them should be fixed.

TableRecordReader may skip first row of region
Key: HBASE-4196
URL: https://issues.apache.org/jira/browse/HBASE-4196
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.90.4
Reporter: Jan Lukavsky
Assignee: Ming Ma
Attachments: HBASE-4196-trunk.patch

After the following scenario, the first record of the region is skipped without being sent to the Mapper:
- the reader is initialized with TableRecordReader.init()
- then nextKeyValue is called, causing a call to scanner.next()
- here a ScannerTimeoutException occurs
- the scanner is restarted by a call to restart(), and then *two* calls to scanner.next() occur, so the first row is lost

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
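One way to avoid the lost row is to skip a row after a restart only when a row has already been delivered. The sketch below shows that idea with a simulated timeout; it is not taken from the attached patch, and RestartingReader, failOnce and the list-backed scanner are all hypothetical.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical reader over a sorted list of row keys, with one injectable scanner failure.
class RestartingReader {
  private final List<String> rows;
  private Iterator<String> scanner;
  private String lastSuccessfulRow; // null until a row has been handed to the caller
  private boolean failNext;

  RestartingReader(List<String> rows) {
    this.rows = rows;
    restart(null);
  }

  // Simulates a scanner timeout on the next scanner call.
  void failOnce() {
    failNext = true;
  }

  // Re-opens the scan at the given row (inclusive), or at the start when it is null.
  private void restart(String fromRowInclusive) {
    int start = fromRowInclusive == null ? 0 : rows.indexOf(fromRowInclusive);
    scanner = rows.subList(start, rows.size()).iterator();
  }

  private String scannerNext() {
    if (failNext) {
      failNext = false;
      throw new IllegalStateException("simulated scanner timeout");
    }
    return scanner.hasNext() ? scanner.next() : null;
  }

  String nextRow() {
    String row;
    try {
      row = scannerNext();
    } catch (IllegalStateException timeout) {
      restart(lastSuccessfulRow);
      if (lastSuccessfulRow != null) {
        scannerNext(); // skip the row that was already delivered
      }
      row = scannerNext();
    }
    if (row != null) {
      lastSuccessfulRow = row;
    }
    return row;
  }
}
```

If the timeout hits before any row was returned, nothing is skipped and the first row is still delivered. If it hits mid-scan, exactly the already-delivered row is skipped.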
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084425#comment-13084425 ]

jirapos...@reviews.apache.org commented on HBASE-4027:
------------------------------------------------------

bq. On 2011-08-12 21:52:43, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java, line 38
bq. https://reviews.apache.org/r/1214/diff/11/?file=32400#file32400line38
bq.
bq. Still some white spaces to remove.

Got it.

bq. On 2011-08-12 21:52:43, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/slab/Slab.java, line 37
bq. https://reviews.apache.org/r/1214/diff/11/?file=32401#file32401line37
bq.
bq. Incorrect class name.

Doh. Fixed.

- Li

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1214/#review1430
-----------------------------------------------------------

Enable direct byte buffers LruBlockCache
Key: HBASE-4027
URL: https://issues.apache.org/jira/browse/HBASE-4027
[jira] [Commented] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084428#comment-13084428 ]

jirapos...@reviews.apache.org commented on HBASE-4027:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1214/
-----------------------------------------------------------

(Updated 2011-08-12 22:30:10.310736)

Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, Jonathan Gray, and Li Pi.

Changes
-------

Fixed two bugs as per Ted Yu's reviews.

This addresses bug HBase-4027.
https://issues.apache.org/jira/browse/HBase-4027

Diff: https://reviews.apache.org/r/1214/diff

Thanks,

Li

Enable direct byte buffers LruBlockCache
Key: HBASE-4027
URL: https://issues.apache.org/jira/browse/HBASE-4027
[jira] [Updated] (HBASE-4027) Enable direct byte buffers LruBlockCache
[ https://issues.apache.org/jira/browse/HBASE-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4027:
------------------------
Attachment: hbase4027v11.7.diff

Fixed two typos.

Enable direct byte buffers LruBlockCache
Key: HBASE-4027
URL: https://issues.apache.org/jira/browse/HBASE-4027
[jira] [Commented] (HBASE-2399) Forced splits only act on the first family in a table
[ https://issues.apache.org/jira/browse/HBASE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084443#comment-13084443 ]

jirapos...@reviews.apache.org commented on HBASE-2399:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1484/
-----------------------------------------------------------

Review request for hbase.

Summary
-------

1. Add tests for forcesplit multi-column-family scenarios.
2. Modify HRegion so that it picks the split point based on the largest store, instead of the first splittable store. This applies to both forcesplit and automatic split.

This addresses bug hbase-2399.
https://issues.apache.org/jira/browse/hbase-2399

Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1157283
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1157283
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java 1157283

Diff: https://reviews.apache.org/r/1484/diff

Testing
-------

Thanks,

Ming

Forced splits only act on the first family in a table
Key: HBASE-2399
URL: https://issues.apache.org/jira/browse/HBASE-2399
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.20.3
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
Labels: moved_from_0_20_5
Fix For: 0.92.0
Attachments: HBASE-2399-test-v1.patch

While working on a patch for HBASE-2375, I came across a few bugs in the existing code related to splits.

If a user triggers a manual split, it flips a forceSplit boolean to true and then triggers a compaction (this is very similar to my current implementation for HBASE-2375). However, the forceSplit boolean is flipped back to false at the beginning of Store.compact(). So the force split only acts on the first family in the table. If that Store is not splittable for some reason (it is empty or has only one row), then the entire region will not be split, regardless of what is in other families.

Even if there is data in the first family, the midKey is determined based solely on that family. If it has two rows and the next family has 1M rows, we pick the split key based on the two rows.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
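The "largest store" rule from the review summary could look roughly like the sketch below. SplitPointChooser and StoreInfo are hypothetical names, and skipping stores without a mid key is an assumption made here, not something the summary states.

```java
import java.util.List;

// Hypothetical helper: choose the region split key from the largest splittable store.
class SplitPointChooser {

  static final class StoreInfo {
    final String family;
    final long sizeBytes;
    final String midKey; // null when the store is empty or has only one row

    StoreInfo(String family, long sizeBytes, String midKey) {
      this.family = family;
      this.sizeBytes = sizeBytes;
      this.midKey = midKey;
    }
  }

  /** Returns the mid key of the largest splittable store, or null if no store can be split. */
  static String chooseSplitPoint(List<StoreInfo> stores) {
    StoreInfo best = null;
    for (StoreInfo store : stores) {
      if (store.midKey == null) {
        continue; // assumption: an unsplittable store should not block the region split
      }
      if (best == null || store.sizeBytes > best.sizeBytes) {
        best = store;
      }
    }
    return best == null ? null : best.midKey;
  }
}
```

For the example in the description, a two-row first family no longer decides the split key when another family holds 1M rows.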
[jira] [Commented] (HBASE-4195) Possible unconsistency in a memstore read after a reseek, possible performance improvement
[ https://issues.apache.org/jira/browse/HBASE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1308#comment-1308 ]

nkeywal commented on HBASE-4195:

With the current implementation, setting the config RESEEKMAX_KEY to -1 (read with conf.getInt(RESEEKMAX_KEY, RESEEKMAX_DEFAULT);) will have this effect. Disclaimer: I did not test it.

Possible unconsistency in a memstore read after a reseek, possible performance improvement
--
Key: HBASE-4195
URL: https://issues.apache.org/jira/browse/HBASE-4195
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.4
Environment: all
Reporter: nkeywal
Priority: Critical

This follows the discussion around HBASE-3855, and the random errors (20% failure on trunk) on the unit test org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting.

I saw some points related to numIterReseek, used in MemStoreScanner#getNext (line 690):

{noformat}
679 protected KeyValue getNext(Iterator<KeyValue> it) {
680   KeyValue ret = null;
681   long readPoint = ReadWriteConsistencyControl.getThreadReadPoint();
682   //DebugPrint.println(" MS@" + hashCode() + ": threadpoint = " + readPoint);
683
684   while (ret == null && it.hasNext()) {
685     KeyValue v = it.next();
686     if (v.getMemstoreTS() <= readPoint) {
687       // keep it.
688       ret = v;
689     }
690     numIterReseek--;
691     if (numIterReseek == 0) {
692       break;
693     }
694   }
695   return ret;
696 }
{noformat}

This function is called by seek, reseek, and next. numIterReseek is only useful for reseek. There are some issues. I am not totally sure they are the root cause of the test case error, but they could partly explain the randomness of the error, and one point is for sure a bug.

1) In getNext, numIterReseek is decremented, then compared to zero. The seek function sets numIterReseek to zero before calling getNext. The value is therefore negative after the decrement, so the == 0 test never succeeds and the loop continues. That is the expected behaviour, but it relies on a rather subtle trick.

2) In reseek, numIterReseek is not reset between the loops on the two iterators. If numIterReseek equals zero after the loop on the first one, the loop on the second one will never call seek, as numIterReseek will be negative.

3) Still in reseek, the test to call seek is (kvsetNextRow == null && numIterReseek == 0). In other words, if kvsetNextRow is not null when numIterReseek equals zero, numIterReseek will become negative at the next iteration and seek will never be called.

4) You can have side effects if reseek ends with numIterReseek > 0: the following calls to next will decrement numIterReseek to zero, and getNext will break instead of continuing the loop. As a result, later calls to next() may or may not return null, depending on how the default value for numIterReseek is configured.

To check whether the issue comes from point 4, you can set numIterReseek to zero before returning in reseek:

{noformat}
  numIterReseek = 0;
  return (kvsetNextRow != null || snapshotNextRow != null);
}
{noformat}

On my env, on trunk, it seems to work, but as the failure is random I am not really sure. I also had to modify the test (I added a loop) to make it fail more often; the original test was working quite well here. It has to be confirmed that this totally fixes org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting (the fix could be partial or unrelated) before implementing a complete solution.
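Points 1 and 4 above can be reproduced with a small stand-alone model of the loop. This is a sketch under assumptions, not the MemStoreScanner code: `ReseekCounterSketch` is a hypothetical class, entries are reduced to their timestamps (`Long`), and the counter is set by hand to mimic what seek (zero) or a finished reseek (a positive leftover) would leave behind.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ReseekCounterSketch {
    private final long readPoint;
    private int numIterReseek; // shared counter, as in the quoted getNext

    ReseekCounterSketch(long readPoint) {
        this.readPoint = readPoint;
    }

    void setCounter(int value) {
        numIterReseek = value;
    }

    /** Same shape as the quoted loop: first visible entry, or null if the counter hits zero. */
    Long getNext(Iterator<Long> it) {
        Long ret = null;
        while (ret == null && it.hasNext()) {
            Long ts = it.next();
            if (ts <= readPoint) {
                ret = ts; // visible at this read point: keep it
            }
            numIterReseek--;
            if (numIterReseek == 0) {
                break; // gives up early, even if a visible entry follows
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        List<Long> entries = Arrays.asList(9L, 8L, 1L); // only the last one is visible at readPoint 5
        ReseekCounterSketch scanner = new ReseekCounterSketch(5L);

        scanner.setCounter(0); // what seek does: counter goes negative, loop never breaks early
        System.out.println("after seek:   " + scanner.getNext(entries.iterator()));

        scanner.setCounter(2); // leftover from a reseek: loop breaks before the visible entry
        System.out.println("after reseek: " + scanner.getNext(entries.iterator()));
    }
}
```

In this model the first call finds the visible entry, while the second gives up after two iterations and returns null, which is the kind of configuration-dependent result described in point 4.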
[jira] [Commented] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084451#comment-13084451 ]

Mingjie Lai commented on HBASE-4197:

@lars Yes, RegionScanner had better be an interface instead of a class, so that it is easier to extend. Overall the patch looks good to me. Can you finish the patch and post it to reviewboard?

RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
--
Key: HBASE-4197
URL: https://issues.apache.org/jira/browse/HBASE-4197
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Attachments: 4197.txt

Returning just an InternalScanner from RegionObserver.{pre|post}OpenScanner leads to the following exception when using the scanner.

java.io.IOException: InternalScanner implementation is expected to be HRegion.RegionScanner.
 at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2023)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:314)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1225)

The problem is in HRegionServer.next(...):

{code}
InternalScanner s = this.scanners.get(scannerName);
...
// Call coprocessor. Get region info from scanner.
HRegion region = null;
if (s instanceof HRegion.RegionScanner) {
  HRegion.RegionScanner rs = (HRegion.RegionScanner) s;
  region = getRegion(rs.getRegionName().getRegionName());
} else {
  throw new IOException("InternalScanner implementation is expected "
      + "to be HRegion.RegionScanner.");
}
{code}
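To make the "interface instead of a class" suggestion concrete, here is a minimal sketch. It assumes simplified, hypothetical signatures (rows as `String`, region identified by a plain name) and does not reproduce the HBase interfaces or the attached patch. The point is only that a wrapping scanner, such as one a coprocessor might return, can satisfy the server without any `instanceof` check or cast.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class RegionScannerSketch {
    /** Simplified stand-in for InternalScanner: appends the next row, returns true if more remain. */
    interface InternalScanner {
        boolean next(List<String> results);
    }

    /** RegionScanner as an interface: any implementation can report its region. */
    interface RegionScanner extends InternalScanner {
        String getRegionName();
    }

    /** Plain scanner over an in-memory row list. */
    static final class BaseScanner implements RegionScanner {
        private final String region;
        private final List<String> rows;
        private int pos;

        BaseScanner(String region, List<String> rows) {
            this.region = region;
            this.rows = rows;
        }

        @Override
        public boolean next(List<String> results) {
            if (pos < rows.size()) {
                results.add(rows.get(pos++));
            }
            return pos < rows.size();
        }

        @Override
        public String getRegionName() {
            return region;
        }
    }

    /** Wrapper of the kind a coprocessor could return; delegates region lookup to the wrapped scanner. */
    static final class UpperCasingScanner implements RegionScanner {
        private final RegionScanner delegate;

        UpperCasingScanner(RegionScanner delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean next(List<String> results) {
            List<String> raw = new ArrayList<>();
            boolean more = delegate.next(raw);
            for (String row : raw) {
                results.add(row.toUpperCase(Locale.ROOT));
            }
            return more;
        }

        @Override
        public String getRegionName() {
            return delegate.getRegionName();
        }
    }

    /** Server-side lookup: works for any RegionScanner, no instanceof or cast needed. */
    static String regionOf(RegionScanner scanner) {
        return scanner.getRegionName();
    }

    public static void main(String[] args) {
        RegionScanner s = new UpperCasingScanner(new BaseScanner("region-1", Arrays.asList("a", "b")));
        List<String> out = new ArrayList<>();
        s.next(out);
        System.out.println(regionOf(s) + " " + out);
    }
}
```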
[jira] [Updated] (HBASE-4196) TableRecordReader may skip first row of region
[ https://issues.apache.org/jira/browse/HBASE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated HBASE-4196:
---
Attachment: HBASE-4196-trunk.patch

Thanks. Here is the update. Also, please note that the mapred version used to handle only UnknownScannerException. It now handles IOException.

TableRecordReader may skip first row of region
--
Key: HBASE-4196
URL: https://issues.apache.org/jira/browse/HBASE-4196
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.90.4
Reporter: Jan Lukavsky
Assignee: Ming Ma
Attachments: HBASE-4196-trunk.patch, HBASE-4196-trunk.patch

In the following scenario, the first record of a region is skipped without being sent to the Mapper:
- the reader is initialized with TableRecordReader.init()
- then nextKeyValue is called, causing a call to scanner.next()
- here a ScannerTimeoutException occurs
- the scanner is restarted by a call to restart(), and then *two* calls to scanner.next() occur, so the first row is lost
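The scenario above can be modelled with an in-memory scanner to show one way of avoiding the skipped row. This is a sketch under assumptions and not the attached patch: `RestartSketch`, `FakeScanner` and `Reader` are hypothetical, the timeout is simulated with an unchecked exception, and the restart logic shown (skip a row after restart only if a row was already delivered) is one plausible fix.

```java
import java.util.Arrays;
import java.util.List;

final class RestartSketch {
    /** Fake scanner over an in-memory row list; can be told to fail on its first call. */
    static final class FakeScanner {
        private final List<String> rows;
        private int pos;
        private boolean failNext;

        FakeScanner(List<String> rows, int start, boolean failNext) {
            this.rows = rows;
            this.pos = start;
            this.failNext = failNext;
        }

        String next() {
            if (failNext) {
                failNext = false;
                throw new IllegalStateException("simulated scanner timeout");
            }
            return pos < rows.size() ? rows.get(pos++) : null;
        }
    }

    /** Reader that restarts the scanner on timeout without losing the first row. */
    static final class Reader {
        private final List<String> rows;
        private FakeScanner scanner;
        private String lastRow; // last row handed to the caller; null before the first row

        Reader(List<String> rows, boolean failFirstCall) {
            this.rows = rows;
            this.scanner = new FakeScanner(rows, 0, failFirstCall);
        }

        String nextRow() {
            String row;
            try {
                row = scanner.next();
            } catch (IllegalStateException timeout) {
                // Restart from the last delivered row, or from the beginning if none was delivered.
                int start = (lastRow == null) ? 0 : rows.indexOf(lastRow);
                scanner = new FakeScanner(rows, start, false);
                if (lastRow != null) {
                    scanner.next(); // skip the row that was already delivered
                }
                row = scanner.next();
            }
            if (row != null) {
                lastRow = row;
            }
            return row;
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader(Arrays.asList("r1", "r2", "r3"), true);
        System.out.println(reader.nextRow()); // the first row survives the timeout on the very first call
    }
}
```

Calling `scanner.next()` twice unconditionally after the restart would return `r2` here, which is the bug described in the issue.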
[jira] [Commented] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084478#comment-13084478 ] Lars Hofhansl commented on HBASE-4197: -- Hey Mingjie, how do I do that? Is there some documentation where I can read about the process? Thanks. -- Lars
[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions
[ https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084479#comment-13084479 ]

jirapos...@reviews.apache.org commented on HBASE-4014:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/969/#review1433
---

Nice work, Eugene. I think we're getting close. Just two suggested improvements below.

The main question still open to debate, I think, is whether or not aborting the server on unhandled exceptions is appropriate. On the one hand, aborting takes the fail-fast approach and makes buggy coprocessors much more visible. It's a lot more likely that a bug will be noticed and fixed if it brings down a region server! On the other hand, I think coprocessors already pose enough of a stability risk to a cluster. I think we should be working to minimize that by containing the impact that a buggy coprocessor can have. If the coprocessor really wants or needs to trigger an abort, it can already do so, since (Master|RegionServer)Services extend Server, which extends Abortable.

I think I'd be more in favor of removing the coprocessor from the active set (we should make this as visible as possible so it's clear the coprocessor is no longer active), or at least wrapping the exception in a DoNotRetryIOException and communicating it back to the client? Maybe both?

I guess I'd be okay with a configuration option to abort on error (I think a single config option is sufficient), as long as it's disabled by default. But that would still imply we need some other handling when the option is disabled.

src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
https://reviews.apache.org/r/969/#comment3299
I would just synchronize the set here: Set<String> coprocessorNames = Collections.synchronizedSet(new HashSet<String>());

src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
https://reviews.apache.org/r/969/#comment3300
If you move this into loadInstance() then you don't have to duplicate it elsewhere, since all the other load methods wind up calling that.

- Gary

On 2011-08-10 22:48:08, Eugene Koontz wrote:
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/969/
bq. ---
bq.
bq. (Updated 2011-08-10 22:48:08)
bq.
bq. Review request for hbase, Gary Helmling and Mingjie Lai.
bq.
bq. Summary
bq. ---
bq.
bq. https://issues.apache.org/jira/browse/HBASE-4014 Coprocessors: Flag the presence of coprocessors in logged exceptions
bq.
bq. The general gist here is to wrap each of {Master,RegionServer}CoprocessorHost's coprocessor call inside a
bq.
bq. try { ... } catch (Throwable e) { handleCoprocessorThrowable(e) }
bq.
bq. block.
bq.
bq. handleCoprocessorThrowable() is responsible for either passing 'e' along to the client (if 'e' is an IOException) or, otherwise, aborting the service (Regionserver or Master).
bq.
bq. The abort message contains a list of the loaded coprocessors for crash analysis.
bq.
bq. This addresses bug HBASE-4014.
bq. https://issues.apache.org/jira/browse/HBASE-4014
bq.
bq. Diffs
bq. -
bq.
bq. src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 18ba6e7
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java 8beeb68
bq. src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java aa930f5
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 23225d7
bq. src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java c44da73
bq. src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorException.java PRE-CREATION
bq. src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorException.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/969/diff
bq.
bq. Testing
bq. ---
bq.
bq. patch includes two tests:
bq.
bq. TestMasterCoprocessorException.java
bq. TestRegionServerCoprocessorException.java
bq.
bq. both tests pass in my build environment.
bq.
bq. Thanks,
bq.
bq. Eugene

Coprocessors: Flag the presence of coprocessors in logged exceptions
Key: HBASE-4014
URL: https://issues.apache.org/jira/browse/HBASE-4014
Project: HBase
Issue Type: Improvement
Components: coprocessors
Reporter: Andrew Purtell
Assignee: Eugene Koontz
Fix
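As a rough illustration of the wrapping described in the review request, the sketch below passes an IOException through to the caller and aborts on anything else, listing the loaded coprocessors in the abort message. It is written under assumptions: `CoprocessorGuardSketch`, `Hook`, `load` and `invoke` are hypothetical, and `Abortable` here is a local one-method interface, not the HBase type. Only the `handleCoprocessorThrowable` name, the synchronized name set and the pass-through-or-abort split come from the thread.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class CoprocessorGuardSketch {
    /** Local stand-in for the server's abort hook (hypothetical, not the HBase interface). */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Hypothetical coprocessor callback that may throw anything. */
    interface Hook {
        void call() throws IOException;
    }

    // Synchronized set of loaded coprocessor names, as suggested in the review.
    private final Set<String> coprocessorNames = Collections.synchronizedSet(new HashSet<String>());
    private final Abortable server;

    CoprocessorGuardSketch(Abortable server) {
        this.server = server;
    }

    void load(String coprocessorName) {
        coprocessorNames.add(coprocessorName);
    }

    /** IOExceptions go back to the client; anything else aborts the service. */
    void handleCoprocessorThrowable(Throwable e) throws IOException {
        if (e instanceof IOException) {
            throw (IOException) e;
        }
        server.abort("Unhandled coprocessor exception; loaded coprocessors: " + coprocessorNames, e);
    }

    /** Wraps one coprocessor call in the try/catch described in the summary. */
    void invoke(Hook hook) throws IOException {
        try {
            hook.call();
        } catch (Throwable e) {
            handleCoprocessorThrowable(e);
        }
    }

    public static void main(String[] args) throws IOException {
        CoprocessorGuardSketch guard =
            new CoprocessorGuardSketch((why, cause) -> System.out.println("ABORT: " + why));
        guard.load("org.example.BuggyObserver"); // hypothetical coprocessor name
        guard.invoke(() -> { throw new IllegalStateException("bug in coprocessor"); });
    }
}
```

The alternative raised in the review (removing the coprocessor from the active set, or wrapping in a DoNotRetryIOException) would replace the `server.abort(...)` branch; that choice is exactly what the thread leaves open.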
[jira] [Updated] (HBASE-4196) TableRecordReader may skip first row of region
[ https://issues.apache.org/jira/browse/HBASE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HBASE-4196: --- Attachment: HBASE-4196-trunk.patch That is due to the svn flag -w used. I have fixed it.
[jira] [Commented] (HBASE-4190) Coprocessors: pull up some cp constants from cp package to o.a.h.h.HConstants
[ https://issues.apache.org/jira/browse/HBASE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084505#comment-13084505 ] jirapos...@reviews.apache.org commented on HBASE-4190: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1461/ --- (Updated 2011-08-13 01:08:04.852897) Review request for hbase. Changes --- Based on stack's comments and an offline discussion with Gary and Andy. - pulled Coprocessor and CoprocessorEnvironment classes to root level, from o.a.h.h.coprocessor to o.a.h.h. - keep cp priority constants still in Coprocessor class. - htd pattern related constants in HConstant What do you think? Summary --- Coprocessors: pull up some cp constants from cp package to o.a.h.h.HConstants This addresses bug HBASE-4190. https://issues.apache.org/jira/browse/HBASE-4190 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/Coprocessor.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/CoprocessorEnvironment.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/HConstants.java dda254d src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java d835582 src/main/java/org/apache/hadoop/hbase/coprocessor/BaseEndpointCoprocessor.java 2fc8f39 src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 506051d src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java ec88a01 src/main/java/org/apache/hadoop/hbase/coprocessor/Coprocessor.java 0290bf2 src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorEnvironment.java 54ccd6f src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 18ba6e7 src/main/java/org/apache/hadoop/hbase/coprocessor/MasterCoprocessorEnvironment.java 5d8cf4c src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java 9349d5b src/main/java/org/apache/hadoop/hbase/coprocessor/RegionCoprocessorEnvironment.java da8076c src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java cfbb29d 
src/main/java/org/apache/hadoop/hbase/coprocessor/WALCoprocessorEnvironment.java 6580c2c src/main/java/org/apache/hadoop/hbase/coprocessor/WALObserver.java b086747 src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java c44da73 src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java 03df574 src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java a81ff84 src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 36816e8 src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java c85146a src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 0ab1339 src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.java 6d31d70 src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java d9f6e5f src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java b4c407b Diff: https://reviews.apache.org/r/1461/diff Testing --- TestClassLoading passed locally. Thanks, Mingjie Coprocessors: pull up some cp constants from cp package to o.a.h.h.HConstants - Key: HBASE-4190 URL: https://issues.apache.org/jira/browse/HBASE-4190 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.90.4 Reporter: Mingjie Lai Assignee: Mingjie Lai Priority: Minor Fix For: 0.90.5 At HBase-3810, stack gave a comment after patch committed: This is a bit odd where a class in the parent package has references to a sub package. Should these classes or at least their constants be pulled up to be at same level as HTableD? Create a new jira where the constants will be pulled from o.a.h.h.regionserver.RegionCoprocessorHost to o.a.h.h.HConstants. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-2399) Forced splits only act on the first family in a table
[ https://issues.apache.org/jira/browse/HBASE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HBASE-2399: --- Attachment: HBASE-2399-trunk.patch Fix the issues raised by Ted.
[jira] [Commented] (HBASE-2399) Forced splits only act on the first family in a table
[ https://issues.apache.org/jira/browse/HBASE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084509#comment-13084509 ] jirapos...@reviews.apache.org commented on HBASE-2399: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1484/#review1438 --- http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java https://reviews.apache.org/r/1484/#comment3332 whitespace http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java https://reviews.apache.org/r/1484/#comment whitespace here and below - Jonathan On 2011-08-12 22:58:55, Ming Ma wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1484/ bq. --- bq. bq. (Updated 2011-08-12 22:58:55) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. 1. Add tests for forcesplit multi-column-family scenarios. bq. 2. Modify HRegion so that it picks splitpoint based on largest store, instead of the first splittable store. It applies to both forcesplit and automatic split. bq. bq. bq. This addresses bug hbase-2399. bq. https://issues.apache.org/jira/browse/hbase-2399 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1157283 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1157283 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java 1157283 bq. bq. Diff: https://reviews.apache.org/r/1484/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Ming bq. bq. 
[jira] [Updated] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4197: - Attachment: 4197-bigger.txt Slightly larger patch that does away with all casting and instanceof nonsense for scanners. Please let me know if you generally agree with the approach, if so I'll get the review started.
[jira] [Commented] (HBASE-4150) Potentially too many connections may be opened if ThreadLocalPool or RoundRobinPool is used
[ https://issues.apache.org/jira/browse/HBASE-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084514#comment-13084514 ]

Ted Yu commented on HBASE-4150:

Integrated to TRUNK. Thanks for the continued effort, Karthick.

Potentially too many connections may be opened if ThreadLocalPool or RoundRobinPool is used
---
Key: HBASE-4150
URL: https://issues.apache.org/jira/browse/HBASE-4150
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Assignee: Karthick Sankarachary
Fix For: 0.92.0
Attachments: 4150-1.txt, 4150.txt, 5140-2.txt, HBASE-4150-DOC.patch, HBASE-4150_final.patch

See the 'Problem with hbase.client.ipc.pool.type=threadlocal in trunk' discussion started by Lars George.

From Lars Hofhansl:

Looking at HBaseClient.getConnection(...) I see this:

{code}
synchronized (connections) {
  connection = connections.get(remoteId);
  if (connection == null) {
    connection = new Connection(remoteId);
    connections.put(remoteId, connection);
  }
}
{code}

At the same time PoolMap.ThreadLocalPool.put is defined like this:

{code}
public R put(R resource) {
  R previousResource = get();
  if (previousResource == null) {
    ...
    if (poolSize.intValue() >= maxSize) {
      return null;
    }
    ...
}
{code}

So... If the ThreadLocalPool reaches its capacity it always returns null, and hence all new threads will create a new connection every time getConnection is called!

I have also verified with a test program that this works fine as long as the number of client threads (which includes the threads in HTable's threadpool, of course) stays within the pool size. Once that is no longer the case, the number of connections explodes and the program dies with OOMEs (mostly because each Connection is associated with yet another thread).

It's not clear what should happen, though. Maybe (1) the ThreadLocalPool should not have a limit, or maybe (2) allocations past the pool size should throw an exception (i.e. there's a hard limit), or maybe (3) in that case a single connection is returned for all threads while the pool is over its limit, or (4) we start round robin with the other connections in the other thread locals.

#1 means that the number of client threads needs to be more carefully managed by the client app. In this case it would also be somewhat pointless for each Connection to have its own thread, since we would just pass stuff between threads. #2 would work, but puts more logic in the client. #3 would lead to hard-to-debug performance issues. And #4 is messy :)

From Ted Yu:

For HBaseClient, at least the javadoc doesn't match:

{code}
 * @param config configuration
 * @return either a {@link PoolType#Reusable} or {@link PoolType#ThreadLocal}
 */
private static PoolType getPoolType(Configuration config) {
  return PoolType.valueOf(config.get(HConstants.HBASE_CLIENT_IPC_POOL_TYPE),
      PoolType.RoundRobin, PoolType.ThreadLocal);
{code}

I think for RoundRobinPool, we shouldn't allow maxSize to be Integer#MAX_VALUE. Otherwise the connection explosion described by Lars may occur.
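The interaction Lars describes (a capped pool whose put silently refuses, combined with a caller that creates a connection whenever the lookup returns null) can be reproduced with a small model. This is a sketch under assumptions: `PoolCapSketch` is hypothetical, threads are modelled by a name passed in explicitly rather than by a real ThreadLocal, connections are just integer ids, and the parts elided with "..." in the quoted put() are filled with the simplest behaviour that matches the description.

```java
import java.util.HashMap;
import java.util.Map;

final class PoolCapSketch {
    private final int maxSize;
    private final Map<String, Integer> slots = new HashMap<>(); // thread name -> connection id
    private int connectionsCreated;

    PoolCapSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Same shape as the quoted put(): once the pool is full, the resource is not stored. */
    Integer put(String thread, Integer connection) {
        Integer previous = slots.get(thread);
        if (previous == null) {
            if (slots.size() >= maxSize) {
                return null; // pool full: the new connection is silently dropped from the pool
            }
            slots.put(thread, connection);
        }
        return previous;
    }

    /** Same shape as the quoted getConnection(): create a connection whenever the lookup misses. */
    Integer getConnection(String thread) {
        Integer connection = slots.get(thread);
        if (connection == null) {
            connection = ++connectionsCreated; // stands in for "new Connection(remoteId)"
            put(thread, connection);
        }
        return connection;
    }

    int connectionsCreated() {
        return connectionsCreated;
    }

    public static void main(String[] args) {
        PoolCapSketch pool = new PoolCapSketch(2);
        for (int i = 0; i < 3; i++) {
            pool.getConnection("t1");
            pool.getConnection("t2");
            pool.getConnection("t3"); // one thread too many: a fresh connection on every call
        }
        System.out.println("connections created: " + pool.connectionsCreated());
    }
}
```

In this model the threads that fit in the pool reuse one connection each, while every call from the extra thread creates another one, which matches the unbounded growth reported above.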
[jira] [Commented] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084516#comment-13084516 ] Ted Yu commented on HBASE-4197: --- I like the cleaner code after the change. I know the following existed prior to your patch: {code} +public HRegionInfo getRegionName(); {code} Can we rename the method to getRegionInfo ? This would make the following code a little easier to understand: {code} region = getRegion(rs.getRegionName().getRegionName()); {code} Thanks for your effort, Lars.
[jira] [Commented] (HBASE-4150) Potentially too many connections may be opened if ThreadLocalPool or RoundRobinPool is used
[ https://issues.apache.org/jira/browse/HBASE-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084518#comment-13084518 ]

Lars Hofhansl commented on HBASE-4150:
--
Thanks for the doc patch Karthick, it explains the trade-offs very nicely.

Potentially too many connections may be opened if ThreadLocalPool or RoundRobinPool is used
---
Key: HBASE-4150
URL: https://issues.apache.org/jira/browse/HBASE-4150
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Assignee: Karthick Sankarachary
Fix For: 0.92.0
Attachments: 4150-1.txt, 4150.txt, 5140-2.txt, HBASE-4150-DOC.patch, HBASE-4150_final.patch

See 'Problem with hbase.client.ipc.pool.type=threadlocal in trunk' discussion started by Lars George.

From Lars Hofhansl:
Looking at HBaseClient.getConnection(...) I see this:
{code}
synchronized (connections) {
  connection = connections.get(remoteId);
  if (connection == null) {
    connection = new Connection(remoteId);
    connections.put(remoteId, connection);
  }
}
{code}
At the same time PoolMap.ThreadLocalPool.put is defined like this:
{code}
public R put(R resource) {
  R previousResource = get();
  if (previousResource == null) {
    ...
    if (poolSize.intValue() >= maxSize) {
      return null;
    }
    ...
}
{code}
So... If the ThreadLocalPool reaches its capacity it always returns null and hence all new threads will create a new connection every time getConnection is called!

I have also verified with a test program that works fine as long as the number of client threads (which include the threads in HTable's threadpool of course) is < poolsize. Once that is no longer the case the number of connections explodes and the program dies with OOMEs (mostly because each Connection is associated with yet another thread).

It's not clear what should happen, though. Maybe (1) the ThreadLocalPool should not have a limit, or maybe (2) allocations past the pool size should throw an exception (i.e. there's a hard limit), or maybe (3) in that case a single connection is returned for all threads while the pool is over its limit, or (4) we start round robin with the other connections in the other thread locals.

#1 means that the number of client threads needs to be more carefully managed by the client app. In this case it would also be somewhat pointless that Connections have their own threads; we just pass stuff between threads.
#2 would work, but puts more logic in the client.
#3 would lead to hard-to-debug performance issues.
And #4 is messy :)

From Ted Yu:
For HBaseClient, at least the javadoc doesn't match:
{code}
 * @param config configuration
 * @return either a {@link PoolType#Reusable} or {@link PoolType#ThreadLocal}
 */
private static PoolType getPoolType(Configuration config) {
  return PoolType.valueOf(config.get(HConstants.HBASE_CLIENT_IPC_POOL_TYPE),
      PoolType.RoundRobin, PoolType.ThreadLocal);
{code}
I think for RoundRobinPool, we shouldn't allow maxSize to be Integer#MAX_VALUE. Otherwise the connection explosion described by Lars may occur.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
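The capacity problem described in this issue can be reproduced in miniature. The sketch below is not the HBase code: `BoundedThreadLocalPool`, `PoolExplosionDemo` and `connectionsCreated` are hypothetical names, plain `Object`s stand in for connections, and `put` is synchronized only to keep the count deterministic. It assumes the get-or-create pattern quoted from HBaseClient.getConnection(...) and a `put` that returns null without storing anything once the pool is full.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a thread-local pool: one slot per thread, capped at maxSize. */
class BoundedThreadLocalPool<R> {
  private final ThreadLocal<R> slot = new ThreadLocal<>();
  private final AtomicInteger poolSize = new AtomicInteger();
  private final int maxSize;

  BoundedThreadLocalPool(int maxSize) {
    this.maxSize = maxSize;
  }

  R get() {
    return slot.get();
  }

  /** Stores the resource for the calling thread, or drops it silently once the pool is full. */
  synchronized R put(R resource) {
    R previous = slot.get();
    if (previous == null) {
      if (poolSize.get() >= maxSize) {
        return null; // at capacity: nothing is remembered for this thread
      }
      poolSize.incrementAndGet();
    }
    slot.set(resource);
    return previous;
  }
}

class PoolExplosionDemo {
  /** Runs the get-or-create pattern on several threads and counts created "connections". */
  static int connectionsCreated(int maxSize, int threads, int callsPerThread) {
    BoundedThreadLocalPool<Object> pool = new BoundedThreadLocalPool<>(maxSize);
    AtomicInteger created = new AtomicInteger();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int call = 0; call < callsPerThread; call++) {
          if (pool.get() == null) {      // nothing cached for this thread
            created.incrementAndGet();   // so a new "connection" is built
            pool.put(new Object());      // and possibly thrown away by the pool
          }
        }
      });
      workers[i].start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return created.get();
  }
}
```

With a pool size of 2, two threads making five calls each create two connections in total; four threads create twelve, because the two threads that never obtain a slot build a fresh connection on every call. Options (2) to (4) in the discussion are different ways of replacing the silent `return null` branch.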
[jira] [Commented] (HBASE-4195) Possible unconsistency in a memstore read after a reseek, possible performance improvement
[ https://issues.apache.org/jira/browse/HBASE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084521#comment-13084521 ]

Ted Yu commented on HBASE-4195:
---
I think that will do the trick. I propose setting RESEEKMAX_DEFAULT to -1.

Possible unconsistency in a memstore read after a reseek, possible performance improvement
--
Key: HBASE-4195
URL: https://issues.apache.org/jira/browse/HBASE-4195
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.4
Environment: all
Reporter: nkeywal
Priority: Critical

This follows the discussion around HBASE-3855, and the random errors (20% failure on trunk) on the unit test org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting.

I saw some points related to numIterReseek, used in MemStoreScanner#getNext (line 690):
{noformat}
679   protected KeyValue getNext(Iterator<KeyValue> it) {
680     KeyValue ret = null;
681     long readPoint = ReadWriteConsistencyControl.getThreadReadPoint();
682     //DebugPrint.println( " MS@" + hashCode() + ": threadpoint = " + readPoint);
683
684     while (ret == null && it.hasNext()) {
685       KeyValue v = it.next();
686       if (v.getMemstoreTS() <= readPoint) {
687         // keep it.
688         ret = v;
689       }
690       numIterReseek--;
691       if (numIterReseek == 0) {
692         break;
693       }
694     }
695     return ret;
696   }
{noformat}
This function is called by seek, reseek, and next. numIterReseek is only useful for reseek.

There are some issues. I am not totally sure this is the root cause of the test case error, but it could partly explain the randomness of the error, and one point is for sure a bug.

1) In getNext, numIterReseek is decreased, then compared to zero. The seek function sets numIterReseek to zero before calling getNext. It means that the value will actually be negative, hence the test will always fail, and the loop will continue. It is the expected behaviour, but it's quite smart.
2) In reseek, numIterReseek is not set between the loops on the two iterators. If numIterReseek equals zero after the loop on the first one, the loop on the second one will never call seek, as numIterReseek will be negative.
3) Still in reseek, the test to call seek is (kvsetNextRow == null && numIterReseek == 0). In other words, if kvsetNextRow is not null when numIterReseek equals zero, numIterReseek will start to be negative at the next iteration and seek will never be called.
4) You can have side effects if reseek ends with numIterReseek > 0: the following calls to the next function will decrease numIterReseek to zero, and getNext will break instead of continuing the loop. As a result, later calls to next() may return null or not depending on how the default value for numIterReseek is configured.

To check if the issue comes from point 4, you can set numIterReseek to zero before returning in reseek:
{noformat}
  numIterReseek = 0;
  return (kvsetNextRow != null || snapshotNextRow != null);
}
{noformat}
On my env, on trunk, it seems to work, but as it's random I am not really sure. I also had to modify the test (I added a loop) to make it fail more often; the original test was working quite well here.

It has to be confirmed that this totally fixes (it could be partial or unrelated) org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting before implementing a complete solution.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
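Points 1 and 4 of this report are easier to see with a toy model. The following is a sketch under assumptions, not the MemStoreScanner code: `ReseekCounterDemo` and `scanWithBudget` are hypothetical names, and plain `Long` timestamps stand in for KeyValues. It only reproduces the decrement-then-compare-to-zero logic of the quoted getNext loop.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Toy model of the counter logic in the quoted getNext; Long values stand in for KeyValues. */
class ReseekCounterDemo {
  int numIterReseek;      // shared by seek, reseek and next, as in the report
  final long readPoint;

  ReseekCounterDemo(long readPoint) {
    this.readPoint = readPoint;
  }

  /** Returns the first timestamp <= readPoint, or null if the budget runs out or input ends. */
  Long getNext(Iterator<Long> it) {
    Long ret = null;
    while (ret == null && it.hasNext()) {
      Long v = it.next();
      if (v <= readPoint) {
        ret = v; // visible at this read point: keep it
      }
      numIterReseek--;
      if (numIterReseek == 0) {
        break;   // only reached when the counter started out positive
      }
    }
    return ret;
  }

  /** Scans three invisible entries followed by one visible entry with the given starting counter. */
  static Long scanWithBudget(int budget) {
    ReseekCounterDemo scanner = new ReseekCounterDemo(10L);
    scanner.numIterReseek = budget;
    List<Long> memstoreTs = Arrays.asList(50L, 40L, 30L, 5L);
    return scanner.getNext(memstoreTs.iterator());
  }
}
```

Starting from 0 (what seek does, point 1), the counter goes straight to negative values, never equals zero, and the visible entry 5 is found. Starting from a leftover positive value such as 2 (the situation in point 4), the loop breaks after two invisible entries and returns null even though a visible entry exists further on.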
[jira] [Updated] (HBASE-4195) Possible inconsistency in a memstore read after a reseek, possible performance improvement
[ https://issues.apache.org/jira/browse/HBASE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4195:
--
Summary: Possible inconsistency in a memstore read after a reseek, possible performance improvement (was: Possible unconsistency in a memstore read after a reseek, possible performance improvement)

Possible inconsistency in a memstore read after a reseek, possible performance improvement
--
Key: HBASE-4195
URL: https://issues.apache.org/jira/browse/HBASE-4195
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.4
Environment: all
Reporter: nkeywal
Priority: Critical

This follows the discussion around HBASE-3855, and the random errors (20% failure on trunk) on the unit test org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting.

I saw some points related to numIterReseek, used in MemStoreScanner#getNext (line 690):
{noformat}
679   protected KeyValue getNext(Iterator<KeyValue> it) {
680     KeyValue ret = null;
681     long readPoint = ReadWriteConsistencyControl.getThreadReadPoint();
682     //DebugPrint.println( " MS@" + hashCode() + ": threadpoint = " + readPoint);
683
684     while (ret == null && it.hasNext()) {
685       KeyValue v = it.next();
686       if (v.getMemstoreTS() <= readPoint) {
687         // keep it.
688         ret = v;
689       }
690       numIterReseek--;
691       if (numIterReseek == 0) {
692         break;
693       }
694     }
695     return ret;
696   }
{noformat}
This function is called by seek, reseek, and next. numIterReseek is only useful for reseek.

There are some issues. I am not totally sure this is the root cause of the test case error, but it could partly explain the randomness of the error, and one point is for sure a bug.

1) In getNext, numIterReseek is decreased, then compared to zero. The seek function sets numIterReseek to zero before calling getNext. It means that the value will actually be negative, hence the test will always fail, and the loop will continue. It is the expected behaviour, but it's quite smart.
2) In reseek, numIterReseek is not set between the loops on the two iterators. If numIterReseek equals zero after the loop on the first one, the loop on the second one will never call seek, as numIterReseek will be negative.
3) Still in reseek, the test to call seek is (kvsetNextRow == null && numIterReseek == 0). In other words, if kvsetNextRow is not null when numIterReseek equals zero, numIterReseek will start to be negative at the next iteration and seek will never be called.
4) You can have side effects if reseek ends with numIterReseek > 0: the following calls to the next function will decrease numIterReseek to zero, and getNext will break instead of continuing the loop. As a result, later calls to next() may return null or not depending on how the default value for numIterReseek is configured.

To check if the issue comes from point 4, you can set numIterReseek to zero before returning in reseek:
{noformat}
  numIterReseek = 0;
  return (kvsetNextRow != null || snapshotNextRow != null);
}
{noformat}
On my env, on trunk, it seems to work, but as it's random I am not really sure. I also had to modify the test (I added a loop) to make it fail more often; the original test was working quite well here.

It has to be confirmed that this totally fixes (it could be partial or unrelated) org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting before implementing a complete solution.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2399) Forced splits only act on the first family in a table
[ https://issues.apache.org/jira/browse/HBASE-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084522#comment-13084522 ] jirapos...@reviews.apache.org commented on HBASE-2399: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1484/#review1441 --- Ship it! +1 after fixing the white space (can you make a new patch Ming) Good stuff. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/1484/#comment3335 Nice javadoc http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java https://reviews.apache.org/r/1484/#comment3336 Yeah, there is more in here... you can see it up here in review board ming. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java https://reviews.apache.org/r/1484/#comment3337 Nice test. - Michael On 2011-08-12 22:58:55, Ming Ma wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1484/ bq. --- bq. bq. (Updated 2011-08-12 22:58:55) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. 1. Add tests for forcesplit multi-column-family scenarios. bq. 2. Modify HRegion so that it picks splitpoint based on largest store, instead of the first splittable store. It applies to both forcesplit and automatic split. bq. bq. bq. This addresses bug hbase-2399. bq. https://issues.apache.org/jira/browse/hbase-2399 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1157283 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1157283 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java 1157283 bq. bq. 
Diff: https://reviews.apache.org/r/1484/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Ming bq. bq. Forced splits only act on the first family in a table - Key: HBASE-2399 URL: https://issues.apache.org/jira/browse/HBASE-2399 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.3 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Labels: moved_from_0_20_5 Fix For: 0.92.0 Attachments: HBASE-2399-test-v1.patch, HBASE-2399-trunk.patch While working on a patch for HBASE-2375, I came across a few bugs in the existing code related to splits. If a user triggers a manual split, it flips a forceSplit boolean to true and then triggers a compaction (this is very similar to my current implementation for HBASE-2375). However, the forceSplit boolean is flipped back to false at the beginning of Store.compact(). So the force split only acts on the first family in the table. If that Store is not splittable for some reason (it is empty or has only one row), then the entire region will not be split, regardless of what is in other families. Even if there is data in the first family, the midKey is determined based solely on that family. If it has two rows and the next family has 1M rows, we pick the split key based on the two rows. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
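The difference between splitting on the first family and splitting on the largest store, as discussed in this issue, can be sketched as follows. This is an illustration only: `SplitPointDemo`, `StoreInfo` and both selection methods are hypothetical, the sizes and keys are made up, and skipping unsplittable stores when looking for the largest one is an assumption rather than something the review confirms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SplitPointDemo {
  /** Hypothetical summary of one column family's store. */
  static final class StoreInfo {
    final long sizeBytes;
    final String midKey; // null when the store cannot be split (empty or a single row)

    StoreInfo(long sizeBytes, String midKey) {
      this.sizeBytes = sizeBytes;
      this.midKey = midKey;
    }
  }

  /** Behaviour described in the bug report: only the first family is consulted. */
  static String splitPointFromFirstStore(Map<String, StoreInfo> stores) {
    return stores.isEmpty() ? null : stores.values().iterator().next().midKey;
  }

  /** Behaviour described in the review summary: the largest store decides (assumed: among splittable ones). */
  static String splitPointFromLargestStore(Map<String, StoreInfo> stores) {
    StoreInfo largest = null;
    for (StoreInfo store : stores.values()) {
      if (store.midKey != null && (largest == null || store.sizeBytes > largest.sizeBytes)) {
        largest = store;
      }
    }
    return largest == null ? null : largest.midKey;
  }

  /** A made-up region whose first family is tiny and unsplittable, with a large second family. */
  static Map<String, StoreInfo> sampleRegion() {
    Map<String, StoreInfo> stores = new LinkedHashMap<>(); // keeps family order
    stores.put("famA", new StoreInfo(10L, null));
    stores.put("famB", new StoreInfo(1_000_000L, "row-500000"));
    stores.put("famC", new StoreInfo(20L, "row-1"));
    return stores;
  }
}
```

On the sample region the first-store rule yields no split point at all, so the region would not split, while the largest-store rule picks the middle key of `famB`.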
[jira] [Commented] (HBASE-4190) Coprocessors: pull up some cp constants from cp package to o.a.h.h.HConstants
[ https://issues.apache.org/jira/browse/HBASE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084523#comment-13084523 ] jirapos...@reviews.apache.org commented on HBASE-4190: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1461/#review1442 --- Ship it! LGTM src/main/java/org/apache/hadoop/hbase/Coprocessor.java https://reviews.apache.org/r/1461/#comment3338 Interfaces up here in the base package is good I think. src/main/java/org/apache/hadoop/hbase/HConstants.java https://reviews.apache.org/r/1461/#comment3339 Do these constants belong here then now you've pulled up the Interfaces? If so, thats fine... just asking. src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java https://reviews.apache.org/r/1461/#comment3340 This is good. src/main/java/org/apache/hadoop/hbase/coprocessor/BaseEndpointCoprocessor.java https://reviews.apache.org/r/1461/#comment3341 This is fine too I think. - Michael On 2011-08-13 01:08:04, Mingjie Lai wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1461/ bq. --- bq. bq. (Updated 2011-08-13 01:08:04) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. Coprocessors: pull up some cp constants from cp package to o.a.h.h.HConstants bq. bq. bq. This addresses bug HBASE-4190. bq. https://issues.apache.org/jira/browse/HBASE-4190 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/Coprocessor.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/CoprocessorEnvironment.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/HConstants.java dda254d bq.src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java d835582 bq. src/main/java/org/apache/hadoop/hbase/coprocessor/BaseEndpointCoprocessor.java 2fc8f39 bq. src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 506051d bq. 
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java ec88a01 bq.src/main/java/org/apache/hadoop/hbase/coprocessor/Coprocessor.java 0290bf2 bq. src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorEnvironment.java 54ccd6f bq.src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 18ba6e7 bq. src/main/java/org/apache/hadoop/hbase/coprocessor/MasterCoprocessorEnvironment.java 5d8cf4c bq.src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java 9349d5b bq. src/main/java/org/apache/hadoop/hbase/coprocessor/RegionCoprocessorEnvironment.java da8076c bq.src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java cfbb29d bq. src/main/java/org/apache/hadoop/hbase/coprocessor/WALCoprocessorEnvironment.java 6580c2c bq.src/main/java/org/apache/hadoop/hbase/coprocessor/WALObserver.java b086747 bq. src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java c44da73 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java 03df574 bq.src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java a81ff84 bq. src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 36816e8 bq. src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java c85146a bq. src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 0ab1339 bq. src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.java 6d31d70 bq.src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java d9f6e5f bq.src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java b4c407b bq. bq. Diff: https://reviews.apache.org/r/1461/diff bq. bq. bq. Testing bq. --- bq. bq. TestClassLoading passed locally. bq. bq. bq. Thanks, bq. bq. Mingjie bq. bq. 
Coprocessors: pull up some cp constants from cp package to o.a.h.h.HConstants - Key: HBASE-4190 URL: https://issues.apache.org/jira/browse/HBASE-4190 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.90.4 Reporter: Mingjie Lai Assignee: Mingjie Lai Priority: Minor Fix For: 0.90.5 At HBase-3810, stack gave a comment after patch committed: This is a bit odd where a class in the parent package has references to a sub package. Should
[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions
[ https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084524#comment-13084524 ] jirapos...@reviews.apache.org commented on HBASE-4014: -- bq. On 2011-08-12 23:46:30, Gary Helmling wrote: bq. Nice work, Eugene. I think we're getting close. Just two suggested improvements below. bq. bq. The main question still open to debate, I think, is whether or not aborting the server on unhandled exceptions is appropriate. bq. bq. On the one hand, aborting takes the fail-fast approach and makes buggy coprocessors much more visible. It's a lot more likely that a bug will be noticed and fixed if it brings down a region server! bq. bq. On the other hand, I think coprocessors already pose enough of a stability risk to a cluster. I think we should be working to minimize that by containing the impact that a buggy coprocessor can have. If they coprocessor really wants or needs to trigger an abort, it can already do so, since (Master|RegionServer)Services extend Server, which extends Abortable. bq. bq. I think I'd be more in favor of removing the coprocessor from the active set (we should make this as visible as possible so it's clear the coprocessor is no longer active), or at least wrapping the exception in a DoNotRetryIOException and communicating it back to the client? Maybe both? bq. bq. I guess I'd be okay with a configuration option to abort on error (I think a single config option is sufficient), as long as it's disabled by default. But that would still imply we need some other handling when the option is disabled. I like Gary's reasoning here. - Michael --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/969/#review1433 --- On 2011-08-10 22:48:08, Eugene Koontz wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/969/ bq. --- bq. bq. (Updated 2011-08-10 22:48:08) bq. bq. bq. 
Review request for hbase, Gary Helmling and Mingjie Lai. bq. bq. bq. Summary bq. --- bq. bq. https://issues.apache.org/jira/browse/HBASE-4014 Coprocessors: Flag the presence of coprocessors in logged exceptions bq. bq. The general gist here is to wrap each of {Master,RegionServer}CoprocessorHost's coprocessor call inside a bq. bq. try { ... } catch (Throwable e) { handleCoprocessorThrowable(e) } bq. bq. block. bq. bq. handleCoprocessorThrowable() is responsible for either passing 'e' along to the client (if 'e' is an IOException) or, otherwise, aborting the service (Regionserver or Master). bq. bq. The abort message contains a list of the loaded coprocessors for crash analysis. bq. bq. bq. This addresses bug HBASE-4014. bq. https://issues.apache.org/jira/browse/HBASE-4014 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 18ba6e7 bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java 8beeb68 bq.src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java aa930f5 bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 23225d7 bq. src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java c44da73 bq. src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorException.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorException.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/969/diff bq. bq. bq. Testing bq. --- bq. bq. patch includes two tests: bq. bq. TestMasterCoprocessorException.java bq. TestRegionServerCoprocessorException.java bq. bq. both tests pass in my build environment. bq. bq. bq. Thanks, bq. bq. Eugene bq. bq. 
Coprocessors: Flag the presence of coprocessors in logged exceptions Key: HBASE-4014 URL: https://issues.apache.org/jira/browse/HBASE-4014 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: Andrew Purtell Assignee: Eugene Koontz Fix For: 0.92.0 Attachments: HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch For some initial triage of bug reports for core versus for deployments with loaded coprocessors, we need something like the Linux kernel's taint flag, and list of linked in modules that show up in the output of every OOPS, to appear
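The two policies debated in this review (abort the server versus contain the faulty coprocessor) can be compared in a small sketch. Everything here is hypothetical: `CoprocessorGuardDemo`, `Hook`, `invoke` and `demo` are illustration names, the abort is modelled as a flag instead of a real server abort, and a plain `IOException` stands in for the DoNotRetryIOException mentioned in the comment. It is not the code from the patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CoprocessorGuardDemo {
  /** Hypothetical coprocessor hook. */
  interface Hook {
    void run() throws IOException;
  }

  final List<String> loaded = new ArrayList<>();
  final boolean abortOnError;   // assumed configuration switch, off by default in Gary's proposal
  boolean aborted = false;
  String abortMessage;

  CoprocessorGuardDemo(boolean abortOnError, String... coprocessorNames) {
    this.abortOnError = abortOnError;
    for (String name : coprocessorNames) {
      loaded.add(name);
    }
  }

  /** Wraps every hook call so that no Throwable escapes unhandled. */
  void invoke(String name, Hook hook) throws IOException {
    try {
      hook.run();
    } catch (Throwable t) {
      handleThrowable(name, t);
    }
  }

  private void handleThrowable(String name, Throwable t) throws IOException {
    if (t instanceof IOException) {
      throw (IOException) t; // IOExceptions are passed along to the client
    }
    if (abortOnError) {
      aborted = true;        // stand-in for aborting the server
      abortMessage = "Aborting; loaded coprocessors: " + loaded;
      return;
    }
    loaded.remove(name);     // containment: drop the faulty coprocessor
    throw new IOException("Coprocessor " + name + " removed after: " + t, t);
  }

  /** Runs a hook that throws an unchecked exception and reports what the host did. */
  static String demo(boolean abortOnError) {
    CoprocessorGuardDemo host =
        new CoprocessorGuardDemo(abortOnError, "AuditObserver", "BuggyObserver");
    try {
      host.invoke("BuggyObserver", () -> {
        throw new IllegalStateException("bug");
      });
    } catch (IOException e) {
      return "client-error, still loaded: " + host.loaded;
    }
    return host.aborted ? "aborted: " + host.abortMessage : "ok";
  }
}
```

`demo(true)` models the fail-fast behaviour of the patch, including the list of loaded coprocessors in the abort message; `demo(false)` models the containment alternative, where the client sees an error and the server keeps running without the faulty coprocessor.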
[jira] [Updated] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4197:
-
Attachment: 4197-v2.txt

RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
--
Key: HBASE-4197
URL: https://issues.apache.org/jira/browse/HBASE-4197
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Attachments: 4197-bigger.txt, 4197-v2.txt, 4197.txt

Returning just an InternalScanner from RegionObserver.{pre|post}OpenScanner leads to the following exception when using the scanner.

java.io.IOException: InternalScanner implementation is expected to be HRegion.RegionScanner.
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2023)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:314)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1225)

The problem is in HRegionServer.next(...):
{code}
InternalScanner s = this.scanners.get(scannerName);
...
// Call coprocessor. Get region info from scanner.
HRegion region = null;
if (s instanceof HRegion.RegionScanner) {
  HRegion.RegionScanner rs = (HRegion.RegionScanner) s;
  region = getRegion(rs.getRegionName().getRegionName());
} else {
  throw new IOException("InternalScanner implementation is expected " +
      "to be HRegion.RegionScanner.");
}
{code}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084527#comment-13084527 ]

Lars Hofhansl commented on HBASE-4197:
--
Renamed getRegionName() to getRegionInfo(). Also cleaned up some more comments, and removed all references to InternalScanner from HRegionServer and HRegion (there were only 3 or 4 left anyway).

RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
--
Key: HBASE-4197
URL: https://issues.apache.org/jira/browse/HBASE-4197
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Attachments: 4197-bigger.txt, 4197-v2.txt, 4197.txt

Returning just an InternalScanner from RegionObserver.{pre|post}OpenScanner leads to the following exception when using the scanner.

java.io.IOException: InternalScanner implementation is expected to be HRegion.RegionScanner.
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2023)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:314)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1225)

The problem is in HRegionServer.next(...):
{code}
InternalScanner s = this.scanners.get(scannerName);
...
// Call coprocessor. Get region info from scanner.
HRegion region = null;
if (s instanceof HRegion.RegionScanner) {
  HRegion.RegionScanner rs = (HRegion.RegionScanner) s;
  region = getRegion(rs.getRegionName().getRegionName());
} else {
  throw new IOException("InternalScanner implementation is expected " +
      "to be HRegion.RegionScanner.");
}
{code}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084528#comment-13084528 ]

Ted Yu commented on HBASE-4197:
---
+1 on patch version 2. Please use review board to get more feedback.

RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
--
Key: HBASE-4197
URL: https://issues.apache.org/jira/browse/HBASE-4197
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Attachments: 4197-bigger.txt, 4197-v2.txt, 4197.txt

Returning just an InternalScanner from RegionObserver.{pre|post}OpenScanner leads to the following exception when using the scanner.

java.io.IOException: InternalScanner implementation is expected to be HRegion.RegionScanner.
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2023)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:314)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1225)

The problem is in HRegionServer.next(...):
{code}
InternalScanner s = this.scanners.get(scannerName);
...
// Call coprocessor. Get region info from scanner.
HRegion region = null;
if (s instanceof HRegion.RegionScanner) {
  HRegion.RegionScanner rs = (HRegion.RegionScanner) s;
  region = getRegion(rs.getRegionName().getRegionName());
} else {
  throw new IOException("InternalScanner implementation is expected " +
      "to be HRegion.RegionScanner.");
}
{code}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084531#comment-13084531 ]

jirapos...@reviews.apache.org commented on HBASE-4197:
--
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1496/
---

Review request for Ted Yu and Mingjie Lai.

Summary
---
1. Don't require custom scanners created by coprocessors to be subclasses of HRegion.RegionScanner (see HBASE-4197).
2. Simplify the interfaces for Scanners in HRegion, HRegionServer, and RegionObserver. This avoids a bunch of instanceof checks and casts to HRegion.RegionScanner.

(Sorry HBase-git would not accept my patch)

This addresses bug HBASE-4197.
https://issues.apache.org/jira/browse/HBASE-4197

Diffs
-
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java 1157311
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 1157311
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1157311
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1157311
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 1157311
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java PRE-CREATION
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java 1157311
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 1157311
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestWideScanner.java 1157311

Diff: https://reviews.apache.org/r/1496/diff

Testing
---
Manual test attached to the bug.

Thanks,
Lars

RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
--
Key: HBASE-4197
URL: https://issues.apache.org/jira/browse/HBASE-4197
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Attachments: 4197-bigger.txt, 4197-v2.txt, 4197.txt, ScannerTest.java

Returning just an InternalScanner from RegionObserver.{pre|post}OpenScanner leads to the following exception when using the scanner.

java.io.IOException: InternalScanner implementation is expected to be HRegion.RegionScanner.
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2023)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:314)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1225)

The problem is in HRegionServer.next(...):
{code}
InternalScanner s = this.scanners.get(scannerName);
...
// Call coprocessor. Get region info from scanner.
HRegion region = null;
if (s instanceof HRegion.RegionScanner) {
  HRegion.RegionScanner rs = (HRegion.RegionScanner) s;
  region = getRegion(rs.getRegionName().getRegionName());
} else {
  throw new IOException("InternalScanner implementation is expected " +
      "to be HRegion.RegionScanner.");
}
{code}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
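The idea behind the review request above, depending on a scanner interface instead of a concrete class, can be sketched as follows. The interfaces and classes here are simplified, hypothetical stand-ins: strings replace rows and region info, and `ScannerInterfaceDemo`, `BuiltInScanner`, `UpperCasingScanner` and `scanAll` do not come from the patch. Only the names InternalScanner, RegionScanner and getRegionInfo() are taken from the discussion, and their signatures here are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScannerInterfaceDemo {
  /** Stand-in for the low-level contract: fills results, returns true if more rows remain. */
  interface InternalScanner {
    boolean next(List<String> results);
  }

  /** Assumed region-level contract: a scanner that can also name its region. */
  interface RegionScanner extends InternalScanner {
    String getRegionInfo();
  }

  /** The server's own scanner over a fixed list of rows. */
  static final class BuiltInScanner implements RegionScanner {
    private final String region;
    private final List<String> rows;
    private int pos = 0;

    BuiltInScanner(String region, List<String> rows) {
      this.region = region;
      this.rows = rows;
    }

    public boolean next(List<String> results) {
      if (pos < rows.size()) {
        results.add(rows.get(pos++));
      }
      return pos < rows.size();
    }

    public String getRegionInfo() {
      return region;
    }
  }

  /** Coprocessor-style wrapper: implements the interface, does not subclass BuiltInScanner. */
  static final class UpperCasingScanner implements RegionScanner {
    private final RegionScanner delegate;

    UpperCasingScanner(RegionScanner delegate) {
      this.delegate = delegate;
    }

    public boolean next(List<String> results) {
      List<String> raw = new ArrayList<>();
      boolean more = delegate.next(raw);
      for (String row : raw) {
        results.add(row.toUpperCase());
      }
      return more;
    }

    public String getRegionInfo() {
      return delegate.getRegionInfo();
    }
  }

  /** Server side: typed against the interface, so no instanceof check or cast is needed. */
  static String scanAll(RegionScanner scanner) {
    List<String> results = new ArrayList<>();
    boolean more;
    do {
      more = scanner.next(results);
    } while (more);
    return scanner.getRegionInfo() + ":" + results;
  }

  static String demo() {
    RegionScanner base = new BuiltInScanner("region-1", Arrays.asList("a", "b"));
    return scanAll(new UpperCasingScanner(base));
  }
}
```

Because `scanAll` only needs `getRegionInfo()` and `next(...)`, a wrapping scanner supplied by a coprocessor works without extending the server's concrete scanner class, which is the failure the stack trace in this issue reports.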
[jira] [Commented] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084532#comment-13084532 ] jirapos...@reviews.apache.org commented on HBASE-4197: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1496/ --- (Updated 2011-08-13 04:38:38.030763) Review request for Ted Yu and Mingjie Lai. Summary (updated) --- 1. Don't require custom scanners created by conprocessors to be subclasses of HRegion.RegionScanner (see HBASE-4197). 2. Simplify the interfaces for Scanners in HRegion, HRegionServer, and RegionObserver. This avoids a bunch instanceof checks and casts to HRegion.RegionScanner. (Sorry HBase-git would not accept my patch) This addresses bug HBASE-4197. https://issues.apache.org/jira/browse/HBASE-4197 Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java 1157311 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 1157311 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1157311 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1157311 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 1157311 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java 1157311 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 1157311 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestWideScanner.java 1157311 Diff: https://reviews.apache.org/r/1496/diff 
Testing
---
Manual test attached to the bug.

Thanks,
Lars

RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
--
Key: HBASE-4197
URL: https://issues.apache.org/jira/browse/HBASE-4197
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Attachments: 4197-bigger.txt, 4197-v2.txt, 4197.txt, ScannerTest.java

Returning just an InternalScanner from RegionObserver.{pre|post}OpenScanner leads to the following exception when using the scanner.

java.io.IOException: InternalScanner implementation is expected to be HRegion.RegionScanner.
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2023)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:314)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1225)

The problem is in HRegionServer.next(...):
{code}
InternalScanner s = this.scanners.get(scannerName);
...
// Call coprocessor. Get region info from scanner.
HRegion region = null;
if (s instanceof HRegion.RegionScanner) {
  HRegion.RegionScanner rs = (HRegion.RegionScanner) s;
  region = getRegion(rs.getRegionName().getRegionName());
} else {
  throw new IOException("InternalScanner implementation is expected"
      + " to be HRegion.RegionScanner.");
}
{code}
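A rough sketch of the direction the review summary describes: if the server codes against a small scanner interface rather than the concrete HRegion.RegionScanner class, a coprocessor can hand back its own wrapper and no instanceof check or cast is needed. Everything below is a simplified stand-in. The interface shapes, the getRegionName() method returning a String, and the class names are assumptions for illustration, not the contents of the actual patch.

```java
import java.io.IOException;
import java.util.List;

public class ScannerSketch {
    /** Simplified stand-in for InternalScanner; rows are plain strings here. */
    public interface InternalScanner {
        boolean next(List<String> results) throws IOException;
    }

    /** Hypothetical interface exposing the one extra call the server needs. */
    public interface RegionScanner extends InternalScanner {
        String getRegionName();
    }

    /** Coprocessor-style wrapper; it does not subclass any HRegion inner class. */
    public static class WrappingScanner implements RegionScanner {
        private final RegionScanner delegate;

        public WrappingScanner(RegionScanner delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean next(List<String> results) throws IOException {
            return delegate.next(results);
        }

        @Override
        public String getRegionName() {
            return delegate.getRegionName();
        }
    }

    /** Stand-in for the scanner a region would create itself. */
    public static RegionScanner baseScanner(final String regionName) {
        return new RegionScanner() {
            @Override
            public boolean next(List<String> results) {
                results.add("row-1");
                return false; // no more rows
            }

            @Override
            public String getRegionName() {
                return regionName;
            }
        };
    }

    /** Server-side lookup written against the interface: no instanceof, no cast. */
    public static String regionFor(RegionScanner scanner) {
        return scanner.getRegionName();
    }
}
```

With this shape, regionFor(new WrappingScanner(baseScanner("r1"))) resolves the region through the wrapper, which is the case that currently ends in the IOException above.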
[jira] [Commented] (HBASE-4150) Potentially too many connections may be opened if ThreadLocalPool or RoundRobinPool is used
[ https://issues.apache.org/jira/browse/HBASE-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084540#comment-13084540 ]

stack commented on HBASE-4150:
--
+1 on doc patch.

Potentially too many connections may be opened if ThreadLocalPool or RoundRobinPool is used
---
Key: HBASE-4150
URL: https://issues.apache.org/jira/browse/HBASE-4150
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Assignee: Karthick Sankarachary
Fix For: 0.92.0
Attachments: 4150-1.txt, 4150.txt, 5140-2.txt, HBASE-4150-DOC.patch, HBASE-4150_final.patch

See 'Problem with hbase.client.ipc.pool.type=threadlocal in trunk' discussion started by Lars George.

From Lars Hofhansl:
Looking at HBaseClient.getConnection(...) I see this:
{code}
synchronized (connections) {
  connection = connections.get(remoteId);
  if (connection == null) {
    connection = new Connection(remoteId);
    connections.put(remoteId, connection);
  }
}
{code}
At the same time PoolMap.ThreadLocalPool.put is defined like this:
{code}
public R put(R resource) {
  R previousResource = get();
  if (previousResource == null) {
    ...
    if (poolSize.intValue() >= maxSize) {
      return null;
    }
    ...
}
{code}
So... If the ThreadLocalPool reaches its capacity it always returns null and hence all new threads will create a new connection every time getConnection is called!

I have also verified with a test program that works fine as long as the number of client threads (which include the threads in HTable's threadpool of course) is < poolsize. Once that is no longer the case the number of connections explodes and the program dies with OOMEs (mostly because each Connection is associated with yet another thread).

It's not clear what should happen, though. Maybe (1) the ThreadLocalPool should not have a limit, or maybe (2) allocations past the pool size should throw an exception (i.e. there's a hard limit), or maybe (3) in that case a single connection is returned for all threads while the pool is over its limit, or (4) we start round robin with the other connections in the other thread locals.

#1 means that the number of client threads needs to be more carefully managed by the client app. In this case it would also be somewhat pointless that Connections have their own threads; we just pass stuff between threads. #2 would work, but puts more logic in the client. #3 would lead to hard-to-debug performance issues. And #4 is messy :)

From Ted Yu:
For HBaseClient, at least the javadoc doesn't match:
{code}
 * @param config configuration
 * @return either a {@link PoolType#Reusable} or {@link PoolType#ThreadLocal}
 */
private static PoolType getPoolType(Configuration config) {
  return PoolType.valueOf(config.get(HConstants.HBASE_CLIENT_IPC_POOL_TYPE),
      PoolType.RoundRobin, PoolType.ThreadLocal);
{code}
I think for RoundRobinPool, we shouldn't allow maxSize to be Integer#MAX_VALUE. Otherwise the connection explosion described by Lars may occur.
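The interaction Lars describes can be reproduced in a few lines without real threads. The sketch below is not the PoolMap code: CappedThreadLocalPool, the integer "thread ids" and the string "connections" are made-up stand-ins, and the loop plays the role of getConnection(...) being called repeatedly from each client thread. It only assumes what the quoted snippets show, namely that put() returns null without storing once the cap is reached and that the caller creates a connection whenever get() returns null.

```java
import java.util.HashMap;
import java.util.Map;

public class PoolSketch {
    /** Simplified stand-in for a size-capped thread-local pool. */
    static class CappedThreadLocalPool {
        private final Map<Integer, String> perThread = new HashMap<>();
        private final int maxSize;

        CappedThreadLocalPool(int maxSize) {
            this.maxSize = maxSize;
        }

        String get(int threadId) {
            return perThread.get(threadId);
        }

        /** Like the quoted put(): once the cap is hit, the resource is silently dropped. */
        String put(int threadId, String resource) {
            String previous = perThread.get(threadId);
            if (previous == null && perThread.size() >= maxSize) {
                return null; // nothing stored, caller is not told
            }
            perThread.put(threadId, resource);
            return previous;
        }
    }

    /** Counts connections created when each of `threads` threads asks `calls` times. */
    public static int connectionsCreated(int maxSize, int threads, int calls) {
        CappedThreadLocalPool pool = new CappedThreadLocalPool(maxSize);
        int created = 0;
        for (int call = 0; call < calls; call++) {
            for (int threadId = 0; threadId < threads; threadId++) {
                String connection = pool.get(threadId);
                if (connection == null) {
                    // Same pattern as getConnection(...): create, then try to pool it.
                    connection = "conn-" + created;
                    created++;
                    pool.put(threadId, connection);
                }
            }
        }
        return created;
    }
}
```

With a cap of 2, two threads making three calls each create 2 connections in total, while three threads create 5: the third thread never finds a pooled connection, so every call it makes opens a new one.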
[jira] [Resolved] (HBASE-4196) TableRecordReader may skip first row of region
[ https://issues.apache.org/jira/browse/HBASE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-4196.
--
Resolution: Fixed
Fix Version/s: 0.90.5
Hadoop Flags: [Reviewed]

Committed to branch and trunk. Thanks for the patch, Ming (and for the review, Ted).

TableRecordReader may skip first row of region
--
Key: HBASE-4196
URL: https://issues.apache.org/jira/browse/HBASE-4196
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.90.4
Reporter: Jan Lukavsky
Assignee: Ming Ma
Fix For: 0.90.5
Attachments: HBASE-4196-trunk.patch, HBASE-4196-trunk.patch, HBASE-4196-trunk.patch

After the following scenario, the first record of the region is skipped without being sent to the Mapper:
- the reader is initialized with TableRecordReader.init()
- then nextKeyValue is called, causing a call to scanner.next()
- here a ScannerTimeoutException occurs
- the scanner is restarted by a call to restart() and then *two* calls to scanner.next() occur, so the first row is lost
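The scenario can be modelled with a list standing in for the region and an index standing in for the scanner. This is only an illustration of the reported sequence, not the TableRecordReader code or the committed patch: readAll, the simulated timeout on the very first call, and the skipOnlyAfterProgress flag are all invented for the sketch. The guard shown (skip a row after restart only if some row was already delivered) is one plausible way to avoid the loss.

```java
import java.util.ArrayList;
import java.util.List;

public class ReaderSketch {
    /**
     * Delivers rows to a pretend mapper. The very first next() "times out",
     * which triggers a restart of the scan.
     */
    public static List<String> readAll(List<String> rows, boolean skipOnlyAfterProgress) {
        List<String> delivered = new ArrayList<>();
        int position = 0;          // where the pretend scanner points
        String lastRow = null;     // last row handed to the mapper, if any
        boolean timedOut = false;
        while (true) {
            if (!timedOut) {
                // Simulated ScannerTimeoutException: restart at the last delivered row,
                // or at the region start when nothing has been delivered yet.
                timedOut = true;
                position = (lastRow == null) ? 0 : rows.indexOf(lastRow);
                // One next() is spent skipping the row the restart lands on. Doing that
                // unconditionally discards a row the mapper has never seen.
                if (!skipOnlyAfterProgress || lastRow != null) {
                    position++;
                }
            }
            if (position >= rows.size()) {
                return delivered;
            }
            lastRow = rows.get(position);
            position++;
            delivered.add(lastRow);
        }
    }
}
```

For rows a, b, c the unconditional skip delivers only b and c, which matches the reported symptom; with the guard, all three rows reach the mapper.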
[jira] [Updated] (HBASE-4170) createTable java doc needs to be improved
[ https://issues.apache.org/jira/browse/HBASE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4170:
-
Resolution: Fixed
Fix Version/s: (was: 0.90.1) 0.90.5
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed to branch and trunk. Thanks for the patch, Mubarak.

createTable java doc needs to be improved
-
Key: HBASE-4170
URL: https://issues.apache.org/jira/browse/HBASE-4170
Project: HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.90.1, 0.90.2, 0.90.3, 0.90.4
Environment: HBase-0.90.1
Reporter: Mubarak Seyed
Fix For: 0.90.5
Attachments: create_table_javadoc_HBASE_4170.patch

The HBaseAdmin.createTable() java doc says:

public void createTable(HTableDescriptor desc, byte[][] splitKeys) throws IOException

Creates a new table with an initial set of empty regions defined by the specified split keys. The total number of regions created will be the number of split keys plus one (the first region has a null start key and the last region has a null end key). Synchronous operation.

If we specify null values for the first region's start key and the last region's end key, we get a NullPointerException, because Arrays.sort compares each element. I guess the documentation should not talk about null values and should instead explain that splitKeys[][] has length n-1, where n is the number of regions. splitKeys[][] would look like:

splitKeys[0] = key value 1
..
splitKeys[n-2] = key value n-1
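To make the n-1 keys / n regions relationship concrete, here is a small sketch that derives region boundaries from a set of split keys. It uses strings instead of byte[] and an empty string to mark the open-ended first start key and last end key; both choices, and the regionBoundaries helper itself, are assumptions for illustration and not part of HBaseAdmin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitKeysSketch {
    /**
     * Returns {startKey, endKey} pairs for the regions implied by the split keys.
     * An empty string marks the open-ended first start key and last end key.
     */
    public static List<String[]> regionBoundaries(String[] splitKeys) {
        String[] sorted = splitKeys.clone();
        // Sorting compares elements, so a null entry among several keys fails here
        // with a NullPointerException. Callers pass only the real split points.
        Arrays.sort(sorted);
        List<String[]> regions = new ArrayList<>();
        String start = "";
        for (String key : sorted) {
            regions.add(new String[] {start, key});
            start = key;
        }
        regions.add(new String[] {start, ""});
        return regions;
    }
}
```

Two split keys "d" and "m" give three regions: ["", "d"), ["d", "m") and ["m", ""). The caller never supplies the open ends, which is why null entries in splitKeys are neither needed nor accepted.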
[jira] [Updated] (HBASE-4197) RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4197:
-
Component/s: coprocessors

RegionServer expects all scanner to be subclasses of HRegion.RegionScanner
--
Key: HBASE-4197
URL: https://issues.apache.org/jira/browse/HBASE-4197
Project: HBase
Issue Type: Bug
Components: coprocessors
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Attachments: 4197-bigger.txt, 4197-v2.txt, 4197.txt, ScannerTest.java

Returning just an InternalScanner from RegionObserver.{pre|post}OpenScanner leads to the following exception when using the scanner.

java.io.IOException: InternalScanner implementation is expected to be HRegion.RegionScanner.
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2023)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:314)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1225)

The problem is in HRegionServer.next(...):
{code}
InternalScanner s = this.scanners.get(scannerName);
...
// Call coprocessor. Get region info from scanner.
HRegion region = null;
if (s instanceof HRegion.RegionScanner) {
  HRegion.RegionScanner rs = (HRegion.RegionScanner) s;
  region = getRegion(rs.getRegionName().getRegionName());
} else {
  throw new IOException("InternalScanner implementation is expected"
      + " to be HRegion.RegionScanner.");
}
{code}