[jira] Commented: (HBASE-29) HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile
[ https://issues.apache.org/jira/browse/HBASE-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906683#action_12906683 ] Pranav Khaitan commented on HBASE-29: - I think the code for Store has changed since then and this issue doesn't exist anymore. In that case, we should consider closing this jira HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile - Key: HBASE-29 URL: https://issues.apache.org/jira/browse/HBASE-29 Project: HBase Issue Type: Bug Components: client, regionserver Affects Versions: 0.1.2, 0.2.0 Reporter: Bryan Duxbury Priority: Minor Attachments: 29.patch Ok, this one is a little tricky. Let's say that you write a row with some value without a timestamp, thus meaning right now. Then, the memcache gets flushed out to a MapFile. Then, you write another value to the same row, this time with a timestamp that is in the past, ie, before the now timestamp of the first put. Some time later, but before there is a compaction, if you do a get for this row, and only ask for a single version, you will logically be expecting the latest version of the cell, which you would assume would be the one written at now time. Instead, you will get the value written into the past cell, because even though it is tagged as having happened in the past, it actually *was written* after the now cell, and thus when #get searches for satisfying values, it runs into the one most recently written first. The result of this problem is inconsistent data results. Note that this problem only ever exists when there's an uncompacted HStore, because during compaction, these cells will all get sorted into the correct order by timestamp and such. In a way, this actually makes the problem worse, because then you could easily get inconsistent results from HBase about the same (unchanged) row depending on whether there's been a flush/compaction. 
The only solution I can think of for this problem at the moment is to scan all the MapFiles and Memcache for possible results, sort them, and then select the desired number of versions off of the top. This is unfortunate because it means you never get the snazzy shortcircuit logic except within a single mapfile or memcache. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
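A minimal sketch of the workaround proposed in HBASE-29 above: gather candidates from every source (memcache plus each MapFile), sort them by timestamp, and take the newest N. CellVersion and MergedGetSketch are hypothetical names used only for illustration, not HBase classes, and real cells are ordered by more than a timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for one stored version of a cell; not an HBase class. */
final class CellVersion {
    final long timestamp;
    final String value;

    CellVersion(long timestamp, String value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

final class MergedGetSketch {
    /**
     * Collects candidates from all sources (in any order), sorts them by
     * timestamp descending, and keeps at most maxVersions of the newest.
     */
    static List<CellVersion> newestVersions(List<List<CellVersion>> sources, int maxVersions) {
        List<CellVersion> all = new ArrayList<>();
        for (List<CellVersion> source : sources) {
            all.addAll(source);
        }
        all.sort(Comparator.comparingLong((CellVersion c) -> c.timestamp).reversed());
        return new ArrayList<>(all.subList(0, Math.min(maxVersions, all.size())));
    }
}
```

The cost is the one the reporter points out: every source has to be consulted before anything is returned, so the early exit only survives inside a single source.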
[jira] Commented: (HBASE-2959) Scanning always starts at the beginning of a row
[ https://issues.apache.org/jira/browse/HBASE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906687#action_12906687 ]

ryan rawson commented on HBASE-2959:

If we got rid of DeleteFamily, it would make this easy to implement.

Scanning always starts at the beginning of a row

Key: HBASE-2959
URL: https://issues.apache.org/jira/browse/HBASE-2959
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.20.4, 0.20.5, 0.20.6, 0.89.20100621
Reporter: Benoit Sigoure
Priority: Blocker

In HBASE-2248, the code in {{HRegion#get}} was changed like so:

{code}
-  private void get(final Store store, final Get get,
-    final NavigableSet<byte []> qualifiers, List<KeyValue> result)
-  throws IOException {
-    store.get(get, qualifiers, result);
+  /*
+   * Do a get based on the get parameter.
+   */
+  private List<KeyValue> get(final Get get) throws IOException {
+    Scan scan = new Scan(get);
+
+    List<KeyValue> results = new ArrayList<KeyValue>();
+
+    InternalScanner scanner = null;
+    try {
+      scanner = getScanner(scan);
+      scanner.next(results);
+    } finally {
+      if (scanner != null)
+        scanner.close();
+    }
+    return results;
   }
{code}

So instead of doing a {{get}} straight on the {{Store}}, we now open a scanner. The problem is that we eventually end up in {{ScanQueryMatcher}} where the constructor does: {{this.startKey = KeyValue.createFirstOnRow(scan.getStartRow());}}. This entails that if we have a very wide row (thousands of columns), the scanner will need to go through thousands of {{KeyValue}}s before finding the right entry, because it always starts from the beginning of the row, whereas before it was much more straightforward.

This problem was under the radar for a while because the overhead isn't too unreasonable, but later on, {{incrementColumnValue}} was changed to do a {{get}} under the hood. At StumbleUpon we do thousands of ICVs per second, so thousands of times per second we're scanning some really wide rows. When a row is contended, this results in all the IPC threads being stuck on acquiring a row lock, while one thread is doing the ICV (albeit slowly due to the excessive scanning). When all IPC threads are stuck, the region server is unable to serve more requests.

As a nice side effect, fixing this bug will make {{get}} and {{incrementColumnValue}} faster, as well as the first call to {{next}} on a scanner.
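To make the cost described in HBASE-2959 concrete, here is a small analogy, assuming a wide row can be modelled as a sorted map of column qualifiers. WideRowSeekSketch is a hypothetical name and this is not how ScanQueryMatcher is implemented; it only contrasts walking from the first column with seeking straight to the wanted one.

```java
import java.util.Map;
import java.util.NavigableMap;

final class WideRowSeekSketch {
    /** Walks the row from its first column and counts the entries visited up to the hit. */
    static int stepsScanningFromRowStart(NavigableMap<String, String> row, String qualifier) {
        int steps = 0;
        for (Map.Entry<String, String> entry : row.entrySet()) {
            steps++;
            if (entry.getKey().equals(qualifier)) {
                return steps;
            }
        }
        return steps;
    }

    /** Seeks directly to the first entry at or after the wanted column. */
    static String seekDirectly(NavigableMap<String, String> row, String qualifier) {
        Map.Entry<String, String> entry = row.ceilingEntry(qualifier);
        return (entry != null && entry.getKey().equals(qualifier)) ? entry.getValue() : null;
    }
}
```

With a row of a thousand columns, the first method touches every entry before reaching the last column, while the second is a single sorted-map lookup.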
[jira] Commented: (HBASE-2960) Allow Incremental Table Alterations
[ https://issues.apache.org/jira/browse/HBASE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906691#action_12906691 ]

chenjiajun commented on HBASE-2960:
---
Is this the only issue in version 0.89.20100621? When was HBase 0.89.20100621 released?

Allow Incremental Table Alterations
---
Key: HBASE-2960
URL: https://issues.apache.org/jira/browse/HBASE-2960
Project: HBase
Issue Type: Wish
Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
Fix For: 0.89.20100621
Attachments: HBASE-2960.patch

As per the HBase shell help, the alter command will "Alter column family schema; pass table name and a dictionary specifying new column family schema." The assumption here seems to be that the new column family schema must be completely specified. In other words, if a certain attribute is not specified in the column family schema, then it is effectively defaulted. Is this side-effect by design?

I for one assumed (wrongly apparently) that I can alter a table in increments. Case in point, the following commands should've resulted in the final value of the VERSIONS attribute of my table staying put at 1, but instead it got defaulted to 3. I guess there's no right or wrong answer here, but what should alter do by default? My expectation is that it only changes those attributes that were specified in the alter command, leaving the unspecified attributes untouched.

hbase(main):003:0> create 't1', {NAME => 'f1', VERSIONS => 1}
0 row(s) in 1.7230 seconds
hbase(main):004:0> describe 't1'
DESCRIPTION
{NAME => 't1', FAMILIES => [{NAME => 'f1', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.2030 seconds
hbase(main):006:0> disable 't1'
0 row(s) in 0.1140 seconds
hbase(main):007:0> alter 't1', {NAME => 'f1', IN_MEMORY => 'true'}
0 row(s) in 0.0160 seconds
hbase(main):009:0> describe 't1'
DESCRIPTION
{NAME => 't1', FAMILIES => [{NAME => 'f1', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.1280 seconds
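The difference between the two behaviours discussed in HBASE-2960 comes down to which attribute set the requested changes are layered on. A sketch under that assumption, with plain maps standing in for a column family descriptor (IncrementalAlterSketch and applyChanges are hypothetical names, not part of the HBase shell or client API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IncrementalAlterSketch {
    /**
     * Returns a copy of base with the requested changes layered on top.
     * Passing the column family's current attributes as base gives the
     * incremental behaviour the reporter expected; passing the defaults as
     * base reproduces the reported behaviour, where VERSIONS falls back to 3.
     */
    static Map<String, String> applyChanges(Map<String, String> base, Map<String, String> changes) {
        Map<String, String> result = new LinkedHashMap<>(base);
        result.putAll(changes);
        return result;
    }
}
```

In the session above, layering {IN_MEMORY => 'true'} on the current attributes would leave VERSIONS at '1'.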
[jira] Updated: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction
[ https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-2964:
---
Attachment: hbase-2964.txt

I also had to move the new HTable call outside of the lock, since the HTable constructor does an RPC. This patch seems to fix the issue for me. Running an overnight load test - if it's still going in the morning I'd say we're good :)

Deadlock when RS tries to RPC to itself inside SplitTransaction
---
Key: HBASE-2964
URL: https://issues.apache.org/jira/browse/HBASE-2964
Project: HBase
Issue Type: Bug
Components: ipc, regionserver
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
Attachments: hbase-2964.txt

In testing the 0.89.20100830 rc, I ran into a deadlock with the following situation:
- All of the IPC Handler threads are blocked on the region lock, which is held by CompactSplitThread.
- CompactSplitThread is in the process of trying to edit META to create the offline parent. META happens to be on the same server as is executing the split. Therefore, the CompactSplitThread is trying to connect back to itself, but all of the handler threads are blocked, so the IPC never happens.
Thus, the entire RS gets deadlocked.
[jira] Commented: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction
[ https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906819#action_12906819 ] Todd Lipcon commented on HBASE-2964: Overnight test completed OK with that patch. I think we should rebuild the rc with this if Stack thinks it looks good. Deadlock when RS tries to RPC to itself inside SplitTransaction --- Key: HBASE-2964 URL: https://issues.apache.org/jira/browse/HBASE-2964 Project: HBase Issue Type: Bug Components: ipc, regionserver Affects Versions: 0.90.0 Reporter: Todd Lipcon Priority: Blocker Attachments: hbase-2964.txt In testing the 0.89.20100830 rc, I ran into a deadlock with the following situation: - All of the IPC Handler threads are blocked on the region lock, which is held by CompactSplitThread. - CompactSplitThread is in the process of trying to edit META to create the offline parent. META happens to be on the same server as is executing the split. Therefore, the CompactSplitThread is trying to connect back to itself, but all of the handler threads are blocked, so the IPC never happens. Thus, the entire RS gets deadlocked. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-1485) Wrong or indeterminate behavior when there are duplicate versions of a column
[ https://issues.apache.org/jira/browse/HBASE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Evert Arckens updated HBASE-1485: - Attachment: TestCellUpdates.java We've tried the patch posted at https://review.cloudera.org/r/780/ here at Outerthought. The attached file is a unit test showing that the patch works at first sight. However, performing updates on existing timestamps, in combination with triggering major compactions things don't work as expected. I've used negative-assertions in order to make the tests succeed, and added a comment where we would expect the result to be otherwise. I've also added a test with the example where a row is deleted and then an update on an older timestamp afterwards remains hidden by the delete. Wrong or indeterminate behavior when there are duplicate versions of a column - Key: HBASE-1485 URL: https://issues.apache.org/jira/browse/HBASE-1485 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.0 Reporter: Jonathan Gray Assignee: Pranav Khaitan Fix For: 0.90.0 Attachments: TestCellUpdates.java As of now, both gets and scanners will end up returning all duplicate versions of a column. The ordering of them is indeterminate. We need to decide what the desired/expected behavior should be and make it happen. Note: It's nearly impossible for this to work with Gets as they are now implemented in 1304 so this is really a Scanner issue. To implement this correctly with Gets, we would have to undo basically all the optimizations that Gets do and making them far slower than a Scanner. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2959) Scanning always starts at the beginning of a row
[ https://issues.apache.org/jira/browse/HBASE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906848#action_12906848 ]

Jonathan Gray commented on HBASE-2959:
--
I'm -1 on removing delete family, at least at this point. It's pretty widely used and the alternative is not scalable. I think we should first optimize with reseeks, then look at other optimizations using meta data / blooms, and if we still have issues we might think about removing delete family. However, I think with the use of meta data, someone not using delete families would pay virtually no perf hit and would bypass the start-of-row seek.

Scanning always starts at the beginning of a row

Key: HBASE-2959
URL: https://issues.apache.org/jira/browse/HBASE-2959
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.20.4, 0.20.5, 0.20.6, 0.89.20100621
Reporter: Benoit Sigoure
Priority: Blocker

In HBASE-2248, the code in {{HRegion#get}} was changed like so:

{code}
-  private void get(final Store store, final Get get,
-    final NavigableSet<byte []> qualifiers, List<KeyValue> result)
-  throws IOException {
-    store.get(get, qualifiers, result);
+  /*
+   * Do a get based on the get parameter.
+   */
+  private List<KeyValue> get(final Get get) throws IOException {
+    Scan scan = new Scan(get);
+
+    List<KeyValue> results = new ArrayList<KeyValue>();
+
+    InternalScanner scanner = null;
+    try {
+      scanner = getScanner(scan);
+      scanner.next(results);
+    } finally {
+      if (scanner != null)
+        scanner.close();
+    }
+    return results;
   }
{code}

So instead of doing a {{get}} straight on the {{Store}}, we now open a scanner. The problem is that we eventually end up in {{ScanQueryMatcher}} where the constructor does: {{this.startKey = KeyValue.createFirstOnRow(scan.getStartRow());}}. This entails that if we have a very wide row (thousands of columns), the scanner will need to go through thousands of {{KeyValue}}s before finding the right entry, because it always starts from the beginning of the row, whereas before it was much more straightforward.

This problem was under the radar for a while because the overhead isn't too unreasonable, but later on, {{incrementColumnValue}} was changed to do a {{get}} under the hood. At StumbleUpon we do thousands of ICVs per second, so thousands of times per second we're scanning some really wide rows. When a row is contended, this results in all the IPC threads being stuck on acquiring a row lock, while one thread is doing the ICV (albeit slowly due to the excessive scanning). When all IPC threads are stuck, the region server is unable to serve more requests.

As a nice side effect, fixing this bug will make {{get}} and {{incrementColumnValue}} faster, as well as the first call to {{next}} on a scanner.
[jira] Commented: (HBASE-1485) Wrong or indeterminate behavior when there are duplicate versions of a column
[ https://issues.apache.org/jira/browse/HBASE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906849#action_12906849 ]

Evert Arckens commented on HBASE-1485:
--
The attached test TestCellUpdates.java also initializes a new HBaseTestingUtility for each test method. Setting this up only once for the whole test class causes issues when triggering the major compaction which I haven't been able to pinpoint yet.

Wrong or indeterminate behavior when there are duplicate versions of a column
-
Key: HBASE-1485
URL: https://issues.apache.org/jira/browse/HBASE-1485
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.20.0
Reporter: Jonathan Gray
Assignee: Pranav Khaitan
Fix For: 0.90.0
Attachments: TestCellUpdates.java

As of now, both gets and scanners will end up returning all duplicate versions of a column. The ordering of them is indeterminate. We need to decide what the desired/expected behavior should be and make it happen.

Note: It's nearly impossible for this to work with Gets as they are now implemented in 1304 so this is really a Scanner issue. To implement this correctly with Gets, we would have to undo basically all the optimizations that Gets do and making them far slower than a Scanner.
[jira] Updated: (HBASE-2960) Allow Incremental Table Alterations
[ https://issues.apache.org/jira/browse/HBASE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthick Sankarachary updated HBASE-2960:
-
Fix Version/s: (was: 0.89.20100621)

Allow Incremental Table Alterations
---
Key: HBASE-2960
URL: https://issues.apache.org/jira/browse/HBASE-2960
Project: HBase
Issue Type: Wish
Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
Attachments: HBASE-2960.patch

As per the HBase shell help, the alter command will "Alter column family schema; pass table name and a dictionary specifying new column family schema." The assumption here seems to be that the new column family schema must be completely specified. In other words, if a certain attribute is not specified in the column family schema, then it is effectively defaulted. Is this side-effect by design?

I for one assumed (wrongly apparently) that I can alter a table in increments. Case in point, the following commands should've resulted in the final value of the VERSIONS attribute of my table to stay put at 1, but instead it got defaulted to 3. I guess there's no right or wrong answer here, but what should alter do by default? My expectation is that it only changes those attributes that were specified in the alter command, leaving the unspecified attributes untouched.

hbase(main):003:0> create 't1', {NAME => 'f1', VERSIONS => 1}
0 row(s) in 1.7230 seconds
hbase(main):004:0> describe 't1'
DESCRIPTION
{NAME => 't1', FAMILIES => [{NAME => 'f1', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.2030 seconds
hbase(main):006:0> disable 't1'
0 row(s) in 0.1140 seconds
hbase(main):007:0> alter 't1', {NAME => 'f1', IN_MEMORY => 'true'}
0 row(s) in 0.0160 seconds
hbase(main):009:0> describe 't1'
DESCRIPTION
{NAME => 't1', FAMILIES => [{NAME => 'f1', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.1280 seconds
[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync
[ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906884#action_12906884 ]

Prakash Khemani commented on HBASE-2957:

Sorry, I was out and couldn't reply to this thread.

I think a general solution that guarantees consistency for PUTs and ICVs and at the same time doesn't hold the row lock while updating hlog is possible.

===

Thinking aloud. First, why do we want to hold the row lock around the log sync? Because we want the log sync to happen in causal ordering. Here is a scenario of what can go wrong if we release the row lock before the sync completes.

1. client-1 does a put/icv on regionserver-1 and releases the row lock before the sync.
2. client-2 comes in and reads the new value. Based on this just-read value, client-2 then does a put in regionserver-2.
3. client-2 is able to do its sync on rs-2 before client-1's sync on rs-1 completes.
4. rs-1 is brought down ungracefully. During recovery we will have client-2's update but not client-1's. And that violates the causal ordering of events.

===

So we don't want anyone to read a value which has not already been synced. I think we can transfer the wait-for-sync to the reader instead of asking all writers to wait.

A simple way to do that will be to attach a log-sync-number to every cell. When a cell is updated it will keep the next log-sync-number within itself. A get will not return until the current log-sync-number is at least as big as the log-sync-number stored in the cell.

An update can return immediately after queuing the sync. The wait-for-sync is transferred from the writer to the reader. If the reader comes in sufficiently late (which is likely) then there will be no wait-for-syncs in the system.

===

Even in this scheme we will have to treat ICVs specially. Logically an ICV has:
(a) GET the old value
(b) PUT the new value
(c) GET and return the new value

There are 2 cases:
(1) The ICV caller doesn't use the return value of the ICV. In this case the ICV need not wait for the earlier sync to complete. (In my use case this is what happens predominantly.)
(2) The ICV caller uses the return value of the ICV call to make further updates. In this case the ICV has to wait for its sync to complete before it returns. While the ICV is waiting for the sync to complete it need not hold the row lock. (At least in my use case this is a very rare case.)

===

I think that it is true in general that while a GET is forced to wait for a sync to complete, there is no need to hold the row lock.

===

Release row lock when waiting for wal-sync
--
Key: HBASE-2957
URL: https://issues.apache.org/jira/browse/HBASE-2957
Project: HBase
Issue Type: Improvement
Components: regionserver, wal
Affects Versions: 0.20.0
Reporter: Prakash Khemani

Is there a reason to hold on to the row-lock while waiting for the WAL-sync to be completed by the logSyncer thread? I think data consistency will be guaranteed even if the following happens:
(a) the row lock is held while the row is updated in memory
(b) the row lock is released after queuing the KV record for WAL-syncing
(c) the log-sync system guarantees that the log records for any given row are synced in order
(d) the HBase client only receives a success notification after the sync completes (no change from the current state)

I think this should be a huge win. For my use case, and I am sure for others, the handler thread spends the bulk of its row-lock critical section time waiting for sync to complete. Even if the log-sync system cannot guarantee the orderly completion of sync records, the "Don't hold row lock while waiting for sync" option should be available to HBase clients on a per request basis.
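The log-sync-number idea in the HBASE-2957 comment above can be sketched for a single cell as follows. This is only an illustration under simplifying assumptions (one cell, one monitor, sync numbers handed out by a counter); SyncNumberSketch and its methods are hypothetical names, not HBase code.

```java
final class SyncNumberSketch {
    private long lastQueued = 0;      // highest sync number handed to a writer
    private long lastSynced = 0;      // highest sync number known to be durable
    private long cellSyncNumber = 0;  // sync number covering the cell's latest update
    private String cellValue = null;

    /** Writer: update in memory, record which sync covers it, return without waiting. */
    synchronized long put(String value) {
        cellValue = value;
        cellSyncNumber = ++lastQueued;
        return cellSyncNumber;
    }

    /** Log syncer: called once everything up to syncNumber is on disk. */
    synchronized void syncCompleted(long syncNumber) {
        lastSynced = Math.max(lastSynced, syncNumber);
        notifyAll();
    }

    /** True while a reader of this cell would still have to wait for a sync. */
    synchronized boolean wouldBlock() {
        return lastSynced < cellSyncNumber;
    }

    /** Reader: blocks until the sync covering the cell's latest update has completed. */
    synchronized String get() {
        try {
            while (lastSynced < cellSyncNumber) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sync", e);
        }
        return cellValue;
    }
}
```

A reader that arrives after the syncer has caught up never waits, which is the case the comment expects to be the common one.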
[jira] Commented: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction
[ https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906908#action_12906908 ]

HBase Review Board commented on HBASE-2964:
---
Message from: Todd Lipcon t...@cloudera.com
---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/798/
---
Review request for hbase and stack.

Summary
---
Moves all RPCs outside of the region writeLock - the writeLock is now only used long enough to set the 'closing' flag. When we drop the lock any waiters will see 'closing' upon acquiring the lock, and thus throw NSRE. In the case that we abort the split, it will reopen the region as before. Accessors will have gotten NSRE but will just come back to the same region eventually.

This addresses bug HBASE-2964. http://issues.apache.org/jira/browse/HBASE-2964

Diffs
-
src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 3507c0d

Diff: http://review.cloudera.org/r/798/diff

Testing
---
YCSB testing on my cluster - it used to deadlock due to this bug within an hour. I ran a 5 hour load test overnight and it worked OK.

Thanks,
Todd

Deadlock when RS tries to RPC to itself inside SplitTransaction
---
Key: HBASE-2964
URL: https://issues.apache.org/jira/browse/HBASE-2964
Project: HBase
Issue Type: Bug
Components: ipc, regionserver
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
Attachments: hbase-2964.txt

In testing the 0.89.20100830 rc, I ran into a deadlock with the following situation:
- All of the IPC Handler threads are blocked on the region lock, which is held by CompactSplitThread.
- CompactSplitThread is in the process of trying to edit META to create the offline parent. META happens to be on the same server as is executing the split. Therefore, the CompactSplitThread is trying to connect back to itself, but all of the handler threads are blocked, so the IPC never happens.
Thus, the entire RS gets deadlocked.
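The locking pattern in the HBASE-2964 review request above (hold the write lock only long enough to set a closing flag, then do the RPC with no lock held) might look roughly like this. SplitLockSketch is a hypothetical class, IllegalStateException stands in for the NSRE mentioned in the review, and the Runnable stands in for the META edit; none of this is taken from SplitTransaction.java.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class SplitLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicBoolean closing = new AtomicBoolean(false);

    /** Handler path: fails fast once the region is closing instead of blocking on the lock. */
    String handleRequest(String request) {
        lock.readLock().lock();
        try {
            if (closing.get()) {
                throw new IllegalStateException("region is closing; client should retry");
            }
            return "served " + request;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Split path: the write lock covers only the flag flip; the RPC runs unlocked. */
    void split(Runnable metaRpc) {
        lock.writeLock().lock();
        try {
            closing.set(true);
        } finally {
            lock.writeLock().unlock();
        }
        if (lock.writeLock().isHeldByCurrentThread()) {
            throw new IllegalStateException("unsafe to hold write lock while performing RPCs");
        }
        metaRpc.run();
    }
}
```

Because handlers never wait behind a long-held write lock, a handler thread stays available to serve the split's own RPC when META lives on the same server.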
[jira] Resolved: (HBASE-2962) Add missing methods to HTableInterface (and HTable)
[ https://issues.apache.org/jira/browse/HBASE-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2962.
--
Hadoop Flags: [Reviewed]
Resolution: Fixed

Committed. Thanks for the polish Lars.

Add missing methods to HTableInterface (and HTable)
---
Key: HBASE-2962
URL: https://issues.apache.org/jira/browse/HBASE-2962
Project: HBase
Issue Type: Bug
Components: client
Reporter: Lars Francke
Fix For: 0.90.0
Attachments: HBASE-2962.1.diff

HBASE-1845 added two new methods in HTable (batch). Those need to be in HTableInterface as well.

And in HTable we have:
* put(Put)
* put(List<Put>)
* delete(Delete)
* delete(List<Delete>)
* get(Get)

Shouldn't we add a get(List<Get>) as well for consistency?

Others that are missing:
* getRegionLocation
* getScannerCaching / setScannerCaching
* getStartKeys / getEndKeys / getStartEndKeys
* getRegionsInfo
* setAutoFlush
* getWriteBufferSize / setWriteBufferSize
* getWriteBuffer
* prewarmRegionCache
* serializeRegionInfo / deserializeRegionInfo

For some of those it might not make sense to add them. I'm just listing them all. The patch is trivial once we've decided which to add; I'll prepare one for batch get.
[jira] Resolved: (HBASE-29) HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile
[ https://issues.apache.org/jira/browse/HBASE-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-29. Resolution: Invalid Agreed Pranav. We don't have a getFull any more and this mostly works since we moved get to use scanners always. There are issue lurking but we can open specific to address any found rather than operate under this old/stale description. HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile - Key: HBASE-29 URL: https://issues.apache.org/jira/browse/HBASE-29 Project: HBase Issue Type: Bug Components: client, regionserver Affects Versions: 0.1.2, 0.2.0 Reporter: Bryan Duxbury Priority: Minor Attachments: 29.patch Ok, this one is a little tricky. Let's say that you write a row with some value without a timestamp, thus meaning right now. Then, the memcache gets flushed out to a MapFile. Then, you write another value to the same row, this time with a timestamp that is in the past, ie, before the now timestamp of the first put. Some time later, but before there is a compaction, if you do a get for this row, and only ask for a single version, you will logically be expecting the latest version of the cell, which you would assume would be the one written at now time. Instead, you will get the value written into the past cell, because even though it is tagged as having happened in the past, it actually *was written* after the now cell, and thus when #get searches for satisfying values, it runs into the one most recently written first. The result of this problem is inconsistent data results. Note that this problem only ever exists when there's an uncompacted HStore, because during compaction, these cells will all get sorted into the correct order by timestamp and such. 
In a way, this actually makes the problem worse, because then you could easily get inconsistent results from HBase about the same (unchanged) row depending on whether there's been a flush/compaction. The only solution I can think of for this problem at the moment is to scan all the MapFiles and Memcache for possible results, sort them, and then select the desired number of versions off of the top. This is unfortunate because it means you never get the snazzy shortcircuit logic except within a single mapfile or memcache. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-2961) Close zookeeper when done with it (HCM, Master, and RS)
[ https://issues.apache.org/jira/browse/HBASE-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2961:
-
Attachment: 2961.txt

This gets rid of noise from zk in standalone mode (shutdown runs faster). It breaks tests, though, because the fix makes a Configuration per server, and so new HConnections each need to get the root location (previously, they all shared a single connection and a single root location). Need to make the HCM use the new RootRegionTracker. Cleaner.

Close zookeeper when done with it (HCM, Master, and RS)
---
Key: HBASE-2961
URL: https://issues.apache.org/jira/browse/HBASE-2961
Project: HBase
Issue Type: Bug
Reporter: stack
Fix For: 0.90.0
Attachments: 2961.txt, debug.txt

We're not closing down zk properly, mostly in HCM. Makes for spew in zk logs and it also causes shutdown to run longer.
[jira] Commented: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction
[ https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907052#action_12907052 ]

HBase Review Board commented on HBASE-2964:
---
Message from: Todd Lipcon t...@cloudera.com
---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/798/#review1122
---
Seems to make sense. Let me try it on a cluster before I +1 it.

src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
http://review.cloudera.org/r/798/#comment3823

maybe now we can do an:

assert !this.parent.lock.writeLock().isHeldByCurrentThread() : "Unsafe to hold write lock while performing RPCs";

- Todd

Deadlock when RS tries to RPC to itself inside SplitTransaction
---
Key: HBASE-2964
URL: https://issues.apache.org/jira/browse/HBASE-2964
Project: HBase
Issue Type: Bug
Components: ipc, regionserver
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
Attachments: hbase-2964.txt

In testing the 0.89.20100830 rc, I ran into a deadlock with the following situation:
- All of the IPC Handler threads are blocked on the region lock, which is held by CompactSplitThread.
- CompactSplitThread is in the process of trying to edit META to create the offline parent. META happens to be on the same server as is executing the split. Therefore, the CompactSplitThread is trying to connect back to itself, but all of the handler threads are blocked, so the IPC never happens.
Thus, the entire RS gets deadlocked.
[jira] Commented: (HBASE-2933) Skip EOF Errors during Log Recovery
[ https://issues.apache.org/jira/browse/HBASE-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907092#action_12907092 ] stack commented on HBASE-2933: -- Looking in logs I see this kinda thing: {code} 2010-09-07 18:10:27,965 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://sv4borg9:9000/hbase/api_access_token_stats_day/1845102219/recovered.edits/68762427569, length=264167 {code} .. so we're some cleanup of old split attempts. Skip EOF Errors during Log Recovery --- Key: HBASE-2933 URL: https://issues.apache.org/jira/browse/HBASE-2933 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Critical Fix For: 0.90.0 While testing a cluster, we hit upon the following assert during region assigment. We were killing the master during a long run of splits. We think what happened is that the HMaster was killed while splitting, woke up split again. If this happens, we will have 2 files: 1 partially written and 1 complete one. Since encountering partial log splits upon Master failure is considered normal behavior, we should continue at the RS level if we encounter an EOFException not an filesystem-level exception, even with skip.errors == false. 2010-08-20 16:59:07,718 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening MailBox_dsanduleac,57db45276ece7ce03ef7e8d9969eb189:999...@facebook.com,1280960828959.7c542d24d4496e273b739231b01885e6. 
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1902)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1932)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1837)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1883)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:121)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:113)
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1981)
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1956)
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1915)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:344)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1490)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1437)
    at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1345)
    at java.lang.Thread.run(Thread.java:619)
2010-08-20 16:59:07,719 ERROR org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of region 7c542d24d4496e273b739231b01885e6
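The behavior asked for in the description (keep going when a recovered-edits file ends early, but still fail on other I/O errors) might look roughly like the sketch below. It is an assumption-laden illustration, not the HRegion code: `EofTolerantReplay` and `readEdits` are hypothetical names, and each "edit" is simplified to a single `int` read through `DataInputStream.readInt`, the same call that throws at the top of the trace above.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not HBase code: reads fixed-size records and
// treats a truncated tail as the normal end of a partially written file.
final class EofTolerantReplay {
    private EofTolerantReplay() {}

    /**
     * Reads ints until the stream ends. An EOFException, including one raised
     * in the middle of a record, ends the replay and keeps the edits read so
     * far. Any other IOException still propagates to the caller.
     */
    static List<Integer> readEdits(DataInputStream in) throws IOException {
        List<Integer> edits = new ArrayList<>();
        while (true) {
            try {
                edits.add(in.readInt());
            } catch (EOFException eof) {
                // Partial split output: stop here instead of failing the open.
                return edits;
            }
        }
    }
}
```

Catching only `EOFException` keeps the distinction the description draws: a short file is tolerated, while a filesystem-level failure still aborts.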
[jira] Commented: (HBASE-2888) Review all our metrics
[ https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907096#action_12907096 ]

stack commented on HBASE-2888:
------------------------------

Setting hbase.period in hadoop-metrics.properties doesn't seem to have an effect; counts are off. Here's what I noticed digging in code:

'hadoop-metrics.properties' gets read up into a metrics attributes map but nothing seems to be done w/ them subsequently. Reading up in hadoop, in branch-0.20/src/core/org/apache/hadoop/metrics/package.html, it seems to imply that we need to getAttribute and set them after we make a metrics Context; i.e. in this case, call setPeriod in RegionServerMetrics, etc.?

More broadly, need to make sure settings in hadoop-metrics.properties take effect when changed.

Review all our metrics
----------------------

Key: HBASE-2888
URL: https://issues.apache.org/jira/browse/HBASE-2888
Project: HBase
Issue Type: Improvement
Components: master
Reporter: Jean-Daniel Cryans
Fix For: 0.90.0

HBase publishes a bunch of metrics, some useful some wasteful, that should be improved to deliver a better ops experience. Examples:
- Block cache hit ratio converges at some point and stops moving
- fsReadLatency goes down when compactions are running
- storefileIndexSizeMB is the exact same number once a system is serving production load

We could use new metrics too.
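As a rough illustration of the first half of that fix (actually looking up the configured period before applying it), here is a sketch using only `java.util.Properties`. It deliberately does not touch the Hadoop metrics classes: `MetricsPeriod` and `periodSeconds` are hypothetical names, the `<context>.period` key layout follows the `hbase.period` setting named in the comment, and handing the value to the metrics context is left out.

```java
import java.util.Properties;

// Hypothetical helper, not Hadoop or HBase code: looks up "<context>.period"
// from already-loaded properties and falls back to a default when the key is
// missing, malformed, or not positive.
final class MetricsPeriod {
    private MetricsPeriod() {}

    static int periodSeconds(Properties props, String contextName, int defaultSeconds) {
        String raw = props.getProperty(contextName + ".period");
        if (raw == null) {
            return defaultSeconds;
        }
        try {
            int seconds = Integer.parseInt(raw.trim());
            return seconds > 0 ? seconds : defaultSeconds;
        } catch (NumberFormatException e) {
            // Ignore an unparsable value rather than failing metrics setup.
            return defaultSeconds;
        }
    }
}
```

For example, with `hbase.period=10` loaded into `props`, `MetricsPeriod.periodSeconds(props, "hbase", 5)` yields 10, and a context with no configured period yields the default of 5.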