[jira] Commented: (HBASE-29) HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile

2010-09-07 Thread Pranav Khaitan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906683#action_12906683
 ] 

Pranav Khaitan commented on HBASE-29:
-

I think the code for Store has changed since then and this issue doesn't exist 
anymore. If that's the case, we should consider closing this JIRA.

 HStore#get and HStore#getFull may not return expected values by timestamp 
 when there is more than one MapFile
 -

 Key: HBASE-29
 URL: https://issues.apache.org/jira/browse/HBASE-29
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.1.2, 0.2.0
Reporter: Bryan Duxbury
Priority: Minor
 Attachments: 29.patch


 Ok, this one is a little tricky. Let's say that you write a row with some 
 value without a timestamp, thus meaning "right now". Then, the memcache gets 
 flushed out to a MapFile. Then, you write another value to the same row, this 
 time with a timestamp that is in the past, i.e., before the "now" timestamp of 
 the first put. 
 Some time later, but before there is a compaction, if you do a get for this 
 row, and only ask for a single version, you will logically be expecting the 
 latest version of the cell, which you would assume would be the one written 
 at "now" time. Instead, you will get the value written into the "past" cell, 
 because even though it is tagged as having happened in the past, it actually 
 *was written* after the "now" cell, and thus when #get searches for 
 satisfying values, it runs into the most recently written one first. 
 The result of this problem is inconsistent data results. Note that this 
 problem only ever exists when there's an uncompacted HStore, because during 
 compaction, these cells will all get sorted into the correct order by 
 timestamp and such. In a way, this actually makes the problem worse, because 
 then you could easily get inconsistent results from HBase about the same 
 (unchanged) row depending on whether there's been a flush/compaction.
 The only solution I can think of for this problem at the moment is to scan 
 all the MapFiles and Memcache for possible results, sort them, and then 
 select the desired number of versions off of the top. This is unfortunate 
 because it means you never get the snazzy shortcircuit logic except within a 
 single mapfile or memcache. 
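The ordering bug above can be sketched outside of HBase. In this illustrative Python model (all names hypothetical, not HStore code), lookup walks the stores newest-written-first and short-circuits on the first hit, so a later write carrying an older timestamp shadows the "now" cell; gathering candidates from every store and sorting by timestamp, as the description suggests, fixes it:

```python
# Simulate HStore lookup across flushed files plus memcache,
# searched newest-written-first with early termination.
def get_latest(stores, row):
    """stores: list of {row: (timestamp, value)} dicts,
    ordered oldest-written first; search newest-written first."""
    for store in reversed(stores):
        if row in store:
            return store[row]  # short-circuit: first hit wins
    return None

# Write at "now" (ts=100), flush, then write with a past timestamp (ts=50).
mapfile = {"row1": (100, "now-value")}
memcache = {"row1": (50, "past-value")}

# The short-circuit search returns the past cell, not the ts=100 one.
buggy = get_latest([mapfile, memcache], "row1")

# The proposed fix: collect candidates from every store, sort by timestamp.
def get_latest_fixed(stores, row):
    candidates = [s[row] for s in stores if row in s]
    return max(candidates, key=lambda tv: tv[0]) if candidates else None

fixed = get_latest_fixed([mapfile, memcache], "row1")
```

This also shows why the result flips after a compaction: once all cells land in one sorted file, the short-circuit search and the timestamp sort agree.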

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2959) Scanning always starts at the beginning of a row

2010-09-07 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906687#action_12906687
 ] 

ryan rawson commented on HBASE-2959:


If we got rid of DeleteFamily, it would make this easy to implement.




 Scanning always starts at the beginning of a row
 

 Key: HBASE-2959
 URL: https://issues.apache.org/jira/browse/HBASE-2959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.4, 0.20.5, 0.20.6, 0.89.20100621
Reporter: Benoit Sigoure
Priority: Blocker

 In HBASE-2248, the code in {{HRegion#get}} was changed like so:
 {code}
 -  private void get(final Store store, final Get get,
 -      final NavigableSet<byte []> qualifiers, List<KeyValue> result)
 -  throws IOException {
 -    store.get(get, qualifiers, result);
 +  /*
 +   * Do a get based on the get parameter.
 +   */
 +  private List<KeyValue> get(final Get get) throws IOException {
 +    Scan scan = new Scan(get);
 +
 +    List<KeyValue> results = new ArrayList<KeyValue>();
 +
 +    InternalScanner scanner = null;
 +    try {
 +      scanner = getScanner(scan);
 +      scanner.next(results);
 +    } finally {
 +      if (scanner != null)
 +        scanner.close();
 +    }
 +    return results;
   }
 {code}
 So instead of doing a {{get}} straight on the {{Store}}, we now open a 
 scanner.  The problem is that we eventually end up in {{ScanQueryMatcher}} 
 where the constructor does: {{this.startKey = 
 KeyValue.createFirstOnRow(scan.getStartRow());}}.  This entails that if we 
 have a very wide row (thousands of columns), the scanner will need to go 
 through thousands of {{KeyValue}}'s before finding the right entry, because 
 it always starts from the beginning of the row, whereas before it was much 
 more straightforward.
 This problem was under the radar for a while because the overhead isn't too 
 unreasonable, but later on, {{incrementColumnValue}} was changed to do a 
 {{get}} under the hood.  At StumbleUpon we do thousands of ICVs per second, so 
 thousands of times per second we're scanning some really wide rows.  When a 
 row is contended, this results in all the IPC threads being stuck on 
 acquiring a row lock, while one thread is doing the ICV (albeit slowly due to 
 the excessive scanning).  When all IPC threads are stuck, the region server 
 is unable to serve more requests.
 As a nice side effect, fixing this bug will make {{get}} and 
 {{incrementColumnValue}} faster, as well as the first call to {{next}} on a 
 scanner.
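The overhead being described can be made concrete with a toy wide row. This hypothetical Python sketch (not HBase code) contrasts a scanner seeded at the first cell of the row, as {{createFirstOnRow}} effectively does, with one seeded directly at the requested column:

```python
import bisect

# A wide row: qualifiers q0000..q4999, kept sorted as in a store file.
qualifiers = ["q%04d" % i for i in range(5000)]

def cells_examined_from_row_start(target):
    """Scanner seeded at the first cell of the row: walk every
    KeyValue until the target qualifier is reached."""
    steps = 0
    for q in qualifiers:
        steps += 1
        if q == target:
            break
    return steps

def cells_examined_with_seek(target):
    """Seed the scanner at the first requested column instead (the
    kind of seek optimization being asked for): one positioned read."""
    bisect.bisect_left(qualifiers, target)  # O(log n) positioning
    return 1

full_scan = cells_examined_from_row_start("q4999")
seeked = cells_examined_with_seek("q4999")
```

With the last qualifier as the target, the row-start scan touches every cell in the row, while the seek touches one; that per-call gap is what thousands of ICVs per second multiply into.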




[jira] Commented: (HBASE-2960) Allow Incremental Table Alterations

2010-09-07 Thread chenjiajun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906691#action_12906691
 ] 

chenjiajun commented on HBASE-2960:
---

Is this the only issue for version 0.89.20100621?

When was HBase 0.89.20100621 released?

 Allow Incremental Table Alterations
 ---

 Key: HBASE-2960
 URL: https://issues.apache.org/jira/browse/HBASE-2960
 Project: HBase
  Issue Type: Wish
  Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
 Fix For: 0.89.20100621

 Attachments: HBASE-2960.patch


 As per the HBase shell help, the alter command will "Alter column family 
 schema; pass table name and a dictionary specifying new column family 
 schema." The assumption here seems to be that the new column family schema 
 must be completely specified. In other words, if a certain attribute is not 
 specified in the column family schema, then it is effectively defaulted. Is 
 this side-effect by design? 
 I for one assumed (wrongly apparently) that I can alter a table in 
 increments. Case in point, the following commands should've left the final 
 value of the VERSIONS attribute of my table at 1, but instead it got 
 defaulted to 3. I guess there's no right or wrong answer here, 
 but what should alter do by default? My expectation is that it only changes 
 those attributes that were specified in the alter command, leaving the 
 unspecified attributes untouched.
 hbase(main):003:0> create 't1', {NAME => 'f1', VERSIONS => 1}
 0 row(s) in 1.7230 seconds
 hbase(main):004:0> describe 't1'
 DESCRIPTION
  {NAME => 't1', FAMILIES => [{NAME => 'f1', COMPRESSION => 'NONE', VERSIONS 
 => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', 
 BLOCKCACHE => 'true'}]}
 1 row(s) in 0.2030 seconds
 hbase(main):006:0> disable 't1'
 0 row(s) in 0.1140 seconds
 hbase(main):007:0> alter 't1', {NAME => 'f1', IN_MEMORY => 'true'}
 0 row(s) in 0.0160 seconds
 hbase(main):009:0> describe 't1'
 DESCRIPTION
  {NAME => 't1', FAMILIES => [{NAME => 'f1', VERSIONS => '3', COMPRESSION => 
 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'true', 
 BLOCKCACHE => 'true'}]}
 1 row(s) in 0.1280 seconds
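The two possible semantics of alter amount to different dictionary merges. A hedged Python sketch (defaults and names illustrative, mirroring the describe output above; not the actual shell implementation):

```python
# Illustrative column family defaults, as shown by describe.
DEFAULTS = {"COMPRESSION": "NONE", "VERSIONS": "3", "TTL": "2147483647",
            "BLOCKSIZE": "65536", "IN_MEMORY": "false", "BLOCKCACHE": "true"}

def alter_replacing(existing, spec):
    """Current behavior: the existing schema is ignored, so any
    attribute absent from spec falls back to the defaults."""
    return {**DEFAULTS, **spec}

def alter_incremental(existing, spec):
    """Requested behavior: merge spec into the existing schema,
    leaving unspecified attributes untouched."""
    return {**existing, **spec}

schema = {**DEFAULTS, "VERSIONS": "1"}   # created with VERSIONS => 1
spec = {"IN_MEMORY": "true"}             # alter touching only IN_MEMORY

replaced = alter_replacing(schema, spec)   # VERSIONS silently back to '3'
merged = alter_incremental(schema, spec)   # VERSIONS stays '1'
```

The replacing merge reproduces the surprise in the transcript; the incremental merge is what the reporter expected.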




[jira] Updated: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction

2010-09-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-2964:
---

Attachment: hbase-2964.txt

I also had to move the new HTable call outside of the lock, since the HTable 
constructor does an RPC.

This patch seems to fix the issue for me. Running an overnight load test - if 
it's still going in the morning I'd say we're good :)

 Deadlock when RS tries to RPC to itself inside SplitTransaction
 ---

 Key: HBASE-2964
 URL: https://issues.apache.org/jira/browse/HBASE-2964
 Project: HBase
  Issue Type: Bug
  Components: ipc, regionserver
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
 Attachments: hbase-2964.txt


 In testing the 0.89.20100830 rc, I ran into a deadlock with the following 
 situation:
 - All of the IPC Handler threads are blocked on the region lock, which is 
 held by CompactSplitThread.
 - CompactSplitThread is in the process of trying to edit META to create the 
 offline parent. META happens to be on the same server as is executing the 
 split.
 Therefore, the CompactSplitThread is trying to connect back to itself, but 
 all of the handler threads are blocked, so the IPC never happens. Thus, the 
 entire RS gets deadlocked.
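The cycle can be reproduced in miniature with a fixed-size worker pool standing in for the IPC handlers. A hypothetical Python sketch (names illustrative, not regionserver code):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Model: a fixed pool of IPC handler threads and one region lock.
region_lock = threading.Lock()
handlers = ThreadPoolExecutor(max_workers=2)   # the RS's handler threads

def handle_request():
    with region_lock:                          # every handler blocks here
        return "served"

# "CompactSplitThread" holds the region lock...
region_lock.acquire()
blocked = [handlers.submit(handle_request) for _ in range(2)]
time.sleep(0.2)                                # let the handlers block

# ...then RPCs to itself: the META edit lands on this same server, but
# every handler that could serve it is stuck on the lock being held.
self_rpc = handlers.submit(handle_request)
time.sleep(0.2)
stuck = not any(f.done() for f in blocked + [self_rpc])

region_lock.release()                          # break the cycle
results = [f.result(timeout=5) for f in blocked + [self_rpc]]
handlers.shutdown()
```

Here the cycle is broken by hand; in the real regionserver nothing ever releases the lock, so the whole process wedges.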




[jira] Commented: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction

2010-09-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906819#action_12906819
 ] 

Todd Lipcon commented on HBASE-2964:


Overnight test completed OK with that patch. I think we should rebuild the rc 
with this if Stack thinks it looks good.

 Deadlock when RS tries to RPC to itself inside SplitTransaction
 ---

 Key: HBASE-2964
 URL: https://issues.apache.org/jira/browse/HBASE-2964
 Project: HBase
  Issue Type: Bug
  Components: ipc, regionserver
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
 Attachments: hbase-2964.txt


 In testing the 0.89.20100830 rc, I ran into a deadlock with the following 
 situation:
 - All of the IPC Handler threads are blocked on the region lock, which is 
 held by CompactSplitThread.
 - CompactSplitThread is in the process of trying to edit META to create the 
 offline parent. META happens to be on the same server as is executing the 
 split.
 Therefore, the CompactSplitThread is trying to connect back to itself, but 
 all of the handler threads are blocked, so the IPC never happens. Thus, the 
 entire RS gets deadlocked.




[jira] Updated: (HBASE-1485) Wrong or indeterminate behavior when there are duplicate versions of a column

2010-09-07 Thread Evert Arckens (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Evert Arckens updated HBASE-1485:
-

Attachment: TestCellUpdates.java

We've tried the patch posted at https://review.cloudera.org/r/780/ here at 
Outerthought.
The attached file is a unit test showing that the patch works at first sight.
However, when performing updates on existing timestamps in combination with 
triggering major compactions, things don't work as expected.

I've used negative-assertions in order to make the tests succeed, and added a 
comment where we would expect the result to be otherwise.

I've also added a test with the example where a row is deleted and then an 
update on an older timestamp afterwards remains hidden by the delete.

 Wrong or indeterminate behavior when there are duplicate versions of a column
 -

 Key: HBASE-1485
 URL: https://issues.apache.org/jira/browse/HBASE-1485
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.0
Reporter: Jonathan Gray
Assignee: Pranav Khaitan
 Fix For: 0.90.0

 Attachments: TestCellUpdates.java


 As of now, both gets and scanners will end up returning all duplicate 
 versions of a column.  The ordering of them is indeterminate.
 We need to decide what the desired/expected behavior should be and make it 
 happen.
 Note:  It's nearly impossible for this to work with Gets as they are now 
 implemented in 1304 so this is really a Scanner issue.  To implement this 
 correctly with Gets, we would have to undo basically all the optimizations 
 that Gets do, making them far slower than a Scanner.




[jira] Commented: (HBASE-2959) Scanning always starts at the beginning of a row

2010-09-07 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906848#action_12906848
 ] 

Jonathan Gray commented on HBASE-2959:
--

I'm -1 on removing delete family, at least at this point.  It's pretty widely 
used and the alternative is not scalable.

I think we should first optimize with reseeks, then look at other optimizations 
using meta data / blooms, and if we still have issues we might think about 
removing delete family.  However, I think with the use of meta data, that 
someone not using delete families would pay virtually no perf hit and would 
bypass the start-of-row seek.

 Scanning always starts at the beginning of a row
 

 Key: HBASE-2959
 URL: https://issues.apache.org/jira/browse/HBASE-2959
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.4, 0.20.5, 0.20.6, 0.89.20100621
Reporter: Benoit Sigoure
Priority: Blocker

 In HBASE-2248, the code in {{HRegion#get}} was changed like so:
 {code}
 -  private void get(final Store store, final Get get,
 -      final NavigableSet<byte []> qualifiers, List<KeyValue> result)
 -  throws IOException {
 -    store.get(get, qualifiers, result);
 +  /*
 +   * Do a get based on the get parameter.
 +   */
 +  private List<KeyValue> get(final Get get) throws IOException {
 +    Scan scan = new Scan(get);
 +
 +    List<KeyValue> results = new ArrayList<KeyValue>();
 +
 +    InternalScanner scanner = null;
 +    try {
 +      scanner = getScanner(scan);
 +      scanner.next(results);
 +    } finally {
 +      if (scanner != null)
 +        scanner.close();
 +    }
 +    return results;
   }
 {code}
 So instead of doing a {{get}} straight on the {{Store}}, we now open a 
 scanner.  The problem is that we eventually end up in {{ScanQueryMatcher}} 
 where the constructor does: {{this.startKey = 
 KeyValue.createFirstOnRow(scan.getStartRow());}}.  This entails that if we 
 have a very wide row (thousands of columns), the scanner will need to go 
 through thousands of {{KeyValue}}'s before finding the right entry, because 
 it always starts from the beginning of the row, whereas before it was much 
 more straightforward.
 This problem was under the radar for a while because the overhead isn't too 
 unreasonable, but later on, {{incrementColumnValue}} was changed to do a 
 {{get}} under the hood.  At StumbleUpon we do thousands of ICVs per second, so 
 thousands of times per second we're scanning some really wide rows.  When a 
 row is contended, this results in all the IPC threads being stuck on 
 acquiring a row lock, while one thread is doing the ICV (albeit slowly due to 
 the excessive scanning).  When all IPC threads are stuck, the region server 
 is unable to serve more requests.
 As a nice side effect, fixing this bug will make {{get}} and 
 {{incrementColumnValue}} faster, as well as the first call to {{next}} on a 
 scanner.




[jira] Commented: (HBASE-1485) Wrong or indeterminate behavior when there are duplicate versions of a column

2010-09-07 Thread Evert Arckens (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906849#action_12906849
 ] 

Evert Arckens commented on HBASE-1485:
--

The attached test TestCellUpdates.java also initializes a new 
HBaseTestingUtility for each test method.
Setting this up only once for the whole test class causes issues when 
triggering the major compaction, which I haven't been able to pinpoint yet.

 Wrong or indeterminate behavior when there are duplicate versions of a column
 -

 Key: HBASE-1485
 URL: https://issues.apache.org/jira/browse/HBASE-1485
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.0
Reporter: Jonathan Gray
Assignee: Pranav Khaitan
 Fix For: 0.90.0

 Attachments: TestCellUpdates.java


 As of now, both gets and scanners will end up returning all duplicate 
 versions of a column.  The ordering of them is indeterminate.
 We need to decide what the desired/expected behavior should be and make it 
 happen.
 Note:  It's nearly impossible for this to work with Gets as they are now 
 implemented in 1304 so this is really a Scanner issue.  To implement this 
 correctly with Gets, we would have to undo basically all the optimizations 
 that Gets do, making them far slower than a Scanner.




[jira] Updated: (HBASE-2960) Allow Incremental Table Alterations

2010-09-07 Thread Karthick Sankarachary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthick Sankarachary updated HBASE-2960:
-

Fix Version/s: (was: 0.89.20100621)

 Allow Incremental Table Alterations
 ---

 Key: HBASE-2960
 URL: https://issues.apache.org/jira/browse/HBASE-2960
 Project: HBase
  Issue Type: Wish
  Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
 Attachments: HBASE-2960.patch


 As per the HBase shell help, the alter command will "Alter column family 
 schema; pass table name and a dictionary specifying new column family 
 schema." The assumption here seems to be that the new column family schema 
 must be completely specified. In other words, if a certain attribute is not 
 specified in the column family schema, then it is effectively defaulted. Is 
 this side-effect by design? 
 I for one assumed (wrongly apparently) that I can alter a table in 
 increments. Case in point, the following commands should've left the final 
 value of the VERSIONS attribute of my table at 1, but instead it got 
 defaulted to 3. I guess there's no right or wrong answer here, 
 but what should alter do by default? My expectation is that it only changes 
 those attributes that were specified in the alter command, leaving the 
 unspecified attributes untouched.
 hbase(main):003:0> create 't1', {NAME => 'f1', VERSIONS => 1}
 0 row(s) in 1.7230 seconds
 hbase(main):004:0> describe 't1'
 DESCRIPTION
  {NAME => 't1', FAMILIES => [{NAME => 'f1', COMPRESSION => 'NONE', VERSIONS 
 => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', 
 BLOCKCACHE => 'true'}]}
 1 row(s) in 0.2030 seconds
 hbase(main):006:0> disable 't1'
 0 row(s) in 0.1140 seconds
 hbase(main):007:0> alter 't1', {NAME => 'f1', IN_MEMORY => 'true'}
 0 row(s) in 0.0160 seconds
 hbase(main):009:0> describe 't1'
 DESCRIPTION
  {NAME => 't1', FAMILIES => [{NAME => 'f1', VERSIONS => '3', COMPRESSION => 
 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'true', 
 BLOCKCACHE => 'true'}]}
 1 row(s) in 0.1280 seconds




[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

2010-09-07 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906884#action_12906884
 ] 

Prakash Khemani commented on HBASE-2957:


Sorry, I was out and couldn't reply to this thread.

I think a general solution that guarantees consistency for PUTs and ICVs and at 
the same time doesn't hold the row lock while updating hlog is possible.

===

Thinking aloud. First why do we want to hold the row lock around the log sync? 
Because we want the log sync to happen in causal ordering. Here is a scenario 
of what can go wrong if we release the row lock before the sync completes.
1. client-1 does a put/icv on regionserver-1. releases the row lock 
before the sync.
2. client-2 comes in and reads the new value. Based on this just read 
value, client-2 then does a put in regionserver-2.
3. client-2 is able to do its sync on rs-2 before client-1's sync on 
rs-1 completes.
4. rs-1 is brought down ungracefully. During recovery we will have 
client-2's update but not client-1's. And that violates the causal ordering of 
events.

===
So we don't want anyone to read a value which has not already been synced. I 
think we can transfer the wait-for-sync to the reader instead of asking all 
writers to wait.

A simple way to do that will be to attach a log-sync-number to every cell. 
When a cell is updated it will keep the next log-sync-number within itself. A 
get will not return until the current log-sync-number is at least as big as the 
log-sync-number stored in the cell.

An update can return immediately after queuing the sync. The wait-for-sync is 
transferred from the writer to the reader. If the reader comes in sufficiently 
late (which is likely) then there will be no wait-for-syncs in the system.
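The proposed scheme can be sketched single-threaded: each cell carries the log-sync-number of its pending sync, and a read of that cell must wait until the synced sequence number catches up. An illustrative Python model (names hypothetical; the real version would block rather than return None):

```python
# Toy model of "transfer the wait-for-sync from the writer to the reader".
class WalModel:
    def __init__(self):
        self.next_seq = 1      # next log-sync-number to hand out
        self.synced_seq = 0    # highest sequence durably synced
        self.cells = {}        # row -> (value, required_sync_seq)

    def put(self, row, value):
        """Update memory and queue the sync; return immediately."""
        self.cells[row] = (value, self.next_seq)
        self.next_seq += 1     # sync queued, not yet complete

    def sync_up_to(self, seq):
        """The logSyncer thread reports a completed sync."""
        self.synced_seq = max(self.synced_seq, seq)

    def get(self, row):
        value, needed = self.cells[row]
        if needed > self.synced_seq:
            return None        # real impl: block until synced_seq >= needed
        return value

wal = WalModel()
wal.put("r", "v1")             # stamped with seq 1, sync still pending
before = wal.get("r")          # reader must wait: value not yet synced
wal.sync_up_to(1)              # logSyncer completes the flush
after = wal.get("r")
```

A reader arriving after the sync completes (the common case) sees no wait at all, which is the claimed win over making every writer wait.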

===
Even in this scheme we will have to treat ICVs specially. Logically an ICV does 
(a) a GET of the old value, (b) a PUT of the new value, and (c) a GET returning 
the new value.

There are 2 cases
(1) The ICV caller doesn't use the return value of the ICV. In this case the 
ICV need not wait for the earlier sync to complete. (In my use case this is 
what happens predominantly.)

(2) The ICV caller uses the return value of the ICV call to make further 
updates. In this case the ICV has to wait for its sync to complete before it 
returns. While the ICV is waiting for the sync to complete it need not hold the 
row lock. (At least in my use case this is a very rare case)

===
I think that it is true in general that while a GET is forced to wait for a 
sync to complete, there is no need to hold the row lock.

===






 Release row lock when waiting for wal-sync
 --

 Key: HBASE-2957
 URL: https://issues.apache.org/jira/browse/HBASE-2957
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, wal
Affects Versions: 0.20.0
Reporter: Prakash Khemani

 Is there a reason to hold on to the row-lock while waiting for the WAL-sync 
 to be completed by the logSyncer thread?
 I think data consistency will be guaranteed even if the following happens (a) 
 the row lock is held while the row is updated in memory (b) the row lock is 
 released after queuing the KV record for WAL-syncing (c) the log-sync system 
 guarantees that the log records for any given row are synced in order (d) the 
 HBase client only receives a success notification after the sync completes 
 (no change from the current state)
 I think this should be a huge win. For my use case, and I am sure for others, 
  the handler thread spends the bulk of its row-lock critical section  time 
 waiting for sync to complete.
 Even if the log-sync system cannot guarantee the orderly completion of sync 
 records, the "Don't hold row lock while waiting for sync" option should be 
 available to HBase clients on a per request basis.




[jira] Commented: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction

2010-09-07 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906908#action_12906908
 ] 

HBase Review Board commented on HBASE-2964:
---

Message from: Todd Lipcon t...@cloudera.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/798/
---

Review request for hbase and stack.


Summary
---

Moves all RPCs outside of the region writeLock - the writeLock is now only used 
long enough to set the 'closing' flag. When we drop the lock any waiters will 
see 'closing' upon acquiring the lock, and thus throw NSRE.

In the case that we abort the split, it will reopen the region as before. 
Accessors will have gotten NSRE but will just come back to the same region 
eventually.
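The shape of the fix can be sketched as: take the write lock just long enough to set the 'closing' flag, then perform the RPC with no lock held, letting concurrent accessors fail fast. A hypothetical Python sketch (names illustrative, not the SplitTransaction code):

```python
import threading

# Minimal model: the lock stands in for the region writeLock.
class Region:
    def __init__(self):
        self.lock = threading.Lock()
        self.closing = False

    def access(self):
        with self.lock:
            if self.closing:
                raise RuntimeError("NotServingRegionException")
            return "served"

rpc_log = []

def edit_meta(region):
    """The META edit (an RPC) happens with no region lock held."""
    assert not region.lock.locked()
    rpc_log.append("META edit")

region = Region()
with region.lock:              # brief critical section...
    region.closing = True      # ...just long enough to set the flag
edit_meta(region)              # RPC outside the lock: no self-deadlock

# Any concurrent accessor now sees 'closing' and gets an NSRE
# instead of blocking forever on the write lock.
try:
    region.access()
    outcome = "served"
except RuntimeError:
    outcome = "NSRE"
```

If the split aborts, the flag is cleared and the region reopens; callers that got the NSRE simply retry and land back on the same region, as the summary notes.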


This addresses bug HBASE-2964.
http://issues.apache.org/jira/browse/HBASE-2964


Diffs
-

  src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java 
3507c0d 

Diff: http://review.cloudera.org/r/798/diff


Testing
---

YCSB testing on my cluster - it used to deadlock due to this bug within an 
hour. I ran a 5 hour load test overnight and it worked OK.


Thanks,

Todd




 Deadlock when RS tries to RPC to itself inside SplitTransaction
 ---

 Key: HBASE-2964
 URL: https://issues.apache.org/jira/browse/HBASE-2964
 Project: HBase
  Issue Type: Bug
  Components: ipc, regionserver
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
 Attachments: hbase-2964.txt


 In testing the 0.89.20100830 rc, I ran into a deadlock with the following 
 situation:
 - All of the IPC Handler threads are blocked on the region lock, which is 
 held by CompactSplitThread.
 - CompactSplitThread is in the process of trying to edit META to create the 
 offline parent. META happens to be on the same server as is executing the 
 split.
 Therefore, the CompactSplitThread is trying to connect back to itself, but 
 all of the handler threads are blocked, so the IPC never happens. Thus, the 
 entire RS gets deadlocked.




[jira] Resolved: (HBASE-2962) Add missing methods to HTableInterface (and HTable)

2010-09-07 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2962.
--

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed.  Thanks for the polish Lars.

 Add missing methods to HTableInterface (and HTable)
 ---

 Key: HBASE-2962
 URL: https://issues.apache.org/jira/browse/HBASE-2962
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Lars Francke
 Fix For: 0.90.0

 Attachments: HBASE-2962.1.diff


 HBASE-1845 added two new methods in HTable (batch). Those need to be in 
 HTableInterface as well.
 And in HTable we have:
 * put(Put)
 * put(List<Put>)
 * delete(Delete)
 * delete(List<Delete>)
 * get(Get)
 Shouldn't we add a get(List<Get>) as well for consistency?
 Others that are missing:
 * getRegionLocation
 * getScannerCaching / setScannerCaching
 * getStartKeys / getEndKeys / getStartEndKeys
 * getRegionsInfo
 * setAutoFlush
 * getWriteBufferSize / setWriteBufferSize
 * getWriteBuffer
 * prewarmRegionCache
 * serializeRegionInfo / deserializeRegionInfo
 For some of those it might not make sense to add them. I'm just listing them 
 all.
 The patch is trivial once we've decided which to add; I'll prepare one for 
 batch and get.
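One way to keep a get(List<Get>) consistent with the existing list variants would be to route everything through the batch pathway, mirroring put(List<Put>) and delete(List<Delete>). A purely hypothetical sketch, not the HTable API:

```python
# Hypothetical client: single-item calls delegate to one batch
# pathway, which is the consistency the list variants ask for.
class Table:
    def __init__(self, rows):
        self.rows = rows                      # row -> value

    def batch(self, ops):
        """One round trip for a list of ('get'|'put'|'delete', ...) ops."""
        results = []
        for op in ops:
            if op[0] == "get":
                results.append(self.rows.get(op[1]))
            elif op[0] == "put":
                self.rows[op[1]] = op[2]
                results.append(None)
            elif op[0] == "delete":
                self.rows.pop(op[1], None)
                results.append(None)
        return results

    def get(self, row):
        return self.batch([("get", row)])[0]

    def get_list(self, rows):                 # the missing get(List<Get>)
        return self.batch([("get", r) for r in rows])

t = Table({"a": 1, "b": 2})
single = t.get("a")
many = t.get_list(["a", "b", "missing"])
```

The design point is that the list variant is a thin wrapper, so its semantics (ordering of results, behavior on missing rows) fall out of the batch call rather than being specified separately.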




[jira] Resolved: (HBASE-29) HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile

2010-09-07 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-29.


Resolution: Invalid

Agreed Pranav.  We don't have a getFull any more and this mostly works since we 
moved get to use scanners always.  There are issues lurking but we can open 
specific ones to address any found rather than operate under this old/stale 
description.

 HStore#get and HStore#getFull may not return expected values by timestamp 
 when there is more than one MapFile
 -

 Key: HBASE-29
 URL: https://issues.apache.org/jira/browse/HBASE-29
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.1.2, 0.2.0
Reporter: Bryan Duxbury
Priority: Minor
 Attachments: 29.patch


 Ok, this one is a little tricky. Let's say that you write a row with some 
 value without a timestamp, thus meaning "right now". Then, the memcache gets 
 flushed out to a MapFile. Then, you write another value to the same row, this 
 time with a timestamp that is in the past, i.e., before the "now" timestamp of 
 the first put. 
 Some time later, but before there is a compaction, if you do a get for this 
 row, and only ask for a single version, you will logically be expecting the 
 latest version of the cell, which you would assume would be the one written 
 at "now" time. Instead, you will get the value written into the "past" cell, 
 because even though it is tagged as having happened in the past, it actually 
 *was written* after the "now" cell, and thus when #get searches for 
 satisfying values, it runs into the most recently written one first. 
 The result of this problem is inconsistent data results. Note that this 
 problem only ever exists when there's an uncompacted HStore, because during 
 compaction, these cells will all get sorted into the correct order by 
 timestamp and such. In a way, this actually makes the problem worse, because 
 then you could easily get inconsistent results from HBase about the same 
 (unchanged) row depending on whether there's been a flush/compaction.
 The only solution I can think of for this problem at the moment is to scan 
 all the MapFiles and Memcache for possible results, sort them, and then 
 select the desired number of versions off of the top. This is unfortunate 
 because it means you never get the snazzy shortcircuit logic except within a 
 single mapfile or memcache. 




[jira] Updated: (HBASE-2961) Close zookeeper when done with it (HCM, Master, and RS)

2010-09-07 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2961:
-

Attachment: 2961.txt

This gets rid of noise from zk in standalone mode (shutdown runs faster).  It 
breaks tests though, because the fix makes a Configuration per server, and so 
new HConnections each need to get the root location (previously, they all 
shared a single connection and the single root location).  Need to make the HCM 
use the new RootRegionTracker.  Cleaner.

 Close zookeeper when done with it (HCM, Master, and RS)
 ---

 Key: HBASE-2961
 URL: https://issues.apache.org/jira/browse/HBASE-2961
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.90.0

 Attachments: 2961.txt, debug.txt


 We're not closing down zk properly, mostly in HCM.  Makes for spew in zk logs 
 and it also causes shutdown to run longer.




[jira] Commented: (HBASE-2964) Deadlock when RS tries to RPC to itself inside SplitTransaction

2010-09-07 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907052#action_12907052
 ] 

HBase Review Board commented on HBASE-2964:
---

Message from: Todd Lipcon t...@cloudera.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/798/#review1122
---


Seems to make sense. Let me try it on a cluster before I +1 it


src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
http://review.cloudera.org/r/798/#comment3823

maybe now we can do an:

assert !this.parent.lock.writeLock().isHeldByCurrentThread() : "Unsafe to hold write lock while performing RPCs";


- Todd





 Deadlock when RS tries to RPC to itself inside SplitTransaction
 ---

 Key: HBASE-2964
 URL: https://issues.apache.org/jira/browse/HBASE-2964
 Project: HBase
  Issue Type: Bug
  Components: ipc, regionserver
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
 Attachments: hbase-2964.txt


 In testing the 0.89.20100830 rc, I ran into a deadlock with the following 
 situation:
 - All of the IPC Handler threads are blocked on the region lock, which is 
 held by CompactSplitThread.
 - CompactSplitThread is in the process of trying to edit META to create the 
 offline parent. META happens to be on the same server as is executing the 
 split.
 Therefore, the CompactSplitThread is trying to connect back to itself, but 
 all of the handler threads are blocked, so the IPC never happens. Thus, the 
 entire RS gets deadlocked.
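
A hedged sketch of the guard Todd suggests in the review comment above: fail fast if the current thread still holds the region's write lock when it is about to make an RPC, since the handler serving that RPC may need the same lock while every handler is already blocked. `SplitGuard` and `editMetaSafely` are hypothetical names for illustration, not HBase API.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative guard: refuse to perform an (in-process) RPC while holding
// the region write lock, turning a silent deadlock into a loud failure.
class SplitGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void editMetaSafely(Runnable rpc) {
        if (lock.writeLock().isHeldByCurrentThread()) {
            throw new IllegalStateException(
                "Unsafe to hold write lock while performing RPCs");
        }
        rpc.run();  // stand-in for the META edit RPC
    }

    ReentrantReadWriteLock getLock() { return lock; }
}
```

The real fix still has to release or avoid the lock around the META edit; the guard only makes the unsafe ordering detectable in tests.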




[jira] Commented: (HBASE-2933) Skip EOF Errors during Log Recovery

2010-09-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907092#action_12907092
 ] 

stack commented on HBASE-2933:
--

Looking in logs I see this kinda thing:

{code}
2010-09-07 18:10:27,965 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
Found existing old edits file. It could be the result of a previous failed 
split attempt. Deleting 
hdfs://sv4borg9:9000/hbase/api_access_token_stats_day/1845102219/recovered.edits/68762427569,
 length=264167
{code}

.. so we're doing some cleanup of old split attempts.

 Skip EOF Errors during Log Recovery
 ---

 Key: HBASE-2933
 URL: https://issues.apache.org/jira/browse/HBASE-2933
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.90.0


 While testing a cluster, we hit upon the following assert during region 
 assignment.  We were killing the master during a long run of splits.  We think 
 what happened is that the HMaster was killed while splitting, woke up and split 
 again.  If this happens, we will have 2 files: 1 partially written and 1 
 complete one.  Since encountering partial log splits upon Master failure is 
 considered normal behavior, we should continue at the RS level if we 
 encounter an EOFException and not a filesystem-level exception, even with 
 skip.errors == false.
 2010-08-20 16:59:07,718 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening 
 MailBox_dsanduleac,57db45276ece7ce03ef7e8d9969eb189:999...@facebook.com,1280960828959.7c542d24d4496e273b739231b01885e6.
 java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1902)
 at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1932)
 at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1837)
 at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1883)
 at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:121)
 at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:113)
 at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1981)
 at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1956)
 at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1915)
 at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:344)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1490)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1437)
 at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1345)
 at java.lang.Thread.run(Thread.java:619)
 2010-08-20 16:59:07,719 ERROR 
 org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of 
 region 7c542d24d4496e273b739231b01885e6
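
A hedged sketch of the tolerant replay behavior proposed above: while reading recovered edits, treat an EOFException as the expected end of a partially written log (normal after a master dies mid-split) instead of a fatal error, even with skip.errors == false. `EditReplayer` and `replay` are illustrative names, not the actual HRegion code; a real WAL entry is more than one int.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Illustrative sketch: replay whole records until the stream runs out,
// swallowing only EOFException (truncated trailing record) and letting
// any other IOException propagate as a real filesystem-level failure.
class EditReplayer {
    /** Returns the number of complete edits replayed before EOF. */
    static int replay(DataInputStream in) throws IOException {
        int count = 0;
        try {
            while (true) {
                in.readInt();   // stand-in for reading one complete WAL entry
                count++;
            }
        } catch (EOFException eof) {
            // Partial trailing record: expected after a failed split; stop here.
        }
        return count;
    }
}
```

Only the EOF is special-cased; checksum or other I/O errors still abort the open, matching the "continue at the RS level" proposal.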




[jira] Commented: (HBASE-2888) Review all our metrics

2010-09-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12907096#action_12907096
 ] 

stack commented on HBASE-2888:
--

Setting hbase.period in hadoop-metrics.properties doesn't seem to have an 
effect; counts are off.  Here's what I noticed digging in code:

'hadoop-metrics.properties' gets read up into a metrics attributes map but 
nothing seems to be done w/ them subsequently. Reading up in hadoop, in 
branch-0.20/src/core/org/apache/hadoop/metrics/package.html, it seems to imply 
that we need to getAttribute and set them after we make a metrics Context; i.e. 
in this case, call setPeriod in RegionServerMetrics, etc.?

More broadly, need to make sure settings in hadoop-metrics.properties take 
effect when changed.
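
The gap described above can be sketched as follows: attributes read from hadoop-metrics.properties have to be applied to the context explicitly, e.g. fetch "hbase.period" and call setPeriod(), or the default keeps being used. This is a hedged illustration; `MetricsContextSketch` and `applyAttributes` are hypothetical names, not the hadoop metrics API.

```java
import java.util.Properties;

// Illustrative sketch: a metrics context whose period only changes if the
// property file's value is read out and pushed in via setPeriod().
class MetricsContextSketch {
    private int periodSecs = 10;  // stand-in for the default emit period

    /** Apply the configured period, if any, onto this context. */
    void applyAttributes(Properties props) {
        String p = props.getProperty("hbase.period");
        if (p != null) {
            setPeriod(Integer.parseInt(p));
        }
    }

    void setPeriod(int secs) { this.periodSecs = secs; }
    int getPeriod() { return periodSecs; }
}
```

The analogous real fix would be for RegionServerMetrics (and friends) to do this application step after constructing their context.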

 Review all our metrics
 --

 Key: HBASE-2888
 URL: https://issues.apache.org/jira/browse/HBASE-2888
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.0


 HBase publishes a bunch of metrics, some useful some wasteful, that should be 
 improved to deliver a better ops experience. Examples:
  - Block cache hit ratio converges at some point and stops moving
  - fsReadLatency goes down when compactions are running
  - storefileIndexSizeMB is the exact same number once a system is serving 
 production load
 We could use new metrics too.
