[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-04-10 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965680#comment-13965680
 ] 

Cosmin Lehene commented on HBASE-10829:
---

I can't find this issue in the 0.98.1 release notes. Perhaps the fix version 
should be 0.98.2?

 Flush is skipped after log replay if the last recovered edits file is skipped
 -

 Key: HBASE-10829
 URL: https://issues.apache.org/jira/browse/HBASE-10829
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.98.1, 0.99.0, 0.96.3

 Attachments: hbase-10829_v1.patch, hbase-10829_v2.patch, 
 hbase-10829_v3.patch


 We caught this in an extended test run where IntegrationTestBigLinkedList 
 failed with some missing keys. 
 The problem is that HRegion.replayRecoveredEdits() returns -1 if all the 
 edits in the log file are skipped, which happens, for example, if the log file 
 contains only a single compaction record (HBASE-2231) or the edits cannot be 
 applied (column family deleted, etc.). 
 The caller, HRegion.replayRecoveredEditsIfAny(), looks only at the last 
 returned seqId to decide whether a flush is necessary before opening the 
 region and discarding the replayed recovered edits files. 
 Therefore, if the last recovered edits file is skipped but some edits from 
 earlier recovered edits files are applied, the mandatory flush before opening 
 the region is skipped. If the region server dies after this point before a 
 flush, the edits are lost. 
 This is important to fix, though the sequence of events is very rare on a 
 production cluster. 
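
 The failure mode described above can be illustrated with a minimal, 
 self-contained Java sketch. It is not the HBase source and not the attached 
 patch: the class ReplayFlushSketch, the helper replayFile(), and both 
 needsFlush*() methods are hypothetical names, and each recovered edits file 
 is modelled simply as the array of sequence ids that were actually applied 
 (an empty array means every edit was skipped). The sequence ids are 
 illustrative values only.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; names and structure are assumptions, not HBase code. */
final class ReplayFlushSketch {

    /** Stand-in for replaying one file: max applied seqId, or -1 if all edits were skipped. */
    static long replayFile(long[] appliedSeqIds) {
        long max = -1;
        for (long seqId : appliedSeqIds) {
            max = Math.max(max, seqId);
        }
        return max;
    }

    /** Buggy shape: only the value returned for the LAST file decides the flush. */
    static boolean needsFlushBuggy(List<long[]> files, long minSeqIdForRegion) {
        long seqId = minSeqIdForRegion;
        for (long[] file : files) {
            seqId = replayFile(file); // overwritten on every iteration
        }
        return seqId > minSeqIdForRegion;
    }

    /** One possible repair: keep the maximum seqId seen across ALL files. */
    static boolean needsFlushFixed(List<long[]> files, long minSeqIdForRegion) {
        long seqId = minSeqIdForRegion;
        for (long[] file : files) {
            seqId = Math.max(seqId, replayFile(file));
        }
        return seqId > minSeqIdForRegion;
    }

    public static void main(String[] args) {
        // First file applies edits; the last file is skipped entirely.
        List<long[]> files = Arrays.asList(new long[] {120000L, 120086L}, new long[] {});
        long lastFlushedSeqId = 119978L;

        System.out.println(needsFlushBuggy(files, lastFlushedSeqId)); // prints false
        System.out.println(needsFlushFixed(files, lastFlushedSeqId)); // prints true
    }
}
```

 In the buggy variant the -1 returned for the skipped last file hides the 
 edits applied from the earlier file, so no flush is requested. Tracking the 
 maximum across files is only one way to express the repair; the actual 
 change is in the attached patches.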



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949262#comment-13949262
 ] 

Hudson commented on HBASE-10829:


SUCCESS: Integrated in hbase-0.96 #369 (See 
[https://builds.apache.org/job/hbase-0.96/369/])
HBASE-10829 Flush is skipped after log replay if the last recovered edits file 
is skipped (enis: rev 1581957)
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java




[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949704#comment-13949704
 ] 

Hudson commented on HBASE-10829:


SUCCESS: Integrated in HBase-0.98 #253 (See 
[https://builds.apache.org/job/HBase-0.98/253/])
HBASE-10829 Flush is skipped after log replay if the last recovered edits file 
is skipped (enis: rev 1581954)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java




[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13948378#comment-13948378
 ] 

Hudson commented on HBASE-10829:


SUCCESS: Integrated in HBase-TRUNK #5042 (See 
[https://builds.apache.org/job/HBase-TRUNK/5042/])
HBASE-10829 Flush is skipped after log replay if the last recovered edits file 
is skipped (enis: rev 1581947)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java




[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13948447#comment-13948447
 ] 

Hudson commented on HBASE-10829:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #236 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/236/])
HBASE-10829 Flush is skipped after log replay if the last recovered edits file 
is skipped (enis: rev 1581954)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java




[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13948789#comment-13948789
 ] 

Hudson commented on HBASE-10829:


FAILURE: Integrated in hbase-0.96-hadoop2 #253 (See 
[https://builds.apache.org/job/hbase-0.96-hadoop2/253/])
HBASE-10829 Flush is skipped after log replay if the last recovered edits file 
is skipped (enis: rev 1581957)
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java




[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-25 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946911#comment-13946911
 ] 

Enis Soztutar commented on HBASE-10829:
---

Here is a log of events in case you are interested. The region 
(9514935e6a659bd90faa21bf458a842e) was happily hosted by some region server. 
After the writes had settled down, the region still had some unflushed data. 
The last flush happened, and some time later the write tasks finished, so no 
more data was coming in:
{code}
2014-03-24 20:54:30,924 INFO  [Thread-22] regionserver.HRegion: Finished 
memstore flush of ~128.2 M/134443296, currentsize=12.7 M/13270608 for region 
IntegrationTestBigLinkedList,\x07\xFE\xDA\x1Chv\xF9\x7F\x18s\xEE\x0C\x85X\xFCU,1395690539958.9514935e6a659bd90faa21bf458a842e.
 in 7324ms, sequenceid=119978, compaction requested=true
{code}

After some more time, the region decided to do a compaction. At this point no 
writes were coming.
{code}
compaction
2014-03-24 20:55:52,764 INFO  
[regionserver60020-smallCompactions-1395694311085] regionserver.HStore: 
Starting compaction of 5 file(s) in meta of 
IntegrationTestBigLinkedList,\x07\xFE\xDA\x1Chv\xF9\x7F\x18s\xEE\x0C\x85X\xFCU,1395690539958.9514935e6a659bd90faa21bf458a842e.
 into 
tmpdir=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/.tmp,
 totalSize=212.2 M
{code}

After this compaction, but before any more flush, the region server got killed 
around: 
{code}
2014-03-24 20:56:44,466 DEBUG [regionserver60020-EventThread] 
regionserver.SplitLogWorker: tasks arrived or departed
{code}

Because the region server was killed, the cluster performed a log split, 
which completed without any issues (logs are not necessary). Seven log files 
were split, resulting in 7 files in recovered.edits under the region dir. 

Then, some other region server opens the region and applies the recovered edits 
in memory:
{code}
Open region:
2014-03-24 20:57:28,196 DEBUG [StoreOpener-9514935e6a659bd90faa21bf458a842e-1] 
regionserver.HStore: loaded 
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/meta/02f7152afee34b07b40fa31e0de5a3de,
 isReference=false, isBulkLoadResult=false, seqid=119978, majorCompaction=false
2014-03-24 20:57:28,240 DEBUG [StoreOpener-9514935e6a659bd90faa21bf458a842e-1] 
regionserver.HStore: loaded 
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/meta/69260a6d4ffc45a1806dd501204b73ce,
 isReference=false, isBulkLoadResult=false, seqid=88532, majorCompaction=true

2014-03-24 20:57:28,264 INFO  [RS_OPEN_REGION-hor9n08:60020-2] 
regionserver.HRegion: Replaying edits from 
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0118699
2014-03-24 20:57:28,457 DEBUG [RS_OPEN_REGION-hor9n08:60020-2] 
regionserver.HRegion: Applied 0, skipped 187830, firstSequenceidInLog=118084, 
maxSequenceidInLog=-1, 
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0118699
2014-03-24 20:57:28,460 INFO  [RS_OPEN_REGION-hor9n08:60020-2] 
regionserver.HRegion: Replaying edits from 
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0119351
2014-03-24 20:57:28,630 DEBUG [RS_OPEN_REGION-hor9n08:60020-2] 
regionserver.HRegion: Applied 0, skipped 199401, firstSequenceidInLog=118700, 
maxSequenceidInLog=-1, 
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0119351
2014-03-24 20:57:28,632 INFO  [RS_OPEN_REGION-hor9n08:60020-2] 
regionserver.HRegion: Replaying edits from 
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0120086
2014-03-24 20:57:28,873 DEBUG [RS_OPEN_REGION-hor9n08:60020-2] 
regionserver.HRegion: Applied 37938, skipped 148185, 
firstSequenceidInLog=119352, maxSequenceidInLog=120086, 
path=hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0120086
2014-03-24 20:57:28,876 INFO  [RS_OPEN_REGION-hor9n08:60020-2] 
regionserver.HRegion: Replaying edits from 
hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestBigLinkedList/9514935e6a659bd90faa21bf458a842e/recovered.edits/0120806
2014-03-24 20:57:30,130 DEBUG [RS_OPEN_REGION-hor9n08:60020-2] 
regionserver.HRegion: 

[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947004#comment-13947004
 ] 

Ted Yu commented on HBASE-10829:


lgtm
{code}
+  public void testSkipRecoveredEditsReplayTheLastFileIgnored() throws Exception {
+    String method = "testSkipRecoveredEditsReplaySomeIgnored";
{code}
nit: method name should match test name.


 Flush is skipped after log replay if the last recovered edits file is skipped
 -

 Key: HBASE-10829
 URL: https://issues.apache.org/jira/browse/HBASE-10829
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.99.0, 0.98.2, 0.96.3

 Attachments: hbase-10829_v1.patch




[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947036#comment-13947036
 ] 

Ted Yu commented on HBASE-10829:


+1

 Flush is skipped after log replay if the last recovered edits file is skipped
 -

 Key: HBASE-10829
 URL: https://issues.apache.org/jira/browse/HBASE-10829
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.99.0, 0.98.2, 0.96.3

 Attachments: hbase-10829_v1.patch, hbase-10829_v2.patch




[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947053#comment-13947053
 ] 

Ted Yu commented on HBASE-10829:


Spoke too soon :-)
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile 
(default-testCompile) on project hbase-server: Compilation failure: Compilation 
failure:
[ERROR] 
/homes/hortonzy/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java:[583,49]
 error: cannot find symbol
[ERROR] symbol:   variable conf
[ERROR] location: class TestHRegion
[ERROR] 
/homes/hortonzy/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java:[600,88]
 error: cannot find symbol
{code}



[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947217#comment-13947217
 ] 

Hadoop QA commented on HBASE-10829:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636761/hbase-10829_v2.patch
  against trunk revision .
  ATTACHMENT ID: 12636761

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail.

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 1.3.9) to fail.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9093//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9093//console

This message is automatically generated.



[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947222#comment-13947222
 ] 

stack commented on HBASE-10829:
---

Nice debugging lads.  Patch lgtm



[jira] [Commented] (HBASE-10829) Flush is skipped after log replay if the last recovered edits file is skipped

2014-03-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947407#comment-13947407
 ] 

Hadoop QA commented on HBASE-10829:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636804/hbase-10829_v3.patch
  against trunk revision .
  ATTACHMENT ID: 12636804

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 6 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:368)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9094//console

This message is automatically generated.

 Flush is skipped after log replay if the last recovered edits file is skipped
 -

 Key: HBASE-10829
 URL: https://issues.apache.org/jira/browse/HBASE-10829
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.99.0, 0.98.2, 0.96.3

 Attachments: hbase-10829_v1.patch, hbase-10829_v2.patch, 
 hbase-10829_v3.patch

