[ 
https://issues.apache.org/jira/browse/HBASE-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733918#comment-13733918
 ] 

Jean-Daniel Cryans commented on HBASE-8615:
-------------------------------------------

Thanks for the patch, Ted. Before committing we should at least fix the unit 
test, though: what I did was kind of a hack, since 
TestReplicationHLogReaderManager isn't supposed to run on compressed data. 
Maybe add a TestReplicationHLogReaderManagerCompressed that just enables 
compression?

Then we could also test more than one failure mode in there. It's an easy 
refactor: move the test body into a method that takes the two ints, then get 
rid of most of the comments. Right now it's just dirty.
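
As a rough illustration of the two suggestions above (a subclass that only 
switches compression on, and one method that takes two ints so several cases 
can share it), here is a minimal sketch in plain Java. Everything in it is 
hypothetical: ToyLog, ReaderScenarios and ReaderScenariosCompressed are 
stand-ins, not HBase classes, and the meaning given to the two ints (entries 
written before and after the first read pass) is an assumption, since the 
comment does not name them. It does not reproduce the failure in this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a log file. Hypothetical; not an HBase class. */
class ToyLog {
    private final boolean compress;
    private final List<String> dictionary = new ArrayList<>(); // values in first-seen order
    private final List<Object> stored = new ArrayList<>();     // raw String or Integer index

    ToyLog(boolean compress) {
        this.compress = compress;
    }

    void append(String value) {
        if (!compress) {
            stored.add(value);
            return;
        }
        int index = dictionary.indexOf(value);
        if (index < 0) {
            dictionary.add(value);              // first occurrence: store raw, remember it
            stored.add(value);
        } else {
            stored.add(Integer.valueOf(index)); // repeat: store only the dictionary index
        }
    }

    String read(int position) {
        Object item = stored.get(position);
        if (item instanceof Integer) {
            return dictionary.get(((Integer) item).intValue());
        }
        return (String) item;
    }

    int size() {
        return stored.size();
    }
}

/** Base scenarios: run on uncompressed data. */
class ReaderScenarios {
    /** Hook that a subclass overrides to enable compression. */
    protected boolean compressionEnabled() {
        return false;
    }

    private static String valueFor(int i) {
        return "row-" + (i % 3); // few distinct values, so the dictionary gets hits
    }

    /**
     * One scenario described by two ints (assumed meaning): entries written
     * before the first read pass, and entries written after it.
     * Returns the total number of entries read.
     */
    int runScenario(int entriesBefore, int entriesAfter) {
        ToyLog log = new ToyLog(compressionEnabled());
        int written = 0;
        int position = 0;
        for (int pass = 0; pass < 2; pass++) {
            int toWrite = (pass == 0) ? entriesBefore : entriesAfter;
            for (int i = 0; i < toWrite; i++) {
                log.append(valueFor(written++));
            }
            // Resume reading from the saved position, as a tailing reader would.
            while (position < log.size()) {
                String got = log.read(position);
                if (!got.equals(valueFor(position))) {
                    throw new AssertionError("bad entry at " + position + ": " + got);
                }
                position++;
            }
        }
        return position;
    }

    /** Several cases through the same method. */
    void runAll() {
        int[][] cases = { {1, 0}, {0, 1}, {10, 10}, {100, 1} };
        for (int[] c : cases) {
            int read = runScenario(c[0], c[1]);
            if (read != c[0] + c[1]) {
                throw new AssertionError("expected " + (c[0] + c[1]) + " entries, read " + read);
            }
        }
    }
}

/** Same scenarios with compression switched on; nothing else changes. */
class ReaderScenariosCompressed extends ReaderScenarios {
    @Override
    protected boolean compressionEnabled() {
        return true;
    }
}
```

Calling new ReaderScenarios().runAll() and new 
ReaderScenariosCompressed().runAll() exercises the same cases with and without 
compression, which is the shape a separate Compressed test class would 
presumably give the real test.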

Finally, if we are fixing HLog compression for real, we also need to put 
TestReplicationKillMasterRSCompressed back, AKA HBASE-9061.
                
> HLog Compression fails in mysterious ways (working title)
> ---------------------------------------------------------
>
>                 Key: HBASE-8615
>                 URL: https://issues.apache.org/jira/browse/HBASE-8615
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: 172.21.3.117%2C60020%2C1375222888304.1375222894855.zip, 
> 8615-v2.txt, 8615-v3.txt, HBASE-8615-test.patch, 
> org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed-output.txt
>
>
> In a recent test run, I noticed the following in test output:
> {code}
> 2013-05-24 22:01:02,424 DEBUG [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] fs.HFileSystem$ReorderWALBlocks(327): /user/hortonzy/hbase/.logs/kiyo.gq1.ygridcore.net,42690,1369432806911/kiyo.gq1.ygridcore.net%2C42690%2C1369432806911.1369432840428 is an HLog file, so reordering blocks, last hostname will be:kiyo.gq1.ygridcore.net
> 2013-05-24 22:01:02,429 DEBUG [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] wal.ProtobufLogReader(118): After reading the trailer: walEditsStopOffset: 132235, fileLength: 132243, trailerPresent: true
> 2013-05-24 22:01:02,438 ERROR [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] wal.ProtobufLogReader(236): Error  while reading 691 WAL KVs; started reading at 53272 and read up to 65538
> 2013-05-24 22:01:02,438 WARN  [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] regionserver.ReplicationSource(324): 2 Got:
> java.io.IOException: Error  while reading 691 WAL KVs; started reading at 53272 and read up to 65538
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:237)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:96)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:404)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:320)
> Caused by: java.lang.IndexOutOfBoundsException: index (30062) must be less than size (1)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
>         at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.get(LRUDictionary.java:124)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary$BidirectionalLRUMap.access$000(LRUDictionary.java:71)
>         at org.apache.hadoop.hbase.regionserver.wal.LRUDictionary.getEntry(LRUDictionary.java:42)
>         at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.readIntoArray(WALCellCodec.java:210)
>         at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$CompressedKvDecoder.parseCell(WALCellCodec.java:184)
>         at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
>         at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:213)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:217)
>         ... 4 more
> 2013-05-24 22:01:02,439 DEBUG [RegionServer:0;kiyo.gq1.ygridcore.net,42690,1369432806911.replicationSource,2] regionserver.ReplicationSource(583): Nothing to replicate, sleeping 100 times 10
> {code}
> Will attach test output.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
