[ https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated HBASE-9373:
--------------------------------------

    Attachment: 9373-v3.txt

Patch v3 wraps the method in a try/catch that handles EOFException, which is now thrown from inside the method whenever something goes wrong while parsing. As a result, the seek-back and return false now happen in only one place. I also lowered the log level to TRACE. I have tested it twice so far and did not lose any data.

> [replication] data loss because replication doesn't expect partial reads
> ------------------------------------------------------------------------
>
>                 Key: HBASE-9373
>                 URL: https://issues.apache.org/jira/browse/HBASE-9373
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.95.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.98.0, 0.96.0
>
>         Attachments: 9373.txt, 9373-v2.txt, 9373-v3.txt
>
>
> When I see this in the logs, it often means we got a partial read and then had the wrong offset when reading the rest of the file:
> {noformat}
> 2013-08-28 23:16:07,182 ERROR [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-5,60020,1377730319617] org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Invalid PB while reading WAL, probably an unexpected EOF, ignoring
> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
>     at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
>     at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
>     at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:686)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.<init>(WALProtos.java:644)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:771)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:766)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1444)
>     at org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1218)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:220)
>     at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:912)
>     at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:290)
>     at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:926)
>     at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:296)
>     at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:918)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:197)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:390)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:298)
> {noformat}
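To make the failure mode and the v3 approach concrete, here is a minimal Java sketch of the same pattern outside HBase. It is not the patch itself: TailingRecordReader, GrowingFile, and the 4-byte big-endian length framing are hypothetical stand-ins (the real reader parses delimited protobuf WALKey messages from an HDFS stream). Every short read is raised as an EOFException inside the parse, and a single catch block seeks back to the offset where the record started and returns false, so a half-flushed entry never leaves the reader at the wrong offset.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not HBase code) of a reader tailing a file that may still be growing.
 * Records are framed as a 4-byte big-endian length followed by the payload (an assumption).
 */
public class TailingRecordReader {

    /** In-memory, append-only stand-in for a log file that a writer is still flushing. */
    static final class GrowingFile {
        private byte[] data = new byte[0];

        void append(byte[] bytes) {
            byte[] grown = Arrays.copyOf(data, data.length + bytes.length);
            System.arraycopy(bytes, 0, grown, data.length, bytes.length);
            data = grown;
        }

        /** Copies up to dst.length bytes starting at pos; returns how many were copied. */
        int read(long pos, byte[] dst) {
            if (pos >= data.length) {
                return 0;
            }
            int n = (int) Math.min(dst.length, data.length - pos);
            System.arraycopy(data, (int) pos, dst, 0, n);
            return n;
        }
    }

    private final GrowingFile file;
    private long position = 0; // like a stream offset: advances as bytes are consumed

    public TailingRecordReader(GrowingFile file) {
        this.file = file;
    }

    public long getPosition() {
        return position;
    }

    /**
     * Appends the next complete record to sink and returns true, or returns false with the
     * position unchanged if the record has not been fully written yet.
     */
    public boolean readNext(List<byte[]> sink) throws IOException {
        long originalPosition = position;
        try {
            int size = ByteBuffer.wrap(readFully(4)).getInt();
            if (size < 0) {
                // Corruption rather than a partial read: surface it instead of retrying.
                throw new IOException("Negative record length " + size + " at " + originalPosition);
            }
            sink.add(readFully(size));
            return true;
        } catch (EOFException eof) {
            // The single recovery point: rewind to the start of the partial record so the
            // next call re-reads it from the right offset once more bytes have arrived.
            position = originalPosition;
            return false;
        }
    }

    /** Reads exactly n bytes, advancing position; throws EOFException on a short read. */
    private byte[] readFully(int n) throws EOFException {
        byte[] buf = new byte[n];
        int got = file.read(position, buf);
        position += got;
        if (got < n) {
            throw new EOFException("Wanted " + n + " bytes, got " + got);
        }
        return buf;
    }

    /** Frames a payload as [4-byte big-endian length][payload]. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length).putInt(payload.length).put(payload).array();
    }

    public static void main(String[] args) throws IOException {
        GrowingFile file = new GrowingFile();
        TailingRecordReader reader = new TailingRecordReader(file);
        List<byte[]> out = new ArrayList<>();

        byte[] record = frame("hello".getBytes(StandardCharsets.UTF_8)); // 9 bytes in total
        file.append(Arrays.copyOfRange(record, 0, 6)); // writer has flushed only part of it
        System.out.println(reader.readNext(out) + " " + reader.getPosition()); // prints "false 0"

        file.append(Arrays.copyOfRange(record, 6, record.length)); // the rest arrives
        System.out.println(reader.readNext(out) + " " + reader.getPosition()); // prints "true 9"
        System.out.println(new String(out.get(0), StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

In the demo, the first readNext sees only 6 of the record's 9 bytes, rewinds to offset 0, and returns false; once the remaining bytes are appended, the retry reads the whole record. Without the rewind, the reader would resume at offset 6 and treat the tail of the payload as the start of the next length prefix, which is the wrong-offset symptom described in the issue.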