[jira] [Updated] (HBASE-9373) [replication] data loss because replication doesn't expect partial reads

2013-08-30 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-9373:
--

Attachment: 9373-v3.txt

Patch v3 wraps the method in a try/catch that handles EOF, which is now 
thrown inside the method if something goes wrong while parsing. The seek + 
return false now happens in only one place. I also dropped the log level down 
to TRACE.

I tested it twice at this point and didn't lose data.
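
For readers following along, here is a rough sketch of the shape described for v3. It is not the attached patch. The class EofFunnelSketch, the one-byte tag and length framing, and the ENTRY_TAG value are all made up for illustration, and java.util.logging's FINEST level stands in for TRACE. The idea shown is only that every parsing problem is turned into an EOFException inside the parse step, so the caller has a single catch block that seeks back and returns false.

```java
import java.io.EOFException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical reader over the bytes visible so far; NOT the HBase ProtobufLogReader.
class EofFunnelSketch {
    private static final Logger LOG = Logger.getLogger(EofFunnelSketch.class.getName());
    private static final byte ENTRY_TAG = 0x0A; // made-up marker byte for this sketch

    private final ByteBuffer buf;

    EofFunnelSketch(byte[] visibleBytes) {
        this.buf = ByteBuffer.wrap(visibleBytes);
    }

    int position() {
        return buf.position();
    }

    // Appends the next entry to out and returns true, or rewinds and returns false.
    boolean next(List<String> out) {
        int originalPosition = buf.position();
        try {
            out.add(parseEntry());
            return true;
        } catch (EOFException eof) {
            // The only place that seeks back and returns false.
            LOG.log(Level.FINEST, "Partial entry at offset " + originalPosition, eof);
            buf.position(originalPosition);
            return false;
        }
    }

    // Any parsing problem surfaces as EOFException so the caller handles one case.
    private String parseEntry() throws EOFException {
        try {
            byte tag = buf.get();
            if (tag != ENTRY_TAG) {
                throw new IllegalStateException("invalid tag " + tag);
            }
            int length = buf.get() & 0xff;
            byte[] payload = new byte[length];
            buf.get(payload); // throws BufferUnderflowException if the payload is cut short
            return new String(payload, StandardCharsets.UTF_8);
        } catch (BufferUnderflowException | IllegalStateException e) {
            EOFException eof = new EOFException("could not parse entry: " + e);
            eof.initCause(e);
            throw eof;
        }
    }

    public static void main(String[] args) {
        // One complete entry ("hi") followed by an entry whose payload is cut short.
        byte[] visible = {0x0A, 2, 'h', 'i', 0x0A, 5, 'p'};
        EofFunnelSketch reader = new EofFunnelSketch(visible);
        List<String> out = new ArrayList<>();
        System.out.println(reader.next(out) + " " + out + " position=" + reader.position());
        System.out.println(reader.next(out) + " " + out + " position=" + reader.position());
    }
}
```

In this sketch the second call leaves the position where it was before the failed read, so a later retry starts from a clean entry boundary.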

 [replication] data loss because replication doesn't expect partial reads
 

 Key: HBASE-9373
 URL: https://issues.apache.org/jira/browse/HBASE-9373
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.95.2
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.98.0, 0.96.0

 Attachments: 9373.txt, 9373-v2.txt, 9373-v3.txt


 When I see this in the logs, it often means we got a partial read and then we 
 have the wrong offset when reading the rest of the file:
 {noformat}
 2013-08-28 23:16:07,182 ERROR 
 [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-5,60020,1377730319617]
  org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Invalid PB while 
 reading WAL, probably an unexpected EOF, ignoring
 com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had 
 invalid wire type.
 at 
 com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
 at 
 com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
 at 
 com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
 at 
 org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.init(WALProtos.java:686)
 at 
 org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.init(WALProtos.java:644)
 at 
 org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:771)
 at 
 org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:766)
 at 
 org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1444)
 at 
 org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1218)
 at 
 com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:220)
 at 
 com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:912)
 at 
 com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
 at 
 com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:290)
 at 
 com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:926)
 at 
 com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:296)
 at 
 com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:918)
 at 
 org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:390)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:298)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9373) [replication] data loss because replication doesn't expect partial reads

2013-08-30 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-9373:
--

Attachment: 9373-v4.patch

v4 fixes most of what Stack pointed out. We'll keep the messaging via EOFE 
because I tested it a lot.


 Attachments: 9373.txt, 9373-v2.txt, 9373-v3.txt, 9373-v4.patch




[jira] [Updated] (HBASE-9373) [replication] data loss because replication doesn't expect partial reads

2013-08-30 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-9373:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to branch and trunk. Thank you, Stack, for your help.


 Attachments: 9373.txt, 9373-v2.txt, 9373-v3.txt, 9373-v4.patch




[jira] [Updated] (HBASE-9373) [replication] data loss because replication doesn't expect partial reads

2013-08-29 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-9373:
--

Description: 
When I see this in the logs, it often means we got a partial read and then we 
have the wrong offset when reading the rest of the file:

{noformat}
2013-08-28 23:16:07,182 ERROR 
[ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-5,60020,1377730319617]
 org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Invalid PB while 
reading WAL, probably an unexpected EOF, ignoring
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had 
invalid wire type.
at 
com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
at 
com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
at 
com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.init(WALProtos.java:686)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.init(WALProtos.java:644)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:771)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:766)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1444)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1218)
at 
com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:220)
at 
com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:912)
at 
com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
at 
com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:290)
at 
com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:926)
at 
com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:296)
at 
com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:918)
at 
org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:197)
at 
org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:89)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:390)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:298)
{noformat}

  was:
Two things that are bugging me.

First, this one: we now try to be more responsive and only sleep 1 second 
if we didn't get data. Let's set it down to TRACE.

bq. 2013-08-28 23:17:47,421 DEBUG [regionserver60020.replicationSource,1] 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Nothing to 
replicate, sleeping 1000 times 1

Then I've seen cases where we can hit an EOF and instead of just being silent 
we hit this:

{noformat}
2013-08-28 23:16:07,182 ERROR 
[ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-5,60020,1377730319617]
 org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader: Invalid PB while 
reading WAL, probably an unexpected EOF, ignoring
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had 
invalid wire type.
at 
com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
at 
com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
at 
com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.init(WALProtos.java:686)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey.init(WALProtos.java:644)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:771)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$1.parsePartialFrom(WALProtos.java:766)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1444)
at 
org.apache.hadoop.hbase.protobuf.generated.WALProtos$WALKey$Builder.mergeFrom(WALProtos.java:1218)
at 
com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:220)
at 
com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:912)
at 

[jira] [Updated] (HBASE-9373) [replication] data loss because replication doesn't expect partial reads

2013-08-29 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-9373:
--

Attachment: 9373-v2.txt

This v2 augments Stack's patch by re-seeking to our original position if we get 
partial data and then returning false (basically, if we get a partial read, we 
roll back our latest read and act as if it wasn't there).

It fixed replication for me, at least in the few tests that I ran. I also checked 
the relevant logs and saw replication doing the right thing. I might see 2-3 
partial reads, but then replication would finally get to see the full data.
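
To make the rollback concrete, here is a minimal sketch of the idea, assuming a toy log format (one length byte followed by the payload) and an in-memory stand-in for a file that is still being written. GrowingLog, PartialReadSketch, and the framing are hypothetical and are not taken from the patch or from HBase. The point is only the pattern: remember the position before reading, and if the entry turns out to be incomplete, seek back and return false so the next attempt starts at the same entry boundary.

```java
import java.io.EOFException;
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory stand-in for a WAL file that is still being appended to.
class GrowingLog {
    private final byte[] data;
    private int visible; // number of bytes a reader can currently see
    private int pos;     // current read offset

    GrowingLog(byte[] data, int visible) {
        this.data = data;
        this.visible = visible;
    }

    void setVisible(int visible) { this.visible = visible; }
    int getPos() { return pos; }
    void seek(int newPos) { pos = newPos; }

    int readByte() throws EOFException {
        if (pos >= visible) {
            throw new EOFException("hit end of visible data at offset " + pos);
        }
        return data[pos++] & 0xff;
    }
}

class PartialReadSketch {
    // Reads one entry (length byte, then payload) and appends it to out.
    // On a partial read, rewinds to where the read started and returns false.
    static boolean readNext(GrowingLog log, StringBuilder out) {
        int originalPos = log.getPos();
        try {
            int length = log.readByte();
            byte[] payload = new byte[length];
            for (int i = 0; i < length; i++) {
                payload[i] = (byte) log.readByte();
            }
            out.append(new String(payload, StandardCharsets.UTF_8));
            return true;
        } catch (EOFException e) {
            // Roll back the partial read and act as if it never happened.
            log.seek(originalPos);
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] file = {3, 'a', 'b', 'c', 2, 'x', 'y'};
        GrowingLog log = new GrowingLog(file, 5); // second entry's payload not visible yet
        StringBuilder out = new StringBuilder();
        System.out.println(readNext(log, out) + " pos=" + log.getPos());
        System.out.println(readNext(log, out) + " pos=" + log.getPos());
        log.setVisible(file.length); // the writer has caught up
        System.out.println(readNext(log, out) + " pos=" + log.getPos() + " out=" + out);
    }
}
```

Without the seek in the catch block, the retry in this toy format would start one byte into the second entry and treat a payload byte as a length, which is the same kind of wrong offset the issue describes.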


 Attachments: 9373.txt, 9373-v2.txt




[jira] [Updated] (HBASE-9373) [replication] data loss because replication doesn't expect partial reads

2013-08-29 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-9373:
--

Status: Patch Available  (was: Open)

Submitting the patch to see if it passes the unit tests, but it needs more 
cleanup and I'd like to tone down the ERRORs.


 Attachments: 9373.txt, 9373-v2.txt

