[ https://issues.apache.org/jira/browse/HBASE-27963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740668#comment-17740668 ]
Rushabh Shah commented on HBASE-27963:
--------------------------------------

We are also seeing similar errors in our production environment. We are running a 1.7.x release. As a workaround, we restart the regionserver, and the new regionserver is able to replicate, so some in-memory data structure appears to be out of sync.

> Replication stuck when switch to new reader
> -------------------------------------------
>
>                 Key: HBASE-27963
>                 URL: https://issues.apache.org/jira/browse/HBASE-27963
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.5
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>
> After creating a new reader for the next WAL, it immediately calls seek() to currentPositionOfEntry, but this position may exceed the length of the current WAL.
> {code:java}
> WARN [RpcServer.default.FPRWQ.Fifo.read.handler=101,queue=1,port=16020.replicationSource.wal-reader.XXXXXXX] regionserver.ReplicationSourceWALReader: Failed to read stream of replication entries
> java.io.EOFException: Cannot seek after EOF
> 	at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1488)
> 	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
> 	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.seekOnFs(ProtobufLogReader.java:495)
> 	at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.seek(ReaderBase.java:138)
> 	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.seek(WALEntryStream.java:399)
> 	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:341)
> 	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.handleFileNotFound(WALEntryStream.java:328)
> 	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openReader(WALEntryStream.java:347)
> 	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.openNextLog(WALEntryStream.java:310)
> 	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.checkReader(WALEntryStream.java:300)
> 	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:176)
> 	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:102)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.tryAdvanceStreamAndCreateWALBatch(ReplicationSourceWALReader.java:260)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:142)
> {code}
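The failure mode in the issue description (a read offset recorded in one WAL reused as the seek target in the next, shorter WAL) can be reproduced in miniature. The Java sketch that follows is hypothetical and is not HBase code: `FakeWalReader`, `FakeWalStream`, `readTo`, and the `resetOnSwitch` flag are invented for illustration, and `FakeWalReader.seek()` only imitates the `Cannot seek after EOF` check that `DFSInputStream.seek()` performs in the trace. Resetting the offset when the file changes is shown as one possible guard, not necessarily the patch that resolved this issue.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical model (not HBase code) of reusing a stale offset after a WAL switch. */
public class WalSwitchSketch {

    /** In-memory stand-in for a reader over one WAL file of a fixed length. */
    static final class FakeWalReader {
        private final String name;
        private final long length;
        private long pos;

        FakeWalReader(String name, long length) {
            this.name = name;
            this.length = length;
        }

        /** Imitates the DFS behaviour in the trace: seeking past the end fails. */
        void seek(long target) throws EOFException {
            if (target > length) {
                throw new EOFException("Cannot seek after EOF");
            }
            pos = target;
        }

        long position() { return pos; }
        String name() { return name; }
    }

    /** Hypothetical stream that walks a queue of WALs, loosely like WALEntryStream. */
    static final class FakeWalStream {
        private final Deque<FakeWalReader> queue;
        private final boolean resetOnSwitch;
        private FakeWalReader current;
        private long currentPositionOfEntry;

        FakeWalStream(List<FakeWalReader> wals, boolean resetOnSwitch) {
            this.queue = new ArrayDeque<>(wals);
            this.resetOnSwitch = resetOnSwitch;
        }

        /** Record that entries up to 'pos' in the current WAL were consumed. */
        void readTo(long pos) { currentPositionOfEntry = pos; }

        /** Switch to the next WAL in the queue and seek to the saved offset. */
        void openNextLog() throws IOException {
            current = queue.removeFirst();
            if (resetOnSwitch) {
                // Guard: an offset from the previous file means nothing in a new file.
                currentPositionOfEntry = 0;
            }
            // Without the guard, the previous WAL's offset is reused as-is.
            current.seek(currentPositionOfEntry);
        }
    }

    /** Open a 5000-byte WAL, consume it, then roll over to a 1000-byte WAL. */
    static String run(boolean resetOnSwitch) {
        FakeWalStream stream = new FakeWalStream(
                List.of(new FakeWalReader("wal.1", 5000), new FakeWalReader("wal.2", 1000)),
                resetOnSwitch);
        try {
            stream.openNextLog(); // opens wal.1 at offset 0
            stream.readTo(5000);  // all of wal.1 consumed
            stream.openNextLog(); // roll over to the shorter wal.2
            return "opened " + stream.current.name() + " at " + stream.current.position();
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "carry offset over: EOFException: Cannot seek after EOF"
        System.out.println("carry offset over: " + run(false));
        // prints "reset on new file: opened wal.2 at 0"
        System.out.println("reset on new file: " + run(true));
    }
}
```

In this sketch, a next WAL longer than 5000 bytes would not throw at all: the stale seek would silently skip its first 5000 bytes. That is why the guard keys on the file switch itself rather than on catching the exception.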