[ https://issues.apache.org/jira/browse/HBASE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954446#comment-14954446 ]
Hudson commented on HBASE-14501: -------------------------------- FAILURE: Integrated in HBase-1.3-IT #229 (See [https://builds.apache.org/job/HBase-1.3-IT/229/]) HBASE-14501 NPE in replication with TDE (enis: rev bbbb52cfbf7ef05150431d00ee127eb45b545c28) * hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java * hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SecureWALCellCodec.java * hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseDecoder.java * hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodec.java * hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValueUtil.java > NPE in replication with TDE > --------------------------- > > Key: HBASE-14501 > URL: https://issues.apache.org/jira/browse/HBASE-14501 > Project: HBase > Issue Type: Bug > Reporter: Enis Soztutar > Assignee: Enis Soztutar > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: hbase-14501_v1.patch > > > We are seeing a NPE when replication (or in this case async wal replay for > region replicas) is run on top of an HDFS cluster with TDE configured. > This is the stack trace: > {code} > java.lang.NullPointerException > at org.apache.hadoop.hbase.CellUtil.matchingRow(CellUtil.java:370) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.countDistinctRowKeys(ReplicationSource.java:649) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:450) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:346) > {code} > This stack trace can only happen if WALEdit.getCells() returns an array > containing null entries. I believe this happens due to > {{KeyValueCodec.parseCell()}} uses {{KeyValueUtil.iscreate()}} which returns > null in case of EOF at the beginning. However, the contract for the > Decoder.parseCell() is not clear whether returning null is acceptable or not. > The other Decoders (CompressedKvDecoder, CellCodec, etc) do not return null > while KeyValueCodec does. > BaseDecoder has this code: > {code} > public boolean advance() throws IOException { > if (!this.hasNext) return this.hasNext; > if (this.in.available() == 0) { > this.hasNext = false; > return this.hasNext; > } > try { > this.current = parseCell(); > } catch (IOException ioEx) { > rethrowEofException(ioEx); > } > return this.hasNext; > } > {code} > which is not correct since it uses {{IS.available()}} not according to the > javadoc: > (https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available()). > DFSInputStream implements {{available()}} as the remaining bytes to read > from the stream, so we do not see the issue there. > {{CryptoInputStream.available()}} does a similar thing but see the issue. > So two questions: > - What should be the interface for Decoder.parseCell()? Can it return null? > - How to properly fix BaseDecoder.advance() to not rely on {{available()}} > call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)