[ https://issues.apache.org/jira/browse/HBASE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954606#comment-14954606 ]

Hudson commented on HBASE-14501:
--------------------------------

FAILURE: Integrated in HBase-TRUNK #6900 (See [https://builds.apache.org/job/HBase-TRUNK/6900/])
HBASE-14501 NPE in replication with TDE (enis: rev 2ff6d0fe4789857ab51685949711d755dedd459a)
* hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/codec/BaseDecoder.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValueUtil.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/codec/CellCodec.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SecureWALCellCodec.java


> NPE in replication with TDE
> ---------------------------
>
>                 Key: HBASE-14501
>                 URL: https://issues.apache.org/jira/browse/HBASE-14501
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
>         Attachments: hbase-14501_v1.patch
>
>
> We are seeing an NPE when replication (or, in this case, async WAL replay for 
> region replicas) is run on top of an HDFS cluster with TDE configured.
> This is the stack trace:
> {code}
> java.lang.NullPointerException
>         at org.apache.hadoop.hbase.CellUtil.matchingRow(CellUtil.java:370)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.countDistinctRowKeys(ReplicationSource.java:649)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:450)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:346)
> {code}
> This stack trace can only happen if WALEdit.getCells() returns an array 
> containing null entries. I believe this happens because 
> {{KeyValueCodec.parseCell()}} uses {{KeyValueUtil.iscreate()}}, which returns 
> null in case of EOF at the beginning. However, the contract for 
> Decoder.parseCell() does not make clear whether returning null is acceptable. 
> The other Decoders (CompressedKvDecoder, CellCodec, etc.) do not return null, 
> while KeyValueCodec does. 
> BaseDecoder has this code: 
> {code}
>   public boolean advance() throws IOException {
>     if (!this.hasNext) return this.hasNext;
>     if (this.in.available() == 0) {
>       this.hasNext = false;
>       return this.hasNext;
>     }
>     try {
>       this.current = parseCell();
>     } catch (IOException ioEx) {
>       rethrowEofException(ioEx);
>     }
>     return this.hasNext;
>   }
> {code}
> which is not correct, since it uses {{InputStream.available()}} in a way that 
> the javadoc does not support 
> (https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available()).
> DFSInputStream implements {{available()}} as the remaining bytes to read 
> from the stream, so we do not see the issue there. 
> {{CryptoInputStream.available()}} does a similar thing, but we do see the 
> issue there. 
> So two questions: 
>  - What should the contract for Decoder.parseCell() be? Can it return null? 
>  - How do we properly fix BaseDecoder.advance() so that it does not rely on 
> the {{available()}} call? 
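
To illustrate the second question, here is a minimal, self-contained sketch of one way to detect end-of-stream without calling available(). It is not taken from the attached patch. The class AvailableSketch, its methods, and ZeroAvailableStream are hypothetical names made up for this example; ZeroAvailableStream is only a stand-in for any stream whose available() returns 0 while data remains (which the InputStream javadoc permits), not a model of CryptoInputStream. The sketch wraps the input in a java.io.PushbackInputStream, reads one byte to probe for EOF, and pushes it back if the stream is not exhausted.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

class AvailableSketch {

  /** Hypothetical stream: data remains, but available() reports 0. */
  static final class ZeroAvailableStream extends FilterInputStream {
    ZeroAvailableStream(InputStream in) { super(in); }
    @Override public int available() { return 0; }
  }

  /** Mirrors the quoted advance(): treats available() == 0 as end-of-stream. */
  static int countWithAvailable(InputStream in) throws IOException {
    int n = 0;
    while (in.available() != 0 && in.read() != -1) n++;
    return n;
  }

  /** Probes for EOF by reading one byte and pushing it back if present. */
  static boolean hasMore(PushbackInputStream in) throws IOException {
    int b = in.read();
    if (b == -1) return false;   // real end-of-stream
    in.unread(b);                // not EOF: restore the byte for the parser
    return true;
  }

  /** Counts bytes using the read-and-unread probe instead of available(). */
  static int countWithProbe(InputStream raw) throws IOException {
    PushbackInputStream in = new PushbackInputStream(raw, 1);
    int n = 0;
    while (hasMore(in)) { in.read(); n++; }
    return n;
  }

  /** Three bytes of sample data behind the zero-available stand-in. */
  static InputStream sample() {
    return new ZeroAvailableStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
  }

  static int demoNaive() {
    try { return countWithAvailable(sample()); }
    catch (IOException e) { throw new UncheckedIOException(e); }
  }

  static int demoProbe() {
    try { return countWithProbe(sample()); }
    catch (IOException e) { throw new UncheckedIOException(e); }
  }

  public static void main(String[] args) {
    System.out.println("available()-based count: " + demoNaive());
    System.out.println("probe-based count:       " + demoProbe());
  }
}
```

With the three-byte sample, the available()-based loop stops before reading anything, while the probe-based loop reads all three bytes. If advance() decided hasNext this way, a parseCell() implementation would presumably no longer need to return null to signal EOF at the start of a cell, though that depends on how the contract in the first question is settled.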



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
