[ https://issues.apache.org/jira/browse/HBASE-20895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-20895:
-----------------------------------
    Attachment: HBASE-20895-branch-1.patch

> NPE in RpcServer#readAndProcess
> -------------------------------
>
>                 Key: HBASE-20895
>                 URL: https://issues.apache.org/jira/browse/HBASE-20895
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc
>    Affects Versions: 1.3.2
>            Reporter: Andrew Purtell
>            Assignee: Monani Mihir
>            Priority: Major
>             Fix For: 1.5.0, 1.3.3, 1.4.7
>
>         Attachments: HBASE-20895-branch-1.patch, HBASE-20895-branch-1.patch
>
>
> {noformat}
> 2018-07-10 16:25:55,005 DEBUG [.sfdc.net,port=60020] ipc.RpcServer - RpcServer.listener,port=60020: Caught exception while reading:
> java.lang.NullPointerException
>         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1761)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:949)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:730)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:706)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This looks like it could be a use-after-close problem if there is concurrent access to a Connection.
> In process() we might store a null back to the 'data' field.
> Meanwhile, in readAndProcess() we have a case where we might be blocked on a channel read, and after coming back from the read we go on to use 'data' after a null has been written back, leading to an NPE.
> {quote}count = channelRead(channel, data);
> 1761 ---> if (count >= 0 && *data.remaining()* == 0)
> \{ process(); }{quote}
> Whether an NPE happens or not depends on the timing of the store back to 'data' in another thread relative to the use of 'data' in this thread, and on whether or not the JVM has optimized away a reload of 'data' (it is not declared volatile).
> We should do a null check here just to be defensive. We should also look at whether concurrent access to the Connection is happening and intended. The above is just a theory. We should also look at other execution sequences that could lead to 'data' being null in this location. At a glance I didn't find one, but the store to 'data' happens behind conditionals, so it is possible.
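
The defensive null check proposed above could look roughly like the following. This is a minimal sketch, not the attached patch and not the actual RpcServer code: the class ConnectionSketch, the clearData() helper, the integer return codes, and the volatile modifier are all assumptions made for illustration. Only the names 'data', readAndProcess() and process() come from the issue text. The idea is to read the field once into a local variable and use only that local afterwards, so a concurrent store of null cannot land between the check and the use.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for RpcServer.Connection (assumption, not HBase code).
final class ConnectionSketch {
    // Assumption: marked volatile here so a store from another thread is visible.
    // The issue notes that the real field is not declared volatile.
    private volatile ByteBuffer data;

    ConnectionSketch(ByteBuffer initial) {
        this.data = initial;
    }

    // Hypothetical helper simulating another code path storing null to 'data'.
    void clearData() {
        data = null;
    }

    // 'count' stands in for the result of the channel read.
    // Returns -1 if 'data' was null, 1 if a full buffer was processed, 0 otherwise.
    int readAndProcess(int count) {
        // Read the field exactly once; all later uses go through the local copy.
        ByteBuffer buf = data;
        if (buf == null) {
            // Treat as a closed connection instead of throwing an NPE.
            return -1;
        }
        if (count >= 0 && buf.remaining() == 0) {
            process();
            return 1;
        }
        return 0;
    }

    private void process() {
        // The real method handles the request; this sketch only mimics
        // the store of null back to 'data' described in the issue.
        data = null;
    }
}
```

A null check of this kind only avoids the exception. It does not settle whether concurrent access to the Connection is happening or intended, which still needs the investigation described in the issue.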