Anoop Sam John created HBASE-15214:
--------------------------------------
Summary: Valid mutate Ops fail with RPC Codec in use and region
moves across
Key: HBASE-15214
URL: https://issues.apache.org/jira/browse/HBASE-15214
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Critical
Test failures in HBASE-15198 led to this bug. Until now we have not used cell
blocks (codec usage) for write requests (client -> server). Once we enabled
Codec usage by default, we saw this issue.
A multi request came to the RS with mutations for different regions. One of
the regions that was on this RS had become unavailable. In RSRpcServices#multi,
we fail the entire RegionAction (with N mutations in it) in that MultiRequest
and then continue with the remaining RegionActions, whose regions might still
be available. (The failed RegionAction will be retried by the client after it
fetches the latest region location.) This all works fine in the pure PB
request world. When a Codec is used, we do not convert the Mutation Cells to
PB Cells and pack them in the PB message. Instead we pass all Cells serialized
into one byte[] cellblock, and on the server side we iterate over these cells
using a Decoder. Each Mutation PB knows only the number of cells associated
with it. In the above case, when an entire RegionAction is skipped, the N
Mutations under it may have corresponding Cells in the cellblock, but we do
not skip those Cells in the iterator. As a result, the later Mutations (for
other regions) refer to the wrong Cells and try to put them into a different
region. This makes HRegion#checkRow() throw WrongRegionException, which is
treated as a sanity check failure, so a DoNotRetryIOException (DNRIOE) is
thrown back to the client and the op fails for the user code.
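
The misalignment described above can be reproduced with a small standalone
model. This is only a sketch under simplifying assumptions: the classes
Mutation and RegionAction and the method apply() below are hypothetical
stand-ins, not HBase classes, and cells are modelled as plain strings read
from one shared iterator. The skipCellsOfFailedActions flag shows one
possible way to keep the iterator aligned; it is not necessarily how the
actual patch does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of the problem; none of these are HBase types.
class CellBlockSkipSketch {

    /** Stand-in for a Mutation PB: it only knows how many cells belong to it. */
    static final class Mutation {
        final int cellCount;
        Mutation(int cellCount) { this.cellCount = cellCount; }
    }

    /** Stand-in for a RegionAction: a target region plus its mutations. */
    static final class RegionAction {
        final String region;
        final boolean regionAvailable;
        final List<Mutation> mutations;
        RegionAction(String region, boolean regionAvailable, List<Mutation> mutations) {
            this.region = region;
            this.regionAvailable = regionAvailable;
            this.mutations = mutations;
        }
    }

    /**
     * Walks the actions in order, reading cells from one shared iterator
     * (the "cellblock"). Returns "region<-cell" for every cell that gets applied.
     * When skipCellsOfFailedActions is false, a failed action leaves its cells
     * unread, so later actions pick up cells that belong to the failed one.
     */
    static List<String> apply(List<RegionAction> actions, List<String> cellBlock,
                              boolean skipCellsOfFailedActions) {
        Iterator<String> cells = cellBlock.iterator();
        List<String> applied = new ArrayList<>();
        for (RegionAction action : actions) {
            if (!action.regionAvailable) {
                if (skipCellsOfFailedActions) {
                    // Advance past the cells owned by the failed action.
                    for (Mutation m : action.mutations) {
                        for (int i = 0; i < m.cellCount; i++) {
                            cells.next();
                        }
                    }
                }
                continue; // the whole action fails; the client would retry it
            }
            for (Mutation m : action.mutations) {
                for (int i = 0; i < m.cellCount; i++) {
                    applied.add(action.region + "<-" + cells.next());
                }
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        List<RegionAction> actions = Arrays.asList(
            new RegionAction("regionA", false, Arrays.asList(new Mutation(2))),
            new RegionAction("regionB", true, Arrays.asList(new Mutation(1))));
        List<String> cellBlock = Arrays.asList("a-row1/c1", "a-row1/c2", "b-row1/c1");

        // Without skipping, regionB receives a cell meant for regionA.
        System.out.println(apply(actions, cellBlock, false)); // [regionB<-a-row1/c1]
        // With skipping, regionB receives its own cell.
        System.out.println(apply(actions, cellBlock, true));  // [regionB<-b-row1/c1]
    }
}
```

In the first call regionB is handed a cell whose row belongs to regionA,
which corresponds to the situation where HRegion#checkRow() would reject the
row with WrongRegionException.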
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)