Viraj Jasani created HDFS-16953:
-----------------------------------
Summary: RBF Mount table store APIs should update cache only if
state store record is successfully updated
Key: HDFS-16953
URL: https://issues.apache.org/jira/browse/HDFS-16953
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Viraj Jasani
Assignee: Viraj Jasani
The RBF mount table state store APIs addMountTableEntry, updateMountTableEntry and
removeMountTableEntry refresh the cache on all routers regardless of whether the
record update actually succeeded. If the record fails to update in the
ZooKeeper or file-based store implementation, reloading the cache on all routers
is unnecessary.
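The intended behavior could be sketched roughly as below. This is a minimal, self-contained illustration and not the actual Hadoop code: InMemoryRecordStore, GuardedMountTableStore and refreshAllRouters are hypothetical names that stand in for the state store driver and the router cache refresh step, and the refresh is only counted rather than performed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a state store driver (not a Hadoop class).
final class InMemoryRecordStore {
  private final Map<String, String> records = new ConcurrentHashMap<>();

  /** Returns true only if the record did not exist and was written. */
  boolean create(String path, String target) {
    return records.putIfAbsent(path, target) == null;
  }

  /** Returns true only if a record existed and was removed. */
  boolean remove(String path) {
    return records.remove(path) != null;
  }
}

// Hypothetical wrapper: the cache refresh is gated on the store result.
final class GuardedMountTableStore {
  private final InMemoryRecordStore store = new InMemoryRecordStore();
  private final AtomicInteger refreshCount = new AtomicInteger();

  boolean addEntry(String path, String target) {
    boolean status = store.create(path, target);
    if (status) {
      // Refresh only when the record was actually written.
      refreshAllRouters();
    }
    return status;
  }

  boolean removeEntry(String path) {
    boolean status = store.remove(path);
    if (status) {
      refreshAllRouters();
    }
    return status;
  }

  // Assumption: a counter replaces the real refresh call to every router.
  private void refreshAllRouters() {
    refreshCount.incrementAndGet();
  }

  int getRefreshCount() {
    return refreshCount.get();
  }
}
```

With this shape, a failed add or remove returns its false status to the caller without triggering a refresh on any router.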
For instance, when the same new mount point is added by two calls simultaneously,
the second call can fail if the first call has not yet written the new entry by
the time the second call retrieves the mount table entries from
getMountTableEntries. The second call then proceeds to addMountTableEntry, which
fails because the record already exists by then.
{code:java}
DEBUG [{cluster}/{ip}:8111] ipc.Client - IPC Client (1826699684) connection to
nn-0-{ns}.{cluster}/{ip}:8111 from {user}IPC Client (1826699684) connection to
nn-0-{ns}.{cluster}/{ip}:8111 from {user} sending #1
org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocol.addMountTableEntry
DEBUG [{cluster}/{ip}:8111 from {user}] ipc.Client - IPC Client (1826699684)
connection to nn-0-{ns}.{cluster}/{ip}:8111 from {user} got value #1
DEBUG [main] ipc.ProtobufRpcEngine2 - Call: addMountTableEntry took 24ms
DEBUG [{cluster}/{ip}:8111 from {user}] ipc.Client - IPC Client (1826699684)
connection to nn-0-{ns}.{cluster}/{ip}:8111 from {user}: closed
DEBUG [{cluster}/{ip}:8111 from {user}] ipc.Client - IPC Client (1826699684)
connection to nn-0-{ns}.{cluster}/{ip}:8111 from {user}: stopped, remaining
connections 0
TRACE [main] ipc.ProtobufRpcEngine2 - 1: Response <-
nn-0-{ns}.{cluster}/{ip}:8111: addMountTableEntry {status: false}
Cannot add mount point /data503 {code}
The failure to write the new record:
{code:java}
INFO [IPC Server handler 0 on default port 8111] impl.StateStoreZooKeeperImpl
- Cannot write record "/hdfs-federation/MountTable/0SLASH0data503", it already
exists {code}
Since the successful call has already refreshed the cache on all routers, the
second call, which failed, should not refresh the cache on all routers again:
every router already has the updated records in its cache.
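The interleaving described above can be reproduced in isolation with a small simulation. This is only a sketch under stated assumptions, not router code: MountTableAddRace is a hypothetical class, a ConcurrentHashMap stands in for the state store, containsKey stands in for the getMountTableEntries check, and a barrier forces both callers to finish the check before either one writes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of two concurrent "add mount point" calls.
final class MountTableAddRace {
  private final Map<String, String> records = new ConcurrentHashMap<>();
  private final AtomicInteger successes = new AtomicInteger();
  private final AtomicInteger failures = new AtomicInteger();
  private final AtomicInteger refreshes = new AtomicInteger();

  /** Runs two callers that both pass the existence check before either writes. */
  void run(String path) {
    CyclicBarrier bothChecked = new CyclicBarrier(2);
    Runnable caller = () -> {
      // Stand-in for checking the existing entries before adding.
      boolean absent = !records.containsKey(path);
      try {
        bothChecked.await(); // neither caller writes until both have checked
      } catch (Exception e) {
        throw new IllegalStateException(e);
      }
      if (!absent) {
        return;
      }
      // Create-if-absent: only one of the two writes can win.
      if (records.putIfAbsent(path, "ns0") == null) {
        successes.incrementAndGet();
        refreshes.incrementAndGet(); // refresh only after a successful write
      } else {
        failures.incrementAndGet(); // "already exists": no refresh
      }
    };
    Thread first = new Thread(caller);
    Thread second = new Thread(caller);
    first.start();
    second.start();
    try {
      first.join();
      second.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  int getSuccesses() { return successes.get(); }
  int getFailures() { return failures.get(); }
  int getRefreshes() { return refreshes.get(); }
}
```

Calling run("/data503") on a fresh instance ends with one successful write, one failed write and a single refresh, which is the outcome this issue asks for.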