[
https://issues.apache.org/jira/browse/NIFI-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tamas Palfy updated NIFI-7527:
------------------------------
Description:
The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos
issue after TGT expires) introduced a new bug: after TGT refresh the processor
ends up in a deadlock.
The reason is that the onTrigger initiates a read lock:
{code:java}
@Override
public void onTrigger(final ProcessContext context, final ProcessSession
session) throws ProcessException {
kuduClientReadLock.lock();
try {
onTrigger(context, session, kuduClientR);
} finally {
kuduClientReadLock.unlock();
}
}
{code}
and while the read lock is in effect, later (in the same stack) - if TGT
refresh occurs - a write lock is attempted:
{code:java}
...
public synchronized boolean checkTGTAndRelogin() throws
LoginException {
boolean didRelogin = super.checkTGTAndRelogin();
if (didRelogin) {
createKuduClient(context);
}
return didRelogin;
}
...
protected void createKuduClient(ProcessContext context) {
kuduClientWriteLock.lock();
try {
if (this.kuduClientR.get() != null) {
try {
this.kuduClientR.get().close();
} catch (KuduException e) {
getLogger().error("Couldn't close Kudu client.");
}
}
if (kerberosUser != null) {
final KerberosAction<KuduClient> kerberosAction = new
KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
this.kuduClientR.set(kerberosAction.execute());
} else {
this.kuduClientR.set(buildClient(context));
}
} finally {
kuduClientWriteLock.unlock();
}
}
{code}
This attempt at the write lock will get stuck, waiting for the previous read
lock to get released.
(Other threads may have acquired the same read lock but they can release it
eventually - unless they too try to acquire the write lock themselves.)
For the fix it seemed to be best to re-evalute the locking logic.
Previously basically the whole onTrigger logic was encapsulated in a read lock,
including the checking - and recreating as needed - the Kudu client (as
explained before).
It's best to just keep the actual privileged action in the read lock so the the
refreshing of the TGT and re-creation of the Kudu client can safely be done in
a write lock before that.
was:
The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos
issue after TGT expires) introduced a new bug: after TGT refresh the processor
ends up in a deadlock.
The reason is that the onTrigger initiates a read lock:
{code:java}
@Override
public void onTrigger(final ProcessContext context, final ProcessSession
session) throws ProcessException {
kuduClientReadLock.lock();
try {
onTrigger(context, session, kuduClientR);
} finally {
kuduClientReadLock.unlock();
}
}
{code}
and while the read lock is in effect, later (in the same stack) - if TGT
refresh occurs - a write lock is attempted:
{code:java}
...
public synchronized boolean checkTGTAndRelogin() throws
LoginException {
boolean didRelogin = super.checkTGTAndRelogin();
if (didRelogin) {
createKuduClient(context);
}
return didRelogin;
}
...
protected void createKuduClient(ProcessContext context) {
kuduClientWriteLock.lock();
try {
if (this.kuduClientR.get() != null) {
try {
this.kuduClientR.get().close();
} catch (KuduException e) {
getLogger().error("Couldn't close Kudu client.");
}
}
if (kerberosUser != null) {
final KerberosAction<KuduClient> kerberosAction = new
KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
this.kuduClientR.set(kerberosAction.execute());
} else {
this.kuduClientR.set(buildClient(context));
}
} finally {
kuduClientWriteLock.unlock();
}
}
{code}
This attempt at the write lock will get stuck, waiting for the previous read
lock to get released.
(Other threads may have acquired the same read lock but they can release it
eventually - unless they too try to acquire the write lock themselves.)
The fix should be fairly simple: need to release the read lock before trying to
acquire the write lock.
Need to take care of the following though:
* Need to reacquire the read lock after the new Kudu client has been created
because the current logic calls an unlock on the ReadLock object later in a
finally block. As this is a reentrant lock, that means decreasing the counter
for the number of locks. We need to make sure the that number of locks and
unlocks match.
* Need to make sure whenever we reference the Kudu client, it's actually the
latest one. volatile no longer suffice, we need to wrap it in an
AtomicReference.
> AbstractKuduProcessor deadlocks after TGT refresh
> --------------------------------------------------
>
> Key: NIFI-7527
> URL: https://issues.apache.org/jira/browse/NIFI-7527
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Tamas Palfy
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos
> issue after TGT expires) introduced a new bug: after TGT refresh the
> processor ends up in a deadlock.
> The reason is that the onTrigger initiates a read lock:
> {code:java}
> @Override
> public void onTrigger(final ProcessContext context, final ProcessSession
> session) throws ProcessException {
> kuduClientReadLock.lock();
> try {
> onTrigger(context, session, kuduClientR);
> } finally {
> kuduClientReadLock.unlock();
> }
> }
> {code}
> and while the read lock is in effect, later (in the same stack) - if TGT
> refresh occurs - a write lock is attempted:
> {code:java}
> ...
> public synchronized boolean checkTGTAndRelogin() throws
> LoginException {
> boolean didRelogin = super.checkTGTAndRelogin();
> if (didRelogin) {
> createKuduClient(context);
> }
> return didRelogin;
> }
> ...
> protected void createKuduClient(ProcessContext context) {
> kuduClientWriteLock.lock();
> try {
> if (this.kuduClientR.get() != null) {
> try {
> this.kuduClientR.get().close();
> } catch (KuduException e) {
> getLogger().error("Couldn't close Kudu client.");
> }
> }
> if (kerberosUser != null) {
> final KerberosAction<KuduClient> kerberosAction = new
> KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger());
> this.kuduClientR.set(kerberosAction.execute());
> } else {
> this.kuduClientR.set(buildClient(context));
> }
> } finally {
> kuduClientWriteLock.unlock();
> }
> }
> {code}
> This attempt at the write lock will get stuck, waiting for the previous read
> lock to get released.
> (Other threads may have acquired the same read lock but they can release it
> eventually - unless they too try to acquire the write lock themselves.)
> For the fix it seemed to be best to re-evalute the locking logic.
> Previously basically the whole onTrigger logic was encapsulated in a read
> lock, including the checking - and recreating as needed - the Kudu client
> (as explained before).
> It's best to just keep the actual privileged action in the read lock so the
> the refreshing of the TGT and re-creation of the Kudu client can safely be
> done in a write lock before that.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)