[ 
https://issues.apache.org/jira/browse/KAFKA-9729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiao Zhang updated KAFKA-9729:
------------------------------
    Description: 
Current SimpleAuthorizer needs 'inWriteLock' when processing add/remove acls 
requests, while getAcls in authorize() needs 'inReadLock'.
 That means handling add/remove acls requests would block all other requests 
for example produce and fetch requests.
 When processing add/remove acls, updateResourceAcls() access zk to update 
acls, which could be long in the case like network glitch.
 We did the simulation for zk delay.
 When adding 100ms delay on zk side, 'inWriteLock' in addAcls()/removeAcls 
lasts for 400ms~500ms.
 When adding 500ms delay on zk side, 'inWriteLock' in addAcls()/removeAcls 
lasts for 2000ms~2500ms.
{code:java}
override def addAcls(acls: Set[Acl], resource: Resource) {
  if (acls != null && acls.nonEmpty) {
    inWriteLock(lock) {
      val startMs = Time.SYSTEM.milliseconds()
      updateResourceAcls(resource) { currentAcls =>
        currentAcls ++ acls
      }
      warn(s"inWriteLock in addAcls consumes ${Time.SYSTEM.milliseconds() - 
startMs} milliseconds.")
    }
  }
}{code}
Blocking produce/fetch requests for 2s would cause apparent performance 
degradation for the whole cluster.
 So considering is it possible to remove 'inWriteLock' in addAcls/removeAcls 
and only put 'inWriteLock' inside updateCache, which is called by 
addAcls/removeAcls. 
{code:java}
// code placeholder
private def updateCache(resource: Resource, versionedAcls: VersionedAcls) {
 if (versionedAcls.acls.nonEmpty) {
 aclCache.put(resource, versionedAcls)
 } else {
 aclCache.remove(resource)
 }
 }
{code}
If do this, block time is only the time for updating local cache, which isn't 
influenced by network glitch. But don't know if there were special concerns to 
have current strict write lock and not sure if there are side effects if only 
put lock to updateCache.

Btw, the latest version keeps the same 'inWriteLock' as our version 1.1.0.

  was:
Current SimpleAuthorizer needs 'inWriteLock' when processing add/remove acls 
requests, while getAcls in authorize() needs 'inReadLock'.
 That means handling add/remove acls requests would block all other requests 
for example produce and fetch requests.
 When processing add/remove acls, updateResourceAcls() access zk to update 
acls, which could be long in the case like network glitch.
 We did the simulation for zk delay.
 When adding 100ms delay on zk side, 'inWriteLock' in addAcls()/removeAcls 
lasts for 400ms~500ms.
 When adding 500ms delay on zk side, 'inWriteLock' in addAcls()/removeAcls 
lasts for 2000ms~2500ms.
{code:java}
override def addAcls(acls: Set[Acl], resource: Resource) {
  if (acls != null && acls.nonEmpty) {
    inWriteLock(lock) {
      val startMs = Time.SYSTEM.milliseconds()
      updateResourceAcls(resource) { currentAcls =>
        currentAcls ++ acls
      }
      warn(s"inWriteLock in addAcls consumes ${Time.SYSTEM.milliseconds() - 
startMs} milliseconds.")
    }
  }
}{code}
Blocking produce/fetch requests for 2s would cause apparent performance 
degradation for the whole cluster.
 So considering is it possible to remove 'inWriteLock' in addAcls/removeAcls 
and only put 'inWriteLock' inside updateCache, which is called by 
addAcls/removeAcls. 
{code:java}
// code placeholder
private def updateCache(resource: Resource, versionedAcls: VersionedAcls) {
 if (versionedAcls.acls.nonEmpty) {
 aclCache.put(resource, versionedAcls)
 } else {
 aclCache.remove(resource)
 }
 }
{code}
If do this, block time is only the time for updating local cache, which isn't 
influenced by network glitch. But don't know if there were special concerns to 
have current strict write lock and not sure if there are side effects if only 
put lock to updateCache.


> Shrink inWriteLock time in SimpleAuthorizer
> -------------------------------------------
>
>                 Key: KAFKA-9729
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9729
>             Project: Kafka
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 1.1.0
>            Reporter: Jiao Zhang
>            Priority: Minor
>
> Current SimpleAuthorizer needs 'inWriteLock' when processing add/remove acls 
> requests, while getAcls in authorize() needs 'inReadLock'.
>  That means handling add/remove acls requests would block all other requests 
> for example produce and fetch requests.
>  When processing add/remove acls, updateResourceAcls() access zk to update 
> acls, which could be long in the case like network glitch.
>  We did the simulation for zk delay.
>  When adding 100ms delay on zk side, 'inWriteLock' in addAcls()/removeAcls 
> lasts for 400ms~500ms.
>  When adding 500ms delay on zk side, 'inWriteLock' in addAcls()/removeAcls 
> lasts for 2000ms~2500ms.
> {code:java}
> override def addAcls(acls: Set[Acl], resource: Resource) {
>   if (acls != null && acls.nonEmpty) {
>     inWriteLock(lock) {
>       val startMs = Time.SYSTEM.milliseconds()
>       updateResourceAcls(resource) { currentAcls =>
>         currentAcls ++ acls
>       }
>       warn(s"inWriteLock in addAcls consumes ${Time.SYSTEM.milliseconds() - 
> startMs} milliseconds.")
>     }
>   }
> }{code}
> Blocking produce/fetch requests for 2s would cause apparent performance 
> degradation for the whole cluster.
>  So considering is it possible to remove 'inWriteLock' in addAcls/removeAcls 
> and only put 'inWriteLock' inside updateCache, which is called by 
> addAcls/removeAcls. 
> {code:java}
> // code placeholder
> private def updateCache(resource: Resource, versionedAcls: VersionedAcls) {
>  if (versionedAcls.acls.nonEmpty) {
>  aclCache.put(resource, versionedAcls)
>  } else {
>  aclCache.remove(resource)
>  }
>  }
> {code}
> If do this, block time is only the time for updating local cache, which isn't 
> influenced by network glitch. But don't know if there were special concerns 
> to have current strict write lock and not sure if there are side effects if 
> only put lock to updateCache.
> Btw, the latest version keeps the same 'inWriteLock' as our version 1.1.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to