[jira] [Commented] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service

Yiqun Lin (JIRA) Thu, 22 Nov 2018 19:49:51 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696359#comment-16696359
 ]


Yiqun Lin commented on HDFS-13811:
----------------------------------

Thanks for the explanation, [~dibyendu_hadoop]. I think I understand your 
idea now. I am still reviewing, but here are some initial comments:

*RouterQuotaManager.java*
I'd like to keep the original logic in {{getQuotaUsage}} and just clean it up. 
We can tolerate the usage not being found for a short time and wait for the 
periodic quota update to fill it in. Also, the changed logic is incorrect: if 
the usage is not found for a path, the lookup should fall back to the parent's 
usage, continuing upward until the right one is found.
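
As a rough illustration of the parent fallback described above, here is a minimal, self-contained sketch. It is not the actual {{RouterQuotaManager}} code: the class {{QuotaLookupSketch}}, the {{parentOf}} helper, and the use of a plain {{Long}} in place of the real usage object are all assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the quota cache; names and types are illustrative only.
class QuotaLookupSketch {
  // Mount path -> cached usage (a Long stands in for the real usage object).
  private final Map<String, Long> cache = new TreeMap<>();

  void put(String path, long usage) {
    cache.put(path, usage);
  }

  // Returns the usage cached for the path, or for its nearest ancestor
  // that has one; returns null if no ancestor has a cached usage.
  Long getQuotaUsage(String path) {
    String current = path;
    while (current != null) {
      Long usage = cache.get(current);
      if (usage != null) {
        return usage;
      }
      current = parentOf(current);
    }
    return null;
  }

  // Returns the parent of an absolute path, or null for the root.
  static String parentOf(String path) {
    if (path.equals("/")) {
      return null;
    }
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? "/" : path.substring(0, idx);
  }

  public static void main(String[] args) {
    QuotaLookupSketch manager = new QuotaLookupSketch();
    manager.put("/data", 100L);
    // No entry for the child path, so the lookup falls back to "/data".
    System.out.println(manager.getQuotaUsage("/data/logs/app"));
    // No entry for "/tmp" or the root, so nothing is found.
    System.out.println(manager.getQuotaUsage("/tmp"));
  }
}
```

With this shape, a path whose usage is temporarily missing still resolves to an ancestor's value until the periodic update fills in its own entry.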

*RouterQuotaUpdateService.periodicInvoke(MountTable entry)*
Line84: I'd like to add a try-catch around the {{periodicInvoke}} call, so that 
an error while updating one mount table entry won't cause the loop to exit.
Line92: Rename {{periodicInvoke}} to {{updateQuotaUsage}}.
Line124: {{currentQuotaUsage}} is an aggregated quota usage. The quota here 
(currentQuotaUsage.getQuota) only reflects the last subcluster's quota value, 
not that of all subclusters. If the quota on one subcluster's file system is 
changed, the following logic still cannot detect it. I'd prefer to file another 
JIRA to improve this and keep the original logic for now.
{code}
    // If there is a mismatch between the quota values in router cache
    // and sub-cluster file-system, sync the quota.
    if (currentQuotaUsage.getQuota() != nsQuota
        || currentQuotaUsage.getSpaceQuota() != ssQuota) {
      try {
        this.rpcServer.setQuota(src, nsQuota, ssQuota, null);
      } catch (IOException ioe) {
        LOG.error("Unable to set quota at remote location for " + src, ioe);
      }
    }
{code}
Line137: This line isn't needed.
Line176: In the quota update service, we don't really need the 
{{updateQuotaCache}} parameter. Why not just set it to {{false}}? Then there is 
no need to pass {{updateQuotaCache}} as a parameter.
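
To make the Line84 suggestion concrete, here is a minimal sketch of a per-entry try-catch, so that one failing entry does not end the loop. The class {{PerEntryUpdateSketch}}, the use of plain path strings instead of mount table entries, and the simulated failure are assumptions for illustration, not the real {{RouterQuotaUpdateService}} code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the periodic quota update loop.
class PerEntryUpdateSketch {
  // Hypothetical per-entry update; fails for one path to simulate an error.
  static void updateQuotaUsage(String mountPath) throws IOException {
    if (mountPath.equals("/broken")) {
      throw new IOException("simulated failure for " + mountPath);
    }
  }

  // Visits every entry and returns the paths that were updated successfully.
  static List<String> periodicInvoke(List<String> mountPaths) {
    List<String> updated = new ArrayList<>();
    for (String path : mountPaths) {
      try {
        updateQuotaUsage(path);
        updated.add(path);
      } catch (IOException e) {
        // Log and continue, so the remaining entries are still processed.
        System.err.println("Quota update failed for " + path + ": "
            + e.getMessage());
      }
    }
    return updated;
  }

  public static void main(String[] args) {
    // "/b" is still updated even though "/broken" fails before it.
    System.out.println(periodicInvoke(Arrays.asList("/a", "/broken", "/b")));
  }
}
```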

I haven't fully reviewed the UT yet, but I think we need to add a new test case 
for the quota cache update behaviour, since we introduce the 
{{updateQuotaCache}} flag for getting mount table entries.

> RBF: Race condition between router admin quota update and periodic quota 
> update service
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-13811
>                 URL: https://issues.apache.org/jira/browse/HDFS-13811
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Dibyendu Karmakar
>            Assignee: Dibyendu Karmakar
>            Priority: Major
>         Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch
>
>
> If we try to update the quota of an existing mount entry while the 
> periodic quota update service is running on the same mount entry, the 
> mount table is left in an _inconsistent state._
> The transactions are:
> A - Quota update service is fetching mount table entries.
> B - Quota update service is updating the mount table with current usage.
> A' - User is trying to update quota using admin cmd.
> With the transaction sequence [ A A' B ], the quota update service 
> updates the mount table with the old quota value.
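
The [ A A' B ] sequence in the description can be replayed deterministically in a small sketch. This is only an illustration of the stale write, under the assumption that step B writes back a whole entry built from the snapshot taken in step A; the class {{StaleWriteSketch}}, the {{Entry}} type, and the numeric values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replay of the [ A A' B ] interleaving on a simplified store.
class StaleWriteSketch {
  // Simplified mount entry: a quota set by the admin plus a usage value
  // maintained by the update service.
  static final class Entry {
    final long quota;
    final long usage;

    Entry(long quota, long usage) {
      this.quota = quota;
      this.usage = usage;
    }
  }

  // Replays A, A', B in order and returns the quota left in the store.
  static long replay() {
    Map<String, Entry> store = new HashMap<>();
    store.put("/data", new Entry(10, 0));

    // A: the update service fetches the entry.
    Entry snapshot = store.get("/data");

    // A': the admin raises the quota to 20.
    Entry current = store.get("/data");
    store.put("/data", new Entry(20, current.usage));

    // B: the service writes the entry back with fresh usage, but with the
    // quota taken from its stale snapshot.
    store.put("/data", new Entry(snapshot.quota, 5));

    return store.get("/data").quota;
  }

  public static void main(String[] args) {
    // Prints 10: the admin's update to 20 has been overwritten.
    System.out.println(replay());
  }
}
```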



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

