[ 
https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194691#comment-15194691
 ] 

Jianwei Cui commented on HBASE-15433:
-------------------------------------

Thanks for your comment.
{quote}
Not required I think, because we are having enough quota for this table in the 
cache before restoring the snapshot and after restoring snapshot we are only 
decrementing it, so it will work.
{quote}
There may be corner case if I am not wrong. For example, if the table has 5 
regions, clientA is trying to restore the table to snapshot with 8 regions, 
while clientB is trying to restore the snapshot with 10 regions, then:
1. clientA firstly invokes {{checkAndUpdateNamespaceRegionQuota}} before 
{{restoreSnapshot}}, the {{tableRegionCount}} is 5 for clientA and it updates 
the region count of the table to 8
2. Before clientA invokes {{restoreSnapshot}}, clientB invokes 
{{checkAndUpdateNamespaceRegionQuota}} before {{restoreSnapshot}}, the 
{{tableRegionCount}} is 8 for clientB and it updates the region count of the 
table to 10
3. Both clientA and clientB encountered IOE in {{restoreSnapshot}}, and the two 
clients are trying to reset the region count in IOE catch clause
4. clientA firstly reset the region count to 5, and then clientB reset the 
region count to 8, so the final region count for the table is 8 in such case, 
but it should be 5 because both operations failed.
It seems not easy to update the quota information correctly without lock if 
there are concurrent restoreSnapshot requests IMO. Maybe, it is more easy to do 
such work in {{RestoreSnapshotHandler}} with table lock held(like 
{{CreateTableProcedure}})?
1. In {{RestoreSnapshotHandler}}, overwrite {{prepareWithTableLock}} method 
with {{checkAndUpdateNamespaceRegionQuota}} if {{snapshotRegionCount}} is 
larger than {{tableRegionCount}}. If {{checkAndUpdateNamespaceRegionQuota}} 
fails here, we do not need to reset the region count and {{SnapshotManager}} 
will throw exception directly.
2. In {{RestoreSnapshotHandler#completed}}, if exception received and 
{{tableRegionCount < SnapshotRegionCount}}, we reset region count to 
{{tableRegionCount}}, if no exception received and {{tableRegionCount > 
snapshotRegionCount}}, we set region count to {{snapshotRegionCount}}. 
What's your opinion about this issue? [~ashish singhi]

{quote}
We can also include quota has exceeded in the error message
{quote}
Yes, will polish the message and update the patch.

> SnapshotManager#restoreSnapshot not update table and region count quota 
> correctly when encountering exception
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15433
>                 URL: https://issues.apache.org/jira/browse/HBASE-15433
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 2.0.0
>            Reporter: Jianwei Cui
>         Attachments: HBASE-15433-trunk-v1.patch, HBASE-15433-trunk-v2.patch, 
> HBASE-15433-trunk.patch
>
>
> In SnapshotManager#restoreSnapshot, the table and region quota will be 
> checked and updated as:
> {code}
>       try {
>         // Table already exist. Check and update the region quota for this 
> table namespace
>         checkAndUpdateNamespaceRegionQuota(manifest, tableName);
>         restoreSnapshot(snapshot, snapshotTableDesc);
>       } catch (IOException e) {
>         
> this.master.getMasterQuotaManager().removeTableFromNamespaceQuota(tableName);
>         LOG.error("Exception occurred while restoring the snapshot " + 
> snapshot.getName()
>             + " as table " + tableName.getNameAsString(), e);
>         throw e;
>       }
> {code}
> The 'checkAndUpdateNamespaceRegionQuota' will fail if regions in the snapshot 
> make the region count quota exceeded, then, the table will be removed in the 
> 'catch' block. This will make the current table count and region count 
> decrease, following table creation or region split will succeed even if the 
> actual quota is exceeded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to