[jira] [Commented] (HBASE-17330) SnapshotFileCache will always refresh the file cache
[ https://issues.apache.org/jira/browse/HBASE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771727#comment-15771727 ] Jianwei Cui commented on HBASE-17330: - Thanks for pointing out the mod time problem, [~stack]. I tried the patch locally as follows: 1. start a client that takes snapshots periodically; 2. make {{SnapshotFileCache#refreshCache}} log the hfile names it loads each time it is scheduled. The log shows {{SnapshotFileCache}} can load the hfiles referenced by snapshots taken before {{refreshCache}} started. However, as you mentioned, relying on the mod time is risky: the accuracy of the mod time depends on the implementation of the underlying file system, and the mod time can also be updated externally (such as by {{FSNamesystem#setTimes}}). To be safer, we could make {{SnapshotFileCache#getUnreferencedFiles}} load hfile names from the on-disk snapshots whenever a passed file is not in the in-memory cache, as:
{code}
public synchronized Iterable<FileStatus> getUnreferencedFiles(Iterable<FileStatus> files,
    final SnapshotManager snapshotManager) throws IOException {
  ...
  for (FileStatus file : files) {
    String fileName = file.getPath().getName();
    if (!refreshed && !cache.contains(fileName)) {
      // ==> Always load hfile names from the on-disk snapshots (without considering the mod time).
      refreshCache();
      refreshed = true;
    }
    if (cache.contains(fileName)) {
      continue;
    }
  ...
{code}
> SnapshotFileCache will always refresh the file cache > > > Key: HBASE-17330 > URL: https://issues.apache.org/jira/browse/HBASE-17330 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 2.0.0, 1.3.1, 0.98.23 >Reporter: Jianwei Cui >Assignee: Jianwei Cui >Priority: Minor > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17330-v1.patch, HBASE-17330-v2.patch > > > In {{SnapshotFileCache#refreshCache}}, {{hasChanges}} is computed as: > {code} > try { > FileStatus dirStatus = fs.getFileStatus(snapshotDir); > lastTimestamp = dirStatus.getModificationTime(); > hasChanges |= (lastTimestamp >= lastModifiedTime); // >= will make > hasChanges always be true > {code} > The {{(lastTimestamp >= lastModifiedTime)}} will make {{hasChanges}} always > be true because {{lastModifiedTime}} is updated as: > {code} > this.lastModifiedTime = lastTimestamp; > {code} > So, SnapshotFileCache will always refresh the file cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
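The refresh-on-miss idea proposed above can be sketched in plain Java. This is a minimal sketch with hypothetical class and field names; the {{Set}} of on-disk file names stands in for the real scan of on-disk snapshots in SnapshotFileCache:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the proposed getUnreferencedFiles behavior: on the first cache
// miss, rebuild the cache from the authoritative on-disk listing exactly once,
// then re-check membership. All names here are illustrative.
public class RefreshOnMissCache {
  private final Set<String> cache = new HashSet<>();
  private final Set<String> onDiskSnapshotFiles; // stands in for scanning on-disk snapshots

  public RefreshOnMissCache(Set<String> onDiskSnapshotFiles) {
    this.onDiskSnapshotFiles = onDiskSnapshotFiles;
  }

  // Always reload from the on-disk listing, ignoring directory mod times.
  private void refreshCache() {
    cache.clear();
    cache.addAll(onDiskSnapshotFiles);
  }

  public synchronized List<String> getUnreferencedFiles(List<String> files) {
    List<String> unreferenced = new ArrayList<>();
    boolean refreshed = false;
    for (String fileName : files) {
      if (!refreshed && !cache.contains(fileName)) {
        refreshCache(); // refresh at most once per call
        refreshed = true;
      }
      if (cache.contains(fileName)) {
        continue; // still referenced by some snapshot
      }
      unreferenced.add(fileName);
    }
    return unreferenced;
  }
}
```

Refreshing at most once per call keeps a batch of misses from triggering repeated on-disk scans, while a miss never produces a false "unreferenced" answer from a stale cache.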
[jira] [Commented] (HBASE-17330) SnapshotFileCache will always refresh the file cache
[ https://issues.apache.org/jira/browse/HBASE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769907#comment-15769907 ] Jianwei Cui commented on HBASE-17330: - Thanks for the review, Ted.
[jira] [Commented] (HBASE-17347) ExportSnapshot may write snapshot info file to wrong directory when specifying target name
[ https://issues.apache.org/jira/browse/HBASE-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765912#comment-15765912 ] Jianwei Cui commented on HBASE-17347: - Thanks for the review [~tedyu] > ExportSnapshot may write snapshot info file to wrong directory when > specifying target name > -- > > Key: HBASE-17347 > URL: https://issues.apache.org/jira/browse/HBASE-17347 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Jianwei Cui >Priority: Minor > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17347-v1.patch > > > ExportSnapshot will write a new snapshot info file when specifying the target > name: > {code} > if (!targetName.equals(snapshotName)) { > SnapshotDescription snapshotDesc = > SnapshotDescriptionUtils.readSnapshotInfo(inputFs, snapshotDir) > .toBuilder() > .setName(targetName) > .build(); > SnapshotDescriptionUtils.writeSnapshotInfo(snapshotDesc, > snapshotTmpDir, outputFs); > } > {code} > The snapshot info file will be written to the snapshot tmp directory; > however, it should be written directly to the snapshot directory if > {{snapshot.export.skip.tmp}} is true. In addition, owner and permission > should be set for the new snapshot info file when needed.
[jira] [Updated] (HBASE-17347) ExportSnapshot may write snapshot info file to wrong directory when specifying target name
[ https://issues.apache.org/jira/browse/HBASE-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-17347: Attachment: HBASE-17347-v1.patch
[jira] [Created] (HBASE-17347) ExportSnapshot may write snapshot info file to wrong directory when specifying target name
Jianwei Cui created HBASE-17347: --- Summary: ExportSnapshot may write snapshot info file to wrong directory when specifying target name Key: HBASE-17347 URL: https://issues.apache.org/jira/browse/HBASE-17347 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 2.0.0 Reporter: Jianwei Cui Priority: Minor ExportSnapshot will write a new snapshot info file when specifying the target name:
{code}
if (!targetName.equals(snapshotName)) {
  SnapshotDescription snapshotDesc =
      SnapshotDescriptionUtils.readSnapshotInfo(inputFs, snapshotDir)
        .toBuilder()
        .setName(targetName)
        .build();
  SnapshotDescriptionUtils.writeSnapshotInfo(snapshotDesc, snapshotTmpDir, outputFs);
}
{code}
The snapshot info file will be written to the snapshot tmp directory; however, it should be written directly to the snapshot directory if {{snapshot.export.skip.tmp}} is true. In addition, owner and permission should be set for the new snapshot info file when needed.
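The directory-selection part of the fix boils down to one decision, which can be illustrated with a tiny sketch. Plain strings stand in for Hadoop Path/FileSystem, and the helper name is made up for illustration, not ExportSnapshot's actual code:

```java
// When snapshot.export.skip.tmp is true there is no tmp phase, so the
// rewritten snapshot info must go straight to the final snapshot directory;
// otherwise it goes to the tmp directory and is moved into place later.
public class SnapshotInfoDirChooser {
  static String infoFileDir(String snapshotDir, String snapshotTmpDir, boolean skipTmp) {
    return skipTmp ? snapshotDir : snapshotTmpDir;
  }
}
```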
[jira] [Commented] (HBASE-17330) SnapshotFileCache will always refresh the file cache
[ https://issues.apache.org/jira/browse/HBASE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763115#comment-15763115 ] Jianwei Cui commented on HBASE-17330: - The failed test passes locally. Could you please take a look at patch v2, [~tedyu]? Thanks.
[jira] [Assigned] (HBASE-17330) SnapshotFileCache will always refresh the file cache
[ https://issues.apache.org/jira/browse/HBASE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui reassigned HBASE-17330: --- Assignee: Jianwei Cui
[jira] [Updated] (HBASE-17330) SnapshotFileCache will always refresh the file cache
[ https://issues.apache.org/jira/browse/HBASE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-17330: Attachment: HBASE-17330-v2.patch
[jira] [Commented] (HBASE-17330) SnapshotFileCache will always refresh the file cache
[ https://issues.apache.org/jira/browse/HBASE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15757119#comment-15757119 ] Jianwei Cui commented on HBASE-17330: - Thanks for the review, [~tedyu]. As mentioned above, it seems there is no need to consider the modification time of the tmp directory in {{SnapshotFileCache#refreshCache}}. Then {{hasChanges}} could simply be computed as {{fs.getFileStatus(snapshotDir).getModificationTime() > lastModifiedTime}}.
[jira] [Commented] (HBASE-17330) SnapshotFileCache will always refresh the file cache
[ https://issues.apache.org/jira/browse/HBASE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756766#comment-15756766 ] Jianwei Cui commented on HBASE-17330: - In {{SnapshotFileCache#refreshCache}}, the modification time of the snapshot tmp directory is also considered:
{code}
// get the status of the snapshots temporary directory and check if it has changes
// The top-level directory timestamp is not updated, so we have to check the inner-level.
try {
  Path snapshotTmpDir = new Path(snapshotDir, SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME);
  FileStatus tempDirStatus = fs.getFileStatus(snapshotTmpDir);
  lastTimestamp = Math.min(lastTimestamp, tempDirStatus.getModificationTime());
  hasChanges |= (lastTimestamp >= lastModifiedTime);
  ...
} catch (FileNotFoundException e) {
  // Nothing todo, if the tmp dir is empty
}
{code}
It seems the in-progress snapshots under the tmp directory are no longer loaded in {{SnapshotFileCache#refreshCache}} after [HBASE-12627|https://issues.apache.org/jira/browse/HBASE-12627], so is there any need to consider the modification time of the tmp directory in {{SnapshotFileCache#refreshCache}}?
[jira] [Updated] (HBASE-17330) SnapshotFileCache will always refresh the file cache
[ https://issues.apache.org/jira/browse/HBASE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-17330: Attachment: HBASE-17330-v1.patch
[jira] [Created] (HBASE-17330) SnapshotFileCache will always refresh the file cache
Jianwei Cui created HBASE-17330: --- Summary: SnapshotFileCache will always refresh the file cache Key: HBASE-17330 URL: https://issues.apache.org/jira/browse/HBASE-17330 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.98.23, 2.0.0, 1.3.1 Reporter: Jianwei Cui Priority: Minor In {{SnapshotFileCache#refreshCache}}, {{hasChanges}} is computed as:
{code}
try {
  FileStatus dirStatus = fs.getFileStatus(snapshotDir);
  lastTimestamp = dirStatus.getModificationTime();
  hasChanges |= (lastTimestamp >= lastModifiedTime); // >= will make hasChanges always be true
{code}
The {{(lastTimestamp >= lastModifiedTime)}} will make {{hasChanges}} always be true because {{lastModifiedTime}} is updated as:
{code}
this.lastModifiedTime = lastTimestamp;
{code}
So, SnapshotFileCache will always refresh the file cache.
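The effect of the {{>=}} comparison can be shown with a minimal simulation in plain Java (no HBase types; method names are illustrative):

```java
// After a refresh, lastModifiedTime is set to the directory's mod time, so on
// the next cycle an *unchanged* directory compares equal. With >= that still
// triggers a refresh every time; with strict > the unchanged case is skipped
// while a real change (a newer mod time) is still detected.
public class ModTimeCheck {
  static boolean needsRefreshGe(long dirModTime, long lastModifiedTime) {
    return dirModTime >= lastModifiedTime; // current (buggy) comparison
  }

  static boolean needsRefreshGt(long dirModTime, long lastModifiedTime) {
    return dirModTime > lastModifiedTime; // proposed comparison
  }
}
```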
[jira] [Commented] (HBASE-15616) CheckAndMutate will encounter NPE if qualifier to check is null
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695572#comment-15695572 ] Jianwei Cui commented on HBASE-15616: - Yes, an empty string works well. I think all operations should respond consistently when a null qualifier is passed, so we could either allow a null qualifier and convert null to an empty string internally for all operations, or throw an exception whenever users pass null. It seems a null qualifier is allowed for Put/Get/Scan/Append, and users may already rely on null qualifiers in these operations, so we probably also need to allow a null qualifier for checkAndMutate and increment. > CheckAndMutate will encounter NPE if qualifier to check is null > -- > > Key: HBASE-15616 > URL: https://issues.apache.org/jira/browse/HBASE-15616 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Jianwei Cui > Attachments: HBASE-15616-v1.patch, HBASE-15616-v2.patch > > > If qualifier to check is null, the checkAndMutate/checkAndPut/checkAndDelete > will encounter NPE.
> The test code: > {code} > table.checkAndPut(row, family, null, Bytes.toBytes(0), new > Put(row).addColumn(family, null, Bytes.toBytes(1))); > {code} > The exception: > {code} > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=3, exceptions: > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:32 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:120) > at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:772) > at ... > Caused by: java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:341) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:768) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:755) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:99) > ... 
2 more > Caused by: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:239) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java:35252) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:765) > ... 4 more > Caused by: java.lang.NullPointerException > at com.google.protobuf.LiteralByteString.size(LiteralByteString.java:76) > at > com.google.protobuf.CodedOutputStream.computeBytesSizeNoTag(CodedOutputStream.java:767) > at > com.google.protobuf.CodedOutputStream.computeBytesSize(CodedOutputStream.java:539) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Condition.getSerializedSize(ClientProtos.java:7483) > at > com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) > at > com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest.getSerializedSize(ClientProtos.java:12431) > at > org.apache.hadoop.hbase.ipc.IPCUtil.getTotalSizeWhenWrittenDelimited(IPCUtil.java:311) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.writeRequest(AsyncRpcChannel.java:409) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.callMethod(AsyncRpcChannel.java:333) > at > org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:245) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226) > ... 7 more > {code} > The reason is {{LiteralByteString.size()}} will throw NPE if wrapped byte > array is null. It is possible to invoke {{put}} and {{checkAndMutate}} on the > same column, because null qualifier is allowed for {{Put}}, users may be > confused if
[jira] [Commented] (HBASE-15616) CheckAndMutate will encounter NPE if qualifier to check is null
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15686690#comment-15686690 ] Jianwei Cui commented on HBASE-15616: - Sorry for the late reply, [~anoop.hbase], and thanks for your questions. bq. So we can do this way of not setting qualifier on PB when qualifier is null? Do we need pass empty string to be set? The mentioned code is defined in AccessControlUtil.java? The {{permissionBuilder}} is the builder of the proto message {{TablePermission}}, and the qualifier is optional in the pb definition of TablePermission. When the qualifier is not set, it has a clear meaning: the permission is granted on any column of the family. On the other hand, checkAndMutate must check a specific column, so the qualifier field is required in the pb definition of Condition:
{code}
message Condition {
  required bytes row = 1;
  required bytes family = 2;
  required bytes qualifier = 3;
  ...
{code}
A null qualifier is also a legal column, so passing an empty string seems clearer in this situation. bq. There are some other places in RequestConverter, we are doing this setQualifier(ByteStringer.wrap(qualifier)) See buildIncrementRequest() eg. HTable provides two ways to do increment:
{code}
public long incrementColumnValue(final byte [] row, final byte [] family,
    final byte [] qualifier, final long amount)
public Result increment(final Increment increment)
{code}
The first method checks the qualifier and throws NPE before issuing the request to the server if the qualifier is null, so we can't use it to increment a null qualifier column. In the second method, {{Increment}} provides two ways to add a column:
{code}
public Increment addColumn(byte [] family, byte [] qualifier, long amount)
public Increment add(Cell cell)
{code}
{{addColumn}} also checks that the qualifier is not null, but {{add(Cell cell)}} does no such check, so we can increment a null qualifier column as:
{code}
Increment incr = new Increment(Bytes.toBytes("row"));
KeyValue kv = new KeyValue(Bytes.toBytes("row"), Bytes.toBytes("C"), null,
    HConstants.LATEST_TIMESTAMP, KeyValue.Type.Put, Bytes.toBytes(1l));
incr.add(kv);
table.increment(incr);
{code}
Therefore, the increment methods of HTable behave differently when the qualifier is null, which is confusing. I think a null qualifier is legal in HBase, so it should be allowed consistently across the increment methods, and we could also pass an empty string for a null qualifier in buildIncrementRequest(). What do you think, [~anoop.hbase]? Thanks!
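The null-to-empty normalization discussed above could be done once at the client boundary; a minimal sketch in plain Java with a hypothetical helper name, not HBase's actual code:

```java
// Normalizing a null qualifier to the empty byte array gives
// Put/Get/Increment/checkAndMutate a consistent view of the "null qualifier"
// column, and the empty array can always be serialized into the protobuf
// Condition's required qualifier field (unlike null, which caused the NPE).
public class QualifierNormalizer {
  static final byte[] EMPTY = new byte[0];

  static byte[] normalize(byte[] qualifier) {
    return qualifier == null ? EMPTY : qualifier;
  }
}
```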
[jira] [Commented] (HBASE-17026) VerifyReplication log should distinguish whether good row key is result of revalidation
[ https://issues.apache.org/jira/browse/HBASE-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642957#comment-15642957 ] Jianwei Cui commented on HBASE-17026: - The patch looks good to me [~tedyu]. BTW, because the rowkey may be an unreadable binary array, do we need to use 'Bytes.toStringBinary(...)' to print the rowkey? > VerifyReplication log should distinguish whether good row key is result of > revalidation > --- > > Key: HBASE-17026 > URL: https://issues.apache.org/jira/browse/HBASE-17026 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Attachments: 17026.v1.txt > > > Inspecting app log from VerifyReplication, I saw lines in the following form: > {code} > 2016-11-03 15:28:44,877 INFO [main] > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: Good row > key: X000 X > {code} > where 'X' is the delimiter. > Without line number, it is difficult to tell whether the good row has gone > through revalidation. > This issue is to distinguish the two logs.
[jira] [Commented] (HBASE-16771) VerifyReplication should increase GOODROWS counter if re-comparison passes
[ https://issues.apache.org/jira/browse/HBASE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561132#comment-15561132 ] Jianwei Cui commented on HBASE-16771: - [~tedyu], patch v3 looks good to me, thanks for the fix. > VerifyReplication should increase GOODROWS counter if re-comparison passes > -- > > Key: HBASE-16771 > URL: https://issues.apache.org/jira/browse/HBASE-16771 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.4.0 > > Attachments: 16771.v1.txt, 16771.v2.txt, 16771.v3.txt > > > HBASE-16423 added re-comparison feature to reduce false positive rate. > However, before logFailRowAndIncreaseCounter() is called, GOODROWS counter is > not incremented. Neither is GOODROWS incremented when re-comparison passes. > This may produce inconsistent results across multiple runs of the same > verifyrep command.
[jira] [Commented] (HBASE-16771) VerifyReplication should increase GOODROWS counter if re-comparison passes
[ https://issues.apache.org/jira/browse/HBASE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561049#comment-15561049 ] Jianwei Cui commented on HBASE-16771: - Patch v2 looks good to me. Btw, in Verifier#map:
{code}
if (rowCmpRet == 0) {
  // rowkey is same, need to compare the content of the row
  try {
    Result.compareResults(value, currentCompareRowInPeerTable);
    context.getCounter(Counters.GOODROWS).increment(1);
    if (verbose) {
      LOG.info("Good row key: " + delimiter + Bytes.toString(value.getRow()) + delimiter);
    }
  } catch (Exception e) {
    logFailRowAndIncreaseCounter(context, Counters.CONTENT_DIFFERENT_ROWS, value);
    LOG.error("Exception while comparing row : " + e); // ==> unnecessary to log an exception
  }
{code}
An exception message is logged whenever the values differ for the same rowkey. The row may still turn out to be good on re-check, and if not, {{logFailRowAndIncreaseCounter}} already logs an error message for it, so is it unnecessary to log an exception here?
[jira] [Commented] (HBASE-16762) NullPointerException is thrown when constructing sourceTable in verifyrep
[ https://issues.apache.org/jira/browse/HBASE-16762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561033#comment-15561033 ] Jianwei Cui commented on HBASE-16762: - [~tedyu], patch looks good to me, thanks for the fix. > NullPointerException is thrown when constructing sourceTable in verifyrep > - > > Key: HBASE-16762 > URL: https://issues.apache.org/jira/browse/HBASE-16762 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: 16762.branch-1.txt > > > Branch-1 patch for HBASE-16423 incorrectly constructed sourceTable, leading > to the following exception: > {code} > 16/10/04 17:00:30 INFO mapreduce.Job: Task Id : > attempt_1473183665588_0082_m_16_1, Status : FAILED > Error: java.lang.NullPointerException > at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:436) > at org.apache.hadoop.hbase.client.HTable.(HTable.java:150) > at > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.map(VerifyReplication.java:128) > > at > org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier.map(VerifyReplication.java:86) > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > {code} > I checked master patch where there is no such bug
[jira] [Commented] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515650#comment-15515650 ] Jianwei Cui commented on HBASE-16423: - Thanks for the review [~tedyu], could you please take a look at the patch for branch-1? > Add re-compare option to VerifyReplication to avoid occasional inconsistent > rows > > > Key: HBASE-16423 > URL: https://issues.apache.org/jira/browse/HBASE-16423 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Jianwei Cui >Priority: Minor > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16423-branch-1-v1.patch, HBASE-16423-v1.patch, > HBASE-16423-v2.patch, HBASE-16423-v3.patch > > > Because replication keeps eventually consistency, VerifyReplication may > report inconsistent rows if there are data being written to source or peer > clusters during scanning. These occasionally inconsistent rows will have the > same data if we do the comparison again after a short period. It is not easy > to find the really inconsistent rows if VerifyReplication report a large > number of such occasionally inconsistency. To avoid this case, we can add an > option to make VerifyReplication read out the inconsistent rows again after > sleeping a few seconds and re-compare the rows during scanning. This behavior > follows the eventually consistency of hbase's replication. Suggestions and > discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-16423: Attachment: HBASE-16423-branch-1-v1.patch patch for branch-1. > Add re-compare option to VerifyReplication to avoid occasional inconsistent > rows > > > Key: HBASE-16423 > URL: https://issues.apache.org/jira/browse/HBASE-16423 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Jianwei Cui >Priority: Minor > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16423-branch-1-v1.patch, HBASE-16423-v1.patch, > HBASE-16423-v2.patch, HBASE-16423-v3.patch > > > Because replication keeps eventually consistency, VerifyReplication may > report inconsistent rows if there are data being written to source or peer > clusters during scanning. These occasionally inconsistent rows will have the > same data if we do the comparison again after a short period. It is not easy > to find the really inconsistent rows if VerifyReplication report a large > number of such occasionally inconsistency. To avoid this case, we can add an > option to make VerifyReplication read out the inconsistent rows again after > sleeping a few seconds and re-compare the rows during scanning. This behavior > follows the eventually consistency of hbase's replication. Suggestions and > discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-16423: Attachment: HBASE-16423-v3.patch trunk patch v3 to fix whitespace. > Add re-compare option to VerifyReplication to avoid occasional inconsistent > rows > > > Key: HBASE-16423 > URL: https://issues.apache.org/jira/browse/HBASE-16423 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Jianwei Cui >Priority: Minor > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16423-v1.patch, HBASE-16423-v2.patch, > HBASE-16423-v3.patch > > > Because replication keeps eventually consistency, VerifyReplication may > report inconsistent rows if there are data being written to source or peer > clusters during scanning. These occasionally inconsistent rows will have the > same data if we do the comparison again after a short period. It is not easy > to find the really inconsistent rows if VerifyReplication report a large > number of such occasionally inconsistency. To avoid this case, we can add an > option to make VerifyReplication read out the inconsistent rows again after > sleeping a few seconds and re-compare the rows during scanning. This behavior > follows the eventually consistency of hbase's replication. Suggestions and > discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513317#comment-15513317 ] Jianwei Cui commented on HBASE-16423: - Thanks for the review [~tedyu], upload patch v2. > Add re-compare option to VerifyReplication to avoid occasional inconsistent > rows > > > Key: HBASE-16423 > URL: https://issues.apache.org/jira/browse/HBASE-16423 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > Attachments: HBASE-16423-v1.patch, HBASE-16423-v2.patch > > > Because replication keeps eventually consistency, VerifyReplication may > report inconsistent rows if there are data being written to source or peer > clusters during scanning. These occasionally inconsistent rows will have the > same data if we do the comparison again after a short period. It is not easy > to find the really inconsistent rows if VerifyReplication report a large > number of such occasionally inconsistency. To avoid this case, we can add an > option to make VerifyReplication read out the inconsistent rows again after > sleeping a few seconds and re-compare the rows during scanning. This behavior > follows the eventually consistency of hbase's replication. Suggestions and > discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-16423: Attachment: HBASE-16423-v2.patch > Add re-compare option to VerifyReplication to avoid occasional inconsistent > rows > > > Key: HBASE-16423 > URL: https://issues.apache.org/jira/browse/HBASE-16423 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > Attachments: HBASE-16423-v1.patch, HBASE-16423-v2.patch > > > Because replication keeps eventually consistency, VerifyReplication may > report inconsistent rows if there are data being written to source or peer > clusters during scanning. These occasionally inconsistent rows will have the > same data if we do the comparison again after a short period. It is not easy > to find the really inconsistent rows if VerifyReplication report a large > number of such occasionally inconsistency. To avoid this case, we can add an > option to make VerifyReplication read out the inconsistent rows again after > sleeping a few seconds and re-compare the rows during scanning. This behavior > follows the eventually consistency of hbase's replication. Suggestions and > discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-16423: Attachment: HBASE-16423-v1.patch > Add re-compare option to VerifyReplication to avoid occasional inconsistent > rows > > > Key: HBASE-16423 > URL: https://issues.apache.org/jira/browse/HBASE-16423 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > Attachments: HBASE-16423-v1.patch > > > Because replication keeps eventually consistency, VerifyReplication may > report inconsistent rows if there are data being written to source or peer > clusters during scanning. These occasionally inconsistent rows will have the > same data if we do the comparison again after a short period. It is not easy > to find the really inconsistent rows if VerifyReplication report a large > number of such occasionally inconsistency. To avoid this case, we can add an > option to make VerifyReplication read out the inconsistent rows again after > sleeping a few seconds and re-compare the rows during scanning. This behavior > follows the eventually consistency of hbase's replication. Suggestions and > discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423733#comment-15423733 ] Jianwei Cui commented on HBASE-16423: - [~churromorales], there are cases that can change the versions within [startTime, endTime) while VerifyReplication is scanning. For example, if new versions are written to the source cluster, the total number of versions may exceed the family's max versions, so compaction can delete versions within [startTime, endTime); the compaction may happen at different times in the source and peer clusters, making VerifyReplication report inconsistent rows. > Add re-compare option to VerifyReplication to avoid occasional inconsistent > rows > > > Key: HBASE-16423 > URL: https://issues.apache.org/jira/browse/HBASE-16423 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > > Because replication keeps eventually consistency, VerifyReplication may > report inconsistent rows if there are data being written to source or peer > clusters during scanning. These occasionally inconsistent rows will have the > same data if we do the comparison again after a short period. It is not easy > to find the really inconsistent rows if VerifyReplication report a large > number of such occasionally inconsistency. To avoid this case, we can add an > option to make VerifyReplication read out the inconsistent rows again after > sleeping a few seconds and re-compare the rows during scanning. This behavior > follows the eventually consistency of hbase's replication. Suggestions and > discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
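The version-retention effect described in that comment can be sketched with a tiny in-memory model (a toy illustration, not HBase code): a cell keeping at most {{maxVersions}} versions drops its oldest versions as new writes arrive, so a version whose timestamp falls inside [startTime, endTime) can disappear, and it may disappear at different times on the source and peer clusters.

```java
import java.util.TreeMap;

// Toy model of per-cell version retention under a maxVersions limit; not HBase code.
public class VersionRetention {
    private final int maxVersions;
    // timestamp -> value; TreeMap keeps versions ordered, oldest first
    private final TreeMap<Long, String> versions = new TreeMap<>();

    VersionRetention(int maxVersions) { this.maxVersions = maxVersions; }

    void put(long ts, String value) {
        versions.put(ts, value);
        // "compaction": discard the oldest versions beyond maxVersions
        while (versions.size() > maxVersions) {
            versions.pollFirstEntry();
        }
    }

    boolean hasVersionAt(long ts) { return versions.containsKey(ts); }
}
```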
[jira] [Created] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
Jianwei Cui created HBASE-16423: --- Summary: Add re-compare option to VerifyReplication to avoid occasional inconsistent rows Key: HBASE-16423 URL: https://issues.apache.org/jira/browse/HBASE-16423 Project: HBase Issue Type: Improvement Components: Replication Affects Versions: 2.0.0 Reporter: Jianwei Cui Priority: Minor Because replication keeps eventually consistency, VerifyReplication may report inconsistent rows if there are data being written to source or peer clusters during scanning. These occasionally inconsistent rows will have the same data if we do the comparison again after a short period. It is not easy to find the really inconsistent rows if VerifyReplication report a large number of such occasionally inconsistency. To avoid this case, we can add an option to make VerifyReplication read out the inconsistent rows again after sleeping a few seconds and re-compare the rows during scanning. This behavior follows the eventually consistency of hbase's replication. Suggestions and discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15626) RetriesExhaustedWithDetailsException#getDesc won't return the full message
[ https://issues.apache.org/jira/browse/HBASE-15626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259345#comment-15259345 ] Jianwei Cui commented on HBASE-15626: - Same problem as [HBASE-15710|https://issues.apache.org/jira/browse/HBASE-15710]. > RetriesExhaustedWithDetailsException#getDesc won't return the full message > -- > > Key: HBASE-15626 > URL: https://issues.apache.org/jira/browse/HBASE-15626 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > Attachments: HBASE-15626-v1.patch > > > The RetriesExhaustedWithDetailsException#getDesc will include server > addresses as: > {code} > public static String getDesc(List<Throwable> exceptions, >List<? extends Row> actions, >List<String> hostnamePort) { > String s = getDesc(classifyExs(exceptions)); > StringBuilder addrs = new StringBuilder(s); > addrs.append("servers with issues: "); > Set<String> uniqAddr = new HashSet<String>(); > uniqAddr.addAll(hostnamePort); > for(String addr : uniqAddr) { > addrs.append(addr).append(", "); > } > return s; // ==> should be addrs.toString() > } > {code} > However, the returned value is {{s}}, which only includes the exceptions. To > include the server addresses, the returned value should be > {{addrs.toString()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15626) RetriesExhaustedWithDetailsException#getDesc won't return the full message
[ https://issues.apache.org/jira/browse/HBASE-15626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-15626: Attachment: HBASE-15626-v1.patch > RetriesExhaustedWithDetailsException#getDesc won't return the full message > -- > > Key: HBASE-15626 > URL: https://issues.apache.org/jira/browse/HBASE-15626 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > Attachments: HBASE-15626-v1.patch > > > The RetriesExhaustedWithDetailsException#getDesc will include server > addresses as: > {code} > public static String getDesc(List<Throwable> exceptions, >List<? extends Row> actions, >List<String> hostnamePort) { > String s = getDesc(classifyExs(exceptions)); > StringBuilder addrs = new StringBuilder(s); > addrs.append("servers with issues: "); > Set<String> uniqAddr = new HashSet<String>(); > uniqAddr.addAll(hostnamePort); > for(String addr : uniqAddr) { > addrs.append(addr).append(", "); > } > return s; // ==> should be addrs.toString() > } > {code} > However, the returned value is {{s}}, which only includes the exceptions. To > include the server addresses, the returned value should be > {{addrs.toString()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15626) RetriesExhaustedWithDetailsException#getDesc won't return the full message
Jianwei Cui created HBASE-15626: --- Summary: RetriesExhaustedWithDetailsException#getDesc won't return the full message Key: HBASE-15626 URL: https://issues.apache.org/jira/browse/HBASE-15626 Project: HBase Issue Type: Bug Components: Client Affects Versions: 2.0.0 Reporter: Jianwei Cui Priority: Minor The RetriesExhaustedWithDetailsException#getDesc will include server addresses as: {code} public static String getDesc(List<Throwable> exceptions, List<? extends Row> actions, List<String> hostnamePort) { String s = getDesc(classifyExs(exceptions)); StringBuilder addrs = new StringBuilder(s); addrs.append("servers with issues: "); Set<String> uniqAddr = new HashSet<String>(); uniqAddr.addAll(hostnamePort); for(String addr : uniqAddr) { addrs.append(addr).append(", "); } return s; // ==> should be addrs.toString() } {code} However, the returned value is {{s}}, which only includes the exceptions. To include the server addresses, the returned value should be {{addrs.toString()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
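A simplified standalone version of the fix (the real method builds the summary via {{classifyExs}}; here that summary is passed in as a plain string, so this is a sketch, not the full HBase method):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch of the corrected getDesc; the exception summary is a
// plain string here instead of the classified exception list.
public class GetDescFix {
    static String getDesc(String exceptionSummary, List<String> hostnamePort) {
        StringBuilder addrs = new StringBuilder(exceptionSummary);
        addrs.append("servers with issues: ");
        Set<String> uniqAddr = new HashSet<>(hostnamePort); // de-duplicate addresses
        for (String addr : uniqAddr) {
            addrs.append(addr).append(", ");
        }
        return addrs.toString(); // the fix: return the full buffer, not just the summary
    }
}
```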
[jira] [Updated] (HBASE-15616) CheckAndMutate will encouter NPE if qualifier to check is null
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-15616: Attachment: HBASE-15616-v2.patch Add unit test for null qualifier > CheckAndMutate will encouter NPE if qualifier to check is null > -- > > Key: HBASE-15616 > URL: https://issues.apache.org/jira/browse/HBASE-15616 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Jianwei Cui > Attachments: HBASE-15616-v1.patch, HBASE-15616-v2.patch > > > If qualifier to check is null, the checkAndMutate/checkAndPut/checkAndDelete > will encounter NPE. > The test code: > {code} > table.checkAndPut(row, family, null, Bytes.toBytes(0), new > Put(row).addColumn(family, null, Bytes.toBytes(1))); > {code} > The exception: > {code} > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=3, exceptions: > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:32 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:120) > at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:772) > at ... 
> Caused by: java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:341) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:768) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:755) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:99) > ... 2 more > Caused by: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:239) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java:35252) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:765) > ... 4 more > Caused by: java.lang.NullPointerException > at com.google.protobuf.LiteralByteString.size(LiteralByteString.java:76) > at > com.google.protobuf.CodedOutputStream.computeBytesSizeNoTag(CodedOutputStream.java:767) > at > com.google.protobuf.CodedOutputStream.computeBytesSize(CodedOutputStream.java:539) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Condition.getSerializedSize(ClientProtos.java:7483) > at > com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) > at > com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest.getSerializedSize(ClientProtos.java:12431) > at > org.apache.hadoop.hbase.ipc.IPCUtil.getTotalSizeWhenWrittenDelimited(IPCUtil.java:311) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.writeRequest(AsyncRpcChannel.java:409) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.callMethod(AsyncRpcChannel.java:333) > at > 
org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:245) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226) > ... 7 more > {code} > The reason is {{LiteralByteString.size()}} will throw NPE if wrapped byte > array is null. It is possible to invoke {{put}} and {{checkAndMutate}} on the > same column, because null qualifier is allowed for {{Put}}, users may be > confused if null qualifier is not allowed for {{checkAndMutate}}. We can also > convert null qualifier to empty byte array for {{checkAndMutate}} in client > side. Discussions and suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
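The client-side conversion suggested in that description could look like the following sketch (illustrative only, not the actual patch; HBase has an {{HConstants.EMPTY_BYTE_ARRAY}} constant, but this standalone version defines its own): normalize a null qualifier to an empty byte array before it is wrapped into a protobuf ByteString, which cannot wrap null.

```java
// Sketch of the suggested client-side normalization; not the actual patch.
public class QualifierNormalizer {
    static final byte[] EMPTY_BYTE_ARRAY = new byte[0];

    /** protobuf's ByteString cannot wrap null, so map a null qualifier to empty. */
    static byte[] normalize(byte[] qualifier) {
        return qualifier == null ? EMPTY_BYTE_ARRAY : qualifier;
    }
}
```

This keeps the client behavior consistent with {{Put}}, where a null qualifier is already accepted.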
[jira] [Commented] (HBASE-15497) Incorrect javadoc for atomicity guarantee of Increment and Append
[ https://issues.apache.org/jira/browse/HBASE-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233922#comment-15233922 ] Jianwei Cui commented on HBASE-15497: - Can anyone help to review the patch? Thanks:) > Incorrect javadoc for atomicity guarantee of Increment and Append > - > > Key: HBASE-15497 > URL: https://issues.apache.org/jira/browse/HBASE-15497 > Project: HBase > Issue Type: Bug > Components: documentation >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > Attachments: HBASE-15497-v1.patch > > > At the front of {{Increment.java}} file, there is comment about read > atomicity: > {code} > * This operation does not appear atomic to readers. Increments are done > * under a single row lock, so write operations to a row are synchronized, but > * readers do not take row locks so get and scan operations can see this > * operation partially completed. > {code} > It seems this comment is not true after MVCC integrated > [HBASE-4583|https://issues.apache.org/jira/browse/HBASE-4583]. Currently, the > readers can be guaranteed to read the whole result of Increment if I am not > wrong. Similar comments also exist in {{Append.java}}, {{Table#append(...)}} > and {{Table#increment(...)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15497) Incorrect javadoc for atomicity guarantee of Increment and Append
[ https://issues.apache.org/jira/browse/HBASE-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-15497: Attachment: HBASE-15497-v1.patch > Incorrect javadoc for atomicity guarantee of Increment and Append > - > > Key: HBASE-15497 > URL: https://issues.apache.org/jira/browse/HBASE-15497 > Project: HBase > Issue Type: Bug > Components: documentation >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > Attachments: HBASE-15497-v1.patch > > > At the front of {{Increment.java}} file, there is comment about read > atomicity: > {code} > * This operation does not appear atomic to readers. Increments are done > * under a single row lock, so write operations to a row are synchronized, but > * readers do not take row locks so get and scan operations can see this > * operation partially completed. > {code} > It seems this comment is not true after MVCC integrated > [HBASE-4583|https://issues.apache.org/jira/browse/HBASE-4583]. Currently, the > readers can be guaranteed to read the whole result of Increment if I am not > wrong. Similar comments also exist in {{Append.java}}, {{Table#append(...)}} > and {{Table#increment(...)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15616) CheckAndMutate will encouter NPE if qualifier to check is null
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233882#comment-15233882 ] Jianwei Cui commented on HBASE-15616: - Thanks for the review [~stack]. The patch could also be applied to branch-1. > CheckAndMutate will encouter NPE if qualifier to check is null > -- > > Key: HBASE-15616 > URL: https://issues.apache.org/jira/browse/HBASE-15616 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Jianwei Cui > Attachments: HBASE-15616-v1.patch > > > If qualifier to check is null, the checkAndMutate/checkAndPut/checkAndDelete > will encounter NPE. > The test code: > {code} > table.checkAndPut(row, family, null, Bytes.toBytes(0), new > Put(row).addColumn(family, null, Bytes.toBytes(1))); > {code} > The exception: > {code} > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=3, exceptions: > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:32 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:120) > at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:772) > at ... 
> Caused by: java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:341) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:768) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:755) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:99) > ... 2 more > Caused by: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:239) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java:35252) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:765) > ... 4 more > Caused by: java.lang.NullPointerException > at com.google.protobuf.LiteralByteString.size(LiteralByteString.java:76) > at > com.google.protobuf.CodedOutputStream.computeBytesSizeNoTag(CodedOutputStream.java:767) > at > com.google.protobuf.CodedOutputStream.computeBytesSize(CodedOutputStream.java:539) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Condition.getSerializedSize(ClientProtos.java:7483) > at > com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) > at > com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest.getSerializedSize(ClientProtos.java:12431) > at > org.apache.hadoop.hbase.ipc.IPCUtil.getTotalSizeWhenWrittenDelimited(IPCUtil.java:311) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.writeRequest(AsyncRpcChannel.java:409) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.callMethod(AsyncRpcChannel.java:333) > at > 
org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:245) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226) > ... 7 more > {code} > The reason is {{LiteralByteString.size()}} will throw NPE if wrapped byte > array is null. It is possible to invoke {{put}} and {{checkAndMutate}} on the > same column, because null qualifier is allowed for {{Put}}, users may be > confused if null qualifier is not allowed for {{checkAndMutate}}. We can also > convert null qualifier to empty byte array for {{checkAndMutate}} in client > side. Discussions and suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15616) CheckAndMutate will encouter NPE if qualifier to check is null
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-15616: Status: Patch Available (was: In Progress) > CheckAndMutate will encouter NPE if qualifier to check is null > -- > > Key: HBASE-15616 > URL: https://issues.apache.org/jira/browse/HBASE-15616 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Jianwei Cui > Attachments: HBASE-15616-v1.patch > > > If qualifier to check is null, the checkAndMutate/checkAndPut/checkAndDelete > will encounter NPE. > The test code: > {code} > table.checkAndPut(row, family, null, Bytes.toBytes(0), new > Put(row).addColumn(family, null, Bytes.toBytes(1))); > {code} > The exception: > {code} > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=3, exceptions: > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:32 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:120) > at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:772) > at ... 
> Caused by: java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:341) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:768) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:755) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:99) > ... 2 more > Caused by: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:239) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java:35252) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:765) > ... 4 more > Caused by: java.lang.NullPointerException > at com.google.protobuf.LiteralByteString.size(LiteralByteString.java:76) > at > com.google.protobuf.CodedOutputStream.computeBytesSizeNoTag(CodedOutputStream.java:767) > at > com.google.protobuf.CodedOutputStream.computeBytesSize(CodedOutputStream.java:539) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Condition.getSerializedSize(ClientProtos.java:7483) > at > com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) > at > com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest.getSerializedSize(ClientProtos.java:12431) > at > org.apache.hadoop.hbase.ipc.IPCUtil.getTotalSizeWhenWrittenDelimited(IPCUtil.java:311) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.writeRequest(AsyncRpcChannel.java:409) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.callMethod(AsyncRpcChannel.java:333) > at > 
org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:245) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226) > ... 7 more > {code} > The reason is {{LiteralByteString.size()}} will throw NPE if wrapped byte > array is null. It is possible to invoke {{put}} and {{checkAndMutate}} on the > same column, because null qualifier is allowed for {{Put}}, users may be > confused if null qualifier is not allowed for {{checkAndMutate}}. We can also > convert null qualifier to empty byte array for {{checkAndMutate}} in client > side. Discussions and suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HBASE-15616) CheckAndMutate will encounter NPE if qualifier to check is null
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-15616 started by Jianwei Cui. --- > CheckAndMutate will encouter NPE if qualifier to check is null > -- > > Key: HBASE-15616 > URL: https://issues.apache.org/jira/browse/HBASE-15616 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Assignee: Jianwei Cui > Attachments: HBASE-15616-v1.patch > > > If qualifier to check is null, the checkAndMutate/checkAndPut/checkAndDelete > will encounter NPE. > The test code: > {code} > table.checkAndPut(row, family, null, Bytes.toBytes(0), new > Put(row).addColumn(family, null, Bytes.toBytes(1))); > {code} > The exception: > {code} > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=3, exceptions: > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:32 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:120) > at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:772) > at ... 
> Caused by: java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:341) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:768) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:755) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:99) > ... 2 more > Caused by: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:239) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java:35252) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:765) > ... 4 more > Caused by: java.lang.NullPointerException > at com.google.protobuf.LiteralByteString.size(LiteralByteString.java:76) > at > com.google.protobuf.CodedOutputStream.computeBytesSizeNoTag(CodedOutputStream.java:767) > at > com.google.protobuf.CodedOutputStream.computeBytesSize(CodedOutputStream.java:539) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Condition.getSerializedSize(ClientProtos.java:7483) > at > com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) > at > com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest.getSerializedSize(ClientProtos.java:12431) > at > org.apache.hadoop.hbase.ipc.IPCUtil.getTotalSizeWhenWrittenDelimited(IPCUtil.java:311) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.writeRequest(AsyncRpcChannel.java:409) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.callMethod(AsyncRpcChannel.java:333) > at > 
org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:245) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226) > ... 7 more > {code} > The reason is that {{LiteralByteString.size()}} throws an NPE if the wrapped byte > array is null. Since a null qualifier is allowed for {{Put}}, it is possible to invoke {{put}} and {{checkAndMutate}} on the > same column, and users may be confused if a null qualifier is not allowed for {{checkAndMutate}}. We could also > convert a null qualifier to an empty byte array for {{checkAndMutate}} on the client > side. Discussions and suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-15616) CheckAndMutate will encounter NPE if qualifier to check is null
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jianwei Cui reassigned HBASE-15616:
---
Assignee: Jianwei Cui

> CheckAndMutate will encounter NPE if qualifier to check is null
> --
>
> Key: HBASE-15616
> URL: https://issues.apache.org/jira/browse/HBASE-15616
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Attachments: HBASE-15616-v1.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15616) CheckAndMutate will encounter NPE if qualifier to check is null
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jianwei Cui updated HBASE-15616:
Attachment: HBASE-15616-v1.patch

> CheckAndMutate will encounter NPE if qualifier to check is null
> --
>
> Key: HBASE-15616
> URL: https://issues.apache.org/jira/browse/HBASE-15616
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Attachments: HBASE-15616-v1.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15616) CheckAndMutate will encounter NPE if qualifier to check is null
Jianwei Cui created HBASE-15616:
---
Summary: CheckAndMutate will encounter NPE if qualifier to check is null
Key: HBASE-15616
URL: https://issues.apache.org/jira/browse/HBASE-15616
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 2.0.0
Reporter: Jianwei Cui

If the qualifier to check is null, checkAndMutate/checkAndPut/checkAndDelete will encounter an NPE. The test code:
{code}
table.checkAndPut(row, family, null, Bytes.toBytes(0), new Put(row).addColumn(family, null, Bytes.toBytes(1)));
{code}
The exception:
{code}
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=3, exceptions:
Fri Apr 08 15:51:31 CST 2016, RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, java.io.IOException: com.google.protobuf.ServiceException: java.lang.NullPointerException
Fri Apr 08 15:51:31 CST 2016, RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, java.io.IOException: com.google.protobuf.ServiceException: java.lang.NullPointerException
Fri Apr 08 15:51:32 CST 2016, RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, java.io.IOException: com.google.protobuf.ServiceException: java.lang.NullPointerException
  at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:120)
  at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:772)
  at ...
Caused by: java.io.IOException: com.google.protobuf.ServiceException: java.lang.NullPointerException
  at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:341)
  at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:768)
  at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:755)
  at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:99)
  ... 2 more
Caused by: com.google.protobuf.ServiceException: java.lang.NullPointerException
  at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:239)
  at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java:35252)
  at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:765)
  ... 4 more
Caused by: java.lang.NullPointerException
  at com.google.protobuf.LiteralByteString.size(LiteralByteString.java:76)
  at com.google.protobuf.CodedOutputStream.computeBytesSizeNoTag(CodedOutputStream.java:767)
  at com.google.protobuf.CodedOutputStream.computeBytesSize(CodedOutputStream.java:539)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Condition.getSerializedSize(ClientProtos.java:7483)
  at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
  at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest.getSerializedSize(ClientProtos.java:12431)
  at org.apache.hadoop.hbase.ipc.IPCUtil.getTotalSizeWhenWrittenDelimited(IPCUtil.java:311)
  at org.apache.hadoop.hbase.ipc.AsyncRpcChannel.writeRequest(AsyncRpcChannel.java:409)
  at org.apache.hadoop.hbase.ipc.AsyncRpcChannel.callMethod(AsyncRpcChannel.java:333)
  at org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:245)
  at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)
  ... 7 more
{code}
The reason is that {{LiteralByteString.size()}} throws an NPE if the wrapped byte array is null. Since a null qualifier is allowed for {{Put}}, it is possible to invoke {{put}} and {{checkAndMutate}} on the same column, and users may be confused if a null qualifier is not allowed for {{checkAndMutate}}. We could also convert a null qualifier to an empty byte array for {{checkAndMutate}} on the client side. Discussions and suggestions are welcome.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
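The client-side workaround suggested above can be sketched as follows; `normalizeQualifier` is a hypothetical helper for illustration, not an HBase API:

```java
// Hypothetical sketch of the proposed client-side fix: map a null qualifier
// to an empty byte array before it reaches the protobuf builder, mirroring
// what Put already tolerates. Not actual HBase client code.
public class QualifierNormalizer {
    static final byte[] EMPTY = new byte[0];

    // Protobuf's LiteralByteString.size() throws NPE when the wrapped byte
    // array is null, so the qualifier must never be passed downstream as null.
    static byte[] normalizeQualifier(byte[] qualifier) {
        return qualifier == null ? EMPTY : qualifier;
    }

    public static void main(String[] args) {
        if (normalizeQualifier(null).length != 0) throw new AssertionError();
        byte[] q = {1, 2};
        if (normalizeQualifier(q) != q) throw new AssertionError(); // non-null passes through
        System.out.println("ok");
    }
}
```

An empty qualifier and a null qualifier address the same cell in HBase's data model, which is why this normalization is behavior-preserving for {{Put}}-compatible callers.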
[jira] [Created] (HBASE-15588) Use nonce for checkAndMutate operation
Jianwei Cui created HBASE-15588:
---
Summary: Use nonce for checkAndMutate operation
Key: HBASE-15588
URL: https://issues.apache.org/jira/browse/HBASE-15588
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 2.0.0
Reporter: Jianwei Cui

Like {{increment}}/{{append}}, the {{checkAndPut}}/{{checkAndDelete}} operations are non-idempotent, so a client may get an incorrect result when there are retries, and such an incorrect result may lead the application into an error state. A possible solution is to use a nonce for checkAndMutate operations; discussions and suggestions are welcome.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
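One way to picture the nonce idea is a toy server that records the outcome of each client-chosen nonce and replays it on retry; `NonceServer` and `checkAndIncrement` are hypothetical names that only mimic the shape of the mechanism, not HBase's implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of nonce-based retry protection for a non-idempotent
// operation (all names here are hypothetical, not HBase API).
public class NonceServer {
    private long value = 0;
    // Remembers the outcome recorded for each nonce so that a retried RPC
    // is answered from this cache instead of being executed a second time.
    private final Map<Long, Boolean> seenNonces = new HashMap<>();

    // "Check and mutate": increment only if the current value matches `expected`.
    boolean checkAndIncrement(long expected, long nonce) {
        Boolean cached = seenNonces.get(nonce);
        if (cached != null) {
            return cached; // retry: replay the recorded result, do not mutate again
        }
        boolean success = (value == expected);
        if (success) {
            value++;
        }
        seenNonces.put(nonce, success);
        return success;
    }

    long getValue() { return value; }

    public static void main(String[] args) {
        NonceServer server = new NonceServer();
        long nonce = 42L; // client-generated, reused on retry
        if (!server.checkAndIncrement(0, nonce)) throw new AssertionError();
        // The client times out and retries with the same nonce. Without the
        // nonce cache, this second call would see value == 1, fail the check,
        // and wrongly report that the operation never happened.
        if (!server.checkAndIncrement(0, nonce)) throw new AssertionError();
        if (server.getValue() != 1) throw new AssertionError(); // applied exactly once
        System.out.println("ok");
    }
}
```

A real implementation would also need to scope nonces by client (nonce group) and expire old entries; this sketch only shows why replaying the recorded result fixes the retry ambiguity.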
[jira] [Updated] (HBASE-15327) Canary will always invoke admin.balancer() in each sniffing period when writeSniffing is enabled
[ https://issues.apache.org/jira/browse/HBASE-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jianwei Cui updated HBASE-15327:
Attachment: HBASE-15327-branch-1-v1.patch
            HBASE-15327-v1.patch

Made patches for trunk and branch-1. Thanks for the review [~stack] and [~yuzhih...@gmail.com] :)

> Canary will always invoke admin.balancer() in each sniffing period when writeSniffing is enabled
> --
>
> Key: HBASE-15327
> URL: https://issues.apache.org/jira/browse/HBASE-15327
> Project: HBase
> Issue Type: Bug
> Components: canary
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Priority: Minor
> Attachments: HBASE-15327-branch-1-v1.patch, HBASE-15327-trunk.patch, HBASE-15327-trunk.patch, HBASE-15327-v1.patch
>
> When Canary#writeSniffing is enabled, Canary#checkWriteTableDistribution will make sure the regions of the write table are distributed on all region servers, as:
> {code}
> int numberOfServers = admin.getClusterStatus().getServers().size();
> ..
> int numberOfCoveredServers = serverSet.size();
> if (numberOfCoveredServers < numberOfServers) {
>   admin.balancer();
> }
> {code}
> The master also works as a regionserver, so ClusterStatus#getServers will contain the master. On the other hand, the Canary write table will not be assigned to the master, making numberOfCoveredServers always smaller than numberOfServers, so admin.balancer is invoked in every sniffing period. This may cause frequent region moves. A simple fix is to exclude the master from numberOfServers.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
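The proposed fix can be sketched with a toy model of the coverage check; the set-of-server-names representation and the `needsBalance` helper are hypothetical simplifications, not the Admin API:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the Canary coverage check: the cluster reports all
// servers (including the master, which also registers as a regionserver),
// but the write table's regions are never placed on the master, so the
// master must be excluded before comparing coverage.
public class CanaryCoverageCheck {
    static boolean needsBalance(Set<String> allServers, String master,
                                Set<String> serversCoveredByWriteTable) {
        Set<String> regionServers = new HashSet<>(allServers);
        regionServers.remove(master); // the fix: do not count the master
        return serversCoveredByWriteTable.size() < regionServers.size();
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<>();
        all.add("master"); all.add("rs1"); all.add("rs2");
        Set<String> covered = new HashSet<>();
        covered.add("rs1"); covered.add("rs2");
        // Without excluding the master, covered (2) < all (3) holds forever,
        // so admin.balancer() would fire on every sniffing period.
        if (needsBalance(all, "master", covered)) throw new AssertionError();
        System.out.println("ok");
    }
}
```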
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205607#comment-15205607 ]
Jianwei Cui commented on HBASE-15433:
-
Thanks for the review, Ted and Ashish.

> SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
> -
>
> Key: HBASE-15433
> URL: https://issues.apache.org/jira/browse/HBASE-15433
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2
> Attachments: HBASE-15433-branch-1-v1.patch, HBASE-15433-trunk-v1.patch, HBASE-15433-trunk-v2.patch, HBASE-15433-trunk.patch, HBASE-15433-v3.patch, HBASE-15433-v4.patch
>
> In SnapshotManager#restoreSnapshot, the table and region quota will be checked and updated as:
> {code}
> try {
>   // Table already exist. Check and update the region quota for this table namespace
>   checkAndUpdateNamespaceRegionQuota(manifest, tableName);
>   restoreSnapshot(snapshot, snapshotTableDesc);
> } catch (IOException e) {
>   this.master.getMasterQuotaManager().removeTableFromNamespaceQuota(tableName);
>   LOG.error("Exception occurred while restoring the snapshot " + snapshot.getName()
>       + " as table " + tableName.getNameAsString(), e);
>   throw e;
> }
> {code}
> The 'checkAndUpdateNamespaceRegionQuota' call will fail if the regions in the snapshot make the region count quota exceeded; the table is then removed from the namespace quota in the 'catch' block. This makes the current table count and region count decrease, so a following table creation or region split can succeed even though the actual quota is exceeded.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
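The accounting problem can be reproduced in miniature; `QuotaRollback` and its methods are hypothetical stand-ins for the SnapshotManager/MasterQuotaManager interplay, not HBase code:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model of the quota bookkeeping bug: on a failed restore, the table's
// quota entry was removed entirely instead of being rolled back to its
// previous region count, so existing regions stopped being counted.
public class QuotaRollback {
    private final Map<String, Integer> regionsPerTable = new HashMap<>();
    private final int maxRegions;

    QuotaRollback(int maxRegions) { this.maxRegions = maxRegions; }

    int totalRegions() {
        return regionsPerTable.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Throws before mutating anything if the new count would exceed the quota.
    void checkAndUpdate(String table, int newRegionCount) throws IOException {
        int old = regionsPerTable.getOrDefault(table, 0);
        if (totalRegions() - old + newRegionCount > maxRegions) {
            throw new IOException("region quota exceeded");
        }
        regionsPerTable.put(table, newRegionCount);
    }

    // Buggy handling: drop the table's quota entry entirely on failure.
    void restoreSnapshotBuggy(String table, int snapshotRegions) throws IOException {
        try {
            checkAndUpdate(table, snapshotRegions);
        } catch (IOException e) {
            regionsPerTable.remove(table); // existing regions no longer counted!
            throw e;
        }
    }

    // Safer handling: restore the table's previous count on failure.
    void restoreSnapshotFixed(String table, int snapshotRegions) throws IOException {
        int old = regionsPerTable.getOrDefault(table, 0);
        try {
            checkAndUpdate(table, snapshotRegions);
        } catch (IOException e) {
            regionsPerTable.put(table, old);
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        QuotaRollback q = new QuotaRollback(10);
        q.checkAndUpdate("t1", 8);
        try { q.restoreSnapshotBuggy("t1", 20); } catch (IOException expected) { }
        // t1's 8 existing regions vanished from the accounting:
        if (q.totalRegions() != 0) throw new AssertionError();

        QuotaRollback q2 = new QuotaRollback(10);
        q2.checkAndUpdate("t1", 8);
        try { q2.restoreSnapshotFixed("t1", 20); } catch (IOException expected) { }
        if (q2.totalRegions() != 8) throw new AssertionError();
        System.out.println("ok");
    }
}
```

With the buggy handling, subsequent "create table" or "split region" checks compare against an undercounted total, which is exactly the symptom described above.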
[jira] [Created] (HBASE-15497) Incorrect javadoc for atomicity guarantee of Increment and Append
Jianwei Cui created HBASE-15497:
---
Summary: Incorrect javadoc for atomicity guarantee of Increment and Append
Key: HBASE-15497
URL: https://issues.apache.org/jira/browse/HBASE-15497
Project: HBase
Issue Type: Bug
Components: documentation
Affects Versions: 2.0.0
Reporter: Jianwei Cui
Priority: Minor

At the front of the {{Increment.java}} file, there is a comment about read atomicity:
{code}
 * This operation does not appear atomic to readers. Increments are done
 * under a single row lock, so write operations to a row are synchronized, but
 * readers do not take row locks so get and scan operations can see this
 * operation partially completed.
{code}
It seems this comment is no longer true after MVCC was integrated in [HBASE-4583|https://issues.apache.org/jira/browse/HBASE-4583]. Currently, readers are guaranteed to see the whole result of an Increment, if I am not wrong. Similar comments also exist in {{Append.java}}, {{Table#append(...)}} and {{Table#increment(...)}}.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
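The atomicity claim can be illustrated by analogy; this is a deliberately simplified toy model of MVCC-style publication (a complete new row version published in one step), not HBase's actual MVCC implementation:

```java
import java.util.Arrays;

// Toy analogy for why readers see an Increment atomically under MVCC-style
// visibility: the writer prepares a complete new version of the row and
// publishes it in a single step, so a concurrent reader observes either the
// whole old row or the whole new row, never a mix of the two.
public class AtomicIncrementSketch {
    // The committed row version; replaced wholesale on publish.
    private volatile long[] committedRow = {0, 0};

    synchronized void incrementAllColumns() {
        long[] next = committedRow.clone();
        for (int i = 0; i < next.length; i++) next[i]++; // private, invisible edits
        committedRow = next; // the single publish point
    }

    long[] readRow() { return committedRow; } // always a consistent version

    public static void main(String[] args) {
        AtomicIncrementSketch row = new AtomicIncrementSketch();
        row.incrementAllColumns();
        row.incrementAllColumns();
        if (!Arrays.equals(row.readRow(), new long[]{2, 2})) throw new AssertionError();
        System.out.println("ok");
    }
}
```

HBase achieves the same effect with sequence ids and a read point rather than whole-row swaps, but the visibility guarantee the JIRA describes is the same: no partially completed Increment is ever readable.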
[jira] [Commented] (HBASE-15469) Take snapshot by family
[ https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203878#comment-15203878 ]
Jianwei Cui commented on HBASE-15469:
-
For our case, the goal is to copy existing data for given families and clone the snapshot, so creating a new table with only the subset of families is the better choice. For the restore case, the goal is to roll back the table to some historical state; a snapshot with only a subset of families may not represent any historical state of the table, so it should not be used for the restore purpose.
{quote}
we may block the restore of snapshots with only a subset of families. and that will solve the strange situation of restore. and when we clone we just create a new table with only the subset. In theory this is more clear for the end user.
{quote}
Agreed with your analysis [~mbertozzi], and also expecting other opinions and cases. Thanks!

> Take snapshot by family
> ---
>
> Key: HBASE-15469
> URL: https://issues.apache.org/jira/browse/HBASE-15469
> Project: HBase
> Issue Type: Improvement
> Components: snapshots
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Attachments: HBASE-15469-v1.patch, HBASE-15469-v2.patch
>
> In our production environment, there are some 'wide' tables in the offline cluster. A 'wide' table has a number of families, and different applications access different families of the table through MapReduce. When an application starts to provide online service, we need to copy the needed families from the offline cluster to the online cluster. For future writes, inter-cluster replication supports setting families for a table, so we can use it to copy future edits for the needed families. For existing data, we can take a snapshot of the table on the offline cluster, then use {{ExportSnapshot}} to copy the snapshot to the online cluster and clone it. However, we can only take a snapshot of the whole table, in which many families are not needed by the application; this leads to unnecessary data copy. I think it is useful to support taking a snapshot by family, so that we only copy the needed data.
> Possible solution to support such function:
> 1. Add a family names field to the protobuf definition of {{SnapshotDescription}}
> 2. Allow setting families when taking a snapshot in the hbase shell, such as:
> {code}
> snapshot 'tableName', 'snapshotName', 'FamilyA', 'FamilyB', {SKIP_FLUSH => true}
> {code}
> 3. Add family names to {{SnapshotDescription}} on the client side
> 4. Read family names from {{SnapshotDescription}} in the Master/Regionserver, and keep only the requested families when taking the snapshot for a region.
> Discussions and suggestions are welcomed.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
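Step 4 of the proposal could look roughly like this; `filterStoreFiles` and the string-based "family/hfile" paths are hypothetical simplifications of the real manifest-building code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of snapshot-by-family filtering: when building a
// region's snapshot manifest, keep only store files whose family is in the
// requested set. "family/hfile" strings stand in for real file references.
public class FamilyFilter {
    static List<String> filterStoreFiles(List<String> storeFiles,
                                         Set<String> requestedFamilies) {
        List<String> kept = new ArrayList<>();
        for (String path : storeFiles) {
            String family = path.substring(0, path.indexOf('/'));
            // A null set models the legacy behavior: snapshot every family.
            if (requestedFamilies == null || requestedFamilies.contains(family)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("A/hf1", "A/hf2", "B/hf3", "C/hf4");
        Set<String> wanted = new HashSet<>(Arrays.asList("A", "C"));
        List<String> kept = filterStoreFiles(files, wanted);
        if (!kept.equals(Arrays.asList("A/hf1", "A/hf2", "C/hf4"))) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Exporting the filtered manifest then copies only the hfiles of the requested families, which is the data-volume saving the issue is after.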
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203848#comment-15203848 ]
Jianwei Cui commented on HBASE-15433:
-
HBASE-15433-branch-1-v1.patch could also be applied to branch-1.1/1.2/1.3.

> SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
> -
>
> Key: HBASE-15433
> URL: https://issues.apache.org/jira/browse/HBASE-15433
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2
> Attachments: HBASE-15433-branch-1-v1.patch, HBASE-15433-trunk-v1.patch, HBASE-15433-trunk-v2.patch, HBASE-15433-trunk.patch, HBASE-15433-v3.patch, HBASE-15433-v4.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203833#comment-15203833 ]
Jianwei Cui commented on HBASE-15433:
-
Sorry for the late reply, and thanks for your review [~yuzhih...@gmail.com]. I have attached a patch for branch-1 and will wait for the test results. Thanks.

> SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
> -
>
> Key: HBASE-15433
> URL: https://issues.apache.org/jira/browse/HBASE-15433
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2
> Attachments: HBASE-15433-branch-1-v1.patch, HBASE-15433-trunk-v1.patch, HBASE-15433-trunk-v2.patch, HBASE-15433-trunk.patch, HBASE-15433-v3.patch, HBASE-15433-v4.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jianwei Cui updated HBASE-15433:
Attachment: HBASE-15433-branch-1-v1.patch

> SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
> -
>
> Key: HBASE-15433
> URL: https://issues.apache.org/jira/browse/HBASE-15433
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2
> Attachments: HBASE-15433-branch-1-v1.patch, HBASE-15433-trunk-v1.patch, HBASE-15433-trunk-v2.patch, HBASE-15433-trunk.patch, HBASE-15433-v3.patch, HBASE-15433-v4.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15469) Take snapshot by family
[ https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jianwei Cui updated HBASE-15469:
Attachment: HBASE-15469-v1.patch

> Take snapshot by family
> ---
>
> Key: HBASE-15469
> URL: https://issues.apache.org/jira/browse/HBASE-15469
> Project: HBase
> Issue Type: Improvement
> Components: snapshots
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Attachments: HBASE-15469-v1.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15469) Take snapshot by family
[ https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200847#comment-15200847 ]

Jianwei Cui commented on HBASE-15469:

Good question! Yes, the current patch will create all families when cloning or restoring. This could be made optional for the user. For most cases, is it more reasonable to retain only the requested families when taking the snapshot? Users can add any other needed families after cloning or restoring. What do you think? [~mbertozzi]. Thanks.
[jira] [Commented] (HBASE-15469) Take snapshot by family
[ https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199512#comment-15199512 ]

Jianwei Cui commented on HBASE-15469:

Uploaded the patch. In the hbase shell, we can specify families when taking a snapshot:
{code}
hbase(main):004:0> snapshot 'test_table', 'test-snapshot', 'f1'
0 row(s) in 0.3830 seconds
{code}
And {{list_snapshots}} will show the table and families of the snapshot:
{code}
hbase(main):001:0> list_snapshots
SNAPSHOT        TABLE/CFs + CREATION TIME
test-snapshot   test_table/f1 (Thu Mar 17 20:54:22 +0800 2016)
1 row(s) in 0.2890 seconds
{code}
This snapshot can be used by the other snapshot operations, such as {{clone_snapshot}}, {{restore_snapshot}}, etc.
[jira] [Commented] (HBASE-15469) Take snapshot by family
[ https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201278#comment-15201278 ]

Jianwei Cui commented on HBASE-15469:

Upload v2 to remove unrelated changes in hbase-site.xml and create RB.
[jira] [Updated] (HBASE-15469) Take snapshot by family
[ https://issues.apache.org/jira/browse/HBASE-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jianwei Cui updated HBASE-15469:
Attachment: HBASE-15469-v2.patch
[jira] [Created] (HBASE-15469) Take snapshot by family
Jianwei Cui created HBASE-15469:

Summary: Take snapshot by family
Key: HBASE-15469
URL: https://issues.apache.org/jira/browse/HBASE-15469
Project: HBase
Issue Type: Improvement
Components: snapshots
Affects Versions: 2.0.0
Reporter: Jianwei Cui
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201155#comment-15201155 ]

Jianwei Cui commented on HBASE-15433:

Ran the failed tests locally and they passed; the test failures seem unrelated to this patch. Could you please take a look at patch v4? [~yuzhih...@gmail.com] [~mbertozzi]. Thanks.

> SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
> -------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-15433
> URL: https://issues.apache.org/jira/browse/HBASE-15433
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.4.0, 1.1.5
> Attachments: HBASE-15433-trunk-v1.patch, HBASE-15433-trunk-v2.patch, HBASE-15433-trunk.patch, HBASE-15433-v3.patch, HBASE-15433-v4.patch
>
> In SnapshotManager#restoreSnapshot, the table and region quota are checked and updated as:
> {code}
> try {
>   // Table already exist. Check and update the region quota for this table namespace
>   checkAndUpdateNamespaceRegionQuota(manifest, tableName);
>   restoreSnapshot(snapshot, snapshotTableDesc);
> } catch (IOException e) {
>   this.master.getMasterQuotaManager().removeTableFromNamespaceQuota(tableName);
>   LOG.error("Exception occurred while restoring the snapshot " + snapshot.getName()
>       + " as table " + tableName.getNameAsString(), e);
>   throw e;
> }
> {code}
> The 'checkAndUpdateNamespaceRegionQuota' will fail if the regions in the snapshot make the region count quota exceeded; the table is then removed in the 'catch' block. This decreases the current table count and region count, so a following table creation or region split will succeed even though the actual quota is exceeded.
[jira] [Updated] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jianwei Cui updated HBASE-15433:
Attachment: HBASE-15433-v4.patch

fix checkstyle and whitespace
[jira] [Assigned] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jianwei Cui reassigned HBASE-15433:
Assignee: Jianwei Cui
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195033#comment-15195033 ]

Jianwei Cui commented on HBASE-15433:

{quote}
Instead of getting table region count from quota cache we can get it from RegionLocator which will solve the corner case you described.
{quote}
This may make other corner cases fail, if I am not wrong. For example, suppose the table has 5 regions, clientA is trying to restore the table to a snapshot with 8 regions, and clientB is trying to restore a snapshot with 10 regions. Then:
1. clientA invokes {{checkAndUpdateNamespaceRegionQuota}} before {{restoreSnapshot}}; the {{tableRegionCount}} is 5 for clientA, and it updates the region count of the table to 8.
2. Before clientA invokes {{restoreSnapshot}}, clientB invokes {{checkAndUpdateNamespaceRegionQuota}}; the {{tableRegionCount}} is also 5 (when using RegionLocator) for clientB, and it updates the region count of the table to 10.
3. clientA successfully restores its snapshot, so the actual region count is 8.
4. clientB encounters an IOException in {{restoreSnapshot}} and resets the region count to 5 in the IOException catch clause. However, the region count should be 8 because clientA succeeded.
I think it is not easy to resolve the concurrency issues in {{SnapshotManager}} without a lock; maybe we should wait for RestoreSnapshotHandler to be rewritten with procedure v2 and move the quota updating into RestoreSnapshotHandler?
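The interleaving in steps 1-4 above can be replayed deterministically. The sketch below is illustrative only: `QuotaBook` is a hypothetical stand-in for the namespace quota cache, not HBase API, and the "clients" run as sequential steps rather than real threads so the lost update is reproducible.

```java
// Deterministic replay of the interleaving described above, using a
// hypothetical QuotaBook in place of the namespace quota cache.
// All names are illustrative assumptions, not HBase API.
public class QuotaRaceDemo {
  static final class QuotaBook {
    private int regionCount;
    QuotaBook(int initial) { this.regionCount = initial; }
    int get() { return regionCount; }
    void set(int n) { regionCount = n; }
  }

  /** Returns the final quota count after the interleaving; the correct value would be 8. */
  public static int replay() {
    QuotaBook quota = new QuotaBook(5);  // quota cache: table has 5 regions
    int liveRegionCount = 5;             // what RegionLocator reports for the disabled table

    int seenByA = liveRegionCount;       // 1. clientA reads 5 via RegionLocator ...
    quota.set(8);                        //    ... and updates the quota to 8
    int seenByB = liveRegionCount;       // 2. clientB also reads 5 via RegionLocator ...
    quota.set(10);                       //    ... and updates the quota to 10

    // 3. clientA's restoreSnapshot succeeds: the table really has 8 regions now.
    // 4. clientB's restoreSnapshot fails and resets the quota to what it observed:
    quota.set(seenByB);                  // quota drops back to 5, though 8 is correct
    return quota.get();
  }
}
```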
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194966#comment-15194966 ]

Jianwei Cui commented on HBASE-15433:

Uploaded patch v3 according to the comments. There is an existing unit test named TestNamespaceAuditor#testRestoreSnapshotQuotaExceed; the new patch checks the exception cause type and region count in this unit test. {{TestNamespaceAuditor}} and {{TestRestoreFlushSnapshotFromClient}} passed locally. [~ashish singhi], could you please have a look at v3? Thanks.
[jira] [Updated] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jianwei Cui updated HBASE-15433:
Attachment: HBASE-15433-v3.patch
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194845#comment-15194845 ]

Jianwei Cui commented on HBASE-15433:

Reasonable for this case, IMO. It seems there are other issues when concurrently restoring snapshots for the same table: we need to keep the {{checkAndUpdateNamespaceRegionQuota}} calls in the same order before and after {{restoreSnapshot}} among concurrent restore requests. For example, if clientA invoked {{checkAndUpdateNamespaceRegionQuota}} ahead of clientB before {{restoreSnapshot}}, then after {{restoreSnapshot}} we need to make sure clientA also invokes {{checkAndUpdateNamespaceRegionQuota}} ahead of clientB? In the document of [HBASE-12439|https://issues.apache.org/jira/browse/HBASE-12439], it seems CloneSnapshotHandler/RestoreSnapshotHandler will be rewritten with procedure v2? After that, we can keep the quota updating in sync with the CloneSnapshot/RestoreSnapshot steps and rollbacks. Currently, without steps and rollbacks, RestoreSnapshotHandler may not update the quota information correctly. Therefore, I think we can keep the quota updating in {{SnapshotManager}} until the procedure v2 rewrite? For the concurrent-request issues, we can add some comments in the code to explain the problem?
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194691#comment-15194691 ]

Jianwei Cui commented on HBASE-15433:

Thanks for your comment.
{quote}
Not required I think, because we are having enough quota for this table in the cache before restoring the snapshot and after restoring snapshot we are only decrementing it, so it will work.
{quote}
There may be a corner case, if I am not wrong. For example, if the table has 5 regions, clientA is trying to restore the table to a snapshot with 8 regions, and clientB is trying to restore a snapshot with 10 regions. Then:
1. clientA invokes {{checkAndUpdateNamespaceRegionQuota}} before {{restoreSnapshot}}; the {{tableRegionCount}} is 5 for clientA, and it updates the region count of the table to 8.
2. Before clientA invokes {{restoreSnapshot}}, clientB invokes {{checkAndUpdateNamespaceRegionQuota}}; the {{tableRegionCount}} is 8 for clientB, and it updates the region count of the table to 10.
3. Both clientA and clientB encounter an IOException in {{restoreSnapshot}}, and the two clients try to reset the region count in the IOException catch clause.
4. clientA resets the region count to 5 first, then clientB resets it to 8, so the final region count for the table is 8 in this case, but it should be 5 because both operations failed.
It seems not easy to update the quota information correctly without a lock when there are concurrent restoreSnapshot requests, IMO. Maybe it is easier to do this work in {{RestoreSnapshotHandler}} with the table lock held (like {{CreateTableProcedure}})?
1. In {{RestoreSnapshotHandler}}, override the {{prepareWithTableLock}} method to call {{checkAndUpdateNamespaceRegionQuota}} if {{snapshotRegionCount}} is larger than {{tableRegionCount}}. If {{checkAndUpdateNamespaceRegionQuota}} fails here, we do not need to reset the region count, and {{SnapshotManager}} will throw the exception directly.
2. In {{RestoreSnapshotHandler#completed}}, if an exception was received and {{tableRegionCount < snapshotRegionCount}}, reset the region count to {{tableRegionCount}}; if no exception was received and {{tableRegionCount > snapshotRegionCount}}, set the region count to {{snapshotRegionCount}}.
What is your opinion on this issue? [~ashish singhi]
{quote}
We can also include quota has exceeded in the error message
{quote}
Yes, will polish the message and update the patch.
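The lock-based proposal above can be sketched as follows. This is a minimal sketch under stated assumptions: `LockedQuotaUpdate` and its fields are hypothetical stand-ins for the table lock and the namespace quota entry, not HBase API; it only shows why holding one lock across check, restore, and rollback removes the interleaving.

```java
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch (hypothetical names, not HBase API) of holding a per-table
// lock across the quota check, the restore, and the rollback, so concurrent
// restores of the same table cannot interleave their check-and-update steps.
public class LockedQuotaUpdate {
  private final ReentrantLock tableLock = new ReentrantLock();
  private int regionQuotaUsed;

  public LockedQuotaUpdate(int initial) { this.regionQuotaUsed = initial; }

  public int get() { return regionQuotaUsed; }

  /** Restore the table to snapshotRegionCount regions; doRestore may throw. */
  public void restore(int snapshotRegionCount, Runnable doRestore) {
    tableLock.lock();
    try {
      int before = regionQuotaUsed;             // count observed under the lock
      if (snapshotRegionCount > before) {
        regionQuotaUsed = snapshotRegionCount;  // reserve the extra regions up front
      }
      try {
        doRestore.run();
        regionQuotaUsed = snapshotRegionCount;  // success: final region count
      } catch (RuntimeException e) {
        regionQuotaUsed = before;               // failure: roll back to the observed count
        throw e;
      }
    } finally {
      tableLock.unlock();
    }
  }
}
```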
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193148#comment-15193148 ] Jianwei Cui commented on HBASE-15433: - The table must be disabled during restoreSnapshot, so {{tableRegionCount}} won't change. Assuming there are no concurrent restoreSnapshot requests for the same table, the {{checkAndUpdateNamespaceRegionQuota}} after {{restoreSnapshot}} will only be executed when {{tableRegionCount > snapshotRegionCount}}; this means we have already reserved enough region count from the namespace quota for it, so operations from other threads won't make that {{checkAndUpdateNamespaceRegionQuota}} fail as long as they operate on different tables. However, concurrent restoreSnapshot requests for the same table would cause problems. We may need a lock to make sure the quota information is updated correctly, or we could move the quota check-and-update logic into {{RestoreSnapshotHandler}} after the table lock is held.
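The lock idea mentioned above can be sketched as follows. This is only an illustrative shape, not HBase's quota API ({{NamespaceRegionQuota}}, {{checkAndUpdate}} and the counters are hypothetical names); it shows how a per-table lock would serialize concurrent restores of the same table while a shared monitor protects the namespace counter:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: hypothetical names, not HBase's quota API. Two concurrent
// restores of the same table cannot interleave their quota adjustments.
public class NamespaceRegionQuota {
  private final int maxRegions;
  private int usedRegions;                       // guarded by 'this'
  private final Map<String, Object> tableLocks = new ConcurrentHashMap<>();

  public NamespaceRegionQuota(int maxRegions, int initiallyUsed) {
    this.maxRegions = maxRegions;
    this.usedRegions = initiallyUsed;
  }

  /**
   * Atomically replaces a table's recorded region count (oldCount -> newCount),
   * rejecting the change if it would push the namespace past its quota.
   */
  public boolean checkAndUpdate(String table, int oldCount, int newCount) {
    Object tableLock = tableLocks.computeIfAbsent(table, t -> new Object());
    synchronized (tableLock) {                   // serialize restores of one table
      synchronized (this) {                      // protect the shared counter
        int proposed = usedRegions - oldCount + newCount;
        if (proposed > maxRegions) {
          return false;                          // would exceed the namespace quota
        }
        usedRegions = proposed;
        return true;
      }
    }
  }

  public synchronized int used() {
    return usedRegions;
  }
}
```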
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193038#comment-15193038 ] Jianwei Cui commented on HBASE-15433: - {quote} When QEE is thrown we will still end up in updating the region quota which is not really required, may be we can avoid that. {quote} Yes, we should catch QEE first and not update the quota information in that situation, as you suggested above. {quote} Also suggest to rename currentRegionCount to tableRegionCount and updatedRegionCount to snapshotRegionCount for better understanding. Please add more comments like why are we doing this way. {quote} Good suggestions, will update the patch. {quote} If this throws exception then there will be another issue, because now the snapshot has been successfully restored but in the catch clause we are updating the table region count in namespace quota. {quote} Good find. Here, the {{checkAndUpdateNamespaceRegionQuota}} should succeed because it only reduces the region count for the table. However, if it does throw an exception, something unexpected has happened, and calling {{checkAndUpdateNamespaceRegionQuota}} in the catch clause may also fail. We can log an error message in the QEE catch clause and rethrow it directly. The code here can be updated as:
{code}
int tableRegionCount = -1;
try {
  // Table already exists. Check and update the region quota for this table namespace.
  // Table is disabled, so the table region count won't change during restoreSnapshot.
  tableRegionCount = getRegionCountOfTable(tableName);
  int snapshotRegionCount = manifest.getRegionManifestsMap().size();
  // Update the region count before restoreSnapshot if snapshotRegionCount is larger. If we
  // updated the region count to a smaller value before restoreSnapshot and the restoreSnapshot
  // fails, we may fail to reset the region count to its original value if the namespace
  // region count quota is consumed by other tables during the restoreSnapshot, such as
  // region split or table create under the same namespace.
  if (tableRegionCount > 0 && tableRegionCount < snapshotRegionCount) {
    checkAndUpdateNamespaceRegionQuota(snapshotRegionCount, tableName);
  }
  restoreSnapshot(snapshot, snapshotTableDesc);
  // Update the region count after restoreSnapshot succeeded if snapshotRegionCount is
  // smaller. This step should not fail because it will reduce the region count for the table.
  if (tableRegionCount > 0 && tableRegionCount > snapshotRegionCount) {
    checkAndUpdateNamespaceRegionQuota(snapshotRegionCount, tableName);
  }
} catch (QuotaExceededException e) {
  LOG.error("Exception occurred while restoring the snapshot " + snapshot.getName()
      + " as table " + tableName.getNameAsString(), e);
  // If QEE is thrown before restoreSnapshot, quota information is not updated, and we
  // should throw the exception directly. If QEE is thrown after restoreSnapshot, there
  // must be unexpected reasons, so we also throw the exception directly.
  throw e;
} catch (IOException e) {
  if (tableRegionCount > 0) {
    // reset the region count for the table
    checkAndUpdateNamespaceRegionQuota(tableRegionCount, tableName);
  }
  LOG.error("Exception occurred while restoring the snapshot " + snapshot.getName()
      + " as table " + tableName.getNameAsString(), e);
  throw e;
}
{code}
What's your opinion about this issue?
[~ashish singhi]
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192690#comment-15192690 ] Jianwei Cui commented on HBASE-15433: - Thanks for your comment and sorry for the late reply [~ashish singhi]. {quote} So in catch clause first let us catch QEE and then IOE. If QEE is caught then we will not update the quota information. {quote} Yes, we don't need to update the quota information if QEE is caught. However, if IOE is caught, it means {{checkAndUpdateNamespaceRegionQuota}} succeeded while the following {{restoreSnapshot(SnapshotDescription, HTableDescriptor)}} failed, and the quota information has already been updated to the region count in the snapshot. For example, if the table originally has 10 regions and the snapshot has 5, the recorded region count will have been updated to 5 in this case, and we need to reset it to 10 in the {{catch}} block.
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190901#comment-15190901 ] Jianwei Cui commented on HBASE-15433: - I get your point. Yes, it will be more concise to remove the 'else' keyword :)
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190895#comment-15190895 ] Jianwei Cui commented on HBASE-15433: - Thanks for your comment, will add a new unit test and update the patch. {quote} I was thinking of a simple fix like, just catch the QuoteExceededException and don't remove the table from namespace quota. {quote} If the snapshot contains fewer regions than the current table, 'checkAndUpdateNamespaceRegionQuota' will reduce the region count for the table, so we still need to reset the region count in the 'catch' block if 'restoreSnapshot' throws an exception.
[jira] [Updated] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-15433: Attachment: HBASE-15433-trunk-v2.patch Added a unit test for this case.
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188919#comment-15188919 ] Jianwei Cui commented on HBASE-15433: - Thanks for your comments [~yuzhih...@gmail.com]. The 'else' block throws an exception when the NamespaceStateManager is not initialized, which makes sure the NamespaceAuditor is in the right state when the method is invoked. Will add a unit test for this case.
[jira] [Updated] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-15433: Attachment: HBASE-15433-trunk-v1.patch This patch could be applied to 2.0.0, 1.4.0, 1.3.0, 1.2.0 and 1.1.4.
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187210#comment-15187210 ] Jianwei Cui commented on HBASE-15433: - If the snapshot contains fewer regions than the current table, 'checkAndUpdateNamespaceRegionQuota' will reduce the region count of the table. The remaining region quota may be consumed by others (such as table creation, region split, etc.) during the restore procedure; therefore, we cannot simply reset the region count for the table in the 'catch' block if the restore procedure fails. Will upload another patch to fix this case.
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187064#comment-15187064 ] Jianwei Cui commented on HBASE-15433: - I tried the patch on trunk and will try it on other branches.
[jira] [Updated] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-15433: Attachment: HBASE-15433-trunk.patch
[jira] [Commented] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
[ https://issues.apache.org/jira/browse/HBASE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187024#comment-15187024 ] Jianwei Cui commented on HBASE-15433: - Thanks for your comment [~ashish singhi], I made a patch to fix this case and will upload it after the tests pass :)
[jira] [Created] (HBASE-15433) SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception
Jianwei Cui created HBASE-15433: --- Summary: SnapshotManager#restoreSnapshot not update table and region count quota correctly when encountering exception Key: HBASE-15433 URL: https://issues.apache.org/jira/browse/HBASE-15433 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 2.0.0 Reporter: Jianwei Cui In SnapshotManager#restoreSnapshot, the table and region quota will be checked and updated as:
{code}
try {
  // Table already exist. Check and update the region quota for this table namespace
  checkAndUpdateNamespaceRegionQuota(manifest, tableName);
  restoreSnapshot(snapshot, snapshotTableDesc);
} catch (IOException e) {
  this.master.getMasterQuotaManager().removeTableFromNamespaceQuota(tableName);
  LOG.error("Exception occurred while restoring the snapshot " + snapshot.getName()
      + " as table " + tableName.getNameAsString(), e);
  throw e;
}
{code}
The 'checkAndUpdateNamespaceRegionQuota' will fail if regions in the snapshot make the region count quota exceeded; then the table will be removed in the 'catch' block. This will make the current table count and region count decrease, and following table creation or region split will succeed even if the actual quota is exceeded.
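The effect of the buggy catch block can be shown with a tiny simulation. This is plain Java with hypothetical names ({{QuotaLeakDemo}}, {{tryReserve}}, {{buggyRestoreFailure}} are not HBase's API): after a failed restore drops the table from the recorded count, a later reservation that should be rejected is admitted, so the real region count exceeds the quota.

```java
// Simulates the bookkeeping bug: on a failed restore the catch block drops
// the table's regions from the namespace count even though the (disabled)
// table still owns them. Names are illustrative, not HBase's API.
public class QuotaLeakDemo {
  static final int MAX_REGIONS = 12;
  int recorded;       // regions the quota manager thinks exist
  int actual;         // regions that really exist in the namespace

  QuotaLeakDemo(int existing) {
    recorded = existing;
    actual = existing;
  }

  /** Admits n new regions only if the *recorded* count allows it. */
  boolean tryReserve(int n) {
    if (recorded + n > MAX_REGIONS) {
      return false;
    }
    recorded += n;
    actual += n;
    return true;
  }

  /** The buggy catch-block behaviour: forget the failed table entirely. */
  void buggyRestoreFailure(int tableRegions) {
    recorded -= tableRegions;   // table removed from the quota...
    // ...but its regions still exist, so 'actual' is unchanged.
  }
}
```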
[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction
[ https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181663#comment-15181663 ] Jianwei Cui commented on HBASE-15340: - {quote} The solution of having a client aware readPnt will solve even that(?) {quote} It seems [HBASE-13099|https://issues.apache.org/jira/browse/HBASE-13099] has proposed such a solution: https://issues.apache.org/jira/browse/HBASE-13099?focusedCommentId=14337017=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14337017. However, there are cases the solution can't cover (if I am not wrong). For example: 1. the client holds the readPoint when the scanner is created on serverA and has read partial row data from serverA; 2. the region is moved to serverB before the whole row is returned; 3. before the client creates a new scanner for the row with the readPoint on serverB, new mutations are applied to the region, including deletes for the row, and a major compaction starts and completes. The major compaction could delete the cells of the row because the new server can't get a proper smallestReadPoint for the compaction before all ongoing scan requests arrive. Then, the client cannot read the remaining cells of the row after the compaction, which will break per-row atomicity for scan. > Partial row result of scan may return data violates the row-level transaction > -- > > Key: HBASE-15340 > URL: https://issues.apache.org/jira/browse/HBASE-15340 > Project: HBase > Issue Type: Bug > Components: Scanners, Transactions/MVCC >Affects Versions: 2.0.0 >Reporter: Jianwei Cui > > There are cases the region server will return partial row result, such as the > client set batch for scan or configured size limit reached. In these > situations, the client may return data that violates the row-level > transaction to the application. The following steps show the problem: > {code} > // assume there is a test table 'test_table' with one family 'F' and one > region 'region'.
> // meanwhile there are two region servers 'rsA' and 'rsB'. > 1. Let 'region' firstly located in 'rsA' and put one row with two columns > 'c1' and 'c2' as: > > put 'test_table', 'row', 'F:c1', 'value1', 'F:c2', 'value1' > 2. Start a client to scan 'test_table', with scan.setBatch(1) and > scan.setCaching(1). The client will get one column as : {column='F:c1' and > value='value1'} in the first rpc call after scanner created, and the result > will be returned to application. > 3. Before the client issues the next request, the 'region' was moved to 'rsB' > and accepted another mutations for the two columns 'c1' and 'c2' as: > > put 'test_table', 'row', 'F:c1', 'value2', 'F:c2', 'value2' > 4. Then, the client will receive a RegionMovedException when issuing next > request and will retry to open scanner on 'rsB'. The newly opened scanner > will higher mvcc than old data so that could read out column as : { > column='F:c2' with value='value2'} and return the result to application. >Therefore, the application will get data as: > 'row'column='F:c1' value='value1' > 'row'column='F:c2', value='value2' >The returned data is combined from two different mutations and violates > the row-level transaction. > {code} > The reason is that the newly opened scanner after region moved will get a > different mvcc. I am not sure whether this result is by design for scan if > partial row result is allowed. However, such row result combined from > different transactions may make the application have unexpected state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
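The scenario in the issue description can be reduced to a greatly simplified MVCC model (a sketch under assumed names, not HBase's internals): each put writes both columns of the row at one sequence id, and a scanner returns the newest value visible at its read point. Re-opening the scanner mid-row at a higher read point yields a row that mixes two transactions:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified MVCC model of the partial-row scan problem. Each put writes all
// columns of the row at one sequence id; a read returns a column's latest
// value with seqId <= the scanner's read point.
public class PartialRowScanDemo {
  // column -> (seqId -> value)
  static final Map<String, Map<Long, String>> ROW = new HashMap<>();

  /** One row-level transaction: both columns written at the same sequence id. */
  static void put(long seqId, String value) {
    for (String col : new String[] {"c1", "c2"}) {
      ROW.computeIfAbsent(col, c -> new HashMap<>()).put(seqId, value);
    }
  }

  /** Latest value of the column visible at the given read point. */
  static String read(String col, long readPoint) {
    String best = null;
    long bestSeq = -1;
    for (Map.Entry<Long, String> e : ROW.get(col).entrySet()) {
      if (e.getKey() <= readPoint && e.getKey() > bestSeq) {
        bestSeq = e.getKey();
        best = e.getValue();
      }
    }
    return best;
  }
}
```

Reading 'c1' at read point 1, then 'c2' at read point 2 after the second put, reproduces the mixed row ('value1', 'value2') that violates per-row atomicity; a client-aware read point would pin both reads to read point 1.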
[jira] [Commented] (HBASE-15355) region.jsp can not be found on info server of master
[ https://issues.apache.org/jira/browse/HBASE-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175490#comment-15175490 ] Jianwei Cui commented on HBASE-15355: - [~stack] Thanks for your comment :). If we decide to undo master hosting meta in the near future, this issue is not a problem IMO; otherwise, we can move the jsp files to fix this issue. BTW, do we have any design or plan to split meta for scaling? > region.jsp can not be found on info server of master > > > Key: HBASE-15355 > URL: https://issues.apache.org/jira/browse/HBASE-15355 > Project: HBase > Issue Type: Bug > Components: UI >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > > After [HBASE-10569|https://issues.apache.org/jira/browse/HBASE-10569], master > is also a regionserver and it will serve regions of system tables. The meta > region info could be viewed on master at the address such as : > http://localhost:16010/region.jsp?name=1588230740. The real path of > region.jsp for the request will be hbase-webapps/master/region.jsp on master, > however, the region.jsp is under the directory hbase-webapps/regionserver, so > that can not be found on master.
[jira] [Commented] (HBASE-15355) region.jsp can not be found on info server of master
[ https://issues.apache.org/jira/browse/HBASE-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170835#comment-15170835 ] Jianwei Cui commented on HBASE-15355: - Master is also a regionserver, do we need to put the jsp file in the same folder?
[jira] [Created] (HBASE-15355) region.jsp can not be found on info server of master
Jianwei Cui created HBASE-15355: --- Summary: region.jsp can not be found on info server of master Key: HBASE-15355 URL: https://issues.apache.org/jira/browse/HBASE-15355 Project: HBase Issue Type: Bug Components: UI Affects Versions: 2.0.0 Reporter: Jianwei Cui Priority: Minor After [HBASE-10569|https://issues.apache.org/jira/browse/HBASE-10569], master is also a regionserver and it will serve regions of system tables. The meta region info could be viewed on master at the address such as : http://localhost:16010/region.jsp?name=1588230740. The real path of region.jsp for the request will be hbase-webapps/master/region.jsp on master, however, the region.jsp is under the directory hbase-webapps/regionserver, so that can not be found on master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction
[ https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168832#comment-15168832 ] Jianwei Cui commented on HBASE-15340: - {quote} The solution of having a client aware readPnt will solve even that(?) {quote} It seems workable IMO; I will try to find whether there is any existing discussion about this issue. > Partial row result of scan may return data violates the row-level transaction > -- > > Key: HBASE-15340 > URL: https://issues.apache.org/jira/browse/HBASE-15340 > Project: HBase > Issue Type: Bug > Components: Scanners, Transactions/MVCC >Affects Versions: 2.0.0 >Reporter: Jianwei Cui > > There are cases where the region server will return a partial row result, such as when the client sets a batch for the scan or the configured size limit is reached. In these situations, the client may return data to the application that violates the row-level transaction. The following steps show the problem: > {code} > // assume there is a test table 'test_table' with one family 'F' and one region 'region'. > // meanwhile there are two region servers 'rsA' and 'rsB'. > 1. Let 'region' first be located on 'rsA' and put one row with two columns 'c1' and 'c2': > > put 'test_table', 'row', 'F:c1', 'value1', 'F:c2', 'value1' > 2. Start a client to scan 'test_table' with scan.setBatch(1) and scan.setCaching(1). The client will get one column, {column='F:c1', value='value1'}, in the first rpc call after the scanner is created, and the result will be returned to the application. > 3. Before the client issues the next request, 'region' is moved to 'rsB', which accepts another mutation for the two columns 'c1' and 'c2': > > put 'test_table', 'row', 'F:c1', 'value2', 'F:c2', 'value2' > 4. Then the client will receive a RegionMovedException when issuing the next request and will retry by opening a scanner on 'rsB'. The newly opened scanner will have a higher mvcc than the old data, so it could read out the column {column='F:c2', value='value2'} and return the result to the application. > Therefore, the application will get data as: > 'row', column='F:c1', value='value1' > 'row', column='F:c2', value='value2' > The returned data is combined from two different mutations and violates the row-level transaction. > {code} > The reason is that the newly opened scanner after the region move will get a different mvcc. I am not sure whether this result is by design for scans when partial row results are allowed. However, a row result combined from different transactions may leave the application in an unexpected state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
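The anomaly in the steps above can be reproduced with a toy model. Everything below (Cell, visible, the readPoint bookkeeping) is a simplified stand-in invented for illustration, not HBase code: it only models "a scanner sees cells whose mvcc is at or below its read point".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MixedRowSketch {
    // A cell tagged with the mvcc of the mutation that wrote it.
    record Cell(String qualifier, String value, long mvcc) {}

    // Latest visible value per qualifier at the given read point.
    static Map<String, String> visible(List<Cell> cells, long readPoint) {
        Map<String, String> out = new TreeMap<>();
        Map<String, Long> newest = new HashMap<>();
        for (Cell c : cells) {
            if (c.mvcc() <= readPoint
                    && c.mvcc() >= newest.getOrDefault(c.qualifier(), -1L)) {
                newest.put(c.qualifier(), c.mvcc());
                out.put(c.qualifier(), c.value());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> store = new ArrayList<>(List.of(
            new Cell("c1", "value1", 1), new Cell("c2", "value1", 1)));

        // First rpc with batch=1: only 'c1' is read, at readPoint 1.
        String first = visible(store, 1).get("c1");

        // Region moves; a second whole-row mutation commits with mvcc 2.
        store.add(new Cell("c1", "value2", 2));
        store.add(new Cell("c2", "value2", 2));

        // The re-opened scanner gets a fresh readPoint 2 and reads 'c2'.
        String second = visible(store, 2).get("c2");

        // The application sees a row version that never existed as a whole.
        System.out.println("c1=" + first + " c2=" + second);
    }
}
```

Running this prints a row mixing value1 and value2, which is exactly the combined-transaction result described in the steps above.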
[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction
[ https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168829#comment-15168829 ] Jianwei Cui commented on HBASE-15340: - After [HBASE-11544|https://issues.apache.org/jira/browse/HBASE-11544], the maxScannerResultSize of ClientScanner defaults to 2MB. This makes the server return partial results more easily when the size limit is reached, so this issue can happen even when the user does not set a batch for the scan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction
[ https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168771#comment-15168771 ] Jianwei Cui commented on HBASE-15340: - [~anoop.hbase], thanks for your comment, I get your point:). Yes, the case you mentioned will happen. The page https://hbase.apache.org/acid-semantics.html explains the consistency guarantee for scans: {code} A scan is not a consistent view of a table. Scans do not exhibit snapshot isolation. Rather, scans have the following properties: 1. Any row returned by the scan will be a consistent view (i.e. that version of the complete row existed at some point in time) [1] 2. A scan will always reflect a view of the data at least as new as the beginning of the scan. This satisfies the visibility guarantees enumerated below. 1. For example, if client A writes data X and then communicates via a side channel to client B, any scans started by client B will contain data at least as new as X. 2. A scan _must_ reflect all mutations committed prior to the construction of the scanner, and _may_ reflect some mutations committed subsequent to the construction of the scanner. 3. Scans must include all data written prior to the scan (except in the case where data is subsequently mutated, in which case it _may_ reflect the mutation) {code} It seems the consistency guarantee for scans is only to read out data at least as new as the beginning of the scan; there is no guarantee about whether data written concurrently with, or after, the beginning of the scan is read out. At the end of the page: {code} [1] A consistent view is not guaranteed intra-row scanning -- i.e. fetching a portion of a row in one RPC then going back to fetch another portion of the row in a subsequent RPC. Intra-row scanning happens when you set a limit on how many values to return per Scan#next (See Scan#setBatch(int)). {code} It mentions exactly the problem of this jira: a row-level consistent view is not guaranteed for intra-row scanning. So is this a known problem? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
>Therefore, the application will get data as: > 'row'column='F:c1' value='value1' > 'row'column='F:c2', value='value2' >The returned data is combined from two different mutations and violates > the row-level transaction. > {code} > The reason is that the newly opened scanner after region moved will get a > different mvcc. I am not sure whether this result is by design for scan if > partial row result is allowed. However, such row result combined from > different transactions may make the application have unexpected state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction
[ https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168736#comment-15168736 ] Jianwei Cui commented on HBASE-15340: - [~anoop.hbase], intra-row scanning seems to come from [HBASE-1537|https://issues.apache.org/jira/browse/HBASE-1537], so versions after 0.90.0 will have this issue. I will make a patch following the idea and check the result:) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction
[ https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168678#comment-15168678 ] Jianwei Cui commented on HBASE-15340: - [~ram_krish], IMO this is a different problem, caused by region move during scanning. When [HBASE-15325|https://issues.apache.org/jira/browse/HBASE-15325] is resolved there is no data loss; however, the returned data may be combined from different row-level transactions, which is unexpected for the application. I think we should also keep the READ_COMMITTED isolation level in this situation? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15325) ResultScanner allowing partial result will miss the rest of the row if the region is moved between two rpc requests
[ https://issues.apache.org/jira/browse/HBASE-15325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168664#comment-15168664 ] Jianwei Cui commented on HBASE-15325: - When the user sets a batch for the scan, the client may also return a partial row result to the application and suffer this problem if the region moves. The reason is that the server judges whether the result is partial as: {code} boolean partialResultFormed() { return scannerState == NextState.SIZE_LIMIT_REACHED_MID_ROW || scannerState == NextState.TIME_LIMIT_REACHED_MID_ROW; } {code} NextState.BATCH_LIMIT_REACHED is not considered a partial result, so the ClientScanner won't get a partial result from the server and will go to the next row when retrying: {code} if (!this.lastResult.isPartial()) { if (scan.isReversed()) { scan.setStartRow(createClosestRowBefore(lastResult.getRow())); } else { scan.setStartRow(Bytes.add(lastResult.getRow(), new byte[1])); // <=== a partial result from the batch-limit-reached case will go to the next row and miss the rest of the data } } else { // we need to rescan this row because we only loaded part of the row before scan.setStartRow(lastResult.getRow()); } {code} I think if the user sets a batch for the scan, it means the user allows partial results? We can set scan.allowPartialResults to true in this situation, and the server should also treat NextState.BATCH_LIMIT_REACHED as a partial result; then the ClientScanner will receive a partial result and retry the same row if the region moved, after applying the patch. > ResultScanner allowing partial result will miss the rest of the row if the > region is moved between two rpc requests > --- > > Key: HBASE-15325 > URL: https://issues.apache.org/jira/browse/HBASE-15325 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.0, 1.1.3 >Reporter: Phil Yang >Assignee: Phil Yang >Priority: Critical > Attachments: 15325-test.txt, HBASE-15325-v1.txt > > > HBASE-11544 allows a scan rpc to return part of a row to reduce memory usage for one rpc request, and the client can setAllowPartial or setBatch to get several cells of a row instead of the whole row. > However, the status of the scanner is saved on the server, and we need it to get the next part if there was a partial result before. If we move the region to another RS, the client will get a NotServingRegionException and open a new scanner on the new RS, which will be regarded as a new scan starting from the end of this row. So the remaining cells of the row of the last result will be missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
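The resume decision discussed in the comment above can be sketched as follows. This is a simplified model, not the actual ClientScanner code: rows are plain strings, nextStartRow is a made-up name, and the reversed-scan branch (createClosestRowBefore) is omitted.

```java
public class NextStartRowSketch {
    // Decide where a re-opened forward scanner should resume after a retry.
    static String nextStartRow(String lastRow, boolean lastWasPartial) {
        if (lastWasPartial) {
            // Rescan the same row to fetch the remaining cells.
            return lastRow;
        }
        // Appending a zero byte yields the smallest row strictly after lastRow,
        // mirroring Bytes.add(lastResult.getRow(), new byte[1]).
        return lastRow + "\0";
    }

    public static void main(String[] args) {
        // Batch limit reached mid-row but not flagged partial: after a region
        // move, the scan resumes past 'row1' and the rest of it is skipped.
        System.out.println(nextStartRow("row1", false));
        // Flagged partial (the suggested fix): 'row1' is rescanned instead.
        System.out.println(nextStartRow("row1", true));
    }
}
```

This makes the bug mechanical to see: whether the rest of the row survives a region move depends entirely on whether the batch-limit case is reported as partial.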
[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction
[ https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168630#comment-15168630 ] Jianwei Cui commented on HBASE-15340: - A direct solution is to make ClientScanner record the readPoint when the scanner for the region is first opened, and have subsequent scanners for the same region use the same readPoint if a RegionMovedException happens. Any suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
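The proposal above could be sketched as client-side bookkeeping like this. The names (readPointForRegion, regionReadPoints) are hypothetical, and a real change would also need to plumb the recorded readPoint into the scan request sent to the new server; this only shows the record-and-reuse idea.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadPointReuseSketch {
    // Remember the readPoint of the first scanner opened on each region, so a
    // scanner re-opened after a RegionMovedException can reuse it.
    private final Map<String, Long> regionReadPoints = new HashMap<>();

    long readPointForRegion(String regionName, long freshServerReadPoint) {
        // First open records the server-assigned readPoint; retries reuse it.
        return regionReadPoints.computeIfAbsent(regionName, r -> freshServerReadPoint);
    }

    public static void main(String[] args) {
        ReadPointReuseSketch client = new ReadPointReuseSketch();
        long first = client.readPointForRegion("region", 1L); // first open
        // After the region moves, the new server would hand out readPoint 2,
        // but the client keeps scanning at the recorded value.
        long retry = client.readPointForRegion("region", 2L);
        System.out.println(first + " " + retry);
    }
}
```

With both opens pinned to the same readPoint, the re-opened scanner cannot see the second mutation, so the row stays a consistent view.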
[jira] [Created] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction
Jianwei Cui created HBASE-15340: --- Summary: Partial row result of scan may return data violates the row-level transaction Key: HBASE-15340 URL: https://issues.apache.org/jira/browse/HBASE-15340 Project: HBase Issue Type: Bug Components: Scanners, Transactions/MVCC Affects Versions: 2.0.0 Reporter: Jianwei Cui There are cases where the region server will return a partial row result, such as when the client sets a batch for the scan or the configured size limit is reached. In these situations, the client may return data to the application that violates the row-level transaction. The following steps show the problem: {code} // assume there is a test table 'test_table' with one family 'F' and one region 'region'. // meanwhile there are two region servers 'rsA' and 'rsB'. 1. Let 'region' first be located on 'rsA' and put one row with two columns 'c1' and 'c2': > put 'test_table', 'row', 'F:c1', 'value1', 'F:c2', 'value1' 2. Start a client to scan 'test_table' with scan.setBatch(1) and scan.setCaching(1). The client will get one column, {column='F:c1', value='value1'}, in the first rpc call after the scanner is created, and the result will be returned to the application. 3. Before the client issues the next request, 'region' is moved to 'rsB', which accepts another mutation for the two columns 'c1' and 'c2': > put 'test_table', 'row', 'F:c1', 'value2', 'F:c2', 'value2' 4. Then the client will receive a RegionMovedException when issuing the next request and will retry by opening a scanner on 'rsB'. The newly opened scanner will have a higher mvcc than the old data, so it could read out the column {column='F:c2', value='value2'} and return the result to the application. Therefore, the application will get data as: 'row', column='F:c1', value='value1' 'row', column='F:c2', value='value2' The returned data is combined from two different mutations and violates the row-level transaction. {code} The reason is that the newly opened scanner after the region move will get a different mvcc. I am not sure whether this result is by design for scans when partial row results are allowed. However, a row result combined from different transactions may leave the application in an unexpected state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15327) Canary will always invoke admin.balancer() in each sniffing period when writeSniffing is enabled
[ https://issues.apache.org/jira/browse/HBASE-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-15327: Attachment: HBASE-15327-trunk.patch > Canary will always invoke admin.balancer() in each sniffing period when > writeSniffing is enabled > > > Key: HBASE-15327 > URL: https://issues.apache.org/jira/browse/HBASE-15327 > Project: HBase > Issue Type: Bug > Components: canary >Affects Versions: 2.0.0 >Reporter: Jianwei Cui >Priority: Minor > Attachments: HBASE-15327-trunk.patch > > > When Canary#writeSniffing is enabled, Canary#checkWriteTableDistribution will make sure the regions of the write table are distributed on all region servers: > {code} > int numberOfServers = admin.getClusterStatus().getServers().size(); > .. > int numberOfCoveredServers = serverSet.size(); > if (numberOfCoveredServers < numberOfServers) { > admin.balancer(); > } > {code} > The master also works as a regionserver, so ClusterStatus#getServers will contain the master. On the other hand, the Canary write table will not be assigned to the master, making numberOfCoveredServers always smaller than numberOfServers, so admin.balancer is invoked in every sniffing period. This may cause frequent region moves. A simple fix is to exclude the master from numberOfServers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15327) Canary will always invoke admin.balancer() in each sniffing period when writeSniffing is enabled
Jianwei Cui created HBASE-15327: --- Summary: Canary will always invoke admin.balancer() in each sniffing period when writeSniffing is enabled Key: HBASE-15327 URL: https://issues.apache.org/jira/browse/HBASE-15327 Project: HBase Issue Type: Bug Components: canary Affects Versions: 2.0.0 Reporter: Jianwei Cui Priority: Minor When Canary#writeSniffing is enabled, Canary#checkWriteTableDistribution will make sure the regions of the write table are distributed on all region servers: {code} int numberOfServers = admin.getClusterStatus().getServers().size(); .. int numberOfCoveredServers = serverSet.size(); if (numberOfCoveredServers < numberOfServers) { admin.balancer(); } {code} The master also works as a regionserver, so ClusterStatus#getServers will contain the master. On the other hand, the Canary write table will not be assigned to the master, making numberOfCoveredServers always smaller than numberOfServers, so admin.balancer is invoked in every sniffing period. This may cause frequent region moves. A simple fix is to exclude the master from numberOfServers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
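The suggested fix of excluding the master from numberOfServers might look roughly like this. The method name effectiveServerCount and the string server names are illustrative only, not the committed HBASE-15327 patch:

```java
import java.util.Set;

public class CanaryServerCountSketch {
    // Count only servers that can actually host the Canary write table,
    // i.e. exclude the master even though it also acts as a regionserver.
    static int effectiveServerCount(Set<String> servers, String masterName) {
        return (int) servers.stream()
            .filter(s -> !s.equals(masterName))
            .count();
    }

    public static void main(String[] args) {
        Set<String> servers = Set.of("master,16000", "rs1,16020", "rs2,16020");
        // numberOfCoveredServers == 2 no longer looks like under-coverage,
        // so admin.balancer() is not invoked every sniffing period.
        System.out.println(effectiveServerCount(servers, "master,16000"));
    }
}
```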
[jira] [Commented] (HBASE-15325) ResultScanner allowing partial result will reset to the start of the row if the region is moved between two rpc requests
[ https://issues.apache.org/jira/browse/HBASE-15325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166847#comment-15166847 ] Jianwei Cui commented on HBASE-15325: - In ScannerCallable#call(), the NotServingRegionException will be wrapped as a DoNotRetryIOException: {code} if (ioe instanceof NotServingRegionException) { // Throw a DNRE so that we break out of cycle of calling NSRE // when what we need is to open scanner against new location. // Attach NSRE to signal client that it needs to re-setup scanner. if (this.scanMetrics != null) { this.scanMetrics.countOfNSRE.incrementAndGet(); } throw new DoNotRetryIOException("Resetting the scanner -- see exception cause", ioe); {code} > ResultScanner allowing partial result will reset to the start of the row if > the region is moved between two rpc requests > > > Key: HBASE-15325 > URL: https://issues.apache.org/jira/browse/HBASE-15325 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.3 >Reporter: Phil Yang >Assignee: Phil Yang >Priority: Critical > Attachments: 15325-test.txt > > > HBASE-11544 allows a scan rpc to return part of a row to reduce memory usage for one rpc request, and the client can setAllowPartial or setBatch to get several cells of a row instead of the whole row. > However, the status of the scanner is saved on the server, and we need it to get the next part if there was a partial result before. If we move the region to another RS, the client will get a NotServingRegionException and open a new scanner on the new RS, which will be regarded as a new scan starting from the start of this row. So we will see cells which have been seen before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15304) SecureBulkLoadEndpoint#bulkLoadHFiles not consider assignSeqNum flag(In 0.94 branch)
[ https://issues.apache.org/jira/browse/HBASE-15304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-15304: Attachment: HBASE-15304-0.94-v1.patch > SecureBulkLoadEndpoint#bulkLoadHFiles not consider assignSeqNum flag(In 0.94 > branch) > > > Key: HBASE-15304 > URL: https://issues.apache.org/jira/browse/HBASE-15304 > Project: HBase > Issue Type: Bug > Components: Coprocessors >Affects Versions: 0.94.27 >Reporter: Jianwei Cui >Priority: Minor > Attachments: HBASE-15304-0.94-v1.patch > > > In 0.94, it seems SecureBulkLoadEndpoint#bulkLoadHFiles never uses the assignSeqNum flag, so the server won't assign a sequence number for bulk load hfiles even when assignSeqNum is set to true by the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15304) SecureBulkLoadEndpoint#bulkLoadHFiles not consider assignSeqNum flag(In 0.94 branch)
Jianwei Cui created HBASE-15304: --- Summary: SecureBulkLoadEndpoint#bulkLoadHFiles not consider assignSeqNum flag(In 0.94 branch) Key: HBASE-15304 URL: https://issues.apache.org/jira/browse/HBASE-15304 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 0.94.27 Reporter: Jianwei Cui Priority: Minor In 0.94, it seems SecureBulkLoadEndpoint#bulkLoadHFiles never uses the assignSeqNum flag, so the server won't assign a sequence number for bulk load hfiles even when assignSeqNum is set to true by the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-15303) LoadIncrementalHFiles will encounter NoSuchMethodException when using secure
[ https://issues.apache.org/jira/browse/HBASE-15303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui resolved HBASE-15303. - Resolution: Invalid > LoadIncrementalHFiles will encounter NoSuchMethodException when using secure > > > Key: HBASE-15303 > URL: https://issues.apache.org/jira/browse/HBASE-15303 > Project: HBase > Issue Type: Bug > Components: Coprocessors >Affects Versions: 0.94.27 >Reporter: Jianwei Cui > > After [HBASE-8521|https://issues.apache.org/jira/browse/HBASE-8521], LoadIncrementalHFiles could ask the server to assign a sequence id for bulk load hfiles by invoking SecureBulkLoadClient#bulkLoadHFiles: > {code} > public boolean bulkLoadHFiles(List> familyPaths, > Token userToken, > String bulkToken, boolean assignSeqNum) throws IOException { > try { > return (Boolean) Methods.call(protocolClazz, proxy, "bulkLoadHFiles", > new Class[] { > List.class, Token.class, String.class, Boolean.class }, > new Object[] { familyPaths, userToken, bulkToken, assignSeqNum }); > } catch (Exception e) { > throw new IOException("Failed to bulkLoadHFiles", e); > } > } > {code} > However, SecureBulkLoadProtocol does not define such a method (with assignSeqNum as the last parameter), so the client will encounter a NoSuchMethodException when the secure endpoint is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15303) LoadIncrementalHFiles will encounter NoSuchMethodException when using secure
Jianwei Cui created HBASE-15303: --- Summary: LoadIncrementalHFiles will encounter NoSuchMethodException when using secure Key: HBASE-15303 URL: https://issues.apache.org/jira/browse/HBASE-15303 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 0.94.27 Reporter: Jianwei Cui After [HBASE-8521|https://issues.apache.org/jira/browse/HBASE-8521], LoadIncrementalHFiles could ask the server to assign a sequence id for bulk load hfiles by invoking SecureBulkLoadClient#bulkLoadHFiles: {code} public boolean bulkLoadHFiles(List> familyPaths, Token userToken, String bulkToken, boolean assignSeqNum) throws IOException { try { return (Boolean) Methods.call(protocolClazz, proxy, "bulkLoadHFiles", new Class[] { List.class, Token.class, String.class, Boolean.class }, new Object[] { familyPaths, userToken, bulkToken, assignSeqNum }); } catch (Exception e) { throw new IOException("Failed to bulkLoadHFiles", e); } } {code} However, SecureBulkLoadProtocol does not define such a method (with assignSeqNum as the last parameter), so the client will encounter a NoSuchMethodException when the secure endpoint is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
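The failure mode is plain Java reflection behavior: looking up a method signature the target type never declared throws NoSuchMethodException. A self-contained illustration, where Protocol is a made-up stand-in for the real SecureBulkLoadProtocol (which lacks the extra Boolean parameter):

```java
import java.lang.reflect.Method;

public class ReflectionMismatchSketch {
    // Stand-in protocol: declares bulkLoadHFiles without an assignSeqNum flag.
    interface Protocol {
        boolean bulkLoadHFiles(String familyPaths);
    }

    public static void main(String[] args) {
        try {
            // Reflective lookup of a signature the interface never declared,
            // mirroring the mismatched Methods.call in the description above.
            Method m = Protocol.class.getMethod(
                "bulkLoadHFiles", String.class, Boolean.class);
            System.out.println("found " + m);
        } catch (NoSuchMethodException e) {
            // This branch is taken: the extra Boolean parameter has no match.
            System.out.println("NoSuchMethodException: " + e.getMessage());
        }
    }
}
```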
[jira] [Commented] (HBASE-14259) Backport Namespace quota support to 98 branch
[ https://issues.apache.org/jira/browse/HBASE-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118848#comment-15118848 ] Jianwei Cui commented on HBASE-14259:
Thanks for the patch, [~avandana]. There seems to be a typo in patch v3:
{code}
+ public void updateQuotaForRegionMerge(HRegionInfo hri) throws IOException {
+   if (isInitialized()) { // ===> should be: if (!isInitialized()) {
+     throw new IOException(
+         "Merge operation is being performed even before namespace auditor is initialized.");
+   }
{code}
In TestZKLessNamespaceAuditor, setBoolean("hbase.assignment.usezk", false) is applied to TestZKLessNamespaceAuditor#UTIL, not TestNamespaceAuditor#UTIL, so TestZKLessNamespaceAuditor will still use ZooKeeper to assign regions. Should TestNamespaceAuditor#UTIL be made protected?
{code}
+@Category(MediumTests.class)
+public class TestZKLessNamespaceAuditor extends TestNamespaceAuditor {
+  private static final HBaseTestingUtility UTIL = new HBaseTestingUtility();
+
+  @BeforeClass
+  public static void before() throws Exception {
+    UTIL.getConfiguration().setBoolean("hbase.assignment.usezk", false);
+    setupOnce();
+  }
{code}
In AssignmentManager:
{code}
case READY_TO_MERGE:
case MERGE_PONR:
case MERGED:
+ try {
+   regionStateListener.onRegionMerged(hri);
+ } catch (IOException exp) {
+   errorMsg = StringUtils.stringifyException(exp);
+ }
{code}
regionStateListener.onRegionMerged should be invoked only in the MERGED case:
{code}
case MERGE_PONR:
case MERGED:
+ if (code == TransitionCode.MERGED) {
+   try {
+     regionStateListener.onRegionMerged(hri);
+   } catch (IOException exp) {
+     errorMsg = StringUtils.stringifyException(exp);
+   }
+ }
{code}

> Backport Namespace quota support to 98 branch
> --
>
> Key: HBASE-14259
> URL: https://issues.apache.org/jira/browse/HBASE-14259
> Project: HBase
> Issue Type: Task
> Reporter: Vandana Ayyalasomayajula
> Assignee: Andrew Purtell
> Fix For: 0.98.18
>
> Attachments: HBASE-14259_v1_0.98.patch, HBASE-14259_v2_0.98.patch, HBASE-14259_v3_0.98.patch
>
> Namespace quota support (HBASE-8410) has been backported to branch-1 (HBASE-13438). This jira would backport the same to 98 branch.
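The switch fall-through issue raised in the review comment above can be illustrated with a self-contained sketch. The enum and listener hook are hypothetical simplifications of AssignmentManager's transition handling: grouped `case` labels share one body, so without the guard the merge listener would fire for every grouped transition, not just MERGED.

```java
public class MergeGuardDemo {
    public enum TransitionCode { READY_TO_MERGE, MERGE_PONR, MERGED }

    public static int mergedCallbacks = 0;

    // Hypothetical hook standing in for RegionStateListener#onRegionMerged.
    static void onRegionMerged() { mergedCallbacks++; }

    // The grouped cases fall through to shared handling; the guard ensures
    // the listener fires only for the MERGED transition, as the review suggests.
    public static void handle(TransitionCode code) {
        switch (code) {
        case READY_TO_MERGE:
        case MERGE_PONR:
        case MERGED:
            if (code == TransitionCode.MERGED) {
                onRegionMerged();
            }
            break;
        }
    }
}
```

Feeding all three transition codes through `handle` triggers the callback exactly once, for MERGED.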
[jira] [Created] (HBASE-14992) Add cache stats of past n periods in region server status page
Jianwei Cui created HBASE-14992:
---
Summary: Add cache stats of past n periods in region server status page
Key: HBASE-14992
URL: https://issues.apache.org/jira/browse/HBASE-14992
Project: HBase
Issue Type: Improvement
Components: BlockCache, metrics
Affects Versions: 2.0.0
Reporter: Jianwei Cui
Priority: Minor

The cache stats of the past n periods, such as SumHitCountsPastNPeriods, SumHitCachingCountsPastNPeriods, etc., are useful for indicating the real-time read load of a region server, especially during temporary read peaks. It would be helpful to add such metrics to the BlockCache#Stats tab of the region server status page. Discussion and suggestions are welcome.
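One plausible way to implement "past n periods" counters like the SumHitCountsPastNPeriods metric proposed above is a ring buffer of completed-period counts. The sketch below is illustrative only, an assumption about the shape of such a stat, not HBase's actual CacheStats implementation.

```java
public class RollingCacheStats {
    private final long[] hitCounts;   // one slot per completed period
    private int windowIndex = 0;
    private long currentHits = 0;

    public RollingCacheStats(int numPeriods) {
        hitCounts = new long[numPeriods];
    }

    // Record one cache hit in the current (still open) period.
    public void hit() { currentHits++; }

    // Close out the current period: store its count in the ring buffer,
    // overwriting the oldest period once the window is full.
    public void rollMetricsPeriod() {
        hitCounts[windowIndex] = currentHits;
        currentHits = 0;
        windowIndex = (windowIndex + 1) % hitCounts.length;
    }

    // Sum of hits over the past n completed periods.
    public long sumHitCountsPastNPeriods() {
        long sum = 0;
        for (long h : hitCounts) sum += h;
        return sum;
    }
}
```

A periodic chore would call `rollMetricsPeriod()` once per interval; the status page would then read `sumHitCountsPastNPeriods()` to show recent load rather than lifetime totals.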
[jira] [Commented] (HBASE-14936) CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
[ https://issues.apache.org/jira/browse/HBASE-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055856#comment-15055856 ] Jianwei Cui commented on HBASE-14936:
Thanks for your review, [~chenheng].

> CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
> --
>
> Key: HBASE-14936
> URL: https://issues.apache.org/jira/browse/HBASE-14936
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Affects Versions: 1.1.2
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Fix For: 2.0.0, 1.2, 1.1, 1.3, 1.0
>
> Attachments: HBASE-14936-branch-1.0-1.1.patch, HBASE-14936-branch-1.0-addendum.patch, HBASE-14936-trunk-v1.patch, HBASE-14936-trunk-v2.patch, HBASE-14936-trunk.patch
>
> It seems CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod() as
> {code}
> public void rollMetricsPeriod() {
>   lruCacheStats.rollMetricsPeriod();
>   bucketCacheStats.rollMetricsPeriod();
> }
> {code}
> otherwise, CombinedBlockCache.getHitRatioPastNPeriods() and CombinedBlockCache.getHitCachingRatioPastNPeriods() will always return 0.
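The fix quoted in the issue is a straightforward composite delegation. The stand-alone sketch below uses a hypothetical simplified `Stats` class, not HBase's actual CacheStats, to show why the override matters: if the combined stats object does not delegate `rollMetricsPeriod` to both inner caches, the inner periods never roll and any past-n-periods figure derived from them stays at zero.

```java
public class CombinedStatsDemo {
    // Minimal stand-in for CacheStats with a per-period counter.
    public static class Stats {
        public long periodHits = 0;
        public long rolledHits = 0;
        public void hit() { periodHits++; }
        public void rollMetricsPeriod() { rolledHits += periodHits; periodHits = 0; }
    }

    // Composite stats over two inner caches, mirroring the LRU + bucket
    // cache pair. Overriding rollMetricsPeriod to delegate is the fix; a
    // no-op inherited version would leave both inner periods unrolled.
    public static class Combined extends Stats {
        public final Stats lru = new Stats();
        public final Stats bucket = new Stats();
        @Override
        public void rollMetricsPeriod() {
            lru.rollMetricsPeriod();
            bucket.rollMetricsPeriod();
        }
    }
}
```

After one hit on each inner cache and a single roll on the composite, both inner stats have rolled their period counts, which is what the past-n-periods ratios need.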
[jira] [Commented] (HBASE-14936) CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
[ https://issues.apache.org/jira/browse/HBASE-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055786#comment-15055786 ] Jianwei Cui commented on HBASE-14936:
Sure. It seems HBASE-14936-trunk-v2.patch can be applied to branch-1.2; I added a patch for branch-1.0 and branch-1.1.

> CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
> --
>
> Key: HBASE-14936
> URL: https://issues.apache.org/jira/browse/HBASE-14936
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Affects Versions: 1.1.2
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Fix For: 2.0.0, 1.2, 1.1, 1.3
>
> Attachments: HBASE-14936-branch-1.0-1.1.patch, HBASE-14936-trunk-v1.patch, HBASE-14936-trunk-v2.patch, HBASE-14936-trunk.patch
>
> It seems CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod() as
> {code}
> public void rollMetricsPeriod() {
>   lruCacheStats.rollMetricsPeriod();
>   bucketCacheStats.rollMetricsPeriod();
> }
> {code}
> otherwise, CombinedBlockCache.getHitRatioPastNPeriods() and CombinedBlockCache.getHitCachingRatioPastNPeriods() will always return 0.
[jira] [Updated] (HBASE-14936) CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
[ https://issues.apache.org/jira/browse/HBASE-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-14936:
Attachment: HBASE-14936-branch-1.0-1.1.patch

Patch for branch-1.0 and branch-1.1.

> CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
> --
>
> Key: HBASE-14936
> URL: https://issues.apache.org/jira/browse/HBASE-14936
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Affects Versions: 1.1.2
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Fix For: 2.0.0, 1.2, 1.1, 1.3
>
> Attachments: HBASE-14936-branch-1.0-1.1.patch, HBASE-14936-trunk-v1.patch, HBASE-14936-trunk-v2.patch, HBASE-14936-trunk.patch
>
> It seems CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod() as
> {code}
> public void rollMetricsPeriod() {
>   lruCacheStats.rollMetricsPeriod();
>   bucketCacheStats.rollMetricsPeriod();
> }
> {code}
> otherwise, CombinedBlockCache.getHitRatioPastNPeriods() and CombinedBlockCache.getHitCachingRatioPastNPeriods() will always return 0.
[jira] [Updated] (HBASE-14936) CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
[ https://issues.apache.org/jira/browse/HBASE-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianwei Cui updated HBASE-14936:
Attachment: HBASE-14936-trunk-v2.patch

Added the license header for TestCombinedBlockCache.java.

> CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
> --
>
> Key: HBASE-14936
> URL: https://issues.apache.org/jira/browse/HBASE-14936
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Affects Versions: 1.1.2
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Fix For: 2.0.0, 1.2, 1.1, 1.3
>
> Attachments: HBASE-14936-trunk-v1.patch, HBASE-14936-trunk-v2.patch, HBASE-14936-trunk.patch
>
> It seems CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod() as
> {code}
> public void rollMetricsPeriod() {
>   lruCacheStats.rollMetricsPeriod();
>   bucketCacheStats.rollMetricsPeriod();
> }
> {code}
> otherwise, CombinedBlockCache.getHitRatioPastNPeriods() and CombinedBlockCache.getHitCachingRatioPastNPeriods() will always return 0.
[jira] [Commented] (HBASE-14936) CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
[ https://issues.apache.org/jira/browse/HBASE-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050452#comment-15050452 ] Jianwei Cui commented on HBASE-14936:
There is no need to overwrite getHitRatio() because getHitCount() and getRequestCount() have been overwritten.

> CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod()
> --
>
> Key: HBASE-14936
> URL: https://issues.apache.org/jira/browse/HBASE-14936
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Affects Versions: 1.1.2
> Reporter: Jianwei Cui
>
> Attachments: HBASE-14936-trunk-v1.patch, HBASE-14936-trunk.patch
>
> It seems CombinedBlockCache should overwrite CacheStats#rollMetricsPeriod() as
> {code}
> public void rollMetricsPeriod() {
>   lruCacheStats.rollMetricsPeriod();
>   bucketCacheStats.rollMetricsPeriod();
> }
> {code}
> otherwise, CombinedBlockCache.getHitRatioPastNPeriods() and CombinedBlockCache.getHitCachingRatioPastNPeriods() will always return 0.
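The reasoning in the comment above, that a ratio accessor needs no override when it is derived from the count accessors, can be sketched with a hypothetical simplified stats hierarchy (not HBase's actual CacheStats classes):

```java
public class HitRatioDemo {
    // Base class computes the ratio from the count accessors, so a subclass
    // that overrides only the accessors gets a correct ratio for free.
    public static class Stats {
        public long getHitCount() { return 0; }
        public long getRequestCount() { return 0; }
        public double getHitRatio() {
            long requests = getRequestCount();
            return requests == 0 ? 0.0 : (double) getHitCount() / requests;
        }
    }

    // Combined stats override the counts (values here are hypothetical,
    // standing in for summed LRU + bucket cache counters).
    public static class Combined extends Stats {
        @Override public long getHitCount() { return 30; }
        @Override public long getRequestCount() { return 100; }
    }
}
```

Because `getHitRatio()` dispatches to the overridden accessors at runtime, `Combined` reports the combined ratio without overriding the ratio method itself; this is the template-method shape the comment relies on.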