Re: Uninitialized Message Exception thrown while getting values.
Hi,

Which version of HBase are you seeing this problem on? Do you have any protobuf classpath issues?

Regards,
Ram

On Thu, Jan 18, 2018 at 12:40 PM, Karthick Ram wrote:
> "UninitializedMessageException : Message missing required fields : region,
> get" is thrown while performing a Get. Because of this, all Get requests to
> the same RegionServer are stalled.
>
> com.google.protobuf.UninitializedMessageException: Message missing required fields : region, get
>   at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:6377)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:6309)
>   at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1840)
>   at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1775)
>   at org.apache.hadoop.hbase.ipc.RpcServer$Connection.process(RpcServer.java:1623)
>   at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1603)
>   at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:861)
>   at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:643)
>   at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:619)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
Uninitialized Message Exception thrown while getting values.
"UninitializedMessageException : Message missing required fields : region, get" is thrown while performing a Get. Because of this, all Get requests to the same RegionServer are stalled.

com.google.protobuf.UninitializedMessageException: Message missing required fields : region, get
  at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:6377)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:6309)
  at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1840)
  at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1775)
  at org.apache.hadoop.hbase.ipc.RpcServer$Connection.process(RpcServer.java:1623)
  at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1603)
  at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:861)
  at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:643)
  at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:619)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
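For context on the error itself: protobuf generated builders validate required fields in build(), so a GetRequest whose bytes decode without the region and get fields set throws before the RPC is ever dispatched. Below is a minimal, self-contained sketch of that required-field check; the GetRequestSketch class is a hypothetical stand-in, not protobuf's generated code, and it throws IllegalStateException where protobuf throws UninitializedMessageException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a protobuf-generated message and its builder.
// build() verifies required fields and throws if any are unset, mirroring
// the server-side failure in ClientProtos$GetRequest$Builder.build().
class GetRequestSketch {
    final String region;
    final String get;

    private GetRequestSketch(String region, String get) {
        this.region = region;
        this.get = get;
    }

    static class Builder {
        private String region;
        private String get;

        Builder setRegion(String r) { region = r; return this; }
        Builder setGet(String g) { get = g; return this; }

        GetRequestSketch build() {
            List<String> missing = new ArrayList<>();
            if (region == null) missing.add("region");
            if (get == null) missing.add("get");
            if (!missing.isEmpty()) {
                // protobuf raises UninitializedMessageException here; this
                // sketch models it with IllegalStateException.
                throw new IllegalStateException(
                    "Message missing required fields : " + String.join(", ", missing));
            }
            return new GetRequestSketch(region, get);
        }
    }
}
```

If both required fields decode as absent (for example, because the request bytes were truncated or mis-framed on the wire), build() fails exactly as in the stack trace above.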
[jira] [Comment Edited] (HBASE-19820) Restore public API compat of MiniHBaseCluster
[ https://issues.apache.org/jira/browse/HBASE-19820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329916#comment-16329916 ] Mike Drob edited comment on HBASE-19820 at 1/18/18 2:52 AM: If you say somebody is using it, let's add it back. Maybe good to consider a builder pattern for the mini cluster in the future? +1 > Restore public API compat of MiniHBaseCluster > - > > Key: HBASE-19820 > URL: https://issues.apache.org/jira/browse/HBASE-19820 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Major > Attachments: HBASE-19820.master.001.patch > > > HBASE-18352 removed a public constructor of MiniHBaseCluster. Adding it back. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19799) Add web UI to rsgroup
[ https://issues.apache.org/jira/browse/HBASE-19799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329917#comment-16329917 ] Guangxu Cheng commented on HBASE-19799: --- Attached 002 patch per [~yuzhih...@gmail.com]'s suggestions. > Add web UI to rsgroup > - > > Key: HBASE-19799 > URL: https://issues.apache.org/jira/browse/HBASE-19799 > Project: HBase > Issue Type: New Feature > Components: rsgroup, UI >Reporter: Guangxu Cheng >Assignee: Guangxu Cheng >Priority: Major > Attachments: HBASE-19799.master.001.patch, > HBASE-19799.master.002.patch, master_rsgroup.png, rsgroup_detail.png > > > When the RSGroup feature is enabled, there isn't a web UI to show the details > of a rsgroup. We can only view the details of the rsgroup via shell commands, > which is inconvenient. > This issue will add a web UI to rsgroup, to show the statistics and details of > each rsgroup. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19820) Restore public API compat of MiniHBaseCluster
[ https://issues.apache.org/jira/browse/HBASE-19820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329916#comment-16329916 ] Mike Drob commented on HBASE-19820: --- If you say somebody is using it, let's add it back. Maybe good to consider a builder pattern for the mini cluster in the future? > Restore public API compat of MiniHBaseCluster > - > > Key: HBASE-19820 > URL: https://issues.apache.org/jira/browse/HBASE-19820 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Major > Attachments: HBASE-19820.master.001.patch > > > HBASE-18352 removed a public constructor of MiniHBaseCluster. Adding it back. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19799) Add web UI to rsgroup
[ https://issues.apache.org/jira/browse/HBASE-19799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guangxu Cheng updated HBASE-19799: -- Attachment: HBASE-19799.master.002.patch > Add web UI to rsgroup > - > > Key: HBASE-19799 > URL: https://issues.apache.org/jira/browse/HBASE-19799 > Project: HBase > Issue Type: New Feature > Components: rsgroup, UI >Reporter: Guangxu Cheng >Assignee: Guangxu Cheng >Priority: Major > Attachments: HBASE-19799.master.001.patch, > HBASE-19799.master.002.patch, master_rsgroup.png, rsgroup_detail.png > > > When the RSGroup feature is enabled, there isn't a web UI to show the details > of a rsgroup. We can only view the details of the rsgroup via shell commands, > which is inconvenient. > This issue will add a web UI to rsgroup, to show the statistics and details of > each rsgroup. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19820) Restore public API compat of MiniHBaseCluster
[ https://issues.apache.org/jira/browse/HBASE-19820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329912#comment-16329912 ] Appy commented on HBASE-19820: -- Ping [~stack], [~mdrob] for review. > Restore public API compat of MiniHBaseCluster > - > > Key: HBASE-19820 > URL: https://issues.apache.org/jira/browse/HBASE-19820 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Major > Attachments: HBASE-19820.master.001.patch > > > HBASE-18352 removed a public constructor of MiniHBaseCluster. Adding it back. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19820) Restore public API compat of MiniHBaseCluster
[ https://issues.apache.org/jira/browse/HBASE-19820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy updated HBASE-19820: - Attachment: HBASE-19820.master.001.patch > Restore public API compat of MiniHBaseCluster > - > > Key: HBASE-19820 > URL: https://issues.apache.org/jira/browse/HBASE-19820 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Priority: Major > Attachments: HBASE-19820.master.001.patch > > > HBASE-18352 removed a public constructor of MiniHBaseCluster. Adding it back. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19820) Restore public API compat of MiniHBaseCluster
[ https://issues.apache.org/jira/browse/HBASE-19820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy updated HBASE-19820: - Assignee: Appy Status: Patch Available (was: Open) > Restore public API compat of MiniHBaseCluster > - > > Key: HBASE-19820 > URL: https://issues.apache.org/jira/browse/HBASE-19820 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Major > Attachments: HBASE-19820.master.001.patch > > > HBASE-18352 removed a public constructor of MiniHBaseCluster. Adding it back. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19820) Restore public API compat of MiniHBaseCluster
Appy created HBASE-19820: Summary: Restore public API compat of MiniHBaseCluster Key: HBASE-19820 URL: https://issues.apache.org/jira/browse/HBASE-19820 Project: HBase Issue Type: Improvement Reporter: Appy HBASE-18352 removed a public constructor of MiniHBaseCluster. Adding it back. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19813) clone_snapshot fails with region failing to open when RS group feature is enabled
[ https://issues.apache.org/jira/browse/HBASE-19813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329902#comment-16329902 ] Ted Yu commented on HBASE-19813: The RS group from cluster 1 doesn't exist in cluster 2. In this case, the clone should be allowed to proceed.

> clone_snapshot fails with region failing to open when RS group feature is enabled
> -
>
> Key: HBASE-19813
> URL: https://issues.apache.org/jira/browse/HBASE-19813
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Major
>
> The following scenario came from a support case.
> In cluster 1, create RS group rsg. Move a table to the rsg group.
> Take a snapshot of the table and copy the snapshot to cluster 2, where there is no group called rsg.
> Cloning the snapshot to table new_t4 on cluster 2 fails:
> {code}
> 2018-01-09 11:45:30,468 INFO [RestoreSnapshot-pool68-t1] regionserver.HRegion: Closed new_t4,,1514454789243.a6173d2955182ac5bde208301681c6af.
> 2018-01-09 11:45:30,468 INFO [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] snapshot.CloneSnapshotHandler: Clone snapshot=snap_t3 on table=new_t4 completed!
> 2018-01-09 11:45:30,492 INFO [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] hbase.MetaTableAccessor: Added 1
> 2018-01-09 11:45:30,492 WARN [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] rsgroup.RSGroupBasedLoadBalancer: Group for table new_t4 is null
> 2018-01-09 11:45:30,492 DEBUG [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] rsgroup.RSGroupBasedLoadBalancer: Group Information found to be null. Some regions might be unassigned.
> 2018-01-09 11:45:30,492 WARN [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] master.RegionStates: Failed to open/close a6173d2955182ac5bde208301681c6af on null, set to FAILED_OPEN
> {code}
> Here is the related code from RSGroupBasedLoadBalancer:
> {code}
> List candidateList = filterOfflineServers(info, servers);
> for (RegionInfo region : regionList) {
>   currentAssignmentMap.put(region, regions.get(region));
> }
> if (candidateList.size() > 0) {
>   assignments.putAll(this.internalBalancer.retainAssignment(
>       currentAssignmentMap, candidateList));
> {code}
> candidateList is empty for table new_t4, leaving the region for the table in FAILED_OPEN state.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
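Ted's suggestion above (let the clone proceed when the source cluster's RS group doesn't exist here) could be sketched as a guard around the candidate-list computation: if the table's group yields no servers, fall back to all live servers instead of leaving an empty list. This is a hypothetical helper, not HBase code; the class, method, and use of plain Strings for servers are illustrative only.

```java
import java.util.List;

class GroupCandidateSketch {
    // Hypothetical guard: when the table's RS group is unknown or empty
    // (e.g. a snapshot cloned from a cluster whose group does not exist
    // here), returning an empty candidate list leaves the region stuck in
    // FAILED_OPEN. Falling back to all live servers lets assignment proceed
    // on the default group instead.
    static List<String> candidateServers(List<String> groupServers, List<String> allServers) {
        if (groupServers == null || groupServers.isEmpty()) {
            return allServers; // fallback so the clone can proceed
        }
        return groupServers;
    }
}
```

With a guard like this, the `if (candidateList.size() > 0)` branch in the quoted balancer code would always have at least the default servers to work with.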
[jira] [Commented] (HBASE-19813) clone_snapshot fails with region failing to open when RS group feature is enabled
[ https://issues.apache.org/jira/browse/HBASE-19813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329899#comment-16329899 ] Guangxu Cheng commented on HBASE-19813: --- Duplicate of HBASE-17785?

> clone_snapshot fails with region failing to open when RS group feature is enabled
> -
>
> Key: HBASE-19813
> URL: https://issues.apache.org/jira/browse/HBASE-19813
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Priority: Major
>
> The following scenario came from a support case.
> In cluster 1, create RS group rsg. Move a table to the rsg group.
> Take a snapshot of the table and copy the snapshot to cluster 2, where there is no group called rsg.
> Cloning the snapshot to table new_t4 on cluster 2 fails:
> {code}
> 2018-01-09 11:45:30,468 INFO [RestoreSnapshot-pool68-t1] regionserver.HRegion: Closed new_t4,,1514454789243.a6173d2955182ac5bde208301681c6af.
> 2018-01-09 11:45:30,468 INFO [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] snapshot.CloneSnapshotHandler: Clone snapshot=snap_t3 on table=new_t4 completed!
> 2018-01-09 11:45:30,492 INFO [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] hbase.MetaTableAccessor: Added 1
> 2018-01-09 11:45:30,492 WARN [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] rsgroup.RSGroupBasedLoadBalancer: Group for table new_t4 is null
> 2018-01-09 11:45:30,492 DEBUG [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] rsgroup.RSGroupBasedLoadBalancer: Group Information found to be null. Some regions might be unassigned.
> 2018-01-09 11:45:30,492 WARN [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] master.RegionStates: Failed to open/close a6173d2955182ac5bde208301681c6af on null, set to FAILED_OPEN
> {code}
> Here is the related code from RSGroupBasedLoadBalancer:
> {code}
> List candidateList = filterOfflineServers(info, servers);
> for (RegionInfo region : regionList) {
>   currentAssignmentMap.put(region, regions.get(region));
> }
> if (candidateList.size() > 0) {
>   assignments.putAll(this.internalBalancer.retainAssignment(
>       currentAssignmentMap, candidateList));
> {code}
> candidateList is empty for table new_t4, leaving the region for the table in FAILED_OPEN state.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19780) Change execution phase of checkstyle plugin back to default 'verify'
[ https://issues.apache.org/jira/browse/HBASE-19780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329895#comment-16329895 ] Appy commented on HBASE-19780: -- On master, mvn install takes ~3 min and mvn checkstyle:checkstyle takes ~2 min, so if we run both, checkstyle will be roughly 30-40% of the build time. Anyway, can we make that decision separately, please? See HBASE-19819. This patch at least solves the problem with the build.

> Change execution phase of checkstyle plugin back to default 'verify'
> -
>
> Key: HBASE-19780
> URL: https://issues.apache.org/jira/browse/HBASE-19780
> Project: HBase
> Issue Type: Bug
> Reporter: Appy
> Assignee: Appy
> Priority: Major
> Attachments: HBASE-19780.master.001.patch, HBASE-19780.master.002.patch, HBASE-19780.master.003.patch
>
> Not able to run the following command successfully:
> {{mvn -DskipTests install site -Dmaven.repo.local=/Users/appy/Desktop/temp_repo}}
> Use a clean separate repo so that existing packages don't pollute the build. The error is:
> {noformat}
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on project hbase: failed to get report for org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (checkstyle) on project hbase-error-prone: Execution checkstyle of goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check failed: Plugin org.apache.maven.plugins:maven-checkstyle-plugin:2.17 or one of its dependencies could not be resolved: Failure to find org.apache.hbase:hbase-checkstyle:jar:2.0.0-beta-1 in http://repository.apache.org/snapshots/ was cached in the local repository, resolution will not be reattempted until the update interval of apache.snapshots has elapsed or updates are forced -> [Help 1]
> {noformat}
> Note that the master build gets past this point.
> Need to figure out what the difference is and fix the overall build.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19819) Decide the place of Checkstyle in build flow
Appy created HBASE-19819: Summary: Decide the place of Checkstyle in build flow Key: HBASE-19819 URL: https://issues.apache.org/jira/browse/HBASE-19819 Project: HBase Issue Type: Improvement Reporter: Appy

Ref: https://issues.apache.org/jira/browse/HBASE-19780

Main questions:
# Should checkstyle (CS) be part of {{mvn install}}? On master, mvn install (without clean) takes ~3 min and {{mvn checkstyle:checkstyle}} takes ~2 min. I think the reason it's not part of the default build might be that our project isn't clean, and failing because of the existing 10k CS issues is useless. Maybe there's no trivial way of reporting just the new CS issues, and that's why we depend on QA (which gives just the diff) for checkstyle?
# How do we avoid regressions in modules which have been sanitized by [~Jan Hentschel]? Here's a suggestion building on his:
** Let's add a recommendation in the documentation to run mvn checkstyle:check before submitting patches, since it'll catch CS violations in modules which are perfectly clean.
** Add checkstyle:check as part of the main pre-commit build. If there is any violation in these clean modules (towards which great effort has been put), then the pre-commit will also fail at the mvn install step, which is an important one. Thus, clean CS in these modules indirectly becomes a hard pre-commit requirement.

Let's put a note on dev@ proposing these changes.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
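For reference, binding checkstyle's check goal to its default verify phase (the change HBASE-19780 discusses) looks roughly like this in a pom. This is a sketch only: the 2.17 plugin version comes from the build logs elsewhere in this thread, and the rest is an illustrative assumption, not HBase's actual pom.

```xml
<!-- Sketch only: binds checkstyle:check to the 'verify' phase, which is
     the plugin's default binding for that goal. Config location and other
     settings are omitted; this is not copied from the HBase pom. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <version>2.17</version>
  <executions>
    <execution>
      <id>checkstyle</id>
      <phase>verify</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With this binding, `mvn install` runs the check (verify precedes install in the default lifecycle), while `mvn compile` or `mvn test` skip it.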
[jira] [Updated] (HBASE-19818) Scan time limit not work if the filter always filter row key
[ https://issues.apache.org/jira/browse/HBASE-19818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-19818: --- Environment: (was: [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java] nextInternal() method.

{code}
// Check if rowkey filter wants to exclude this row. If so, loop to next.
// Technically, if we hit limits before on this row, we don't need this call.
if (filterRowKey(current)) {
  incrementCountOfRowsFilteredMetric(scannerContext);
  // early check, see HBASE-16296
  if (isFilterDoneInternal()) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  // Typically the count of rows scanned is incremented inside #populateResult. However,
  // here we are filtering a row based purely on its row key, preventing us from calling
  // #populateResult. Thus, perform the necessary increment here to rows scanned metric
  incrementCountOfRowsScannedMetric(scannerContext);
  boolean moreRows = nextRow(scannerContext, current);
  if (!moreRows) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  results.clear();
  continue;
}
// Ok, we are good, let's try to get some results from the main heap.
populateResult(results, this.storeHeap, scannerContext, current);
if (scannerContext.checkAnyLimitReached(LimitScope.BETWEEN_CELLS)) {
  if (hasFilterRow) {
    throw new IncompatibleFilterException(
      "Filter whose hasFilterRow() returns true is incompatible with scans that must " +
      " stop mid-row because of a limit. ScannerContext:" + scannerContext);
  }
  return true;
}
{code}

If filterRowKey always returns true, then checkAnyLimitReached is skipped. For the batch/size limits, it is OK to skip, as we don't read anything. But for the time limit, it is not right. If the filter always filters the row key, we will be stuck here for a long time.)
> Scan time limit not work if the filter always filter row key > > > Key: HBASE-19818 > URL: https://issues.apache.org/jira/browse/HBASE-19818 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19818) Scan time limit not work if the filter always filter row key
[ https://issues.apache.org/jira/browse/HBASE-19818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-19818: --- Description: [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java] nextInternal() method.

{code:java}
// Check if rowkey filter wants to exclude this row. If so, loop to next.
// Technically, if we hit limits before on this row, we don't need this call.
if (filterRowKey(current)) {
  incrementCountOfRowsFilteredMetric(scannerContext);
  // early check, see HBASE-16296
  if (isFilterDoneInternal()) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  // Typically the count of rows scanned is incremented inside #populateResult. However,
  // here we are filtering a row based purely on its row key, preventing us from calling
  // #populateResult. Thus, perform the necessary increment here to rows scanned metric
  incrementCountOfRowsScannedMetric(scannerContext);
  boolean moreRows = nextRow(scannerContext, current);
  if (!moreRows) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  results.clear();
  continue;
}
// Ok, we are good, let's try to get some results from the main heap.
populateResult(results, this.storeHeap, scannerContext, current);
if (scannerContext.checkAnyLimitReached(LimitScope.BETWEEN_CELLS)) {
  if (hasFilterRow) {
    throw new IncompatibleFilterException(
      "Filter whose hasFilterRow() returns true is incompatible with scans that must " +
      " stop mid-row because of a limit. ScannerContext:" + scannerContext);
  }
  return true;
}
{code}

If filterRowKey always returns true, then checkAnyLimitReached is skipped. For the batch/size limits, it is OK to skip, as we don't read anything. But for the time limit, it is not right. If the filter always filters the row key, we will be stuck here for a long time.
> Scan time limit not work if the filter always filter row key
> -
>
> Key: HBASE-19818
> URL: https://issues.apache.org/jira/browse/HBASE-19818
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Major
>
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java]
> nextInternal() method.
> {code:java}
> // Check if rowkey filter wants to exclude this row. If so, loop to next.
> // Technically, if we hit limits before on this row, we don't need this call.
> if (filterRowKey(current)) {
>   incrementCountOfRowsFilteredMetric(scannerContext);
>   // early check, see HBASE-16296
>   if (isFilterDoneInternal()) {
>     return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
>   }
>   // Typically the count of rows scanned is incremented inside #populateResult. However,
>   // here we are filtering a row based purely on its row key, preventing us from calling
>   // #populateResult. Thus, perform the necessary increment here to rows scanned metric
>   incrementCountOfRowsScannedMetric(scannerContext);
>   boolean moreRows = nextRow(scannerContext, current);
>   if (!moreRows) {
>     return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
>   }
>   results.clear();
>   continue;
> }
> // Ok, we are good, let's try to get some results from the main heap.
> populateResult(results, this.storeHeap, scannerContext, current);
> if (scannerContext.checkAnyLimitReached(LimitScope.BETWEEN_CELLS)) {
>   if (hasFilterRow) {
>     throw new IncompatibleFilterException(
>       "Filter whose hasFilterRow() returns true is incompatible with scans that must " +
>       " stop mid-row because of a limit. ScannerContext:" + scannerContext);
>   }
>   return true;
> }
> {code}
> If filterRowKey always returns true, then checkAnyLimitReached is skipped. For the batch/size limits, it is OK to skip, as we don't read anything. But for the time limit, it is not right. If the filter always filters the row key, we will be stuck here for a long time.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
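The issue can be modeled in miniature. Below is a simplified, self-contained sketch of the nextInternal() early-continue path (not HBase's real ScannerContext or fix); it shows one possible shape of the remedy, checking the time limit on the row-key-filtered path so the loop returns to the client instead of spinning until the region is exhausted.

```java
import java.util.concurrent.TimeUnit;

class ScanTimeLimitSketch {
    // Simplified model of a scan loop where the filter rejects every row key.
    // Without a time check on the early-continue path, this loop would spin
    // until the (potentially huge) region runs out of rows. Returns true when
    // the time limit stops the loop.
    static boolean scanRespectsTimeLimit(long timeLimitMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeLimitMs);
        while (true) {
            boolean rowKeyFiltered = true; // models a filter rejecting every row
            if (rowKeyFiltered) {
                // One possible fix: check the time limit here, on the path that
                // skips populateResult() and checkAnyLimitReached().
                if (System.nanoTime() >= deadline) {
                    return true; // hand an (empty) heartbeat result back to the client
                }
                continue; // advance to the next row without reading any cells
            }
        }
    }
}
```

Running it with a small limit returns promptly, which is exactly what the reported code fails to do when every row key is filtered.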
[jira] [Created] (HBASE-19818) Scan time limit not work if the filter always filter row key
Guanghao Zhang created HBASE-19818: -- Summary: Scan time limit not work if the filter always filter row key Key: HBASE-19818 URL: https://issues.apache.org/jira/browse/HBASE-19818 Project: HBase Issue Type: Bug Environment: [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java] nextInternal() method.

{code}
// Check if rowkey filter wants to exclude this row. If so, loop to next.
// Technically, if we hit limits before on this row, we don't need this call.
if (filterRowKey(current)) {
  incrementCountOfRowsFilteredMetric(scannerContext);
  // early check, see HBASE-16296
  if (isFilterDoneInternal()) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  // Typically the count of rows scanned is incremented inside #populateResult. However,
  // here we are filtering a row based purely on its row key, preventing us from calling
  // #populateResult. Thus, perform the necessary increment here to rows scanned metric
  incrementCountOfRowsScannedMetric(scannerContext);
  boolean moreRows = nextRow(scannerContext, current);
  if (!moreRows) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  results.clear();
  continue;
}
// Ok, we are good, let's try to get some results from the main heap.
populateResult(results, this.storeHeap, scannerContext, current);
if (scannerContext.checkAnyLimitReached(LimitScope.BETWEEN_CELLS)) {
  if (hasFilterRow) {
    throw new IncompatibleFilterException(
      "Filter whose hasFilterRow() returns true is incompatible with scans that must " +
      " stop mid-row because of a limit. ScannerContext:" + scannerContext);
  }
  return true;
}
{code}

If filterRowKey always returns true, then checkAnyLimitReached is skipped. For the batch/size limits, it is OK to skip, as we don't read anything. But for the time limit, it is not right. If the filter always filters the row key, we will be stuck here for a long time.
Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19781) Add a new cluster state flag for synchronous replication
[ https://issues.apache.org/jira/browse/HBASE-19781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329871#comment-16329871 ] Guanghao Zhang commented on HBASE-19781: Reattach 003 for hadoop QA. > Add a new cluster state flag for synchronous replication > > > Key: HBASE-19781 > URL: https://issues.apache.org/jira/browse/HBASE-19781 > Project: HBase > Issue Type: Sub-task > Components: Replication >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: HBASE-19064 > > Attachments: HBASE-19781.HBASE-19064.001.patch, > HBASE-19781.HBASE-19064.002.patch, HBASE-19781.HBASE-19064.003.patch, > HBASE-19781.HBASE-19064.003.patch > > > The state may be S, DA, or A. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19781) Add a new cluster state flag for synchronous replication
[ https://issues.apache.org/jira/browse/HBASE-19781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-19781: --- Attachment: HBASE-19781.HBASE-19064.003.patch > Add a new cluster state flag for synchronous replication > > > Key: HBASE-19781 > URL: https://issues.apache.org/jira/browse/HBASE-19781 > Project: HBase > Issue Type: Sub-task > Components: Replication >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: HBASE-19064 > > Attachments: HBASE-19781.HBASE-19064.001.patch, > HBASE-19781.HBASE-19064.002.patch, HBASE-19781.HBASE-19064.003.patch, > HBASE-19781.HBASE-19064.003.patch > > > The state may be S, DA, or A. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329863#comment-16329863 ] Appy commented on HBASE-19527: -- We can't make the production use case worse (abrupt endings on shutdown and unnecessary recoveries on restart) for the sake of extra testing. The former is far more important than the latter.

bq. The Master or RegionServer threads determine whether we should go down or not. If they are stopped or aborted, then all else should go down. Lets not be having to do a decision-per-thread on when to go down (this gets really hard to do... sometimes its exit if process is stopped, other times it is if cluster is up or down, and other combos...).

I agree it's hard to do for everything, but it's worth doing (in this case, keeping what is) for a few critical systems. ProcExecutor is critical and will only become more important with time as more sub-systems (replication, backup, etc.) move to it. I don't mind the change in ExecutorService (mostly because I don't know enough to make a case for it, nor have time to dig). Among other thread pools, RPC executors for user requests are probably even less important and can go down randomly (not relevant here, but trying to think holistically to bring good points to the table).

bq. If a worker thread is doing something that it can't give up, that we cannot recover from, thats a problem; lets find it sooner rather than later given threads can exit any which way at any time.

True, we could use more fault-tolerance testing. But the answer to that should be adding more fault-tolerance testing, rather than making the system nondeterministic.

bq. Finding all the combinations, the code paths that lead to an exit, and exits concurrent with various combinations of operations, would be too much work; we'd never achieve complete coverage – I suggest.

Yeah, we can never do that. If we could, we wouldn't need the "guard".

bq. Suggest we try this and then watch the flakies a while... Can revert if a bad idea.

The alternative is having extra complexity for cleaner shutdown and restart. How will changes in the flakies justify for or against that? Btw, all our ProcFramework's fault tolerance testing does join() on these threads, so making them daemon doesn't make those tests any better.

> Make ExecutorService threads daemon=true.
> -
>
> Key: HBASE-19527
> URL: https://issues.apache.org/jira/browse/HBASE-19527
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19527.branch-2.001.patch, HBASE-19527.branch-2.002.patch, HBASE-19527.master.001.patch, HBASE-19527.master.001.patch, HBASE-19527.master.001.patch, HBASE-19527.master.002.patch
>
> Let me try this. ExecutorService runs OPENs, CLOSE, etc. If Server is going down, no point in these threads sticking around (I think). Let me try this.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
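For readers following the daemon-thread debate: a daemon pool is typically built with a custom ThreadFactory, and the JVM will exit without waiting for such a pool's tasks, which is exactly the trade-off argued above (faster shutdown vs. tasks ended abruptly mid-run). A minimal sketch; class and names are illustrative, not the HBASE-19527 patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

class DaemonPoolSketch {
    // Daemon threads do not keep the JVM alive: once only daemon threads
    // remain, the process exits and their in-flight tasks are abandoned.
    // Making an executor's workers daemon therefore trades clean task
    // completion for not blocking process exit.
    static ExecutorService newDaemonPool(int threads, String namePrefix) {
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, namePrefix);
            t.setDaemon(true); // must be set before start()
            return t;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }
}
```

Tasks submitted to such a pool observe Thread.currentThread().isDaemon() == true, and a main() that returns while they run will not wait for them.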
[jira] [Commented] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329850#comment-16329850 ]

Hadoop QA commented on HBASE-19527:
-----------------------------------

(/) +1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
|| Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| branch-2 Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 39s | branch-2 passed |
| +1 | compile | 1m 4s | branch-2 passed |
| +1 | checkstyle | 1m 26s | branch-2 passed |
| +1 | shadedjars | 5m 46s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 47s | branch-2 passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 3s | the patch passed |
| +1 | compile | 1m 4s | the patch passed |
| +1 | javac | 1m 4s | the patch passed |
| +1 | checkstyle | 0m 15s | The patch hbase-procedure passed checkstyle |
| +1 | checkstyle | 1m 8s | hbase-server: The patch generated 0 new + 21 unchanged - 1 fixed = 21 total (was 22) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 25s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 17m 6s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | javadoc | 0m 50s | the patch passed |
|| Other Tests ||
| +1 | unit | 3m 11s | hbase-procedure in the patch passed. |
| +1 | unit | 96m 1s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 137m 45s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db |
| JIRA Issue | HBASE-19527 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906490/HBASE-19527.branch-2.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 46524693c5e5 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2 / af2d890055 |
| maven
[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329832#comment-16329832 ]

Duo Zhang commented on HBASE-19803:
-----------------------------------

It seems to be a surefire issue. I ran mvn test locally in the hbase-server module; TestJMXConnectorServer fails, which is a known issue, and then lots of crashes follow. This is one of the failed UTs:

{noformat}
Error occurred in starting fork, check output in log
Process Exit Code: 1
Crashed tests:
org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions
	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:496)
	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:443)
	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:295)
	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1124)
	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:954)
	... 23 more
Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Command was /bin/sh -c cd /home/zhangduo/hbase/code/hbase-server && /home/zhangduo/opt/jdk1.8.0_151/jre/bin/java -enableassertions -Dhbase.build.id=2018-01-17T22:44:23Z -Xmx2800m -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -jar /home/zhangduo/hbase/code/hbase-server/target/surefire/surefirebooter3125641250160453662.jar /home/zhangduo/hbase/code/hbase-server/target/surefire 2018-01-18T06-44-36_642-jvmRun2 surefire7506668156192398602tmp surefire_14263036952065448117423tmp
{noformat}

And I checked org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions-output.txt; the only place where we call System.exit is:

{noformat}
org.apache.hadoop.hbase.HConstants$ExitException: There is no escape!
	at org.apache.hadoop.hbase.HConstants$NoExitSecurityManager.checkExit(HConstants.java:63)
	at java.lang.Runtime.halt(Runtime.java:273)
	at org.apache.maven.surefire.booter.ForkedBooter.kill(ForkedBooter.java:300)
	at org.apache.maven.surefire.booter.ForkedBooter.kill(ForkedBooter.java:294)
	at org.apache.maven.surefire.booter.ForkedBooter.access$300(ForkedBooter.java:68)
	at org.apache.maven.surefire.booter.ForkedBooter$4.update(ForkedBooter.java:247)
	at org.apache.maven.surefire.booter.CommandReader$CommandRunnable.insertToListeners(CommandReader.java:475)
	at org.apache.maven.surefire.booter.CommandReader$CommandRunnable.run(CommandReader.java:421)
	at java.lang.Thread.run(Thread.java:748)
{noformat}

Notice that here we only log the exception without rethrowing it when the exit is requested by the surefire plugin. So the fork was killed by the surefire plugin itself? And then the surefire plugin tells us the VM exited abnormally...
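The NoExitSecurityManager trick seen in that stack trace (turning System.exit into a catchable exception so a test cannot take the forked JVM down) can be sketched roughly as below. This is an illustrative reconstruction, not the actual HConstants code; note also that the SecurityManager API is deprecated and disabled by default on recent JDKs, so installing one there requires -Djava.security.manager=allow.

```java
import java.security.Permission;

public class NoExitDemo {
  // Thrown from checkExit instead of letting the JVM terminate.
  static class ExitException extends SecurityException {
    final int status;
    ExitException(int status) {
      super("There is no escape!");
      this.status = status;
    }
  }

  // A SecurityManager whose only job is to veto System.exit().
  static class NoExitSecurityManager extends SecurityManager {
    @Override
    public void checkPermission(Permission perm) {
      // permit everything else
    }
    @Override
    public void checkExit(int status) {
      throw new ExitException(status);
    }
  }

  public static void main(String[] args) {
    System.setSecurityManager(new NoExitSecurityManager());
    try {
      System.exit(42); // intercepted: checkExit throws before the JVM dies
    } catch (ExitException e) {
      System.out.println("blocked exit with status " + e.status);
    } finally {
      System.setSecurityManager(null);
    }
  }
}
```

The subtlety Duo points out is in the catch side: if the exit request came from the surefire plugin's own kill path rather than from test code, the exception is only logged, so surefire halts the fork and then reports the abnormal VM exit it caused itself.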
> False positive for the HBASE-Find-Flaky-Tests job
> -------------------------------------------------
>
> Key: HBASE-19803
> URL: https://issues.apache.org/jira/browse/HBASE-19803
> Project: HBase
> Issue Type: Bug
> Reporter: Duo Zhang
> Priority: Major
>
> It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the surefire output:
> https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was likely killed in the middle of the run, within 20 seconds.
> https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was also killed, within about 1 minute.
> The test is declared as LargeTests, so the time limit should be 10 minutes. It seems that the JVM may crash during the mvn test run; we then kill all the running tests, and we may mark some of them as hung, which leads to the false positives.
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Attachment:     (was: HBASE-17852-v6.patch)

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
> ------------------------------------------------------------------------------------
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v10.patch, screenshot-1.png
>
> Design approach: rollback-via-snapshot, implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the backup meta-table (the backup system table). This procedure is lightweight because the meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by cleaning up partial data in the backup destination, followed by restoring the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for example), the next time the user tries create/merge/delete they will see an error message saying the system is in an inconsistent state and repair is required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and BackupObservers), we introduce a small table ONLY to keep the listing of bulk loaded files. All backup observers will work only with this new table. The reason: on a failure during backup create/delete/merge/restore, when the system performs the automatic rollback, some data written by backup observers during the failed operation could be lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care about the consistency of this table, because bulk load is an idempotent operation and can be repeated after a failure. Partially written data in the second table does not affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded files) corresponds to files which have not yet been loaded successfully and hence are not visible to the system.
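The rollback-via-snapshot flow in points 1-2 of the description above amounts to the following control structure. This is a generic sketch with hypothetical names standing in for the backup system table and the HBase snapshot machinery, not the actual patch:

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotRollbackDemo {
  // Stand-in for the backup meta-table: a small key/value state.
  private final Map<String, String> metaTable = new HashMap<>();
  private Map<String, String> snapshot;

  void takeSnapshot() {
    // Cheap because the meta table is small (usually a single region).
    snapshot = new HashMap<>(metaTable);
  }

  void restoreSnapshot() {
    metaTable.clear();
    metaTable.putAll(snapshot);
  }

  // Run a backup create/delete/merge; on server-side failure, roll back.
  boolean runBackupOperation(Runnable op) {
    takeSnapshot();
    try {
      op.run();
      return true;
    } catch (RuntimeException serverSideFailure) {
      // 1) clean up partial data in the backup destination (elided here)
      // 2) restore the meta table from the snapshot
      restoreSnapshot();
      return false;
    }
  }

  Map<String, String> state() { return metaTable; }

  public static void main(String[] args) {
    SnapshotRollbackDemo demo = new SnapshotRollbackDemo();
    demo.state().put("backup:1", "COMPLETE");
    boolean ok = demo.runBackupOperation(() -> {
      demo.state().put("backup:2", "RUNNING"); // partial write
      throw new RuntimeException("server-side failure");
    });
    // The failed operation's partial write to the meta table is gone.
    System.out.println(ok + " " + demo.state());
  }
}
```

Point 4 then follows from this structure: anything the BackupObservers wrote into the rolled-back table during the failed operation would be lost, which is why bulk-load references live in a separate table where lost or partial writes are harmless.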
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Attachment:     (was: HBASE-17852-v7.patch)
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Status:     Patch Available  (was: Open)
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Attachment:     (was: HBASE-17852-v8.patch)
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Attachment:     (was: HBASE-17852-v9.patch)
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Status:     Open  (was: Patch Available)
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Attachment:     (was: HBASE-17852-v3.patch)
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Attachment:     (was: HBASE-17852-v2.patch)
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Attachment:     (was: HBASE-17852-v5.patch)
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov updated HBASE-17852:
--------------------------------------
    Attachment:     (was: HBASE-17852-v4.patch)
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-17852: -- Attachment: (was: HBASE-17852-v1.patch)

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v10.patch, HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, HBASE-17852-v9.patch, screenshot-1.png
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the backup meta-table (the backup system table). This procedure is lightweight because the meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by cleaning up partial data in the backup destination, followed by restoring the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for example), the next time the user tries create/merge/delete they will see an error message that the system is in an inconsistent state and repair is required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and BackupObservers), we introduce a small table ONLY to keep the listing of bulk loaded files. All backup observers will work only with this new table. The reason: in case of a failure during backup create/delete/merge/restore, when the system performs automatic rollback, some data written by backup observers during the failed operation may be lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care about the consistency of this table, because bulk load is an idempotent operation and can be repeated after failure. Partially written data in the second table does not affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded files) corresponds to files which have not yet been loaded successfully and hence are not visible to the system.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
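The rollback-via-snapshot control flow described above (snapshot the small backup meta-table before the operation starts, restore it if the operation fails) would use HBase snapshot facilities in the real patch; as a rough, JDK-only illustration of just the pattern, here is a sketch in which the "meta-table" and its "snapshot" are reduced to an in-memory map copy. The class and method names are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class RollbackViaSnapshot {
    // Stand-in for the backup meta-table: small, so "snapshotting" is cheap.
    private Map<String, String> backupMeta = new HashMap<>();

    public Map<String, String> meta() { return backupMeta; }

    /**
     * Runs a backup operation under snapshot protection: take a snapshot
     * first, and if the operation throws, roll the meta-table back to it.
     */
    public void runWithRollback(Runnable operation) {
        Map<String, String> snapshot = new HashMap<>(backupMeta); // lightweight: table is small
        try {
            operation.run();
        } catch (RuntimeException e) {
            backupMeta = snapshot; // restore meta-table from the snapshot
            throw e;               // surface the failure to the caller
        }
    }

    public static void main(String[] args) {
        RollbackViaSnapshot sys = new RollbackViaSnapshot();
        sys.meta().put("session", "none");
        try {
            sys.runWithRollback(() -> {
                sys.meta().put("session", "backup-1"); // partial server-side write...
                throw new RuntimeException("server-side failure");
            });
        } catch (RuntimeException expected) { }
        System.out.println(sys.meta().get("session")); // prints "none": rolled back
    }
}
```

In the actual ticket the snapshot/restore would be done with HBase table snapshots of the backup system table, not a map copy; the sketch only shows why the restore makes partial writes disappear.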
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329767#comment-16329767 ] Hadoop QA commented on HBASE-17852: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HBASE-17852 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.6.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HBASE-17852 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12887458/HBASE-17852-v1.patch | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11096/console | | Powered by | Apache Yetus 0.6.0 http://yetus.apache.org | This message was automatically generated. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v10.patch, > HBASE-17852-v2.patch, HBASE-17852-v3.patch, HBASE-17852-v4.patch, > HBASE-17852-v5.patch, HBASE-17852-v6.patch, HBASE-17852-v7.patch, > HBASE-17852-v8.patch, HBASE-17852-v9.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. 
> # When an operation fails on the server side, we handle the failure by cleaning up partial data in the backup destination, followed by restoring the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for example), the next time the user tries create/merge/delete they will see an error message that the system is in an inconsistent state and repair is required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and BackupObservers), we introduce a small table ONLY to keep the listing of bulk loaded files. All backup observers will work only with this new table. The reason: in case of a failure during backup create/delete/merge/restore, when the system performs automatic rollback, some data written by backup observers during the failed operation may be lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care about the consistency of this table, because bulk load is an idempotent operation and can be repeated after failure. Partially written data in the second table does not affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded files) corresponds to files which have not yet been loaded successfully and hence are not visible to the system.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19806) Lower max versions for selected system table column family
[ https://issues.apache.org/jira/browse/HBASE-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329762#comment-16329762 ] Hadoop QA commented on HBASE-19806: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 4s{color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 1s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.6.0/precommit-patchnames for instructions. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 27s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 7s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 55s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 22m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 98m 32s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}146m 38s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19806 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906492/19806.v1.txt | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux f00881e6f059 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 53d0c2388d | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11093/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output |
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-17852: -- Attachment: HBASE-17852-v10.patch

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v10.patch, HBASE-17852-v2.patch, HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, HBASE-17852-v9.patch, screenshot-1.png
>
> Design approach rollback-via-snapshot implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the backup meta-table (the backup system table). This procedure is lightweight because the meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by cleaning up partial data in the backup destination, followed by restoring the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for example), the next time the user tries create/merge/delete they will see an error message that the system is in an inconsistent state and repair is required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and BackupObservers), we introduce a small table ONLY to keep the listing of bulk loaded files. All backup observers will work only with this new table. The reason: in case of a failure during backup create/delete/merge/restore, when the system performs automatic rollback, some data written by backup observers during the failed operation may be lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care about the consistency of this table, because bulk load is an idempotent operation and can be repeated after failure. Partially written data in the second table does not affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded files) corresponds to files which have not yet been loaded successfully and hence are not visible to the system.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329761#comment-16329761 ] stack commented on HBASE-19527: --- The Master or RegionServer threads determine whether we should go down or not. If they are stopped or aborted, then all else should go down. Let's not have to make a decision-per-thread on when to go down (this gets really hard to do... sometimes it's exit if the process is stopped, other times it is if the cluster is up or down, and other combos...). If a worker thread is doing something that it can't give up, that we cannot recover from, that's a problem; let's find it sooner rather than later, given threads can exit any which way at any time. {quote}... it'll crash end many background work like create table, merge regions, and anything that we aim to build on top of proc framework - backup, replication, etc {quote} Yeah. We pick up the work again when the Master comes back up. {quote} - Pro: Reliably ending ongoing work at defined sync points{quote} Finding all the combinations, the code paths that lead to an exit, and exits concurrent with various combinations of operations, would be too much work; we'd never achieve complete coverage, I suggest. {quote}Start a ShutdownMonitor thread in HMaster.stop() (which should be Daemon thread) and if it finds itself running for more than X seconds, then call System.exit() (with a nice msg on why such abruptness of course). {quote} Extra complexity in my view. We have shutdown handlers and too many threads already. Suggest we try this and then watch the flakies a while... Can revert if a bad idea. > Make ExecutorService threads daemon=true.
> - > > Key: HBASE-19527 > URL: https://issues.apache.org/jira/browse/HBASE-19527 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19527.branch-2.001.patch, > HBASE-19527.branch-2.002.patch, HBASE-19527.master.001.patch, > HBASE-19527.master.001.patch, HBASE-19527.master.001.patch, > HBASE-19527.master.002.patch > > > Let me try this. ExecutorService runs OPENs, CLOSE, etc. If Server is going > down, no point in these threads sticking around (I think). Let me try this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
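What the patch is driving at — ExecutorService worker threads that cannot hold the JVM open once the Master or RegionServer has decided to go down — comes down to supplying a ThreadFactory that marks its threads as daemons. A minimal JDK-only sketch (the class and the thread-name prefix are illustrative, not the actual patch code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonExecutors {
    /** ThreadFactory whose threads never block JVM exit. */
    public static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger seq = new AtomicInteger();
        return r -> {
            Thread t = new Thread(r, prefix + "-" + seq.incrementAndGet());
            t.setDaemon(true); // JVM may exit even while this thread is running
            return t;
        };
    }

    /** Fixed pool whose OPEN/CLOSE-style workers die with the process. */
    public static ExecutorService newDaemonPool(String prefix, int threads) {
        return Executors.newFixedThreadPool(threads, daemonFactory(prefix));
    }
}
```

The trade-off debated on this issue is exactly the daemon flag's semantics: with it set, a stuck worker cannot pin the process after shutdown, but any in-flight task dies abruptly at JVM exit instead of reaching a clean stopping point.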
[jira] [Commented] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush
[ https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329755#comment-16329755 ] Duo Zhang commented on HBASE-19812: --- It is the semantics of the region.flush call, sir. In the past we defined this as a flush to disk, but now the flush to disk can be canceled by a running in-memory compaction; this may cause lots of UTs to be flaky, now and in the future. Do we have a way to force a flush-to-disk operation?

> TestFlushSnapshotFromClient fails because of failing region.flush
> -
>
> Key: HBASE-19812
> URL: https://issues.apache.org/jira/browse/HBASE-19812
> Project: HBase
> Issue Type: Bug
> Reporter: Duo Zhang
> Priority: Major
> Attachments: 19812.patch, 19812.patch
>
> {noformat}
> 2018-01-17 06:43:48,390 INFO [MemStoreFlusher.1] regionserver.HRegion(2516): Flushing 1/1 column families, memstore=549.25 KB
> 2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] regionserver.CompactingMemStore(205): FLUSHING TO DISK: region test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
> 2018-01-17 06:43:48,406 DEBUG [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312] regionserver.CompactionPipeline(206): Compaction pipeline segment Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, totalHeapSize=1828120, min timestamp=1516171428258, max timestamp=1516171428258Num uniques -1; flattened
> 2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new segement=null
> 2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): NOT flushing memstore for region test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, writesEnabled=true
> {noformat}
> You can see that we start a background flush first, and then we decide to do an in-memory compaction; at the same time we call region.flush from the test, and it finds that the region is already flushing, so it gives up.
> This test is a bit awkward in that we create the table with 6 regions whose start keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so only one region has data. In the above scenario that one region gives up flushing, so there is no data, and then our test fails.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
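Until there is a true force-flush-to-disk API, a test can only work around a skipped flush by retrying until the flush request is actually accepted rather than declined because another flush is in progress. Sketched generically (JDK-only; the flush itself is abstracted as a BooleanSupplier that returns false when the region reports flushing=true and gives up — the helper name is made up):

```java
import java.util.function.BooleanSupplier;

public class RetryUntilAccepted {
    /**
     * Retries an operation that may legitimately refuse to run (e.g. a
     * region flush skipped because a concurrent flush/in-memory compaction
     * is in progress) until it reports success or attempts are exhausted.
     */
    public static boolean retry(BooleanSupplier op, int maxAttempts, long backoffMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (op.getAsBoolean()) {
                return true; // the operation ran for real this time
            }
            try {
                Thread.sleep(backoffMillis); // give the concurrent flush time to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

This only papers over the race Duo describes; the real fix the comment asks for is a flush call whose contract is "flushed to disk on return".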
[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-19816: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.0.0-beta-2 1.5.0 1.4.1 Status: Resolved (was: Patch Available) Thanks for the patch, Scott. Please include the JIRA number in the subject: {code} Subject: [PATCH] HBASE-19816 Refresh replication sinks on UnknownHostException {code} I added it before integration.

> Replication sink list is not updated on UnknownHostException
>
> Key: HBASE-19816
> URL: https://issues.apache.org/jira/browse/HBASE-19816
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 2.0.0, 1.2.0
> Environment: We have two clusters set up with bi-directional replication. The clusters are around 400 nodes each and hosted in AWS.
> Reporter: Scott Wilson
> Assignee: Scott Wilson
> Priority: Major
> Fix For: 1.4.1, 1.5.0, 2.0.0-beta-2
>
> Attachments: HBASE-19816.master.001.patch
>
> We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" cluster, taking all live traffic, which is replicated to cluster 2. We decommissioned several instances in cluster 2, which involves deleting the instance and its DNS record. After this happened, most of the region servers in cluster 1 showed this message in their logs repeatedly.
> > {code} > 2018-01-12 23:49:36,507 WARN > org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint: > Can't replicate because of a local or network error: > java.net.UnknownHostException: data-017b.hbase-2.prod > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.(AbstractRpcClient.java:315) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119) > at > org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339) > at > org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > The host data-017b.hbase-2.prod was one of those that had been removed from > cluster 2. Next we observed our replication lag from cluster 1 to cluster 2 > was elevated. Some region servers reported ageOfLastShippedOperation to be > close to an hour. > The only way we found to clear the message was to restart the region servers > that showed this message in the log. Once we did replication returned to > normal. 
Restarting the affected region servers in cluster 1 took several days because we could not bring the cluster down.
> From reading the code it appears the cause was the zookeeper watch not being triggered for the region server list change in cluster 2. We verified the list in zookeeper for cluster 2 was correct and did not include the removed nodes.
> One concrete improvement to make would be to force a refresh of the sink cluster region server list when an {{UnknownHostException}} is found. This is already done if there is a {{ConnectException}} in {{HBaseInterClusterReplicationEndpoint.java}}
> {code:java}
> } else if (ioe instanceof ConnectException) {
>   LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
>   replicationSinkMgr.chooseSinks();
> {code}
> I propose that should be extended to cover {{UnknownHostException}}.
> We observed this behavior on 1.2.0-cdh-5.11.1 but it appears the same code still exists on the current master branch.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
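The proposed change amounts to widening the instanceof test at the point where the endpoint decides to re-choose sinks. Both exception types are plain java.net classes, so the predicate can be sketched standalone; the class and method names here are made up for illustration, and in HBaseInterClusterReplicationEndpoint the result would gate the replicationSinkMgr.chooseSinks() call quoted in the issue:

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.UnknownHostException;

public class SinkFailurePolicy {
    /**
     * True when the IOException indicates the cached sink list is stale:
     * either the peer refused the connection (existing behavior) or its
     * DNS record is gone (the proposed addition), as happens when a sink
     * node is decommissioned and its DNS entry deleted.
     */
    public static boolean shouldRechooseSinks(IOException ioe) {
        return ioe instanceof ConnectException
            || ioe instanceof UnknownHostException; // proposed by this issue
    }
}
```

Without the second clause, a replicator that cached a now-deleted hostname keeps retrying it forever, which matches the stalled-replication symptom described above.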
[jira] [Commented] (HBASE-18963) Remove MultiRowMutationProcessor and implement mutateRows... methods using batchMutate()
[ https://issues.apache.org/jira/browse/HBASE-18963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329750#comment-16329750 ] Umesh Agashe commented on HBASE-18963: -- [~stack], can you review the addendum patch? > Remove MultiRowMutationProcessor and implement mutateRows... methods using > batchMutate() > > > Key: HBASE-18963 > URL: https://issues.apache.org/jira/browse/HBASE-18963 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 2.0.0-alpha-3 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-1 > > Attachments: hbase-18963.master.001.patch, > hbase-18963.master.002.patch, hbase-18963.master.003.patch, > hbase-18963.master.004.patch, hbase-18963.master.005.patch, > hbase-18963.master.005.patch, hbase-18963.master.addendum.patch > > > Remove MultiRowMutationProcessor and implement mutateRows... methods using > batchMutate() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-18963) Remove MultiRowMutationProcessor and implement mutateRows... methods using batchMutate()
[ https://issues.apache.org/jira/browse/HBASE-18963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329748#comment-16329748 ] Umesh Agashe commented on HBASE-18963: -- Results seen in my dev setup, with changes: {code:java} Put request execution time - avg: 25823ns, min: 17899ns, max: 716803944ns Test exec time: 26439ms{code} Without (prior) changes: {code:java} Put request execution time - avg: 26270ns, min: 18497ns, max: 635278859ns Test exec time: 26918ms{code} > Remove MultiRowMutationProcessor and implement mutateRows... methods using > batchMutate() > > > Key: HBASE-18963 > URL: https://issues.apache.org/jira/browse/HBASE-18963 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 2.0.0-alpha-3 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-1 > > Attachments: hbase-18963.master.001.patch, > hbase-18963.master.002.patch, hbase-18963.master.003.patch, > hbase-18963.master.004.patch, hbase-18963.master.005.patch, > hbase-18963.master.005.patch, hbase-18963.master.addendum.patch > > > Remove MultiRowMutationProcessor and implement mutateRows... methods using > batchMutate() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
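The avg/min/max nanosecond figures above are straightforward to produce with a small harness around System.nanoTime(); a JDK-only sketch (the measured Runnable stands in for the put-request path, and the class name is illustrative, not the benchmark actually used):

```java
public class LatencyStats {
    public final long count, minNs, maxNs, avgNs;

    private LatencyStats(long count, long min, long max, long total) {
        this.count = count;
        this.minNs = min;
        this.maxNs = max;
        this.avgNs = total / count;
    }

    /** Times {@code op} {@code iterations} times and aggregates avg/min/max. */
    public static LatencyStats measure(Runnable op, int iterations) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE, total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.run();                       // the operation under test
            long elapsed = System.nanoTime() - start;
            min = Math.min(min, elapsed);
            max = Math.max(max, elapsed);
            total += elapsed;
        }
        return new LatencyStats(iterations, min, max, total);
    }
}
```

As the numbers in the comment show, the max is dominated by rare outliers (hundreds of milliseconds against a ~26µs average), so the avg is the figure worth comparing between the two commits.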
[jira] [Updated] (HBASE-18963) Remove MultiRowMutationProcessor and implement mutateRows... methods using batchMutate()
[ https://issues.apache.org/jira/browse/HBASE-18963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Umesh Agashe updated HBASE-18963: - Attachment: hbase-18963.master.addendum.patch > Remove MultiRowMutationProcessor and implement mutateRows... methods using > batchMutate() > > > Key: HBASE-18963 > URL: https://issues.apache.org/jira/browse/HBASE-18963 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 2.0.0-alpha-3 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-1 > > Attachments: hbase-18963.master.001.patch, > hbase-18963.master.002.patch, hbase-18963.master.003.patch, > hbase-18963.master.004.patch, hbase-18963.master.005.patch, > hbase-18963.master.005.patch, hbase-18963.master.addendum.patch > > > Remove MultiRowMutationProcessor and implement mutateRows... methods using > batchMutate() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19817) Add a test to benchmark an impact of HBASE-18963 on write path
[ https://issues.apache.org/jira/browse/HBASE-19817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Umesh Agashe resolved HBASE-19817. -- Resolution: Duplicate Changes can be done as an addendum to HBASE-18963. > Add a test to benchmark an impact of HBASE-18963 on write path > -- > > Key: HBASE-19817 > URL: https://issues.apache.org/jira/browse/HBASE-19817 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0-beta-2 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > > Add a test to benchmark impact of changes for HBASE-18963 on write path. > Running the test and comparing the results on the commit for HBASE-18963 and > the commit one prior to it is good starting point. Also it can be used to > measure write performance in general at region server level from Junit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19816) Replication sink list is not updated on UnknownHostException
[ https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329741#comment-16329741 ] Hadoop QA commented on HBASE-19816: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 41s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 42s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 54s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 26m 0s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}112m 22s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}157m 34s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.replication.TestReplicationDroppedTables | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19816 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906488/HBASE-19816.master.001.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 5b5b4faf01d9 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh | | git revision | master / c1a8dc09d6 | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/11090/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11090/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output |
[jira] [Commented] (HBASE-18963) Remove MultiRowMutationProcessor and implement mutateRows... methods using batchMutate()
[ https://issues.apache.org/jira/browse/HBASE-18963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329740#comment-16329740 ] Umesh Agashe commented on HBASE-18963: -- Add a test to benchmark impact of these changes on the write path. Running the test and comparing the results on the commit for these changes and the commit one prior to it will be helpful. Also it can be used to measure write performance in general at region server level from Junit. > Remove MultiRowMutationProcessor and implement mutateRows... methods using > batchMutate() > > > Key: HBASE-18963 > URL: https://issues.apache.org/jira/browse/HBASE-18963 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 2.0.0-alpha-3 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-1 > > Attachments: hbase-18963.master.001.patch, > hbase-18963.master.002.patch, hbase-18963.master.003.patch, > hbase-18963.master.004.patch, hbase-18963.master.005.patch, > hbase-18963.master.005.patch > > > Remove MultiRowMutationProcessor and implement mutateRows... methods using > batchMutate() -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19817) Add a test to benchmark an impact of HBASE-18963 on write path
[ https://issues.apache.org/jira/browse/HBASE-19817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329737#comment-16329737 ] Umesh Agashe commented on HBASE-19817: -- I think that's a better idea, [~stack]; will update HBASE-18963 with a note and an addendum. > Add a test to benchmark an impact of HBASE-18963 on write path > -- > > Key: HBASE-19817 > URL: https://issues.apache.org/jira/browse/HBASE-19817 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0-beta-2 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > > Add a test to benchmark impact of changes for HBASE-18963 on write path. > Running the test and comparing the results on the commit for HBASE-18963 and > the commit one prior to it is good starting point. Also it can be used to > measure write performance in general at region server level from Junit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329734#comment-16329734 ]

Appy commented on HBASE-19527:
------------------------------

Oh, and that thread should also do a thread dump before quitting. My overall motivation is: build a well-defined and deterministic system, and put guards in place in case things go sideways.

> Make ExecutorService threads daemon=true.
> -----------------------------------------
>
>              Key: HBASE-19527
>              URL: https://issues.apache.org/jira/browse/HBASE-19527
>          Project: HBase
>       Issue Type: Sub-task
>         Reporter: stack
>         Assignee: stack
>         Priority: Major
>          Fix For: 2.0.0-beta-2
>
>      Attachments: HBASE-19527.branch-2.001.patch, HBASE-19527.branch-2.002.patch,
>                   HBASE-19527.master.001.patch, HBASE-19527.master.001.patch,
>                   HBASE-19527.master.001.patch, HBASE-19527.master.002.patch
>
> Let me try this. ExecutorService runs OPENs, CLOSEs, etc. If the Server is going
> down, there is no point in these threads sticking around (I think). Let me try this.
[jira] [Commented] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329729#comment-16329729 ]

Appy commented on HBASE-19527:
------------------------------

Idk what's good: making them daemon threads or not. I've been trying to reason about it for quite a while now, but can't answer for sure. Just adding some thoughts here.

The WorkerThread class in ProcExecutor can stop itself if it's idle and the server is going down. But what about the threads doing actual work?

Making them daemon threads:
- Pro: master shutdown will not get stuck in any case
- Con: our 'clean' shutdown isn't exactly clean; it'll crash-end much background work like create table, merge regions, and anything that we aim to build on top of the proc framework - backup, replication, etc.

Not making them daemon threads:
- Pro: reliably ending ongoing work at defined sync points
- Con: master shutdown can get stuck

I personally hate landing in a place where our 'clean' shutdown will actually be like a crash. One alternative I have is: don't make them daemon threads. Start a ShutdownMonitor thread in HMaster.stop() (which should be a daemon thread), and if it finds itself running for more than X seconds, call System.exit() (with a nice msg on why such abruptness, of course).

> Make ExecutorService threads daemon=true.
> -----------------------------------------
>
>              Key: HBASE-19527
>              URL: https://issues.apache.org/jira/browse/HBASE-19527
>          Project: HBase
>       Issue Type: Sub-task
>         Reporter: stack
>         Assignee: stack
>         Priority: Major
>          Fix For: 2.0.0-beta-2
>
>      Attachments: HBASE-19527.branch-2.001.patch, HBASE-19527.branch-2.002.patch,
>                   HBASE-19527.master.001.patch, HBASE-19527.master.001.patch,
>                   HBASE-19527.master.001.patch, HBASE-19527.master.002.patch
>
> Let me try this. ExecutorService runs OPENs, CLOSEs, etc. If the Server is going
> down, there is no point in these threads sticking around (I think). Let me try this.
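The ShutdownMonitor alternative sketched in words above could look roughly like this. Everything here (the class name, deadline handling, exit code) is illustrative, not actual HBase code:

```java
// Illustrative sketch of the proposed ShutdownMonitor: a daemon thread
// started from HMaster.stop() that dumps all threads and forces the JVM
// down if a clean shutdown exceeds a deadline. Names and details are
// hypothetical, not actual HBase API.
public class ShutdownMonitor extends Thread {
    private final long deadlineMillis;

    public ShutdownMonitor(long maxShutdownMillis) {
        this.deadlineMillis = System.currentTimeMillis() + maxShutdownMillis;
        setDaemon(true);  // the monitor itself must never block JVM exit
        setName("ShutdownMonitor");
    }

    static boolean pastDeadline(long nowMillis, long deadlineMillis) {
        return nowMillis >= deadlineMillis;
    }

    @Override
    public void run() {
        while (!pastDeadline(System.currentTimeMillis(), deadlineMillis)) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                return;  // clean shutdown completed; stand down
            }
        }
        // Deadline blown: dump every thread so the hang is diagnosable, then exit.
        Thread.getAllStackTraces().forEach((thread, frames) -> {
            System.err.println(thread.getName());
            for (StackTraceElement frame : frames) {
                System.err.println("  at " + frame);
            }
        });
        System.err.println("Clean shutdown exceeded deadline; forcing exit");
        System.exit(1);
    }
}
```

Because the monitor is itself a daemon thread, it disappears silently if the non-daemon worker threads finish their sync-point shutdown in time, which preserves the "clean shutdown" path the comment wants to keep.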
[jira] [Commented] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush
[ https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329722#comment-16329722 ] Hadoop QA commented on HBASE-19812: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 9s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 4s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 18m 7s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 94m 33s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}130m 24s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19812 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906491/19812.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 080fa0897da0 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / c1a8dc09d6 | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11091/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11091/console | | Powered by | Apache Yetus 0.6.0 http://yetus.apache.org | This message was automatically generated. > TestFlushSnapshotFromClient fails because of failing region.flush > - > > Key: HBASE-19812 > URL:
[jira] [Updated] (HBASE-19784) stop-hbase gives unfriendly message when local hbase isn't running
[ https://issues.apache.org/jira/browse/HBASE-19784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19784: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to branch-2 and master (trying to make the state of beta-2 look better). The warnings seem to be from elsewhere in the altered script. > stop-hbase gives unfriendly message when local hbase isn't running > -- > > Key: HBASE-19784 > URL: https://issues.apache.org/jira/browse/HBASE-19784 > Project: HBase > Issue Type: Bug > Components: scripts >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19784.patch > > > {noformat} > $ bin/stop-hbase.sh > stopping hbasecat: /tmp/hbase-mdrob-master.pid: No such file or directory > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-13300) Fix casing in getTimeStamp() and setTimestamp() for Mutations
[ https://issues.apache.org/jira/browse/HBASE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329706#comment-16329706 ]

stack commented on HBASE-13300:
-------------------------------

Moved it out to 2.0.0. Pull it in again after addressing the [~chia7712] comments above. Thanks.

> Fix casing in getTimeStamp() and setTimestamp() for Mutations
> -------------------------------------------------------------
>
>              Key: HBASE-13300
>              URL: https://issues.apache.org/jira/browse/HBASE-13300
>          Project: HBase
>       Issue Type: Bug
>       Components: API
> Affects Versions: 1.0.0
>         Reporter: Lars George
>         Assignee: Jan Hentschel
>         Priority: Major
>          Fix For: 2.0.0
>
>      Attachments: HBASE-13300.master.001.patch, HBASE-13300.master.002.patch,
>                   HBASE-13300.master.003.patch, HBASE-13300.xlsx
>
> For some reason we have two ways of writing this method. It should be consistent.
[jira] [Updated] (HBASE-13300) Fix casing in getTimeStamp() and setTimestamp() for Mutations
[ https://issues.apache.org/jira/browse/HBASE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-13300: -- Fix Version/s: (was: 2.0.0-beta-2) 2.0.0 > Fix casing in getTimeStamp() and setTimestamp() for Mutations > - > > Key: HBASE-13300 > URL: https://issues.apache.org/jira/browse/HBASE-13300 > Project: HBase > Issue Type: Bug > Components: API >Affects Versions: 1.0.0 >Reporter: Lars George >Assignee: Jan Hentschel >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-13300.master.001.patch, > HBASE-13300.master.002.patch, HBASE-13300.master.003.patch, HBASE-13300.xlsx > > > For some reason we have two ways of writing this method. It should be > consistent. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option
[ https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329705#comment-16329705 ]

stack commented on HBASE-4224:
------------------------------

These should be our shaded, internal versions?

    import com.google.common.collect.Maps;

Fix this ...

    * @param name name of the table or region to flush

Is the region name the full-on region name or just the encoded name?

Should this be a method instead, an override of ServerName#valueOf?

    91  public ServerName(final byte[] serverName) {
    92    this(Bytes.toString(serverName));
    93  }

Should ServerName be able to create invalid ServerNames? If not, then we'd not need isValid? Otherwise the patch LGTM.

> Need a flush by regionserver rather than by table option
> ---------------------------------------------------------
>
>              Key: HBASE-4224
>              URL: https://issues.apache.org/jira/browse/HBASE-4224
>          Project: HBase
>       Issue Type: Bug
>       Components: shell
>         Reporter: stack
>         Assignee: Chia-Ping Tsai
>         Priority: Major
>          Fix For: 2.0.0-beta-2
>
>      Attachments: HBASE-4224.v0.patch, HBase-4224-v2.patch, HBase-4224.patch
>
> This evening I needed to clean out logs on the cluster. Logs are by
> regionserver. To let go of logs, we need to have all edits emptied from
> memory. The only flush is by table or region. We need to be able to flush the
> regionserver. Need to add this.
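The review suggestion above (route the byte[] form through ServerName#valueOf rather than a public constructor) could look like this. The class below is a simplified stand-in for illustration only; the real ServerName has full host/port/startcode parsing:

```java
// Simplified stand-in sketching the suggestion above: expose the byte[]
// form as a valueOf overload so every construction path goes through the
// same validation, instead of a public ServerName(byte[]) constructor.
public final class ServerName {
    private final String name;

    private ServerName(String name) {
        // Validation happens once, in the single private constructor.
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("invalid server name: " + name);
        }
        this.name = name;
    }

    public static ServerName valueOf(String serverName) {
        return new ServerName(serverName);
    }

    // Proposed overload: decode the bytes, then reuse the validating path.
    public static ServerName valueOf(byte[] serverName) {
        return valueOf(new String(serverName, java.nio.charset.StandardCharsets.UTF_8));
    }

    public String getName() {
        return name;
    }
}
```

With all construction funneled through `valueOf`, an invalid name fails fast at creation time, which is what makes a separate isValid check unnecessary.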
[jira] [Reopened] (HBASE-19815) Flakey TestAssignmentManager.testAssignWithRandExec
[ https://issues.apache.org/jira/browse/HBASE-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reopened HBASE-19815:
---------------------------

I fixed the ClassCastException but the test still fails. Here the test fails on the commit that includes the patch attached here!

https://builds.apache.org/job/HBASE-Flaky-Tests-branch2.0/299/

> Flakey TestAssignmentManager.testAssignWithRandExec
> ---------------------------------------------------
>
>              Key: HBASE-19815
>              URL: https://issues.apache.org/jira/browse/HBASE-19815
>          Project: HBase
>       Issue Type: Bug
>       Components: flakey, test
>         Reporter: stack
>         Assignee: stack
>         Priority: Major
>          Fix For: 2.0.0-beta-2
>
>      Attachments: HBASE-19815.branch-2.001.patch
>
> Saw the below in flakies failures
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> Seems to have the highest failure incidence in branch-2.
> {code}
> 2018-01-17 15:43:52,872 ERROR [ProcExecWrkr-12] procedure2.ProcedureExecutor(1481): CODE-BUG: Uncaught runtime exception: pid=5, ppid=4, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure failedMetaServer=localhost,104,1, splitWal=false
> java.lang.ClassCastException: org.apache.hadoop.hbase.master.assignment.MockMasterServices cannot be cast to org.apache.hadoop.hbase.master.HMaster
>     at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.prepare(RecoverMetaProcedure.java:253)
>     at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:96)
>     at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:51)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:182)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
> {code}
[jira] [Commented] (HBASE-19791) TestZKAsyncRegistry hangs
[ https://issues.apache.org/jira/browse/HBASE-19791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329673#comment-16329673 ] stack commented on HBASE-19791: --- The above integration message comes because I pushed the do-nothing attached patch. Reverted it from master. > TestZKAsyncRegistry hangs > - > > Key: HBASE-19791 > URL: https://issues.apache.org/jira/browse/HBASE-19791 > Project: HBase > Issue Type: Bug >Reporter: Duo Zhang >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: 0001-HBASE-19791-do-nothing.patch, jstack, output > > > It hangs in TEST_UTIL.shutdownMiniCluster() for me locally. > Will upload the test output and jstack result for further digging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329667#comment-16329667 ] stack commented on HBASE-19527: --- 2.002 is retry. > Make ExecutorService threads daemon=true. > - > > Key: HBASE-19527 > URL: https://issues.apache.org/jira/browse/HBASE-19527 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19527.branch-2.001.patch, > HBASE-19527.branch-2.002.patch, HBASE-19527.master.001.patch, > HBASE-19527.master.001.patch, HBASE-19527.master.001.patch, > HBASE-19527.master.002.patch > > > Let me try this. ExecutorService runs OPENs, CLOSE, etc. If Server is going > down, no point in these threads sticking around (I think). Let me try this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329666#comment-16329666 ]

stack commented on HBASE-19527:
-------------------------------

I'd prematurely applied this to master and branch. Had to revert in both places. That's why the above application failed.

> Make ExecutorService threads daemon=true.
> -----------------------------------------
>
>              Key: HBASE-19527
>              URL: https://issues.apache.org/jira/browse/HBASE-19527
>          Project: HBase
>       Issue Type: Sub-task
>         Reporter: stack
>         Assignee: stack
>         Priority: Major
>          Fix For: 2.0.0-beta-2
>
>      Attachments: HBASE-19527.branch-2.001.patch, HBASE-19527.branch-2.002.patch,
>                   HBASE-19527.master.001.patch, HBASE-19527.master.001.patch,
>                   HBASE-19527.master.001.patch, HBASE-19527.master.002.patch
>
> Let me try this. ExecutorService runs OPENs, CLOSEs, etc. If the Server is going
> down, there is no point in these threads sticking around (I think). Let me try this.
[jira] [Updated] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19527: -- Attachment: HBASE-19527.branch-2.002.patch > Make ExecutorService threads daemon=true. > - > > Key: HBASE-19527 > URL: https://issues.apache.org/jira/browse/HBASE-19527 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19527.branch-2.001.patch, > HBASE-19527.branch-2.002.patch, HBASE-19527.master.001.patch, > HBASE-19527.master.001.patch, HBASE-19527.master.001.patch, > HBASE-19527.master.002.patch > > > Let me try this. ExecutorService runs OPENs, CLOSE, etc. If Server is going > down, no point in these threads sticking around (I think). Let me try this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19527: -- Attachment: HBASE-19527.master.002.patch > Make ExecutorService threads daemon=true. > - > > Key: HBASE-19527 > URL: https://issues.apache.org/jira/browse/HBASE-19527 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19527.branch-2.001.patch, > HBASE-19527.master.001.patch, HBASE-19527.master.001.patch, > HBASE-19527.master.001.patch, HBASE-19527.master.002.patch > > > Let me try this. ExecutorService runs OPENs, CLOSE, etc. If Server is going > down, no point in these threads sticking around (I think). Let me try this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush
[ https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329657#comment-16329657 ]

stack commented on HBASE-19812:
-------------------------------

{quote}
I think this could be a general reason that causes lots of flakey tests since we usually expect a file on disk when we call region.flush. So I think we should try to fix the general problem?
{quote}

OK. The flush just gets aborted though in the usual case? Here though, the test fails in a flakey manner. Do you think other tests could fail because of this? It just failed again up on jenkins... Thanks.

> TestFlushSnapshotFromClient fails because of failing region.flush
> ------------------------------------------------------------------
>
>              Key: HBASE-19812
>              URL: https://issues.apache.org/jira/browse/HBASE-19812
>          Project: HBase
>       Issue Type: Bug
>         Reporter: Duo Zhang
>         Priority: Major
>      Attachments: 19812.patch, 19812.patch
>
> {noformat}
> 2018-01-17 06:43:48,390 INFO [MemStoreFlusher.1] regionserver.HRegion(2516): Flushing 1/1 column families, memstore=549.25 KB
> 2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] regionserver.CompactingMemStore(205): FLUSHING TO DISK: region test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
> 2018-01-17 06:43:48,406 DEBUG [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312] regionserver.CompactionPipeline(206): Compaction pipeline segment Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, totalHeapSize=1828120, min timestamp=1516171428258, max timestamp=1516171428258Num uniques -1; flattened
> 2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new segement=null
> 2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): NOT flushing memstore for region test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, writesEnabled=true
> {noformat}
> You can see that we start a background flush first, and then we decide to do
> an in-memory compaction; at the same time we call region.flush from the test,
> and it finds that the region is already flushing, so it gives up.
> This test is a bit awkward in that we create the table with 6 regions whose
> start keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so
> only one region has data. And in the above scenario that one region
> gives up flushing, then there is no data, and then our test fails.
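The race described in this report can be pictured with a small sketch: a test-side flush returns without doing anything because a background flush already holds the flush flag, so a test that needs data on disk may have to retry until the memstore drains. The `Region` interface and `flushAndWait` helper below are simplified stand-ins for illustration, not the actual HBase API:

```java
// Simplified stand-ins illustrating the race described above: a flush call
// can silently give up when a background flush is already in progress, so
// a test that requires data on disk can retry until the memstore is empty.
interface Region {
    void flush();          // may silently give up if a flush is in progress
    long memstoreSize();   // bytes still waiting to be flushed
}

final class FlushHelper {
    // Retries the flush until the memstore drains or attempts run out.
    static boolean flushAndWait(Region region, int maxAttempts) throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            region.flush();
            if (region.memstoreSize() == 0) {
                return true;   // everything reached disk
            }
            Thread.sleep(100); // a concurrent flush may still be draining
        }
        return false;
    }
}
```

This only masks the flakiness on the test side; the general fix discussed in the comments (making region.flush wait for, or report, the concurrent flush) would address the root cause.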
[jira] [Commented] (HBASE-18965) Create alternate API to processRowsWithLock() that doesn't take RowProcessor as an argument
[ https://issues.apache.org/jira/browse/HBASE-18965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329650#comment-16329650 ] stack commented on HBASE-18965: --- OK. Moving out to 2.0.0. Thanks [~uagashe] > Create alternate API to processRowsWithLock() that doesn't take RowProcessor > as an argument > --- > > Key: HBASE-18965 > URL: https://issues.apache.org/jira/browse/HBASE-18965 > Project: HBase > Issue Type: Improvement > Components: Coprocessors >Affects Versions: 2.0.0-alpha-3 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0 > > > Create alternate API to processRowsWithLock() that doesn't take RowProcessor > as an argument. Also write example showing how coprocessors and batchMutate() > can be used instead of RowProcessors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-18965) Create alternate API to processRowsWithLock() that doesn't take RowProcessor as an argument
[ https://issues.apache.org/jira/browse/HBASE-18965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-18965: -- Fix Version/s: (was: 2.0.0-beta-2) 2.0.0 > Create alternate API to processRowsWithLock() that doesn't take RowProcessor > as an argument > --- > > Key: HBASE-18965 > URL: https://issues.apache.org/jira/browse/HBASE-18965 > Project: HBase > Issue Type: Improvement > Components: Coprocessors >Affects Versions: 2.0.0-alpha-3 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0 > > > Create alternate API to processRowsWithLock() that doesn't take RowProcessor > as an argument. Also write example showing how coprocessors and batchMutate() > can be used instead of RowProcessors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329648#comment-16329648 ] Vladimir Rodionov edited comment on HBASE-17852 at 1/17/18 10:50 PM: - {quote}For now, what do you think are the biggest blockers for making procv2 + backup happen [~vrodionov]? {quote} If we could do procv2 implementation w/o getting into server <- backup dependency, then no blockers. But this won't be possible, for sure: {quote} - Can't use CP hooks for incremental backup. Backup should/will become first class feature - more important and critical than Coprocessor. {quote} was (Author: vrodionov): {quote} For now, what do you think are the biggest blockers for making procv2 + backup happen [~vrodionov]? {quote} If we could do procv2 implementation w/o getting into server <- backup dependency, then no blockers. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, > HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, > HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, > HBASE-17852-v9.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19794) TestZooKeeper hangs
[ https://issues.apache.org/jira/browse/HBASE-19794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329649#comment-16329649 ] stack commented on HBASE-19794: --- {quote}but I do not think set the thread daemon can solve all the problems... {quote} Agreed. We have a bunch of shutdown issues at mo. Making daemon seems to solve at least TestRegionsOnServer. Will see what is left over. I wish I could get this to fail locally (smile). > TestZooKeeper hangs > --- > > Key: HBASE-19794 > URL: https://issues.apache.org/jira/browse/HBASE-19794 > Project: HBase > Issue Type: Bug >Reporter: Duo Zhang >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: org.apache.hadoop.hbase.TestZooKeeper-output.txt > > > Seems like the TestZKAsyncRegistry that hangs in shutdown. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329648#comment-16329648 ] Vladimir Rodionov commented on HBASE-17852: --- {quote} For now, what do you think are the biggest blockers for making procv2 + backup happen [~vrodionov]? {quote} If we could do procv2 implementation w/o getting into server <- backup dependency, then no blockers. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, > HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, > HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, > HBASE-17852-v9.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. 
The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19817) Add a test to benchmark an impact of HBASE-18963 on write path
[ https://issues.apache.org/jira/browse/HBASE-19817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329645#comment-16329645 ]

stack commented on HBASE-19817:
-------------------------------

I was thinking you'd just add a note to HBASE-18963 attaching any perf-checking code/harness rather than file a new issue. Thanks [~uagashe]

> Add a test to benchmark an impact of HBASE-18963 on write path
> --------------------------------------------------------------
>
>              Key: HBASE-19817
>              URL: https://issues.apache.org/jira/browse/HBASE-19817
>          Project: HBase
>       Issue Type: Improvement
>       Components: regionserver
> Affects Versions: 2.0.0-beta-2
>         Reporter: Umesh Agashe
>         Assignee: Umesh Agashe
>         Priority: Major
>
> Add a test to benchmark the impact of changes for HBASE-18963 on the write path.
> Running the test and comparing the results on the commit for HBASE-18963 and
> the commit one prior to it is a good starting point. It can also be used to
> measure write performance in general at the region server level from JUnit.
[jira] [Updated] (HBASE-19817) Add a test to benchmark an impact of HBASE-18963 on write path
[ https://issues.apache.org/jira/browse/HBASE-19817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19817: -- Fix Version/s: (was: 2.0.0-beta-2) > Add a test to benchmark an impact of HBASE-18963 on write path > -- > > Key: HBASE-19817 > URL: https://issues.apache.org/jira/browse/HBASE-19817 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0-beta-2 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > > Add a test to benchmark impact of changes for HBASE-18963 on write path. > Running the test and comparing the results on the commit for HBASE-18963 and > the commit one prior to it is good starting point. Also it can be used to > measure write performance in general at region server level from Junit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329642#comment-16329642 ] Vladimir Rodionov commented on HBASE-17852: --- I will rebase patch to the current master. The majority of this code (but not all) went into master in HBASE-19568 btw. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, > HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, > HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, > HBASE-17852-v9.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. 
The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329638#comment-16329638 ] Duo Zhang commented on HBASE-19803: --- If I do not throw ExitException for ForkedBooter.kill then the test run will crash... Let me add more logs and try again. > False positive for the HBASE-Find-Flaky-Tests job > - > > Key: HBASE-19803 > URL: https://issues.apache.org/jira/browse/HBASE-19803 > Project: HBase > Issue Type: Bug >Reporter: Duo Zhang >Priority: Major > > It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the > surefire output > https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was likely to be killed in the middle of the run within 20 seconds. > https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was also killed within about 1 minutes. > The test is declared as LargeTests so the time limit should be 10 minutes. It > seems that the jvm may crash during the mvn test run and then we will kill > all the running tests and then we may mark some of them as hang which leads > to the false positive. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19810) Fix findbugs and error-prone warnings in hbase-metrics (branch-2)
[ https://issues.apache.org/jira/browse/HBASE-19810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329633#comment-16329633 ] Hudson commented on HBASE-19810: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4420 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4420/]) HBASE-19810 Fix findbugs and error-prone warnings in hbase-metrics (stack: rev c1a8dc09d64ef3f6062aced595c3bb918724025d) * (edit) hbase-metrics/src/main/java/org/apache/hadoop/hbase/metrics/impl/MetricRegistriesImpl.java * (edit) hbase-metrics/src/main/java/org/apache/hadoop/hbase/metrics/impl/HistogramImpl.java > Fix findbugs and error-prone warnings in hbase-metrics (branch-2) > - > > Key: HBASE-19810 > URL: https://issues.apache.org/jira/browse/HBASE-19810 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0-beta-1 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19810.master.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)
[ https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329632#comment-16329632 ] Hudson commented on HBASE-19809: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4420 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4420/]) HBASE-19809 Fix findbugs and error-prone warnings in hbase-procedure (stack: rev c269e63a073606db57e65b06f3df12b91373) * (edit) hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureEvents.java * (edit) hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestYieldProcedures.java * (edit) hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/store/wal/ProcedureWALPerformanceEvaluation.java * (edit) hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/ProcedureWALPrettyPrinter.java * (edit) hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/store/wal/ProcedureWALLoaderPerformanceEvaluation.java * (edit) hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureReplayOrder.java * (edit) hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestStateMachineProcedure.java * (edit) hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/util/ByteSlot.java * (edit) hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureExecutor.java * (edit) hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureInMemoryChore.java * (edit) hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java > Fix findbugs and error-prone warnings in hbase-procedure (branch-2) > --- > > Key: HBASE-19809 > URL: https://issues.apache.org/jira/browse/HBASE-19809 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0-beta-1 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19809.master.001.patch, > 
HBASE-19809.master.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19770) Add '--return-values' option to Shell to print return values of commands in interactive mode
[ https://issues.apache.org/jira/browse/HBASE-19770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329631#comment-16329631 ] Hudson commented on HBASE-19770: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4420 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4420/]) HBASE-19770 Separate command return values from interactive shells (elserj: rev 7224546b1ed99faf41ecef82043971c7d44a5836) * (edit) hbase-shell/src/main/ruby/shell.rb * (edit) bin/hirb.rb * (edit) hbase-shell/src/test/ruby/test_helper.rb * (edit) hbase-shell/src/test/ruby/shell/noninteractive_test.rb > Add '--return-values' option to Shell to print return values of commands in > interactive mode > > > Key: HBASE-19770 > URL: https://issues.apache.org/jira/browse/HBASE-19770 > Project: HBase > Issue Type: Bug > Components: shell >Reporter: Romil Choksi >Assignee: Josh Elser >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19770.001.branch-2.patch, > HBASE-19770.002.branch-2.patch, HBASE-19770.003.branch-2.patch, > HBASE-19770.004.branch-2.patch > > > Another good find by our Romil. > {code} > hbase(main):001:0> list > TABLE > a > 1 row(s) > Took 0.8385 seconds > hbase(main):002:0> tables=list > TABLE > a > 1 row(s) > Took 0.0267 seconds > hbase(main):003:0> puts tables > hbase(main):004:0> p tables > nil > {code} > The {{list}} command should be returning {{\['a'\]}} but is not. > The command class itself appears to be doing the right thing -- maybe the > retval is getting lost somewhere else? > FYI [~stack]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19817) Add a test to benchmark an impact of HBASE-18963 on write path
[ https://issues.apache.org/jira/browse/HBASE-19817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329628#comment-16329628 ] Umesh Agashe commented on HBASE-19817: -- Consider using jmh or StopWatch. > Add a test to benchmark an impact of HBASE-18963 on write path > -- > > Key: HBASE-19817 > URL: https://issues.apache.org/jira/browse/HBASE-19817 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0-beta-2 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-2 > > > Add a test to benchmark impact of changes for HBASE-18963 on write path. > Running the test and comparing the results on the commit for HBASE-18963 and > the commit one prior to it is good starting point. Also it can be used to > measure write performance in general at region server level from Junit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-19817) Add a test to benchmark an impact of HBASE-18963 on write path
[ https://issues.apache.org/jira/browse/HBASE-19817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329628#comment-16329628 ] Umesh Agashe edited comment on HBASE-19817 at 1/17/18 10:38 PM: Suggestion from [~stack]: Consider using jmh or StopWatch. was (Author: uagashe): Consider using jmh or StopWatch. > Add a test to benchmark an impact of HBASE-18963 on write path > -- > > Key: HBASE-19817 > URL: https://issues.apache.org/jira/browse/HBASE-19817 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0-beta-2 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-2 > > > Add a test to benchmark impact of changes for HBASE-18963 on write path. > Running the test and comparing the results on the commit for HBASE-18963 and > the commit one prior to it is good starting point. Also it can be used to > measure write performance in general at region server level from Junit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
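The jmh/StopWatch suggestion above can be sketched with a plain `System.nanoTime()` stopwatch harness. This is a hedged illustration only, not the test that went into HBASE-19817; the `benchmark` helper and the toy workload are made up for this sketch, and a real comparison would use JMH to handle warmup and JIT effects properly:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

public class WriteBenchSketch {
    // Runs the workload for a few warmup iterations first, then times the
    // measured iterations. Returns average nanoseconds per operation.
    public static long benchmark(IntConsumer workload, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            workload.accept(i);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            workload.accept(i);
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        // Toy stand-in for a region-server write path.
        StringBuilder sink = new StringBuilder();
        long nsPerOp = benchmark(i -> sink.append(i), 1_000, 10_000);
        System.out.println("avg " + nsPerOp + " ns/op ("
            + TimeUnit.NANOSECONDS.toMicros(nsPerOp) + " us)");
    }
}
```

Running the same harness on the HBASE-18963 commit and its parent, as the issue description suggests, would give the before/after numbers to compare.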
[jira] [Updated] (HBASE-19817) Add a test to benchmark an impact of HBASE-18963 on write path
[ https://issues.apache.org/jira/browse/HBASE-19817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Umesh Agashe updated HBASE-19817: - Environment: (was: Add a test to benchmark impact of changes for HBASE-18963 on write path. Running the test and comparing the results on the commit for HBASE-18963 and the commit one prior to it is good starting point. Also it can be used to measure write performance in general at region server level from Junit.) > Add a test to benchmark an impact of HBASE-18963 on write path > -- > > Key: HBASE-19817 > URL: https://issues.apache.org/jira/browse/HBASE-19817 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0-beta-2 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-2 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19817) Add a test to benchmark an impact of HBASE-18963 on write path
Umesh Agashe created HBASE-19817: Summary: Add a test to benchmark an impact of HBASE-18963 on write path Key: HBASE-19817 URL: https://issues.apache.org/jira/browse/HBASE-19817 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 2.0.0-beta-2 Environment: Add a test to benchmark impact of changes for HBASE-18963 on write path. Running the test and comparing the results on the commit for HBASE-18963 and the commit one prior to it is good starting point. Also it can be used to measure write performance in general at region server level from Junit. Reporter: Umesh Agashe Assignee: Umesh Agashe Fix For: 2.0.0-beta-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19817) Add a test to benchmark an impact of HBASE-18963 on write path
[ https://issues.apache.org/jira/browse/HBASE-19817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Umesh Agashe updated HBASE-19817: - Description: Add a test to benchmark impact of changes for HBASE-18963 on write path. Running the test and comparing the results on the commit for HBASE-18963 and the commit one prior to it is good starting point. Also it can be used to measure write performance in general at region server level from Junit. > Add a test to benchmark an impact of HBASE-18963 on write path > -- > > Key: HBASE-19817 > URL: https://issues.apache.org/jira/browse/HBASE-19817 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 2.0.0-beta-2 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-2 > > > Add a test to benchmark impact of changes for HBASE-18963 on write path. > Running the test and comparing the results on the commit for HBASE-18963 and > the commit one prior to it is good starting point. Also it can be used to > measure write performance in general at region server level from Junit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19794) TestZooKeeper hangs
[ https://issues.apache.org/jira/browse/HBASE-19794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329619#comment-16329619 ] Duo Zhang commented on HBASE-19794: --- +1 on making it daemon, but I do not think setting the thread to daemon can solve all the problems. I saw it blocked in shutdownMiniCluster, not in the process exit... > TestZooKeeper hangs > --- > > Key: HBASE-19794 > URL: https://issues.apache.org/jira/browse/HBASE-19794 > Project: HBase > Issue Type: Bug >Reporter: Duo Zhang >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: org.apache.hadoop.hbase.TestZooKeeper-output.txt > > > Seems like the TestZKAsyncRegistry that hangs in shutdown. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
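For reference on the daemon-thread point above: the daemon flag must be set before the thread is started, and the JVM will exit without waiting for daemon threads. A minimal JDK-only sketch (the `newDaemon` helper is made up for this illustration, it is not HBase code):

```java
public class DaemonThreadSketch {
    // Creates a named daemon thread. setDaemon(true) must run before
    // start(); afterwards it throws IllegalThreadStateException.
    public static Thread newDaemon(Runnable task, String name) {
        Thread t = new Thread(task, name);
        // The JVM may exit while daemon threads are still running, which
        // is why marking a cluster thread daemon helps process exit but
        // cannot fix a hang inside shutdownMiniCluster itself.
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = newDaemon(() -> {}, "example-daemon");
        System.out.println(t.getName() + " daemon=" + t.isDaemon());
        t.start();
        t.join();
    }
}
```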
[jira] [Commented] (HBASE-19005) Mutation batch should not accept operations with different durabilities
[ https://issues.apache.org/jira/browse/HBASE-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329607#comment-16329607 ] Umesh Agashe commented on HBASE-19005: -- [~stack], As [~chia7712] and [~anoop.hbase] have suggested, this may result in a large change. As BC and possibly replication use this code path, fixing this will involve changes in those features as well. Thanks for moving it out of beta-2. > Mutation batch should not accept operations with different durabilities > --- > > Key: HBASE-19005 > URL: https://issues.apache.org/jira/browse/HBASE-19005 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0-alpha-3 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0 > > > Javadoc and change client side API to not accept operations with different > durabilities in a mutation batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
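A client-side guard of the kind the issue proposes might look like the following sketch. Both the `Durability` enum and `checkBatch` here are simplified stand-ins written for this illustration, not the actual HBase client API:

```java
import java.util.Arrays;
import java.util.List;

public class DurabilityCheckSketch {
    // Simplified stand-in for org.apache.hadoop.hbase.client.Durability.
    public enum Durability { USE_DEFAULT, SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

    // Rejects a batch whose mutations do not all share one durability,
    // mirroring the restriction HBASE-19005 proposes for the client API.
    public static void checkBatch(List<Durability> batch) {
        if (batch.isEmpty()) {
            return;
        }
        Durability first = batch.get(0);
        for (Durability d : batch) {
            if (d != first) {
                throw new IllegalArgumentException(
                    "Mixed durabilities in one batch: " + first + " vs " + d);
            }
        }
    }

    public static void main(String[] args) {
        // A uniform batch passes the check.
        checkBatch(Arrays.asList(Durability.SYNC_WAL, Durability.SYNC_WAL));
        try {
            checkBatch(Arrays.asList(Durability.SYNC_WAL, Durability.SKIP_WAL));
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```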
[jira] [Commented] (HBASE-18965) Create alternate API to processRowsWithLock() that doesn't take RowProcessor as an argument
[ https://issues.apache.org/jira/browse/HBASE-18965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329603#comment-16329603 ] Umesh Agashe commented on HBASE-18965: -- Sounds good [~stack]! Considering we would like to get this into 2.0 but beta-1 is the feature freeze, we can move it out of beta-2. The old implementation is left as is but deprecated. An alternative API with examples can be added later. The release note for HBASE-18964 already suggests using coprocessors instead of row processors. > Create alternate API to processRowsWithLock() that doesn't take RowProcessor > as an argument > --- > > Key: HBASE-18965 > URL: https://issues.apache.org/jira/browse/HBASE-18965 > Project: HBase > Issue Type: Improvement > Components: Coprocessors >Affects Versions: 2.0.0-alpha-3 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-2 > > > Create alternate API to processRowsWithLock() that doesn't take RowProcessor > as an argument. Also write example showing how coprocessors and batchMutate() > can be used instead of RowProcessors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329597#comment-16329597 ] Hadoop QA commented on HBASE-17852: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HBASE-17852 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.6.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HBASE-17852 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12887458/HBASE-17852-v1.patch | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11094/console | | Powered by | Apache Yetus 0.6.0 http://yetus.apache.org | This message was automatically generated. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, > HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, > HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, > HBASE-17852-v9.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. 
> # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329600#comment-16329600 ] Duo Zhang commented on HBASE-19803: --- This is only a temporary approach to find out who calls System.exit in test, and then we could find the solution. > False positive for the HBASE-Find-Flaky-Tests job > - > > Key: HBASE-19803 > URL: https://issues.apache.org/jira/browse/HBASE-19803 > Project: HBase > Issue Type: Bug >Reporter: Duo Zhang >Priority: Major > > It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the > surefire output > https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was likely to be killed in the middle of the run within 20 seconds. > https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was also killed within about 1 minutes. > The test is declared as LargeTests so the time limit should be 10 minutes. It > seems that the jvm may crash during the mvn test run and then we will kill > all the running tests and then we may mark some of them as hang which leads > to the false positive. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329593#comment-16329593 ] Appy commented on HBASE-19803: -- I was thinking of going the way where we would replace all System.exit(X) calls with a util function which would additionally dump stack trace at LOG.debug (testing level) before calling System.exit itself. Adding SecurityManager globally for everything seems like a strong and major change to jvm environment (even if just for tests). > False positive for the HBASE-Find-Flaky-Tests job > - > > Key: HBASE-19803 > URL: https://issues.apache.org/jira/browse/HBASE-19803 > Project: HBase > Issue Type: Bug >Reporter: Duo Zhang >Priority: Major > > It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the > surefire output > https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was likely to be killed in the middle of the run within 20 seconds. > https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt > This one was also killed within about 1 minutes. > The test is declared as LargeTests so the time limit should be 10 minutes. It > seems that the jvm may crash during the mvn test run and then we will kill > all the running tests and then we may mark some of them as hang which leads > to the false positive. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
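The util-function idea above could be sketched like this. `ExitUtilSketch` is hypothetical, written for illustration only (it logs to stderr rather than LOG.debug, and HBase's eventual implementation may differ):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class ExitUtilSketch {
    // Formats the current call stack so the caller of terminate() shows up
    // in the test logs before the JVM goes away.
    public static String currentStack() {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        new Throwable("System.exit called from:").printStackTrace(pw);
        pw.flush();
        return sw.toString();
    }

    // Drop-in replacement for System.exit(status) in test-visible code
    // paths: dump who asked to exit, then actually exit.
    public static void terminate(int status) {
        System.err.println(currentStack());
        System.exit(status);
    }
}
```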
[jira] [Updated] (HBASE-19806) Lower max versions for selected system table column family
[ https://issues.apache.org/jira/browse/HBASE-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-19806: --- Description: On an hbase 2 cluster, I got the description of hbase:meta table: {code} {NAME => 'info', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '81 92'} ... {NAME => 'table', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0' , REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => ' 8192'} {code} You can see that 'table' family has MAX VERSIONS much higher than the other families. The MAX VERSIONS value should be brought in sync with the other families. was: On an hbase 2 cluster, I got the description of hbase:meta table: {code} {NAME => 'info', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '81 92'} ... 
{NAME => 'table', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0' , REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => ' 8192'} {code} You can see that 'table' family has MAX VERSIONS much higher than the other families. The MAX VERSIONS value should be brought in sync with the other families. For namespace table: {code} {NAME => 'info', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '81 92'} {code} Having MAX VERSIONS of 3 should be enough. 
> Lower max versions for selected system table column family > -- > > Key: HBASE-19806 > URL: https://issues.apache.org/jira/browse/HBASE-19806 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Attachments: 19806.v1.txt > > > On an hbase 2 cluster, I got the description of hbase:meta table: > {code} > {NAME => 'info', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => '81 > 92'} > ... > {NAME => 'table', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0' > , REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => ' > 8192'} > {code} > You can see that 'table' family has MAX VERSIONS much higher than the other > families. > The MAX VERSIONS value should be brought in sync with the other families. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
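For illustration, the same change could in principle be applied by hand from the HBase shell; this is a hedged sketch of the command, and altering hbase:meta by hand is generally discouraged and version-dependent:

```ruby
alter 'hbase:meta', {NAME => 'table', VERSIONS => 3}
```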
[jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329577#comment-16329577 ] Appy edited comment on HBASE-17852 at 1/17/18 10:13 PM: I said hbase-backup --> hbase-server above because backup needs snapshot. Our dependencies are in a state of orgy right now, otherwise following would have been perfect shape to be in. !screenshot-1.png|width=800! That said, we should still be able to do procv2+backup without having to refactor other modules out of hbase-server. was (Author: appy): I said hbase-backup --> hbase-server above because backup needs snapshot. Our dependencies are in a state of orgy right now, otherwise following would have been perfect shape to be in. !screenshot-1.png|width=800px! That said, we should still be able to do procv2+backup without all the other refactoring. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, > HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, > HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, > HBASE-17852-v9.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329577#comment-16329577 ] Appy commented on HBASE-17852: -- I said hbase-backup --> hbase-server above because backup needs snapshot. Our dependencies are in a state of orgy right now, otherwise following would have been perfect shape to be in. !screenshot-1.png! That said, we should still be able to do procv2+backup without all the other refactoring. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, > HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, > HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, > HBASE-17852-v9.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. 
The > reason: in case of a failure during backup create/delete/merge/restore, when the > system performs an automatic rollback, some data written by backup observers > during the failed operation may be lost. This is what we try to avoid. > # The second table keeps only bulk-load-related references. We do not care about > the consistency of this table, because bulk load is an idempotent operation and can > be repeated after a failure. Partially written data in the second table does not > affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded > files) corresponds to files which have not yet been loaded successfully and, > hence, are not visible to the system
[jira] [Comment Edited] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329577#comment-16329577 ] Appy edited comment on HBASE-17852 at 1/17/18 10:12 PM: I said hbase-backup --> hbase-server above because backup needs snapshot. Our dependencies are in a state of orgy right now, otherwise following would have been perfect shape to be in. !screenshot-1.png|width=800px! That said, we should still be able to do procv2+backup without all the other refactoring. was (Author: appy): I said hbase-backup --> hbase-server above because backup needs snapshot. Our dependencies are in a state of orgy right now, otherwise following would have been perfect shape to be in. !screenshot-1.png! That said, we should still be able to do procv2+backup without all the other refactoring. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, > HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, > HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, > HBASE-17852-v9.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush
[ https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329576#comment-16329576 ] Duo Zhang commented on HBASE-19812: --- I think this could be a general reason that causes lots of flakey tests since we usually expected a file on disk when we call region.flush. So I think we should try to fix the general problem? > TestFlushSnapshotFromClient fails because of failing region.flush > - > > Key: HBASE-19812 > URL: https://issues.apache.org/jira/browse/HBASE-19812 > Project: HBase > Issue Type: Bug >Reporter: Duo Zhang >Priority: Major > Attachments: 19812.patch, 19812.patch > > > {noformat} > 2018-01-17 06:43:48,390 INFO [MemStoreFlusher.1] regionserver.HRegion(2516): > Flushing 1/1 column families, memstore=549.25 KB > 2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] > regionserver.CompactingMemStore(205): FLUSHING TO DISK: region > test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam > 2018-01-17 06:43:48,406 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312] > regionserver.CompactionPipeline(206): Compaction pipeline segment > Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, > totalHeapSize=1828120, min timestamp=1516171428258, max > timestamp=1516171428258Num uniques -1; flattened > 2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] > regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new > segement=null > 2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): > NOT flushing memstore for region > test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, > writesEnabled=true > {noformat} > You can see that we start a background flush first, and then we decided to do > an in memory compaction, at the same time we call the region.flush from test, > and it find that the region is already flushing so it give up. 
> This test is a bit awkward in that we create the table with 6 regions whose > start keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so > only one region has data. In the above scenario that one > region gives up flushing, so there is no data, and then our test fails.
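The race described in HBASE-19812 (a test-initiated flush that silently gives up because a background flush is already in progress) can be sketched as follows. `Region`, `flush`, and `flush_until_done` are illustrative stand-ins, not the HBase classes; the point is that a robust test retries until a flush actually happens instead of assuming a single call succeeds:

```python
# Sketch of the flush race above, with hypothetical names.
class Region:
    def __init__(self):
        self.flushing = False
        self.memstore = ["edit"]
        self.files_on_disk = 0

    def flush(self):
        # Mirrors "NOT flushing memstore ... flushing=true": if a flush is
        # already running, give up silently -- the source of the flakiness.
        if self.flushing:
            return False
        self.flushing = True
        try:
            if self.memstore:
                self.files_on_disk += 1   # write a file to disk
                self.memstore.clear()
            return True
        finally:
            self.flushing = False

def flush_until_done(region, attempts=10):
    """What a robust test could do: retry instead of assuming one call works."""
    for _ in range(attempts):
        if region.flush():
            return True
    return False

r = Region()
r.flushing = True               # simulate a background flush in progress
assert r.flush() is False       # a single-call test would now see no file
r.flushing = False
assert flush_until_done(r) and r.files_on_disk == 1
```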
[jira] [Updated] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy updated HBASE-17852: - Attachment: screenshot-1.png > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, > HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, > HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, > HBASE-17852-v9.patch, screenshot-1.png > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. > # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. 
We do not care about > the consistency of this table, because bulk load is an idempotent operation and can > be repeated after a failure. Partially written data in the second table does not > affect the BackupHFileCleaner plugin, because this data (the list of bulk loaded > files) corresponds to files which have not yet been loaded successfully and, > hence, are not visible to the system
[jira] [Commented] (HBASE-18965) Create alternate API to processRowsWithLock() that doesn't take RowProcessor as an argument
[ https://issues.apache.org/jira/browse/HBASE-18965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329562#comment-16329562 ] stack commented on HBASE-18965: --- Whats the story here [~uagashe]? You going to leave old implementation in place for 2.0.0 and not do this? If so, move out of beta-2? Move to 2.0.0 (general bucket for stuff that we'd like in 2.0.0 but won't do likely). > Create alternate API to processRowsWithLock() that doesn't take RowProcessor > as an argument > --- > > Key: HBASE-18965 > URL: https://issues.apache.org/jira/browse/HBASE-18965 > Project: HBase > Issue Type: Improvement > Components: Coprocessors >Affects Versions: 2.0.0-alpha-3 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-2 > > > Create alternate API to processRowsWithLock() that doesn't take RowProcessor > as an argument. Also write example showing how coprocessors and batchMutate() > can be used instead of RowProcessors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
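The contrast HBASE-18965 proposes (replacing the `processRowsWithLock(RowProcessor, ...)` style with a plain batch-mutate call under the region's locking) can be illustrated with a toy model. `Region`, `process_rows_with_lock`, and `batch_mutate` here are simplified stand-ins for the HBase `Region` API, not its actual signatures:

```python
# Toy contrast of the two API styles discussed above (names illustrative).
import threading

class Region:
    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()

    def process_rows_with_lock(self, processor, keys):
        """Old style: the caller supplies an arbitrary processor callback."""
        with self._lock:
            processor(self.rows, keys)

    def batch_mutate(self, mutations):
        """Proposed alternative: plain mutations, no processor argument."""
        with self._lock:
            for key, value in mutations:
                self.rows[key] = value

region = Region()

# Old style: behaviour is hidden inside the caller-provided processor.
region.process_rows_with_lock(
    lambda rows, keys: rows.update({k: "init" for k in keys}), ["r1"])

# New style: the mutations themselves are the whole contract.
region.batch_mutate([("r1", "v1"), ("r2", "v2")])
assert region.rows == {"r1": "v1", "r2": "v2"}
```

The design appeal is that the second form keeps arbitrary user code out of the locked section; coprocessor hooks plus `batchMutate()` cover the same use cases with a narrower contract.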
[jira] [Commented] (HBASE-19093) Check Admin/Table to ensure all operations go via AccessControl
[ https://issues.apache.org/jira/browse/HBASE-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329554#comment-16329554 ] stack commented on HBASE-19093: --- Hows this one doing [~balazs.meszaros] ? > Check Admin/Table to ensure all operations go via AccessControl > --- > > Key: HBASE-19093 > URL: https://issues.apache.org/jira/browse/HBASE-19093 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: Balazs Meszaros >Priority: Blocker > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19093.master.001.patch, > HBASE-19093.master.002.patch, RegionObserver.txt > > > A cursory review of Admin Interface has a bunch of methods as open, with out > AccessControl checks. For example, procedure executor has not check on it. > This issue is about given the Admin and Table Interfaces a once-over to see > what is missing and to fill in access control where missing. > This is a follow-on from work over in HBASE-19048 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19806) Lower max versions for selected system table column family
[ https://issues.apache.org/jira/browse/HBASE-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329550#comment-16329550 ] Ted Yu commented on HBASE-19806: I didn't change MAX VERSIONS for namespace table: TableDescriptorBuilder.NAMESPACE_TABLEDESC doesn't have access to configuration. > Lower max versions for selected system table column family > -- > > Key: HBASE-19806 > URL: https://issues.apache.org/jira/browse/HBASE-19806 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Attachments: 19806.v1.txt > > > On an hbase 2 cluster, I got the description of hbase:meta table: > {code} > {NAME => 'info', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => '81 > 92'} > ... > {NAME => 'table', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0' > , REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => ' > 8192'} > {code} > You can see that 'table' family has MAX VERSIONS much higher than the other > families. > The MAX VERSIONS value should be brought in sync with the other families. 
> For namespace table: > {code} > {NAME => 'info', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => '81 > 92'} > {code} > Having MAX VERSIONS of 3 should be enough. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-19533) How to do controlled shutdown in branch-2?
[ https://issues.apache.org/jira/browse/HBASE-19533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329522#comment-16329522 ] stack edited comment on HBASE-19533 at 1/17/18 10:03 PM: - Over in HBASE-19527, I set all ProcV2 threads to be daemon – they were not – and I remove the above special casing that was added to TestRegionsOnMasterOptions. This makes it so the likes of the Master will go down more promptly. This is but an aspect of what this issue is about, whether Master should run all shutdown/close of regions, even on cluster shutdown? was (Author: stack): Over in HBASE-19527, I set all ProcV2 threads to be daemon – they were not – and I remove the above special casing that was added to TestRegionsOnMasterOptions. > How to do controlled shutdown in branch-2? > -- > > Key: HBASE-19533 > URL: https://issues.apache.org/jira/browse/HBASE-19533 > Project: HBase > Issue Type: Task >Reporter: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > > Before HBASE-18946, setting shutdown of a cluster, the Master would exit > immediately. RegionServers would run region closes and then try and notify > the Master of the close and would spew exceptions that the Master was > unreachable. > This is different to how branch-1 used to do it. It used to keep Master up > and it would be like the captain of the ship, the last to go down. As of > HBASE-18946, this is again the case but there are still open issues. > # Usually Master does all open and close of regions. On cluster shutdown, it > is the one time where the Regions run the region close. Currently, the > regions report the close to the Master which disregards the message since it > did not start the region closes. Should we do different? Try and update state > in hbase:meta setting it to CLOSE? We might not be able to write CLOSE for > all regions since hbase:meta will be closing too (the RS that is hosting > hbase:meta will close it last but that may not be enough). 
> # Should the Master run the cluster shutdown sending out close for all > regions? What if cluster of 1M regions? Untenable? Send a message per server? > That might be better. > Anyways, this needs attention. Filing issue in meantime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19806) Lower max versions for selected system table column family
[ https://issues.apache.org/jira/browse/HBASE-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329546#comment-16329546 ] Ted Yu commented on HBASE-19806: Patch v1 aligns MAX VERSIONS of 'table' family with the other families of hbase:meta > Lower max versions for selected system table column family > -- > > Key: HBASE-19806 > URL: https://issues.apache.org/jira/browse/HBASE-19806 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Attachments: 19806.v1.txt > > > On an hbase 2 cluster, I got the description of hbase:meta table: > {code} > {NAME => 'info', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => '81 > 92'} > ... > {NAME => 'table', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0' > , REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => ' > 8192'} > {code} > You can see that 'table' family has MAX VERSIONS much higher than the other > families. > The MAX VERSIONS value should be brought in sync with the other families. 
> For namespace table: > {code} > {NAME => 'info', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => '81 > 92'} > {code} > Having MAX VERSIONS of 3 should be enough. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19806) Lower max versions for selected system table column family
[ https://issues.apache.org/jira/browse/HBASE-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-19806: --- Assignee: Ted Yu Status: Patch Available (was: Open) > Lower max versions for selected system table column family > -- > > Key: HBASE-19806 > URL: https://issues.apache.org/jira/browse/HBASE-19806 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Attachments: 19806.v1.txt > > > On an hbase 2 cluster, I got the description of hbase:meta table: > {code} > {NAME => 'info', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => '81 > 92'} > ... > {NAME => 'table', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0' > , REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => ' > 8192'} > {code} > You can see that 'table' family has MAX VERSIONS much higher than the other > families. > The MAX VERSIONS value should be brought in sync with the other families. 
> For namespace table: > {code} > {NAME => 'info', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => '81 > 92'} > {code} > Having MAX VERSIONS of 3 should be enough. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19806) Lower max versions for selected system table column family
[ https://issues.apache.org/jira/browse/HBASE-19806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-19806: --- Attachment: 19806.v1.txt > Lower max versions for selected system table column family > -- > > Key: HBASE-19806 > URL: https://issues.apache.org/jira/browse/HBASE-19806 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > Attachments: 19806.v1.txt > > > On an hbase 2 cluster, I got the description of hbase:meta table: > {code} > {NAME => 'info', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => '81 > 92'} > ... > {NAME => 'table', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0' > , REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => ' > 8192'} > {code} > You can see that 'table' family has MAX VERSIONS much higher than the other > families. > The MAX VERSIONS value should be brought in sync with the other families. 
> For namespace table: > {code} > {NAME => 'info', VERSIONS => '10', EVICT_BLOCKS_ON_CLOSE => 'false', > NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', > CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => > 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'true', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => > 'true', BLOCKSIZE => '81 > 92'} > {code} > Having MAX VERSIONS of 3 should be enough. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19815) Flakey TestAssignmentManager.testAssignWithRandExec
[ https://issues.apache.org/jira/browse/HBASE-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19815: -- Resolution: Fixed Status: Resolved (was: Patch Available) I pushed this to master and branch-2. Lets see if it fixes it. > Flakey TestAssignmentManager.testAssignWithRandExec > --- > > Key: HBASE-19815 > URL: https://issues.apache.org/jira/browse/HBASE-19815 > Project: HBase > Issue Type: Bug > Components: flakey, test >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19815.branch-2.001.patch > > > Saw the below in flakies failures > https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html > Seems to be highest failing incidence in branch-2. > {code} > 2018-01-17 15:43:52,872 ERROR [ProcExecWrkr-12] > procedure2.ProcedureExecutor(1481): CODE-BUG: Uncaught runtime exception: > pid=5, ppid=4, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure > failedMetaServer=localhost,104,1, splitWal=false > java.lang.ClassCastException: > org.apache.hadoop.hbase.master.assignment.MockMasterServices cannot be cast > to org.apache.hadoop.hbase.master.HMaster > at > org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.prepare(RecoverMetaProcedure.java:253) > at > org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:96) > at > org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:51) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:182) > at > org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
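The `ClassCastException` in the trace above comes from code that unconditionally downcasts `MasterServices` to `HMaster`, which a test's `MockMasterServices` cannot satisfy. One general fix pattern is to check the runtime type before downcasting; sketched here with illustrative class names rather than the actual HBase fix:

```python
# Sketch of the guarded-downcast pattern (names illustrative, not HBase code).
class MasterServices: ...
class HMaster(MasterServices): ...
class MockMasterServices(MasterServices): ...

def prepare(services):
    # The failing code did the equivalent of an unconditional cast to
    # HMaster; guarding the cast lets a mock degrade gracefully instead
    # of throwing ClassCastException inside the procedure executor.
    master = services if isinstance(services, HMaster) else None
    return master is not None   # True only when the real master is present

assert prepare(HMaster()) is True
assert prepare(MockMasterServices()) is False
```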
[jira] [Updated] (HBASE-15666) shaded dependencies for hbase-testing-util
[ https://issues.apache.org/jira/browse/HBASE-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-15666: -- Fix Version/s: (was: 2.0.0-beta-2) 2.0.0 > shaded dependencies for hbase-testing-util > -- > > Key: HBASE-15666 > URL: https://issues.apache.org/jira/browse/HBASE-15666 > Project: HBase > Issue Type: New Feature > Components: test >Affects Versions: 1.1.0, 1.2.0 >Reporter: Sean Busbey >Priority: Critical > Fix For: 2.0.0, 1.5.0 > > > Folks that make use of our shaded client but then want to test things using > the hbase-testing-util end up getting all of our dependencies again in the > test scope. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)
[ https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329538#comment-16329538 ] Appy commented on HBASE-17852: -- Adding more, so the likely dependencies will end up being: hbase-backup --> hbase-server hbase-backup --> hbase-procedure B's functionalities will be implementations of Procedure/StateMachineProcedure and use masterServices.getMasterProcedureExecutor().submitProcedure() to get stuff done. I do see some deps issues, but we can come up with solutions. One thing we should definitely try to stay away from is, merging the code back in hbase-server module. For now, what do you think are the biggest blockers for making procv2 + backup happen [~vrodionov]? You're right, we should definitely discuss concrete design/problems/solutions before starting with the refactoring. Can help with design review. > Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental > backup) > > > Key: HBASE-17852 > URL: https://issues.apache.org/jira/browse/HBASE-17852 > Project: HBase > Issue Type: Sub-task >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, > HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, > HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, > HBASE-17852-v9.patch > > > Design approach rollback-via-snapshot implemented in this ticket: > # Before backup create/delete/merge starts we take a snapshot of the backup > meta-table (backup system table). This procedure is lightweight because meta > table is small, usually should fit a single region. > # When operation fails on a server side, we handle this failure by cleaning > up partial data in backup destination, followed by restoring backup > meta-table from a snapshot. 
> # When operation fails on a client side (abnormal termination, for example), > next time user will try create/merge/delete he(she) will see error message, > that system is in inconsistent state and repair is required, he(she) will > need to run backup repair tool. > # To avoid multiple writers to the backup system table (backup client and > BackupObserver's) we introduce small table ONLY to keep listing of bulk > loaded files. All backup observers will work only with this new tables. The > reason: in case of a failure during backup create/delete/merge/restore, when > system performs automatic rollback, some data written by backup observers > during failed operation may be lost. This is what we try to avoid. > # Second table keeps only bulk load related references. We do not care about > consistency of this table, because bulk load is idempotent operation and can > be repeated after failure. Partially written data in second table does not > affect on BackupHFileCleaner plugin, because this data (list of bulk loaded > files) correspond to a files which have not been loaded yet successfully and, > hence - are not visible to the system -- This message was sent by Atlassian JIRA (v7.6.3#76005)
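The comment above proposes implementing backup operations as `Procedure`/`StateMachineProcedure` submitted via `masterServices.getMasterProcedureExecutor().submitProcedure()`. A minimal, self-contained sketch of the state-machine idea follows; the states, class names, and executor loop are illustrative only, not the HBase procv2 API:

```python
# Minimal sketch of a state-machine procedure, loosely modelled on the
# procv2 idea mentioned above. Names and structure are illustrative only.
from enum import Enum, auto

class State(Enum):
    SNAPSHOT_META = auto()   # snapshot the backup meta-table first
    COPY_FILES = auto()      # then copy data to the backup destination
    FINISH = auto()          # then finalize bookkeeping

class BackupProcedure:
    def __init__(self):
        self.state = State.SNAPSHOT_META
        self.log = []

    def execute_from_state(self):
        """Run one step and return the next state, or None when done."""
        self.log.append(self.state.name)
        if self.state is State.SNAPSHOT_META:
            self.state = State.COPY_FILES
        elif self.state is State.COPY_FILES:
            self.state = State.FINISH
        else:
            return None
        return self.state

def submit_procedure(proc):
    """Drive the procedure to completion, like a tiny procedure executor."""
    while proc.execute_from_state() is not None:
        pass
    return proc.log

steps = submit_procedure(BackupProcedure())
assert steps == ["SNAPSHOT_META", "COPY_FILES", "FINISH"]
```

The fault-tolerance benefit of this shape is that the executor persists the current state, so a crashed operation can resume (or roll back) from a well-defined step rather than leaving the system table half-written.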
[jira] [Commented] (HBASE-19527) Make ExecutorService threads daemon=true.
[ https://issues.apache.org/jira/browse/HBASE-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329528#comment-16329528 ] Hadoop QA commented on HBASE-19527: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HBASE-19527 does not apply to branch-2. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.6.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HBASE-19527 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906490/HBASE-19527.branch-2.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11092/console | | Powered by | Apache Yetus 0.6.0 http://yetus.apache.org | This message was automatically generated. > Make ExecutorService threads daemon=true. > - > > Key: HBASE-19527 > URL: https://issues.apache.org/jira/browse/HBASE-19527 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19527.branch-2.001.patch, > HBASE-19527.master.001.patch, HBASE-19527.master.001.patch, > HBASE-19527.master.001.patch > > > Let me try this. ExecutorService runs OPENs, CLOSE, etc. If Server is going > down, no point in these threads sticking around (I think). Let me try this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)