[jira] [Commented] (HBASE-24609) Move MetaTableAccessor out of hbase-client
[ https://issues.apache.org/jira/browse/HBASE-24609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213379#comment-17213379 ] Michael Stack commented on HBASE-24609: --- A few questions after looking at this (belatedly) Why move MetaTableAccessor to hbase-balancer? The accessor seems more general than balancing affairs only. Class is described as bq. Read/write operations on hbase:meta region as well as assignment information stored bq. * to hbase:meta. The 'hbase-balancer' is described as 'HBase Balancer Support' The rename of asyncmetatableaccessor to clientmetatableaccessor looks good but I'm unclear on when to use clientmetatableaccessor and when metatableaccessor? The clientmetatableaccessor is described like the metatableaccessor. bq. * The (asynchronous) meta table accessor used at client side. Used to read/write region and bq. * assignment information store in hbase:meta. Previously I used MTA to write hbase:meta. Now do I use CMTA too, or exclusively? What are you thinking? s/CatalogFamilyFormat/CatalogColumnFamily/? Or CatalogColumnFamilyParser? Will this class cover the 'info' columnfamily only? Or do you foresee it doing other columnfamilies too? Why do we need it as a standalone class that is making an appearance in a few places around the code base? It was an internal affair of MetaTableAccessor previously? Thanks > Move MetaTableAccessor out of hbase-client > -- > > Key: HBASE-24609 > URL: https://issues.apache.org/jira/browse/HBASE-24609 > Project: HBase > Issue Type: Task > Components: amv2, Client, meta >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0-alpha-1 > > > On master branch we have AsyncMetaTableAccessor which is used at client side > and MetaTableAccessor has lots of internal methods for implementing > assignment, which is not part of our client code. > So let's move it to hbase-server, and in the future, maybe in hbase-balancer? -- This message was sent by Atlassian Jira (v8.3.4#803005)
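For context on what the accessors above wrap: region state lives in the 'info' column family of hbase:meta, and reading it needs only public client APIs. A minimal sketch, assuming the current 2.x client (RegionInfo.parseFromOrNull is the 2.x parser; treat it as illustrative of what CatalogFamilyFormat-style parsing does):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class MetaInfoFamilyScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table meta = conn.getTable(TableName.META_TABLE_NAME);
         ResultScanner scanner =
             meta.getScanner(new Scan().addFamily(HConstants.CATALOG_FAMILY))) {
      for (Result r : scanner) {
        // The 'info:regioninfo' cell holds the serialized RegionInfo.
        RegionInfo ri = RegionInfo.parseFromOrNull(
            r.getValue(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER));
        if (ri != null) {
          System.out.println(ri.getRegionNameAsString());
        }
      }
    }
  }
}
{code}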
[jira] [Commented] (HBASE-25017) Attach a design doc to code base
[ https://issues.apache.org/jira/browse/HBASE-25017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213352#comment-17213352 ] Michael Stack commented on HBASE-25017: --- The linked PR seems to be for another issue? > Attach a design doc to code base > > > Key: HBASE-25017 > URL: https://issues.apache.org/jira/browse/HBASE-25017 > Project: HBase > Issue Type: Sub-task > Components: documentation > Environment: Ata >Reporter: Duo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25007) Make HBCK2 work for 'root table'
[ https://issues.apache.org/jira/browse/HBASE-25007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213349#comment-17213349 ] Michael Stack commented on HBASE-25007: --- bq. Add a new HBCK option to scan meta and fix the inconsistency between meta and in memory state? Sounds good. If it finds 'Unknown Server' (for read replicas too), it would queue an SCP? What else? HBASE-25142 is trying to address 'Unknown Server' via the regular CatalogJanitor run. > Make HBCK2 work for 'root table' > > > Key: HBASE-25007 > URL: https://issues.apache.org/jira/browse/HBASE-25007 > Project: HBase > Issue Type: Sub-task > Components: hbck2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > > We will also scan the catalog table and fix it in HBCK2; we should add support > for root too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25144) Add Hadoop-3.3.0 to personality hadoopcheck
[ https://issues.apache.org/jira/browse/HBASE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212734#comment-17212734 ] Michael Stack commented on HBASE-25144: --- HBASE-23834 made it so hadoop 3.3.0 works with hbase-2.4.0 and master/hbase-3 via hbase-thirdparty additions. > Add Hadoop-3.3.0 to personality hadoopcheck > --- > > Key: HBASE-25144 > URL: https://issues.apache.org/jira/browse/HBASE-25144 > Project: HBase > Issue Type: Task > Components: build, community >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk >Priority: Minor > Fix For: 3.0.0-alpha-1 > > > Now that Hadoop 3.3.0 is released, let's figure out where it goes in our > testing matrix. Start by adding it to precommit checks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-22938) Fold all the system tables to hbase:meta
[ https://issues.apache.org/jira/browse/HBASE-22938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212603#comment-17212603 ] Michael Stack edited comment on HBASE-22938 at 10/12/20, 7:15 PM: -- Just to repeat note added on HBASE-15867, some of the obstacles noted above have been undone; i.e. region replicas are working to make it so they do not need to keep state (see '4.1 Skip maintaining zookeeper replication queue (offsets/WALs)' in https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit#heading=h.5hn1d8pikvrr). Folding all system tables into hbase:meta should run a bit smoother (with asserts as suggested above that we do not replicate system/catalog tables). was (Author: stack): {quote}This will make the read replicas feature can not 100% work for meta and system tables, as we can not use in-cluster replication for them to spread the edits to the secondary replicas any more. {quote} IIUC, this was once the case but no longer given read replicas no longer keep ongoing state (see '4.1 Skip maintaining zookeeper replication queue (offsets/WALs)' in https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit#heading=h.5hn1d8pikvrr) > Fold all the system tables to hbase:meta > > > Key: HBASE-22938 > URL: https://issues.apache.org/jira/browse/HBASE-22938 > Project: HBase > Issue Type: Brainstorming >Reporter: Duo Zhang >Priority: Major > > Quote my post on HBASE-15867 here, on how to deal with the deadlock when we > want to store replication queues to hbase:replication table. > {quote} > We could add a special prefix in the row key for different system tables, and > make a special family for it. For example, for all the records in hbase:acl, > we could introduce a prefix like ':::acl:::', since we do not allow ':' in > either namespace or table name, so it will not conflict with the existing > table related records. And the family could be named 'acl'. > And we could make a special split policy that only splits at these special > prefixes, so it will not break any assumptions so far, as all the records for > the 'system table' are in the same region. > {quote} > And I think there are also other advantages, for example the start up logic > can be greatly simplified. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22938) Fold all the system tables to hbase:meta
[ https://issues.apache.org/jira/browse/HBASE-22938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212603#comment-17212603 ] Michael Stack commented on HBASE-22938: --- {quote}This will make the read replicas feature can not 100% work for meta and system tables, as we can not use in-cluster replication for them to spread the edits to the secondary replicas any more. {quote} IIUC, this was once the case but no longer given read replicas no longer keep ongoing state (see '4.1 Skip maintaining zookeeper replication queue (offsets/WALs)' in https://docs.google.com/document/d/1jJWVc-idHhhgL4KDRpjMsQJKCl_NRaCLGiH3Wqwd3O8/edit#heading=h.5hn1d8pikvrr) > Fold all the system tables to hbase:meta > > > Key: HBASE-22938 > URL: https://issues.apache.org/jira/browse/HBASE-22938 > Project: HBase > Issue Type: Brainstorming >Reporter: Duo Zhang >Priority: Major > > Quote my post on HBASE-15867 here, on how to deal with the deadlock when we > want to store replication queues to hbase:replication table. > {quote} > We could add a special prefix in the row key for different system tables, and > make a special family for it. For example, for all the records in hbase:acl, > we could introduce a prefix like ':::acl:::', since we do not allow ':' in > either namespace or table name, so it will not conflict with the existing > table related records. And the family could be named 'acl'. > And we could make a special split policy that only splits at these special > prefixes, so it will not break any assumptions so far, as all the records for > the 'system table' are in the same region. > {quote} > And I think there are also other advantages, for example the start up logic > can be greatly simplified. -- This message was sent by Atlassian Jira (v8.3.4#803005)
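To make the split-policy idea quoted above concrete, here is a hedged sketch; SystemPrefixSplitPolicy and the ':::' marker handling are hypothetical, but IncreasingToUpperBoundRegionSplitPolicy and getSplitPoint() are the real extension points region split policies use:

{code:java}
import org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical policy: a split point is only allowed on a ':::name:::' prefix
// boundary, so all records of one folded-in 'system table' stay in one region.
public class SystemPrefixSplitPolicy extends IncreasingToUpperBoundRegionSplitPolicy {
  private static final byte[] MARKER = Bytes.toBytes(":::");

  @Override
  protected byte[] getSplitPoint() {
    byte[] candidate = super.getSplitPoint();
    if (candidate == null) {
      return null;
    }
    // Truncate the candidate back to its enclosing prefix (e.g. ':::acl:::'),
    // so the region never splits in the middle of a prefixed record group.
    int end = endOfSecondMarker(candidate);
    return end < 0 ? candidate : Bytes.copy(candidate, 0, end);
  }

  /** Index just past the second ':::' marker, or -1 if the key has no prefix. */
  private static int endOfSecondMarker(byte[] key) {
    int seen = 0;
    for (int i = 0; i + MARKER.length <= key.length; i++) {
      if (Bytes.equals(key, i, MARKER.length, MARKER, 0, MARKER.length)) {
        if (++seen == 2) {
          return i + MARKER.length;
        }
        i += MARKER.length - 1;
      }
    }
    return -1;
  }
}
{code}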
[jira] [Resolved] (HBASE-25168) Unify WAL name timestamp parsers
[ https://issues.apache.org/jira/browse/HBASE-25168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25168. --- Fix Version/s: 2.2.7 2.4.0 2.3.3 3.0.0-alpha-1 Hadoop Flags: Reviewed Assignee: Michael Stack Resolution: Fixed Merged to branch-2.2+ Thanks for reviews [~zhangduo] and [~psomogyi] > Unify WAL name timestamp parsers > > > Key: HBASE-25168 > URL: https://issues.apache.org/jira/browse/HBASE-25168 > Project: HBase > Issue Type: Bug >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7 > > > Turns out there are two methods for extracting timestamp from WAL filename. > Fix. > Spotted by [~zhangduo] in review of HBASE-22976... > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211489#comment-17211489 ] Michael Stack commented on HBASE-22976: --- Let me fix HBASE-25168. Added to WAL as utility only – no enforcement of any naming pattern. Devs might find it here (less so if in AFSWProvider... witness my experience). Thanks for fingering the duplication. Can move to AbstractFSWALProvider (though it hurts). > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2, walplayer >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7 > > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration. Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked that the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
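The duplication being unified in HBASE-25168 is simple enough to show in isolation. A sketch of a single parser, assuming the usual '<server-name>.<epoch-millis>' WAL file-name layout; the class and regex here are illustrative, not the actual AbstractFSWALProvider code:

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single home for WAL-name timestamp parsing.
public final class WALNameTimestamp {
  // Matches the trailing ".<epoch-millis>" with an optional ".meta" suffix.
  private static final Pattern TS = Pattern.compile("\\.(\\d+)(\\.meta)?$");

  private WALNameTimestamp() {}

  public static long parse(String walName) {
    Matcher m = TS.matcher(walName);
    if (!m.find()) {
      throw new IllegalArgumentException("No timestamp in WAL name: " + walName);
    }
    return Long.parseLong(m.group(1));
  }

  public static void main(String[] args) {
    // e.g. host%2C16020%2C1601234.1602184197326 -> 1602184197326
    System.out.println(parse("host%2C16020%2C1601234.1602184197326"));
  }
}
{code}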
[jira] [Created] (HBASE-25168) Unify WAL name timestamp parsers
Michael Stack created HBASE-25168: - Summary: Unify WAL name timestamp parsers Key: HBASE-25168 URL: https://issues.apache.org/jira/browse/HBASE-25168 Project: HBase Issue Type: Bug Reporter: Michael Stack Turns out there are two methods for extracting timestamp from WAL filename. Fix. Spotted by [~zhangduo] in review of HBASE-22976... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25034) Table regions details on master GUI display slowly.
[ https://issues.apache.org/jira/browse/HBASE-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211373#comment-17211373 ] Michael Stack commented on HBASE-25034: --- How does the refactor improve load speed [~lidingsheng] ? Thanks. > Table regions details on master GUI display slowly. > --- > > Key: HBASE-25034 > URL: https://issues.apache.org/jira/browse/HBASE-25034 > Project: HBase > Issue Type: Improvement >Reporter: DingSheng Li >Priority: Major > Labels: newbie > Attachments: The table display after pagination.html > > > When a table has a large number of regions (e.g., a single table contains more > than 100,000 regions), it takes about 20 to 30 minutes to display the table > regions on the master GUI, which is unacceptable to users. After testing, we > find that web page rendering takes up the most time, and this can be solved by > a paginated query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-20874) Sending compaction descriptions from all regionservers to master.
[ https://issues.apache.org/jira/browse/HBASE-20874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211319#comment-17211319 ] Michael Stack commented on HBASE-20874: --- The patch has a Compaction class that is meant to describe a Compaction. It has region name and family and list of files, which is good, but then it has whether the compaction is running or not and, if running, a bunch of attributes, which is not good (this is not a description of the Compaction but of the Compaction execution). It also has the attributes minor-or-major and whether short or long. The first is an attribute used in constructing the compaction. The second is an attribute of execution scheduling, irrelevant if, say, we externalized the compaction runner. It adds to the Admin interface new APIs that allow getting cluster Compaction objects – from the Master... rather than going to each RS. We have volumes of compaction API in Admin already. Adds a RunningTasksThreadPoolExecutor so we can ask the executor for its current state – what is it running. Adds all current Compactions to the heartbeat (though we have a problem here, see HBASE-11747). Adds a shell command to dump the current set of cluster Compactions. > Sending compaction descriptions from all regionservers to master. > - > > Key: HBASE-20874 > URL: https://issues.apache.org/jira/browse/HBASE-20874 > Project: HBase > Issue Type: Sub-task >Reporter: Mohit Goel >Assignee: Mohit Goel >Priority: Minor > Attachments: HBASE-20874.master.004.patch, > HBASE-20874.master.005.patch, HBASE-20874.master.006.patch, > HBASE-20874.master.007.patch, HBASE-20874.master.008.patch, > hbase-20874.master.009.patch, hbase-20874.master.010.patch > > > Need to send the compaction description from region servers to Master, to > let master know of the entire compaction state of the cluster. Further, we need > to change the implementation of client-side APIs like getCompactionState, > which will consult the master for the result instead of sending individual > requests to regionservers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
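A sketch of the separation the review asks for; all class and field names here are hypothetical, just to show the shape: the immutable description is one type, and anything about a particular run of the compaction lives elsewhere:

{code:java}
import java.util.Collections;
import java.util.List;

// Hypothetical: what a Compaction *description* looks like once execution
// state (running flag, scheduling pool, runtime attributes) is split out.
public final class CompactionDescription {
  private final String regionName;
  private final String columnFamily;
  private final List<String> inputFiles;
  private final boolean major; // fixed when the compaction is constructed

  public CompactionDescription(String regionName, String columnFamily,
      List<String> inputFiles, boolean major) {
    this.regionName = regionName;
    this.columnFamily = columnFamily;
    this.inputFiles = Collections.unmodifiableList(inputFiles);
    this.major = major;
  }

  public String getRegionName() { return regionName; }
  public String getColumnFamily() { return columnFamily; }
  public List<String> getInputFiles() { return inputFiles; }
  public boolean isMajor() { return major; }
}
// A separate (equally hypothetical) CompactionExecution would carry the
// running/not-running state and short/long pool placement.
{code}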
[jira] [Resolved] (HBASE-24025) Improve performance of move_servers_rsgroup and move_tables_rsgroup by using async region move API
[ https://issues.apache.org/jira/browse/HBASE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24025. --- Fix Version/s: 2.4.0 Resolution: Fixed Merged to branch-2. You want it in older branches [~arshad.mohammad] ? > Improve performance of move_servers_rsgroup and move_tables_rsgroup by using > async region move API > -- > > Key: HBASE-24025 > URL: https://issues.apache.org/jira/browse/HBASE-24025 > Project: HBase > Issue Type: Improvement > Components: rsgroup >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > Currently move_servers_rsgroup and move_tables_rsgroup commands and APIs are > taking a lot of time. > In my test environment, to move a server with 100 regions it takes around 137 > seconds. > Similarly it takes around the same time to move a table with 100 regions to another > group. > The time taken in rsgroup meta update is negligible. Almost all the time is > taken in region movement. This is happening because regions are moved serially > using the getAssignmentManager().move(region) API. > The getAssignmentManager().moveAsync(regionplan) API can be used to move the > regions in parallel to improve the performance of the rsgroup move servers > and tables commands and APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
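The serial-versus-parallel difference in the description, sketched against the master-internal API it names; that moveAsync takes a RegionPlan and returns a Future is as described above, while the surrounding method and variable names are assumptions:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.master.RegionPlan;
import org.apache.hadoop.hbase.master.assignment.AssignmentManager;

// Sketch only: submit every move first, then wait, instead of calling the
// blocking move(region) once per region.
public class ParallelGroupMove {
  static void moveAll(AssignmentManager am, List<RegionInfo> regions,
      ServerName source, ServerName target) throws Exception {
    List<Future<byte[]>> futures = new ArrayList<>();
    for (RegionInfo region : regions) {
      futures.add(am.moveAsync(new RegionPlan(region, source, target)));
    }
    for (Future<byte[]> f : futures) {
      f.get(); // all moves are now in flight; wait for each to finish
    }
  }
}
{code}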
[jira] [Commented] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211182#comment-17211182 ] Michael Stack commented on HBASE-22976: --- Backported this and HBASE-25109 to branch-2.2. [~mbz] These changes will be available to you when we release 2.2.7/2.3.3. > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2, walplayer >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7 > > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration. Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked that the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-22976: -- Fix Version/s: 2.2.7 > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2, walplayer >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7 > > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration. Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked that the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25109) Add MR Counters to WALPlayer; currently hard to tell if it is doing anything
[ https://issues.apache.org/jira/browse/HBASE-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-25109: -- Fix Version/s: 2.2.7 > Add MR Counters to WALPlayer; currently hard to tell if it is doing anything > > > Key: HBASE-25109 > URL: https://issues.apache.org/jira/browse/HBASE-25109 > Project: HBase > Issue Type: Improvement >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7 > > > For example, when WALPlayer runs, it emits this: > {code:java} > 020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] > mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: > Counters: 20 > File System Counters > FILE: Number of bytes read=268891453 > FILE: Number of bytes written=1018719 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > Map-Reduce Framework > Map input records=4375 > Map output records=5369 > Input split bytes=245 > Spilled Records=0 > Failed Shuffles=0 > Merged Map outputs=0 > GC time elapsed (ms)=59 > Total committed heap usage (bytes)=518979584 > File Input Format Counters > Bytes Read=0 > File Output Format Counters > Bytes Written=0 {code} > Change it so it does this: > {code:java} > 020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] > mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: > Counters: 20 > File System Counters > FILE: Number of bytes read=268891453 > FILE: Number of bytes written=1018719 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > Map-Reduce Framework > Map input records=4375 > Map output records=5369 > Input split bytes=245 > Spilled Records=0 > Failed Shuffles=0 > Merged Map outputs=0 > GC time elapsed (ms)=59 > Total committed heap usage (bytes)=518979584 > org.apache.hadoop.hbase.mapreduce.WALPlayer$Counter > CELLS_READ=89574 > CELLS_WRITTEN=89572 > DELETES=64 > PUTS=5305 > WALEDITS=4375 > File Input Format Counters > Bytes Read=0 > File Output Format Counters > Bytes Written=0 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25109) Add MR Counters to WALPlayer; currently hard to tell if it is doing anything
[ https://issues.apache.org/jira/browse/HBASE-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211181#comment-17211181 ] Michael Stack commented on HBASE-25109: --- Backported to branch-2.2 > Add MR Counters to WALPlayer; currently hard to tell if it is doing anything > > > Key: HBASE-25109 > URL: https://issues.apache.org/jira/browse/HBASE-25109 > Project: HBase > Issue Type: Improvement >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7 > > > For example, when WALPlayer runs, it emits this: > {code:java} > 020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] > mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: > Counters: 20 > File System Counters > FILE: Number of bytes read=268891453 > FILE: Number of bytes written=1018719 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > Map-Reduce Framework > Map input records=4375 > Map output records=5369 > Input split bytes=245 > Spilled Records=0 > Failed Shuffles=0 > Merged Map outputs=0 > GC time elapsed (ms)=59 > Total committed heap usage (bytes)=518979584 > File Input Format Counters > Bytes Read=0 > File Output Format Counters > Bytes Written=0 {code} > Change it so it does this: > {code:java} > 020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] > mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: > Counters: 20 > File System Counters > FILE: Number of bytes read=268891453 > FILE: Number of bytes written=1018719 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > Map-Reduce Framework > Map input records=4375 > Map output records=5369 > Input split bytes=245 > Spilled Records=0 > Failed Shuffles=0 > Merged Map outputs=0 > GC time elapsed (ms)=59 > Total committed heap usage (bytes)=518979584 > org.apache.hadoop.hbase.mapreduce.WALPlayer$Counter > CELLS_READ=89574 > CELLS_WRITTEN=89572 > DELETES=64 > PUTS=5305 > WALEDITS=4375 > File Input Format Counters > Bytes Read=0 > File Output Format Counters > Bytes Written=0 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
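The counter mechanism itself is the stock Hadoop MapReduce one: declare an enum and bump it from the mapper's context. A minimal illustration; the real WALPlayer mapper consumes WALKey/WALEdit pairs, not text lines:

{code:java}
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, Text> {
  // Named counters show up in the job's final counter dump, as in the
  // WALPlayer$Counter section above.
  public enum Counter { WALEDITS, CELLS_READ, CELLS_WRITTEN, PUTS, DELETES }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.getCounter(Counter.WALEDITS).increment(1);
    // ...process the edit, bumping CELLS_READ/PUTS/DELETES as cells are seen.
  }
}
{code}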
[jira] [Commented] (HBASE-25156) TestMasterFailover.testSimpleMasterFailover is flaky
[ https://issues.apache.org/jira/browse/HBASE-25156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211168#comment-17211168 ] Michael Stack commented on HBASE-25156: --- 30 seconds is a long time > TestMasterFailover.testSimpleMasterFailover is flaky > - > > Key: HBASE-25156 > URL: https://issues.apache.org/jira/browse/HBASE-25156 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 3.0.0-alpha-1 >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 2.2.7 > > > {noformat} > [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 32.996 s <<< FAILURE! - in org.apache.hadoop.hbase.master.TestMasterFailover > [ERROR] > org.apache.hadoop.hbase.master.TestMasterFailover.testSimpleMasterFailover > Time elapsed: 12.317 s <<< FAILURE! > java.lang.AssertionError: expected:<1> but was:<2> > at > org.apache.hadoop.hbase.master.TestMasterFailover.testSimpleMasterFailover(TestMasterFailover.java:133) > {noformat} > Looks like this test depends on metrics being updated as a side effect, which > it uses to verify the test. Seems like it should retry the check a > few times, or maybe we need a last-updated monotonic value that the test can > check before and after it expects a change to be visible. -- This message was sent by Atlassian Jira (v8.3.4#803005)
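One way to do the suggested retry, sketched with plain polling; the metrics.getNumMasters() call in the usage line is a made-up stand-in for whatever the test actually reads:

{code:java}
import java.util.function.BooleanSupplier;

// Poll the condition instead of asserting once against a metric that is
// updated asynchronously.
public final class Eventually {
  public static void assertEventually(long timeoutMs, BooleanSupplier check)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("condition not met within " + timeoutMs + "ms");
      }
      Thread.sleep(100);
    }
  }
}
// usage: Eventually.assertEventually(30_000, () -> metrics.getNumMasters() == 1);
{code}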
[jira] [Resolved] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-22976. --- Hadoop Flags: Reviewed Resolution: Fixed Merged to master and backported to branch-2/branch-2.3. Thanks for review [~wchevreuil] > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2, walplayer >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0 > > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration. Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked that the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-11747) ClusterStatus (heartbeat) is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210408#comment-17210408 ] Michael Stack commented on HBASE-11747: --- Looking at a running cluster where servers are carrying about 500 Regions each, heartbeat size is not too bad: {code:java} 2020-10-08 19:09:47,344 TRACE [RpcServer.priority.RWQ.Fifo.write.handler=0,queue=0,port=16000] ipc.RpcServer: callId: 396871 service: RegionServerStatusService methodName: RegionServerReport size: 84.5 K connection: 192.192.118.146:50556 deadline: 1602184197326 2020-10-08 19:09:47,344 TRACE [RpcServer.priority.RWQ.Fifo.write.handler=0,queue=0,port=16000] ipc.RpcServer: callId: 396218 service: RegionServerStatusService methodName: RegionServerReport size: 83.7 K connection: 192.192.113.230:17968 deadline: 1602184197326 {code} > ClusterStatus (heartbeat) is too bulky > --- > > Key: HBASE-11747 > URL: https://issues.apache.org/jira/browse/HBASE-11747 > Project: HBase > Issue Type: Sub-task > Components: master, Operability, scaling >Reporter: Virag Kothari >Priority: Critical > Attachments: exceptiontrace > > > Following exception on 0.98 with 1M regions on cluster with 160 region servers > {code} > Caused by: java.io.IOException: Call to regionserverhost:port failed on local > exception: com.google.protobuf.InvalidProtocolBufferException: Protocol > message was too large. May be malicious. Use > CodedInputStream.setSizeLimit() to increase the size limit. > at > org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1482) > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1454) > at > org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654) > at > org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712) > at > org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.getClusterStatus(MasterProtos.java:42555) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.getClusterStatus(HConnectionManager.java:2132) > at > org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2166) > at > org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2162) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114) > ... 43 more > Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol > message was too large. May be malicious. Use > CodedInputStream.setSizeLimit() to increase the size limit. > at > com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24025) Improve performance of move_servers_rsgroup and move_tables_rsgroup by using async region move API
[ https://issues.apache.org/jira/browse/HBASE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24025. --- Fix Version/s: 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed Merged to master. Thanks for patch [~arshad.mohammad] > Improve performance of move_servers_rsgroup and move_tables_rsgroup by using > async region move API > -- > > Key: HBASE-24025 > URL: https://issues.apache.org/jira/browse/HBASE-24025 > Project: HBase > Issue Type: Improvement > Components: rsgroup >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Currently move_servers_rsgroup and move_tables_rsgroup commands and APIs are > taking a lot of time. > In my test environment, to move a server with 100 regions it takes around 137 > seconds. > Similarly it takes around the same time to move a table with 100 regions to another > group. > The time taken in rsgroup meta update is negligible. Almost all the time is > taken in region movement. This is happening because regions are moved serially > using the getAssignmentManager().move(region) API. > The getAssignmentManager().moveAsync(regionplan) API can be used to move the > regions in parallel to improve the performance of the rsgroup move servers > and tables commands and APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25165) Change 'State time' in UI so sorts
[ https://issues.apache.org/jira/browse/HBASE-25165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25165. --- Fix Version/s: 2.4.0 3.0.0-alpha-1 Hadoop Flags: Reviewed Release Note: Start time on the Master UI is now displayed using ISO8601 format instead of java Date#toString(). Resolution: Fixed Merged to master and backported to branch-2. Thanks for review [~ndimiduk] > Change 'State time' in UI so sorts > -- > > Key: HBASE-25165 > URL: https://issues.apache.org/jira/browse/HBASE-25165 > Project: HBase > Issue Type: Bug > Components: UI >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.4.0 > > Attachments: Screen Shot 2020-10-07 at 4.15.32 PM.png, Screen Shot > 2020-10-07 at 4.15.42 PM.png > > > Here is a minor issue. > I had an issue w/ crashing servers. The servers were auto-restarted on crash. > To find the crashing servers, I was sorting on the 'Start time' column in the > Master UI. This basically worked but the sort is unreliable as the date we > display starts with days-of-the-week. > This issue is about moving to display start time in iso8601 which is sortable > (and occupies less real estate). Let me add some images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
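For reference, the formatting difference is a one-liner in java.time; Instant#toString() emits ISO8601, which sorts correctly as a plain string:

{code:java}
import java.time.Instant;
import java.util.Date;

public class StartTimeFormat {
  public static void main(String[] args) {
    long masterStartTime = System.currentTimeMillis();
    // Old: "Wed Oct 07 16:15:32 PDT 2020" -- starts with day-of-week, so a
    // string sort on the column is meaningless.
    System.out.println(new Date(masterStartTime));
    // New: "2020-10-07T23:15:32.123Z" -- ISO8601, lexicographic order matches
    // chronological order, and it occupies less real estate.
    System.out.println(Instant.ofEpochMilli(masterStartTime).toString());
  }
}
{code}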
[jira] [Resolved] (HBASE-4040) Make HFilePrettyPrinter programmatically invocable and add JSON output
[ https://issues.apache.org/jira/browse/HBASE-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-4040. -- Resolution: Later > Make HFilePrettyPrinter programmatically invocable and add JSON output > -- > > Key: HBASE-4040 > URL: https://issues.apache.org/jira/browse/HBASE-4040 > Project: HBase > Issue Type: New Feature > Components: tooling, UI >Reporter: Riley Patterson >Priority: Major > > Implement JSON output in HFilePrettyPrinter, similar to the work done for the > HLogPrettyPrinter, so that scripts can easily parse the information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-3672) MapReduce test cases shouldn't use system /tmp
[ https://issues.apache.org/jira/browse/HBASE-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-3672. -- Resolution: Not A Problem > MapReduce test cases shouldn't use system /tmp > -- > > Key: HBASE-3672 > URL: https://issues.apache.org/jira/browse/HBASE-3672 > Project: HBase > Issue Type: Improvement > Components: test >Reporter: Todd Lipcon >Priority: Minor > > Right now some of our MR test cases seem to put local directories in /tmp - > this can cause conflicts when running multiple builds on the same Hudson box. > We should instead use a build/tmp directory for this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-1978) Change the range/block index scheme from [start,end) to (start, end], and index range/block by endKey, specially in HFile
[ https://issues.apache.org/jira/browse/HBASE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-1978. -- Resolution: Later > Change the range/block index scheme from [start,end) to (start, end], and > index range/block by endKey, specially in HFile > - > > Key: HBASE-1978 > URL: https://issues.apache.org/jira/browse/HBASE-1978 > Project: HBase > Issue Type: New Feature > Components: io, master, regionserver >Reporter: Schubert Zhang >Assignee: Schubert Zhang >Priority: Major > Attachments: HBASE-1978-HFile-v1.patch > > > From the code review of HFile (HBASE-1818), we found the HFile allows > duplicated keys. But the old implementation would lead to missing duplicated > keys on seek and scan, when the duplicated key spans multiple blocks. > We provide a patch (HBASE-1841 is step 1) to resolve the above issue. This patch > modified HFile.Writer to avoid generating a problem hfile with the above > cross-block duplicated key. It only starts a new block when the current appending > key is different from the last appended key. But it still has a risk when the > user of HFile.Writer appends many of the same duplicated key, which leads to a very > large block and needs much memory, or goes out-of-memory. > The current HFile's block-index uses startKey to index a block, i.e. the > range/block index scheme is [startKey,endKey). > As referring to section 5.1 of the Google Bigtable paper: > "The METADATA table stores the location of a tablet under a row key that is > an encoding of the tablet's table identifier and its end row." > The theory of Bigtable's METADATA is the same as the BlockIndex in an SSTable or > HFile, so we should use EndKey in HFile's BlockIndex. In my experience with > Hypertable, the METADATA is also "tableID:endRow". > We would change the index scheme in HFile from [startKey,endKey) to > (startKey,endKey], and change the binary search method to match this index > scheme. > This change can resolve the above duplicated-key issue. > Note: > The complete fix needs to modify many modules in HBase, seemingly including HFile, > META schema, some internal code, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
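The lookup change the description implies: with blocks indexed by endKey under a (startKey, endKey] scheme, a search key belongs to the first block whose endKey is >= the key, i.e. a lower-bound binary search. A self-contained sketch; the comparator stands in for HBase's key comparator:

{code:java}
import java.util.Comparator;

public class EndKeyBlockIndex {
  /**
   * Returns the index of the first block whose endKey >= key. If the key is
   * past every endKey, this returns the last index; the caller must check
   * that the returned block actually covers the key.
   */
  static int findBlock(byte[][] endKeys, byte[] key, Comparator<byte[]> cmp) {
    int lo = 0, hi = endKeys.length - 1;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (cmp.compare(endKeys[mid], key) < 0) {
        lo = mid + 1; // key is past this block's end; search right
      } else {
        hi = mid;     // this block could hold the key; keep it in range
      }
    }
    return lo;
  }
}
{code}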
[jira] [Updated] (HBASE-23959) Fix javadoc for JDK11
[ https://issues.apache.org/jira/browse/HBASE-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-23959: -- Fix Version/s: 2.4.0 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Merged and backported to branch-2 (doesn't go to branch-2.3). Thanks for patch [~semensanyok] and for the review [~janh] . > Fix javadoc for JDK11 > - > > Key: HBASE-23959 > URL: https://issues.apache.org/jira/browse/HBASE-23959 > Project: HBase > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha-1, 2.3.0 >Reporter: Nick Dimiduk >Assignee: Semen Komissarov >Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > Javadoc build fails with JDK11. See if this can be fixed to pass on both 8 > and 11. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23742) Document that with split-to-hfile data over the MOB threshold will be treated as normal data
[ https://issues.apache.org/jira/browse/HBASE-23742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-23742. --- Fix Version/s: 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed Merged the doc change. Thanks for the patch [~pankajkumar] > Document that with split-to-hfile data over the MOB threshold will be treated > as normal data > > > Key: HBASE-23742 > URL: https://issues.apache.org/jira/browse/HBASE-23742 > Project: HBase > Issue Type: Task > Components: documentation, MTTR, wal >Affects Versions: 3.0.0-alpha-1, 2.3.0 >Reporter: Y. SREENIVASULU REDDY >Assignee: Pankaj Kumar >Priority: Minor > Fix For: 3.0.0-alpha-1 > > > h3. documentation update > Update the troubleshooting section of the MOB chapter to include a note for > "Why is there data over the MOB threshold in the normal {{/hbase/data}} > directory rather than in {{/hbase/mobdir}}. List the split-to-hfile feature > as one source and bulk loading as another. Note that in both cases the next > compaction to include those files will write the data out to MOB hfiles. > h3. original > Steps to reproduce this issue. > 1. create a table with 1 region, and mob enabled, keep threshold value to 5. > 2. Load data into the table, keep the value size should be more than 5. > 3. flush the table. > 4. observe the mobdir and data dir, hfiles should be there. > 5. load data again with different data set, keep the value size is greater > than 5. > 6. Kill -9 RS where table region is online > 7. Start RS > check the mob dir and data dir, both should have 2 hfiles each. > But data dir only have 2 hfiles, that means mob threshold crossed data is > considered as normal data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-13690) Client Scanner Initialization Reformats strings every time
[ https://issues.apache.org/jira/browse/HBASE-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-13690. --- Resolution: Later > Client Scanner Initialization Reformats strings every time > -- > > Key: HBASE-13690 > URL: https://issues.apache.org/jira/browse/HBASE-13690 > Project: HBase > Issue Type: Improvement >Reporter: John Leach >Priority: Critical > Attachments: ClientScanner_String_Format.tiff > > > The client scanner continually goes back into the conf for values... > public ClientScanner(final Configuration conf, final Scan scan, final > TableName tableName, > HConnection connection, RpcRetryingCallerFactory rpcFactory, > RpcControllerFactory controllerFactory) throws IOException { > if (LOG.isTraceEnabled()) { > LOG.trace("Scan table=" + tableName > + ", startRow=" + Bytes.toStringBinary(scan.getStartRow())); > } > this.scan = scan; > this.tableName = tableName; > this.lastNext = System.currentTimeMillis(); > this.connection = connection; > if (scan.getMaxResultSize() > 0) { > this.maxScannerResultSize = scan.getMaxResultSize(); > } else { > this.maxScannerResultSize = conf.getLong( > HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY, > HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE); > } > this.scannerTimeout = HBaseConfiguration.getInt(conf, > HConstants.HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD, > HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY, > HConstants.DEFAULT_HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD); > // check if application wants to collect scan metrics > initScanMetrics(scan); > // Use the caching from the Scan. If not set, use the default cache > setting for this table. > if (this.scan.getCaching() > 0) { > this.caching = this.scan.getCaching(); > } else { > this.caching = conf.getInt( > HConstants.HBASE_CLIENT_SCANNER_CACHING, > HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING); > } > this.caller = rpcFactory.<Result> newCaller(); > this.rpcControllerFactory = controllerFactory; > initializeScannerInConstruction(); > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
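The fix pattern the report points at is to resolve the configuration once and share it, so repeated ClientScanner construction stops re-parsing conf strings. A hedged sketch (the cache class is hypothetical) reusing the same constants as the quoted constructor:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HConstants;

// Resolve scanner-related settings once; hand this to every ClientScanner
// instead of hitting Configuration (and Properties.getProperty) per scanner.
class ScannerConfCache {
  final long maxScannerResultSize;
  final int caching;

  ScannerConfCache(Configuration conf) {
    this.maxScannerResultSize = conf.getLong(
        HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY,
        HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE);
    this.caching = conf.getInt(
        HConstants.HBASE_CLIENT_SCANNER_CACHING,
        HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING);
  }
}
{code}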
[jira] [Resolved] (HBASE-13691) HTable and RPC Code Accessing Configuration each time (Blocking)
[ https://issues.apache.org/jira/browse/HBASE-13691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-13691. --- Resolution: Later > HTable and RPC Code Accessing Configuration each time (Blocking) > > > Key: HBASE-13691 > URL: https://issues.apache.org/jira/browse/HBASE-13691 > Project: HBase > Issue Type: Improvement >Reporter: John Leach >Priority: Major > Attachments: Properties_getProperty.tiff > > > Properties.getProperty blocks under load... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-4002) Int array based skip list
[ https://issues.apache.org/jira/browse/HBASE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-4002. -- Resolution: Later > Int array based skip list > - > > Key: HBASE-4002 > URL: https://issues.apache.org/jira/browse/HBASE-4002 > Project: HBase > Issue Type: Improvement >Reporter: Jason Rutherglen >Priority: Minor > Attachments: HBASE-4002.patch, HBASE-4002.patch, HBASE-4002.patch, > HBASE-4002.patch > > > We can implement an AtomicIntegerArray based skip list, where the int values > point to locations in a byte block structure. This can be useful for testing > against ConcurrentSkipListMap. It can also be used in Lucene for the > realtime terms dictionary. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-3993) Alternatives to ConcurrentSkipListMap in MemStore
[ https://issues.apache.org/jira/browse/HBASE-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-3993. -- Resolution: Later > Alternatives to ConcurrentSkipListMap in MemStore > - > > Key: HBASE-3993 > URL: https://issues.apache.org/jira/browse/HBASE-3993 > Project: HBase > Issue Type: Improvement >Reporter: Jason Rutherglen >Priority: Minor > > This can be an umbrella issue for evaluating and testing alternatives to > java.util.concurrent.ConcurrentSkipListMap in MemStore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-3992) Evaluate Lock Free Skip Tree
[ https://issues.apache.org/jira/browse/HBASE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-3992. -- Resolution: Later Stale > Evaluate Lock Free Skip Tree > > > Key: HBASE-3992 > URL: https://issues.apache.org/jira/browse/HBASE-3992 > Project: HBase > Issue Type: Improvement >Reporter: Jason Rutherglen >Priority: Minor > > We can test out this variant of the ConcurrentSkipListMap. > "Drop-in replacement for java.util.concurrent.ConcurrentSkipList[Map|Set]" > https://github.com/mspiegel/lockfreeskiptree -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-1935) Scan in parallel
[ https://issues.apache.org/jira/browse/HBASE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-1935. -- Resolution: Later > Scan in parallel > > > Key: HBASE-1935 > URL: https://issues.apache.org/jira/browse/HBASE-1935 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Reporter: Michael Stack >Priority: Major > Attachments: 1935-idea.txt, pscanner-v2.patch, pscanner-v3.patch, > pscanner-v4.patch, pscanner.patch > > > A scanner that, rather than scanning in series, instead scanned multiple regions > in parallel would be more involved but could complete much faster, > particularly if results are sparse. -- This message was sent by Atlassian Jira (v8.3.4#803005)
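A rough sketch of the idea with today's client API; conn, tableName, and the per-region regionRanges list are assumed inputs, and merging result order across regions is left to the caller:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Pair;

public class ParallelScan {
  // Scan each region's [startKey, stopKey) range on its own thread.
  static List<Future<List<Result>>> scanInParallel(Connection conn,
      TableName tableName, List<Pair<byte[], byte[]>> regionRanges,
      ExecutorService pool) {
    List<Future<List<Result>>> parts = new ArrayList<>();
    for (Pair<byte[], byte[]> range : regionRanges) {
      parts.add(pool.submit(() -> {
        try (Table t = conn.getTable(tableName);
             ResultScanner rs = t.getScanner(new Scan()
                 .withStartRow(range.getFirst())
                 .withStopRow(range.getSecond()))) {
          List<Result> out = new ArrayList<>();
          rs.forEach(out::add);
          return out;
        }
      }));
    }
    return parts;
  }
}
{code}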
[jira] [Resolved] (HBASE-3340) Eventually Consistent Secondary Indexing via Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-3340. -- Resolution: Later > Eventually Consistent Secondary Indexing via Coprocessors > - > > Key: HBASE-3340 > URL: https://issues.apache.org/jira/browse/HBASE-3340 > Project: HBase > Issue Type: New Feature > Components: Coprocessors >Reporter: Jonathan Gray >Priority: Major > > Secondary indexing support via coprocessors with an eventual consistency > guarantee. Design to come. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-3557) Quick disable call to drop a table
[ https://issues.apache.org/jira/browse/HBASE-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-3557. -- Resolution: Later > Quick disable call to drop a table > -- > > Key: HBASE-3557 > URL: https://issues.apache.org/jira/browse/HBASE-3557 > Project: HBase > Issue Type: New Feature >Reporter: Jean-Daniel Cryans >Priority: Major > > From the mailing list, it seems that a feature that enables the disabling of > tables without having to wait for regions to flush would be quite popular. In > the case where you do rapid development, you often churn through tables and > need to drop/truncate them often. This is often true after a big MR job when > the data isn't right. > As a solution, I'm thinking of a flag that you can pass to the disable > command to jettison the memstores like we already have when doing an > emergency shutdown. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-2934) Improve Example API Usage (in package-summary.html of org.apache.hadoop.hbase.client package)
[ https://issues.apache.org/jira/browse/HBASE-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-2934. -- Resolution: Later > Improve Example API Usage (in package-summary.html of > org.apache.hadoop.hbase.client package) > - > > Key: HBASE-2934 > URL: https://issues.apache.org/jira/browse/HBASE-2934 > Project: HBase > Issue Type: Improvement > Components: documentation >Affects Versions: 0.20.5 >Reporter: Matthias Wessendorf >Priority: Major > Attachments: simple_doc.patch, simple_doc_with_disable_delete.patch > > > Going to > http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/package-summary.html > > and simply cut-pasting the example into Eclipse, I got an exception that the > suggested table is missing. > Adding a few lines does create it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-2739) Master should fail to start if it cannot successfully split logs
[ https://issues.apache.org/jira/browse/HBASE-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-2739. -- Resolution: Later > Master should fail to start if it cannot successfully split logs > > > Key: HBASE-2739 > URL: https://issues.apache.org/jira/browse/HBASE-2739 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.20.4, 0.90.0 >Reporter: Todd Lipcon >Priority: Critical > > In trunk, in splitLogAfterStartup(), we log the error splitting, but don't > shut down. Depending on configuration, we should probably shut down here > rather than continue with data loss. > In 0.20, we print the stacktrace to stdout in verifyClusterState, but > continue through and often fail to start up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-2685) Add activeMaster field to AClusterStatus record in Avro interface
[ https://issues.apache.org/jira/browse/HBASE-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-2685. -- Resolution: Later > Add activeMaster field to AClusterStatus record in Avro interface > - > > Key: HBASE-2685 > URL: https://issues.apache.org/jira/browse/HBASE-2685 > Project: HBase > Issue Type: Improvement >Reporter: Jeff Hammerbacher >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-2690) Add support for setCacheBlocks() (regionserver level caching) and setCaching() (connector level caching) to scan operation in Avro interface
[ https://issues.apache.org/jira/browse/HBASE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-2690. -- Resolution: Later > Add support for setCacheBlocks() (regionserver level caching) and > setCaching() (connector level caching) to scan operation in Avro interface > > > Key: HBASE-2690 > URL: https://issues.apache.org/jira/browse/HBASE-2690 > Project: HBase > Issue Type: Improvement >Reporter: Jeff Hammerbacher >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-2688) Implement user attribute get and set for Tables and Families in Avro interface
[ https://issues.apache.org/jira/browse/HBASE-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-2688. -- Resolution: Later > Implement user attribute get and set for Tables and Families in Avro > interface > --- > > Key: HBASE-2688 > URL: https://issues.apache.org/jira/browse/HBASE-2688 > Project: HBase > Issue Type: Improvement >Reporter: Jeff Hammerbacher >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-2687) Implement filters for get and scan operations in Avro interface
[ https://issues.apache.org/jira/browse/HBASE-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-2687. -- Resolution: Later > Implement filters for get and scan operations in Avro interface > --- > > Key: HBASE-2687 > URL: https://issues.apache.org/jira/browse/HBASE-2687 > Project: HBase > Issue Type: Improvement >Reporter: Jeff Hammerbacher >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-2484) refactor package names from o.a.h.h to o.a.hbase
[ https://issues.apache.org/jira/browse/HBASE-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-2484. -- Resolution: Later > refactor package names from o.a.h.h to o.a.hbase > > > Key: HBASE-2484 > URL: https://issues.apache.org/jira/browse/HBASE-2484 > Project: HBase > Issue Type: Task >Reporter: Karthik K >Priority: Major > Fix For: 3.0.0-alpha-1 > > > After becoming a TLP, it makes sense to refactor to o.a.hbase instead of > o.a.h.hbase as it exists now. > There is a consensus among the team, but concerns about the migration effects > on the end-user remain. Placeholder ticket for the refactoring + opinions on > the migration cost of the API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-2218) MiniZooKeeperCluster - to be refactored and moved upstream to zk
[ https://issues.apache.org/jira/browse/HBASE-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-2218. -- Resolution: Later > MiniZooKeeperCluster - to be refactored and moved upstream to zk > - > > Key: HBASE-2218 > URL: https://issues.apache.org/jira/browse/HBASE-2218 > Project: HBase > Issue Type: Improvement >Reporter: Karthik K >Priority: Major > > As rightly mentioned in the comments - MiniZooKeeperCluster should be > refactored and moved up to the ZK tree as appropriate and reused as > necessary. > Marked as an improvement to remember the task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-1346) Split column names using a delimiter other than space for TableInputFormat
[ https://issues.apache.org/jira/browse/HBASE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-1346. -- Resolution: Later > Split column names using a delimiter other than space for TableInputFormat > --- > > Key: HBASE-1346 > URL: https://issues.apache.org/jira/browse/HBASE-1346 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0 >Reporter: Justin Becker >Priority: Major > Labels: beginner > > Split column names using a delimiter other than space for TableInputFormat. > The configure(JobConf) method currently splits column names by the space > character. This prevents scanning by columns where the qualifier contains a > space. For example, "myColumn:some key". To be consistent with the shell, > maybe allow the following syntax "['myColumn:some key','myOtherColumn:key']" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-1605) TableInputFormat should support 'limit'
[ https://issues.apache.org/jira/browse/HBASE-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-1605. -- Resolution: Later > TableInputFormat should support 'limit' > --- > > Key: HBASE-1605 > URL: https://issues.apache.org/jira/browse/HBASE-1605 > Project: HBase > Issue Type: Improvement >Reporter: Chris Wensel >Priority: Major > Labels: beginner > > Would be useful if TableInputFormat could be passed a 'limit' property value > that limited the total result set to the value of 'limit'. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-1199) TableMapReduceUtil.initTableReduceJob smart reduce assigned to server hosting region
[ https://issues.apache.org/jira/browse/HBASE-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-1199. -- Resolution: Later > TableMapReduceUtil.initTableReduceJob smart reduce assigned to server hosting > region > > > Key: HBASE-1199 > URL: https://issues.apache.org/jira/browse/HBASE-1199 > Project: HBase > Issue Type: Improvement >Reporter: Billy Pearson >Priority: Major > > We should be able to add this feature when HADOOP-589 is done. > This will allow for lower network usage when writing to the regions, > which will overall improve speed when you scale beyond a rack and the > interlinks between switches are limited. > Have to consider the partitioner in all of this; maybe the partitioner can hint > the reduce_task->region->server mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25165) Change 'State time' in UI so sorts
[ https://issues.apache.org/jira/browse/HBASE-25165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209925#comment-17209925 ] Michael Stack commented on HBASE-25165: --- Here is how 'start time' currently displays !Screen Shot 2020-10-07 at 4.15.32 PM.png! The proposal is to change it to this: !Screen Shot 2020-10-07 at 4.15.42 PM.png! > Change 'State time' in UI so sorts > -- > > Key: HBASE-25165 > URL: https://issues.apache.org/jira/browse/HBASE-25165 > Project: HBase > Issue Type: Bug > Components: UI >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Attachments: Screen Shot 2020-10-07 at 4.15.32 PM.png, Screen Shot > 2020-10-07 at 4.15.42 PM.png > > > Here is a minor issue. > I had an issue w/ crashing servers. The servers were auto-restarted on crash. > To find the crashing servers, I was sorting on the 'Start time' column in the > Master UI. This basically worked but the sort is unreliable as the date we > display starts with days-of-the-week. > This issue is about moving to display start time in iso8601 which is sortable > (and occupies less real estate). Let me add some images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
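For context on why ISO 8601 fixes the sort: strings like "Wed Oct 07 16:15:32 PDT 2020" order by day-of-week name, while ISO 8601 strings order lexicographically the same way the underlying instants order. A minimal sketch using java.time (not the actual UI patch):

{code:java}
import java.time.Instant;

public class SortableStartTime {
  public static void main(String[] args) {
    // Example RegionServer start time in epoch millis (made-up value).
    long startcode = 1602112532000L;
    // Instant#toString() emits ISO 8601 in UTC, e.g. 2020-10-07T23:15:32Z.
    // Such strings sort lexicographically in time order, unlike the default
    // java.util.Date rendering that leads with the day-of-week name.
    System.out.println(Instant.ofEpochMilli(startcode));
  }
}
{code}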
[jira] [Updated] (HBASE-25165) Change 'State time' in UI so sorts
[ https://issues.apache.org/jira/browse/HBASE-25165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-25165: -- Attachment: Screen Shot 2020-10-07 at 4.15.32 PM.png > Change 'State time' in UI so sorts > -- > > Key: HBASE-25165 > URL: https://issues.apache.org/jira/browse/HBASE-25165 > Project: HBase > Issue Type: Bug > Components: UI >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Attachments: Screen Shot 2020-10-07 at 4.15.32 PM.png, Screen Shot > 2020-10-07 at 4.15.42 PM.png > > > Here is a minor issue. > I had an issue w/ crashing servers. The servers were auto-restarted on crash. > To find the crashing servers, I was sorting on the 'Start time' column in the > Master UI. This basically worked but the sort is unreliable as the date we > display starts with days-of-the-week. > This issue is about moving to display start time in iso8601 which is sortable > (and occupies less real estate). Let me add some images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25165) Change 'State time' in UI so sorts
[ https://issues.apache.org/jira/browse/HBASE-25165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-25165: -- Attachment: Screen Shot 2020-10-07 at 4.15.42 PM.png > Change 'State time' in UI so sorts > -- > > Key: HBASE-25165 > URL: https://issues.apache.org/jira/browse/HBASE-25165 > Project: HBase > Issue Type: Bug > Components: UI >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Attachments: Screen Shot 2020-10-07 at 4.15.42 PM.png > > > Here is a minor issue. > I had an issue w/ crashing servers. The servers were auto-restarted on crash. > To find the crashing servers, I was sorting on the 'Start time' column in the > Master UI. This basically worked but the sort is unreliable as the date we > display starts with days-of-the-week. > This issue is about moving to display start time in iso8601 which is sortable > (and occupies less real estate). Let me add some images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25165) Change 'State time' in UI so sorts
Michael Stack created HBASE-25165: - Summary: Change 'State time' in UI so sorts Key: HBASE-25165 URL: https://issues.apache.org/jira/browse/HBASE-25165 Project: HBase Issue Type: Bug Components: UI Reporter: Michael Stack Assignee: Michael Stack Here is a minor issue. I had an issue w/ crashing servers. The servers were auto-restarted on crash. To find the crashing servers, I was sorting on the 'Start time' column in the Master UI. This basically worked but the sort is unreliable as the date we display starts with days-of-the-week. This issue is about moving to display start time in iso8601 which is sortable (and occupies less real estate). Let me add some images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25151) warmupRegion frustrates registering WALs on the catalog replicationsource
[ https://issues.apache.org/jira/browse/HBASE-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25151. --- Hadoop Flags: Reviewed Assignee: Michael Stack Resolution: Fixed Merged to branch HBASE-18070. Thanks for reviews [~huaxiangsun] and [~zhangduo] > warmupRegion frustrates registering WALs on the catalog replicationsource > - > > Key: HBASE-25151 > URL: https://issues.apache.org/jira/browse/HBASE-25151 > Project: HBase > Issue Type: Sub-task > Components: read replicas >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: HBASE-18070 > > > Writing a test for HBASE-25145, > I noticed that the warmupRegion call triggered by the Master on Region move > messes up registering the hbase:meta ReplicationSource. Add accommodation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-22976: -- Release Note: WALPlayer can replay the content of recovered.edits directories. Side-effect is that the WAL filename timestamp is now factored in when setting start/end times for WALInputFormat; i.e. the wal.start.time and wal.end.time values on a job context. Previously we looked at wal.end.time only. Now we consider wal.start.time too. If a file has a name outside of wal.start.time<->wal.end.time, it'll be by-passed. This change in behavior will make it easier on operators crafting timestamp filters when processing WALs. was:WALPlayer can replay the content of recovered.edits directories. > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2, walplayer >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0 > > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
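To make the new filtering rule in the release note concrete, here is a rough sketch of the described behavior: a WAL whose file name carries a timestamp outside the wal.start.time<->wal.end.time window is bypassed, and a name without a parseable timestamp is kept. This is an illustration of the stated rule only, not the WALInputFormat code; the name format is assumed to end in a millisecond timestamp.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class WalTimeWindowFilter {
  // Pull the trailing ".<millis>" off a WAL file name; -1 if absent/unparseable.
  static long timestampOf(String walName) {
    int dot = walName.lastIndexOf('.');
    if (dot < 0) {
      return -1;
    }
    try {
      return Long.parseLong(walName.substring(dot + 1));
    } catch (NumberFormatException e) {
      return -1;
    }
  }

  // Keep files inside [startTime, endTime]; keep files whose name has no
  // timestamp rather than erroring out.
  static List<String> filter(List<String> walNames, long startTime, long endTime) {
    return walNames.stream()
        .filter(n -> {
          long ts = timestampOf(n);
          return ts < 0 || (ts >= startTime && ts <= endTime);
        })
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> wals =
        Arrays.asList("rs1%2C16020.1601337600000", "rs1%2C16020.1601424000000");
    // Only the second file falls inside the window.
    System.out.println(filter(wals, 1601400000000L, 1601500000000L));
  }
}
{code}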
[jira] [Commented] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209241#comment-17209241 ] Michael Stack commented on HBASE-22976: --- [~mbz] The attached PR makes it so WALPlayer can be passed a 'recovered.edits' directory. It will then pick up the content of this directory and replay all edits into the running cluster. It does not delete the recovered.edits directory when done. What version of hbase are you running [~mbz] ? > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2 >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0 > > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
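A sketch of driving WALPlayer at a recovered.edits directory from Java, matching the usage the comment describes; the HDFS path and table name below are made-up examples. The same thing can be done from the command line by passing the directory and table to the WALPlayer tool.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.WALPlayer;
import org.apache.hadoop.util.ToolRunner;

public class ReplayRecoveredEdits {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Point WALPlayer at the recovered.edits dir of a region (example path)
    // and name the table the edits should be replayed into.
    int exit = ToolRunner.run(conf, new WALPlayer(), new String[] {
        "hdfs:///hbase/data/default/t1/0123456789abcdef0123456789abcdef/recovered.edits",
        "t1" });
    System.exit(exit);
  }
}
{code}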
[jira] [Updated] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-22976: -- Component/s: walplayer > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2, walplayer >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0 > > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-22976: -- Fix Version/s: 2.4.0 2.3.3 3.0.0-alpha-1 > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2 >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0 > > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209240#comment-17209240 ] Michael Stack commented on HBASE-22976: --- The 'Orphan Regions' adoption service would use this new facility to replay the content of the recovered.edits directory of an Orphan Region – if one – and when done, it would delete the recovered.edits directory. > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2 >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack reassigned HBASE-22976: - Release Note: WALPlayer can replay the content of recovered.edits directories. Assignee: Michael Stack > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2 >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25159) [hbck2] Add an 'adoption service' for 'Orphaned Regions'
Michael Stack created HBASE-25159: - Summary: [hbck2] Add an 'adoption service' for 'Orphaned Regions' Key: HBASE-25159 URL: https://issues.apache.org/jira/browse/HBASE-25159 Project: HBase Issue Type: Bug Components: hbck2 Reporter: Michael Stack The 'HBCK Report' has a section for 'Orphaned Regions', regions in the filesystem that are no longer referenced by the running hbase. They should have been cleaned up as part of normal processing but for whatever reason, they were not. Usually these are desiccated directories with nothing in them but sometimes they might have an hfile or two. They could have content in the recovered.edits directory too. The "HBCK Report" page outlines how to run the bulk load tool. This will pick up any hfiles in the 'Orphan Region' if there is worry that they have been dropped mistakenly. For the content under 'recovered.edits', the WALPlayer has just been adjusted so it can pick up this content (See over in HBASE-22976). The 'adoption service' would be run over an orphan region and it would apply the 'bulk load' if hfiles found and the WALPlayer if 'recovered.edits' found... it would then clean up the region directory on successful load, after leaving an audit record that the 'orphan' was cleaned up. The hbck2 tool would run the adoption service at first. Once we had some experience and confidence that the adoption service was running smoothly, we'd consider integrating it into the catalogjanitor. The 'adoption service' first gets a mention in the body of HBASE-21745 -- This message was sent by Atlassian Jira (v8.3.4#803005)
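The adoption sequence the issue sketches, in rough code form. This is not the hbck2 implementation: BulkLoadHFiles is the bulk-load entry point in recent HBase 2 releases, the paths are invented, real code would have to skip non-family entries such as .regioninfo when bulk loading, and renaming the region dir is just one way to leave an audit trail.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.mapreduce.WALPlayer;
import org.apache.hadoop.hbase.tool.BulkLoadHFiles;
import org.apache.hadoop.util.ToolRunner;

public class OrphanRegionAdopter {
  public static void adopt(Configuration conf, TableName table, Path orphanRegionDir)
      throws Exception {
    FileSystem fs = orphanRegionDir.getFileSystem(conf);
    // 1. Bulk load whatever hfiles sit under the orphan region's family dirs.
    BulkLoadHFiles.create(conf).bulkLoad(table, orphanRegionDir);
    // 2. Replay recovered.edits content, if any, via WALPlayer (HBASE-22976).
    Path recoveredEdits = new Path(orphanRegionDir, "recovered.edits");
    if (fs.exists(recoveredEdits)) {
      ToolRunner.run(conf, new WALPlayer(),
          new String[] { recoveredEdits.toString(), table.getNameAsString() });
    }
    // 3. On success, move the region dir aside rather than delete it, so there
    // is a record that the orphan was adopted.
    fs.rename(orphanRegionDir,
        new Path(orphanRegionDir.getParent(), orphanRegionDir.getName() + ".adopted"));
  }
}
{code}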
[jira] [Resolved] (HBASE-23951) Avoid high speed recursion trap in AsyncRequestFutureImpl.
[ https://issues.apache.org/jira/browse/HBASE-23951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-23951. --- Resolution: Incomplete > Avoid high speed recursion trap in AsyncRequestFutureImpl. > -- > > Key: HBASE-23951 > URL: https://issues.apache.org/jira/browse/HBASE-23951 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Mark Robert Miller >Priority: Minor > > While working on branch-2, I ran into an issue where a retryable error kept > occurring and code in AsyncRequestFutureImpl would reduce the backoff wait to > 0 and extremely rapidly eat up a lot of thread stack space with recursive retry > calls. This little patch stops the backoff-wait kill after 3 retries. That number was chosen > kind of arbitrarily; perhaps 5 is the right number, but I find large retry > counts tend to hide things, and that has made me default to being fairly > conservative in all my arbitrary number picking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
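A generic illustration of the guard being described, independent of the AsyncRequestFutureImpl internals: let the first few retries honor a zero backoff (e.g. when the server asks for an immediate retry), then enforce a floor so a persistently retryable error cannot spin through stack frames. The constants are arbitrary, as the reporter notes of the patch's own numbers.

{code:java}
public class BackoffFloor {
  static final int FREE_RETRIES = 3;    // retries allowed to use the raw backoff
  static final long MIN_PAUSE_MS = 100; // floor applied after that

  static long pauseFor(int tries, long computedPauseMs) {
    if (tries <= FREE_RETRIES) {
      return computedPauseMs; // may legitimately be 0 early on
    }
    return Math.max(computedPauseMs, MIN_PAUSE_MS);
  }

  public static void main(String[] args) {
    for (int t = 1; t <= 6; t++) {
      // A backoff computed as 0 stops being honored after the third try.
      System.out.println("try " + t + " -> sleep " + pauseFor(t, 0) + "ms");
    }
  }
}
{code}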
[jira] [Resolved] (HBASE-24447) Contribute a Test class that shows some examples for using the Async Client API
[ https://issues.apache.org/jira/browse/HBASE-24447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24447. --- Resolution: Incomplete > Contribute a Test class that shows some examples for using the Async Client > API > --- > > Key: HBASE-24447 > URL: https://issues.apache.org/jira/browse/HBASE-24447 > Project: HBase > Issue Type: Test > Components: test >Reporter: Mark Robert Miller >Priority: Minor > > Similar to > [https://github.com/apache/hbase/blob/master/hbase-examples/src/main/java/org/apache/hadoop/hbase/client/example/AsyncClientExample.java] > but done initially in the form of a test, to make verification and environment setup easy. > This is basically a set of examples showing how to use the CompletableFuture API with the Async Client -- given the expressiveness and size of the CompletableFuture > API, starting from scratch can be a bit painful for a newcomer, but it is much easier to build on or patch existing example code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24497) Close one off connections in RawSyncHBaseAdmin.
[ https://issues.apache.org/jira/browse/HBASE-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24497. --- Resolution: Incomplete > Close one off connections in RawSyncHBaseAdmin. > --- > > Key: HBASE-24497 > URL: https://issues.apache.org/jira/browse/HBASE-24497 > Project: HBase > Issue Type: Bug >Reporter: Mark Miller >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24498) Include jetty-schemas so that xml parsing does not need to hit sun/oracle urls.
[ https://issues.apache.org/jira/browse/HBASE-24498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24498. --- Resolution: Incomplete > Include jetty-schemas so that xml parsing does not need to hit sun/oracle > urls. > --- > > Key: HBASE-24498 > URL: https://issues.apache.org/jira/browse/HBASE-24498 > Project: HBase > Issue Type: Improvement >Reporter: Mark Miller >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24054) The Jetty's version number leak occurred while using the thrift service
[ https://issues.apache.org/jira/browse/HBASE-24054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-24054. --- Fix Version/s: 2.4.0 3.0.0-alpha-1 Hadoop Flags: Reviewed Assignee: shenshengli Resolution: Fixed Merged. Thanks for the patch [~shenshengli] > The Jetty's version number leak occurred while using the thrift service > --- > > Key: HBASE-24054 > URL: https://issues.apache.org/jira/browse/HBASE-24054 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.1 >Reporter: shenshengli >Assignee: shenshengli >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.4.0 > > > When the port is checked with curl -I host:port, the version number of Jetty > is displayed. > To be safe, Jetty's version number should be suppressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-22976: -- Attachment: 22976.txt > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2 >Reporter: Michael Stack >Priority: Major > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208428#comment-17208428 ] Michael Stack commented on HBASE-22976: --- Something like the attached patch? > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2 >Reporter: Michael Stack >Priority: Major > Attachments: 22976.txt > > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22976) [HBCK2] Add RecoveredEditsPlayer
[ https://issues.apache.org/jira/browse/HBASE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208418#comment-17208418 ] Michael Stack commented on HBASE-22976: --- hbck doesn't work properly against hbase2. Have you tried changing the code that tries to parse the file name so it is more accommodating of the filenames under recovered.edits [~mbz] ? Just skip the time compare stuff and add it to the result list. If you are unable to put your changes out on the cluster and are having trouble trying them inside MR, shout and I'll tell you how I got it to work recently for myself. > [HBCK2] Add RecoveredEditsPlayer > > > Key: HBASE-22976 > URL: https://issues.apache.org/jira/browse/HBASE-22976 > Project: HBase > Issue Type: Sub-task > Components: hbck2 >Reporter: Michael Stack >Priority: Major > > We need a recovered edits player. Messing w/ the 'adoption service' -- > tooling to adopt orphan regions and hfiles -- I've been manufacturing damaged > clusters by moving stuff around under the running cluster. No reason to think > that an hbase couldn't lose accounting of a whole region if a cataclysm. If > so, region will have stuff like the '.regioninfo', dirs per column family w/ > store files but it could too have a 'recovered_edits' directory with content > in it. We have a WALPlayer for errant WALs. We have the FSHLog tool which can > read recovered_edits content for debugging data loss. Missing is a > RecoveredEditsPlayer. > I took a look at extending the WALPlayer since it has a bunch of nice options > and it can run at bulk. Ideally, it would just digest recovered edits content > if passed an option or recovered edits directories. On first glance, it > didn't seem like an easy integration Would be worth taking a look again. > Would be good if we could avoid making a new, distinct tool, just for > Recovered Edits. > The bulkload tool expects hfiles in column family directories. Recovered > edits files are not hfiles and the files are x-columnfamily so this is not > the way to go though a bulkload-like tool that moved the recovered edits > files under the appropriate region dir and asked the region reopen would be a > possibility (Would need the bulk load complete trick of splitting input if > the region boundaries in the live cluster do not align w/ those of the errant > recovered edits files). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25144) Add Hadoop-3.3.0 to personality hadoopcheck
[ https://issues.apache.org/jira/browse/HBASE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208411#comment-17208411 ] Michael Stack commented on HBASE-25144: --- {quote}In practice, this should add maybe 5 minutes to that existing stage. {quote} Sounds good (I was thinking full test suite run too...) {quote}I think we should at least compile against all the minor versions that we claim to support. {quote} +1 > Add Hadoop-3.3.0 to personality hadoopcheck > --- > > Key: HBASE-25144 > URL: https://issues.apache.org/jira/browse/HBASE-25144 > Project: HBase > Issue Type: Task > Components: build, community >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk >Priority: Minor > Fix For: 3.0.0-alpha-1 > > > Now that Hadoop 3.3.0 is released, let's figure out where it goes in our > testing matrix. Start by adding it to precommit checks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25145) WALReader quits if nothing to replicate (and won't restart)
[ https://issues.apache.org/jira/browse/HBASE-25145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25145. --- Resolution: Not A Problem Resolving as 'Not A Problem' given test shows the Reader thread stays up. > WALReader quits if nothing to replicate (and won't restart) > --- > > Key: HBASE-25145 > URL: https://issues.apache.org/jira/browse/HBASE-25145 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Priority: Major > > Noticed by [~huaxiangsun] in his review of HBASE-25055 "Add ReplicationSource > for meta WALs; add enable/disable w…" > {quote}bq. Eventually, the meta wal file will be gced and there is no more > logs in the queue. In that case, the walReader thread will quit. When the > meta region is moved back, it does not seem that walReader thread will be > restarted. So it seems that something is broken. > {quote} > This issue is about writing a test to run the above scenario and fix any > probs. found. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25048) [HBCK2] Bypassed parent procedures are not updated in store
[ https://issues.apache.org/jira/browse/HBASE-25048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25048. --- Fix Version/s: 2.4.0 2.3.3 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed Merged to branch-2.3+ Thanks for the patch [~Joseph295] > [HBCK2] Bypassed parent procedures are not updated in store > --- > > Key: HBASE-25048 > URL: https://issues.apache.org/jira/browse/HBASE-25048 > Project: HBase > Issue Type: Bug > Components: hbck2, proc-v2 >Reporter: Yi Mei >Assignee: Junhong Xu >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0 > > > See code in > [ProcedureExecutor|https://github.com/apache/hbase/blob/master/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java#L980]: > {code:java} > Procedure current = procedure; > while (current != null) { > LOG.debug("Bypassing {}", current); > current.bypass(getEnvironment()); > store.update(procedure); // update current procedure > long parentID = current.getParentProcId(); > current = getProcedure(parentID); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
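The quoted loop walks up the parent chain bypassing each procedure in turn ('current') but persists only the original child ('procedure') on every iteration, which is the mismatch the summary points at. Presumably the fix is to update the procedure just bypassed; a sketch of that one-line change, not necessarily the committed patch:

{code:java}
Procedure current = procedure;
while (current != null) {
  LOG.debug("Bypassing {}", current);
  current.bypass(getEnvironment());
  store.update(current); // persist the procedure we just bypassed, parents included
  long parentID = current.getParentProcId();
  current = getProcedure(parentID);
}
{code}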
[jira] [Commented] (HBASE-25145) WALReader quits if nothing to replicate (and won't restart)
[ https://issues.apache.org/jira/browse/HBASE-25145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206647#comment-17206647 ] Michael Stack commented on HBASE-25145: --- I wrote a test and it doesn't look like WALReader thread quits. It stays up as long as there is a WAL around – as per [~zhangduo] above – which will be the case given how HBASE-25055 is implemented. Let me try some more. Shout if you have a pointer on what to look at [~huaxiangsun] - thanks. (The test did find a good issue – see HBASE-25151) > WALReader quits if nothing to replicate (and won't restart) > --- > > Key: HBASE-25145 > URL: https://issues.apache.org/jira/browse/HBASE-25145 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Priority: Major > > Noticed by [~huaxiangsun] in his review of HBASE-25055 "Add ReplicationSource > for meta WALs; add enable/disable w…" > {quote}bq. Eventually, the meta wal file will be gced and there is no more > logs in the queue. In that case, the walReader thread will quit. When the > meta region is moved back, it does not seem that walReader thread will be > restarted. So it seems that something is broken. > {quote} > This issue is about writing a test to run the above scenario and fix any > probs. found. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25151) warmupRegion frustrates registering WALs on the catalog replicationsource
Michael Stack created HBASE-25151: - Summary: warmupRegion frustrates registering WALs on the catalog replicationsource Key: HBASE-25151 URL: https://issues.apache.org/jira/browse/HBASE-25151 Project: HBase Issue Type: Sub-task Components: read replicas Reporter: Michael Stack Fix For: HBASE-18070 Writing a test for HBASE-25145, I noticed that the warmupRegion call triggered by the Master on Region move messes up registering the hbase:meta ReplicationSource. Add accommodation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25055) Add ReplicationSource for meta WALs; add enable/disable when hbase:meta assigned to RS
[ https://issues.apache.org/jira/browse/HBASE-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25055. --- Hadoop Flags: Reviewed Release Note: Set hbase.region.replica.replication.catalog.enabled to enable async WAL Replication for hbase:meta region replicas. It's off by default. Defaults to the RegionReadReplicaEndpoint.class shipping edits -- set hbase.region.replica.catalog.replication to target a different endpoint implementation. Resolution: Fixed Merged to feature branch. Thanks for reviews [~huaxiangsun] and [~zhangduo] > Add ReplicationSource for meta WALs; add enable/disable when hbase:meta > assigned to RS > -- > > Key: HBASE-25055 > URL: https://issues.apache.org/jira/browse/HBASE-25055 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: HBASE-18070 > > > Add ReplicationSource that feeds on hbase:meta WAL files. Add enabling this > source when hbase:meta is opened and hbase:meta region replicas are > configured ON. Disable the source when the hbase:meta Region moves away. -- This message was sent by Atlassian Jira (v8.3.4#803005)
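A minimal sketch of flipping on the feature named in the release note. In a deployment the property would normally be set in hbase-site.xml; it is shown programmatically here only to make the key concrete.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class EnableCatalogReplicaReplication {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Off by default, per the release note above.
    conf.setBoolean("hbase.region.replica.replication.catalog.enabled", true);
    System.out.println(
        conf.getBoolean("hbase.region.replica.replication.catalog.enabled", false));
  }
}
{code}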
[jira] [Updated] (HBASE-25055) Add ReplicationSource for meta WALs; add enable/disable when hbase:meta assigned to RS
[ https://issues.apache.org/jira/browse/HBASE-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-25055: -- Fix Version/s: (was: 2.4.0) (was: 3.0.0-alpha-1) HBASE-18070 > Add ReplicationSource for meta WALs; add enable/disable when hbase:meta > assigned to RS > -- > > Key: HBASE-25055 > URL: https://issues.apache.org/jira/browse/HBASE-25055 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: HBASE-18070 > > > Add ReplicationSource that feeds on hbase:meta WAL files. Add enabling this > source when hbase:meta is opened and hbase:meta region replicas are > configured ON. Disable the source when the hbase:meta Region moves away. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25055) Add ReplicationSource for meta WALs; add enable/disable when hbase:meta assigned to RS
[ https://issues.apache.org/jira/browse/HBASE-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-25055: -- Fix Version/s: 3.0.0-alpha-1 > Add ReplicationSource for meta WALs; add enable/disable when hbase:meta > assigned to RS > -- > > Key: HBASE-25055 > URL: https://issues.apache.org/jira/browse/HBASE-25055 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > Add ReplicationSource that feeds on hbase:meta WAL files. Add enabling this > source when hbase:meta is opened and hbase:meta region replicas are > configured ON. Disable the source when the hbase:meta Region moves away. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25145) WALReader quits if nothing to replicate (and won't restart)
[ https://issues.apache.org/jira/browse/HBASE-25145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206356#comment-17206356 ] Michael Stack commented on HBASE-25145: --- Did you comment here before review of HBASE-25055 or after? A test of what happens when meta moves back after a lag seems like it would be good to have. You suggest there will be nothing to fix. Huaxiang thinks we might have to revive WALReader if it auto-closes. Let me see. > WALReader quits if nothing to replicate (and won't restart) > --- > > Key: HBASE-25145 > URL: https://issues.apache.org/jira/browse/HBASE-25145 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Priority: Major > > Noticed by [~huaxiangsun] in his review of HBASE-25055 "Add ReplicationSource > for meta WALs; add enable/disable w…" > {quote}bq. Eventually, the meta wal file will be gced and there is no more > logs in the queue. In that case, the walReader thread will quit. When the > meta region is moved back, it does not seem that walReader thread will be > restarted. So it seems that something is broken. > {quote} > This issue is about writing a test to run the above scenario and fix any > probs. found. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25145) WALReader quits if nothing to replicate (and won't restart)
Michael Stack created HBASE-25145: - Summary: WALReader quits if nothing to replicate (and won't restart) Key: HBASE-25145 URL: https://issues.apache.org/jira/browse/HBASE-25145 Project: HBase Issue Type: Sub-task Reporter: Michael Stack Noticed by [~huaxiangsun] in his review of HBASE-25055 "Add ReplicationSource for meta WALs; add enable/disable w…" {quote}bq. Eventually, the meta wal file will be gced and there is no more logs in the queue. In that case, the walReader thread will quit. When the meta region is moved back, it does not seem that walReader thread will be restarted. So it seems that something is broken. {quote} This issue is about writing a test to run the above scenario and fix any probs. found. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25144) Add Hadoop-3.3.0 to personality hadoopcheck
[ https://issues.apache.org/jira/browse/HBASE-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205934#comment-17205934 ] Michael Stack commented on HBASE-25144: --- What'll it do to our nightlies? You think we have to build against three hadoop versions on master? > Add Hadoop-3.3.0 to personality hadoopcheck > --- > > Key: HBASE-25144 > URL: https://issues.apache.org/jira/browse/HBASE-25144 > Project: HBase > Issue Type: Task > Components: build, community >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk >Priority: Minor > Fix For: 3.0.0-alpha-1 > > > Now that Hadoop 3.3.0 is released, let's figure out where it goes in our > testing matrix. Start by adding it to precommit checks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25142) Auto-fix 'Unknown Server'
[ https://issues.apache.org/jira/browse/HBASE-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205766#comment-17205766 ] Michael Stack commented on HBASE-25142: --- At a minimum, we might add to 'hbck2 fixMeta' the scheduling of SCPs for all servers in 'Unknown Servers' list. > Auto-fix 'Unknown Server' > - > > Key: HBASE-25142 > URL: https://issues.apache.org/jira/browse/HBASE-25142 > Project: HBase > Issue Type: Improvement >Reporter: Michael Stack >Priority: Major > > Addressing reports of 'Unknown Server' has come up in various conversations > lately. This issue is about fixing instances of 'Unknown Server' > automatically as part of the tasks undertaken by CatalogJanitor when it runs. > First though, would like to figure a definition for 'Unknown Server' and a > list of ways in which they arise. We need this to figure how to do safe > auto-fixing. > Currently an 'Unknown Server' is a server found in hbase:meta that is not > online (no recent heartbeat) and that is not mentioned in the dead servers > list. > In outline, I'd think CatalogJanitor could schedule an expiration of the RS > znode in zk (if exists) and then an SCP if it finds an 'Unknown Server'. > Perhaps it waits for 2x or 10x the heartbeat interval just-in-case (or not). > The SCP would clean up any references in hbase:meta by reassigning Regions > assigned the 'Unknown Server' after replaying any WALs found in hdfs > attributed to the dead server. > As to how they arise: > * A contrived illustration would be a large online cluster that crashes with > a massive backlog of WAL files – zk went down for some reason, say. The replay > of the WALs looks like it could take a very long time (let's say the cluster > was badly configured and a bug and misconfig made it so each RS was carrying > hundreds of WALs and there are hundreds of servers). To get the service back > online, the procedure store and WALs are moved aside (for later replay with > WALPlayer). The cluster comes up. meta is onlined but refers to server > instances that are no longer around. Can schedule an SCP per server mentioned > in the 'HBCK Report' by scraping and scripting hbck2 or, better, > catalogjanitor could just do it. > * HBASE-24286 HMaster won't become healthy after cloning... describes > starting a cluster over data that is hfile-content only. In this case the > original servers used to manufacture the hfile cluster data are long dead yet > meta still refers to the old servers. They will not make the 'dead servers' > list. > Let this issue stew awhile. Meantime collect how 'Unknown Server' gets > created and the best way to fix. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25142) Auto-fix 'Unknown Server'
Michael Stack created HBASE-25142: - Summary: Auto-fix 'Unknown Server' Key: HBASE-25142 URL: https://issues.apache.org/jira/browse/HBASE-25142 Project: HBase Issue Type: Improvement Reporter: Michael Stack Addressing reports of 'Unknown Server' has come up in various conversations lately. This issue is about fixing instances of 'Unknown Server' automatically as part of the tasks undertaken by CatalogJanitor when it runs. First though, would like to figure a definition for 'Unknown Server' and a list of ways in which they arise. We need this to figure how to do safe auto-fixing. Currently an 'Unknown Server' is a server found in hbase:meta that is not online (no recent heartbeat) and that is not mentioned in the dead servers list. In outline, I'd think CatalogJanitor could schedule an expiration of the RS znode in zk (if exists) and then an SCP if it finds an 'Unknown Server'. Perhaps it waits for 2x or 10x the heartbeat interval just-in-case (or not). The SCP would clean up any references in hbase:meta by reassigning Regions assigned the 'Unknown Server' after replaying any WALs found in hdfs attributed to the dead server. As to how they arise: * A contrived illustration would be a large online cluster that crashes with a massive backlog of WAL files – zk went down for some reason, say. The replay of the WALs looks like it could take a very long time (let's say the cluster was badly configured and a bug and misconfig made it so each RS was carrying hundreds of WALs and there are hundreds of servers). To get the service back online, the procedure store and WALs are moved aside (for later replay with WALPlayer). The cluster comes up. meta is onlined but refers to server instances that are no longer around. Can schedule an SCP per server mentioned in the 'HBCK Report' by scraping and scripting hbck2 or, better, catalogjanitor could just do it. * HBASE-24286 HMaster won't become healthy after cloning... describes starting a cluster over data that is hfile-content only. In this case the original servers used to manufacture the hfile cluster data are long dead yet meta still refers to the old servers. They will not make the 'dead servers' list. Let this issue stew awhile. Meantime collect how 'Unknown Server' gets created and the best way to fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
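The working definition above reduces to a simple predicate. The sketch below is only that definition in code; in the real system the inputs would come from a scan of hbase:meta, the online-servers map, and the dead servers list, and the naming here is hypothetical.

{code:java}
import java.util.Set;

public class UnknownServerCheck {
  // A server referenced from hbase:meta is 'unknown' when it is neither online
  // (no recent heartbeat) nor mentioned in the dead servers list.
  static boolean isUnknown(String serverName, Set<String> onlineServers,
      Set<String> deadServers) {
    return !onlineServers.contains(serverName) && !deadServers.contains(serverName);
  }
}
{code}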
[jira] [Resolved] (HBASE-25091) Move LogComparator from ReplicationSource to AbstractFSWALProvider#.WALsStartTimeComparator
[ https://issues.apache.org/jira/browse/HBASE-25091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25091. --- Fix Version/s: 2.4.0 3.0.0-alpha-1 Hadoop Flags: Reviewed Assignee: Michael Stack Resolution: Fixed Merged. Thanks for reviews [~zhangduo] and [~zghao] > Move LogComparator from ReplicationSource to > AbstractFSWALProvider#.WALsStartTimeComparator > --- > > Key: HBASE-25091 > URL: https://issues.apache.org/jira/browse/HBASE-25091 > Project: HBase > Issue Type: Improvement >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.4.0 > > > Minor cleanup item noticed playing over in HBASE-18070. > ReplicationSource has an inner class named LogComparator, which is a pretty > generic name for a comparator that only compares on WAL start time and > nothing else. > Also, messing in HBASE-18070 I ran into compares that included user-space > WALs and hbase:meta WALs. The LogComparator as-is barfed on meta WALs. > This ticket moves the comparator to AbstractFSWALProvider, where folks will > go looking if they need WAL comparators, and it also renames it to more > clearly explain what it does (and makes it so it can compare start times even > if it is a meta WAL). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
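In the spirit of the renamed comparator: order WAL files by the start time carried in the file name, tolerating the extra suffix that meta WALs carry rather than barfing on it. The name-parsing details below are illustrative, not the AbstractFSWALProvider code.

{code:java}
import java.util.Comparator;

public class WalsStartTimeComparatorSketch implements Comparator<String> {
  // Strip a trailing ".meta" (meta WALs), then parse the time suffix.
  static long startTime(String walName) {
    String n = walName.endsWith(".meta")
        ? walName.substring(0, walName.length() - ".meta".length())
        : walName;
    return Long.parseLong(n.substring(n.lastIndexOf('.') + 1));
  }

  @Override
  public int compare(String a, String b) {
    return Long.compare(startTime(a), startTime(b));
  }

  public static void main(String[] args) {
    // Negative: the meta WAL has the earlier start time.
    System.out.println(new WalsStartTimeComparatorSketch()
        .compare("rs%2C16020.1601337600000.meta", "rs%2C16020.1601424000000"));
  }
}
{code}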
[jira] [Commented] (HBASE-24395) ServerName#getHostname() is case sensitive
[ https://issues.apache.org/jira/browse/HBASE-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204998#comment-17204998 ] Michael Stack commented on HBASE-24395: --- [~Bo Cui] what do you think of the suggestion by [~pankajkumar] that "You can backport HBASE-20589 changes." <= Seems good to me. > ServerName#getHostname() is case sensitive > -- > > Key: HBASE-24395 > URL: https://issues.apache.org/jira/browse/HBASE-24395 > Project: HBase > Issue Type: Sub-task > Components: Balancer >Affects Versions: 1.3.1 >Reporter: Bo Cui >Priority: Major > Attachments: HBase-24395.patch, image-2020-05-18-17-42-57-119.png > > > In the ServerName class, the getServerName(String hostName, int port, long > startcode), equals, and compareTo methods are case-insensitive, but getHostname() is > case-sensitive. > If hostName is HOSTNAME1, the ServerName is hostname1,1,1589615319931, and > getHostname() returns HOSTNAME1. > BaseLoadBalancer#retainAssignment() then uses ServerName#getHostname(); > all keys of serversByHostname are > upper case (HOSTNAME1,HOSTNAME2,HOSTNAME3,HOSTNAME4...) from > ServerManager#createDestinationServersList, but oldServerName.getHostname() > is lower case (hostname1,hostname2,hostname3...) from the WAL dir. > !image-2020-05-18-17-42-57-119.png! > And finally... all regions of the old ServerName will be assigned to random hosts -- This message was sent by Atlassian Jira (v8.3.4#803005)
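A tiny demonstration of the mismatch the report describes, plus the usual remedy of normalizing hostname case in one place (whether the suggested backport does exactly this is not verified here):

{code:java}
import java.util.Locale;

public class HostnameCaseDemo {
  public static void main(String[] args) {
    String fromHeartbeat = "HOSTNAME1"; // as reported by the live server
    String fromWalDir = "hostname1";    // as parsed from the WAL directory
    // Mixed-case keys miss each other in hostname-keyed maps:
    System.out.println(fromHeartbeat.equals(fromWalDir)); // false
    // Normalizing once at the boundary makes lookups agree:
    System.out.println(fromHeartbeat.toLowerCase(Locale.ROOT)
        .equals(fromWalDir.toLowerCase(Locale.ROOT))); // true
  }
}
{code}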
[jira] [Resolved] (HBASE-25062) The link of "Re:(HBASE-451) Remove HTableDescriptor from HRegionInfo" invalid
[ https://issues.apache.org/jira/browse/HBASE-25062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25062. --- Fix Version/s: 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed Merged. Thanks for the patch [~filtertip] (and reviews by [~zhangduo] and [~janh] ) > The link of "Re:(HBASE-451) Remove HTableDescriptor from HRegionInfo" invalid > - > > Key: HBASE-25062 > URL: https://issues.apache.org/jira/browse/HBASE-25062 > Project: HBase > Issue Type: Improvement > Components: documentation >Reporter: Zheng Wang >Assignee: Zheng Wang >Priority: Minor > Fix For: 3.0.0-alpha-1 > > > This link belongs to "184.8.8. Do not edit JIRA comments" in the documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25109) Add MR Counters to WALPlayer; currently hard to tell if it is doing anything
[ https://issues.apache.org/jira/browse/HBASE-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25109. --- Fix Version/s: 2.4.0 2.3.3 3.0.0-alpha-1 Hadoop Flags: Reviewed Assignee: Michael Stack Resolution: Fixed Pushed to branch-2.3+. Thanks for review [~huaxiangsun] > Add MR Counters to WALPlayer; currently hard to tell if it is doing anything > > > Key: HBASE-25109 > URL: https://issues.apache.org/jira/browse/HBASE-25109 > Project: HBase > Issue Type: Improvement >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0 > > > For example, when WALPlayer runs, it emits this: > {code:java} > 020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] > mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: > Counters: 20 > File System Counters > FILE: Number of bytes read=268891453 > FILE: Number of bytes written=1018719 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > Map-Reduce Framework > Map input records=4375 > Map output records=5369 > Input split bytes=245 > Spilled Records=0 > Failed Shuffles=0 > Merged Map outputs=0 > GC time elapsed (ms)=59 > Total committed heap usage (bytes)=518979584 > File Input Format Counters > Bytes Read=0 > File Output Format Counters > Bytes Written=0 {code} > Change it so it does this: > {code:java} > 020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] > mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: > Counters: 20 > File System Counters > FILE: Number of bytes read=268891453 > FILE: Number of bytes written=1018719 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > Map-Reduce Framework > Map input records=4375 > Map output records=5369 > Input split bytes=245 > Spilled Records=0 > Failed Shuffles=0 > Merged Map outputs=0 > GC time elapsed (ms)=59 > Total committed heap usage (bytes)=518979584 > org.apache.hadoop.hbase.mapreduce.WALPlayer$Counter > CELLS_READ=89574 > CELLS_WRITTEN=89572 > DELETES=64 > PUTS=5305 > WALEDITS=4375 > File Input Format Counters > Bytes Read=0 > File Output Format Counters > Bytes Written=0 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
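For context on how counters like the WALPlayer$Counter group above surface in the job summary: a mapper increments an enum-named counter through the task context and the MR framework aggregates and prints them per attempt. A standalone illustration, not the WALPlayer mapper itself:

{code:java}
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, Text> {
  // Enum constants become the counter names printed in the job summary.
  public enum Counter { WALEDITS, CELLS_READ, CELLS_WRITTEN, PUTS, DELETES }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.getCounter(Counter.WALEDITS).increment(1);
    // ... process the record, bumping CELLS_READ/CELLS_WRITTEN etc. as it goes.
  }
}
{code}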
[jira] [Commented] (HBASE-25099) Change meta replica count by altering meta table descriptor
[ https://issues.apache.org/jira/browse/HBASE-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204804#comment-17204804 ] Michael Stack commented on HBASE-25099: --- Yeah, I think #2. You made a good argument when I wanted to disable meta to allow generally altering schema (back when I wanted to add indices and blooms to the hbase:meta table). > Change meta replica count by altering meta table descriptor > --- > > Key: HBASE-25099 > URL: https://issues.apache.org/jira/browse/HBASE-25099 > Project: HBase > Issue Type: Improvement > Components: meta, read replicas >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > > Now that we support altering the meta table, it would be better to also handle > changing the meta replica number by altering meta, i.e., we could unify the > logic in MasterMetaBootstrap into ModifyTableProcedure; another benefit is > that we would not need to restart the master when changing the replica number for > meta. -- This message was sent by Atlassian Jira (v8.3.4#803005)
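What the approach favored above would look like from a client once meta's replica count rides on its table descriptor: a normal descriptor modification, no master restart. A sketch assuming the ModifyTableProcedure unification the issue proposes lands; the Admin and TableDescriptorBuilder calls are the standard client API.

{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class BumpMetaReplicas {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      admin.modifyTable(TableDescriptorBuilder
          .newBuilder(admin.getDescriptor(TableName.META_TABLE_NAME))
          .setRegionReplication(3) // example target replica count
          .build());
    }
  }
}
{code}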
[jira] [Commented] (HBASE-25007) Make HBCK2 work for 'root table'
[ https://issues.apache.org/jira/browse/HBASE-25007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203613#comment-17203613 ] Michael Stack commented on HBASE-25007: --- {quote}And the problem is that, we will not schedule SCP for them if something goes wrong. {quote} Correct. Maybe we could if an 'unknown server' has been in hbase:meta for more than two or ten heartbeats... The catalog janitor could schedule an SCP for any 'unknown server' found. {quote}So I think a normal SCP is enough? {quote} In most cases, yes. {quote}Or at least, we do not need to scan meta to find out the regions on a 'unknown server'? {quote} HBCKSCP only does this if the Operator runs an HBCKSCP and the SCP super-call returns that there are no matching crashed Servers. In this case, the Operator is insisting that SCP has 'missed' some references to the named server; in this latter case, HBCKSCP goes the extra mile, running a full scan looking for any references to the passed server – even looking for references from Region Replicas. Fold this latter checking bit into SCP and then remove HBCKSCP? I've not run into a case where meta was on the 'unknown server'; meta has to be up for the cluster to make any progress. As for restarting the Master to get the latest state: in my 'cluster fixing experience', the Master has usually been started recently anyway. > Make HBCK2 work for 'root table' > > > Key: HBASE-25007 > URL: https://issues.apache.org/jira/browse/HBASE-25007 > Project: HBase > Issue Type: Sub-task > Components: hbck2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > > We will also scan the catalog table and fix it in HBCK2; we should add support > for root too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
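A rough sketch of the heartbeat-grace idea above -- a janitor pass that queues an SCP for any 'unknown server' that has lingered in hbase:meta past a grace period. Every name below is a hypothetical stand-in for illustration, not the actual CatalogJanitor or procedure internals:
{code:java}
import java.util.Map;
import java.util.concurrent.TimeUnit;

class UnknownServerJanitorSketch {
  // Hypothetical grace period standing in for "two or ten heartbeats".
  static final long GRACE_MS = TimeUnit.SECONDS.toMillis(30);

  interface ProcedureScheduler {
    void queueServerCrashProcedure(String serverName);
  }

  // firstSeen maps an unknown server name to the time it was first noticed in meta.
  static void scan(Map<String, Long> firstSeen, ProcedureScheduler scheduler, long now) {
    firstSeen.forEach((server, seenAt) -> {
      if (now - seenAt > GRACE_MS) {
        // Lingered past the grace period: queue a server crash procedure for it.
        scheduler.queueServerCrashProcedure(server);
      }
    });
  }
}
{code}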
[jira] [Updated] (HBASE-25109) Add MR Counters to WALPlayer; currently hard to tell if it is doing anything
[ https://issues.apache.org/jira/browse/HBASE-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-25109: -- Description: For example, when WALPlayer runs, it emits this: {code:java} 2020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: Counters: 20 File System Counters FILE: Number of bytes read=268891453 FILE: Number of bytes written=1018719 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 Map-Reduce Framework Map input records=4375 Map output records=5369 Input split bytes=245 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=59 Total committed heap usage (bytes)=518979584 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=0 {code} Change it so it does this: {code:java} 2020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: Counters: 20 File System Counters FILE: Number of bytes read=268891453 FILE: Number of bytes written=1018719 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 Map-Reduce Framework Map input records=4375 Map output records=5369 Input split bytes=245 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=59 Total committed heap usage (bytes)=518979584 org.apache.hadoop.hbase.mapreduce.WALPlayer$Counter CELLS_READ=89574 CELLS_WRITTEN=89572 DELETES=64 PUTS=5305 WALEDITS=4375 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=0 {code} > Add MR Counters to WALPlayer; currently hard to tell if it is doing anything > > > Key: HBASE-25109 > URL: https://issues.apache.org/jira/browse/HBASE-25109 > Project: HBase > Issue Type: Improvement >Reporter: Michael Stack >Priority: Major > > For example, when WALPlayer runs, it emits this: > {code:java} > 2020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] > mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: > Counters: 20 > File System Counters > FILE: Number of bytes read=268891453 > FILE: Number of bytes written=1018719 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > Map-Reduce Framework > Map input records=4375 > Map output records=5369 > Input split bytes=245 > Spilled Records=0 > Failed Shuffles=0 > Merged Map outputs=0 > GC time elapsed (ms)=59 > Total committed heap usage (bytes)=518979584 > File Input Format Counters > Bytes Read=0 > File Output Format Counters > Bytes Written=0 {code} > Change it so it does this: > {code:java} > 2020-09-28 11:16:05,489 INFO [LocalJobRunner Map Task Executor #0] > mapred.Task: Final Counters for attempt_local1916643172_0001_m_00_0: > Counters: 20 > File System Counters > FILE: Number of bytes read=268891453 > FILE: Number of bytes written=1018719 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > Map-Reduce Framework > Map input records=4375 > Map output records=5369 > Input split bytes=245 > Spilled Records=0 > Failed Shuffles=0 > Merged Map outputs=0 > GC time elapsed (ms)=59 > Total committed heap usage (bytes)=518979584 > org.apache.hadoop.hbase.mapreduce.WALPlayer$Counter > CELLS_READ=89574 > CELLS_WRITTEN=89572 > DELETES=64 > PUTS=5305 > WALEDITS=4375 > File Input Format Counters > Bytes Read=0 > File Output Format Counters > Bytes Written=0 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25109) Add MR Counters to WALPlayer; currently hard to tell if it is doing anything
Michael Stack created HBASE-25109: - Summary: Add MR Counters to WALPlayer; currently hard to tell if it is doing anything Key: HBASE-25109 URL: https://issues.apache.org/jira/browse/HBASE-25109 Project: HBase Issue Type: Improvement Reporter: Michael Stack -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25099) Change meta replica number by altering meta table descriptor
[ https://issues.apache.org/jira/browse/HBASE-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202240#comment-17202240 ] Michael Stack commented on HBASE-25099: --- Sure. There currently exists a mechanism for setting the replica count on hbase:meta that comes from region replicas. In HConstants are the following defines: {code:java} public static final String META_REPLICAS_NUM = "hbase.meta.replica.count"; public static final int DEFAULT_META_REPLICA_NUM = 1; {code} We could make it so you set replicas on meta the same way you do on a user-space table. > Change meta replica number by altering meta table descriptor > > > Key: HBASE-25099 > URL: https://issues.apache.org/jira/browse/HBASE-25099 > Project: HBase > Issue Type: Improvement > Components: meta, read replicas >Reporter: Duo Zhang >Priority: Major > > Now that we support altering the meta table, it would be better to also handle > changing the meta replica number via an alter, i.e., we could unify the > logic in MasterMetaBootstrap into ModifyTableProcedure; another benefit is > that we would not need to restart the master when changing the replica number for > meta. -- This message was sent by Atlassian Jira (v8.3.4#803005)
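For illustration, here is roughly what the proposed unification would look like from the client side, assuming an alter of hbase:meta is permitted (which is what this issue is about): bump the replica count on the meta table descriptor exactly as you would for a user-space table. The calls below are the standard HBase 2.x client API; the replica count of 3 is an arbitrary example value.
{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class AlterMetaReplicasSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
        Admin admin = conn.getAdmin()) {
      // Fetch the current hbase:meta descriptor and bump its replica count.
      TableDescriptor meta = admin.getDescriptor(TableName.META_TABLE_NAME);
      TableDescriptor altered = TableDescriptorBuilder.newBuilder(meta)
          .setRegionReplication(3) // arbitrary example replica count
          .build();
      // Under the proposal, ModifyTableProcedure would handle the rest.
      admin.modifyTable(altered);
    }
  }
}
{code}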
[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica
[ https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201212#comment-17201212 ] Michael Stack commented on HBASE-18070: --- I just did an edit on the Design doc. Added a section on the current state of 'Async WAL Replication' (section 3.1) -- i.e. it is unused even though it has been checked in for ~5 years now -- concluding that the current effort is "... solving for the special case of catalog tables only. If _async WAL Replication_ can be made to work satisfactorily for the special case, it may revive interest in [_async WAL Replication_] for user-space Tables. At that time, we can come back to work on the general case." > Enable memstore replication for meta replica > > > Key: HBASE-18070 > URL: https://issues.apache.org/jira/browse/HBASE-18070 > Project: HBase > Issue Type: New Feature >Reporter: Hua Xiang >Assignee: Huaxiang Sun >Priority: Major > > Based on the current doc, memstore replication is not enabled for meta > replica. Memstore replication will be a good improvement for meta replica. > Create jira to track this effort (feasibility, design, implementation, etc). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25091) Move LogComparator from ReplicationSource to AbstractFSWALProvider#.WALsStartTimeComparator
Michael Stack created HBASE-25091: - Summary: Move LogComparator from ReplicationSource to AbstractFSWALProvider#.WALsStartTimeComparator Key: HBASE-25091 URL: https://issues.apache.org/jira/browse/HBASE-25091 Project: HBase Issue Type: Improvement Reporter: Michael Stack Minor cleanup item noticed while playing over in HBASE-18070. ReplicationSource has an inner class named LogComparator, a pretty generic name for a comparator that only compares on WAL start time and nothing else. Also, while messing around in HBASE-18070, I ran into compares that included both user-space WALs and hbase:meta WALs; LogComparator as-is barfed on the meta WALs. This ticket moves the comparator to AbstractFSWALProvider, where folks will go looking if they need WAL comparators, and renames it to more clearly explain what it does (and makes it so it can compare start times even if given a meta WAL). -- This message was sent by Atlassian Jira (v8.3.4#803005)
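A minimal sketch of what such a start-time comparator might look like, assuming WAL file names end in a numeric start-time suffix and meta WALs carry a trailing ".meta" that must be stripped before parsing. The class name and the naming assumptions are illustrative, not the committed code:
{code:java}
import java.util.Comparator;

public class WALsStartTimeComparatorSketch implements Comparator<String> {
  private static final String META_SUFFIX = ".meta";

  // Parse the WAL start time: the numeric segment after the last dot, after
  // first stripping any meta suffix so meta WALs compare cleanly too.
  private static long startTime(String walName) {
    String name = walName.endsWith(META_SUFFIX)
        ? walName.substring(0, walName.length() - META_SUFFIX.length())
        : walName;
    return Long.parseLong(name.substring(name.lastIndexOf('.') + 1));
  }

  @Override
  public int compare(String a, String b) {
    return Long.compare(startTime(a), startTime(b));
  }
}
{code}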
[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica
[ https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200499#comment-17200499 ] Michael Stack commented on HBASE-18070: --- Made a HBASE-18070 branch just now from master at e7797208d6ca10a10d37b77591e1f0531ed57dfc Applied HBASE-25068 to HBASE-18070. > Enable memstore replication for meta replica > > > Key: HBASE-18070 > URL: https://issues.apache.org/jira/browse/HBASE-18070 > Project: HBase > Issue Type: New Feature >Reporter: Hua Xiang >Assignee: Huaxiang Sun >Priority: Major > > Based on the current doc, memstore replication is not enabled for meta > replica. Memstore replication will be a good improvement for meta replica. > Create jira to track this effort (feasibility, design, implementation, etc). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25068) Pass WALFactory to Replication so it knows of all WALProviders, not just default/user-space
[ https://issues.apache.org/jira/browse/HBASE-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200498#comment-17200498 ] Michael Stack commented on HBASE-25068: --- Pushed the master branch patch onto HBASE-18070, which I just branched from master. > Pass WALFactory to Replication so it knows of all WALProviders, not just > default/user-space > --- > > Key: HBASE-25068 > URL: https://issues.apache.org/jira/browse/HBASE-25068 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Fix For: HBASE-18070 > > Attachments: > 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.2, > 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.master > > > Small change that passes all WALProviders to ReplicationService rather than > just the default/user-space WALProvider. It does this using the WALFactory > vessel since it holds all Providers. This change is to be exploited by > adjacent sub-task HBASE-25055 in follow-on. This sub-task also exists to make > the HBASE-25055 patch smaller and more focused, easier to review. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25068) Pass WALFactory to Replication so it knows of all WALProviders, not just default/user-space
[ https://issues.apache.org/jira/browse/HBASE-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-25068: -- Fix Version/s: (was: 2.4.0) (was: 3.0.0-alpha-1) HBASE-18070 > Pass WALFactory to Replication so it knows of all WALProviders, not just > default/user-space > --- > > Key: HBASE-25068 > URL: https://issues.apache.org/jira/browse/HBASE-25068 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Fix For: HBASE-18070 > > Attachments: > 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.2, > 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.master > > > Small change that passes all WALProviders to ReplicationService rather than > just the default/user-space WALProvider. It does this using the WALFactory > vessel since it holds all Providers. This change is to be exploited by > adjacent sub-task HBASE-25055 in follow-on. This sub-task also exists to make > the HBASE-25055 patch smaller and more focused, easier to review. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25068) Pass WALFactory to Replication so it knows of all WALProviders, not just default/user-space
[ https://issues.apache.org/jira/browse/HBASE-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-25068: -- Attachment: 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.master > Pass WALFactory to Replication so it knows of all WALProviders, not just > default/user-space > --- > > Key: HBASE-25068 > URL: https://issues.apache.org/jira/browse/HBASE-25068 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.4.0 > > Attachments: > 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.2, > 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.master > > > Small change that passes all WALProviders to ReplicationService rather than > just the default/user-space WALProvider. It does this using the WALFactory > vessel since it holds all Providers. This change is to be exploited by > adjacent sub-task HBASE-25055 in follow-on. This sub-task also exists to make > the HBASE-25055 patch smaller and more focused, easier to review. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25068) Pass WALFactory to Replication so it knows of all WALProviders, not just default/user-space
[ https://issues.apache.org/jira/browse/HBASE-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack updated HBASE-25068: -- Attachment: 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.2 > Pass WALFactory to Replication so it knows of all WALProviders, not just > default/user-space > --- > > Key: HBASE-25068 > URL: https://issues.apache.org/jira/browse/HBASE-25068 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.4.0 > > Attachments: > 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.2 > > > Small change that passes all WALProviders to ReplicationService rather than > just the default/user-space WALProvider. It does this using the WALFactory > vessel since it holds all Providers. This change is to be exploited by > adjacent sub-task HBASE-25055 in follow-on. This sub-task also exists to make > the HBASE-25055 patch smaller and more focused, easier to review. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25068) Pass WALFactory to Replication so it knows of all WALProviders, not just default/user-space
[ https://issues.apache.org/jira/browse/HBASE-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200494#comment-17200494 ] Michael Stack commented on HBASE-25068: --- OK. No problem. Will revert and make a HBASE-18070 branch. > Pass WALFactory to Replication so it knows of all WALProviders, not just > default/user-space > --- > > Key: HBASE-25068 > URL: https://issues.apache.org/jira/browse/HBASE-25068 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.4.0 > > > Small change that passes all WALProviders to ReplicationService rather than > just the default/user-space WALProvider. It does this using the WALFactory > vessel since it holds all Providers. This change is to be exploited by > adjacent sub-task HBASE-25055 in follow-on. This sub-task also exists to make > the HBASE-25055 patch smaller and more focused, easier to review. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25068) Pass WALFactory to Replication so it knows of all WALProviders, not just default/user-space
[ https://issues.apache.org/jira/browse/HBASE-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25068. --- Fix Version/s: 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed Pushed to branch-2 and master. Thanks for review [~zhangduo] > Pass WALFactory to Replication so it knows of all WALProviders, not just > default/user-space > --- > > Key: HBASE-25068 > URL: https://issues.apache.org/jira/browse/HBASE-25068 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.4.0 > > > Small change that passes all WALProviders to ReplicationService rather than > just the default/user-space WALProvider. It does this using the WALFactory > vessel since it holds all Providers. This change is to be exploited by > adjacent sub-task HBASE-25055 in follow-on. This sub-task also exists to make > the HBASE-25055 patch smaller and more focused, easier to review. -- This message was sent by Atlassian Jira (v8.3.4#803005)
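The shape of the change described above can be sketched in a few lines: replication is handed the WALFactory (the vessel holding every provider) rather than a single WALProvider, so it can enumerate the meta provider as well as the default one. The interface shapes below are illustrative assumptions, not the actual HBase signatures:
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WALFactorySketch {
  interface WALProvider { String name(); }

  // Before: replication was initialized with just the default WALProvider.
  // After: it receives the whole factory and can see every provider.
  interface ReplicationService { void initialize(WALFactorySketch walFactory); }

  private final List<WALProvider> providers = new ArrayList<>();

  void register(WALProvider provider) { providers.add(provider); }

  // Replication can now enumerate all providers (default, meta, ...),
  // not only the default/user-space one.
  List<WALProvider> getAllProviders() {
    return Collections.unmodifiableList(providers);
  }
}
{code}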
[jira] [Resolved] (HBASE-25067) Edit of log messages around async WAL Replication; checkstyle fixes; and a bugfix
[ https://issues.apache.org/jira/browse/HBASE-25067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25067. --- Fix Version/s: 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed Pushed to branch-2 and master, thanks for review [~zhangduo] > Edit of log messages around async WAL Replication; checkstyle fixes; and a > bugfix > - > > Key: HBASE-25067 > URL: https://issues.apache.org/jira/browse/HBASE-25067 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > Edit of logging around region replicas: shortening and adding context. > Checkstyle fixes in edited files while I was in there. > Bug fix in AssignRegionHandler – was using M_RS_CLOSE_META to open > a Region instead of a M_RS_OPEN_META. > > Main reason for this issue is making the substantial adjacent issue > HBASE-25055 smaller in size/easier to review. -- This message was sent by Atlassian Jira (v8.3.4#803005)
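The bugfix called out above is small enough to capture in a toy example. This is a hedged sketch -- the real org.apache.hadoop.hbase.executor.EventType enum and AssignRegionHandler are much richer -- but it shows the essence: select the open-type event, not the close-type one, when opening a meta region.
{code:java}
public class AssignRegionHandlerSketch {
  // Toy stand-in for the relevant values of org.apache.hadoop.hbase.executor.EventType.
  enum EventType { M_RS_OPEN_REGION, M_RS_OPEN_META, M_RS_CLOSE_META }

  static EventType openEventFor(boolean isMetaRegion) {
    // The buggy version picked M_RS_CLOSE_META when opening a meta region;
    // the fix selects the open-type event instead.
    return isMetaRegion ? EventType.M_RS_OPEN_META : EventType.M_RS_OPEN_REGION;
  }

  public static void main(String[] args) {
    assert openEventFor(true) == EventType.M_RS_OPEN_META;
    assert openEventFor(false) == EventType.M_RS_OPEN_REGION;
  }
}
{code}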
[jira] [Resolved] (HBASE-25081) Up the container nproc uplimit to 30000
[ https://issues.apache.org/jira/browse/HBASE-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Stack resolved HBASE-25081. --- Fix Version/s: 3.0.0-alpha-1 Hadoop Flags: Reviewed Release Note: Ups the nproc (processes) limit from 12500 to 30000 in yetus (so the build container can have the new limit). Assignee: Istvan Toth Resolution: Fixed > Up the container nproc uplimit to 30000 > --- > > Key: HBASE-25081 > URL: https://issues.apache.org/jira/browse/HBASE-25081 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Istvan Toth >Assignee: Istvan Toth >Priority: Major > Fix For: 3.0.0-alpha-1 > > > We (the Apache Phoenix team) have recently switched our precommit tests to > Dockerized Yetus (mostly adopted from the solution in HBase). > We see > java.lang.OutOfMemoryError: unable to create new native thread > errors, while Yetus shows > |Max. process+thread count|6833 (vs. ulimit of 12500)| > While I couldn't determine which job we shared the Agent with at > the time, statistically it was very likely HBase, and an HBase job probably > failed with a similar error. > Some research has thrown up the official Docker docs: > [https://docs.docker.com/engine/reference/commandline/run/#set-ulimits-in-container-ulimit] > According to these, it is not possible to set a container-level nproc ulimit > with Docker. > All settings apply to the Docker daemon user instead, and the limit is shared > between all containers. > Based on this, I think that it makes no sense to set a container (really > Docker user) nproc ulimit any lower than the current hard limit of 30000. > I have already set PROC_LIMIT=30000 in the Phoenix Yetus personality, but it > is only half a solution until some Docker user sets a lower value, as the > later setting will apply as soon as the container is started. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25081) Up the container nproc uplimit to 30000
[ https://issues.apache.org/jira/browse/HBASE-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200193#comment-17200193 ] Michael Stack commented on HBASE-25081: --- No hurry [~stoty]. Come back when you need it applied elsewhere. Resolving for now; we can open a subtask when it is needed elsewhere. > Up the container nproc uplimit to 30000 > --- > > Key: HBASE-25081 > URL: https://issues.apache.org/jira/browse/HBASE-25081 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Istvan Toth >Priority: Major > > We (the Apache Phoenix team) have recently switched our precommit tests to > Dockerized Yetus (mostly adopted from the solution in HBase). > We see > java.lang.OutOfMemoryError: unable to create new native thread > errors, while Yetus shows > |Max. process+thread count|6833 (vs. ulimit of 12500)| > While I couldn't determine which job we shared the Agent with at > the time, statistically it was very likely HBase, and an HBase job probably > failed with a similar error. > Some research has thrown up the official Docker docs: > [https://docs.docker.com/engine/reference/commandline/run/#set-ulimits-in-container-ulimit] > According to these, it is not possible to set a container-level nproc ulimit > with Docker. > All settings apply to the Docker daemon user instead, and the limit is shared > between all containers. > Based on this, I think that it makes no sense to set a container (really > Docker user) nproc ulimit any lower than the current hard limit of 30000. > I have already set PROC_LIMIT=30000 in the Phoenix Yetus personality, but it > is only half a solution until some Docker user sets a lower value, as the > later setting will apply as soon as the container is started. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-25081) Up the container nproc uplimit to 30000
[ https://issues.apache.org/jira/browse/HBASE-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200171#comment-17200171 ] Michael Stack commented on HBASE-25081: --- Merged on master. See if it helps? If it does, then we backport, [~stoty]? > Up the container nproc uplimit to 30000 > --- > > Key: HBASE-25081 > URL: https://issues.apache.org/jira/browse/HBASE-25081 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Istvan Toth >Priority: Major > > We (the Apache Phoenix team) have recently switched our precommit tests to > Dockerized Yetus (mostly adopted from the solution in HBase). > We see > java.lang.OutOfMemoryError: unable to create new native thread > errors, while Yetus shows > |Max. process+thread count|6833 (vs. ulimit of 12500)| > While I couldn't determine which job we shared the Agent with at > the time, statistically it was very likely HBase, and an HBase job probably > failed with a similar error. > Some research has thrown up the official Docker docs: > [https://docs.docker.com/engine/reference/commandline/run/#set-ulimits-in-container-ulimit] > According to these, it is not possible to set a container-level nproc ulimit > with Docker. > All settings apply to the Docker daemon user instead, and the limit is shared > between all containers. > Based on this, I think that it makes no sense to set a container (really > Docker user) nproc ulimit any lower than the current hard limit of 30000. > I have already set PROC_LIMIT=30000 in the Phoenix Yetus personality, but it > is only half a solution until some Docker user sets a lower value, as the > later setting will apply as soon as the container is started. -- This message was sent by Atlassian Jira (v8.3.4#803005)