[jira] [Resolved] (HBASE-19754) Backport HBASE-11409 to branch-1 and branch-1.4
[ https://issues.apache.org/jira/browse/HBASE-19754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-19754.
Resolution: Duplicate

> Backport HBASE-11409 to branch-1 and branch-1.4
>
> Key: HBASE-19754
> URL: https://issues.apache.org/jira/browse/HBASE-19754
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.4.0, 1.5.0
> Reporter: churro morales
> Assignee: churro morales
> Priority: Minor
> Attachments: HBASE-19754.branch-1.patch
>
> backport HBASE-11409 to branch-1, branch-1.4
[jira] [Reopened] (HBASE-19754) Backport HBASE-11409 to branch-1 and branch-1.4
[ https://issues.apache.org/jira/browse/HBASE-19754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales reopened HBASE-19754:

> Backport HBASE-11409 to branch-1 and branch-1.4
>
> Key: HBASE-19754
> URL: https://issues.apache.org/jira/browse/HBASE-19754
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.4.0, 1.5.0
> Reporter: churro morales
> Assignee: churro morales
> Priority: Minor
> Attachments: HBASE-19754.branch-1.patch
>
> backport HBASE-11409 to branch-1, branch-1.4
[jira] [Resolved] (HBASE-13459) A more robust Verify Replication
[ https://issues.apache.org/jira/browse/HBASE-13459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-13459.
Resolution: Won't Fix

We have SyncTable, which is much better.

> A more robust Verify Replication
>
> Key: HBASE-13459
> URL: https://issues.apache.org/jira/browse/HBASE-13459
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0, 1.0.1, 0.98.12
> Reporter: churro morales
> Assignee: churro morales
> Priority: Minor
> Attachments: HBASE-13459-0.98.patch
>
> We have done quite a bit of data center migration work in the past year. We modified verify replication a bit to help us out. Things like:
> - Ignoring timestamps when comparing Cells
> - More detailed counters when discrepancies are reported between rows; added the following counters: SOURCEMISSINGROWS, TARGETMISSINGROWS, SOURCEMISSINGKEYS, TARGETMISSINGKEYS
> - The ability to run this job on any pair of tables and clusters
> If folks are interested I can put up the patch and backport.
[jira] [Resolved] (HBASE-13043) Backport HBASE-11436 to 94 branch
[ https://issues.apache.org/jira/browse/HBASE-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-13043.
Resolution: Won't Do

> Backport HBASE-11436 to 94 branch
>
> Key: HBASE-13043
> URL: https://issues.apache.org/jira/browse/HBASE-13043
> Project: HBase
> Issue Type: Task
> Reporter: churro morales
> Assignee: churro morales
> Priority: Major
> Attachments: HBASE-11436-0.94.patch
>
> It would be nice to be able to specify key ranges for the export job in 94.
[jira] [Reopened] (HBASE-11409) Add more flexibility for input directory structure to LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales reopened HBASE-11409:

> Add more flexibility for input directory structure to LoadIncrementalHFiles
>
> Key: HBASE-11409
> URL: https://issues.apache.org/jira/browse/HBASE-11409
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: churro morales
> Assignee: churro morales
> Fix For: 2.0.0-beta-2
> Attachments: HBASE-11409.v1.patch, HBASE-11409.v2.patch, HBASE-11409.v3.patch, HBASE-11409.v4.patch, HBASE-11409.v5.patch, HBASE-11409.v6.branch-1.patch
>
> Use case: We were trying to combine two very large tables into a single table. Thus we ran jobs in one datacenter that populated certain column families and another datacenter which populated other column families. Took a snapshot and exported them to their respective datacenters. Wanted to simply take the restored HDFS snapshot and use LoadIncremental to merge the data.
> It would be nice to add support where we could run LoadIncremental on a directory where the depth of store files is something other than two (the current behavior).
> With snapshots it would be nice if you could pass a restored HDFS snapshot's directory and have the tool run.
> I am attaching a patch where I parameterize the bulkLoad timeout as well as the default store file depth.
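For context, a minimal sketch of how the stock tool is driven today (assuming the branch-1/2.0-era doBulkLoad signature; the directory path and table name are hypothetical). The patch generalizes the two-level directory-depth expectation this call imposes:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadRestoredSnapshot {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName name = TableName.valueOf("merged_table"); // hypothetical table
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         RegionLocator locator = conn.getRegionLocator(name);
         Admin admin = conn.getAdmin()) {
      // Expects family/hfile exactly two levels below this dir; the patch relaxes that.
      new LoadIncrementalHFiles(conf).doBulkLoad(
          new Path("/restored/snapshot/dir"), admin, table, locator);
    }
  }
}
{code}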
[jira] [Resolved] (HBASE-19754) Backport HBASE-11409 to branch-1 and branch-1.4
[ https://issues.apache.org/jira/browse/HBASE-19754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-19754.
Resolution: Fixed

Moving this ticket back to the original HBASE-11409 for the branch-1 backport.

> Backport HBASE-11409 to branch-1 and branch-1.4
>
> Key: HBASE-19754
> URL: https://issues.apache.org/jira/browse/HBASE-19754
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.4.0, 1.5.0
> Reporter: churro morales
> Assignee: churro morales
> Priority: Minor
> Attachments: HBASE-19754.branch-1.patch
>
> backport HBASE-11409 to branch-1, branch-1.4
[jira] [Resolved] (HBASE-11409) Add more flexibility for input directory structure to LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-11409.
Resolution: Fixed

> Add more flexibility for input directory structure to LoadIncrementalHFiles
>
> Key: HBASE-11409
> URL: https://issues.apache.org/jira/browse/HBASE-11409
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: churro morales
> Assignee: churro morales
> Fix For: 2.0.0-beta-2
> Attachments: HBASE-11409.v1.patch, HBASE-11409.v2.patch, HBASE-11409.v3.patch, HBASE-11409.v4.patch, HBASE-11409.v5.patch
>
> Use case: We were trying to combine two very large tables into a single table. Thus we ran jobs in one datacenter that populated certain column families and another datacenter which populated other column families. Took a snapshot and exported them to their respective datacenters. Wanted to simply take the restored HDFS snapshot and use LoadIncremental to merge the data.
> It would be nice to add support where we could run LoadIncremental on a directory where the depth of store files is something other than two (the current behavior).
> With snapshots it would be nice if you could pass a restored HDFS snapshot's directory and have the tool run.
> I am attaching a patch where I parameterize the bulkLoad timeout as well as the default store file depth.
[jira] [Created] (HBASE-19754) Backport HBASE-11409 to branch-1 and branch-1.4
churro morales created HBASE-19754:

Summary: Backport HBASE-11409 to branch-1 and branch-1.4
Key: HBASE-19754
URL: https://issues.apache.org/jira/browse/HBASE-19754
Project: HBase
Issue Type: Bug
Affects Versions: 1.4.0, 1.5.0
Reporter: churro morales
Assignee: churro morales
Priority: Minor
[jira] [Created] (HBASE-19528) Major Compaction Tool
churro morales created HBASE-19528:

Summary: Major Compaction Tool
Key: HBASE-19528
URL: https://issues.apache.org/jira/browse/HBASE-19528
Project: HBase
Issue Type: New Feature
Reporter: churro morales
Assignee: churro morales
Fix For: 2.0.0, 3.0.0

The basic overview of how this tool works is:

Parameters:
- Table
- Stores
- Cluster concurrency
- Timestamp

So you input a table, the desired concurrency, and the list of stores you wish to major compact. The tool first checks the filesystem to see which stores need compaction based on the timestamp you provide (the default is the current time). It takes that list of stores that require compaction and executes those requests concurrently, with at most N distinct RegionServers compacting at a given time. Each thread waits for the compaction to complete before moving to the next queue. If a region split, merge, or move happens, this tool ensures those regions get major compacted as well.

This helps us in two ways: we can limit how much I/O bandwidth we are using for major compaction cluster-wide, and we are guaranteed after the tool completes that all requested compactions complete regardless of moves, merges, and splits.
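A minimal sketch of the concurrency scheme described above: one worker per RegionServer queue, capped at a configured cluster concurrency. Admin.majorCompactRegion is the real client API; the queue building and completion polling are elided, and everything else is illustrative rather than the attached tool:

{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;

public class MajorCompactionSketch {
  // One task per RegionServer queue; at most 'clusterConcurrency' run at once.
  static void compact(Admin admin, Map<ServerName, List<byte[]>> queues,
      int clusterConcurrency) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(clusterConcurrency);
    for (Map.Entry<ServerName, List<byte[]>> entry : queues.entrySet()) {
      pool.submit(() -> {
        for (byte[] regionName : entry.getValue()) {
          try {
            admin.majorCompactRegion(regionName);
            // A real tool would poll the compaction state here before moving on.
          } catch (Exception ex) {
            ex.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
  }
}
{code}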
[jira] [Created] (HBASE-19405) Fix RowTooBigException to use that in hbase-client and ensure that it extends DoNotRetryIOException
churro morales created HBASE-19405:

Summary: Fix RowTooBigException to use that in hbase-client and ensure that it extends DoNotRetryIOException
Key: HBASE-19405
URL: https://issues.apache.org/jira/browse/HBASE-19405
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0, 3.0.0, 1.4.0
Reporter: churro morales
Assignee: churro morales

Looks like this is very different between branches. In master the client-side exception extends the correct base class, but it is not thrown from the StoreScanner. Looking quickly at 1.4, it does not look to extend the correct exception and it is not thrown from anywhere.
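A sketch of the shape the fix is after (an assumption about the eventual patch, not a copy of it): the client-visible exception should extend DoNotRetryIOException so callers fail fast rather than retrying a row that will never fit.

{code}
import org.apache.hadoop.hbase.DoNotRetryIOException;

// Extending DoNotRetryIOException tells the HBase client retry machinery
// to surface the error immediately instead of retrying the operation.
public class RowTooBigException extends DoNotRetryIOException {
  public RowTooBigException(String message) {
    super(message);
  }
}
{code}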
[jira] [Resolved] (HBASE-18253) Ability to isolate regions on regionservers through hbase shell
[ https://issues.apache.org/jira/browse/HBASE-18253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-18253.
Resolution: Not A Problem

> Ability to isolate regions on regionservers through hbase shell
>
> Key: HBASE-18253
> URL: https://issues.apache.org/jira/browse/HBASE-18253
> Project: HBase
> Issue Type: Task
> Affects Versions: 2.0.0-alpha-1
> Reporter: churro morales
> Assignee: Chinmay Kulkarni
> Priority: Minor
>
> Now that we can put regionservers into draining mode through the hbase shell, another nice tool would be the ability to temporarily isolate certain regions from others (like META). A shell command with the form:
> shell> isolate_regions '', '', ''.
[jira] [Created] (HBASE-18253) Ability to isolate regions on regionservers through hbase shell
churro morales created HBASE-18253:

Summary: Ability to isolate regions on regionservers through hbase shell
Key: HBASE-18253
URL: https://issues.apache.org/jira/browse/HBASE-18253
Project: HBase
Issue Type: Task
Affects Versions: 2.0.0-alpha-1
Reporter: churro morales
Assignee: Chinmay Kulkarni
Priority: Minor

Now that we can put regionservers into draining mode through the hbase shell, another nice tool would be the ability to temporarily isolate certain regions from others (like META). A shell command with the form:

shell> isolate_regions '', '', ''.
[jira] [Created] (HBASE-17965) Canary tool should print the regionserver name on failure
churro morales created HBASE-17965:

Summary: Canary tool should print the regionserver name on failure
Key: HBASE-17965
URL: https://issues.apache.org/jira/browse/HBASE-17965
Project: HBase
Issue Type: Task
Reporter: churro morales
Assignee: Karan Mehta
Priority: Minor

It would be nice when we have a canary failure for a region to print the associated regionserver's name in the log as well.
[jira] [Created] (HBASE-17762) Add logging to HBaseAdmin for user initiated tasks
churro morales created HBASE-17762:

Summary: Add logging to HBaseAdmin for user initiated tasks
Key: HBASE-17762
URL: https://issues.apache.org/jira/browse/HBASE-17762
Project: HBase
Issue Type: Task
Reporter: churro morales
Assignee: churro morales
Fix For: 2.0.0, 1.4.0, 0.98.25

Things like auditing a forced major compaction are really useful, and right now there is no logging when this is triggered. Other actions may require logging as well.
[jira] [Created] (HBASE-17698) ReplicationEndpoint choosing sinks
churro morales created HBASE-17698:

Summary: ReplicationEndpoint choosing sinks
Key: HBASE-17698
URL: https://issues.apache.org/jira/browse/HBASE-17698
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0, 1.4.0
Reporter: churro morales

The only time we choose new sinks is when we have a ConnectException, but we have encountered other exceptions where there is a problem contacting a particular sink, and replication gets backed up for any sources that try that sink. HBASE-17675 occurred when there was a bad keytab refresh and the source was stuck. Another issue we recently had was a bad drive controller on the sink side, and replication was stuck again.

Is there any reason not to choose new sinks anytime we have a RemoteException? I can understand that for TableNotFound we don't have to choose new sinks, but for all other cases this seems like the safest approach; a sketch of the proposed check follows.
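A minimal sketch of that policy (illustrative only; the method name and the exact exception set are assumptions, not committed behavior):

{code}
import java.net.ConnectException;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.ipc.RemoteException;

public class SinkSelectionPolicy {
  // Decide whether a failed shipment should trigger choosing new sinks.
  static boolean shouldChooseNewSinks(Exception e) {
    if (e instanceof TableNotFoundException) {
      return false; // a new sink won't have the table either
    }
    // Broaden the old ConnectException-only rule to any RemoteException.
    return e instanceof ConnectException || e instanceof RemoteException;
  }
}
{code}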
[jira] [Created] (HBASE-17675) ReplicationEndpoint should choose new sinks if a SaslException occurs
churro morales created HBASE-17675:

Summary: ReplicationEndpoint should choose new sinks if a SaslException occurs
Key: HBASE-17675
URL: https://issues.apache.org/jira/browse/HBASE-17675
Project: HBase
Issue Type: Bug
Reporter: churro morales

We had an issue where a regionserver on our destination side failed to refresh its keytabs. The source side's replication got stuck because HBaseInterClusterReplicationEndpoint only chooses new sinks when there happens to be a ConnectException, and a SaslException is a plain IOException, which does not trigger choosing new sinks. I'll put up a patch to check for this exception and choose new sinks.
[jira] [Created] (HBASE-17609) Allow for region merging in the UI
churro morales created HBASE-17609:

Summary: Allow for region merging in the UI
Key: HBASE-17609
URL: https://issues.apache.org/jira/browse/HBASE-17609
Project: HBase
Issue Type: Task
Affects Versions: 2.0.0, 1.4.0
Reporter: churro morales
Assignee: churro morales

HBASE-49 discussed having the ability to merge regions through the HBase UI, but online region merging wasn't around back then. I have created additional form fields for table.jsp where you can pass in two encoded region names (must be adjacent regions) and a merge can be called through the UI.
[jira] [Created] (HBASE-16710) Add ZStandard Codec to Compression.java
churro morales created HBASE-16710:

Summary: Add ZStandard Codec to Compression.java
Key: HBASE-16710
URL: https://issues.apache.org/jira/browse/HBASE-16710
Project: HBase
Issue Type: Task
Affects Versions: 2.0.0
Reporter: churro morales
Assignee: churro morales
Priority: Minor

HADOOP-13578 is adding the ZStandardCodec to Hadoop. This is a placeholder to ensure it gets added to HBase once that lands upstream.
[jira] [Created] (HBASE-16086) TableCfWALEntryFilter and ScopeWALEntryFilter should not redundantly iterate over cells.
churro morales created HBASE-16086:

Summary: TableCfWALEntryFilter and ScopeWALEntryFilter should not redundantly iterate over cells.
Key: HBASE-16086
URL: https://issues.apache.org/jira/browse/HBASE-16086
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: churro morales

TableCfWALEntryFilter and ScopeWALEntryFilter both filter by iterating over cells. Since the filters are chained, we do this work twice. Instead, we should iterate over the cells once and apply the "cell filtering" logic of both filters in that single pass.
[jira] [Created] (HBASE-15816) Provide client with ability to set priority on Operations
churro morales created HBASE-15816:

Summary: Provide client with ability to set priority on Operations
Key: HBASE-15816
URL: https://issues.apache.org/jira/browse/HBASE-15816
Project: HBase
Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: churro morales
Assignee: churro morales

The first round will just be to expose the ability to set priorities for client operations. For more background: http://mail-archives.apache.org/mod_mbox/hbase-dev/201604.mbox/%3CCA+RK=_BG_o=q8HMptcP2WauAinmEsL+15f3YEJuz=qbpcya...@mail.gmail.com%3E

The next step would be to remove AnnotationReadingPriorityFunction and have the client send priorities explicitly.
[jira] [Created] (HBASE-15727) Canary Tool for Zookeeper
churro morales created HBASE-15727:

Summary: Canary Tool for Zookeeper
Key: HBASE-15727
URL: https://issues.apache.org/jira/browse/HBASE-15727
Project: HBase
Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: churro morales
Assignee: churro morales

It would be nice to have the canary tool also monitor zookeeper. Something simple, like doing a getData() call on zookeeper.znode.parent. It would also be nice to create clients for every instance in the quorum so that you could spot overloaded or poorly behaving instances.
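A minimal sketch of such a probe, assuming the stock ZooKeeper client API; the quorum hosts are hypothetical, and a production canary would wait for the connected event rather than calling getData() immediately:

{code}
import org.apache.zookeeper.ZooKeeper;

public class ZkCanarySketch {
  public static void main(String[] args) throws Exception {
    // Probe each quorum member separately to spot slow or broken instances.
    String[] quorum = {"zk1:2181", "zk2:2181", "zk3:2181"}; // hypothetical hosts
    String parent = "/hbase"; // default zookeeper.znode.parent
    for (String server : quorum) {
      ZooKeeper zk = new ZooKeeper(server, 30000, event -> { });
      try {
        long start = System.currentTimeMillis();
        zk.getData(parent, false, null); // the simple health check
        System.out.println(server + " responded in "
            + (System.currentTimeMillis() - start) + " ms");
      } catch (Exception e) {
        System.out.println(server + " FAILED: " + e);
      } finally {
        zk.close();
      }
    }
  }
}
{code}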
[jira] [Resolved] (HBASE-12814) Zero downtime upgrade from 94 to 98
[ https://issues.apache.org/jira/browse/HBASE-12814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-12814.
Resolution: Not A Problem

Most likely everyone is off the 94 branch.

> Zero downtime upgrade from 94 to 98
>
> Key: HBASE-12814
> URL: https://issues.apache.org/jira/browse/HBASE-12814
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.26, 0.98.10
> Reporter: churro morales
> Assignee: churro morales
> Attachments: HBASE-12814-0.94.patch, HBASE-12814-0.98.patch
>
> Here at Flurry we want to upgrade our HBase cluster from 94 to 98 while not having any downtime and maintaining master/master replication.
> Summary: Replication is done via thrift RPC between clusters. It is configurable on a peer-by-peer basis, and the one caveat is that a thrift server starts up on every node, which proxies the request to the ReplicationSink.
> For the upgrade process, the following configuration parameters are added in hbase-site.xml:
> Required:
> - hbase.replication.sink.enable.thrift -> true
> - hbase.replication.thrift.server.port ->
> Optional:
> - hbase.replication.thrift.protection {default: AUTHENTICATION}
> - hbase.replication.thrift.framed {default: false}
> - hbase.replication.thrift.compact {default: true}
> All regionservers can be rolling restarted (no downtime); all clusters must have the respective patch for this to work. The hbase shell add_peer command takes an additional parameter for the rpc protocol, for example: {code} add_peer '1' "hbase-101:2181:/hbase", "THRIFT" {code}
> Now comes the fun part: when you want to upgrade your cluster from 94 to 98, you simply pause replication to the cluster being upgraded, do the upgrade, and un-pause replication. Once you have a pair of clusters replicating inbound and outbound only with the 98 release, you can start replicating via the native rpc protocol by adding the peer again without the _THRIFT_ parameter and subsequently deleting the peer with the thrift protocol. Because replication is idempotent, I don't see any issues as long as you wait for the backlog to drain after un-pausing replication.
> Special thanks to Francis Liu at Yahoo for laying the groundwork and Mr. Dave Latham for his invaluable knowledge and assistance.
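A hedged illustration of wiring those parameters up programmatically; the property names come from the ticket, while the port value is a hypothetical stand-in for the elided one:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ThriftReplicationConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Required: start the thrift proxy for the ReplicationSink on every node.
    conf.setBoolean("hbase.replication.sink.enable.thrift", true);
    conf.setInt("hbase.replication.thrift.server.port", 9091); // hypothetical port
    // Optional knobs, shown with the defaults quoted in the ticket.
    conf.set("hbase.replication.thrift.protection", "AUTHENTICATION");
    conf.setBoolean("hbase.replication.thrift.framed", false);
    conf.setBoolean("hbase.replication.thrift.compact", true);
  }
}
{code}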
[jira] [Created] (HBASE-15321) Ability to open a HRegion from hdfs snapshot.
churro morales created HBASE-15321:

Summary: Ability to open a HRegion from hdfs snapshot.
Key: HBASE-15321
URL: https://issues.apache.org/jira/browse/HBASE-15321
Project: HBase
Issue Type: New Feature
Affects Versions: 2.0.0
Reporter: churro morales
Fix For: 2.0.0

Now that hdfs snapshots are here, we started to run our mapreduce jobs over hdfs snapshots. The thing is, hdfs snapshots are read-only point-in-time copies of the file system, so we had to modify the section of code that initializes the region internals in HRegion: we have to skip cleanup of certain directories if the HRegion is backed by an hdfs snapshot. I have a patch for trunk with some basic tests if folks are interested.
[jira] [Resolved] (HBASE-11352) When HMaster starts up it deletes the tmp snapshot directory, if you are exporting a snapshot at that time the job will fail
[ https://issues.apache.org/jira/browse/HBASE-11352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-11352.
Resolution: Not A Problem

In newer versions of HBase you can just select the skipTmp option when exporting snapshots; this resolves the issue.

> When HMaster starts up it deletes the tmp snapshot directory, if you are exporting a snapshot at that time the job will fail
>
> Key: HBASE-11352
> URL: https://issues.apache.org/jira/browse/HBASE-11352
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.19
> Reporter: churro morales
> Attachments: HBASE-11352-0.94.patch, HBASE-11352-v2.0.94.patch
>
> We are exporting a very large table. The export snapshot job takes 7+ days to complete. During that time we had to bounce HMaster. When HMaster initializes, it initializes the SnapshotManager, which subsequently deletes the .tmp directory.
> If this happens while the ExportSnapshot job is running, the reference files get removed and the job fails.
> Maybe we could put some sort of token in place such that while this job is running HMaster won't reset the tmp directory.
[jira] [Resolved] (HBASE-12889) Add scanner caching and batching options for the CopyTable job.
[ https://issues.apache.org/jira/browse/HBASE-12889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-12889.
Resolution: Won't Fix

> Add scanner caching and batching options for the CopyTable job.
>
> Key: HBASE-12889
> URL: https://issues.apache.org/jira/browse/HBASE-12889
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.0.0, 0.98.10, 1.1.0
> Reporter: churro morales
> Assignee: churro morales
> Priority: Minor
> Attachments: HBASE-12889.0.98.patch, HBASE-12889.patch
>
> We use the copy table job to ship data between clusters. Sometimes we have very wide rows and it is nice to be able to set the batching and caching. I'll attach trivial patches for you guys.
[jira] [Resolved] (HBASE-13031) Ability to snapshot based on a key range
[ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-13031.
Resolution: Won't Fix

> Ability to snapshot based on a key range
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
> Issue Type: Improvement
> Reporter: churro morales
> Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.18
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
> Posted on the mailing list and it seems like some people are interested. A little background for everyone: we have a very large table that we would like to snapshot and transfer to another cluster (compressed data is always better to ship). Our problem lies in the fact that it could take many weeks to transfer all of the data, and during that time, with major compactions, the data stored in dfs has the potential to double, which would cause us to run out of disk space.
> So we were thinking about allowing the ability to snapshot a specific key range.
> Ideally, the user would specify a start and stop key, and those would be associated with region boundaries. If the boundaries change between the time the user submits the request and the snapshot is taken (due to merging or splitting of regions), the snapshot should fail. We would know which regions to snapshot, and if those changed between when the request was submitted and the regions were locked, the snapshot could simply fail and the user would try again, instead of potentially getting more or less than they had anticipated. I was planning on storing the start/stop key in the SnapshotDescription; from there it looks pretty straightforward, where we just have to change the verifier code to accommodate the key ranges.
> If this design sounds good to anyone, or if I am overlooking anything, please let me know. Once we agree on the design, I'll write and submit the patches.
[jira] [Resolved] (HBASE-12890) Provide a way to throttle the number of regions moved by the balancer
[ https://issues.apache.org/jira/browse/HBASE-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-12890.
Resolution: Won't Fix

> Provide a way to throttle the number of regions moved by the balancer
>
> Key: HBASE-12890
> URL: https://issues.apache.org/jira/browse/HBASE-12890
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.98.10
> Reporter: churro morales
> Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.18
> Attachments: HBASE-12890.patch
>
> We have a very large cluster and we frequently add and remove quite a few regionservers from it. Whenever we do this, the balancer moves thousands of regions at once. Instead, we provide a configuration parameter, hbase.balancer.max.regions, that limits the number of regions balanced per iteration.
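A minimal illustration of what such a cap could look like; the property name comes from the ticket, while the helper and its placement are assumptions, not the attached patch:

{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;

public class BalancerThrottleSketch {
  // Truncate the balancer's plan list to the configured per-iteration cap.
  static <P> List<P> capPlans(List<P> plans, Configuration conf) {
    int max = conf.getInt("hbase.balancer.max.regions", Integer.MAX_VALUE);
    return plans.size() <= max ? plans : plans.subList(0, max);
  }
}
{code}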
[jira] [Created] (HBASE-15286) Revert the API changes to TimeRange constructor and make IA.Private
churro morales created HBASE-15286:

Summary: Revert the API changes to TimeRange constructor and make IA.Private
Key: HBASE-15286
URL: https://issues.apache.org/jira/browse/HBASE-15286
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0, 1.2.0
Reporter: churro morales
Assignee: churro morales

Based on the discussion here: https://mail-archives.apache.org/mod_mbox/hbase-dev/201602.mbox/%3ccan5cbe4rs-2tv3rn1-xhaz0yt3kh3+zkg+8ewk_6kbkfkds...@mail.gmail.com%3E
[jira] [Created] (HBASE-15130) Backport 0.98 Scan different TimeRange for each column family
churro morales created HBASE-15130:

Summary: Backport 0.98 Scan different TimeRange for each column family
Key: HBASE-15130
URL: https://issues.apache.org/jira/browse/HBASE-15130
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.17
Reporter: churro morales
Assignee: churro morales
Fix For: 0.98.18

0.98-branch backport of HBASE-14355.
[jira] [Created] (HBASE-15067) Rest API should support scan timeRange per column family
churro morales created HBASE-15067:

Summary: Rest API should support scan timeRange per column family
Key: HBASE-15067
URL: https://issues.apache.org/jira/browse/HBASE-15067
Project: HBase
Issue Type: New Feature
Reporter: churro morales

See the discussion in HBASE-14872.
[jira] [Created] (HBASE-14872) Scan different timeRange per column family doesn't percolate down to the memstore
churro morales created HBASE-14872:

Summary: Scan different timeRange per column family doesn't percolate down to the memstore
Key: HBASE-14872
URL: https://issues.apache.org/jira/browse/HBASE-14872
Project: HBase
Issue Type: Bug
Components: Client, regionserver, Scanners
Affects Versions: 2.0.0, 1.3.0
Reporter: churro morales
Assignee: churro morales
Fix For: 2.0.0, 1.3.0, 0.98.17

The per-column-family scan time range feature from HBASE-14355 was not applied to the memstore; it was only done for the store files. This breaks the contract.
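For reference, the client-side feature in question (the family name and timestamps below are hypothetical). The bug is that this narrowing was honored when reading store files but not when reading the memstore:

{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PerFamilyTimeRangeExample {
  public static void main(String[] args) throws Exception {
    Scan scan = new Scan();
    // Only return cells in "cf1" whose timestamps fall in [min, max).
    scan.setColumnFamilyTimeRange(Bytes.toBytes("cf1"),
        1440000000000L, 1450000000000L);
  }
}
{code}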
[jira] [Created] (HBASE-14129) If any regionserver gets shutdown uncleanly during full cluster restart, locality looks to be lost
churro morales created HBASE-14129:

Summary: If any regionserver gets shutdown uncleanly during full cluster restart, locality looks to be lost
Key: HBASE-14129
URL: https://issues.apache.org/jira/browse/HBASE-14129
Project: HBase
Issue Type: Bug
Reporter: churro morales

We were doing a cluster restart the other day. Some regionservers did not shut down cleanly, and upon restart our locality went from 99% to 5%. Looking at the AssignmentManager.joinCluster() code, it calls AssignmentManager.processDeadServersAndRegionsInTransition(). If the failover flag gets set for any reason, it seems we don't call assignAllUserRegions(); the balancer then does the work of assigning those regions. We don't use a locality-aware balancer, so we lost our region locality.

I don't have a solid grasp on the reasoning for these checks, but there are some potential workarounds here:
1. After shutting down your cluster, move your WALs aside (replay them later).
2. Clean up your zNodes.

That seems to work, but it requires a lot of manual labor. Another solution, which I prefer, would be to have a flag for ./start-hbase.sh --clean. If we start master with that flag, then we do a check in AssignmentManager.processDeadServersAndRegionsInTransition() and, if the flag is set, call assignAllUserRegions() regardless of the failover state. I have a patch for the latter solution, that is, if I am understanding the logic correctly.
[jira] [Created] (HBASE-13724) ReplicationSource dies under certain conditions reading a sequence file
churro morales created HBASE-13724:

Summary: ReplicationSource dies under certain conditions reading a sequence file
Key: HBASE-13724
URL: https://issues.apache.org/jira/browse/HBASE-13724
Project: HBase
Issue Type: Bug
Reporter: churro morales

A little background: we run our servers in -ea mode and have seen quite a few replication sources silently die over the past few months. Note: the stacktrace I posted below comes from a regionserver running 0.94, but quickly looking at this issue, I believe it will happen in 98 too. Should we harden ReplicationSource to deal with these types of assertion errors by catching throwables, or should we be dealing with this at the sequence file reader level?

Still looking into the root cause of this issue, but when we manually shut down our regionservers, the regionserver that recovered the queue replicated that log just fine. So in our case a simple retry would have worked.

{code}
2015-05-08 11:04:23,348 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=hdfs://hm6.xxx.flurry.com:9000/hbase/.logs/x.yy.flurry.com,60020,1426792702998/x.atl.flurry.com%2C60020%2C1426792702998.1431107922449
java.lang.AssertionError
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader$WALReaderFSDataInputStream.getPos(SequenceFileLogReader.java:121)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1489)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1479)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1474)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:178)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:734)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:583)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:373)
{code}
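A sketch of the catching-throwables option (purely illustrative; the helper name and retry shape are assumptions, not the eventual fix): wrap reader creation so an AssertionError triggers a bounded retry instead of killing the source thread.

{code}
import java.util.concurrent.Callable;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class RetryOnThrowableSketch {
  private static final Log LOG = LogFactory.getLog(RetryOnThrowableSketch.class);

  // Retry 'action' up to 'attempts' times, catching Throwable so that
  // AssertionError (thrown under -ea) is retried like an IOException.
  static <T> T withRetries(Callable<T> action, int attempts) throws Exception {
    for (int i = 1; ; i++) {
      try {
        return action.call();
      } catch (Throwable t) {
        if (i >= attempts) {
          throw new Exception("Giving up after " + attempts + " attempts", t);
        }
        LOG.warn("Attempt " + i + " failed, retrying", t);
      }
    }
  }
}
{code}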
[jira] [Resolved] (HBASE-13042) MR Job to export HFiles directly from an online cluster
[ https://issues.apache.org/jira/browse/HBASE-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-13042.
Resolution: Fixed

> MR Job to export HFiles directly from an online cluster
>
> Key: HBASE-13042
> URL: https://issues.apache.org/jira/browse/HBASE-13042
> Project: HBase
> Issue Type: New Feature
> Reporter: Dave Latham
>
> We're looking at the best way to bootstrap a new remote cluster. The source cluster has a large table of compressed data using more than 50% of the HDFS capacity, and we have a WAN link to the remote cluster. Ideally we would set up replication to a new table remotely, snapshot the source table, copy the snapshot across, then bulk load it into the new table. However, the amount of time to copy the data remotely is greater than the major compaction interval, so the source cluster would run out of storage. One approach is HBASE-13031, which allows the operators to snapshot and copy one key range at a time. Here's another idea: create a MR job that tries to do a robust remote HFile copy directly:
> - Each split is responsible for a key range.
> - The map task looks up that key range and maps it to a set of HDFS store directories (one for each region/family).
> - For each store:
>   - List the HFiles in the store (needs to be fewer than 1000 files to guarantee an atomic listing).
>   - Attempt to copy the store files (copy in increasing size order to minimize the likelihood of a compaction removing a file during the copy).
>   - If some of the files disappear (compaction), retry the directory list / copy.
> - If any of the stores disappear (region split / merge), then retry the map task (and remap the key range to stores).
> Or maybe there are some HBase locking mechanisms for a region or store that would be better. Otherwise the question is how often compactions or region splits would force retries. Is this crazy?
[jira] [Created] (HBASE-13459) A more robust Verify Replication
churro morales created HBASE-13459:

Summary: A more robust Verify Replication
Key: HBASE-13459
URL: https://issues.apache.org/jira/browse/HBASE-13459
Project: HBase
Issue Type: Improvement
Affects Versions: 0.98.12, 2.0.0, 1.0.1
Reporter: churro morales
Assignee: churro morales
Priority: Minor

We have done quite a bit of data center migration work in the past year. We modified verify replication a bit to help us out. Things like:
- Ignoring timestamps when comparing Cells
- More detailed counters when discrepancies are reported between rows; added the following counters: SOURCEMISSINGROWS, TARGETMISSINGROWS, SOURCEMISSINGKEYS, TARGETMISSINGKEYS
- The ability to run this job on any pair of tables and clusters

If folks are interested I can put up the patch and backport.
[jira] [Created] (HBASE-13043) Backport HBASE-11436 to 94 branch
churro morales created HBASE-13043:

Summary: Backport HBASE-11436 to 94 branch
Key: HBASE-13043
URL: https://issues.apache.org/jira/browse/HBASE-13043
Project: HBase
Issue Type: Task
Reporter: churro morales
Assignee: churro morales

It would be nice to be able to specify key ranges for the export job in 94.
[jira] [Created] (HBASE-13031) Ability to snapshot based on a key range
churro morales created HBASE-13031:

Summary: Ability to snapshot based on a key range
Key: HBASE-13031
URL: https://issues.apache.org/jira/browse/HBASE-13031
Project: HBase
Issue Type: Brainstorming
Affects Versions: 0.94.26, 2.0.0, 1.1.0, 0.98.11
Reporter: churro morales
Assignee: churro morales
Priority: Critical

Posted on the mailing list and it seems like some people are interested. A little background for everyone: we have a very large table that we would like to snapshot and transfer to another cluster (compressed data is always better to ship). Our problem lies in the fact that it could take many weeks to transfer all of the data, and during that time, with major compactions, the data stored in dfs has the potential to double, which would cause us to run out of disk space.

So we were thinking about allowing the ability to snapshot a specific key range.

Ideally, the user would specify a start and stop key, and those would be associated with region boundaries. If the boundaries change between the time the user submits the request and the snapshot is taken (due to merging or splitting of regions), the snapshot should fail. We would know which regions to snapshot, and if those changed between when the request was submitted and the regions were locked, the snapshot could simply fail and the user would try again, instead of potentially getting more or less than they had anticipated. I was planning on storing the start/stop key in the SnapshotDescription; from there it looks pretty straightforward, where we just have to change the verifier code to accommodate the key ranges.

If this design sounds good to anyone, or if I am overlooking anything, please let me know. Once we agree on the design, I'll write and submit the patches.
[jira] [Created] (HBASE-13033) Max allowed memstore size should be 80% not 90%
churro morales created HBASE-13033:

Summary: Max allowed memstore size should be 80% not 90%
Key: HBASE-13033
URL: https://issues.apache.org/jira/browse/HBASE-13033
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.11
Reporter: churro morales
Assignee: churro morales
Priority: Minor

Currently in MemstoreFlusher the check for the maximum allowed memstore size is set to 90%; it should be 80%.
[jira] [Resolved] (HBASE-13033) Max allowed memstore size should be 80% not 90%
[ https://issues.apache.org/jira/browse/HBASE-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-13033.
Resolution: Invalid

> Max allowed memstore size should be 80% not 90%
>
> Key: HBASE-13033
> URL: https://issues.apache.org/jira/browse/HBASE-13033
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.11
> Reporter: churro morales
> Assignee: churro morales
> Priority: Minor
>
> Currently in MemstoreFlusher the check for the maximum allowed memstore size is set to 90%; it should be 80%.
[jira] [Created] (HBASE-12897) Minimum memstore size is a percentage
churro morales created HBASE-12897:

Summary: Minimum memstore size is a percentage
Key: HBASE-12897
URL: https://issues.apache.org/jira/browse/HBASE-12897
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0, 0.98.10, 1.1.0
Reporter: churro morales
Assignee: churro morales

We have a cluster which is optimized for random reads, so we have a large block cache and a small memstore. Currently our heap is 20GB and we wanted to configure the memstore to take 4%, or 800MB. Right now the minimum memstore size is 5%. What do you guys think about reducing the minimum to 1%? Suppose we log a warning if the memstore is below 5% but still allow it. What do you folks think?
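A sketch of the warn-but-allow idea under stated assumptions: the configuration key shown is the 0.98-era one and differs across versions, and the thresholds and helper are illustrative, not a committed change:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;

public class MemstoreFloorSketch {
  private static final Log LOG = LogFactory.getLog(MemstoreFloorSketch.class);

  // Allow memstore fractions down to 1%, but warn below 5%.
  static float validate(Configuration conf) {
    float fraction =
        conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
    if (fraction < 0.01f) {
      throw new IllegalArgumentException("Memstore fraction below 1%: " + fraction);
    }
    if (fraction < 0.05f) {
      LOG.warn("Memstore fraction " + fraction + " is below 5%; allowing, "
          + "but make sure this is intentional");
    }
    return fraction;
  }
}
{code}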
[jira] [Created] (HBASE-12890) Provide a way to throttle the number of regions moved by the balancer
churro morales created HBASE-12890:

Summary: Provide a way to throttle the number of regions moved by the balancer
Key: HBASE-12890
URL: https://issues.apache.org/jira/browse/HBASE-12890
Project: HBase
Issue Type: Improvement
Affects Versions: 2.0.0, 0.98.10, 1.1.0
Reporter: churro morales
Assignee: churro morales

We have a very large cluster and we frequently add and remove quite a few regionservers from it. Whenever we do this, the balancer moves thousands of regions at once. Instead, we provide a configuration parameter, hbase.balancer.max.regions, that limits the number of regions balanced per iteration.
[jira] [Created] (HBASE-12889) Add scanner caching and batching options for the CopyTable job.
churro morales created HBASE-12889:

Summary: Add scanner caching and batching options for the CopyTable job.
Key: HBASE-12889
URL: https://issues.apache.org/jira/browse/HBASE-12889
Project: HBase
Issue Type: Improvement
Affects Versions: 2.0.0, 0.98.10, 1.1.0
Reporter: churro morales
Assignee: churro morales
Priority: Minor

We use the copy table job to ship data between clusters. Sometimes we have very wide rows and it is nice to be able to set the batching and caching. I'll attach trivial patches for you guys.
[jira] [Created] (HBASE-12891) have hbck do region consistency checks in parallel
churro morales created HBASE-12891:

Summary: have hbck do region consistency checks in parallel
Key: HBASE-12891
URL: https://issues.apache.org/jira/browse/HBASE-12891
Project: HBase
Issue Type: Improvement
Affects Versions: 2.0.0, 0.98.10, 1.1.0
Reporter: churro morales
Assignee: churro morales

We have a lot of regions on our cluster (~500k) and noticed that hbck took quite some time in checkAndFixConsistency(). [~davelatham] patched our cluster to do this check in parallel to speed things up. I'll attach the patch.
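A minimal sketch of the idea (illustrative only, not the attached patch): fan the per-region checks out over a fixed-size pool. hbck's real per-region work lives in checkAndFixConsistency(); the RegionCheck callback below is a stand-in for it.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelConsistencyCheckSketch {
  interface RegionCheck {
    void check(String encodedRegionName) throws Exception;
  }

  // Run one consistency check per region across a fixed-size pool,
  // then wait for all of them and surface the first failure.
  static void checkAll(List<String> regions, RegionCheck check, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> results = new ArrayList<>();
      for (String region : regions) {
        results.add(pool.submit(() -> {
          check.check(region);
          return null;
        }));
      }
      for (Future<?> f : results) {
        f.get();
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}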
[jira] [Created] (HBASE-12814) Zero downtime upgrade from 94 to 98 with replication
churro morales created HBASE-12814:

Summary: Zero downtime upgrade from 94 to 98 with replication
Key: HBASE-12814
URL: https://issues.apache.org/jira/browse/HBASE-12814
Project: HBase
Issue Type: New Feature
Affects Versions: 0.94.26, 0.98.10
Reporter: churro morales
Assignee: churro morales

Here at Flurry we want to upgrade our HBase cluster from 94 to 98 while not having any downtime and maintaining master/master replication.

Summary: Replication is done via thrift RPC between clusters. It is configurable on a peer-by-peer basis, and the one caveat is that a thrift server starts up on every node, which proxies the request to the ReplicationSink.

For the upgrade process, the following configuration parameters are added in hbase-site.xml:
Required:
- hbase.replication.sink.enable.thrift -> true
- hbase.replication.thrift.server.port -> thrift_server_port
Optional:
- hbase.replication.thrift.protection {default: AUTHENTICATION}
- hbase.replication.thrift.framed {default: false}
- hbase.replication.thrift.compact {default: true}

All regionservers can be rolling restarted (no downtime); all clusters must have the respective patch for this to work. The hbase shell add_peer command takes an additional parameter for the rpc protocol, for example: {code} add_peer '1' "hbase-101:2181:/hbase", "THRIFT" {code}

Now comes the fun part: when you want to upgrade your cluster from 94 to 98, you simply pause replication to the cluster being upgraded, do the upgrade, and un-pause replication. Once you have a pair of clusters replicating inbound and outbound only with the 98 release, you can start replicating via the native rpc protocol by adding the peer again without the _THRIFT_ parameter and subsequently deleting the peer with the thrift protocol. Because replication is idempotent, I don't see any issues as long as you wait for the backlog to drain after un-pausing replication.

Special thanks to Francis Liu at Yahoo for laying the groundwork and Mr. Dave Latham for his invaluable knowledge and assistance.
[jira] [Created] (HBASE-11601) Parallelize Snapshot operations for 0.94
churro morales created HBASE-11601:

Summary: Parallelize Snapshot operations for 0.94
Key: HBASE-11601
URL: https://issues.apache.org/jira/browse/HBASE-11601
Project: HBase
Issue Type: Improvement
Affects Versions: 0.94.21
Reporter: churro morales

Although HBASE-11185 exists, it is geared towards the snapshot manifest code. We have used snapshots to ship our two largest tables across the country, and while doing so found a few potential optimizations where doing things in parallel helped quite a bit. I can attach a patch containing the changes I've made and we can discuss whether these are changes worth pushing to 0.94.
[jira] [Created] (HBASE-11528) The restoreSnapshot operation should delete the rollback snapshot upon a successful restore
churro morales created HBASE-11528:

Summary: The restoreSnapshot operation should delete the rollback snapshot upon a successful restore
Key: HBASE-11528
URL: https://issues.apache.org/jira/browse/HBASE-11528
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.20
Reporter: churro morales
Assignee: churro morales
Priority: Minor

We take a snapshot, rollbackSnapshot, prior to doing a restore so that if the restore fails we can revert the table back to its pre-restore state. If we are successful in restoring the table, we should delete the rollbackSnapshot when the restoreSnapshot operation successfully completes.
[jira] [Created] (HBASE-11409) Add more flexibility for input directory structure to LoadIncrementalHFiles
churro morales created HBASE-11409:

Summary: Add more flexibility for input directory structure to LoadIncrementalHFiles
Key: HBASE-11409
URL: https://issues.apache.org/jira/browse/HBASE-11409
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.20
Reporter: churro morales

Use case: We were trying to combine two very large tables into a single table. Thus we ran jobs in one datacenter that populated certain column families and another datacenter which populated other column families. Took a snapshot and exported them to their respective datacenters. Wanted to simply take the restored HDFS snapshot and use LoadIncremental to merge the data.

It would be nice to add support where we could run LoadIncremental on a directory where the depth of store files is something other than two (the current behavior).

With snapshots it would be nice if you could pass a restored HDFS snapshot's directory and have the tool run.

I am attaching a patch where I parameterize the bulkLoad timeout as well as the default store file depth.
[jira] [Created] (HBASE-11360) SnapshotFileCache refresh logic based on modified directory time might be insufficient
churro morales created HBASE-11360:

Summary: SnapshotFileCache refresh logic based on modified directory time might be insufficient
Key: HBASE-11360
URL: https://issues.apache.org/jira/browse/HBASE-11360
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.19
Reporter: churro morales

Right now we decide whether to refresh the cache based on the lastModified timestamp of all the snapshots and of the running snapshots, the latter located in the /hbase/.hbase-snapshot/.tmp/snapshot directory.

We ran an ExportSnapshot job which takes around 7 minutes between creating the directory and copying all the files. Thus the modified time for the /hbase/.hbase-snapshot/.tmp directory was 7 minutes earlier than the modified time of the /hbase/.hbase-snapshot/.tmp/snapshot directory. The cache refresh therefore happens but doesn't pick up all the files, yet it thinks it is up to date, since the modified time of the .tmp directory never changes. This is a bug: when the export job starts, the cache never contains the files for the running snapshot, and the job will fail.
[jira] [Created] (HBASE-11352) When HMaster starts up it deletes the tmp snapshot directory, if you are exporting a snapshot at that time the job will fail
churro morales created HBASE-11352:

Summary: When HMaster starts up it deletes the tmp snapshot directory, if you are exporting a snapshot at that time the job will fail
Key: HBASE-11352
URL: https://issues.apache.org/jira/browse/HBASE-11352
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.19
Reporter: churro morales

We are exporting a very large table. The export snapshot job takes 7+ days to complete. During that time we had to bounce HMaster. When HMaster initializes, it initializes the SnapshotManager, which subsequently deletes the .tmp directory.

If this happens while the ExportSnapshot job is running, the reference files get removed and the job fails.

Maybe we could put some sort of token in place such that while this job is running HMaster won't reset the tmp directory.
[jira] [Created] (HBASE-11322) SnapshotHFileCleaner makes the wrong check for lastModified time thus causing too many cache refreshes
churro morales created HBASE-11322:

Summary: SnapshotHFileCleaner makes the wrong check for lastModified time thus causing too many cache refreshes
Key: HBASE-11322
URL: https://issues.apache.org/jira/browse/HBASE-11322
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.19
Reporter: churro morales
Assignee: churro morales
Priority: Critical

In the SnapshotFileCache, the last modified time is computed via this operation:

{code}
this.lastModifiedTime = Math.min(dirStatus.getModificationTime(),
    tempStatus.getModificationTime());
{code}

and the check to see if the snapshot directories have been modified is:

{code}
// if the snapshot directory wasn't modified since we last checked, we are done
if (dirStatus.getModificationTime() <= lastModifiedTime
    && tempStatus.getModificationTime() <= lastModifiedTime) {
  return;
}
{code}

So if dirStatus and tempStatus are modified at different times, we will always assume they have been modified and refresh the cache. In our cluster, this was a huge performance hit: the cleaner chain fell behind, almost filling up dfs and our namenode heap.

It's a simple fix: instead of Math.min we use Math.max for lastModified, which I believe will be correct. I'll apply a patch for you guys.
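The proposed one-line change, for clarity (same fields as the snippet above): track the newer of the two directory timestamps so one lagging directory can't keep the check permanently "stale".

{code}
// Key off the *newer* of the two directories instead of the older one.
this.lastModifiedTime = Math.max(dirStatus.getModificationTime(),
    tempStatus.getModificationTime());
{code}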
[jira] [Created] (HBASE-11195) Potentially improve block locality during major compaction for old regions
churro morales created HBASE-11195:

Summary: Potentially improve block locality during major compaction for old regions
Key: HBASE-11195
URL: https://issues.apache.org/jira/browse/HBASE-11195
Project: HBase
Issue Type: Improvement
Affects Versions: 0.94.19
Reporter: churro morales

This might be a specific use case, but we have some regions which are no longer written to (due to the key). Those regions have one store file, they are very old, and they haven't been written to in a while. We still read from these regions, so locality would be nice.

I propose adding a configuration option, something like hbase.hstore.min.locality.to.skip.major.compact [between 0 and 1], such that you can decide whether or not to skip major compaction for an old region with a single store file. I'll attach a patch; let me know what you guys think.
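An illustrative predicate for that option; only the property name comes from the ticket, while the method and its inputs are assumptions about how such a check might be wired in:

{code}
import org.apache.hadoop.conf.Configuration;

public class SkipMajorCompactionSketch {
  // Skip recompacting a region that has a single, already-local store file.
  static boolean shouldSkip(Configuration conf, int storeFileCount,
      float localityIndex) {
    float threshold =
        conf.getFloat("hbase.hstore.min.locality.to.skip.major.compact", 1.0f);
    return storeFileCount == 1 && localityIndex >= threshold;
  }
}
{code}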
[jira] [Created] (HBASE-10528) DefaultBalancer selects plans to move regions onto draining nodes
churro morales created HBASE-10528:

Summary: DefaultBalancer selects plans to move regions onto draining nodes
Key: HBASE-10528
URL: https://issues.apache.org/jira/browse/HBASE-10528
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.5
Reporter: churro morales

We have quite a large cluster (100k regions), and we needed to isolate a region that was very hot until we could push a patch. We put this region on its own regionserver and set that server in the draining state. The default balancer was still selecting regions to move to this regionserver in its region plans. It just so happened that there were very small regions on the draining server, which constantly triggered balancing. Thus we were closing regions, then attempting to move them to the draining server, only to find out it was draining.

There are some approaches we can take here:
1. Exclude draining servers altogether; don't even pass them into the load balancer from HMaster.
2. Exclude draining servers from the ceiling and floor calculations, where we could potentially skip load balancing because those draining servers won't be represented when deciding whether to balance.
3. Along with #2, when assigning regions, skip plans that assign regions to those draining servers.

I am in favor of #1, which simply removes servers as candidates for balancing if they are in the draining state; a sketch follows. But I would love to hear what everyone else thinks.
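A minimal sketch of option #1 (the helper is illustrative; the master's ServerManager does expose the online and draining server lists that would feed it):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.ServerName;

public class ExcludeDrainingServersSketch {
  // Build the balancer's candidate list with draining servers removed.
  static List<ServerName> balanceCandidates(List<ServerName> online,
      List<ServerName> draining) {
    List<ServerName> candidates = new ArrayList<>(online);
    candidates.removeAll(draining);
    return candidates;
  }
}
{code}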
[jira] [Resolved] (HBASE-10133) ReplicationSource currentNbOperations overflows
[ https://issues.apache.org/jira/browse/HBASE-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales resolved HBASE-10133.
Resolution: Invalid

> ReplicationSource currentNbOperations overflows
>
> Key: HBASE-10133
> URL: https://issues.apache.org/jira/browse/HBASE-10133
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.95.0, 0.96.0, 0.94.14
> Reporter: churro morales
> Priority: Minor
>
> Noticed in the logs we had lines like this:
> 2013-12-11 00:02:00,343 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:-1341767084 and seenEntries:0 and size: 0
> Maybe this value should be reset after we ship our edits. Either that or convert it from an int to a long. As this is a jmx metric, I feel it's important to get this correct.
[jira] [Created] (HBASE-10133) ReplicationSource currentNbOperations overflows
churro morales created HBASE-10133:

Summary: ReplicationSource currentNbOperations overflows
Key: HBASE-10133
URL: https://issues.apache.org/jira/browse/HBASE-10133
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.14, 0.96.0, 0.95.0
Reporter: churro morales
Priority: Minor

Noticed in the logs we had lines like this:

2013-12-11 00:02:00,343 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:-1341767084 and seenEntries:0 and size: 0

Maybe this value should be reset after we ship our edits. Either that or convert it from an int to a long.
[jira] [Created] (HBASE-10100) Hbase replication cluster can have varying peers under certain conditions
churro morales created HBASE-10100:

Summary: Hbase replication cluster can have varying peers under certain conditions
Key: HBASE-10100
URL: https://issues.apache.org/jira/browse/HBASE-10100
Project: HBase
Issue Type: Bug
Affects Versions: 0.96.0, 0.95.0, 0.94.5
Reporter: churro morales

We were trying to replicate hbase data over to a new datacenter recently. After we turned on replication and then did our copy tables, we noticed that verify replication had discrepancies. We ran list_peers and it returned both peers: the original datacenter we were replicating to and the new datacenter (this was correct). When grepping through the logs for a few regionservers, we noticed that a few of them had the following entry:

2013-09-26 10:55:46,907 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Error while adding a new peer
java.net.UnknownHostException: xxx.xxx.flurry.com

(this was due to a transient dns issue). Thus a very small subset of our regionservers were not replicating to this new cluster while most were.

We probably don't want to abort if this type of issue comes up; it could potentially be fatal if someone does an add_peer operation with a typo. That could potentially shut down the cluster. One solution I can think of is keeping a flag in ReplicationSourceManager, a boolean that keeps track of whether there was an errorAddingPeer. Then in logPositionAndCleanOldLogs we can do something like:

{code}
if (errorAddingPeer) {
  LOG.error("There was an error adding a peer, logs will not be marked for deletion");
  return;
}
{code}

Thus we are not deleting these logs from the queue. You will notice your replication queue rising on certain machines, and you can still replay the logs, avoiding a lengthy copy table. I have a patch (with a unit test) for the above proposal, if everyone thinks that is an okay solution.

An additional idea would be to add some retry logic inside the PeersWatcher class for the nodeChildrenChanged method. Thus if there happens to be some issue, we could sort it out without having to bounce that particular regionserver. Would love to hear everyone's thoughts.
[jira] [Created] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
churro morales created HBASE-9865:

Summary: WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM
Key: HBASE-9865
URL: https://issues.apache.org/jira/browse/HBASE-9865
Project: HBase
Issue Type: Bug
Affects Versions: 0.95.0, 0.94.5
Reporter: churro morales

WALEdit.heapSize() is incorrect in certain replication scenarios, which may cause RegionServers to go OOM.

A little background on this issue: we noticed that our source replication regionservers would get into gc storms and sometimes even OOM. We noticed a case where there were around 25k WALEdits to replicate, each one with an ArrayList of KeyValues. The array list had a capacity of around 90k (using 350KB of heap memory) but had around 6 non-null entries.

When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a WALEdit, it removes all kv's that are scoped other than local. But in doing so we don't account for the capacity of the ArrayList when determining heapSize for a WALEdit. The logic for shipping a batch is whether you have hit a size capacity or a number-of-entries capacity. Therefore, if you have a WALEdit with 25k entries and suppose all are removed: the size of the ArrayList is 0 (we don't even count the collection's heap size currently) but the capacity is ignored. This will yield a heapSize() of 0 bytes, while in the best case it would be at least 10 bytes (provided you pass initialCapacity and you have a 32 bit JVM).

I have some ideas on how to address this problem and want to know everyone's thoughts:

1. We use a probabilistic counter such as HyperLogLog and create something like:
- class CapacityEstimateArrayList implements ArrayList
- this class overrides all additive methods to update the probabilistic counts
- it includes one additional method called estimateCapacity (we would take estimateCapacity - size() and fill in sizes for all references)

Then we can do something like this in WALEdit.heapSize:

{code}
public long heapSize() {
  long ret = ClassSize.ARRAYLIST;
  for (KeyValue kv : kvs) {
    ret += kv.heapSize();
  }
  long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
  ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
  if (scopes != null) {
    ret += ClassSize.TREEMAP;
    ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
    // TODO this isn't quite right, need help here
  }
  return ret;
}
{code}

2. In ReplicationSource.removeNonReplicableEdits() we know the size of the array originally, and we provide some percentage threshold. When that threshold is met (say 50% of the entries have been removed) we can call kvs.trimToSize().

3. In the heapSize() method for WALEdit we could use reflection (please don't shoot me for this) to grab the actual capacity of the list, doing something like this:

{code}
public int getArrayListCapacity() {
  try {
    Field f = ArrayList.class.getDeclaredField("elementData");
    f.setAccessible(true);
    return ((Object[]) f.get(kvs)).length;
  } catch (Exception e) {
    log.warn("Exception in trying to get capacity on ArrayList", e);
    return kvs.size();
  }
}
{code}

I am partial to (1), using HyperLogLog and creating a CapacityEstimateArrayList; this is reusable throughout the code for other classes that implement HeapSize and contain ArrayLists. The memory footprint is very small and it is very fast. The issue is that this is an estimate, although we can configure the precision, so we will most likely always be conservative. The estimateCapacity will always be less than the actual capacity, but it will be close.

I think that putting the logic in removeNonReplicableEdits() will work, but this only solves the heapSize problem in this particular scenario. Solution 3 is slow and horrible, but it gives us the exact answer. I would love to hear if anyone else has any other ideas on how to remedy this problem. I have code for trunk and 0.94 for all 3 ideas and can provide a patch if the community thinks any of these approaches is viable.