[jira] [Resolved] (HBASE-16785) We are not running all tests
[ https://issues.apache.org/jira/browse/HBASE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-16785.
    Resolution: Fixed

Re-resolve after pushing addendum.

> We are not running all tests
>
> Key: HBASE-16785
> URL: https://issues.apache.org/jira/browse/HBASE-16785
> Project: HBase
> Issue Type: Bug
> Components: build, test
> Reporter: stack
> Assignee: stack
> Fix For: 2.0.0
>
> Attachments: 16785.addendum.patch, HBASE-16785.master.001.patch,
> HBASE-16785.master.002 (1).patch, HBASE-16785.master.002 (2).patch,
> HBASE-16785.master.002 (2).patch, HBASE-16785.master.002.patch,
> HBASE-16785.master.002.patch, HBASE-16785.master.003.patch
>
> Noticed by [~mbertozzi]
> We have some modules where we tried to 'skip' the running of the second part
> of tests -- mediums and larges. That might have made sense when the module
> was originally added, when there may have been just a few small tests to run,
> but as time goes by the module accumulates more tests; in a few cases we've
> added mediums and larges but we've not removed the 'skip' config.
> Matteo noticed this happened in hbase-procedure.
> In hbase-client, there is at least a medium test that is being skipped.
> Let me try purging this trick everywhere. It doesn't seem to save us anything
> going by build time.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
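For readers unfamiliar with the trick being purged: it is a per-module pom property that disables the second surefire execution (the medium/large test run). The sketch below is an assumption from memory; the exact property name in the actual HBase poms may differ.

```xml
<!-- Hypothetical module-level pom fragment: disables the second surefire
     execution (medium and large tests). Property name is illustrative. -->
<properties>
  <surefire.skipSecondPart>true</surefire.skipSecondPart>
</properties>
```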
[jira] [Reopened] (HBASE-16785) We are not running all tests
[ https://issues.apache.org/jira/browse/HBASE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reopened HBASE-16785:

Reopening to add an addendum to disable TestHRegionServerBulkLoadWithOldSecureEndpoint... it has been failing since this patch went in enabling it. Needs looking at...

> We are not running all tests
>
> Key: HBASE-16785
> URL: https://issues.apache.org/jira/browse/HBASE-16785
> Project: HBase
> Issue Type: Bug
> Components: build, test
> Reporter: stack
> Assignee: stack
> Fix For: 2.0.0
>
> Attachments: HBASE-16785.master.001.patch, HBASE-16785.master.002 (1).patch,
> HBASE-16785.master.002 (2).patch, HBASE-16785.master.002 (2).patch,
> HBASE-16785.master.002.patch, HBASE-16785.master.002.patch,
> HBASE-16785.master.003.patch
>
> Noticed by [~mbertozzi]
> We have some modules where we tried to 'skip' the running of the second part
> of tests -- mediums and larges. That might have made sense when the module
> was originally added, when there may have been just a few small tests to run,
> but as time goes by the module accumulates more tests; in a few cases we've
> added mediums and larges but we've not removed the 'skip' config.
> Matteo noticed this happened in hbase-procedure.
> In hbase-client, there is at least a medium test that is being skipped.
> Let me try purging this trick everywhere. It doesn't seem to save us anything
> going by build time.
Re: Incremental import from HBase to Hive
Sure. There are several applications that talk to HBase and populate data. Now I want to incrementally load data from HBase, do transformations like data-quality filters, and save to Hive. Incremental load means I want to run this job weekly while making sure we don't get duplication at the Hive level. Thanks.

On Sat, Jan 28, 2017 at 1:00 AM, Josh Elser wrote:
> (-cc dev)
>
> Might you be able to be more specific in the context of your question?
>
> What kind of requirements do you have?
>
> Chetan Khatri wrote:
>> Hello Community,
>>
>> I am working with HBase 1.2.4, what would be the best approach to do
>> incremental load from HBase to Hive?
>>
>> Thanks.
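One common approach (an assumption on my part, not something stated in the thread) is to scan only cells written in the last window using HBase's Scan.setTimeRange(start, end), with window boundaries aligned to a fixed grid so weekly runs neither overlap (duplicates in Hive) nor leave gaps. A minimal sketch of just the window arithmetic, in plain Java with no HBase dependency:

```java
import java.time.Duration;
import java.time.Instant;

// Computes the half-open [start, end) time range for one incremental run.
// Passing these values to HBase's Scan.setTimeRange(start, end) limits the
// scan to cells written in that window, so a weekly job re-reads nothing
// and skips nothing, avoiding duplicates at the Hive level.
public class IncrementalWindow {
    static final long WEEK_MS = Duration.ofDays(7).toMillis();

    // Aligns 'now' down to a week boundary; the window is the previous full week.
    static long[] lastFullWeek(long nowMs) {
        long end = (nowMs / WEEK_MS) * WEEK_MS; // start of the current week
        long start = end - WEEK_MS;             // start of the previous week
        return new long[] { start, end };
    }

    public static void main(String[] args) {
        long[] w = lastFullWeek(Instant.now().toEpochMilli());
        System.out.println("scan.setTimeRange(" + w[0] + ", " + w[1] + ")");
    }
}
```

Because the boundaries are derived from the clock grid rather than "last run time", re-running a failed job produces the same window, which keeps the load idempotent.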
[jira] [Created] (HBASE-17565) StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum
Ted Yu created HBASE-17565:

Summary: StochasticLoadBalancer may incorrectly skip balancing due to skewed multiplier sum
Key: HBASE-17565
URL: https://issues.apache.org/jira/browse/HBASE-17565
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu

I was investigating why a 6 node cluster kept skipping balancing requests. Here were the region counts on the servers: 449, 448, 447, 449, 453, 0

{code}
2017-01-26 22:04:47,145 INFO [RpcServer.deafult.FPBQ.Fifo.handler=1,queue=0,port=16000] balancer.StochasticLoadBalancer: Skipping load balancing because balanced cluster; total cost is 127.0171157050385, sum multiplier is 111087.0 min cost which need balance is 0.05
{code}

The big multiplier sum caught my eye. Here is what additional debug logging showed:

{code}
2017-01-27 23:25:31,749 DEBUG [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] balancer.StochasticLoadBalancer: class org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer$RegionReplicaHostCostFunction with multiplier 10.0
2017-01-27 23:25:31,749 DEBUG [RpcServer.deafult.FPBQ.Fifo.handler=9,queue=0,port=16000] balancer.StochasticLoadBalancer: class org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer$RegionReplicaRackCostFunction with multiplier 1.0
{code}

Note, however, that no table in the cluster used read replicas. I can think of two ways of fixing this situation:
1. If there is no read replica in the cluster, ignore the multipliers for the above two functions.
2. When cost() returned by the CostFunction is 0 (or very close to 0.0), ignore the multiplier.
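Fix option 2 can be sketched as follows. The names here are illustrative, not the actual StochasticLoadBalancer API: when a cost function returns (near) zero, its multiplier is excluded from the normalizing sum, so unused functions (such as the region replica ones on a cluster without read replicas) no longer dilute the overall cost below the balance threshold.

```java
// Sketch of option 2 from the report: skip a cost function's multiplier
// when its cost is ~0, so it cannot inflate the normalizing "sum multiplier".
// Names are hypothetical, not the real StochasticLoadBalancer internals.
public class WeightedCost {
    static final double EPSILON = 1e-9;

    // costs[i] is the normalized cost in [0,1] of function i;
    // multipliers[i] is the configured weight for function i.
    static double weightedCost(double[] costs, double[] multipliers) {
        double total = 0.0, sumMultiplier = 0.0;
        for (int i = 0; i < costs.length; i++) {
            if (costs[i] < EPSILON) {
                continue; // function contributes nothing; don't count its weight
            }
            total += costs[i] * multipliers[i];
            sumMultiplier += multipliers[i];
        }
        return sumMultiplier == 0.0 ? 0.0 : total / sumMultiplier;
    }

    public static void main(String[] args) {
        // One genuinely skewed function (cost 0.9, weight 500) plus an unused
        // replica function (cost 0.0, huge weight 100000). Dividing by the full
        // weight sum would hide the skew; skipping zero-cost weights does not.
        double naive = (0.9 * 500) / (500 + 100000);
        double fixed = weightedCost(new double[]{0.9, 0.0}, new double[]{500, 100000});
        System.out.println(naive + " vs " + fixed);
    }
}
```

With the naive formula the imbalance is reported as roughly 0.004, far below the 0.05 threshold in the log above, which matches the observed "Skipping load balancing" behavior.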
[jira] [Created] (HBASE-17564) Fix remaining calls to deprecated methods of Admin and HBaseAdmin
Jan Hentschel created HBASE-17564:

Summary: Fix remaining calls to deprecated methods of Admin and HBaseAdmin
Key: HBASE-17564
URL: https://issues.apache.org/jira/browse/HBASE-17564
Project: HBase
Issue Type: Improvement
Reporter: Jan Hentschel
Assignee: Jan Hentschel
Priority: Trivial

Fix the remaining calls to deprecated methods of the *Admin* interface and the *HBaseAdmin* class.
[jira] [Created] (HBASE-17563) Foreach and switch in RootDocProcessor and StabilityOptions
Jan Hentschel created HBASE-17563:

Summary: Foreach and switch in RootDocProcessor and StabilityOptions
Key: HBASE-17563
URL: https://issues.apache.org/jira/browse/HBASE-17563
Project: HBase
Issue Type: Improvement
Reporter: Jan Hentschel
Assignee: Jan Hentschel
Priority: Trivial

To make the code in *RootDocProcessor* and *StabilityOptions* more readable, change the existing for-loops and if-statements to foreach-loops and switch-statements.
Re: [DISCUSS] Re: Replication resiliency
There is an old JIRA somewhere to use Error Prone (https://github.com/google/error-prone) as framework for implementing static code analysis checks like that. FWIW > On Jan 27, 2017, at 1:03 PM, Sean Busbey wrote: > > Josh, probably worth checking if a grep or something else we can do in > precommit could catch this. > > > >> On Fri, Jan 27, 2017 at 1:26 PM, Josh Elser wrote: >> Cool. >> >> Let me open an issue to scan the codebase to see if we can find any >> instances where we are starting threads which aren't using the UEH. >> >> >> Andrew Purtell wrote: >>> >>> Agreed, let's abort with an abundance of caution. That is the _least_ that >>> should be done when a thread dies unexpectedly. Maybe we can improve >>> resiliency for specific cases later. >>> >>> >>> On Jan 26, 2017, at 5:53 PM, Enis Söztutar wrote: >>> > Do we have worker threads that we can't safely continue without indefinitely? Can we solve the general problem of "unhandled exception in threads cause a RS Abort"? We have this already in the code base. We are injecting an UncaughtExceptionhandler (which is calling Abortable.abort) to almost all of the HRegionServer service threads (see HRS.startServiceThreads). But I've also seen cases where some important thread got unmarked. I think it is good idea to revisit that and make sure that all the threads are injected with the UEH. The replication source threads are started on demand, that is why the UEH is not injected I think. But agreed that we should do the safe route here, and abort the regionserver. Enis > On Thu, Jan 26, 2017 at 2:19 PM, Josh Elser wrote: > > +1 If any "worker" thread can't safely/reasonably retry some unexpected > exception without a reasonable expectation of self-healing, tank the RS. > > Having those threads die but not the RS could go uncaught for indefinite > period of time. > > > Sean Busbey wrote: > >> I've noticed a few other places where we can lose a worker thread and >> the RegionServer happily continues. 
One notable example is the worker >> threads that handle syncs for the WAL. I'm generally a >> fail-fast-and-loud advocate, so I like aborting when things look >> wonky. I've also had to deal with a lot of pain around silent and thus >> hard to see replication failures, so strong signals that the >> replication system is in a bad way sound good to me atm. >> >> Do we have worker threads that we can't safely continue without >> indefinitely? Can we solve the general problem of "unhandled exception >> in threads cause a RS Abort"? >> >> As mentioned on the jira, I do worry a bit about cluster stability and >> cascading failures, given the ability to have user-provided endpoints >> in the replication process. Ultimately, I don't see that as different >> than all the other places coprocessors can put the cluster at risk. >> >>> On Thu, Jan 26, 2017 at 2:48 PM, Sean Busbey >>> wrote: >>> >>> (edited subject to ensure folks filtering for DISCUSS see this) >>> >>> >>> >>> On Thu, Jan 26, 2017 at 1:58 PM, Gary Helmling >>> wrote: >>> Over in HBASE-17381 there has been some discussion around whether an unhandled exception in a ReplicationSourceWorkerThread should trigger a regionserver abort. The current behavior in the case of an unexpected exception in ReplicationSourceWorkerThread.run() is to log a message and simply let the thread die, allowing replication for this source to back up. I've seen this happen in an OOME scenario, which seems like a clear case where we would be better off aborting the regionserver. However, in the case of any other unexpected exceptions out of the run() method, how do we want to handle this? 
I'm of the general opinion that we would be better off aborting on all unexpected exceptions, as it means that: a) we have some missing error handling b) failing fast raises visibility and makes it easier to add any error handling that should be there c) silently stopping up replication creates problems that are difficult for our users to identify operationally and hard to troubleshoot. However, the current behavior has been there for quite a while, and maybe there are other situations or concerns I'm not seeing which would justify having regionserver stability over replication stability. What are folks' thoughts on this? Should the regionserver abort on all unexpected exceptions in the run method or should we more narrowly focus this on OOME's?
Re: Reminder: Please use git am when committing patches
I'm a strong +1 for getting as many of our committers as possible over to using git am or some other way of making sure the contributor shows up as the author. I agree with the previous statements about incentivizing participation by making sure contributors get credit in places they'll commonly look (e.g. github and other open source contribution aggregators).

Additionally, as a PMC member it is much easier to build an idea of contributions to the repo over time if I can scan through git author and sign-off tags than if I have to pick out bespoke commit message details.

On Thu, Jan 26, 2017 at 8:45 PM, Apekshit Sharma wrote:
> Yes, many people who contribute to open source like their work to show up in their git profile. I do for one.
> Past thread on this discussion:
> http://search-hadoop.com/m/HBase/YGbb14CnYDA6GX1/threaded
>
> On Thu, Jan 26, 2017 at 6:01 PM, 张铎(Duo Zhang) wrote:
>> I think the point here is that github will not count the name in the commit message as a contributor for the project?
>>
>> Enis Söztutar wrote on Fri, Jan 27, 2017 at 09:41:
>>> Yep, we have indeed been using the msg (author name) historically. In cases where author info is there, the guideline is to use git am --signoff.
>>>
>>> I don't think we should require git commit --author, as long as there is attribution in the commit msg. But we can do a --author as an option.
>>>
>>> Enis
>>>
>>> On Thu, Jan 26, 2017 at 4:29 PM, 张铎(Duo Zhang) wrote:
>>>> See here, our committer guide says that the commit message format should be HBASE-XXX XXX (the actual author).
>>>>
>>>> http://hbase.apache.org/book.html#_commit_message_format
>>>>
>>>> I think the rule was set up when we were still on svn, but we did not change it.
>>>>
>>>> I'm a big +1 on always using 'git am' if possible, and setting the author explicitly when using 'git commit'. One more thing: do not forget to add --signoff :)
>>>>
>>>> Mind opening an issue to modify the above section in the hbase book, Appy?
>>>>
>>>> Thanks.
>>>>
>>>> 2017-01-27 4:12 GMT+08:00 Chetan Khatri:
>>>>> Thanks, Appy. For understanding, because I am very new to open source contribution.
>>>>>
>>>>> On Fri, Jan 27, 2017 at 1:25 AM, Apekshit Sharma wrote:
>>>>>> Hi devs,
>>>>>>
>>>>>> A recent question by a new contributor (thread: Git Pull Request to HBase) and a look at the history of commits make me think it is worth re-iterating that committers should use *git am* while committing patches so that whoever did the actual work gets appropriate credit in git history.
>>>>>> In case the patch uploaded is an unformatted patch (without author tag), please use
>>>>>> *git commit --author=.*
>>>>>> I don't think we should be lax about it as a community if the original author doesn't get appropriate credit.
>>>>>>
>>>>>> Thanks
>>>>>> -- Appy

--
-- Appy
Re: [DISCUSS] Re: Replication resiliency
Josh, probably worth checking if a grep or something else we can do in precommit could catch this. On Fri, Jan 27, 2017 at 1:26 PM, Josh Elser wrote: > Cool. > > Let me open an issue to scan the codebase to see if we can find any > instances where we are starting threads which aren't using the UEH. > > > Andrew Purtell wrote: >> >> Agreed, let's abort with an abundance of caution. That is the _least_ that >> should be done when a thread dies unexpectedly. Maybe we can improve >> resiliency for specific cases later. >> >> >> On Jan 26, 2017, at 5:53 PM, Enis Söztutar wrote: >> Do we have worker threads that we can't safely continue without >>> >>> indefinitely? Can we solve the general problem of "unhandled exception >>> in threads cause a RS Abort"? >>> We have this already in the code base. We are injecting an >>> UncaughtExceptionhandler (which is calling Abortable.abort) to almost all >>> of the HRegionServer service threads (see HRS.startServiceThreads). But >>> I've also seen cases where some important thread got unmarked. I think it >>> is good idea to revisit that and make sure that all the threads are >>> injected with the UEH. >>> >>> The replication source threads are started on demand, that is why the UEH >>> is not injected I think. But agreed that we should do the safe route >>> here, >>> and abort the regionserver. >>> >>> Enis >>> On Thu, Jan 26, 2017 at 2:19 PM, Josh Elser wrote: +1 If any "worker" thread can't safely/reasonably retry some unexpected exception without a reasonable expectation of self-healing, tank the RS. Having those threads die but not the RS could go uncaught for indefinite period of time. Sean Busbey wrote: > I've noticed a few other places where we can lose a worker thread and > the RegionServer happily continues. One notable example is the worker > threads that handle syncs for the WAL. I'm generally a > fail-fast-and-loud advocate, so I like aborting when things look > wonky. 
I've also had to deal with a lot of pain around silent and thus > hard to see replication failures, so strong signals that the > replication system is in a bad way sound good to me atm. > > Do we have worker threads that we can't safely continue without > indefinitely? Can we solve the general problem of "unhandled exception > in threads cause a RS Abort"? > > As mentioned on the jira, I do worry a bit about cluster stability and > cascading failures, given the ability to have user-provided endpoints > in the replication process. Ultimately, I don't see that as different > than all the other places coprocessors can put the cluster at risk. > >> On Thu, Jan 26, 2017 at 2:48 PM, Sean Busbey >> wrote: >> >> (edited subject to ensure folks filtering for DISCUSS see this) >> >> >> >> On Thu, Jan 26, 2017 at 1:58 PM, Gary Helmling >> wrote: >> >>> Over in HBASE-17381 there has been some discussion around whether an >>> unhandled exception in a ReplicationSourceWorkerThread should trigger >>> a >>> regionserver abort. >>> >>> The current behavior in the case of an unexpected exception in >>> ReplicationSourceWorkerThread.run() is to log a message and simply >>> let >>> the >>> thread die, allowing replication for this source to back up. >>> >>> I've seen this happen in an OOME scenario, which seems like a clear >>> case >>> where we would be better off aborting the regionserver. >>> >>> However, in the case of any other unexpected exceptions out of the >>> run() >>> method, how do we want to handle this? >>> >>> I'm of the general opinion that where we would be better off aborting >>> on >>> all unexpected exceptions, as it means that: >>> a) we have some missing error handling >>> b) failing fast raises visibility and makes it easier to add any >>> error >>> handling that should be there >>> c) silently stopping up replication creates problems that are >>> difficult >>> for >>> our users to identify operationally and hard to troubleshoot. 
>>> >>> However, the current behavior has been there for quite a while, and >>> maybe >>> there are other situations or concerns I'm not seeing which would >>> justify >>> having regionserver stability over replication stability. >>> >>> What are folks thoughts on this? Should the regionserver abort on >>> all >>> unexpected exceptions in the run method or should we more narrowly >>> focus >>> this on OOME's? >>> >
[jira] [Resolved] (HBASE-14477) Compaction improvements: Date tiered compaction policy
[ https://issues.apache.org/jira/browse/HBASE-14477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Rodionov resolved HBASE-14477.
    Resolution: Duplicate

Duplicate of HBASE-15181

> Compaction improvements: Date tiered compaction policy
>
> Key: HBASE-14477
> URL: https://issues.apache.org/jira/browse/HBASE-14477
> Project: HBase
> Issue Type: New Feature
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> For immutable and mostly immutable data the current SizeTiered-based
> compaction policy is not efficient.
> # There is no need to compact all files into one, because data is (mostly)
> immutable and we do not need to collect garbage (the performance reason will
> be discussed later).
> # Size-tiered compaction is not suitable for applications where the most
> recent data is most important, and it prevents efficient caching of this data.
> The idea is pretty similar to DateTieredCompaction in Cassandra:
> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
> http://www.datastax.com/dev/blog/dtcs-notes-from-the-field
> From Cassandra's own blog:
> {quote}
> Since DTCS can be used with any table, it is important to know when it is a
> good idea, and when it is not. I'll try to explain the spectrum and
> trade-offs here:
> 1. Perfect Fit: Time Series Fact Data, Deletes by Default TTL: When you
> ingest fact data that is ordered in time, with no deletes or overwrites. This
> is the standard "time series" use case.
> 2. OK Fit: Time-Ordered, with limited updates across whole data set, or only
> updates to recent data: When you ingest data that is (mostly) ordered in
> time, but revise or delete a very small proportion of the overall data across
> the whole timeline.
> 3. Not a Good Fit: many partial row updates or deletions over time: When you
> need to partially revise or delete fields for rows that you read together.
> Also, when you revise or delete rows within clustered reads.
> {quote}
Re: [DISCUSS] Re: Replication resiliency
Thanks for the feedback, all. Makes sense to me. I'll follow up in the issue to use the same UEH to abort or at least same abort handling as other cases (the current UEH used for replication source worker threads only logs). On Fri, Jan 27, 2017 at 11:27 AM Josh Elser wrote: Cool. Let me open an issue to scan the codebase to see if we can find any instances where we are starting threads which aren't using the UEH. Andrew Purtell wrote: > Agreed, let's abort with an abundance of caution. That is the _least_ that should be done when a thread dies unexpectedly. Maybe we can improve resiliency for specific cases later. > > > On Jan 26, 2017, at 5:53 PM, Enis Söztutar wrote: > >>> Do we have worker threads that we can't safely continue without >> indefinitely? Can we solve the general problem of "unhandled exception >> in threads cause a RS Abort"? >> We have this already in the code base. We are injecting an >> UncaughtExceptionhandler (which is calling Abortable.abort) to almost all >> of the HRegionServer service threads (see HRS.startServiceThreads). But >> I've also seen cases where some important thread got unmarked. I think it >> is good idea to revisit that and make sure that all the threads are >> injected with the UEH. >> >> The replication source threads are started on demand, that is why the UEH >> is not injected I think. But agreed that we should do the safe route here, >> and abort the regionserver. >> >> Enis >> >>> On Thu, Jan 26, 2017 at 2:19 PM, Josh Elser wrote: >>> >>> +1 If any "worker" thread can't safely/reasonably retry some unexpected >>> exception without a reasonable expectation of self-healing, tank the RS. >>> >>> Having those threads die but not the RS could go uncaught for indefinite >>> period of time. >>> >>> >>> Sean Busbey wrote: >>> I've noticed a few other places where we can lose a worker thread and the RegionServer happily continues. One notable example is the worker threads that handle syncs for the WAL. 
I'm generally a fail-fast-and-loud advocate, so I like aborting when things look wonky. I've also had to deal with a lot of pain around silent and thus hard to see replication failures, so strong signals that the replication system is in a bad way sound good to me atm. Do we have worker threads that we can't safely continue without indefinitely? Can we solve the general problem of "unhandled exception in threads cause a RS Abort"? As mentioned on the jira, I do worry a bit about cluster stability and cascading failures, given the ability to have user-provided endpoints in the replication process. Ultimately, I don't see that as different than all the other places coprocessors can put the cluster at risk. > On Thu, Jan 26, 2017 at 2:48 PM, Sean Busbey wrote: > > (edited subject to ensure folks filtering for DISCUSS see this) > > > > On Thu, Jan 26, 2017 at 1:58 PM, Gary Helmling > wrote: > >> Over in HBASE-17381 there has been some discussion around whether an >> unhandled exception in a ReplicationSourceWorkerThread should trigger a >> regionserver abort. >> >> The current behavior in the case of an unexpected exception in >> ReplicationSourceWorkerThread.run() is to log a message and simply let >> the >> thread die, allowing replication for this source to back up. >> >> I've seen this happen in an OOME scenario, which seems like a clear case >> where we would be better off aborting the regionserver. >> >> However, in the case of any other unexpected exceptions out of the run() >> method, how do we want to handle this? >> >> I'm of the general opinion that where we would be better off aborting on >> all unexpected exceptions, as it means that: >> a) we have some missing error handling >> b) failing fast raises visibility and makes it easier to add any error >> handling that should be there >> c) silently stopping up replication creates problems that are difficult >> for >> our users to identify operationally and hard to troubleshoot. 
>> >> However, the current behavior has been there for quite a while, and >> maybe >> there are other situations or concerns I'm not seeing which would >> justify >> having regionserver stability over replication stability. >> >> What are folks thoughts on this? Should the regionserver abort on all >> unexpected exceptions in the run method or should we more narrowly focus >> this on OOME's? >>
[jira] [Created] (HBASE-17562) Remove documentation for coprocessor execution times after HBASE-14205
Enis Soztutar created HBASE-17562:

Summary: Remove documentation for coprocessor execution times after HBASE-14205
Key: HBASE-17562
URL: https://issues.apache.org/jira/browse/HBASE-17562
Project: HBase
Issue Type: Sub-task
Reporter: Enis Soztutar

Thanks, [~manniche], for reporting. Opened up a subtask. Feel free to pick it up if you want to patch it yourself. Otherwise, I can do a quick patch.
[jira] [Created] (HBASE-17561) table status page should escape table name
Sean Busbey created HBASE-17561:

Summary: table status page should escape table name
Key: HBASE-17561
URL: https://issues.apache.org/jira/browse/HBASE-17561
Project: HBase
Issue Type: Bug
Components: master, UI
Reporter: Sean Busbey
Assignee: Sean Busbey

We write out table names to an html document without escaping html entities, e.g. in this case it even comes directly from the request:

{code}
<% if ( !readOnly && action != null ) { %>
HBase Master: <%= master.getServerName() %>
<% } else { %>
Table: <%= fqtn %>
<% } %>
{code}

in https://github.com/apache/hbase/blob/master/hbase-server/src/main/resources/hbase-webapps/master/table.jsp
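A minimal fix is to escape HTML entities before interpolating the request-derived table name into the page. The actual patch would more likely reuse an existing utility (such as Apache Commons' StringEscapeUtils); the helper below is only an illustrative sketch of what the escaping must cover:

```java
// Illustrative escaping helper: replaces the characters that enable HTML/JS
// injection before a request-derived value (like a table name) is written
// into a JSP. A real fix would likely reuse an existing escaping utility.
public class HtmlEscape {
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A malicious "table name" coming straight from the request.
        System.out.println(escapeHtml("<script>alert(1)</script>"));
        // → &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```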
[jira] [Created] (HBASE-17560) HMaster redirect should sanity check user input
Sean Busbey created HBASE-17560:

Summary: HMaster redirect should sanity check user input
Key: HBASE-17560
URL: https://issues.apache.org/jira/browse/HBASE-17560
Project: HBase
Issue Type: Bug
Components: master, security, UI
Reporter: Sean Busbey

We should do some sanity checking on the user-provided data before we blindly pass it to a redirect, i.e.:

{code}
public static class RedirectServlet extends HttpServlet {
  private static final long serialVersionUID = 2894774810058302472L;
  private static int regionServerInfoPort;

  @Override
  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    String redirectUrl = request.getScheme() + "://"
        + request.getServerName() + ":" + regionServerInfoPort
        + request.getRequestURI();
    response.sendRedirect(redirectUrl);
  }
}
{code}

e.g.
* Are we redirecting to a server that is ours?
* Did we validate the path/query string?
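The kind of sanity check being suggested might look like the following sketch (a hypothetical helper, not the actual patch): confirm the request names a host we actually serve, and that the URI is a plain absolute path, before building the redirect URL.

```java
import java.util.Set;

// Illustrative sanity checks for the master-to-regionserver info-port redirect:
// only redirect when the requested host is one of ours and the URI is a plain
// absolute path (no scheme-relative "//", no path traversal).
public class RedirectCheck {
    static boolean isSafeRedirect(String serverName, String uri, Set<String> ourHosts) {
        if (!ourHosts.contains(serverName)) {
            return false;               // would send the client somewhere we don't control
        }
        return uri.startsWith("/")      // absolute path only
            && !uri.startsWith("//")    // "//evil.com" is scheme-relative, not a path
            && !uri.contains("..");     // no backtracking out of the web root
    }

    public static void main(String[] args) {
        Set<String> hosts = Set.of("rs1.example.com");
        System.out.println(isSafeRedirect("rs1.example.com", "/master-status", hosts)); // true
        System.out.println(isSafeRedirect("evil.com", "/master-status", hosts));        // false
        System.out.println(isSafeRedirect("rs1.example.com", "//evil.com", hosts));     // false
    }
}
```

In the servlet above, the check would run on request.getServerName() and request.getRequestURI() before response.sendRedirect(), returning a 400 when it fails.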
[jira] [Created] (HBASE-17559) Verify service threads are using the UncaughtExceptionHandler
Josh Elser created HBASE-17559:

Summary: Verify service threads are using the UncaughtExceptionHandler
Key: HBASE-17559
URL: https://issues.apache.org/jira/browse/HBASE-17559
Project: HBase
Issue Type: Task
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 2.0.0, 1.4.0

(Context: https://lists.apache.org/thread.html/47c9a0f7193eaf0546ce241cfe093885366f5177ed867e18b45d77b9@%3Cdev.hbase.apache.org%3E)

We should take a once-over on the threads we start in the RegionServer/Master to make sure that they're using the UncaughtExceptionHandler to prevent the case where the process keeps running but one of our threads has died. Such a situation may result in operators not realizing that their HBase instance is not actually running as expected.
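The pattern under discussion (install an UncaughtExceptionHandler that escalates instead of letting the thread die silently) can be sketched standalone as below; Abortable here is a stand-in for HBase's org.apache.hadoop.hbase.Abortable, and the thread name is illustrative:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the pattern from the thread: every service thread gets an
// UncaughtExceptionHandler that calls abort() rather than dying silently.
// "Abortable" is a stand-in for HBase's org.apache.hadoop.hbase.Abortable.
public class UehDemo {
    interface Abortable { void abort(String why, Throwable t); }

    static Thread serviceThread(String name, Runnable task, Abortable server) {
        Thread t = new Thread(task, name);
        t.setDaemon(true);
        // Without this handler, an unexpected RuntimeException would just kill
        // the thread while the process keeps running in a degraded state.
        t.setUncaughtExceptionHandler((thread, e) ->
            server.abort("Uncaught exception in " + thread.getName(), e));
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean aborted = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(1);
        Abortable server = (why, t) -> { aborted.set(true); done.countDown(); };

        serviceThread("replication-source-0",
            () -> { throw new RuntimeException("boom"); }, server).start();

        done.await();
        System.out.println("aborted=" + aborted.get()); // aborted=true
    }
}
```

The audit proposed in this issue amounts to checking that every thread the RegionServer/Master starts goes through a factory like serviceThread() rather than a bare new Thread(...).start().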
Re: [DISCUSS] Re: Replication resiliency
Cool.

Let me open an issue to scan the codebase to see if we can find any instances where we are starting threads which aren't using the UEH.

Andrew Purtell wrote:
> Agreed, let's abort with an abundance of caution. That is the _least_ that should be done when a thread dies unexpectedly. Maybe we can improve resiliency for specific cases later.
>
> On Jan 26, 2017, at 5:53 PM, Enis Söztutar wrote:
>>> Do we have worker threads that we can't safely continue without indefinitely? Can we solve the general problem of "unhandled exception in threads cause a RS Abort"?
>> We have this already in the code base. We are injecting an UncaughtExceptionHandler (which is calling Abortable.abort) to almost all of the HRegionServer service threads (see HRS.startServiceThreads). But I've also seen cases where some important thread got unmarked. I think it is a good idea to revisit that and make sure that all the threads are injected with the UEH.
>>
>> The replication source threads are started on demand; that is why the UEH is not injected, I think. But agreed that we should take the safe route here, and abort the regionserver.
>>
>> Enis
>>
>> On Thu, Jan 26, 2017 at 2:19 PM, Josh Elser wrote:
>>> +1 If any "worker" thread can't safely/reasonably retry some unexpected exception without a reasonable expectation of self-healing, tank the RS.
>>>
>>> Having those threads die but not the RS could go uncaught for an indefinite period of time.
>>>
>>> Sean Busbey wrote:
>>>> I've noticed a few other places where we can lose a worker thread and the RegionServer happily continues. One notable example is the worker threads that handle syncs for the WAL. I'm generally a fail-fast-and-loud advocate, so I like aborting when things look wonky. I've also had to deal with a lot of pain around silent and thus hard to see replication failures, so strong signals that the replication system is in a bad way sound good to me atm.
>>>>
>>>> Do we have worker threads that we can't safely continue without indefinitely? Can we solve the general problem of "unhandled exception in threads cause a RS Abort"?
>>>>
>>>> As mentioned on the jira, I do worry a bit about cluster stability and cascading failures, given the ability to have user-provided endpoints in the replication process. Ultimately, I don't see that as different than all the other places coprocessors can put the cluster at risk.
>>>>
>>>> On Thu, Jan 26, 2017 at 2:48 PM, Sean Busbey wrote:
>>>>> (edited subject to ensure folks filtering for DISCUSS see this)
>>>>>
>>>>> On Thu, Jan 26, 2017 at 1:58 PM, Gary Helmling wrote:
>>>>>> Over in HBASE-17381 there has been some discussion around whether an unhandled exception in a ReplicationSourceWorkerThread should trigger a regionserver abort.
>>>>>>
>>>>>> The current behavior in the case of an unexpected exception in ReplicationSourceWorkerThread.run() is to log a message and simply let the thread die, allowing replication for this source to back up.
>>>>>>
>>>>>> I've seen this happen in an OOME scenario, which seems like a clear case where we would be better off aborting the regionserver.
>>>>>>
>>>>>> However, in the case of any other unexpected exceptions out of the run() method, how do we want to handle this?
>>>>>>
>>>>>> I'm of the general opinion that we would be better off aborting on all unexpected exceptions, as it means that:
>>>>>> a) we have some missing error handling
>>>>>> b) failing fast raises visibility and makes it easier to add any error handling that should be there
>>>>>> c) silently stopping up replication creates problems that are difficult for our users to identify operationally and hard to troubleshoot.
>>>>>>
>>>>>> However, the current behavior has been there for quite a while, and maybe there are other situations or concerns I'm not seeing which would justify having regionserver stability over replication stability.
>>>>>>
>>>>>> What are folks' thoughts on this? Should the regionserver abort on all unexpected exceptions in the run method or should we more narrowly focus this on OOME's?
[jira] [Created] (HBASE-17558) ZK dumping jsp should escape html
Sean Busbey created HBASE-17558: --- Summary: ZK dumping jsp should escape html Key: HBASE-17558 URL: https://issues.apache.org/jira/browse/HBASE-17558 Project: HBase Issue Type: Bug Components: security, UI Reporter: Sean Busbey Priority: Minor Right now the ZK status page in the master dumps data from ZK using ZKUtil without doing any processing to e.g. escape HTML entities, i.e.: {code} ZooKeeper Dump <%= ZKUtil.dump(watcher).trim() %> {code} current URL: https://github.com/apache/hbase/blob/master/hbase-server/src/main/resources/hbase-webapps/master/zk.jsp#L83 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
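The fix amounts to escaping the dump before it reaches the page. An actual patch would more likely reuse an existing helper (e.g. commons-lang's StringEscapeUtils) rather than hand-rolling the escaping, but as a stdlib-only sketch of what "escape HTML entities" means for the ZK dump output:

```java
public class EscapeHtmlSketch {
    // Minimal HTML entity escaping covering the characters that matter for
    // injection; a real fix would likely call an existing escaping utility.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A hostile znode name ending up in ZKUtil.dump output should render
        // as inert text instead of executing in the master UI.
        String dump = "/hbase/rs/<script>alert(1)</script>";
        System.out.println(escapeHtml(dump));
    }
}
```

In the JSP itself, the expression would then wrap the dump, e.g. `<%= escapeHtml(ZKUtil.dump(watcher).trim()) %>` (hypothetical helper name).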
Incremental import from HBase to Hive
Hello Community, I am working with HBase 1.2.4. What would be the best approach to do an incremental load from HBase to Hive? Thanks.
[jira] [Created] (HBASE-17557) HRegionServer#reportRegionSizesForQuotas() should respond to UnsupportedOperationException
Ted Yu created HBASE-17557: -- Summary: HRegionServer#reportRegionSizesForQuotas() should respond to UnsupportedOperationException Key: HBASE-17557 URL: https://issues.apache.org/jira/browse/HBASE-17557 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Assignee: Ted Yu When master doesn't support quota, you would see the following repeatedly in the region server log:
{code}
2017-01-27 17:24:27,389 DEBUG [cn011.x.com,16020,1485468203653_ChoreService_1] regionserver.HRegionServer: Failed to report region sizes to Master. This will be retried.
org.apache.hadoop.hbase.DoNotRetryIOException: /23:16000 is unable to read call parameter from client 21; java.lang.UnsupportedOperationException: ReportRegionSpaceUse
	at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:334)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.reportRegionSizesForQuotas(HRegionServer.java:1211)
	at org.apache.hadoop.hbase.quotas.FileSystemUtilizationChore.reportRegionSizesToMaster(FileSystemUtilizationChore.java:170)
	at org.apache.hadoop.hbase.quotas.FileSystemUtilizationChore.chore(FileSystemUtilizationChore.java:129)
	at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): /23:16000 is unable to read call parameter from client 21; java.lang.UnsupportedOperationException: ReportRegionSpaceUse
	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1225)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRegionSpaceUse(RegionServerStatusProtos.java:10919)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.reportRegionSizesForQuotas(HRegionServer.java:1209)
{code}
HRegionServer.reportRegionSizesForQuotas() should respond to UnsupportedOperationException and stop retrying. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
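Note how the server-side UnsupportedOperationException only surfaces to the region server inside the message of a DoNotRetryIOException, so the chore has to inspect the exception it catches rather than catch UnsupportedOperationException directly. A stand-alone sketch of one plausible check (the helper name `shouldStopReporting` is hypothetical, not the patch's actual method):

```java
public class QuotaReportSketch {
    // Hypothetical helper: decide whether the report chore should give up.
    // Checks both the cause chain and the message text, since the remote
    // UnsupportedOperationException arrives flattened into the message of a
    // DoNotRetryIOException rather than as a real cause object.
    static boolean shouldStopReporting(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof UnsupportedOperationException
                || String.valueOf(cur.getMessage())
                       .contains("UnsupportedOperationException")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Shaped like the log above: the remote exception text names the
        // UnsupportedOperationException.
        Throwable remote = new RuntimeException(
            "/23:16000 is unable to read call parameter from client 21; "
            + "java.lang.UnsupportedOperationException: ReportRegionSpaceUse");
        System.out.println(shouldStopReporting(remote));
        // A transient failure should still be retried.
        System.out.println(shouldStopReporting(new RuntimeException("timeout")));
    }
}
```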
Successful: HBase Generate Website
Build status: Successful

If successful, the website and docs have been generated. To update the live site, follow the instructions below. If failed, skip to the bottom of this email.

Use the following commands to download the patch and apply it to a clean branch based on origin/asf-site. If you prefer to keep the hbase-site repo around permanently, you can skip the clone step.

git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git
cd hbase-site
wget -O- https://builds.apache.org/job/hbase_generate_website/472/artifact/website.patch.zip | funzip > 92fc4c0cc8efddae74662ac26c9b821402dd6394.patch
git fetch
git checkout -b asf-site-92fc4c0cc8efddae74662ac26c9b821402dd6394 origin/asf-site
git am --whitespace=fix 92fc4c0cc8efddae74662ac26c9b821402dd6394.patch

At this point, you can preview the changes by opening index.html or any of the other HTML pages in your local asf-site-92fc4c0cc8efddae74662ac26c9b821402dd6394 branch.

There are lots of spurious changes, such as timestamps and CSS styles in tables, so a generic git diff is not very useful. To see a list of files that have been added, deleted, renamed, changed type, or are otherwise interesting, use the following command:

git diff --name-status --diff-filter=ADCRTXUB origin/asf-site

To see only files that had 100 or more lines changed:

git diff --stat origin/asf-site | grep -E '[1-9][0-9]{2,}'

When you are satisfied, publish your changes to origin/asf-site using these commands:

git commit --allow-empty -m "Empty commit" # to work around a current ASF INFRA bug
git push origin asf-site-92fc4c0cc8efddae74662ac26c9b821402dd6394:asf-site
git checkout asf-site
git branch -D asf-site-92fc4c0cc8efddae74662ac26c9b821402dd6394

Changes take a couple of minutes to be propagated. You can verify whether they have been propagated by looking at the Last Published date at the bottom of http://hbase.apache.org/. It should match the date in the index.html on the asf-site branch in Git.

As a courtesy, reply-all to this email to let other committers know you pushed the site.

If failed, see https://builds.apache.org/job/hbase_generate_website/472/console
[jira] [Reopened] (HBASE-17526) Procedure v2 - cleanup isSuspended from MasterProcedureScheduler#Queue
[ https://issues.apache.org/jira/browse/HBASE-17526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey reopened HBASE-17526: - Reopening. This commit introduced a findbugs warning that is now pinging every patch submission that touches hbase-server. Please include an addendum that cleans it up. One example: https://builds.apache.org/job/PreCommit-HBASE-Build/5449/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html Please check on precommit results before pushing. > Procedure v2 - cleanup isSuspended from MasterProcedureScheduler#Queue > -- > > Key: HBASE-17526 > URL: https://issues.apache.org/jira/browse/HBASE-17526 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Appy >Assignee: Appy >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17526.master.001.patch > > > Queue#setSuspended() is not used anywhere, probably because when queue > wait/wakes on an event, it gets removed or added back to the fairq. > Removing state, functions, and uses of isSuspended() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17556) The client will not invalidate stale region caches
Marcin Januszkiewicz created HBASE-17556: Summary: The client will not invalidate stale region caches Key: HBASE-17556 URL: https://issues.apache.org/jira/browse/HBASE-17556 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.98.24, 1.0.0, 2.0.0 Reporter: Marcin Januszkiewicz Priority: Critical We noticed in our application that sometimes when we interact with a table an operation will fail with an exception, and all operations that happen on the same region will also fail until the application is restarted. It seems that when a merge or split happens on a region that is already in the client's cache, and the client is configured to retry operations, then there is no way for the client to detect this. In RpcRetryingCaller#callWithRetries, if a call fails with NotServingRegionException then the cache will be cleared only if the retry parameter is equal to 1. This means the call will fail but the following calls will succeed. RpcRetryingCaller#callWithoutRetries contains the comment "It would be nice to clear the location cache here". Additionally, the stale cache will cause this call to fail, even though the data is available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
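The desired behavior can be illustrated with a toy analogue of the client's region-location cache (all names here are hypothetical stand-ins, not HBase classes): on a "region moved" failure, drop the cached entry so the retry re-reads meta, instead of leaving the stale location in place forever.

```java
import java.util.HashMap;
import java.util.Map;

public class RegionCacheSketch {
    // Hypothetical stand-ins: row-key -> server cache, and a "meta" lookup.
    static final Map<String, String> cache = new HashMap<>();
    static final Map<String, String> meta = new HashMap<>();
    static final String LIVE_SERVER = "rs2"; // where the region landed after a split/merge

    static String call(String row) {
        String server = cache.computeIfAbsent(row, meta::get);
        if (!server.equals(LIVE_SERVER)) {
            // Analogue of catching NotServingRegionException: invalidate the
            // stale entry so the retry consults meta instead of failing on
            // every attempt.
            cache.remove(row);
            server = cache.computeIfAbsent(row, meta::get);
        }
        return server;
    }

    public static void main(String[] args) {
        meta.put("row1", "rs2");   // meta already reflects the move
        cache.put("row1", "rs1");  // client still holds the pre-move location
        System.out.println(call("row1"));
    }
}
```

Without the `cache.remove(row)` step, every call for that row keeps hitting the dead location, which matches the reporter's "all operations on the same region fail until restart" symptom.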
[jira] [Created] (HBASE-17555) Change calls to deprecated getHBaseAdmin to getAdmin
Jan Hentschel created HBASE-17555: - Summary: Change calls to deprecated getHBaseAdmin to getAdmin Key: HBASE-17555 URL: https://issues.apache.org/jira/browse/HBASE-17555 Project: HBase Issue Type: Improvement Reporter: Jan Hentschel Assignee: Jan Hentschel Priority: Minor *HBaseTestingUtil.getHBaseAdmin* is deprecated and was replaced with *getAdmin*. Change the calls to *getHBaseAdmin* to *getAdmin* where possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)