Re: [ANNOUNCE] PhoenixCon 2016 on Wed, May 25th 9am-1pm
Hi James,

I spoke to my manager and he is fine with the idea of giving the talk. Now he is going to ask higher management for final approval. I am assuming there is still a slot for my talk in the use case section, so I should go ahead with my approval process. Correct?

Thanks,
Anil Gupta

Sent from my iPhone

> On Apr 26, 2016, at 5:56 PM, James Taylor wrote:
>
> We invite you to attend the inaugural PhoenixCon on Wed, May 25th 9am-1pm
> (the day after HBaseCon) hosted by Salesforce.com in San Francisco. There
> will be two tracks: one for use cases and one for internals. Drop me a note
> if you're interested in giving a talk. To RSVP and for more details, see
> here[1].
>
> Thanks,
> James
>
> [1] http://www.meetup.com/SF-Bay-Area-Apache-Phoenix-Meetup/events/230545182
[ANNOUNCE] PhoenixCon 2016 on Wed, May 25th 9am-1pm
We invite you to attend the inaugural PhoenixCon on Wed, May 25th 9am-1pm (the day after HBaseCon) hosted by Salesforce.com in San Francisco. There will be two tracks: one for use cases and one for internals. Drop me a note if you're interested in giving a talk. To RSVP and for more details, see here[1]. Thanks, James [1] http://www.meetup.com/SF-Bay-Area-Apache-Phoenix-Meetup/events/230545182
Re: Slow sync cost
That is interesting. Would it be possible for you to share what GC settings you ended up on that gave you the most predictable performance?

Thanks.

Saad

On Tue, Apr 26, 2016 at 11:56 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
> We were seeing this for a while with our CDH5 HBase clusters too. We
> eventually correlated it very closely to GC pauses. Through heavily tuning
> our GC we were able to drastically reduce the logs, by keeping most GCs
> under 100ms.
Re: Retiring empty regions
I'm looking forward to your talk, Vlad. In the meantime, I filed HBASE-15712; we'll get our implementation posted up there. We have these deployed on one of the masters, running daily with cron.

@Mikhail, to get this feature into the normalizer, how about this: let's add a minimum-number-of-regions property to user tables. This can be set when someone creates a table with split points, or maintained manually. The normalizer can use that as a constraint to guide its convergence.

On Wed, Apr 20, 2016 at 5:18 PM, Vladimir Rodionov wrote:
> >I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
> >write up a post for the blog? Meanwhile, I'm sure a couple of us here on
> >the list would appreciate your Cliff's Notes version. I can take this
> >into account for my v2 schema design.
>
> Nick, there will be a presentation on time-series HBase (hbasecon.com).
> Come join us :)
>
> On Mon, Apr 4, 2016 at 8:34 AM, Nick Dimiduk wrote:
> >
> > > Crazy idea, but you might be able to take a stripped-down version of
> > > the region normalizer code and make a Tool to run? Requesting split or
> > > merge is done through the client API, and the only weighing information
> > > you need is whether a region is empty or not, which you could find out too?
> >
> > Yeah, that's the direction I'm headed.
> >
> > > A bit off topic, but I think unfortunately the region normalizer now
> > > ignores empty regions to avoid undoing a pre-split on the table.
> >
> > Unfortunate indeed. Maybe we should be keeping around the initial splits
> > list as a metadata attribute on the table?
> >
> > > With a right row-key design you will never have empty regions due to TTL.
> >
> > I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
> > write up a post for the blog? Meanwhile, I'm sure a couple of us here on
> > the list would appreciate your Cliff's Notes version. I can take this
> > into account for my v2 schema design.
> >
> > > So Nick, merge on 1.1 is not recommended??? Was working very well on
> > > previous versions. Does ProcV2 really impact it that badly??
> >
> > How to answer here carefully... I have no reason to believe merge is not
> > working on 1.1. I've been on the wrong end of enough "regions stuck in
> > transition" support tickets that I'm not keen to put undue stress on my
> > master. ProcV2 insures against many scenarios that cause master trauma,
> > hence my interest in the implementation details and my preference for
> > cluster administration tasks that use it as their source of authority.
> >
> > Thanks for the thoughts, folks.
> > -n
> >
> > On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> > > ;) That was not the question ;)
> > >
> > > So Nick, merge on 1.1 is not recommended??? Was working very well on
> > > previous versions. Does ProcV2 really impact it that badly??
> > >
> > > JMS
> > >
> > > 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov:
> > > > >> This is something which makes it far less useful for time-series
> > > > >> databases with short TTL on the tables.
> > > >
> > > > With a right row-key design you will never have empty regions due to TTL.
> > > >
> > > > -Vlad
> > > >
> > > > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <olorinb...@gmail.com> wrote:
> > > > > Crazy idea, but you might be able to take a stripped-down version
> > > > > of the region normalizer code and make a Tool to run? Requesting
> > > > > split or merge is done through the client API, and the only
> > > > > weighing information you need is whether a region is empty or not,
> > > > > which you could find out too?
> > > > >
> > > > > "Short of upgrading to 1.2 for the region normalizer,"
> > > > >
> > > > > A bit off topic, but I think unfortunately the region normalizer
> > > > > now ignores empty regions to avoid undoing a pre-split on the
> > > > > table. This is something which makes it far less useful for
> > > > > time-series databases with short TTL on the tables. We'll need to
> > > > > address that.
> > > > >
> > > > > -Mikhail
> > > > >
> > > > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > I have a table with TTL enabled. It's been receiving data for a
> > > > > > while beyond the TTL and I now have a number of empty regions.
> > > > > > I'd like to drop those empty regions to free up heap space on
> > > > > > the region servers and reduce master load. I'm running a 1.1
> > > > > > derivative.
> > > > > >
> > > > > > The only threads I found on this topic are from circa the 0.92
> > > > > > timeframe.
> > > > > >
> > > > > > Short of upgrading to 1.2 for the region normalizer, what's the
> > > > > > recommended method of cleaning up this cruft? Should I be
> > > > > > merging empty regions into
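The merge-planning step discussed in this thread can be sketched as a small standalone function. This is only an illustration of the idea (pair each empty region with an adjacent neighbor, never reusing a neighbor), not the HBASE-15712 implementation; the region names and sizes here are assumed inputs that would in practice come from the master's region load metrics, and the actual merge would go through the client Admin API or the shell's merge_region command.

```python
# Illustrative sketch: plan merges of empty regions into adjacent neighbors.
# Not the HBASE-15712 code; inputs and names are hypothetical.

def plan_empty_region_merges(regions):
    """regions: list of (region_name, storefile_size_bytes) in row-key order.

    Returns a list of (empty_region, neighbor) pairs to merge, making sure
    no region participates in more than one planned merge."""
    plans = []
    used = set()
    for i, (name, size) in enumerate(regions):
        if size > 0 or name in used:
            continue
        # Prefer merging into the previous neighbor, else the next one.
        for j in (i - 1, i + 1):
            if 0 <= j < len(regions) and regions[j][0] not in used:
                plans.append((name, regions[j][0]))
                used.update((name, regions[j][0]))
                break
    return plans
```

For example, for regions `[("r1", 100), ("r2", 0), ("r3", 0), ("r4", 50)]` this plans two merges: r2 into r1, and r3 into r4.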
Re: Slow sync cost
We were seeing this for a while with our CDH5 HBase clusters too. We eventually correlated it very closely to GC pauses. Through heavily tuning our GC we were able to drastically reduce the logs, by keeping most GCs under 100ms.

On Tue, Apr 26, 2016 at 6:25 AM, Saad Mufti wrote:
> From what I can see in the source code, the default is actually even lower,
> at 100 ms (it can be overridden with hbase.regionserver.hlog.slowsync.ms).
>
> Saad
Re: Re: Re: question on "drain region servers"
Please see HBASE-4298, where this feature was introduced.

On Tue, Apr 26, 2016 at 5:12 AM, WangYQ wrote:
> yes, there is a tool, graceful_stop.sh, to gracefully stop a regionserver,
> and it can move the regions back to the rs after the rs comes back.
> but i can not find the relation with drain region servers...
>
> i think the drain region servers function is good, but can not think up a
> practical use case
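WangYQ's first suggestion above — keying the draining entry on hostname and port rather than the full server name, so a restarted regionserver still matches — can be sketched as a small transformation. A full ServerName is rendered as `host,port,startcode`; this helper and its format assumption are purely illustrative, not code from HBASE-4298 or draining_servers.rb.

```python
# Illustrative sketch of WangYQ's suggestion: drop the start code from a
# server name so a draining entry survives a regionserver restart.
# The "host,port,startcode" format assumption mirrors HBase's ServerName
# rendering; this is not actual HBase code.

def draining_znode_key(server_name):
    """Reduce 'host,port,startcode' to 'host,port' so that a restarted
    regionserver (which gets a new start code) still matches its entry."""
    parts = server_name.split(",")
    if len(parts) < 2:
        raise ValueError("expected 'host,port[,startcode]': %r" % server_name)
    return ",".join(parts[:2])
```

With this, adding `hs1.example.com,16020,1461600000000` and later `hs1.example.com,16020,1461700000000` would map to the same draining key, addressing the "add hs1 several times" problem.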
Re: Re: Re: question on "drain region servers"
yes, there is a tool, graceful_stop.sh, to gracefully stop a regionserver, and it can move the regions back to the rs after the rs comes back. but i can not find the relation with drain region servers...

i think the drain region servers function is good, but can not think up a practical use case

At 2016-04-26 16:01:55, "Dejan Menges" wrote:
>One of the use cases we use it for is the graceful stop of a regionserver -
>you unload regions from the server before you restart it. Of course, after
>restart you expect HBase to move the regions back.
>
>Now I'm not remembering correctly, but I kinda remember that one of the
>features was at least that it will move back the regions which were already
>there, hence not destroying too much block locality.
Re: Slow sync cost
From what I can see in the source code, the default is actually even lower, at 100 ms (it can be overridden with hbase.regionserver.hlog.slowsync.ms).

Saad

On Tue, Apr 26, 2016 at 3:13 AM, Kevin Bowling wrote:
> I see similar log spam while the system has reasonable performance. Was the
> 250ms default chosen with SSDs and 10GbE in mind or something? I guess I'm
> surprised a sync write several times through JVMs to 2 remote datanodes
> would be expected to consistently happen that fast.
>
> Regards,
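For reference, the threshold mentioned above can be raised in hbase-site.xml so that only genuinely slow syncs get logged. A minimal sketch — the property name is the one cited in this thread, but the 500 ms value is purely illustrative, not a recommendation:

```xml
<!-- hbase-site.xml fragment: raise the WAL slow-sync logging threshold.
     Property name as discussed in this thread (see HBASE-11240 / FSHLog);
     the value below is only an example. -->
<property>
  <name>hbase.regionserver.hlog.slowsync.ms</name>
  <value>500</value>
</property>
```

Note that raising the threshold only quiets the log; it does not address any underlying GC-pause or HDFS-pipeline latency.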
Re: Re: question on "drain region servers"
One of the use cases we use it for is the graceful stop of a regionserver - you unload regions from the server before you restart it. Of course, after restart you expect HBase to move the regions back.

Now I'm not remembering correctly, but I kinda remember that one of the features was at least that it will move back the regions which were already there, hence not destroying too much block locality.

On Tue, Apr 26, 2016 at 8:15 AM, WangYQ wrote:
> thanks
> in hbase 0.99.0, I find the rb file: draining_servers.rb
>
> i have some suggestions on this tool:
> 1. if I add rs hs1 to draining_servers, when hs1 restarts, the zk node
>    still exists in zk, but hmaster will not treat hs1 as draining
>    i think when we add a regionserver to draining_servers, we do not need
>    to store the start code in zk, just the hostName and port
> 2. we add hs1 to draining_servers, but if hs1 always restarts, we will
>    need to add hs1 several times
>    when we need to delete the draining_servers info of hs1, we will need
>    to delete hs1 several times
>
> finally, what is the original motivation of this tool? some scenario
> descriptions would be good.
Re: Slow sync cost
I see similar log spam while the system has reasonable performance. Was the 250ms default chosen with SSDs and 10GbE in mind or something? I guess I'm surprised a sync write several times through JVMs to 2 remote datanodes would be expected to consistently happen that fast.

Regards,

On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti wrote:
> Hi,
>
> In our large HBase cluster based on CDH 5.5 in AWS, we're constantly seeing
> the following messages in the region server logs:
>
> 2016-04-25 14:02:55,178 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 258 ms,
> current pipeline:
> [DatanodeInfoWithStorage[10.99.182.165:50010,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK],
> DatanodeInfoWithStorage[10.99.182.236:50010,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK],
> DatanodeInfoWithStorage[10.99.182.195:50010,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]
>
> These happen regularly while HBase appears to be operating normally with
> decent read and write performance. We do have occasional performance
> problems when regions are auto-splitting, and at first I thought this was
> related, but now I see it happens all the time.
>
> Can someone explain what this really means, and should we be concerned? I
> tracked down the source code that outputs it in
>
> hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
>
> but after going through the code I think I'd need to know much more about
> the code to glean anything from it or the associated JIRA ticket,
> https://issues.apache.org/jira/browse/HBASE-11240.
>
> Also, what is this "pipeline" the ticket and code talk about?
>
> Thanks in advance for any information and/or clarification anyone can
> provide.
>
> Saad
Re: Re: question on "drain region servers"
thanks
in hbase 0.99.0, I find the rb file: draining_servers.rb

i have some suggestions on this tool:
1. if I add rs hs1 to draining_servers, when hs1 restarts, the zk node still exists in zk, but hmaster will not treat hs1 as draining
   i think when we add a regionserver to draining_servers, we do not need to store the start code in zk, just the hostName and port
2. we add hs1 to draining_servers, but if hs1 always restarts, we will need to add hs1 several times
   when we need to delete the draining_servers info of hs1, we will need to delete hs1 several times

finally, what is the original motivation of this tool? some scenario descriptions would be good.

At 2016-04-26 11:33:10, "Ted Yu" wrote:
>Please take a look at:
>bin/draining_servers.rb
>
>On Mon, Apr 25, 2016 at 8:12 PM, WangYQ wrote:
>
>> in hbase, I find there is a "drain regionServer" feature
>>
>> if a rs is added as a drain regionServer in ZK, then regions will not be
>> moved to these regionServers
>>
>> but how can a rs be added as a drain regionServer? do we add it by hand,
>> or will the rs add itself automatically?