[jira] [Commented] (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030215#comment-13030215 ]

Hudson commented on HBASE-3431:
-------------------------------

Integrated in HBase-TRUNK #1909 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1909/])

> Regionserver is not using the name given it by the master; double entry in master listing of servers
>
>                 Key: HBASE-3431
>                 URL: https://issues.apache.org/jira/browse/HBASE-3431
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 3431.txt
>
>
> Our man Ted Dunning found the following where RS checks in with one name, the master tells it use another name but we seem to go ahead and continue with our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 6]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
>
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 6]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be since we started letting servers register in other than with the reportStartup.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
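The two master log lines in the description differ only in the host part of the server name (10.10.30.11 vs perfnode11); port and startcode are identical. A minimal sketch of how one might spot such pairs in a list of registered names, assuming the "host,port,startcode" layout seen in those log lines. GhostServerCheck and its methods are hypothetical names for illustration, not HBase classes, and the match is only a heuristic, since two distinct hosts could in principle share a port and startcode.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: GhostServerCheck is a hypothetical helper, not part of HBase.
final class GhostServerCheck {
    private GhostServerCheck() {}

    /** Drops the host part of "host,port,startcode", leaving "port,startcode". */
    static String portAndStartcode(String serverName) {
        int firstComma = serverName.indexOf(',');
        if (firstComma < 0) {
            throw new IllegalArgumentException("Not host,port,startcode: " + serverName);
        }
        return serverName.substring(firstComma + 1);
    }

    /** Returns groups of names that share port and startcode but were registered separately. */
    static List<List<String>> suspectedDuplicates(List<String> registered) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String name : registered) {
            byKey.computeIfAbsent(portAndStartcode(name), k -> new ArrayList<>()).add(name);
        }
        List<List<String>> suspects = new ArrayList<>();
        for (List<String> group : byKey.values()) {
            if (group.size() > 1) {
                suspects.add(group);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        List<String> registered = new ArrayList<>();
        registered.add("10.10.30.11,60020,1294443935414");
        registered.add("perfnode11,60020,1294443935414");
        System.out.println(suspectedDuplicates(registered));
    }
}
```

Keying on the full name string, as the log lines suggest, treats the IP form and the hostname form as two servers; grouping on the remainder of the name is one cheap way to make the mismatch visible.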
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991694#comment-12991694 ]

stack commented on HBASE-3431:
------------------------------

I can't use EnvironmentEdge to change addresses, since the InetSocketAddress at the root of our HServerAddress, etc., is taken from the socket down in RPC -- I can't interject EnvironmentEdge inside Socket.getLocalSocketAddress, etc.

I can't change how HSA or HSI serialize, since this is a point release.

All this is going to go away, or at least change radically, in 0.92 because we intend to drop the heartbeat.
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991661#comment-12991661 ]

stack commented on HBASE-3431:
------------------------------

bq. Looking in hdfs, datanode generates a registration name -- e.g. DS-198919343-10.20.20.187-10010-1291133524722 -- and this is how it identifies itself to NN regardless. No messing w/ NN telling it what name to use.

J-D points out that I'm reading this code lazily (i.e. wrongly): on registration, the NN returns a DataRegistration instance that the DN will use going forward.
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991612#comment-12991612 ]

ryan rawson commented on HBASE-3431:
------------------------------------

One thing to consider is that a lot of the network code attempts to figure out what the 'primary IP' is and then binds to just that IP. Would it make sense to bind to * instead (i.e. 0.0.0.0)? Why not accept RPCs on all interfaces? If security is a concern, I think SASL and host-level firewall controls are a better way to address that than baking it into HBase.

That way it won't really "matter" what our IP is; whatever IP the master 'sees' us as could be used as what to stuff in META. Then we could use the registration name to identify dead hosts, etc.
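For reference, binding to the wildcard address in plain Java needs no 'primary IP' discovery at all. The following is a minimal sketch using only java.net; WildcardBind is a hypothetical class name and nothing here reflects how the HBase RPC server is actually wired.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustrative only: WildcardBind is a hypothetical name, not an HBase class.
final class WildcardBind {
    private WildcardBind() {}

    /** Opens a listening socket on all local interfaces (the wildcard address). */
    static ServerSocket listenOnAllInterfaces(int port) {
        ServerSocket server = null;
        try {
            server = new ServerSocket();
            // InetSocketAddress(int) uses the wildcard address, so no
            // specific local IP has to be chosen before binding.
            server.bind(new InetSocketAddress(port));
            return server;
        } catch (IOException e) {
            if (server != null) {
                try {
                    server.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails after a failed bind
                }
            }
            throw new UncheckedIOException(e);
        }
    }

    /** True if a socket bound this way reports the wildcard address as its local address. */
    static boolean bindsToWildcard() {
        try (ServerSocket server = listenOnAllInterfaces(0)) { // port 0 = any free port
            return server.getInetAddress().isAnyLocalAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bound to wildcard: " + bindsToWildcard());
    }
}
```

A wildcard bind answers the "which interface do we listen on" question, but the server still needs some name to advertise to clients, which is the part the rest of this thread is about.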
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991610#comment-12991610 ]

stack commented on HBASE-3431:
------------------------------

Chatted w/ Jon and J-D on this.

Jon suggests the EnvironmentEdgeManager utility as a means of intercepting lookups so we can do up tests returning different answers. Let me try it out.

J-D rehearsed the issues we have had in here over time and noted that this 'mess' was 'working' in 0.20.x and even up to 0.89.x (he also remembers that an RS can volunteer its address as 127.0.0.1 but actually bind to a real, non-localhost address somehow). He's wary about stripping it all out as the patch does. Let me try to put up unit tests that can mock the various scenarios.

Looking at the code w/ J-D, we turned up one problematic bit of code -- HSA will create a new InetSocketAddress on deserialization, which can result in a lookup.

Looking in hdfs, datanode generates a registration name -- e.g. DS-198919343-10.20.20.187-10010-1291133524722 -- and this is how it identifies itself to NN regardless. No messing w/ NN telling it what name to use. TT does something similar.
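On the deserialization point: in plain Java, the InetSocketAddress(String, int) constructor attempts to resolve the hostname, while InetSocketAddress.createUnresolved does not. A minimal sketch of parsing a "host:port" string without triggering a lookup follows; LookupFreeAddress is a hypothetical helper, not the HServerAddress code.

```java
import java.net.InetSocketAddress;

// Illustrative only: LookupFreeAddress is a hypothetical helper, not HServerAddress.
final class LookupFreeAddress {
    private LookupFreeAddress() {}

    /** Parses "host:port" into an address without asking the resolver anything. */
    static InetSocketAddress parse(String hostAndPort) {
        int colon = hostAndPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostAndPort.length() - 1) {
            throw new IllegalArgumentException("Expected host:port, got " + hostAndPort);
        }
        String host = hostAndPort.substring(0, colon);
        int port = Integer.parseInt(hostAndPort.substring(colon + 1));
        // createUnresolved keeps the name exactly as given; new InetSocketAddress(host, port)
        // would try to resolve it at construction time.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress address = parse("perfnode11:60020");
        System.out.println(address.getHostString() + " unresolved=" + address.isUnresolved());
    }
}
```

An unresolved address cannot be used to connect or bind directly, so this only helps where the name is being carried around as an identifier rather than dialled.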
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991217#comment-12991217 ]

ryan rawson commented on HBASE-3431:
------------------------------------

I'll have a look Monday.
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991099#comment-12991099 ]

stack commented on HBASE-3431:
------------------------------

Tested w/ name resolution broken on both ends.

If I broke lookup badly enough, the server wouldn't start, complaining that it couldn't resolve the name (that's not new to my patch). If there was no resolve when it got to the server side, then again the same thing, w/ a complaint that it couldn't resolve the regionserver name... again not new to my patch; more a commentary on how hbase will already complain loudly if resolve is mangled. The messages are pretty plain about what's wrong.

I broke master resolve so the incoming RS did not resolve to a proper address -- in the past we'd send back an IP and use that ever after, and then you'd have double-vision after the next heartbeat -- and then on the RS I broke it so it passed back an FQDN when the Master was dealing in host names only. That worked too.

Review please.

Unit tests are hard to do. We would have to somehow mock java dns lookup. Changing the dns doesn't seem to be possible (I can see providing an alternate dns provider to jndi if you provide flags on JVM startup).
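One common way around un-mockable DNS in unit tests is to route lookups through a small seam that a test can replace. The sketch below is a generic illustration of that idea only: ResolverSeam, nameMasterWouldUse, and fakeResolver are hypothetical names, the "registration" rule is invented for the example, and such a seam only covers lookups that the code itself initiates, not addresses read straight off a socket.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative only: every name in this class is hypothetical, not HBase code.
final class ResolverSeam {
    private ResolverSeam() {}

    /**
     * Invented rule for the example: use whatever the resolver returns for the
     * reported host, and fall back to the reported host when it returns null.
     */
    static String nameMasterWouldUse(String reportedHost, UnaryOperator<String> resolver) {
        String resolved = resolver.apply(reportedHost);
        return resolved == null ? reportedHost : resolved;
    }

    /** A table-backed resolver so a test can make the lookup return any answer it likes. */
    static UnaryOperator<String> fakeResolver(Map<String, String> table) {
        Map<String, String> copy = new HashMap<>(table);
        return copy::get;
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("perfnode11", "10.10.30.11");
        System.out.println(nameMasterWouldUse("perfnode11", fakeResolver(table)));
    }
}
```

With the lookup behind a function like this, the broken-DNS scenarios described above (no answer, wrong subnet, FQDN instead of short name) become table entries in a test rather than edits to a hosts file.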
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991094#comment-12991094 ]

stack commented on HBASE-3431:
------------------------------

If the RS passes 127.0.0.1, then that's what it's bound to and no (remote) client will be able to connect. It's broke.

The fixup in master would let this (broke) server successfully register. The master would call remoteIP on the connected socket to get the RS's address and it would then know the RS as this.

This would happen only on startup, in reportForDuty, not subsequently during heartbeating; we only do the lookup of remoteip on reportForDuty. When heartbeating, the RS was supposed to be volunteering the HServerInfo that the Master had passed back in response to the reportForDuty.

Since 0.90.0, servers can register at heartbeat time. This is because masters can join an already running cluster. The RSs do not rerun the reportForDuty step. They just start heartbeating the new Master.

We could, I suppose, add a lookup on the socket's remoteip to heartbeating too, with reverse lookup.

I'm thinking it's better to just strip all this crap out.
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991081#comment-12991081 ]

Jean-Daniel Cryans commented on HBASE-3431:
-------------------------------------------

bq. Instead Master just uses the ServerName the RS volunteered.

So what happens if region server passes 127.0.0.1?
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990852#comment-12990852 ]

stack commented on HBASE-3431:
------------------------------

If master can't find the regionserver address, then master does this:

{code}
Caused by: java.lang.IllegalArgumentException: Could not resolve the DNS name of sv2borg185:60020
	at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
	at org.apache.hadoop.hbase.HServerAddress.readFields(HServerAddress.java:168)
	at org.apache.hadoop.hbase.HServerInfo.readFields(HServerInfo.java:230)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
	... 8 more
{code}

... which is kinda dumb but means no progress unless the server can get an address.

If DNS is wrong, e.g. on master, so that when it does a lookup on the passed name we come up w/ a different address, then we'll tell the regionserver to go forward with the IP. At the moment you'll see two entries for this badly configured server. The regionserver will show by its name and by its bad IP. The symptom is that you can't shut down because master is waiting on the ghost server to finish its close up (this is what was happening for mr oracle.com).

I manufactured Ted's prob. by changing hosts on master to have a different subnet for a server. Then I got this in the RS log:

{code}
2011-02-05 00:33:49,409 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address to use. Was=sv2borg185:60020, Now=10.20.20.185:60020
{code}

Let me dig in.
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987334#action_12987334 ]

stack commented on HBASE-3431:
------------------------------

Up on IRC we just had a case where the RS was reporting hostname only but reverse lookup was returning the FQDN.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987320#action_12987320 ]

stack commented on HBASE-3431:
------------------------------

Workaround is to make reverse DNS on the master produce the same hostname as the one the RegionServer reports (RS hostname lookup).
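A rough way to check that workaround from the master host, outside HBase, is to reverse-resolve the regionserver's IP with plain java.net and compare the answer to the name the regionserver reports. This is a sketch under assumptions: ReverseDnsCheck is a hypothetical class, the exact-match comparison is a guess at what "same hostname" needs to mean, and the JVM's view of DNS may differ from other tools on the box.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative only: ReverseDnsCheck is a hypothetical utility, not part of HBase.
final class ReverseDnsCheck {
    private ReverseDnsCheck() {}

    /** Assumption: the two names must match exactly, so "host" vs "host.domain" counts as a mismatch. */
    static boolean sameName(String reportedByRegionServer, String reverseLookupOnMaster) {
        return reportedByRegionServer != null
            && reportedByRegionServer.equals(reverseLookupOnMaster);
    }

    /** Usage (run on the master host): java ReverseDnsCheck <regionserver-ip> <name-the-rs-reports> */
    public static void main(String[] args) throws UnknownHostException {
        if (args.length != 2) {
            System.err.println("usage: ReverseDnsCheck <regionserver-ip> <reported-name>");
            return;
        }
        // With a literal IP, getByName does no lookup; getCanonicalHostName then does
        // a best-effort reverse lookup and falls back to the IP text if none is found.
        String reverse = InetAddress.getByName(args[0]).getCanonicalHostName();
        System.out.println("reverse lookup: " + reverse);
        System.out.println(sameName(args[1], reverse) ? "names agree" : "names differ");
    }
}
```

The short-name vs FQDN case reported on IRC would show up here as "names differ".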
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979087#action_12979087 ]

stack commented on HBASE-3431:
------------------------------

Seems like this is a regression since 0.89. Ted says 0.89 works on his cluster. The master is seeing the RS as an IP, then subsequently the RS is giving the IP back as its 'name'.

Ted is also starting things a little odd... manually starting each daemon... with the RS saying that its NotReadyYet exception in 0.90.
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979056#action_12979056 ]

stack commented on HBASE-3431:
------------------------------

Why is the RS not taking what the Master tells it to use when going to the Master?