[jira] [Commented] (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-05-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030215#comment-13030215
 ] 

Hudson commented on HBASE-3431:
---

Integrated in HBase-TRUNK #1909 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1909/])


> Regionserver is not using the name given it by the master; double entry in 
> master listing of servers
> 
>
> Key: HBASE-3431
> URL: https://issues.apache.org/jira/browse/HBASE-3431
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, 
> 3431.txt
>
>
> Our man Ted Dunning found the following, where the RS checks in with one name, the 
> master tells it to use another name, but we seem to go ahead and continue with 
> our original name.
> In RS logs I see:
> {code}
> 2011-01-07 15:45:50,757 INFO  org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020
> {code}
> On master I see
> {code}
> 2011-01-07 15:45:38,613 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 0 on 6]: Registering server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false
> {code}
> 
> then later
> {code}
> 2011-01-07 15:45:44,247 INFO  org.apache.hadoop.hbase.master.ServerManager [IPC Server handler 2 on 6]: Registering server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true
> {code}
> This might be because we started letting servers register by means other than 
> reportStartup.
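
The double entry in the two master log lines above can be reproduced in miniature. The following is only a sketch under assumptions: `ServerRegistrySketch` and its methods are hypothetical names, not HBase's ServerManager. It assumes the master keys its list of online servers by the "host,port,startcode" string shown in the logs, so the same process registering once by IP and once by hostname ends up with two entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the master's list of online servers; not HBase's ServerManager.
final class ServerRegistrySketch {
    // Keyed by the "host,port,startcode" string seen in the master log lines.
    private final Map<String, Integer> online = new LinkedHashMap<>();

    static String serverName(String host, int port, long startcode) {
        return host + "," + port + "," + startcode;
    }

    // Registers (or refreshes) a server and returns how many entries are now tracked.
    int register(String host, int port, long startcode, int regionCount) {
        online.put(serverName(host, port, startcode), regionCount);
        return online.size();
    }

    public static void main(String[] args) {
        ServerRegistrySketch master = new ServerRegistrySketch();
        // Same port and startcode, different host string: one process, two keys.
        master.register("10.10.30.11", 60020, 1294443935414L, 0);
        int entries = master.register("perfnode11", 60020, 1294443935414L, 0);
        System.out.println("entries tracked: " + entries);
    }
}
```

Registering the same name again only refreshes the existing entry; it is the change of host string that creates the second one.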

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-02-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991694#comment-12991694
 ] 

stack commented on HBASE-3431:
--

I can't use EnvironmentEdge to change addresses since the InetSocketAddress 
that is at root of our HServerAddress, etc., is taken from the socket down in 
RPC -- I can't interject EnvironmentEdge inside Socket.getLocalSocketAddress, 
etc.

I can't change how HSA or HSI serialize since this is a point release.

All this is going to go away, or at least change radically, in 0.92 because we 
intend to drop the heartbeat.





[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-02-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991661#comment-12991661
 ] 

stack commented on HBASE-3431:
--

bq. Looking in hdfs, datanode generates a registration name – e.g. 
DS-198919343-10.20.20.187-10010-1291133524722 – and this is how it identifies 
itself to NN regardless. No messing w/ NN telling it what name to use. 

J-D points out that I'm reading this code lazily (i.e. wrongly): on 
registration, the NN returns a DatanodeRegistration instance that the DN will use 
going forward.









[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-02-07 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991612#comment-12991612
 ] 

ryan rawson commented on HBASE-3431:


One thing to consider is that a lot of the network code attempts to figure
out what the 'primary IP' is and then binds to just that IP.

Would it make sense to bind to * instead (i.e. 0.0.0.0)? Why not accept
RPCs on all interfaces? If security is a concern, I think SASL and
host-level firewall controls are a better way to address that than
baking it into HBase.  That way it won't really "matter" what our IP
is; whatever IP the master 'sees' us as could be what we stuff
in META.  Then we could use the registration name to identify dead
hosts, etc.
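
As a rough illustration of the difference between the two binding styles, here is a minimal sketch using only java.net. `WildcardBindSketch` is a hypothetical name and none of this is HBase's RPC server code; it only shows that passing a null address binds to the wildcard (0.0.0.0), while passing a concrete address pins the listener to that one interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class WildcardBindSketch {
    // Opens a listener on an ephemeral port (0 lets the OS choose) and reports whether
    // it ended up bound to the wildcard address. Pass null to bind to all interfaces.
    static boolean boundToWildcard(InetAddress bindAddr) {
        try (ServerSocket server = new ServerSocket()) {
            server.bind(new InetSocketAddress(bindAddr, 0));
            return server.getInetAddress().isAnyLocalAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("all interfaces: " + boundToWildcard(null));
        System.out.println("loopback only:  " + boundToWildcard(InetAddress.getLoopbackAddress()));
    }
}
```

With a wildcard bind the server no longer has to guess a 'primary' address up front, which is the property the suggestion above relies on.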






[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-02-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991610#comment-12991610
 ] 

stack commented on HBASE-3431:
--

Chatted w/ Jon and J-D on this.  Jon suggests the EnvironmentEdgeManager utility as 
a means of intercepting lookups so we can do up tests returning different 
answers.  Let me try it out.  J-D rehearsed the issues we have had in here over 
time and noted that this 'mess' was 'working' in 0.20.x and even up to 0.89.x (he 
also remembers that an RS can volunteer its address as 127.0.0.1 but actually 
bind to a real, non-localhost address somehow).  He's wary about stripping it all 
out as the patch does.  Let me try and put up unit tests that can mock the 
various scenarios.

Looking at code w/ J-D, we turned up one problematic bit of code -- HSA will 
create a new InetSocketAddress on deserialization which can result in a lookup.
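
For reference, the lookup comes from the `InetSocketAddress(String, int)` constructor, which attempts to resolve the host name when the object is built. A minimal sketch of a lookup-free alternative follows; `UnresolvedAddressSketch` and `readWithoutLookup` are hypothetical names, and this is not what HSA does or what the patch changes. It only shows that `InetSocketAddress.createUnresolved` keeps the host string without touching DNS.

```java
import java.net.InetSocketAddress;

final class UnresolvedAddressSketch {
    // Rebuilds an address from its serialized host and port without a DNS lookup.
    // new InetSocketAddress(host, port) would try to resolve the name here instead.
    static InetSocketAddress readWithoutLookup(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress addr = readWithoutLookup("perfnode11", 60020);
        // The host string survives as given; no InetAddress has been attached.
        System.out.println(addr.getHostString() + ":" + addr.getPort()
            + " unresolved=" + addr.isUnresolved());
    }
}
```

The trade-off is that an unresolved address cannot be connected to directly; whoever needs to dial it still has to resolve the name at that point.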

Looking in hdfs, datanode generates a registration name -- e.g. 
DS-198919343-10.20.20.187-10010-1291133524722 -- and this is how it identifies 
itself to NN regardless.  No messing w/ NN telling it what name to use.   TT 
does something similar.





[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-02-06 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991217#comment-12991217
 ] 

ryan rawson commented on HBASE-3431:


I'll have a look monday






[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-02-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991099#comment-12991099
 ] 

stack commented on HBASE-3431:
--

Tested w/ name resolution broken on both ends.  If I broke lookup thoroughly, the server 
wouldn't start, complaining that it couldn't resolve its name (that's not new to my patch).  
If the name did not resolve when it got to the server side, then again the same thing w/ a complaint 
that it couldn't resolve the regionserver name... again not new to my patch... more a 
commentary on how hbase will already complain loudly if resolution is mangled.  
Messages are pretty plain about what's wrong.

I broke master resolve so the incoming RS did not resolve to a proper address 
-- in the past we'd send back an IP and use that ever after and then you'd have 
double-vision after the next heartbeat -- and then on the RS I broke it so it passed back 
an FQDN when the Master was dealing in host names only.  That worked too.

Review please.  Unit tests are hard to do.  We would have to somehow mock java dns 
lookup.  Changing the dns doesn't seem to be possible (I can see providing an 
alternate dns provider to jndi if you provide flags on JVM startup).
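
One common way around un-mockable DNS is to put the lookup behind a small interface and hand tests a table-backed implementation. The sketch below is only that, a sketch: `ReverseResolver`, `JvmReverseResolver` and `TableReverseResolver` are hypothetical names that exist neither in HBase nor in the patch, and it assumes the calling code can be changed to take the resolver as a parameter.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical seam for tests; no interface with this name exists in HBase.
interface ReverseResolver {
    // Returns the name for an IP, or the IP text itself when nothing is found.
    String nameFor(String ip);
}

// Production flavour: delegates to the JVM's resolver.
final class JvmReverseResolver implements ReverseResolver {
    public String nameFor(String ip) {
        try {
            return InetAddress.getByName(ip).getCanonicalHostName();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an address: " + ip, e);
        }
    }
}

// Test flavour: answers from a fixed table, so "broken DNS" is just a missing entry.
final class TableReverseResolver implements ReverseResolver {
    private final Map<String, String> table = new HashMap<>();

    TableReverseResolver with(String ip, String name) {
        table.put(ip, name);
        return this;
    }

    public String nameFor(String ip) {
        return table.getOrDefault(ip, ip);
    }
}

final class ResolverSketchDemo {
    public static void main(String[] args) {
        ReverseResolver healthy = new TableReverseResolver().with("10.10.30.11", "perfnode11");
        ReverseResolver broken = new TableReverseResolver();
        System.out.println(healthy.nameFor("10.10.30.11")); // the name the RS itself reports
        System.out.println(broken.nameFor("10.10.30.11"));  // falls back to the bare IP
    }
}
```

A test could then drive the registration path once with each resolver and check that only one server entry results in both cases.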






[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-02-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991094#comment-12991094
 ] 

stack commented on HBASE-3431:
--

If the RS passes 127.0.0.1, then that's what it's bound to and no (remote) client 
will be able to connect.  It's broken.

The fixup in master would let this (broken) server successfully register.  The 
master would call remoteIP on the connected socket to get the RS's address and 
it would then know the RS as this.  This would happen only on startup, in 
reportForDuty, not subsequently during heartbeating; we only do the lookup of 
the remote IP on reportForDuty.

Heartbeating, the RS was supposed to be volunteering the HServerInfo that the 
Master had passed it back as response to the reportForDuty.

Since 0.90.0, servers can register at heartbeat time.  This is because masters 
can join an already running cluster.  The RSs do not rerun the reportForDuty 
step.  They just start heartbeating the new Master.

We could, I suppose, add a lookup on the socket's remote IP to heartbeating too, with 
a reverse lookup.
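
For readers unfamiliar with the mechanics, reading the peer's address off a connected socket and reverse-resolving it looks roughly like this in plain java.net. This is a minimal sketch with hypothetical names (`RemoteAddressSketch` and its methods), run entirely over loopback; it is not the master's RPC code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class RemoteAddressSketch {
    // The address the peer connected from, as seen on our side of the socket.
    static InetAddress remoteAddress(Socket accepted) {
        return accepted.getInetAddress();
    }

    // Reverse lookup; the JDK falls back to the textual IP when no name is found.
    static String reverseName(InetAddress addr) {
        return addr.getCanonicalHostName();
    }

    // Loopback demo: a "master" listener accepts one connection from an "RS" client.
    static InetAddress acceptOneOnLoopback() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket master = new ServerSocket(0, 1, loopback);
             Socket rs = new Socket(loopback, master.getLocalPort());
             Socket accepted = master.accept()) {
            return remoteAddress(accepted);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress seen = acceptOneOnLoopback();
        System.out.println(seen.getHostAddress() + " -> " + reverseName(seen));
    }
}
```

Doing this on every heartbeat would add a reverse lookup to a hot path, which is part of why stripping the logic out is the alternative being weighed.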

I'm thinking it's better to just strip all this crap out.





[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-02-05 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991081#comment-12991081
 ] 

Jean-Daniel Cryans commented on HBASE-3431:
---

bq. Instead Master just uses the ServerName the RS volunteered.

So what happens if region server passes 127.0.0.1?





[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-02-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990852#comment-12990852
 ] 

stack commented on HBASE-3431:
--

If master can't find regionserver address, then master does this:

{code}
Caused by: java.lang.IllegalArgumentException: Could not resolve the DNS name of sv2borg185:60020
    at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
    at org.apache.hadoop.hbase.HServerAddress.readFields(HServerAddress.java:168)
    at org.apache.hadoop.hbase.HServerInfo.readFields(HServerInfo.java:230)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:521)
    ... 8 more
{code}

... which is kinda dumb but means no progress unless server can get an address.

If DNS is wrong, e.g. on the master, so that when it does a lookup on the passed name we come 
up w/ a different address, then we'll tell the regionserver to go forward with the 
IP.

At the moment you'll see two entries for this badly configured server.  The 
regionserver will show by its name and by its bad IP.

The symptom is that you can't shut down because the master is waiting on the ghost server to 
finish closing up (this is what was happening for mr oracle.com).

I manufactured Ted's problem by changing hosts on the master to have a different subnet 
for a server.  Then I got this in the RS log:

{code}
2011-02-05 00:33:49,409 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address to use. Was=sv2borg185:60020, Now=10.20.20.185:60020
{code}

Let me dig in.







[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-01-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987334#action_12987334
 ] 

stack commented on HBASE-3431:
--

Up on IRC we just had a case where the RS was reporting hostname only but reverse 
lookup was returning the FQDN.
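
To make the mismatch concrete: a plain string comparison treats "perfnode11" and its FQDN as two different servers. The sketch below shows one tolerant comparison, purely as an illustration. `HostNameMatchSketch` is a hypothetical name, the `example.com` domain is made up, and this is not how the issue was fixed. It assumes that a short name matches a qualified name when the first label agrees, which could conflate hosts in different domains, and it never equates a name with an IP literal.

```java
import java.util.Locale;

final class HostNameMatchSketch {
    static boolean isIpv4Literal(String s) {
        return s.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    static String firstLabel(String host) {
        int dot = host.indexOf('.');
        return dot < 0 ? host : host.substring(0, dot);
    }

    // True when both strings plausibly name the same host: equal ignoring case,
    // or one is a short name and the other is qualified with the same first label.
    static boolean sameHost(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        if (x.equals(y)) {
            return true;
        }
        if (isIpv4Literal(x) || isIpv4Literal(y)) {
            return false; // a name and an IP cannot be matched without a lookup
        }
        boolean xShort = x.indexOf('.') < 0;
        boolean yShort = y.indexOf('.') < 0;
        if (xShort == yShort) {
            return false; // both short or both qualified, and already known to differ
        }
        return firstLabel(x).equals(firstLabel(y));
    }

    public static void main(String[] args) {
        System.out.println(sameHost("perfnode11", "perfnode11.example.com"));
        System.out.println(sameHost("perfnode11", "10.10.30.11"));
    }
}
```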




[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-01-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987320#action_12987320
 ] 

stack commented on HBASE-3431:
--

Workaround is to make reverse DNS on the master produce the same hostname as that which 
the RegionServer reports (RS hostname lookup).




[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-01-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979087#action_12979087
 ] 

stack commented on HBASE-3431:
--

Seems like this is a regression since 0.89.  Ted says 0.89 works on his 
cluster.  The master is seeing RS as an IP then subsequently the RS is giving 
the IP back as its 'name'.  Ted is also starting things a little oddly... 
manually starting each daemon... with the RS saying that its NotReadyYet 
exception in 0.90.




[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers

2011-01-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979056#action_12979056
 ] 

stack commented on HBASE-3431:
--

Why is the RS not taking what the Master tells it to use when going to the Master?
