[ https://issues.apache.org/jira/browse/HBASE-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049930#comment-15049930 ]
Yong Zheng commented on HBASE-14958:
------------------------------------

I ran a simple test with tcp_server on n03docker2 (172.17.1.3) and tcp_client on n04docker2 (172.17.2.3).

In tcp_server:

    ...
    sin_size = sizeof(struct sockaddr_in);
    if ((new_fd = accept(sockfd, (struct sockaddr *)(&client_addr), &sin_size)) == -1) {
        fprintf(stderr, "Accept error:%s\n\a", strerror(errno));
        exit(1);
    }
    /* s_addr is in network byte order; %x prints the raw value */
    fprintf(stderr, "Server get connection from %x\n", client_addr.sin_addr.s_addr);
    /* query the accepted socket; the listening socket has no peer */
    ret = getpeername(new_fd, (struct sockaddr *)(&client_peer_addr), &sin_size);
    ...

tcp_client just connects to the server and sends one message:

    bash-4.1# hostname
    n04docker2
    bash-4.1# ./tcp_client 172.17.1.3 8030

On tcp_server:

    bash-4.1# hostname
    n03docker2.gpfs.net
    bash-4.1# ./tcp_server 8030
    will accepting...
    Server get connection ...
    Server get connection from 7203a8c0    <== this is 192.168.3.114 after converting from network byte order

So when source NAT is involved in a virtualized setup, it looks to me like the current HBase master/region server mechanism doesn't work: the master sees the NAT address, not the region server's own address. Maybe we could have the region server and master exchange the hostname explicitly, rather than depending on the socket API to get the client IP address.

> regionserver.HRegionServer: Master passed us a different hostname to use; was=n04docker2, but now=192.168.3.114
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14958
>                 URL: https://issues.apache.org/jira/browse/HBASE-14958
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.2
>         Environment: physical machines: redhat7.1
> docker version: 1.9.1
>            Reporter: Yong Zheng
>
> I have two physical machines: c3m3n03docker and c3m3n04docker.
> I started two docker instances per physical node.
> The topology is:
> n03docker1(172.17.1.2) -\
>                          | br0(172.17.1.1) + c3m3n03
> n03docker2(172.17.1.3) -/
> n04docker1(172.17.2.2) -\
>                          | br0(172.17.2.1) + c3m3n04
> n04docker2(172.17.2.3) -/
> On the physical machines, c3m3n03 uses physical adapter enp11s0f0 with IP 192.168.3.113/16; c3m3n04 uses physical adapter enp11s0f0 with IP 192.168.3.114/16. These two physical adapters are connected to the same switch.
> Note: br0 is not attached to physical adapter enp11s0f0 on either node, so all requests from 172.17.2.x are source NATed to 192.168.3.114 (c3m3n04) and forwarded to c3m3n03.
> n03docker1: hbase (1.1.2) master
> n03docker2: region server
> n04docker1: region server
> n04docker2: region server
> I first start n03docker1 and n03docker2, and that works. After that, I start n04docker2 and it reports:
> 2015-12-09 08:01:58,259 ERROR [regionserver/n04docker2.gpfs.net/172.17.2.3:16020] regionserver.HRegionServer: Master passed us a different hostname to use; was=n04docker2.gpfs.net, but now=192.168.3.114
> In the master logs:
> 2015-12-09 08:11:12,234 INFO [PriorityRpcServer.handler=0,queue=0,port=16000] master.ServerManager: Registering server=192.168.3.114,16020,1449666670721
> So when the HBase master receives requests from n04docker2, they have all been source NATed to 192.168.3.114 (not 172.17.2.3), and the master passes 192.168.3.114 back to 172.17.2.3 (n04docker2). As a result, n04docker2 (172.17.2.3) reports the error above in its logs.
> Does HBase not support running in a virtualized cluster? SNAT is widely used in virtualization. If the HBase master takes the remote hostname/IP from the connection (and so gets 192.168.3.114) and passes it back to the region server, it will hit this issue.
> HBASE-8667 doesn't fix this issue: that fix has been in HBase since 0.98, and I'm running HBase 1.1.2.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
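
A note for anyone reproducing the tcp_server test from the comment: the server prints the peer address as the raw hex value of sin_addr.s_addr (7203a8c0), which has to be decoded by hand. Below is a minimal sketch of that decoding. It assumes the test host is little-endian (e.g. x86), which the thread does not state. The helper names decode_printed_s_addr and format_peer are hypothetical and not part of the original test program. The second helper shows the host-independent route, letting inet_ntop() format the address that accept() filled in.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a value that was printed with "%x" from sin_addr.s_addr on a
 * little-endian host (an assumption; the thread does not name the CPU).
 * s_addr is stored in network byte order, so on such a host the lowest-order
 * byte of the printed value is the first octet of the address.
 * Returns the snprintf() result. */
static int decode_printed_s_addr(uint32_t printed, char *buf, size_t len)
{
    return snprintf(buf, len, "%u.%u.%u.%u",
                    (unsigned)(printed & 0xff),
                    (unsigned)((printed >> 8) & 0xff),
                    (unsigned)((printed >> 16) & 0xff),
                    (unsigned)((printed >> 24) & 0xff));
}

/* Host-independent alternative: format the peer address that accept()
 * stored in a struct sockaddr_in. Returns buf on success, NULL on error. */
static const char *format_peer(const struct sockaddr_in *peer,
                               char *buf, socklen_t len)
{
    return inet_ntop(AF_INET, &peer->sin_addr, buf, len);
}
```

Calling decode_printed_s_addr(0x7203a8c0, buf, sizeof buf) with a buffer of INET_ADDRSTRLEN bytes yields 192.168.3.114, the SNAT address of c3m3n04 rather than the container address 172.17.2.3. In the server itself, passing &client_addr to format_peer() right after accept() would print the dotted-quad form directly.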