[ https://issues.apache.org/jira/browse/KUDU-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liu jing updated KUDU-3358:
---------------------------
    Description: 
h3. Description

When Kudu is deployed on Kubernetes and addressed through the cluster's virtual (ClusterIP) network, any event that changes the Kudu pods' IPs (for example a Kubernetes restart or a redeployment) leaves Kudu tables unable to serve inserts or scans.

h3. Make a reproduction

The problem can be reproduced using Impala as the client.

1. The original Kubernetes service network looks like figure1:
{panel:title=figure1}
NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
service-kudu-test01-entry      ClusterIP   10.98.78.224     <none>        8051/TCP,8050/TCP,7051/TCP,7050/TCP   2d22h
service-kudu-test01-master-0   ClusterIP   10.109.78.49     <none>        7051/TCP,8051/TCP,20051/TCP           2d22h
service-kudu-test01-master-1   ClusterIP   10.98.28.69      <none>        7051/TCP,8051/TCP,20051/TCP           2d22h
service-kudu-test01-master-2   ClusterIP   10.105.180.113   <none>        7051/TCP,8051/TCP,20051/TCP           2d22h
{color:#ff0000}service-kudu-test01-tserver-0  ClusterIP   10.106.224.20    <none>        7050/TCP,8050/TCP,20050/TCP           2d22h{color}
{color:#ff0000}service-kudu-test01-tserver-1  ClusterIP   10.110.69.131    <none>        7050/TCP,8050/TCP,20050/TCP           2d22h{color}
service-kudu-test01-tserver-2   ClusterIP   10.108.30.59     <none>        7050/TCP,8050/TCP,20050/TCP           2d22h
{panel}
2. Use Impala to create a table named *testTable*.
3. Restart the pod services:
{code:java}
kubectl delete --force -f ${dirname}/xx.yaml
kubectl apply --force -f ${dirname}/xx.yaml{code}
This moves the Kudu pod services onto a new set of ClusterIPs:
{panel:title=figure2}
NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
service-kudu-test01-entry      ClusterIP   10.108.85.55     <none>        8051/TCP,8050/TCP,7051/TCP,7050/TCP   2m22s
service-kudu-test01-master-0   ClusterIP   10.96.245.192    <none>        7051/TCP,8051/TCP,20051/TCP           2m22s
service-kudu-test01-master-1   ClusterIP   10.105.96.68     <none>        7051/TCP,8051/TCP,20051/TCP           2m22s
service-kudu-test01-master-2   ClusterIP   10.103.221.65    <none>        7051/TCP,8051/TCP,20051/TCP           2m22s
{color:#ff0000}service-kudu-test01-tserver-0  ClusterIP   10.101.128.27    <none>        7050/TCP,8050/TCP,20050/TCP           2m22s{color}
{color:#ff0000}service-kudu-test01-tserver-1  ClusterIP   10.111.9.225     <none>        7050/TCP,8050/TCP,20050/TCP           2m22s{color}
service-kudu-test01-tserver-2   ClusterIP   10.104.26.31     <none>        7050/TCP,8050/TCP,20050/TCP           2m22s
{panel}
4. Use Impala to scan {*}testTable{*}:
{code:java}
select * from testTable
{code}
The Impala client returns an error:
{code:java}
[service-impala-test01-server-0:21000] default> select * from testTable;
Query: select * from testTable
Query submitted at: 2022-03-07 15:13:04 (Coordinator: http://service-impala-test01-server-0:25000)
Query progress can be monitored at: http://service-impala-test01-server-0:25000/query_plan?query_id=c84e8a34795ca311:953d6fd800000000
ERROR: Unable to open scanner for node with id '0' for Kudu table 'impala::default.testTable': Timed out: exceeded configured scan timeout of 180.000s: after 3 scan attempts: Client connection negotiation failed: client connection to 10.110.69.131:7050: Timeout exceeded waiting to connect: Network error: Client connection negotiation failed: client connection to 10.106.224.20:7050: connect: Connection refused (error 111) {code}
The error log shows that the Kudu master returned the old tablet server IPs to the Impala client (they match the addresses highlighted in *figure1*). Those IPs are no longer reachable, so the scan fails.
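One way to confirm this from the Kudu side is to point the {{kudu}} CLI at the master services and list the tablet servers they currently have registered; the RPC addresses printed are the ones the masters return to clients such as Impala, so they can be compared against the stale ClusterIPs in figure1. This is a diagnostic sketch, not part of the original report: it assumes the {{kudu}} binary is available inside one of the pods and that the master service names resolve there.
{code:java}
# Diagnostic sketch (assumes the kudu CLI is available in a pod and that the
# master service names resolve). Lists the tablet servers the masters have
# registered, together with the RPC addresses they advertise to clients.
kudu tserver list \
  service-kudu-test01-master-0:7051,service-kudu-test01-master-1:7051,service-kudu-test01-master-2:7051 \
  --columns=uuid,rpc-addresses

# Cluster-wide health check; it also reports tablet servers that are
# unreachable at their registered addresses.
kudu cluster ksck \
  service-kudu-test01-master-0:7051,service-kudu-test01-master-1:7051,service-kudu-test01-master-2:7051
{code}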
5. On the new network, use Impala to create a new table {*}testTable2{*}. The create succeeds, but an Impala select against {*}testTable2{*} fails with the same kind of error:
{code:java}
ERROR: Unable to open scanner for node with id '0' for Kudu table 'impala::default.testTable2': Timed out: exceeded configured scan timeout of 180.000s: after 3 scan attempts: Client connection negotiation failed: client connection to 10.110.69.131:7050: Timeout exceeded waiting to connect: Network error: Client connection negotiation failed: client connection to 10.106.224.20:7050: connect: Connection refused (error 111) {code}
This indicates that the Kudu master still uses the old network addresses, even for the newly created table.

h3. To avoid the problem

If Kudu is run on the machines' host network instead of the Kubernetes virtual network, the problem does not occur.
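As a sketch only, and not something described in the original report: an alternative to moving Kudu onto the host network is to keep the ClusterIP network but have each Kudu server advertise a stable DNS name (its per-pod Service name) instead of its pod IP, so that the address the masters register and hand out to clients keeps resolving after a restart. The gflags used below ({{--rpc_bind_addresses}}, {{--rpc_advertised_addresses}}) are the ones Kudu provides for NAT/container setups; the paths are assumptions and the service names are taken from the figures above, so they would need to match the actual deployment.
{code:java}
# Sketch of a tablet-server command line for the tserver-0 pod (paths are
# assumptions). Binding on 0.0.0.0 while advertising the stable Service DNS
# name means the masters register a name that still resolves to the pod
# after its ClusterIP changes.
kudu-tserver \
  --fs_wal_dir=/var/lib/kudu/tserver/wal \
  --fs_data_dirs=/var/lib/kudu/tserver/data \
  --tserver_master_addrs=service-kudu-test01-master-0:7051,service-kudu-test01-master-1:7051,service-kudu-test01-master-2:7051 \
  --rpc_bind_addresses=0.0.0.0:7050 \
  --rpc_advertised_addresses=service-kudu-test01-tserver-0:7050
{code}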
> kudu tables fail to insert and scan when k8s network changes
> ------------------------------------------------------------
>
>                 Key: KUDU-3358
>                 URL: https://issues.apache.org/jira/browse/KUDU-3358
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: liu jing
>            Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)