[ https://issues.apache.org/jira/browse/KUDU-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liu jing updated KUDU-3358:
---------------------------
    Description: 
h3. Description

When Kudu runs on the Kubernetes virtual network, a Kubernetes restart (or anything
else that changes the Kudu pods' IPs) causes inserts into and scans of Kudu tables
to fail.
h3. Steps to reproduce

The problem can be reproduced as follows, using Impala as the client.

1. Initially, the Kubernetes pod services look like this (figure1):
{panel:title=figure1}
service-kudu-test01-entry       ClusterIP   10.98.78.224     <none>   8051/TCP,8050/TCP,7051/TCP,7050/TCP   2d22h
service-kudu-test01-master-0    ClusterIP   10.109.78.49     <none>   7051/TCP,8051/TCP,20051/TCP           2d22h
service-kudu-test01-master-1    ClusterIP   10.98.28.69      <none>   7051/TCP,8051/TCP,20051/TCP           2d22h
service-kudu-test01-master-2    ClusterIP   10.105.180.113   <none>   7051/TCP,8051/TCP,20051/TCP           2d22h
{color:#ff0000}service-kudu-test01-tserver-0   ClusterIP   10.106.224.20    <none>   7050/TCP,8050/TCP,20050/TCP   2d22h{color}
{color:#ff0000}service-kudu-test01-tserver-1   ClusterIP   10.110.69.131    <none>   7050/TCP,8050/TCP,20050/TCP   2d22h{color}
service-kudu-test01-tserver-2   ClusterIP   10.108.30.59     <none>   7050/TCP,8050/TCP,20050/TCP           2d22h
{panel}
2. Use Impala to create a table named *testTable*.

3. Restart the pod services:
{code:bash}
kubectl delete --force -f ${dirname}/xx.yaml

kubectl apply --force -f ${dirname}/xx.yaml{code}
This moves the Kudu pod services onto new ClusterIPs (figure2):
{panel:title=figure2}
service-kudu-test01-entry       ClusterIP   10.108.85.55     <none>   8051/TCP,8050/TCP,7051/TCP,7050/TCP   2m22s
service-kudu-test01-master-0    ClusterIP   10.96.245.192    <none>   7051/TCP,8051/TCP,20051/TCP           2m22s
service-kudu-test01-master-1    ClusterIP   10.105.96.68     <none>   7051/TCP,8051/TCP,20051/TCP           2m22s
service-kudu-test01-master-2    ClusterIP   10.103.221.65    <none>   7051/TCP,8051/TCP,20051/TCP           2m22s
{color:#ff0000}service-kudu-test01-tserver-0   ClusterIP   10.101.128.27    <none>   7050/TCP,8050/TCP,20050/TCP   2m22s{color}
{color:#ff0000}service-kudu-test01-tserver-1   ClusterIP   10.111.9.225     <none>   7050/TCP,8050/TCP,20050/TCP   2m22s{color}
service-kudu-test01-tserver-2   ClusterIP   10.104.26.31    <none>   7050/TCP,8050/TCP,20050/TCP           2m22s
{panel}
4. Use Impala to scan {*}testTable{*}:
{code:sql}
select * from testTable
{code}
The Impala client then returns an error:
{code}
[service-impala-test01-server-0:21000] default> select * from testTable;
Query: select * from testTable
Query submitted at: 2022-03-07 15:13:04 (Coordinator: 
http://service-impala-test01-server-0:25000)
Query progress can be monitored at: 
http://service-impala-test01-server-0:25000/query_plan?query_id=c84e8a34795ca311:953d6fd800000000
ERROR: Unable to open scanner for node with id '0' for Kudu table 
'impala::default.testTable': Timed out: exceeded configured scan timeout of 
180.000s: after 3 scan attempts: Client connection negotiation failed: client 
connection to 10.110.69.131:7050: Timeout exceeded waiting to connect: Network 
error: Client connection negotiation failed: client connection to 
10.106.224.20:7050: connect: Connection refused (error 111) {code}
From this error log, we can see that the Kudu master returned a stale tablet
server IP to the Impala client (compare the IPs against *figure1*). That IP is
no longer reachable, so the scan fails.
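The failure mode in step 4 can be illustrated with a toy Python sketch (all names hypothetical, not Kudu code): a catalog that freezes the IP resolved at registration time keeps handing out that IP after the pod is rescheduled, while a catalog that stores the stable service name resolves the current IP on every lookup.

```python
# Toy illustration: "dns" maps a stable service name to the current pod IP.
dns = {"tserver-1": "10.110.69.131"}

class IpCatalog:
    """Freezes the IP at registration time (the behavior the log suggests)."""
    def __init__(self):
        self.endpoints = {}
    def register(self, name):
        self.endpoints[name] = dns[name]      # IP frozen here
    def endpoint(self, name):
        return self.endpoints[name]           # may be stale later

class NameCatalog:
    """Stores the name and resolves it on every lookup."""
    def __init__(self):
        self.endpoints = {}
    def register(self, name):
        self.endpoints[name] = name           # stable name, not an IP
    def endpoint(self, name):
        return dns[self.endpoints[name]]      # resolved per lookup

ip_cat, name_cat = IpCatalog(), NameCatalog()
ip_cat.register("tserver-1")
name_cat.register("tserver-1")

dns["tserver-1"] = "10.111.9.225"             # pod restarted: new ClusterIP

print(ip_cat.endpoint("tserver-1"))           # stale: 10.110.69.131
print(name_cat.endpoint("tserver-1"))         # current: 10.111.9.225
```

The sketch only mirrors the symptom; where exactly Kudu caches the resolved address is for the developers to confirm.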

5. With the new network in place, use Impala to create a new table
{*}testTable2{*}. Creation succeeds, but selecting from {*}testTable2{*}
returns the same error:
{code}
ERROR: Unable to open scanner for node with id '0' for Kudu table 
'impala::default.testTable2': Timed out: exceeded configured scan timeout of 
180.000s: after 3 scan attempts: Client connection negotiation failed: client 
connection to 10.110.69.131:7050: Timeout exceeded waiting to connect: Network 
error: Client connection negotiation failed: client connection to 
10.106.224.20:7050: connect: Connection refused (error 111)  {code}
This indicates that the Kudu master still advertises the old tablet server
addresses, even for newly created tables.
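If the {{kudu}} CLI is available inside one of the pods, the stale registrations can be inspected directly; {{kudu tserver list}} prints the RPC addresses each tablet server registered with the masters (master addresses below are the service names from the figures; exact output depends on the Kudu version, so treat this as a sketch):
{code:bash}
# Check overall cluster health; unreachable tablet servers are reported:
kudu cluster ksck service-kudu-test01-master-0:7051,service-kudu-test01-master-1:7051,service-kudu-test01-master-2:7051

# List tablet servers as registered with the masters; with this bug the
# RPC addresses shown should be the old ClusterIPs from figure1:
kudu tserver list service-kudu-test01-master-0:7051,service-kudu-test01-master-1:7051,service-kudu-test01-master-2:7051
{code}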
h3. Workaround

The problem does not occur when Kudu uses the host machines' network instead of
the Kubernetes virtual network.
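Rather than falling back to the host network, a possible middle ground would be to have each server advertise a stable DNS name instead of its pod IP. Kudu's {{--rpc_advertised_addresses}} flag exists for NAT'ed deployments like this one; whether it fully resolves the problem on 1.10.0 is an assumption, untested here:
{code:bash}
# Sketch: each tablet server advertises its stable per-pod Service name,
# so masters and clients re-resolve the current pod IP at connect time.
# (Plus the usual --fs_wal_dir/--fs_data_dirs flags for your deployment.)
kudu-tserver \
  --tserver_master_addrs=service-kudu-test01-master-0:7051,service-kudu-test01-master-1:7051,service-kudu-test01-master-2:7051 \
  --rpc_advertised_addresses=service-kudu-test01-tserver-0:7050
{code}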



> kudu tables fail to insert and scan when k8s network changes
> ------------------------------------------------------------
>
>                 Key: KUDU-3358
>                 URL: https://issues.apache.org/jira/browse/KUDU-3358
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: liu jing
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
