On 01/22/2016 03:42 PM, Jason Wang wrote:
> On 01/22/2016 02:47 PM, Wen Congyang wrote:
>> On 01/22/2016 02:21 PM, Jason Wang wrote:
>>> On 01/22/2016 01:56 PM, Wen Congyang wrote:
>>>> On 01/22/2016 01:41 PM, Jason Wang wrote:
>>>>>> On 01/22/2016 11:28 AM, Wen Congyang wrote:
>>>>>>>> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>>>>>>>>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>>>>>>>>>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>>>>>>>>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>>>>>>>>>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - TCP analysis is missing in the current version; maybe you could point me to a git tree
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   (or another version of the RFC) for a better understanding of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   design. (Just a skeleton for TCP should be sufficient to discuss.)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   split/decouple the reusable parts from the code. So a vague idea is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    this 99% since the work has been done in a thread. Just let the thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    poll sockets directly; then the comparing has the possibility to be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    reused by other kinds of dataplane.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as a filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then, in the primary node, you need just a traffic mirror, which does:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - mirror ingress traffic to the secondary node
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - mirror egress traffic to the packet comparing thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And in the secondary node, you need two filters:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A TCP seq rewriter which adjusts the TCP sequence number.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A traffic redirector which redirects packets from a socket as ingress
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   traffic, and redirects egress traffic to the socket which could be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   polled by the remote packet comparing thread.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> We considered your suggestion to split/decouple
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the reusable parts from the code.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since filter plugins are traversed one by one in order,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> we will split colo-proxy into three filters on each side.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> But in this plan, primary and secondary both have a socket
>>>>>>>>>>>>>>>>>>>>>>>>>>>> server, so startup is a problem.
>>>>>>>>>>>>>>>>>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> [ASCII diagram; layout lost in extraction. It shows "Primary qemu" and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Secondary qemu", each with a guest above a netfilter layer.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Primary filters, in execution order: mirror client (tx),
>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect server (rx), compare (rx), with "guest receive" and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "guest send" paths to the tap below.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Secondary filters, in execution order: mirror server (tx),
>>>>>>>>>>>>>>>>>>>>>>>>>>>> TCP rewriter (adjust ack / adjust seq; all), redirect client (rx).]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> NOTE: filter direction is rx/tx/all
>>>>>>>>>>>>>>>>>>>>>>>>>>>> rx: receive packets sent to the netdev
>>>>>>>>>>>>>>>>>>>>>>>>>>>> tx: receive packets sent by the netdev
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I would still like to decouple the comparer from netfilter. It has two obvious
>>>>>>>>>>>>>>>>>>>>>>>> advantages:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> - it can be reused by other dataplanes (e.g. vhost)
>>>>>>>>>>>>>>>>>>>>>>>> - the secondary redirector could redirect rx to the comparer on the primary node
>>>>>>>>>>>>>>>>>>>>>>>>   directly, which simplifies the design.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> primary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The mirror client will send the packet to the guest and, at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> same time, copy and forward the packet to the secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> If the received packet is a TCP packet, we will adjust ack
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and update the TCP checksum, then send it to the secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest; else send it directly to the guest.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest send packet route
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> primary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The redirect server filter receives the primary guest packet
>>>>>>>>>>>>>>>>>>>>>>>>>>>> but does nothing, just passes it to the next filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The compare filter receives the primary guest packet, then
>>>>>>>>>>>>>>>>>>>>>>>>>>>> waits for the secondary redirect packet to compare it with.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> If the packets are the same, send the primary packet and clear
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the secondary packet; else send the primary packet and do a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> checkpoint.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> If the packet is a TCP packet, we will adjust seq
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and update the TCP checksum, then send it to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter; else send it directly to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> forward packet to primary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the failover scenario (primary is down), the TCP rewriter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> will keep servicing the TCP connections established after
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the last checkpoint.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> How about this plan?
>>>>>>>>>>>>>>>>>>>>>>>> Sounds good.
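As an illustration of the "adjust seq and update TCP checksum" step described for the TCP rewriter, here is a minimal Python sketch. It is not QEMU code. It assumes IPv4, a raw TCP segment held as bytes, and a fixed offset between the two sequence spaces. The helper names (`ones_complement_sum`, `tcp_checksum`, `rewrite_seq`) and the way the offset is obtained are assumptions and do not come from the thread.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's-complement sum, as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP segment.

    The checksum field inside `segment` must already be zero.
    """
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


def rewrite_seq(src_ip: bytes, dst_ip: bytes, segment: bytes, offset: int) -> bytes:
    """Add `offset` (hypothetical, mod 2**32) to the sequence number
    and recompute the TCP checksum."""
    seg = bytearray(segment)
    (seq,) = struct.unpack_from("!I", seg, 4)        # seq number: bytes 4..7
    struct.pack_into("!I", seg, 4, (seq + offset) & 0xFFFFFFFF)
    struct.pack_into("!H", seg, 16, 0)               # zero checksum: bytes 16..17
    struct.pack_into("!H", seg, 16, tcp_checksum(src_ip, dst_ip, bytes(seg)))
    return bytes(seg)
```

Adjusting the ack number on the receive path would look the same with the field at bytes 8..11. A real filter would more likely patch the checksum incrementally instead of recomputing it over the whole segment; full recomputation is used here only because it is easier to check.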
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> And there's indeed no need to distinguish client/server if we reuse the socket
>>>>>>>>>>>>>>>>>>>>>>>> chardev. E.g.:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> In primary node:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>>>>>>>>>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>>>>>>>>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>>>>>>>>>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>>>>>>>>>>>>>>> Why does the mirrorer have an indev?
>>>>>>>>>>>>>>>> As I said in the previous mails, I would like to decouple packet
>>>>>>>>>>>>>>>> comparing from netfilter. You've already done most of this since the
>>>>>>>>>>>>>>>> comparing is done in an independent thread. So the indev here is to
>>>>>>>>>>>>>>>> mirror the packet sent by the guest to the packet comparing thread.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>>>>>>>>>>>>>>> The command line is:
>>>>>>>>>>>>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>>>>>>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>>>>>>>>>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>>>>>>>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>>>>>>>>>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>>>>>>>>>>>>>>> out the packet.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>>>>>>>>>>>>> It depends on whether or not packet comparing is done in a net filter
>>>>>>>>>>>>>>>> (which I prefer not).
>>>>>>>>>>>> I mean that: packet comparing is done in a thread, not a net filter.
>>>>>>>>>>>> The flow of the packet sent from the guest:
>>>>>>>>>>>> 1. traffic-redirector: we will redirect the packet to comparer0; the next
>>>>>>>>>>>>    filter will never see it.
>>>>>>>>>>>> 2. comparing thread: read it from socket chardev comparer0
>>>>>>>>>>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
>>>>>>>>>> Ok, looks like I missed something.
>>>>>>>>>>
>>>>>>>>>> My suggestion tries its best to keep the packet comparing from being tied to
>>>>>>>>>> a filter or netdev. But your suggestion still needs it to be coupled with a
>>>>>>>>>> netdev. Any advantages of doing this (or is there a reason that the packet
>>>>>>>>>> must be sent to netdev after doing comparing?). If not, why not just
>>>>>>>> Yes, the packet must be sent to netdev after doing comparing. If both
>>>>>>>> the primary packet and secondary packet are the same (contain the same
>>>>>>>> application level data), we will drop the secondary packet, and send the
>>>>>>>> primary packet to the netdev. Otherwise, we will sync the state.
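The flow above hands packets to the comparing thread through a socket chardev, which is a byte stream, so the reader needs some way to recover packet boundaries. The sketch below is one possible way to do that, assuming a 4-byte big-endian length prefix in front of each packet. The thread does not specify any framing; the prefix format and the function names (`send_packet`, `recv_exact`, `recv_packet`) are hypothetical.

```python
import socket
import struct


def send_packet(sock: socket.socket, packet: bytes) -> None:
    """Redirector side: write one packet with an assumed 4-byte length prefix."""
    sock.sendall(struct.pack("!I", len(packet)) + packet)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket mid-packet")
        buf += chunk
    return bytes(buf)


def recv_packet(sock: socket.socket) -> bytes:
    """Comparer side: read the length prefix, then the packet body."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

For a quick local check, `socket.socketpair()` can stand in for the chardev connection: send a few packets on one end with `send_packet` and read them back in order on the other with `recv_packet`.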
>>>>>> And drop the primary packet also here?
>>>> No, the primary packet must be sent back to the netdev, so the client can
>>>> receive the response.
>>>>
>>>> For example:
>>>> 1. guest has an ftp server
>>>> 2. we connect to the ftp server via the network
>>>> 3. both primary guest and secondary guest receive this request
>>>> 4. both primary guest and secondary guest ack it
>>>> 5. we compare these two ack packets in the comparing thread
>>>> 6. they are the same (the seqno is different, but it is not important, we can
>>>>    modify it in colo-rewriter). So we drop the secondary packet, and send
>>>>    the primary packet back to netdev
>>>> 7. The primary ack packet is sent to the ftp client via netdev.
>>>>
>>>> The ftp client only cares about the received packets. So if the packets from
>>>> the primary and secondary guests contain the same data, we can say they are
>>>> in the "same" state.
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>> Thanks for the example. But I still don't get why it must be done before
>>> comparing, considering it will always be sent regardless of the result of
>>> comparing?
>> Our goal is that: the connection is OK after failover, and the user doesn't
>> know one of the hosts crashed.
>>
>> If it is sent out regardless of the result of comparing, and the primary host
>> crashes, the connection may be corrupted after failover. For example: the
>> packets from the primary and secondary hosts contain different data, and we
>> send the primary packet before comparing. The primary host crashes before
>> comparing these two packets. After failover, the connection may be reset, or
>> the client doesn't receive the correct data, or some unexpected problems
>> occur.
>>
>> Another example (tcp):
>> 1. primary guest acks 100, and secondary guest only acks 95 (some packet is
>>    lost in the guest)
>> 2. client doesn't resend the lost packet
>> 3. the connection will be recovered after the next checkpoint
>> If we do failover before the next checkpoint, there is no way to recover
>> this connection.
>>
>> If we send out the packet after comparing, we can assume that the client
>> always receives the same data.
>
> Thanks. I think I get the point. So if there's a difference, the primary
> packet will only be sent after checkpoint and we could not assume the
> checkpoint itself is reliable.

Yes.

>
> Back to the filters design. We'd better still decouple packet comparing
> out of netdev. Maybe a little bit more tweak on what you've suggested:
>
> -netdev tap,id=hn0
> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0,indev=comparer2
> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,outdev=comparer2
>
> Just add one more socket for the comparer for sending the primary packet, and
> let f1 redirect its output to netdev?

OK, I understand it now. Thanks for your suggestion.

Wen Congyang

>> Thanks
>> Wen Congyang
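
To summarize the ordering agreed on in this thread (a primary packet is released only after it has been compared, and on a mismatch only after a checkpoint), here is a small Python sketch of the comparer's decision logic. It is a simplified model, not the proposed QEMU implementation: it pairs packets strictly in arrival order, treats the checkpoint as a synchronous call, and defaults to byte equality, whereas the thread compares application-level data. The class name `Comparer` and its callback parameters are hypothetical.

```python
from collections import deque


class Comparer:
    """Holds primary packets until the matching secondary packet arrives."""

    def __init__(self, send_to_netdev, request_checkpoint,
                 same=lambda p, s: p == s):
        self.primary = deque()      # packets from the primary guest, not yet released
        self.secondary = deque()    # packets redirected from the secondary guest
        self.send_to_netdev = send_to_netdev          # hypothetical callback
        self.request_checkpoint = request_checkpoint  # hypothetical, assumed synchronous
        self.same = same            # comparison policy (assumption: byte equality)

    def on_primary(self, pkt: bytes) -> None:
        self.primary.append(pkt)
        self._drain()

    def on_secondary(self, pkt: bytes) -> None:
        self.secondary.append(pkt)
        self._drain()

    def _drain(self) -> None:
        # Compare packets pairwise; nothing leaves before it has been compared.
        while self.primary and self.secondary:
            p = self.primary.popleft()
            s = self.secondary.popleft()    # the secondary copy is always dropped
            if not self.same(p, s):
                self.request_checkpoint()   # sync state before releasing the packet
            self.send_to_netdev(p)
```

With this ordering, a client never sees a primary packet that the secondary could not have produced from its current state, which is the property the failover examples above rely on.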