My immediate reaction is "don't do it", but on the other hand I've never
known people for whom 'money is not a problem' to shy away from
something because of boring concerns like security. So...


Software:

Basically, to do this "correctly" you need to parse all the packets
running in both directions between the two endpoints, tracking the acks
and correctly emulating the behaviour of the TCP stacks on both sides to
determine what is valid data to convert to UDP.

Things to think about:

- IP fragment reassembly
- duplicate packets
- out of order packets
- lost packets
- TCP resends
- TCP checksums
- IP checksums
- TCP sequence number validation
- etc, etc.
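
To give a feel for just the two checksum items, here is a minimal
userland sketch in Python of the standard Internet checksum (RFC 1071)
as it applies to the IPv4 header and to TCP with its pseudo-header. The
function names are mine, purely for illustration; this is not how pf or
the kernel does it, and it only handles IPv4.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement of the one's-complement 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header_ok(header: bytes) -> bool:
    """A header whose checksum field is correct sums to zero."""
    return internet_checksum(header) == 0


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the whole TCP segment.

    src_ip and dst_ip are the 4-byte addresses; segment is the TCP header
    plus payload. Returns 0 if the checksum field inside it is correct.
    """
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return internet_checksum(pseudo + segment)
```

Every segment you copy off the mirror port would need both checks
before its payload is trusted, and that is before any of the sequence
number work.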

Look at pf_normalise_state_tcp() in pf_norm.c and pf_test_state_tcp() in
pf.c for a small taste of the scope of what you're considering if you
want to write this in the kernel.  Further examples of TCP reassembly
can be found in the source code for ports/net/snort or
ports/net/tcpflow.
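
As a toy illustration of the reassembly problem those tools solve, here
is a sketch in Python (userland, one direction of one connection). The
class and its names are hypothetical, not taken from pf, snort or
tcpflow. It copes with duplicates, retransmits, overlaps, out-of-order
arrival and sequence number wraparound, but it does no window
validation and puts no limit on buffered data, both of which a real
implementation must have.

```python
MOD = 1 << 32      # TCP sequence numbers wrap at 2^32
HALF = 1 << 31     # distances >= HALF are treated as "ahead of us"


class StreamReassembler:
    """Toy in-order reassembly for ONE direction of one TCP connection."""

    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq % MOD   # next byte we expect to deliver
        self.pending = {}                   # seq -> payload not yet delivered

    def feed(self, seq: int, payload: bytes) -> bytes:
        """Accept one segment; return whatever bytes are now in order."""
        seq %= MOD
        # Keep the longest payload seen for a given starting sequence number.
        if len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload

        out = bytearray()
        progressed = True
        while progressed:
            progressed = False
            for s in list(self.pending):
                # Bytes of this segment that were already delivered.
                behind = (self.next_seq - s) % MOD
                if behind >= HALF:
                    continue                # starts after a gap, keep waiting
                data = self.pending.pop(s)
                if behind < len(data):      # it carries at least one new byte
                    out += data[behind:]
                    self.next_seq = (self.next_seq + len(data) - behind) % MOD
                    progressed = True
                # else: pure duplicate or stale retransmit, silently dropped
        return bytes(out)
```

Usage would look like this: feed(1000, b"abc") returns b"abc", a later
feed(1006, b"ghi") returns nothing because bytes 1003-1005 are missing,
and feed(1003, b"def") then releases b"defghi". Note that a lost packet
on the mirror port never gets retransmitted for your benefit unless the
real receiver also lost it.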

Of course you can take some shortcuts if you assume that the data you're
getting is clean, and even more if you don't have to parse the TCP
stream but can handle each individual TCP packet as an individual
payload. Perhaps your current problematic implementation already does
this? If so, it's also probably trivial to inject bogus data into the
stream and have it accepted. Maybe that's a feature.
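
For comparison, the per-packet shortcut looks roughly like the sketch
below (Python, userland, hypothetical function name). It assumes
untagged Ethernet II frames carrying unfragmented IPv4, and it
deliberately checks no checksums, sequence numbers or connection state,
which is exactly why anything with plausible headers gets through.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def tcp_payload(frame: bytes):
    """Return (src_port, dst_port, seq, payload) or None.

    Handles only untagged Ethernet II + unfragmented IPv4 + TCP.
    No validation beyond header lengths: this is the "shortcut".
    """
    if len(frame) < 14 + 20 + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None

    ip = 14                                   # IPv4 header starts here
    if frame[ip] >> 4 != 4:
        return None
    ihl = (frame[ip] & 0x0F) * 4              # IP header length in bytes
    (total_len,) = struct.unpack_from("!H", frame, ip + 2)
    (flags_frag,) = struct.unpack_from("!H", frame, ip + 6)
    if ihl < 20 or frame[ip + 9] != IPPROTO_TCP:
        return None
    if flags_frag & 0x3FFF:                   # MF set or offset != 0
        return None                           # fragments need reassembly
    if total_len < ihl + 20 or ip + total_len > len(frame):
        return None

    tcp = ip + ihl                            # TCP header starts here
    sport, dport, seq = struct.unpack_from("!HHI", frame, tcp)
    data_off = (frame[tcp + 12] >> 4) * 4     # TCP header length in bytes
    if data_off < 20 or ihl + data_off > total_len:
        return None
    # Slice by the IP total length so Ethernet padding is not forwarded.
    return sport, dport, seq, frame[tcp + data_off:ip + total_len]
```

Each returned payload would be wrapped in your application header and
sent out as one UDP datagram. A retransmitted segment is sent twice, a
reordered one is sent out of order, and a forged one is sent as if it
were real.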

Remember: lots of attacks can be performed against this hacked-up
monstrosity unless everything is exactly perfect. Good luck with the
Frankenstein code; it's not supported.


Hardware:

- NIC: something that allows you to adjust the interrupt rate, e.g. em,
  bnx. On the other hand, if the packet rate is not too high, a cheaper
  network card without any bells and whistles might give you better
  performance (less overhead in the interrupt handler). I'd say you'd be
  best off buying a bunch and testing them.

- CPU: maximum SINGLE CORE "turbo" speed. Disable the other cores;
  they're not helping you at all. In theory you want the biggest,
  fastest cache possible, but that may not be necessary, depending on
  how much software you're running.

- RAM: fast RAM might help, but you don't need much. Probably the
  minimum you can get in a board with the above CPU.

Also, remember to use the shortest patch cables possible, to reduce
signal propagation latency.
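
To put a number on the cable length, here is a back-of-the-envelope
calculation in Python. The velocity factor of 0.65 is my assumption for
typical twisted-pair copper (check the datasheet for your cable), and
the 50 microsecond budget is the figure from the original question.

```python
C_VACUUM_M_PER_S = 299_792_458   # speed of light in vacuum
VELOCITY_FACTOR = 0.65           # ASSUMPTION: typical twisted-pair copper


def propagation_ns(cable_m: float, velocity_factor: float = VELOCITY_FACTOR) -> float:
    """One-way signal propagation delay over cable_m metres, in nanoseconds."""
    return cable_m / (C_VACUUM_M_PER_S * velocity_factor) * 1e9


def share_of_budget(cable_m: float, budget_us: float = 50.0) -> float:
    """Fraction of a latency budget (in microseconds) spent in the cable."""
    return propagation_ns(cable_m) / (budget_us * 1000.0)


if __name__ == "__main__":
    for metres in (0.5, 1.0, 3.0, 10.0):
        print(f"{metres:5.1f} m: {propagation_ns(metres):6.1f} ns, "
              f"{share_of_budget(metres) * 100:.3f}% of 50 us")
```

Under that assumption a metre of cable costs about 5 ns, so the cables
matter far less than anything on the software list.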



On Thu, Nov 08, 2012 at 08:08:05PM +0200, Dan Shechter wrote:
> For unrelated reasons, I can't directly receive the TCP stream.
> 
> I must copy the TCP data from a running stream to another server. I
> can use tap or just port-mirroring on the switch. So I can't use any
> network stack or leverage any offloading.
> 
> I also need to modify the received data, and add few application
> headers before sending it as a multicast udp stream.
> 
> Winsock is userland. What I want to do is in the kernel, even before
> ip_input. I guess it should be faster.
> 
> 
> On Thu, Nov 8, 2012 at 7:36 PM, Johan Beisser <j...@caustic.org> wrote:
> > On Thu, Nov 8, 2012 at 4:12 AM, Dan Shechter <dans...@gmail.com> wrote:
> >> Hi All,
> >>
> >> <current situation>
> >> A windows 2008 server is receiving TCP traffic from a stock exchange
> >> and sends it, almost as is, using UDP multicast to automated high
> >> frequency traders.
> >>
> >> StockExchange --TCP---> windows2008 ---MCAST-UDP---->
> >>
> >> On average, the time it takes to do the TCP to UDP translation, using
> >> winsock, is 240 microseconds. It can even be as high as 60,000
> >> microseconds.
> >> </current situation>
> >>
> >> <my idea>
> >> 1. Use port mirroring to get the TCP data sent to a dedicated OpenBSD
> >> box with two NICs. One for the TCP, the other for the multicast UDP.
> >
> > You'll incur an extra penalty offloading to the kernel. Winsock is
> > already doing that, though.
> >
> >> 2. Put the TCP port in a promiscuous mode.
> >
> > Why? You can just set up the right bits to listen to on the network,
> > and pull raw frames to be processed. Or, just let the network stack
> > behave as it should.
> >
> >> 3. Write my TCP->UDP logic directly into ether_input.c
> >
> > Any reason to not use pf for this translation?
> >
> >> </my idea>
> >>
> >> Now for the questions:
> >> 1. Am I on the right track? or in other words how crazy is my idea?
> >
> > Pretty crazy. You may want to see if there's hardware accelerated or
> > on NIC TCP off-load options instead.
> >
> >> 2. What would be the latency? Can I achieve 50 microseconds between
> >> getting the interrupt and until sending the new packet through the
> >> NIC?
> >
> > See above. You'll end up having to do some tuning.
> >
> >> 3. Which NIC/CPU/Memory should I use? Money is not a problem.
> >
> > Custom order a few NICs, hire a developer to write a driver to offload
> > TCP/UDP on the NIC, and enable as little kernel interference as
> > possible.
> >
> > Money's not a problem, right?
