Title: Message
John,
> I am assuming that frames are unique, and a retransmission will be different
> to the original packet.
I'm not so sure this assumption is correct. Try tracing a web request to a
server that doesn't talk port 80. You'll probably see three outbound SYN frames
(and three RSTs). These frames will be identical (including IP and TCP
checksums). There are also numerous services running on my local network (esp.
MAC layer protocols such as ARP) which continually squirt the same or similar
frames onto the LAN. This may or may not be important to you, but it goes some
way to explaining why this is a non-trivial task.
> It might be a simple thing to do to modify the code to keep a circular buffer
> of src, dst, CRC and then scan it for duplicates before allowing a packet to
> be merged into the output file.
> The length of the buffer will be the correlation window for the duplicate
> packet check.
>   Read from input files
>   Pick most recent packet
>   Check against buffer
>     If in buffer, then ignore
>     If not in buffer
>       put src, dst, CRC into buffer
>       write to output file
>   Loop until all input files read
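For reference, the quoted steps could be sketched roughly as below. This is only
an illustration under assumptions of mine: each trace is taken to be a list of
(timestamp, src, dst, crc) tuples already sorted by timestamp, "pick most recent
packet" is read as "take the next packet in timestamp order", and the name
merge_dedup and the default window size are made up for the example.

```python
import heapq
from collections import deque

def merge_dedup(traces, window=64):
    """Merge timestamp-sorted traces, dropping packets whose (src, dst, crc)
    key already appears among the last `window` keys written out.

    Each packet is assumed to be a (timestamp, src, dst, crc) tuple.
    """
    recent = deque(maxlen=window)  # circular buffer: oldest key falls off the end
    output = []
    # heapq.merge yields packets from all traces in timestamp order
    for pkt in heapq.merge(*traces, key=lambda p: p[0]):
        key = pkt[1:4]             # (src, dst, crc)
        if key in recent:          # linear scan of the buffer
            continue               # seen within the window: ignore
        recent.append(key)
        output.append(pkt)         # stands in for "write to output file"
    return output

trace_a = [(1.0, "a", "b", 0x1234), (2.0, "a", "b", 0x5678)]
trace_b = [(1.1, "a", "b", 0x1234), (3.0, "c", "d", 0x1234)]
merged = merge_dedup([trace_a, trace_b], window=4)
```

Note that the buffer here is bounded only by a count of keys, so a genuine
retransmission that arrives within the window is silently dropped as well.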
Even if you are confident the duplicates listed above are not a problem for
you, the algorithm will need a little more work. What is going to prompt you to
shift frames out of your buffer (size, time, number of frames)? Also, the TCP
layer checksum is only two bytes long (= 65536 possible checksums). This gets
you into the shared birthday problem:
http://www.cut-the-knot.com/do_you_know/coincidence.shtml
There's a 50% chance of a duplicate checksum within every 301 or so packets. So
you'd need to compare a little more than the TCP checksum. Depending on how
large your trace is, I'd probably opt for storing the whole frame and comparing
the whole packet.
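The birthday figure can be checked with a few lines of Python. This sketch
assumes checksums behave like uniformly random 16-bit values, which real traffic
only approximates; the function names are mine.

```python
import math

def collision_probability(n, space=2**16):
    """Probability that at least two of n uniformly random values coincide."""
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique

def packets_for_probability(target=0.5, space=2**16):
    """Smallest packet count whose collision probability reaches `target`."""
    n = 0
    p_unique = 1.0
    while 1.0 - p_unique < target:
        p_unique *= (space - n) / space
        n += 1
    return n

# Common closed-form approximation: sqrt(2 * N * ln 2)
approx = math.sqrt(2 * 2**16 * math.log(2))

print(packets_for_probability(), round(approx, 1))
```

Both the exact loop and the approximation land at roughly 300 packets, which is
why a two-byte checksum on its own is too weak a key for the duplicate check.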
Nastiest of all, IMHO, you need to make a decision on the frame's timestamp.
(Again, maybe this isn't important to you...) Which all leads back to my last
question:
>> It may be easier to advise you if we understood what sort of packets were
>> being duplicated? Eg is it MAC level broadcasts, RIP updates etc, HSRP
>> hellos?
You say you are running "capture processes on one system". Does this mean
1) you're tracing two NICs on the same machine,
2) the same NIC with different filters, or
3) you're running the same filter on the same NIC with different trace start
   times?
If the answer is #3 (and, depending on the filter, #2 too) you will capture
exactly the same frames in exactly the same order with the same relative time
stamps, so your algorithm will be fine; in fact you needn't worry about a
buffer at all. For anything else you'll need to put your thinking hat on. :-)
That having been said, I'm quite tempted to see if I can write something that
would do this sort of thing. At least until I convince myself it really is too
tricky a problem. :-)
Cheers,
Alistair