Re: [tcpdump-workers] Legacy Linux kernel support

2019-10-08 Thread David Laight
From: Mario Rugiero
> Sent: 07 October 2019 19:07
...
> > If we were to drop TPACKET_V1 and TPACKET_V2 support, that'd mean we'd be
> > dropping 2.x kernels and older 3.x kernels as targets, and would require
> > that the headers with which libpcap is being built be new enough to
> > support TPACKET_V3.
> >
> Yes. That's what I'm proposing.

I don't think it makes sense to drop support for kernels which distributions 
still have under LTS.
I think that means 2.6.32 still needs to be supported.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
___
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers


Re: [tcpdump-workers] Libpcap timeout settings in tcpdump - too long when printing to a terminal?

2015-02-11 Thread David Laight
From: Guy Harris
 On Jan 9, 2015, at 8:30 AM, Michael Richardson m...@sandelman.ca wrote:
 
  Guy Harris g...@alum.mit.edu wrote:
  The longer timeout can reduce capturing overhead, and if you're
  capturing a high volume of traffic to a file, it's probably the right
  timeout to have.

If you are capturing a high volume of traffic even a short (10ms) timeout
won't expire.

  If, however, you're printing packets to the console,
  you're probably doomed if it's a high volume of traffic, and may want
  less of a delay if it's a low volume of traffic.
 
  Should we reduce the timeout if -w isn't specified - or do so if -w
  isn't specified *and* if we're outputting to a terminal (isatty(1)
  returns a non-zero value)?  Should we use immediate mode if libpcap
 
  Yes, I think that -w not specified, and isatty()==1.

What about piping through 'tee', 'grep' or into a pager?
In all those cases you want immediate output (as if directly writing the tty).
This also means you need an fflush(stdout) before waiting for more data.
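
A rough sketch of that decision, assuming the only case where batching is harmless is stdout redirected to a regular file. Both helpers are hypothetical and are not tcpdump functions:

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not tcpdump code): return 1 when packets should be
 * delivered and printed without batching, 0 when buffering is acceptable.
 * Assumption: only output redirected to a regular file tolerates delay;
 * a tty, pipe or socket probably has a reader waiting.
 */
int want_immediate_output(int writing_savefile, int out_fd)
{
    struct stat st;

    if (writing_savefile)
        return 0;               /* -w given: keep the batching */
    if (fstat(out_fd, &st) != 0)
        return 1;               /* unknown destination: favour low latency */
    return !S_ISREG(st.st_mode);
}

/* Print one line and push it out before blocking for more packets. */
void print_packet_line(FILE *out, const char *line, int immediate)
{
    fputs(line, out);
    fputc('\n', out);
    if (immediate)
        fflush(out);
}
```

With this test, output piped into 'tee', 'grep' or a pager is treated the same as a terminal.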

Even with -w you can have problems - it is silly to have to wait a significant
time between running a test that generates a small number of packets and typing
^C to stop tcpdump.

 
 OK, I've implemented that for immediate mode, i.e. immediate mode if -w isn't
 specified and isatty(1) is true, and added a --immediate-mode flag so the
 nerds in the audience have a knob to tweak. :-)
 
 If pcap_set_immediate_mode() isn't available, should it set the timeout to a
 lower value instead, in those cases?
 
 Should we reduce the default timeout?  Should we have a command-line flag to
 set the timeout?

I don't see any point in delaying more than 100ms.
Returning to user every 50ms shouldn't be a problem either.

David



Re: [tcpdump-workers] Libpcap performance problem

2015-01-28 Thread David Laight
From: Rick Jones
 On 01/28/2015 06:57 AM, Giray Simsek wrote:
  Hi,
  We are currently working on testing Linux network performance. We
  have two Linux machines in our test setup. Machine1 is the attacker
  machine from which we are sending SYN packets to Machine2 at a rate
  of 3million pps. We are able to receive these packets on Machine2's
  external interface and forward them through the internal interface
  without dropping any packets. So far no problems. However, when we
  start another app that captures traffic on Machine2's external
  interface using libpcap, the amount of traffic that is forwarded
  drops significantly. Obviously, this second libpcap app becomes a
  bottleneck. It can capture only about 800Kpps of traffic and only
  about 800Kpps can be forwarded in this case. This drop in the amount
  of forwarded traffic is not acceptable for us.
  Is there any way we can overcome this problem? Are there any settings
  on Os, ixgbe driver or libpcap that will allow us to forward all the
  traffic?
  Both machines are running Linux kernel 3.15.
 
 TCP SYN segments would be something like 66 bytes per (I'm assuming some
 options being set in the SYN).  At 3 million packets per second, that
 would be 198 million bytes per second.  Perhaps overly paranoid of me
 but can the storage on Machine2 keep-up with that without say the bulk
 of the RAM being taken-over by buffer cache and perhaps inhibiting skb
 allocations?

More likely is that running pcap requires that every received packet
be copied (so it can be delivered to both pcap and IP).
The cost of doing this could easily be significant.

Even setting a pcap filter to return no packets will invoke the
same overhead.
As does running the dhcp client!

David



Re: [tcpdump-workers] RFC: DLT for application TCP stream capture

2015-01-15 Thread David Laight
From: Denis Ovsienko
 Eventually, we'll be using this format to debug multi-path TCP, in which case
 the IP addresses (and maybe even the IP4/IP6-ness of it) might change.
 
 Also there exists SCTP, which implements the concept of variable (0..65535)
 number of streams for each direction of an association between a pair of
 sockets (in TCP these two things are the same), so a stream_id field in the
 encoding (0 for TCP and UDP) could be handy for SCTP payload representation.

SCTP 'streams' aren't entirely separate data flows.

There is only one transmit window and one set of acks.
Each data 'chunk' does have a stream-id (and in-stream sequence number) so
that data for a specific stream can be delivered to the application even
when data for a different stream has been lost (and retransmit requested).

SCTP also has a field that identifies the data encoding for each chunk.
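
As an illustration of the fields mentioned above, here is a minimal sketch that pulls them out of a DATA chunk header, assuming the 16-byte layout from RFC 4960 section 3.3.1. The struct and function names are made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of an SCTP DATA chunk header (RFC 4960, section 3.3.1). */
struct sctp_data_info {
    uint32_t tsn;        /* one TSN space per association, not per stream */
    uint16_t stream_id;  /* which stream the chunk belongs to */
    uint16_t stream_seq; /* ordering within that stream */
    uint32_t ppid;       /* payload protocol identifier (data encoding) */
};

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if buf is too short or is not a DATA chunk. */
int parse_sctp_data_chunk(const uint8_t *buf, size_t len,
                          struct sctp_data_info *out)
{
    if (len < 16 || buf[0] != 0)    /* chunk type 0 is DATA */
        return -1;
    out->tsn = get_be32(buf + 4);
    out->stream_id = get_be16(buf + 8);
    out->stream_seq = get_be16(buf + 10);
    out->ppid = get_be32(buf + 12);
    return 0;
}
```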

If you are trying to add this into some kind of packet header then
you'll need some kind of free-format (but don't use XML).

David



Re: [tcpdump-workers] RFC: DLT for application TCP stream capture

2015-01-15 Thread David Laight
 I'd prefer to also have a flag to say if this segment was received or
 transmitted - I've never liked inferring that information from the
 identity of the source/dest. addresses. It then makes it impossible to
 sensibly analyse the file if you don't know the underlying networking
 configuration, as may well be the case for .pcap(ng) files copied from
 one machine to another.

This is even more important when people use pcap file formats for
things like SS7 (telephone signalling) where you might be monitoring
sixteen (or even more) bidirectional 64k signalling links and need to
know precisely which of the 32+ data flows being monitored each packet
came from (ie the SS7 pointcodes and SLC of each link).

David



Re: [tcpdump-workers] Libpcap timeout settings in tcpdump - too long when printing to a terminal?

2015-01-12 Thread David Laight
From: Guy Harris
 On Jan 9, 2015, at 2:09 AM, Michal Sekletar msekl...@redhat.com wrote:
 
  Can't we use new default timeout value (lower) if we detect TPACKET_V3,
 
 The first sentence of my original mail was "With TPACKET_V3 support, Linux
 users are discovering what those of us using BSD-flavored OSes have known
 for quite a while:"
 
 This is not a TPACKET_V3 issue, it's a buffering issue.  I notice it when
 testing tcpdump on Macs, which don't have PF_PACKET sockets of any sort,
 they have BPF; if, for example, I test on the generally-low-traffic loopback
 interface by pinging 127.0.0.1, the packets don't show up continuously,
 they show up in batches, with a 1-second delay.

Is there any real reason for a delay as long as 1 second?
If the traffic is light (which might mean 100/sec) then processing
every packet separately isn't going to be a problem.
Clearly, at very high rates you do want to defer the wakeup.

So reducing the delay from 1 sec to 100ms (or even 50ms) will have
little effect on the ability to process the received data.
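
The arithmetic behind that can be written down as a small sketch. It assumes a steady packet rate and ignores wakeups caused by the buffer filling; the function names are illustrative only:

```c
/* Upper bound on wakeups per second caused purely by the read timeout. */
unsigned timeout_wakeups_per_sec(unsigned timeout_ms)
{
    return timeout_ms ? 1000u / timeout_ms : 0;
}

/* Packets that accumulate during one timeout period at a steady rate. */
unsigned packets_per_batch(unsigned pkts_per_sec, unsigned timeout_ms)
{
    return pkts_per_sec * timeout_ms / 1000u;
}
```

At 100 packets/sec, a 100ms timeout gives at most 10 wakeups a second with about 10 packets each, against one wakeup with 100 packets for a 1 second timeout.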

David



Re: [tcpdump-workers] [tcpdump] IEEE float decoded incorrectly(#333)

2013-10-01 Thread David Laight
 from
 https://github.com/the-tcpdump-group/tcpdump/issues/333
 
 details an issue where differences in arch and compiler result in different
 extractions of floating point objects in LMP packets. Guy discovers it has
 something to do with assumptions about x86 SSE.
 
 and I think we might find more expertise on the list.

You need to read the C standard as well.
The C standard certainly allows 'double' arithmetic to be done
using the 80-bit x87 registers without requiring that
intermediate values (even those assigned to (implicit) register
variables) be truncated to 64 bits.
I suspect that is a general statement about floating point precision.
Either that or there is something that allows 'float' arithmetic to be
performed as 'float' or by converting to 'double'.
This means that the expression below can be calculated with 32-bit, 64-bit
or 80-bit intermediates depending on the whim of the implementation.

Passing an appropriate -march=xxx will allow 32-bit code to use SSE
instructions - but portable code shouldn't require it.
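
A sketch of how code can cope without relying on compiler flags, assuming a C99 compiler. FLT_EVAL_METHOD from <float.h> reports the precision the implementation uses for intermediates, and storing each intermediate in a volatile float discards any excess precision. The function names are invented for this example:

```c
#include <float.h>

/*
 * FLT_EVAL_METHOD (C99): 0 = evaluate in the operand type,
 * 1 = float and double evaluated as double,
 * 2 = everything evaluated as long double, -1 = indeterminable.
 */
int float_eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/*
 * The expression from the issue (f*8/100), with each intermediate forced
 * through a float object so that it is rounded to float after every
 * operation, whatever registers the arithmetic was done in.
 */
float scale_rounded(float f)
{
    volatile float product = f * 8;
    volatile float quotient = product / 100;
    return quotient;
}
```

The volatile stores cost a little speed, so this is only worth doing where bit-identical output across builds matters, as it does for test suite output.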

David

 
 Date: Mon, 30 Sep 2013 16:31:47 -0700
 Subject: Re: [tcpdump] IEEE float decoded incorrectly (#333)
 
 So if I compile
 
 float
 xxx(float f)
 {
 return f*8/100;
 }
 
 with gcc -O2 -m32 -S, I get
 
 xxx:
 .LFB0:
 .cfi_startproc
 flds    .LC0
 fmuls   4(%esp)
 fdivs   .LC1
 ret
 .cfi_endproc
 
 and if I compile it with gcc -O2 -S (64-bit), I get
 
 xxx:
 .LFB0:
 .cfi_startproc
 mulss   .LC0(%rip), %xmm0
 divss   .LC1(%rip), %xmm0
 ret
 .cfi_endproc
 
 So the 32-bit code is using the x87 instructions, using the 80-bit registers on
 the stack, and doing 80-bit extended-precision arithmetic, and the 64-bit code
 is using SSE instructions, and presumably doing 32-bit single-precision
 arithmetic; the Intel manuals don't explicitly say that the single-precision
 scalar floating point SSE instructions perform single-precision arithmetic on
 their single-precision operands, but that's my guess - somebody more
 knowledgeable than I am about IEEE floating point could probably answer this,
 and, even if it's double-precision, that's still different from
 extended-precision.
 
 I.e., it's a consequence of i686 not being guaranteed to have SSE (it was
 introduced in the Pentium III, not the Pentium Pro), so that compiling for i686
 means using the x87 instructions for floating point, and the x87 instructions
 doing everything in 80-bit extended precision, not of the use of a union.
 
 Sadly, the obvious choice of configuring with CFLAGS=-m32 -mfpmath=sse
 doesn't help - the compiler decides, for some reason, that I didn't really mean
 -mfpmath=sse and says "hey, 32-bit x86 isn't guaranteed to have SSE, so I'm
 just going to build with x87 instructions" (or, as it puts it, "warning: SSE
 instruction set disabled, using 387 arithmetics [enabled by default]"), and
 some other things I tried, such as -mtune=pentium3, didn't help. I'll leave it
 to somebody more familiar with GCC-wrangling to figure out how to beat GCC into
 producing 32-bit code that uses SSE rather than x87 floating-point
 instructions, but I suspect that, if you build a 32-bit tcpdump with whatever
 options those are, make check will succeed, at least on those tests.
 
 (32-bit tcpdump is failing with some other problems as well; I'll check whether
 they're also problems with 64-bit tcpdump, and see what's going on there.)
 
 


Re: [tcpdump-workers] capturing packets with identical MAC for source and destination

2013-09-03 Thread David Laight
 Currently we are experiencing bad network performance, and in the log of a
 Linux server I have a lot of these messages:
 
 Sep  2 10:16:08 pc60181 kernel: [4286760.823563] br0: received packet on eth0 with own address as source address

Since you know your own MAC address, you can just filter for it directly.

However I think you'll find that it is a broadcast packet and that you
have a loop in the network somewhere.

A simple trace of 'everything' will probably show bursts of repeated packets.
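
For the first suggestion, a minimal sketch that builds a capture filter matching frames whose source is your own MAC address. "ether src" is standard pcap-filter syntax; the helper name and the address used in the usage note are placeholders:

```c
#include <stdint.h>
#include <stdio.h>

/* Write "ether src xx:xx:xx:xx:xx:xx" into buf; returns 0 on success,
 * -1 if the buffer is too small. */
int build_own_mac_filter(char *buf, size_t len, const uint8_t mac[6])
{
    int n = snprintf(buf, len, "ether src %02x:%02x:%02x:%02x:%02x:%02x",
                     mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a made-up address of 00:11:22:33:44:55 this produces "ether src 00:11:22:33:44:55", which can be given to tcpdump on the command line or to pcap_compile().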

David





Re: [tcpdump-workers] -W options to gcc

2013-03-28 Thread David Laight
  Gisle == Gisle Vanem gva...@broadpark.no writes:
 Gisle I compile using MingW (gcc 4.7.2) and normally I use
 Gisle -Wall -W.
 
 sure, I'd like to get to -Wall -Werror at some point, but for the
 moment, I want to know how to include -Wgcc-things when we are using
 gcc, and omit when we aren't.

You probably want to include (some of) them when using some
other compilers (eg clang) that are likely to support the
same options.

There are problems with -Werror for release software in that
different versions of gcc will detect different errors.
Particularly if some aggressive function inlining is done.

David





Re: [tcpdump-workers] mmap consumes more CPU

2012-11-18 Thread David Laight
 hi,
   I just checked the two mechanism :
 (1) Using mmap to fetch packets from kernel to userspace
 (2) Using recvfrom() call to fetch packets
 
 I see top reports
 (1) 34% memory 20% cpu usage
 (2) 21% memory 7% cpu usage !

It is worth remembering that the cpu usage reported by top isn't
worth the paper it is printed on for many workloads.
IIRC it is based on the cpu state when the timer interrupt fires.
Processes that are scheduled very often and run for short periods
tend to get miscounted.

Since the Linux scheduler doesn't get a high-res timestamp every time
it does a process switch, about the only way to measure idle time
is to put a very low priority process into a counting loop.
Unfortunately the scheduler might make it difficult to make the
process's priority low enough.

David






Re: [tcpdump-workers] Modular arithmetic

2012-09-10 Thread David Laight
 On Fri, Sep 07, 2012 at 07:49:10AM +, George Bakos wrote:
  Gents,
  Any fundamental reason why the following (, etc.) shouldn't be
  included in net/core/filter.c?
 
  case BPF_S_ALU_MOD_X:
  if (X == 0)
  return 0;
  A %= X;
  continue;
 
 Copying netdev.
 
 In principle no reason against it, but you may need to update
 the various BPF JITs too that Linux now has too.

What about the other OS - eg all the BSDs?
I had a vague idea that BPF was supposed to be reasonably portable.
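
For anyone adding the same instruction to another BPF implementation, the semantics of the quoted case are small enough to restate as a standalone sketch. The function name is invented; this is not code from any kernel or from libpcap:

```c
#include <stdint.h>

/*
 * Apply A %= X. Returns 1 and updates *a on success. Returns 0 when x is
 * zero, which the quoted kernel code handles by making the whole filter
 * return 0 (reject the packet) rather than dividing by zero.
 */
int bpf_alu_mod_x(uint32_t *a, uint32_t x)
{
    if (x == 0)
        return 0;
    *a %= x;
    return 1;
}
```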

David(d...@netbsd.org)






Re: [tcpdump-workers] Modular arithmetic

2012-09-10 Thread David Laight
  What about the other OS - eg all the BSDs?
  I had a vague idea that BPF was supposed to be reasonably portable.
 
 Yes, does it mean BPF is frozen ?
 
 Or is BSD so hard to update these days ?

Not really - but there are some other places that need updating in order
to make this useful for cross-platform tools (like tcpdump).

The 'real fun (tm)' happens when NetBSD tries to run Linux binaries
that include the Linux libpcap.

David


Re: [tcpdump-workers] Multifile patch

2012-09-05 Thread David Laight
  On windows you can't pass 'FILE *' into shared libraries,
  they are likely to have their own copies of the stdio
  libraries - with different FILE structures.
  (eg if one part is compiled with debug enabled).
 
 In this patch, the library into which VFile is being passed is called
 the C library, i.e., with the patch, we're not passing it to
 libpcap/WinPcap, we're passing it to fgets(); if you couldn't pass a
 FILE * to, say, fgets(), the stdio libraries would be completely
 useless.

Did I miss that this is a tcpdump change, not a pcap one? :-(

David





Re: [tcpdump-workers] Multifile patch

2012-09-04 Thread David Laight
 On Sep 3, 2012, at 7:13 PM, Michael Richardson wrote:
 
  Wesley, is fopen(/dev/stdin) really the most portal
 
 (Presumably portable.)
 
  way to get a reference to stein?
 
 Definitely not - it will probably work on most modern UN*Xes (Linux,
 *BSD/OS X, and Solaris; I don't know about HP-UX or AIX), but not on
 Windows, so it won't work in WinDump.
 
   I'd have thought that doing:
 VFile=stdin;
 
  was the best way?
 
 Yes.

I seem to be missing half these mails 

On windows you can't pass 'FILE *' into shared libraries,
they are likely to have their own copies of the stdio
libraries - with different FILE structures.
(eg if one part is compiled with debug enabled).

Probably the most portable way is using fdopen(0, ...);
that will work on Windows - fileno(stdin) is still 0.
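
A minimal sketch of that approach, assuming a POSIX C library (Windows C runtimes spell the call _fdopen()). The wrapper name is made up:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for fdopen() */
#endif
#include <stdio.h>

/*
 * Wrap an already-open descriptor in a stdio stream created by the
 * caller's own C library, so no FILE structure crosses a library boundary.
 * Returns NULL on failure.
 */
FILE *stream_from_fd(int fd)
{
    return fdopen(fd, "r");
}
```

Reading the file list from standard input would then be `VFile = stream_from_fd(0);`.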

David






Re: [tcpdump-workers] Building tcpdump with static libraries

2012-05-29 Thread David Laight
 
 Can I know how to build tcpdump with static libraries rather than with
 shared libraries ?

Why are you trying to do this?
As Guy says in his recent posts you can't (and shouldn't) try to link
statically to all the system libraries.

If you are trying to link with a specific version of libpcap,
then look into the RPATH, RUNPATH, SONAME and NEEDED entries in the
ELF dynamic section and the use of $ORIGIN.

David


-
This is the tcpdump-workers list.
Visit https://cod.sandelman.ca/ to unsubscribe.


Re: [tcpdump-workers] Multiple interface capture and thread safety

2012-05-10 Thread David Laight
 
 As I have to deal with asymm. paths and perform flow analysis, I must 
 ensure that the packets of a flow are analyzed in temporal order, no 
 matter from which interface they came through.

You'll probably only manage that if the underlying low level
device driver (or preferably the hardware itself - because
of interrupt mitigation) adds a rx timestamp to the frame
AND that value is made available through the pcap library.

That might mean a very recent linux kernel (there are
current discussions on netdev about timestamps).
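
Assuming trustworthy receive timestamps are available, merging the per-interface packet queues into temporal order is then a plain comparison. A sketch, with a hypothetical record type standing in for whatever the application keeps per packet:

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical record: one captured packet plus the source it came from. */
struct stamped_pkt {
    struct timeval ts;   /* receive timestamp, as in struct pcap_pkthdr */
    int ifindex;         /* which capture source delivered it */
};

/* Negative, zero or positive as a is earlier than, equal to or later than b. */
int ts_compare(const struct timeval *a, const struct timeval *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_usec != b->tv_usec)
        return a->tv_usec < b->tv_usec ? -1 : 1;
    return 0;
}

/* Choose which head-of-queue packet to analyse next; NULL means that
 * source currently has nothing pending. */
const struct stamped_pkt *next_in_time(const struct stamped_pkt *a,
                                       const struct stamped_pkt *b)
{
    if (a == NULL)
        return b;
    if (b == NULL)
        return a;
    return ts_compare(&a->ts, &b->ts) <= 0 ? a : b;
}
```

The result is only as good as the timestamps: if they are added late (after interrupt mitigation), packets from different interfaces can still appear out of order.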

David




[tcpdump-workers] 64bit support on netbsd

2012-02-09 Thread David Laight
There is a report on one of the netbsd lists (might have
been a developer-only list) that tcpdump (etc) aren't
working on 64bit netbsd platforms.

IIRC it had something to do with 'struct timeval' and friends.

I'm not sure of the full details but it might be related to:
1) NetBSD recently changing from 32bit to 64bit time_t.
2) 'struct timeval' being 48bits for i386 but 64bits for amd64,  
  and someone trying to run the 32bit pcap code on a 64bit system.

However I'm not certain where non fixed size time fields get used.
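
For contrast, the per-packet header stored in a classic pcap savefile uses four fixed 32-bit fields and so does not depend on the platform's time_t. A sketch of converting a native 'struct timeval' into that form; the struct and function names here are illustrative, not libpcap's own:

```c
#include <stdint.h>
#include <sys/time.h>

/* Per-packet header as stored in a classic pcap savefile: four 32-bit
 * fields, whatever the size of the platform's time_t. */
struct savefile_pkthdr {
    uint32_t ts_sec;
    uint32_t ts_usec;
    uint32_t caplen;
    uint32_t len;
};

/* Convert a native timestamp to the fixed-size on-disk form. */
void fill_savefile_ts(struct savefile_pkthdr *h, const struct timeval *tv)
{
    h->ts_sec = (uint32_t)tv->tv_sec;
    h->ts_usec = (uint32_t)tv->tv_usec;
}
```

Any interface that passes a raw 'struct timeval' between 32-bit and 64-bit code, rather than a fixed-size form like this, would be a candidate for the reported breakage.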

David  (also d...@netbsd.org)




Re: [tcpdump-workers] Fwd: New datasource implementation

2012-01-04 Thread David Laight
 
 Do you have any references for this, so I can see exactly 
 what it means?
 
 If it just means that if you build an executable image (or 
 shared library), linking it with library A, and library A is 
 a shared library that is linked with library B, and if the executable 
 image is *not* linked with library B when you build it, if 
 the image refers to routines in library B those references 
 will *not* be treated as resolved by virtue of library B 
 being dragged in by library A, that doesn't appear to break 
 the scenario I describe.

This comes from reading the NetBSD 'pkgsrc' mailing list,
where some packages are failing to build with newer versions
of 'gld'. The 'solution' (which isn't really one) is a
linker option to copy DT_NEEDED entries from referenced
libraries into the program image.

  This breaks many things!
 
 Does it, in particular, break the scenario I describe?

If I misunderstood your scenario, maybe not!

But consider something like:

I have an old product 'A' that releases liba.so.
I now write a new product 'B' that shares quite a
lot of code with product 'A', so I generate a libab.so
containing the common parts, and build liba.so with
a DT_NEEDED entry for libab.so (and build libb.so).
I would like existing program binaries and makefiles
to still work unchanged.

David





Re: [tcpdump-workers] Fwd: New datasource implementation

2012-01-03 Thread David Laight
 
 On all modern UN*Xes, as far as I know, a dynamic library can 
 be linked with another dynamic library, and if a program is 
 explicitly linked with the first of those libraries, but 
 *not* explicitly linked with the second of those libraries, 
 the program will still work - the run-time linker will see 
 that the first library requires the second library and will 
 load and bind it in at run-time.

The gnu/linux folks have recently changed the behaviour
of gld (probably contrary to the elf specification, but
they tend not to care about standards) so that the linker
will not assume that libraries referenced by DT_NEEDED
entries in other libraries have their symbols made available
to the main program. This breaks many things!
(It also stops you implementing a shared library in
separate pieces.)

What happens at run time depends on the dynamic linker.

David




Re: [tcpdump-workers] tcpdump license and Nokia

2011-12-21 Thread David Laight
Chris Maynard wrote:

 Tyson Key tyson.key at gmail.com writes:
 
  As far as I'm aware, TCPDump is released under the terms of the BSD Licence
  - meaning that Nokia haven't got any obligations regarding releasing their
  modifications; and whilst it's not the most reliable information source on
  the planet, Wikipedia seems to corroborate that thought.
 
 Thanks Tyson.  Yes, I see that now in the sources.  Still, might it be
 useful to mention the license somewhere on the web site?

The sources seem to contain a mix of zero, 2, 3 and 4
clause BSD licences from Berkeley.
I thought Berkeley had made a general statement that
some of those clauses (esp. the advertising one) could
be removed??

David




Re: [tcpdump-workers] capturing on both interfaces simultaneously

2011-12-13 Thread David Laight
 
 
 On Dec 12, 2011, at 3:59 AM, David Laight wrote:
 
  The linux libpcap has a poll() in the 'memory mapped'
  kernel interface (in order to check for errors).
  If the application is using poll() this is an unnecessary
  system call.
 
 The only way libpcap can infer that the application is using 
 poll() is if it has put the pcap_t in non-blocking mode.  
 libpcap used to avoid the poll() in that case, but that was 
 causing applications to loop infinitely chewing up CPU; see 
 the thread that starts at
 
   http://thread.gmane.org/gmane.network.tcpdump.devel/3937
 
 That poll() is unnecessary in non-blocking mode only if the 
 application isn't expecting libpcap to return errors, and is 
 itself checking for those errors after the poll() call.  That 
 would be the case only if the application knew it had to do 
 that special Linux-specific stuff. 

Perhaps it could be done every 256'th time through.
That would pick up an actual error quite quickly, but
reduce the overhead a lot.
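
A sketch of that throttle, using a hypothetical per-handle counter that is not part of libpcap:

```c
/* Hypothetical per-handle state; not a libpcap structure. */
struct poll_throttle {
    unsigned calls;
};

/* Returns 1 on every 256th call, meaning "do the error-check poll() now",
 * and 0 otherwise. */
int error_check_due(struct poll_throttle *t)
{
    return (++t->calls & 0xffu) == 0;
}
```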

  I also think that interface could defer freeing the last
  rx buffer until the request to read another packet.
  That would avoid the necessity of a buffer copy
  for applications that don't want to use callbacks.
 
 That strategy was attempted by
 
   commit 54ef309e921c11a4e80cd7a26d9e25d30c833e14
...

 If you have a change that will eliminate the need for the 
 copy *without* breaking Mike Kershaw's code, please contribute it.

Is there a sane way I can look at the diff of that commit?

I think it should be enough to set the deferred
  tp_status = TP_STATUS_KERNEL
at the top of pcap_read_linux_mmap() and at the top of the
loop that processes packets.
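
A simplified model of that idea, to show the ordering only. The ring here is an ordinary array rather than the kernel's mmap'd area, the two status constants stand in for TP_STATUS_KERNEL and TP_STATUS_USER, and all names are invented:

```c
/* Stand-ins for the kernel's TP_STATUS_KERNEL / TP_STATUS_USER values. */
enum { SLOT_KERNEL = 0, SLOT_USER = 1 };

#define RING_SLOTS 4

/* Simplified model of the ring; not libpcap's real layout. */
struct ring {
    volatile unsigned long status[RING_SLOTS];
    unsigned head;      /* next slot to examine */
    int deferred;       /* slot still lent to the caller, or -1 */
};

/*
 * Hand the previously returned slot back to the kernel, then return the
 * index of the next user-owned slot, or -1 if nothing new has arrived.
 */
int ring_next(struct ring *r)
{
    int slot;

    if (r->deferred >= 0) {
        r->status[r->deferred] = SLOT_KERNEL;
        r->deferred = -1;
    }
    if (r->status[r->head] != SLOT_USER)
        return -1;
    slot = (int)r->head;
    r->deferred = slot;     /* caller may read it until the next call */
    r->head = (r->head + 1) % RING_SLOTS;
    return slot;
}
```

The caller can use the returned slot's data in place, without a copy, until it asks for the next packet.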

David




Re: [tcpdump-workers] capturing on both interfaces simultaneously

2011-12-13 Thread David Laight
 On Dec 12, 2011, at 1:41 PM, Guy Harris wrote:
 
  On Dec 12, 2011, at 3:59 AM, David Laight wrote:
  
  I also think that interface could defer freeing the last
  rx buffer until the request to read another packet.
  That would avoid the necessity of a buffer copy
  for applications that don't want to use callbacks.
 
 Actually, that *might* be doable, with some additional 
 complexity, although it does leave one less buffer slot 
 available to the kernel...

Only until the application tries to read the next packet.
Which will be almost immediately it has finished processing it.

The V1/V2 header stuff looks like a badly fixed fubar!
It might be worth sorting out the correct offset for a
32bit access to tp_status! - needed for big-endian 64bit
systems.

David




Re: [tcpdump-workers] capturing on both interfaces simultaneously

2011-12-13 Thread David Laight
 
  That poll() is unnecessary in non-blocking mode only if the 
  application isn't expecting libpcap to return errors, and is 
  itself checking for those errors after the poll() call.  That 
  would be the case only if the application knew it had to do 
  that special Linux-specific stuff. 
 
 Perhaps it could be done every 256'th time through.
 That would pick up an actual error quite quickly, but
 reduce the overhead a lot.

Actually skip the poll if a packet has been found since
the last time the check was done.

Then code that does:
select(...)
while (read_packet())
...
done;
won't call poll() inside the library every time it
finishes processing the available data.
But if there is a failure, it will be detected immediately.
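
The same idea as a sketch, with a hypothetical flag kept per capture handle (none of these names exist in libpcap):

```c
/* Hypothetical per-handle state; not a libpcap structure. */
struct poll_skip {
    int got_packet;   /* set when a read returned data since the last check */
};

/* Call whenever a packet is handed to the application. */
void note_packet(struct poll_skip *s)
{
    s->got_packet = 1;
}

/*
 * Call when the ring is found empty in non-blocking mode. Returns 1 if the
 * error-check poll() should run now, 0 if it can be skipped this time.
 */
int need_error_poll(struct poll_skip *s)
{
    if (s->got_packet) {
        s->got_packet = 0;   /* the ring was evidently working; skip once */
        return 0;
    }
    return 1;
}
```

An application that keeps being woken by select() without ever receiving a packet therefore still reaches the poll() on its first empty read.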

David




Re: [tcpdump-workers] capturing on both interfaces simultaneously

2011-12-12 Thread David Laight
 
 Is poll() better than select ?

poll() and select() use the same basic kernel code.
poll() is generally better since it doesn't have problems
with high numbered fds, and doesn't require a sparse
fd map to be scanned.
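
A minimal sketch of waiting on a single descriptor with poll(), for example one obtained from pcap_get_selectable_fd(). The wrapper name and return convention are this example's own:

```c
#include <poll.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable (or hung up / in error, so a read will report it),
 * 0 on timeout, -1 if poll() failed or the descriptor is invalid.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;            /* any descriptor value; no FD_SETSIZE limit */
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : -1;
}
```

To capture on two interfaces at once, the same call takes an array with one 'struct pollfd' per capture handle.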

The linux libpcap has a poll() in the 'memory mapped'
kernel interface (in order to check for errors).
If the application is using poll() this is an unnecessary
system call.

I also think that interface could defer freeing the last
rx buffer until the request to read another packet.
That would avoid the necessity of a buffer copy
for applications that don't want to use callbacks.

David




Re: [tcpdump-workers] [PATCH] tcpdump -s 0 improvement

2011-12-01 Thread David Laight
 
 On Nov 30, 2011, at 9:55 AM, David Laight wrote:
 
  That doesn't preclude the use of variable sized buffers.
  There are several schemes that could have been used that
  have much the same logic, but allow variable sized buffers.
 
 At least with the Linux design, there's a fixed ring buffer 
 of descriptors for packets...

Yes - modelled on a simple ethernet mac rx ring.
So could be modified to allow a frame to use multiple buffers
(I've used 512 byte buffers for ethernet rx in the past).

Modelling as a tx ring is more flexible, eg:
- If the writer (kernel in this case) keeps track of the
  offset into the buffer space, it can write the actual address
  into the descriptor just before changing the owner.
  (Means checking for the end of buffer as well as ring).
- Interleave the descriptors and buffers.
  In this case the descriptor is written with the offset
  of the next descriptor as well as the frame length.
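
A sketch of the second (interleaved) scheme, writer side only. The descriptor layout is hypothetical and is not an existing kernel ABI; wrap-around, checking that the reader has freed the space, and the memory barriers a real shared ring would need are all left out:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical interleaved layout: each frame is preceded by a small
 * descriptor inside one flat buffer. */
struct frame_desc {
    uint32_t owner;     /* 0 = writer (kernel), 1 = reader (user) */
    uint32_t frame_len; /* bytes of packet data following the descriptor */
    uint32_t next_off;  /* offset of the next descriptor in the buffer */
};

/*
 * Writer side: place a frame at *off and advance *off past it.
 * Returns 0, or -1 if the frame doesn't fit before the end of the buffer.
 */
int ring_put(uint8_t *buf, uint32_t size, uint32_t *off,
             const void *data, uint32_t len)
{
    struct frame_desc d;
    /* Descriptor plus data, rounded up to keep descriptors 4-byte aligned. */
    uint64_t need = sizeof d + (((uint64_t)len + 3) & ~(uint64_t)3);

    if (*off > size || need > size - *off)
        return -1;
    d.owner = 1;
    d.frame_len = len;
    d.next_off = *off + (uint32_t)need;
    memcpy(buf + *off + sizeof d, data, len);
    memcpy(buf + *off, &d, sizeof d);   /* descriptor last: hands over the frame */
    *off = d.next_off;
    return 0;
}
```

Each frame consumes only its own length plus a 12-byte descriptor, so a small snapshot length is no longer needed to keep the ring from being mostly padding.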

David




Re: [tcpdump-workers] [PATCH] tcpdump -s 0 improvement

2011-11-30 Thread David Laight
 
 I didn't see any of the discussions about it, but my guess is 
 that the intent was to have a fixed set of slots in the 
 buffer, each one associated with a fixed header, so that most 
 of the packet-receive loop can just look at the headers and 
 process all owned by userland headers and only make a 
 system call when it has to block waiting for new packets to arrive.

That doesn't preclude the use of variable sized buffers.
There are several schemes that could have been used that
have much the same logic, but allow variable sized buffers.

Perhaps the most obvious way to look at it would be to
consider it as a 'transmit' ring from the kernel to user
instead of a receive ring.
Provided the kernel code knows the length of a frame
before it starts copying it to userspace, it could
easily allocate a data offset just beyond the previous
frame.

David

