See my in-line notes to the trimmed msg... But be aware that a lot of my response at first is to correct your misunderstanding of how ntop works. You've linked things together that are asynchronous and assumed it's cause & effect. What you've forgotten is the multi-threaded nature of ntop, and what you don't realize is the huge amount of processing that has to occur to get ntop ready to process packets and be a web server.
IMHO, the real issue is, what is this 'bpf' state, and why does trussing clear it...

-----Burton

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf
> Of Stanley Hopcroft
> Sent: Friday, January 30, 2004 1:21 AM

<snip />

> In relation to Mr Strauss debugging suggestions in pre MyDoom mail
> flow times :-

> This may be simply a CHKVER related temporary hang while ntop attempts
> to log that it is running to the ntop dev team.

CHKVER is asynchronous with the web server startup. It's in initNtop() in globals-core.c:

    #ifdef CFG_MULTITHREADED
      {
        pthread_t myThreadId;
        createThread(&myThreadId, checkVersion, NULL);
      }
    #else
      checkVersion(NULL);
    #endif

> No strange as it seems, ntop hangs in a state top reports as bpf and
> when the ntop process is trussed, data starts moving through the
> connections (or the kernel hands a connected socket to accept()).

This is probably the problem, but it's FreeBSD internals - what is 'bpf' state? And how do you know that's what ntop is 'in'? We'll pick back up with this later in this msg, after we dispose of the cause and effect issues.

> However, trying to disable version checking with --no-check-version
> leads to ... what me be a bug in the 25 Jan 2004 CVS.

The bug is in the usage() listing, not the code. --no-check-version in usage() should be --skip-version-check. man ntop is right.

<snip />

> 1 the web server hang is repeatable on _some_ instances of ntop on same
> hw, os (25 Jan CVS, FreeBSD 4.9-RELEASE-p1, tiny p5 class hosts). There
> exist ntops that do not seem to do this on same hw and os.
>
> 2 web server hangs at start; ntop process stuck in bpf state
>
> 3 truss of the ntop process unwedges the web server ...
>
> Looks like its time to start taking drugs again.

Yeah, really...

1) Are you saying that you can start ntop on host a and it always hangs, while seemingly identical host b always works? Or that sometimes host a hangs and sometimes it works?
3) When it 'unwedges', has ntop been successfully recording packet data? I.e., is this limited to the web server thread? Or is the whole shooting match hung, meaning FreeBSD is doing something to the ntop thread group???

> Your comments or hilarity are welcome.

<snip />

> It may be that I can only connect after the CHKVER error in the log

<snip />

> Yep

Nope ... I think you're confusing an accident of timing w/ causality. On your box the two processes just happen to take about the same time.

> tsade# telnet tsade 3000
> Trying 192.168.105.230...
> Connected to tsade.aipo.gov.au.
> Escape character is '^]'.
> GET /
>
>
> ..wait .. wait

The web server is started thus:

    traceEvent(CONST_TRACE_INFO, "WEB: Starting web server");
    createThread(&myGlobals.handleWebConnectionsThreadId, handleWebConnections, NULL);
    traceEvent(CONST_TRACE_INFO, "THREADMGMT: Started thread (%ld) for web server",
               myGlobals.handleWebConnectionsThreadId);

called at the end of initWeb(), which is called from main(). A lot has happened before this... the web server isn't what you think of as 'active' until it gets to handleWebConnections() in webInterface.c. The last message before this is usually ...

    traceEvent(CONST_TRACE_INFO, "Note: SIGPIPE handler set (ignore)");

Now, technically, once listen() is called, the requests are accepted and queued. That's this:

    Jan 29 13:45:21 tigger ntop[7443]: Initialized socket, port 3000, address (any) [MSGID0349927]

Once that's happened, you'll see your 'hang'... Since you are getting connected, the ntop host's tcp/ip stack has accepted the connection (meaning there's somebody bound - the bind() call - and listening - the listen() call - to the port), but the select() call, which actually waits for the connection, and the recv() call, which actually takes in the data, have yet to happen. With all that is happening as part of ntop's startup, there can be a lag, esp. as it reads large oui, asn, p2c files.
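To make the listen() point concrete, here is a minimal sketch. It is not ntop code: the helper name connect_before_accept() is made up for illustration, and it assumes a POSIX box with a working loopback interface and a kernel-assigned port instead of 3000. It shows a client getting 'Connected' and sending its GET before the server side has called accept() or recv() at all, which is roughly what the telnet session above looks like from the client end.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of ntop. Returns the number of request
 * bytes the "server" finally reads, or -1 on any socket error. */
int connect_before_accept(void) {
    static const char req[] = "GET /\r\n";
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[64];
    ssize_t n = -1;
    int srv = -1, cli = -1, conn = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* assumption: let the kernel pick a port */

    /* Server side: bind() + listen() only. No accept() yet. */
    srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0) goto done;
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (listen(srv, 5) < 0) goto done;
    if (getsockname(srv, (struct sockaddr *)&addr, &len) < 0) goto done;
    assert(addr.sin_port != 0);         /* kernel filled in the real port */

    /* Client side: connect() and send() both complete, because the
     * tcp/ip stack finishes the handshake and queues the connection. */
    cli = socket(AF_INET, SOCK_STREAM, 0);
    if (cli < 0) goto done;
    if (connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (send(cli, req, sizeof(req) - 1, 0) != (ssize_t)(sizeof(req) - 1)) goto done;

    /* ... a real server could still be busy with startup work here ... */

    /* Only now does the server pick up the queued connection and its data. */
    conn = accept(srv, NULL, NULL);
    if (conn < 0) goto done;
    n = recv(conn, buf, sizeof(buf), 0);

done:
    if (conn >= 0) close(conn);
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return (int)n;
}
```

Call connect_before_accept() from a small main() and print the result; a positive return means the request sat queued in the kernel until accept() and recv() finally ran. From the client's side, the time spent before accept() is indistinguishable from a hung web server.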
I agree that adding a message, "ntop's web server is now active" or some such, would be a good idea. I can also change the log tags to INITWEB: until the web server is actually up... but all these changes do is clarify in the log what's happening.

> Jan 30 17:45:20 tsade ntop[2381]: **ERROR** CHKVER: Unable to connect
> socket: Operation timed out(60)

Irrelevant, per my top comment: it's asynchronous. UNLESS this is a FreeBSD artifact, which limits the # of calls or ports or some such... but I find that hard to believe, as everything else that implements a web server would have problems...

The real issue is what's 'bpf' state and why are things hanging... So, let's go to the web...

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/hackers/2003-09/0120.html

Now that's interesting ... ntop doesn't use the 'bpf' device, but libpcap might... Checking man 4 bpf...

    BIOCIMMEDIATE (u_int) Enable or disable "immediate mode", based on
    the truth value of the argument. When immediate mode is enabled,
    reads return immediately upon packet reception. Otherwise, a read
    will block until either the kernel buffer becomes full or a timeout
    occurs. This is useful for programs like rarpd(8) which must respond
    to messages in real time. The default for a new file is off.

So, we're down to the interesting questions. What is 'bpf', how do you know ntop's hung in that state, why does truss free it ... and what's the underlying ntop problem???

-----Burton

_______________________________________________
Ntop-dev mailing list
[EMAIL PROTECTED]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev
