Hello,
I think I've found a deadlock condition.
It should rarely be triggered, but due to the special nature of my
configuration it has kept pestering me all morning :/ It seems to be
caused by a bug in nfexpire.
Some assumptions I am making:
- RRD files for newly created (and therefore empty) sources are invalid. This
is the symptom ("darknet" is a newly created source that hasn't received any
data yet):
    name      live
    group     (nogroup)
    tcreate   Thu Dec 4 13:55:00 2014
    tstart    Thu Dec 4 13:55:00 2014
    tend      Thu Dec 4 15:00:00 2014
    updated   Thu Dec 4 15:00:00 2014
    expire    120 days 0 hours
    size      103.0 KB
    maxsize   10.0 GB
    type      live
    locked    0
    status    OK
    version   130
    channel fprobe sign: + colour: #00ff00 order: 1 sourcelist: fprobe
        Files: 13 Size: 105472
    channel darknet sign: + colour: #0000ff order: 2 sourcelist: darknet
    ERR Error reading channel stat information. Missing key 'first'
        Files: 0 Size: 0
By the way, that missing key 'first' sounds related to another bug: the
failures that occur when a profile has neither a valid expiration period nor
a maximum size.
- nfexpire freezes when it finds one of these invalid RRD files, leaving the
profile locked. In my case, that was the live profile.
nfexpire should exit cleanly, properly unlocking the profile, when it finds a
troublesome RRD file. Otherwise it can lead to a deadlock. In my case, for
example, the first symptom was that I couldn't stop nfsen in an orderly way:
the nfcapd processes were deadlocked with nfexpire. Manually killing nfexpire
let them go on.
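The clean-exit behaviour suggested above can be illustrated with a small
sketch. nfexpire itself is written in C, so this Python code is not its actual
implementation; it only shows the principle of releasing the profile lock on
every exit path. The stat layout (one `key=value` pair per line), the lock file
name `live.lock`, and the names `read_channel_stat` and `expire_profile` are
all hypothetical, and `fcntl.flock` merely stands in for whatever locking nfsen
really uses.

```python
import fcntl
import os
import tempfile


class StatError(Exception):
    """Raised when a channel's stat information is incomplete."""


def read_channel_stat(lines):
    """Parse hypothetical 'key=value' stat lines into a dict."""
    stat = {}
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep:
            stat[key] = value
    if "first" not in stat:
        # An empty, newly created channel may have no 'first' entry yet.
        raise StatError("Missing key 'first'")
    return stat


def expire_profile(lock_path, channels):
    """Expire every channel of a profile while holding its lock.

    Returns (expired, skipped) channel names. The unlock sits in the
    finally clause, so neither a bad channel nor an unexpected exception
    can leave the profile locked for other processes such as nfcapd.
    """
    expired, skipped = [], []
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        for name, lines in channels.items():
            try:
                read_channel_stat(lines)
            except StatError as err:
                # Report and skip the troublesome channel instead of hanging.
                print(f"{name}: {err}, skipping")
                skipped.append(name)
                continue
            # Hypothetical placeholder: the real expiry work would go here.
            expired.append(name)
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
    return expired, skipped


if __name__ == "__main__":
    lock_file = os.path.join(tempfile.mkdtemp(), "live.lock")
    done, skipped = expire_profile(lock_file, {
        "fprobe": ["first=1417697700", "last=1417701600"],
        "darknet": [],  # newly created source, no data yet
    })
    print("expired:", done, "skipped:", skipped)
```

Run as a script, it reports and skips "darknet" and expires only "fprobe".
The point of the design is that unlocking lives in `finally` rather than at
the end of the happy path, so a hang or crash on one channel cannot keep the
whole profile locked.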
This issue can be especially problematic if you create a source that receives
very little data. My "darknet" source, for example, is a softflowd process
that exports flows only at long intervals, somewhere between 10 and 15 minutes
(I know that's wrong, but it's still a bug in nfsen). So, in this scenario,
nfexpire will hang in every 5-minute cycle, unable to make progress until you
kill it manually. Indeed, once I realized what was happening, manually killing
nfexpire a couple of times gave the source time to add data to the empty
channel and "solved" the problem.
Does it ring a bell?
Borja.
_______________________________________________
Nfsen-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/nfsen-discuss