Hello,

I think I've found a deadlock condition.

It should be triggered only rarely, but due to the particular nature of my
configuration it has kept pestering me all morning :/ It seems to be caused
by a bug in nfexpire.

Some assumptions I am making: 

- RRD files for newly created (and therefore empty) sources are invalid. The
symptom is the following ("darknet" is a newly created source that hasn't
received any data yet):

name    live
group   (nogroup)
tcreate Thu Dec  4 13:55:00 2014
tstart  Thu Dec  4 13:55:00 2014
tend    Thu Dec  4 15:00:00 2014
updated Thu Dec  4 15:00:00 2014
expire  120 days 0 hours
size    103.0 KB
maxsize 10.0 GB
type    live
locked  0
status  OK
version 130
channel fprobe  sign: + colour: #00ff00 order: 1        sourcelist: fprobe
        Files: 13       Size: 105472
channel darknet sign: + colour: #0000ff order: 2        sourcelist: darknet
ERR Error reading channel stat information. Missing key 'first'
        Files: 0        Size: 0


By the way, that missing key 'first' looks related to another bug: the
failures that occur when a profile has neither a valid expiration period nor
a maximum size.


- nfexpire freezes when it encounters one of these invalid RRD files, leaving
the profile locked. In my case, it was the live profile.
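
If the assumptions above are right, one way for a tool like nfexpire to avoid
tripping over such a channel is to treat "no 'first' key and no files" as an
empty channel rather than an error. Below is a minimal sketch in Python. The
dict layout and the helper name channel_state are hypothetical and are not
NfSen's real data structures:

```python
def channel_state(stat):
    """Classify a parsed channel stat record (hypothetical helper).

    'stat' is assumed to be a dict such as {"files": 0, "size": 0},
    which gains a "first" timestamp once the channel has received data.
    """
    if "first" in stat:
        return "ok"
    # No 'first' key and no files: the channel has simply never
    # received data, so treat it as empty instead of failing.
    if stat.get("files", 0) == 0:
        return "empty"
    # Files on disk but no 'first' timestamp: genuinely inconsistent.
    return "invalid"


# The darknet channel from the report, as it might look once parsed.
print(channel_state({"files": 0, "size": 0}))  # prints "empty"
# A populated channel; the 'first' value here is only a placeholder.
print(channel_state({"first": 0, "files": 13, "size": 105472}))  # prints "ok"
```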


Nfprofile should exit cleanly, properly unlocking the profile, when it finds a
troublesome RRD file; otherwise it can lead to a deadlock. In my case, for
example, the first symptom was that I could not stop nfsen in an orderly way,
because the nfcapd processes were deadlocked with nfexpire. Killing nfexpire
manually let them continue.
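
Below is a minimal Python sketch of the clean exit described above. It
assumes the profile lock is an fcntl.flock on a lock file, which is not
necessarily how NfSen locks profiles. The names expire_profile,
check_channel and ChannelStatError are hypothetical. The point is that the
finally block releases the lock on every path, including when a channel's
stat information is missing 'first', so the process reports the error
instead of hanging with the lock held.

```python
import fcntl


class ChannelStatError(Exception):
    """Hypothetical error for unreadable channel stat information."""


def check_channel(name, stat):
    # Mirrors the reported failure: without 'first' the stat is unusable.
    if "first" not in stat:
        raise ChannelStatError(f"{name}: Missing key 'first'")


def expire_profile(lock_path, channels):
    """Run one hypothetical expire pass while holding the profile lock."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # wait until we own the lock
        try:
            for name, stat in channels.items():
                check_channel(name, stat)
            # ... the actual expiry work would happen here ...
            return "ok"
        except ChannelStatError as err:
            # Report and bail out instead of blocking other processes.
            return f"error: {err}"
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)  # unlock on every path
```

With this shape, expire_profile("/tmp/live.lock", {"darknet": {"files": 0}})
returns "error: darknet: Missing key 'first'", and other processes waiting on
the same lock file can proceed afterwards.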


This issue can be especially problematic if you create a source that receives
very little data. My "darknet" source, for example, is a softflowd process that
exports flows at long intervals, somewhere between 10 and 15 minutes (I know,
that's wrong, but there's still a bug in nfsen). So, in this scenario, nfexpire
will hang in every 5-minute period, unable to progress until you kill it
manually. Indeed, once I realized what was happening, killing nfexpire manually
a couple of times gave the source enough time to add data to the empty channel,
which "solved" the problem.


Does it ring a bell?


Borja.

_______________________________________________
Nfsen-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nfsen-discuss