Brian
I agree with you that there's something wrong with sessions. However, sessions.c:343 contains something different from what you reported. Can you please send me the source code around line 343 so I can see what you mean?

Thanks Luca

On 11/05/2011 08:09 PM, Brian Behrens wrote:
No problem,

I did some more work on this and found that line 343 in sessions.c is the culprit. Here is a breakdown of what's happening.

That line attempts to free the memory at the address stored in sessionToPurge->session_info. When you dump the value of session_info, it contains 0xffffffff. Since this is not NULL, the code attempts to free the memory at that address, which is out of bounds and causes a segfault.

In other words, it is most likely trying to free memory that has already been freed. The question becomes: why does the code still think there is a valid memory address at that pointer? I think I have an idea. I started watching the session counters, and even though I have specified an upper limit of 65536 sessions, I can see that the count does actually reach that limit. When it does, it clears and starts over. I have not yet investigated what actually happens during this reset, but my guess is that the code still thinks sessions need to be purged that have already been purged by the clearing.

I have also noticed that once that bound is reached, the count seems to stay around 14k sessions. The ESX server I am running this on has 98 GB of memory, so memory constraints are not really a concern; this may simply be a matter of tuning the max sessions high enough that the purge cycle, which is supposed to remove these idle sessions, can do its job effectively.

I would think this might also occur on lower-load networks, since DEFAULT_NTOP_MAX_NUM_SESSIONS is set lower there, so the limit might also be reached, triggering the clear routine and the segfault, given that the 0xffffffff value is used in various places and could easily end up stored in many memory locations.

So, I might try to work around this by raising DEFAULT_NTOP_MAX_NUM_SESSIONS to see if that helps. Taking a deeper look at what happens when this bound is reached might also help me understand and eliminate the problem.

I hope this helps, as I have seen similar postings in the threads.

--Brian
________________________________________
From: [email protected] 
[[email protected]] on behalf of Luca Deri [[email protected]]
Sent: Saturday, November 05, 2011 6:22 AM
To: [email protected]
Subject: Re: [Ntop-misc] Easily Reproducable Segfaults

Brian
thanks for your report. I cannot reproduce the crash you reported using the code in SVN (the only version I can support). Can you please crash ntop, generate a core file, and analyze it a bit so that I can understand where the problem might be? Before doing that, please resync with SVN.

Thanks for your support Luca

On Nov 4, 2011, at 5:09 PM, Brian Behrens wrote:

Hello,

I have been working for days trying to resolve a segfault issue like the 
following:

Nov  4 10:46:54 NTOP-SC kernel: ntop[25479]: segfault at 645 ip 
00007f95f3cf3395 sp 00007f95e9b75ae8 error 6 in 
libntop-4.1.1.so[7f95f3cb9000+56000]

The environment is an ESX 5 VM.

Guest OS I have tried:

1. CentOS 6
2. Fedora 15
3. Network Security Toolkit (uses revision 4865 of the current dev tree)

Versions I have tried:

1. Current dev tree.
2. Current stable version (4.1.0)

The timing of these faults varies, but it correlates with network load.

My test networks:

1. Simple home network with all packets going to NTOP.
2. High load work network that can see 25 Gig in 15 mins.

The most stable setup I have seen is a clean CentOS install, with ntop built from the trunk tree, installed, and run.

The quickest segfault I can obtain is when I enable PF_RING, use an e1000 card in the VM, and use the PF_RING-aware e1000 driver. I can usually get a segfault within 30 minutes on the busy network.

The common theme is the segfaulting. I did attempt a gdb session on the device once and saw a malloc issue. All these VMs have 4 GB of memory, and I have tried tuning different hash sizes to see how that impacts the issue, but it really never does: with smaller hash values I just get more low-memory messages, etc.

I am really not sure what else to do. If there is anything I can do to provide more information, please let me know, as I would like to stop this incessant segfaulting.

_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
---
We can't solve problems by using the same kind of thinking we used when we 
created them - Albert Einstein
