Hi,

By the way, this is the bug that REMOVE_BOGUS_SPIKES is/was supposed to fix:

https://bugzilla.redhat.com/show_bug.cgi?id=515274
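
For illustration only, here is a minimal sketch in C of the kind of sanity check being discussed. It is not the gmond code: rate_tracker, tracker_update and MAX_CONSECUTIVE_REJECTS are made-up names, and the only value taken from the thread is the 1.0e13 ceiling. The idea is to drop a sample when a cumulative counter steps backwards or the implied rate is implausible, and to measure the next good sample against the last accepted one instead of against the bogus one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ceiling in bytes/s; 1.0e13 is the value quoted in the thread. */
#define MAX_PLAUSIBLE_RATE 1.0e13

/* Hypothetical: after this many rejected samples in a row, assume the
 * counter really was reset and start over from the newest sample. */
#define MAX_CONSECUTIVE_REJECTS 3

/* State kept per counter (for example eth0 transmit bytes). */
struct rate_tracker {
    uint64_t last_value;   /* last accepted counter value */
    double   last_time_s;  /* time of the last accepted sample */
    int      rejects;      /* samples rejected since then */
    bool     primed;       /* false until the first sample arrives */
};

/* Remember a sample as the new baseline. */
static void tracker_rebase(struct rate_tracker *t, uint64_t value,
                           double time_s)
{
    t->last_value = value;
    t->last_time_s = time_s;
    t->rejects = 0;
    t->primed = true;
}

/*
 * Feed one sample. Returns true and stores bytes/s in *rate when the
 * sample is accepted. Returns false (leaving *rate untouched) for the
 * first sample and for any sample that is rejected.
 */
static bool tracker_update(struct rate_tracker *t, uint64_t value,
                           double time_s, double *rate)
{
    assert(t != NULL && rate != NULL);

    if (!t->primed) {
        tracker_rebase(t, value, time_s);
        return false;                 /* nothing to compare against yet */
    }

    double dt = time_s - t->last_time_s;
    /* A cumulative counter must never step backwards. */
    bool ok = dt > 0.0 && value >= t->last_value;
    double r = 0.0;
    if (ok) {
        r = (double)(value - t->last_value) / dt;
        ok = r <= MAX_PLAUSIBLE_RATE; /* forward jump too large to be real */
    }

    if (!ok) {
        /* Keep the old baseline unless the counter looks truly reset. */
        if (++t->rejects >= MAX_CONSECUTIVE_REJECTS)
            tracker_rebase(t, value, time_s);
        return false;
    }

    tracker_rebase(t, value, time_s);
    *rate = r;
    return true;
}
```

Starting from a zero-initialised tracker and fed the four eth0 transmit-byte samples quoted below, this rejects the 1272381553 reading and yields roughly 50 MB/s and 49 MB/s for the two accepted intervals. The price is one missing data point per bogus sample, and a real counter reset is only recognised after a few intervals.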

Cheers
Martin

----- Original Message ----

> From: Martin Knoblauch <kn...@knobisoft.de>
> To: 左扬 <weichon...@gmail.com>; ganglia-developers@lists.sourceforge.net
> Sent: Wed, April 28, 2010 6:32:32 PM
> Subject: Re: [Ganglia-developers] bogus spikes of network_report, is that a 
> bug on the kernel?
> 
> Hi,
>
> can you tell us which NIC you are using (/sbin/lspci) and which version of
> the driver? When I wrote that REMOVE_BOGUS_SPIKES hack, it was because of a
> HW/FW problem in certain Broadcom devices. It was supposed to be fixed after
> kernel 2.6.9.
>
> The debug output from gmond suggests the overflow is coming from the
> bytes_out counter (BO).
>
> And you are right, just lowering the thresholds is not useful in general.
>
> Cheers
> Martin

>
>From: 左扬 <weichon...@gmail.com>
>To: ganglia-developers@lists.sourceforge.net
>Sent: Wed, April 28, 2010 1:48:58 PM
>Subject: [Ganglia-developers] bogus spikes of network_report, is that a bug on the kernel?
>
>Hello all,
>
>We use Ganglia to generate our network traffic report, but because of bogus
>spikes of up to 400P I can see nothing else in the graph. (See the graph in
>the attachment; I modified graph.d/network_report.php to change the unit
>from bytes/s to bits/s.)
>
>I read the code and ran some tests over several days.
>
>In libmetrics/linux/metrics.c, line 287, there is a compile-time switch, so
>I rebuilt Ganglia with CFLAGS=-DREMOVE_BOGUS_SPIKES and restarted gmond.
>
>After a few days I found there were still spikes (about 4T).
>
>So I had to change line 292 from
>
>if ((l_bin > 1.0e13) || (l_bout > 1.0e13) ||
>
>to
>
>if ((l_bin > 2.5e8) || (l_bout > 2.5e8) ||  /* 2 Gbps: there are two gigabit NICs in our server */
>
>to avoid the spikes.
>
>I don't think that is a good idea, since others may use faster NICs. So I
>added some code to update_ifdata() to log the contents of /proc/net/dev
>(the value of proc_net_dev.buffer).
>
>Logs from /var/log/messages:
>Apr 27 23:19:13 hostname /opt/ganglia/sbin/gmond[18465]: update_ifdata(BO) - Overflow in rbo: 304634803029227 -> 630666266 [1272381553]
>Apr 27 23:20:13 hostname /opt/ganglia/sbin/gmond[18465]: update_ifdata(BO) - Overflow in rbi: 10458900526801464705 -> 38016437180368 [1272381613]
>Apr 27 23:20:13 hostname /opt/ganglia/sbin/gmond[18465]: update_ifdata(BO) - Overflow in rpo: 219388676028 -> 219365592250 [1272381613]
>
>
>Logged contents of /proc/net/dev:
>
>------------------ 1272381433.117603 -----------------
>Inter-|   Receive                                                |  Transmit
> face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
>    lo: 3143390051 39831988 0 0 0 0 0 0 3143390051 39831988 0 0 0 0 0 0
> tunl0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>  eth0: 38015520377153 135587033135 0 8587116 0 0 0 6 304631801519418 219359254753 0 0 0 0 0 0
>  eth1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>------------------ 1272381493.118502 -----------------
>Inter-|   Receive                                                |  Transmit
> face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
>    lo: 3143407797 39832216 0 0 0 0 0 0 3143407797 39832216 0 0 0 0 0 0
> tunl0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>  eth0: 38015973907827 135588437010 0 8587116 0 0 0 6 304634803029227 219361451245 0 0 0 0 0 0
>  eth1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>------------------ 1272381553.121013 -----------------
>Inter-|   Receive                                                |  Transmit
> face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
>    lo: 3143407797 39832216 0 0 0 0 0 0 3143407797 39832216 0 0 0 0 0 0
> tunl0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>  eth0: 10458900526801464705 135564674293 0 8587116 0 0 0 219363599555 630666266 219388676028 7723 0 0 0 7723 0
>  eth1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>------------------ 1272381613.123535 -----------------
>Inter-|   Receive                                                |  Transmit
> face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
>    lo: 3143444605 39832676 0 0 0 0 0 0 3143444605 39832676 0 0 0 0 0 0
> tunl0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>  eth0: 38016437180368 135590918375 0 8587116 0 0 0 6 304640653909921 219365592250 0 0 0 0 0 0
>  eth1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>The value at 1272381493 is OK, the value at 1272381553 is abnormal, and the
>value at 1272381613 has recovered.
>
>I don't think this is caused by a HW error; it looks like a bug in the
>kernel. (We are using 2.6.20-pm and 2.6.9-34.ELsmp, both x86_64.)
>
>But I don't know much about the kernel, so can anyone confirm?
>
>Thanks.
>
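>-- 
>A few plum branches at the corner of the wall,
>blooming alone in the cold.
>From afar I know they are not snow,
>for a faint fragrance drifts this way.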
>

------------------------------------------------------------------------------
_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
