Ah, that's possible, but I thought it would be more random in that case.  Can 
collectl use the extended counters instead?  They seem to be 64-bit.

[root@c10-13 ~]# perfquery -x
# Port extended counters: Lid 134 port 1 (CapMask: 0x1400)
PortSelect:......................1
CounterSelect:...................0x0000
PortXmitData:....................2275893875846
PortRcvData:.....................2270798551080
PortXmitPkts:....................5294870538
PortRcvPkts:.....................5310452204
PortUnicastXmitPkts:.............0
PortUnicastRcvPkts:..............0
PortMulticastXmitPkts:...........0
PortMulticastRcvPkts:............0
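
As a rough illustration of what using the extended counters could look like, 
the sketch below parses the perfquery -x output shown above and turns two 
successive readings of a data counter into a KB/s rate.  The function names 
are made up for this example and this is not how collectl itself does it.  It 
assumes the data counters count 4-octet words and that KB means 1024 bytes.

```python
import re

# Sample `perfquery -x` output, copied from the message above.
SAMPLE = """\
# Port extended counters: Lid 134 port 1 (CapMask: 0x1400)
PortSelect:......................1
CounterSelect:...................0x0000
PortXmitData:....................2275893875846
PortRcvData:.....................2270798551080
PortXmitPkts:....................5294870538
PortRcvPkts:.....................5310452204
"""

# Matches lines of the form "Name:.......value".
LINE_RE = re.compile(r"^(\w+):\.+(\S+)$")


def parse_perfquery(text):
    """Return a dict of counter name -> integer value (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            # Base 0 lets int() accept both decimal and 0x-prefixed values.
            counters[match.group(1)] = int(match.group(2), 0)
    return counters


def kb_per_sec(prev, curr, seconds):
    """Rate in KB/s between two readings of a data counter.

    Assumption: the counter is in units of 4 octets and 1 KB = 1024 bytes.
    """
    return (curr - prev) * 4 / 1024 / seconds


counters = parse_perfquery(SAMPLE)
print(counters["PortXmitData"], counters["PortRcvData"])
```

With 64-bit counters the difference between two readings stays meaningful 
even at long sampling intervals, which is the point of the question above.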

r.

________________________________
From: Mark Seger [[email protected]]
Sent: Tuesday, November 26, 2013 3:04 PM
To: Dragseth Roy Einar
Cc: [email protected]
Subject: Re: [Collectl-interest] collectl disagrees with itself regarding 
infiniband bandwidth.

I just checked in with some colleagues and was reminded of what I forgot - the way 
IB deals with numbers is really poor!  They use narrow, non-wrapping counters, 
so if you're running at high rates and don't collect them frequently enough, 
you'll lose data.  The faster the IB, the faster you have to read them.  At FDR 
speeds, you'll need to read the counters every 2-3 seconds or you'll lose data! 
What rate are you collecting at?  The default of 10 seconds?  Try running 
interactively with -i10 and you'll probably lose data there too.  ;(
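
To put numbers on that, here is a back-of-the-envelope sketch.  It assumes 
the legacy data counters are 32 bits wide, stick at their maximum instead of 
wrapping, and count 4-octet words.  The link data rates are nominal 4x figures 
and the helper name is made up for the example.

```python
COUNTER_MAX = 2**32 - 1   # assumed width of the legacy data counters
BYTES_PER_COUNT = 4       # assumed: data counters count 4-octet words


def seconds_to_saturate(bytes_per_sec):
    """Time until a counter starting at zero sticks at its maximum."""
    return COUNTER_MAX * BYTES_PER_COUNT / bytes_per_sec


rates = {
    "QDR 4x, 32 Gbit/s data rate": 32e9 / 8,
    "FDR 4x, ~54.5 Gbit/s data rate": 54.54e9 / 8,
    "rate measured in this thread (3472553 KB/s)": 3472553 * 1024,
}

for name, rate in rates.items():
    print(f"{name}: {seconds_to_saturate(rate):.1f} s")
```

Under these assumptions a saturated QDR link fills the counter in a bit over 
4 seconds and FDR in about 2.5 seconds, which is in line with the 2-3 second 
figure above.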

The 'smoking gun' in your data seems to be the packet rates which are reported 
correctly since they're smaller numbers.
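
A quick sanity check of that reasoning against the lexpr values quoted 
further down (iconnect.kbin 1677721, iconnect.pktin 1722455).  This is only a 
sketch: it assumes 32-bit counters in 4-octet units that start from zero at 
each sample, and a 10 second daemon interval, which has not been confirmed in 
this thread.

```python
COUNTER_MAX = 2**32 - 1   # assumed 32-bit, non-wrapping counter
BYTES_PER_COUNT = 4       # assumed: data counters count 4-octet words
INTERVAL = 10             # assumed daemon sampling interval in seconds

# KB/s that would be reported if the data counter were stuck at its maximum
# for the whole interval.
saturated_kb_per_sec = COUNTER_MAX * BYTES_PER_COUNT / 1024 / INTERVAL
print(int(saturated_kb_per_sec))   # compare with iconnect.kbin

# Packets counted over one interval at the reported packet rate.
pkts_per_interval = 1722455 * INTERVAL
print(pkts_per_interval < COUNTER_MAX)   # True if the packet counter has headroom
```

If the assumptions hold, the reported 1677721 KB/s is simply the counter 
ceiling divided by the interval, while the packet counter stays far below its 
ceiling and therefore still reports a correct rate.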

-mark


On Tue, Nov 26, 2013 at 8:34 AM, Mark Seger <[email protected]> wrote:
re playback: can you try it out?  even if only for a little bit?  I'm guessing 
you'll see the problem there as well but I'd really like to understand what is 
happening.  All you need to do is add "--rawtoo -f/tmp" to the DaemonCommands 
in your /etc/collectl.conf, restart collectl and it will write a raw file to 
/tmp.  Then, if you can run your tests and save the interactive output with 
timestamps - include -oT, you should be able to play back the data with 
'collectl -p file -sx -oT' from the raw file and see almost identical numbers 
OR 1/2 the values.  Just remember to reset /etc/collectl.conf when you're done.

It would provide a useful data point.  You can even play back the data with 
--export lexpr.

Meanwhile we can try to reproduce what you're seeing.  Actually, have you seen 
this with earlier versions of collectl?  I haven't touched the IB code in years, 
at least I don't remember doing so, but I have touched lexpr, which is why it's 
important to try and understand where the actual problem lies.

-mark


On Tue, Nov 26, 2013 at 8:09 AM, Dragseth Roy Einar <[email protected]> wrote:
Hi Mark.  Yes, it's been a while...

I must admit I have never used playback mode so I do not know.  We do not have 
any .raw files produced by collectl.

r.

________________________________
From: Mark Seger [[email protected]]
Sent: Tuesday, November 26, 2013 1:52 PM
To: Dragseth Roy Einar
Cc: [email protected]
Subject: Re: [Collectl-interest] collectl disagrees with itself regarding 
infiniband bandwidth.

hi roy - long time no chat...

This is indeed an interesting one I haven't seen.  Just to be clear, since 
you said it reports half as a daemon when using lexpr: does it also record 1/2 
as a daemon and play back as 1/2 w/o lexpr?

-mark


On Tue, Nov 26, 2013 at 4:01 AM, Roy Dragseth <[email protected]> wrote:
Collectl seems to disagree with itself when reporting infiniband bandwidth
usage.

I'm running a bandwidth benchmark that reports approx. 7 GB/s bidirectional
bandwidth on our QDR infiniband network:

Benchmark exchange(MPI_Sendrecv)
================================
        lenght     iterations   elapsed time  transfer rate        latency
       (bytes)        (count)      (seconds)     (Mbytes/s)         (usec)
--------------------------------------------------------------------------
      12582912           8578         30.626         7048.6         1785.2


Running collectl interactively shows approximately the same

[root@c10-13 etc]# collectl -s x
Couldn't find 'ofed_info'.  Won't be able to determine OFED version
waiting for 1 second sample...
#<-----------InfiniBand----------->
#   KBIn  PktIn   KBOut PktOut Errs
 3472553  1717K 3472483  1717K    0
 3472962  1717K 3472977  1717K    0
 3472570  1717K 3472629  1717K    0
 3470588  1716K 3470598  1716K    0
 3472094  1717K 3472105  1717K    0
 3471221  1716K 3471156  1716K    0
 3472378  1717K 3472409  1717K    0
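
For reference, the comparison with the benchmark figure can be worked out as 
below.  This is only a sketch: it assumes collectl's KB means 1024 bytes, and 
since it is not stated whether the benchmark's Mbytes/s uses 10^6 or 2^20 
bytes, both conversions are shown.

```python
# (KBIn, KBOut) pairs copied from the interactive collectl output above.
samples = [
    (3472553, 3472483),
    (3472962, 3472977),
    (3472570, 3472629),
    (3470588, 3470598),
    (3472094, 3472105),
    (3471221, 3471156),
    (3472378, 3472409),
]

BENCHMARK_MB_PER_SEC = 7048.6   # bidirectional rate reported by the benchmark

# Average bidirectional rate in KB/s (in + out per sample).
avg_kb_per_sec = sum(kb_in + kb_out for kb_in, kb_out in samples) / len(samples)

decimal_mb = avg_kb_per_sec * 1024 / 1e6   # MB/s with 1 MB = 10**6 bytes
binary_mb = avg_kb_per_sec / 1024          # MB/s with 1 MB = 2**20 bytes

for label, value in (("10^6-byte MB/s", decimal_mb), ("2^20-byte MB/s", binary_mb)):
    diff = (value - BENCHMARK_MB_PER_SEC) / BENCHMARK_MB_PER_SEC * 100
    print(f"{label}: {value:.1f} ({diff:+.1f}% vs benchmark)")
```

Either way the sum of the two directions lands within a few percent of the 
benchmark's number.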

But if I run it as a daemon, with this addition to DaemonCommands in
collectl.conf, -P --export lexpr,f=/tmp/L, (*) it only reports half the
bandwidth usage

[root@c10-13 etc]# grep iconnect /tmp/L
iconnect.kbin 1677721
iconnect.pktin 1722455
iconnect.kbout 1677721
iconnect.pktout 1722455


Is this a bug?  Any workarounds?
The test was done with collectl 3.6.9.


* I use this to report infiniband traffic in ganglia,
https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia




--

  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
              phone:+47 77 64 41 07, fax:+47 77 64 41 00
        Roy Dragseth, Team Leader, High Performance Computing
         Direct call: +47 77 64 62 56. email: [email protected]


------------------------------------------------------------------------------
Shape the Mobile Experience: Free Subscription
Software experts and developers: Be at the forefront of tech innovation.
Intel(R) Software Adrenaline delivers strategic insight and game-changing
conversations that shape the rapidly evolving mobile landscape. Sign up now.
http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk
_______________________________________________
Collectl-interest mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/collectl-interest

