Ah, that's possible, but I thought it would be more random in that case. Can
collectl use the extended counters instead? They seem to be 64-bit:
[root@c10-13 ~]# perfquery -x
# Port extended counters: Lid 134 port 1 (CapMask: 0x1400)
PortSelect:......................1
CounterSelect:...................0x0000
PortXmitData:....................2275893875846
PortRcvData:.....................2270798551080
PortXmitPkts:....................5294870538
PortRcvPkts:.....................5310452204
PortUnicastXmitPkts:.............0
PortUnicastRcvPkts:..............0
PortMulticastXmitPkts:...........0
PortMulticastRcvPkts:............0
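For anyone who wants to cross-check collectl against the extended counters by hand, here is a minimal Python sketch. It parses `perfquery -x` dumps in the format shown above and turns the difference between two samples into a rate. The helper names (`parse_perfquery`, `rates_kb_per_sec`) are made up for illustration, and the multiplication by 4 assumes that PortXmitData and PortRcvData count 4-octet words, as the InfiniBand specification defines them.

```python
import re

# Assumption: the data counters are in units of 4 octets (per the IB spec).
OCTETS_PER_UNIT = 4

# Excerpt of the perfquery -x output quoted in this message.
SAMPLE = """\
# Port extended counters: Lid 134 port 1 (CapMask: 0x1400)
PortSelect:......................1
CounterSelect:...................0x0000
PortXmitData:....................2275893875846
PortRcvData:.....................2270798551080
PortXmitPkts:....................5294870538
PortRcvPkts:.....................5310452204
"""

def parse_perfquery(text):
    """Return {counter_name: int} for every 'Name:.....<decimal>' line.

    Comment lines and non-decimal values (e.g. 0x0000) are skipped.
    """
    counters = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\.+(\d+)$", line.strip())
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters

def rates_kb_per_sec(first, second, interval_sec):
    """Convert the data-counter deltas between two samples into KB/s."""
    rates = {}
    for name in ("PortXmitData", "PortRcvData"):
        delta_bytes = (second[name] - first[name]) * OCTETS_PER_UNIT
        rates[name] = delta_bytes / 1024 / interval_sec
    return rates

if __name__ == "__main__":
    first = parse_perfquery(SAMPLE)
    # In practice the second sample would come from running perfquery -x
    # again after interval_sec seconds; here it is simply printed once.
    print(first)
```

Because the extended counters are 64 bits wide, the subtraction is safe for any realistic sampling interval, which is the point of the question.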
r.
________________________________
From: Mark Seger [[email protected]]
Sent: Tuesday, November 26, 2013 3:04 PM
To: Dragseth Roy Einar
Cc: [email protected]
Subject: Re: [Collectl-interest] collectl disagrees with itself regarding
infiniband bandwidth.
I just checked in with some colleagues and was reminded of what I forgot - the
way IB deals with numbers is really poor! They use narrow, non-wrapping
counters, so if you're running at high rates and don't collect them frequently
enough, you'll lose data. The faster the IB, the faster you have to read them.
At FDR speeds, you'll need to read the counters every 2-3 seconds or you'll
lose data! What rate are you collecting at? The default of 10 seconds? Try
running interactively with -i10 and you'll probably lose data there too. ;(
The 'smoking gun' in your data seems to be the packet rates which are reported
correctly since they're smaller numbers.
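To put rough numbers on that, here is a back-of-the-envelope Python sketch. It assumes the basic data counters are 32 bits wide, count in 4-octet units, stick at their maximum instead of wrapping, and are read and reset once per interval. The function names are illustrative and nothing here is taken from collectl's source.

```python
COUNTER_BITS = 32       # assumed width of the basic PortXmitData/PortRcvData
OCTETS_PER_UNIT = 4     # assumed: data counters count 4-octet words
MAX_BYTES = (2**COUNTER_BITS - 1) * OCTETS_PER_UNIT

def seconds_until_saturation(bytes_per_sec):
    """How long a freshly reset counter lasts before it pegs at its maximum."""
    return MAX_BYTES / bytes_per_sec

def apparent_kb_per_sec(true_bytes_per_sec, interval_sec):
    """KB/s a reader would compute if the counter is reset every interval
    and stops counting once it reaches its maximum."""
    counted = min(true_bytes_per_sec * interval_sec, MAX_BYTES)
    return counted / 1024 / interval_sec

if __name__ == "__main__":
    # Rate taken from the interactive output in the original report (KBIn).
    true_rate = 3472553 * 1024
    print(seconds_until_saturation(true_rate))
    print(apparent_kb_per_sec(true_rate, 1))
    print(apparent_kb_per_sec(true_rate, 10))
```

Under these assumptions a counter at the reported QDR rate saturates in a little under 5 seconds, and a pegged counter divided over a 10-second interval comes out at about 1677721 KB/s. That is the same value as iconnect.kbin and iconnect.kbout in the daemon output, which is consistent with saturation, though not proof of it.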
-mark
On Tue, Nov 26, 2013 at 8:34 AM, Mark Seger <[email protected]> wrote:
re playback: can you try it out? even if only for a little bit? I'm guessing
you'll see the problem there as well but I'd really like to understand what is
happening. All you need to do is add "--rawtoo -f/tmp" to the DaemonCommands
in your /etc/collectl.conf, restart collectl and it will write a raw file to
/tmp. Then, if you can run your tests and save the interactive output with
timestamps - include -oT, you should be able to play back the data with
'collectl -p file -sx -oT' from the raw file and see almost identical numbers
OR 1/2 the values. Just remember to reset /etc/collectl.conf when you're done.
It would provide a useful data point. You can even play back the data with
--export lexpr.
Meanwhile we can try to reproduce what you're seeing. Actually, have you seen
this with earlier versions of collectl? I haven't touched the IB code in years,
at least I don't remember doing so, but I have touched lexpr. That's why it's
important to try and understand where the actual problem lies.
-mark
On Tue, Nov 26, 2013 at 8:09 AM, Dragseth Roy Einar <[email protected]> wrote:
Hi Mark. Yes, it's been a while...
I must admit I have never used playback mode so I do not know. We do not have
any .raw files produced by collectl.
r.
________________________________
From: Mark Seger [[email protected]]
Sent: Tuesday, November 26, 2013 1:52 PM
To: Dragseth Roy Einar
Cc: [email protected]
Subject: Re: [Collectl-interest] collectl disagrees with itself regarding
infiniband bandwidth.
hi roy - long time no chat...
This is indeed an interesting one I haven't seen. Just to be clear, since you
said it reports half as a daemon when using lexpr: does it also record 1/2 as a
daemon and play back as 1/2 w/o lexpr?
-mark
On Tue, Nov 26, 2013 at 4:01 AM, Roy Dragseth <[email protected]> wrote:
Collectl seems to disagree with itself when reporting infiniband bandwidth
usage.
I'm running a bandwidth benchmark that reports approx. 7 GB/s bidirectional
bandwidth on our QDR infiniband network:
Benchmark exchange(MPI_Sendrecv)
================================
lenght iterations elapsed time transfer rate latency
(bytes) (count) (seconds) (Mbytes/s) (usec)
--------------------------------------------------------------------------
12582912 8578 30.626 7048.6 1785.2
Running collectl interactively shows approximately the same:
[root@c10-13 etc]# collectl -s x
Couldn't find 'ofed_info'. Won't be able to determine OFED version
waiting for 1 second sample...
#<-----------InfiniBand----------->
# KBIn PktIn KBOut PktOut Errs
3472553 1717K 3472483 1717K 0
3472962 1717K 3472977 1717K 0
3472570 1717K 3472629 1717K 0
3470588 1716K 3470598 1716K 0
3472094 1717K 3472105 1717K 0
3471221 1716K 3471156 1716K 0
3472378 1717K 3472409 1717K 0
But if I run it as a daemon, with this addition to DaemonCommands in
collectl.conf: -P --export lexpr,f=/tmp/L (*), it only reports half the
bandwidth usage:
[root@c10-13 etc]# grep iconnect /tmp/L
iconnect.kbin 1677721
iconnect.pktin 1722455
iconnect.kbout 1677721
iconnect.pktout 1722455
Is this a bug? Any workarounds?
The test was done with collectl 3.6.9.
* I use this to report infiniband traffic in ganglia,
https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia
--
The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone: +47 77 64 41 07, fax: +47 77 64 41 00
Roy Dragseth, Team Leader, High Performance Computing
Direct call: +47 77 64 62 56. email: [email protected]
------------------------------------------------------------------------------
Shape the Mobile Experience: Free Subscription
Software experts and developers: Be at the forefront of tech innovation.
Intel(R) Software Adrenaline delivers strategic insight and game-changing
conversations that shape the rapidly evolving mobile landscape. Sign up now.
http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk
_______________________________________________
Collectl-interest mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/collectl-interest