So while I haven't been able to reproduce what you've reported, I did find a
few bugs with playing back raw files in plot mode, so this has been a good
thing.  The biggest challenge is that there are a lot of switch combinations in
native collectl, and tossing colmux into the mix makes it even more
complicated, especially when you fear breaking something that already
works, but I think I've figured it out.  The other complication is the lack
of testing, as I often feel like I'm the only one who uses some of the more
obscure, but useful, features.  Good to see you doing so too, and if you
haven't yet tried playing back files across multiple machines I think
you'll discover a whole new power.  ;)
-mark

On Thu, Jun 16, 2016 at 8:30 AM, Mark Seger <[email protected]> wrote:

> Wow, that's a tricky one.  Quite honestly, colmux has been so solid for me
> that I haven't looked at the code in ages, though that doesn't guarantee
> it's bug-free.  It's also amusing to note I had totally forgotten it
> supported the hostname address syntax you're using.  ;)  That allowed me to
> essentially use the same command you are, with one note.  I also added -test
> and see that columns 10 and 20 are different from what you're saying.  Maybe
> you have a different kernel?  I'm on 4.4.7-1-amd64-hpelinux, which is the
> Linux we use for our Helion Cloud and is essentially Debian as well.
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
> -command "-sC -oT -P" -cols 10,20
>
>          [CPU:0]Idle%                  [CPU:1]Soft%
> #Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
> 12:08:27     -1     -1     -1 |      -1     -1     -1
> 12:08:28     -1     -1     -1 |      -1     -1     -1
> 12:08:29     95     -1    100 |       0     -1      0
> 12:08:30     95     97     98 |       0      0      0
> 12:08:31     97    100    100 |       0      0      0
> 12:08:32     87    100     89 |       0      0      0
> 12:08:33    100    100    100 |       0      0      0
> 12:08:34    100    100     99 |       0      0      0
> 12:08:35    100     97     97 |       0      0      0
> 12:08:36     99     98    100 |       0      0      0
>
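The header difference above is consistent with the two kernels reporting a different number of per-CPU fields. The sketch below works through that column arithmetic. It assumes two leading columns (date and time) and the per-CPU field orders shown; these are a guess that happens to agree with the headers quoted in this thread ([CPU:0]Idle% and [CPU:1]Soft% at columns 10 and 20 here, [CPU:1]Idle% at column 20 and 282 columns for 28 CPUs in the original report). The names `column_for`, `header_for` and the field lists are illustrative, not part of collectl or colmux.

```python
# Assumed per-CPU field orders. Only the positions of Soft%, Steal%, Idle%
# and Totl% are cross-checked against headers quoted in this thread.
FIELDS_10 = ["User%", "Nice%", "Sys%", "Wait%", "Irq%",
             "Soft%", "Steal%", "Idle%", "Totl%", "Intrpt"]
FIELDS_12 = FIELDS_10[:9] + ["Guest%", "GuestN%", "Intrpt"]

LEADING = 2  # assumed: a date column and a time column come first


def total_columns(ncpus, fields):
    """Number of columns in one output line for `ncpus` CPUs."""
    return LEADING + ncpus * len(fields)


def column_for(cpu, field, fields):
    """1-based column number of `field` for CPU number `cpu`."""
    return LEADING + cpu * len(fields) + fields.index(field) + 1


def header_for(col, fields):
    """'[CPU:n]Field' header for a 1-based column number."""
    cpu, pos = divmod(col - LEADING - 1, len(fields))
    return f"[CPU:{cpu}]{fields[pos]}"


if __name__ == "__main__":
    # Same column numbers, different meaning depending on the layout.
    for col in (10, 20):
        print(col, header_for(col, FIELDS_10), header_for(col, FIELDS_12))
```

Checking the header line of `collectl -sC -oT -P` on each kernel before choosing -cols values avoids relying on this guess.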
> What you didn't say is whether this fails all the time or intermittently.
> If it's intermittent it will indeed be hard to track down, but there is hope
> too ;)
>
> Have you tried playing back a file with colmux yet?  If not, you can
> simply rerun the command but include -p and point it to the raw files.  The
> one thing I did discover is that I think I introduced a bug some time in the
> past, and the hostname portion of the string now needs to start with a
> wildcard rather than have one anywhere in the middle.  And then to make
> matters worse I found a second bug: colmux is using the wrong column during
> playback.  More digging into that is required too.  ;(
>
> BUT if I add 1 to each column I think this looks right if you ignore what
> the headers say:
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
> -command "-sC -oT -P -p
> '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more
>
>          [CPU:0]Totl%                  [CPU:1]Steal%
> #Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
>      99     99    100 |       0      0      0
>      98     99     97 |       0      0      0
>      94     98     94 |       0      0      0
>      94     93     92 |       0      0      0
>      99     94     98 |       0      0      0
>      99    100     99 |       0      0      0
>      99    100    100 |       0      0      0
>
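To illustrate the leading-wildcard form used in the playback command above, here is a small sketch using Python's fnmatch rather than colmux itself. The file names are hypothetical, built from the host names in this thread on the assumption that raw files are named <hostname>-<date>-<time>.raw.gz, and `matching` is an illustrative helper.

```python
from fnmatch import fnmatch

# Host names taken from the -addr range cd-cp1-swobj000[1-3]-mgmt.
HOSTS = [f"cd-cp1-swobj000{i}-mgmt" for i in (1, 2, 3)]

# Hypothetical raw file names (assumed <hostname>-<date>-<time>.raw.gz).
FILES = [f"{host}-20160616-110000.raw.gz" for host in HOSTS]
FILES.append("cd-cp1-swobj0001-mgmt-20160616-120000.raw.gz")  # a later hour

# The wildcard is at the start of the hostname portion, as in the command.
PATTERN = "*-mgmt-20160616-110000.raw.gz"


def matching(pattern, names):
    """Return the names that match the shell-style pattern."""
    return [name for name in names if fnmatch(name, pattern)]


if __name__ == "__main__":
    for name in matching(PATTERN, FILES):
        print(name)
```

Under these assumptions the pattern selects one 11:00 file per host and skips the 12:00 file.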
> And since this is a playback command, you can use time ranges as well to
> limit what is being displayed, so that may help zero in on where in the data
> the problem is, and then maybe you can even send me a subset of the problem
> raw file [use collectl --extract to create a new raw file from the time
> slice of an old one].  Then maybe I can track down why this is happening.
>
> -mark
>
>
>
>
>
>
> On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <
> [email protected]> wrote:
>
>> Hello,
>>
>> We are trying to gather detailed CPU usage from a number of machines in
>> our cluster. In particular, we want to see usage of every individual CPU in
>> a group of machines.
>>
>> With collectl, on a single machine, the command we can run is:
>>
>>    collectl -sC -oT -P
>>
>> Which gives us 282 columns (the machines have 28 CPU's).
>>
>> Now we want to run a colmux command to see the idle time of CPU's 0 and 1
>> on 3 machines. This is columns 10 and 20 ("[CPU:0]Idle%" and
>> "[CPU:1]Idle%"). The command we use is:
>>
>>    colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>>
>> This generates the error:
>>
>>    Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>>
>> The error occurs when parsing the field "lasttime" of a data structure
>> $hostVars, which has the following content at the time of the error:
>>
>> {
>>           'lasttime' => [
>>                           '',
>>                           '20160615'
>>                         ],
>>           'maxinst' => [
>>                          -1,
>>                          0
>>                        ],
>>           'lastinst' => [
>>                           -1,
>>                           0
>>                         ],
>>           'bufptr' => 1
>> };
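For context, a message of the form "Minute '60' out of range 0..59" is what Perl's Time::Local reports when it is handed a minute value of 60. The thread does not identify what produces that value in colmux, so the following is only a sketch of the general failure mode and one conventional fix, written in Python rather than Perl. Incrementing a minute field without carrying into the hour fails at xx:59, while doing the arithmetic on a full timestamp does not. Both function names are hypothetical and neither is taken from colmux.

```python
from datetime import datetime, timedelta

DAY = "20160615"  # the date that appears in the 'lasttime' dump above


def naive_next_minute(hhmm):
    """Add one minute by bumping the minute field only (no carry)."""
    hour, minute = map(int, hhmm.split(":"))
    # At xx:59 this asks for minute 60, which datetime rejects with a
    # ValueError, much like the Perl error quoted above.
    return datetime(2016, 6, 15, hour, minute + 1)


def safe_next_minute(hhmm):
    """Add one minute on a full timestamp so the hour carries correctly."""
    stamp = datetime.strptime(f"{DAY} {hhmm}", "%Y%m%d %H:%M")
    return stamp + timedelta(minutes=1)


if __name__ == "__main__":
    print(safe_next_minute("12:59"))
    try:
        naive_next_minute("12:59")
    except ValueError as err:
        print("naive version failed:", err)
```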
>>
>> I am currently running version "collectl V3.6.9-1
>> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
>> here?
>>
>>
>> Thanks in advance,
>>
>> Hernan
>>
>>
>>
>> _______________________________________________
>> Collectl-interest mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/collectl-interest
>>
>>
>