Hello Mark,

I downloaded the latest version from Sourceforge and it seems to fix these
issues, even with RAW files generated by the (older?) version available on
Debian. I will use this version going forward, so we can declare the problem
resolved.

Thanks for your help,

Hernan


On Thu, Jun 16, 2016 at 5:30 AM, Mark Seger <[email protected]> wrote:

> Wow, that's a tricky one.  Quite honestly, colmux has been so solid for me
> I haven't looked at the code in ages, but that doesn't mean anything
> either.  It's also amusing to note I had totally forgotten it supported the
> hostname address syntax you're using.  ;)  That allowed me to essentially
> use the same command you are, with one note.  I also added -test and see
> columns 10 and 20 are different from what you're saying.  Maybe you have a
> different kernel?  I'm on 4.4.7-1-amd64-hpelinux, which is the Linux we use
> for our Helion Cloud and is essentially Debian as well.
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
> -command "-sC -oT -P" -cols 10,20
>
>          [CPU:0]Idle%                  [CPU:1]Soft%
> #Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
> 12:08:27     -1     -1     -1 |      -1     -1     -1
> 12:08:28     -1     -1     -1 |      -1     -1     -1
> 12:08:29     95     -1    100 |       0     -1      0
> 12:08:30     95     97     98 |       0      0      0
> 12:08:31     97    100    100 |       0      0      0
> 12:08:32     87    100     89 |       0      0      0
> 12:08:33    100    100    100 |       0      0      0
> 12:08:34    100    100     99 |       0      0      0
> 12:08:35    100     97     97 |       0      0      0
> 12:08:36     99     98    100 |       0      0      0
>
> What you didn't say is whether this fails all the time or intermittently.
> If intermittent, it will indeed be hard to track down, but there is hope too ;)
>
> Have you tried playing back a file with colmux yet?  If not, you can
> simply rerun the command but include -p and point it to the raw files.  The
> one thing I did discover is that I think I introduced a bug some time in the
> past, and you need to have the wildcard at the start of the hostname portion
> of the string rather than anywhere in the middle.  And then to make matters
> worse, I found a second bug: I am using the wrong column during playback.
> More digging into that is required too.  ;(
>
> BUT if I add 1 to each column I think this looks right if you ignore what
> the headers say:
>
> stack@cd-cp1-c1-m1-mgmt:~$ ~/colmux.pl -addr cd-cp1-swobj000[1-3]-mgmt
> -command "-sC -oT -P -p
> '/var/cache/collectl/*-mgmt-20160616-110000.raw.gz'" -cols 11,21|more
>
>          [CPU:0]Totl%                  [CPU:1]Steal%
> #Time    1-mgmt 2-mgmt 3-mgmt |  1-mgmt 2-mgmt 3-mgmt
>      99     99    100 |       0      0      0
>      98     99     97 |       0      0      0
>      94     98     94 |       0      0      0
>      94     93     92 |       0      0      0
>      99     94     98 |       0      0      0
>      99    100     99 |       0      0      0
>      99    100    100 |       0      0      0
>
> And since this is a playback command, you can use time ranges as well to
> limit what is being displayed, which may help zero in on where in the data
> the problem is, and then maybe you could even send me a subset of the problem
> raw file [use collectl --extract to create a new raw file from the time slice
> of an old one].  Then maybe I can track down why this is happening.
>
> -mark
>
>
>
>
>
>
> On Wed, Jun 15, 2016 at 8:35 PM, Hernan Laffitte <
> [email protected]> wrote:
>
>> Hello,
>>
>> We are trying to gather detailed CPU usage from a number of machines in
>> our cluster. In particular, we want to see usage of every individual CPU in
>> a group of machines.
>>
>> With collectl, on a single machine, the command we can run is:
>>
>>    collectl -sC -oT -P
>>
>> Which gives us 282 columns (the machines have 28 CPUs).
>>
>> Now we want to run a colmux command to see the idle time of CPUs 0 and 1
>> on 3 machines. These are columns 10 and 20 ("[CPU:0]Idle%" and
>> "[CPU:1]Idle%"). The command we use is:
>>
>>    colmux -addr 'machine-[1-3]' -command "-sC -oT -P" -cols 10,20
>>
>> This generates the error:
>>
>>    Minute '60' out of range 0..59 at /usr/bin/colmux line 1699.
>>
>> The error occurs when parsing the field "lasttime" of a data structure
>> $hostVars, which has the following content at the time of the error:
>>
>> {
>>           'lasttime' => [
>>                           '',
>>                           '20160615'
>>                         ],
>>           'maxinst' => [
>>                          -1,
>>                          0
>>                        ],
>>           'lastinst' => [
>>                           -1,
>>                           0
>>                         ],
>>           'bufptr' => 1
>> };
>>
>> I am currently running version "collectl V3.6.9-1
>> (zlib:2.06,HiRes:1.9725)" on Debian. Any idea of what may be the problem
>> here?
>>
>>
>> Thanks in advance,
>>
>> Hernan
>>
>>
>>
>> _______________________________________________
>> Collectl-interest mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/collectl-interest
>>
>>
>
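Much of the thread above turns on which column number corresponds to which
header, so a small helper that lists the fields of a plot-format header line by
position may be handy when picking values for -cols. This is only a sketch and
is not part of collectl or colmux: the function names, the shortened sample
header, and the 1-based numbering are all assumptions made for illustration.
As the thread notes, colmux playback was seen to be off by one, so any numbers
should still be checked against what -test reports.

```python
def number_columns(header_line, first_index=1):
    """Map each whitespace-separated header field to its position.

    Assumes a single header line whose fields are separated by spaces and
    which may start with '#'. first_index is an assumption; change it if
    the tool you are feeding counts columns from 0.
    """
    fields = header_line.strip().split()
    return {first_index + i: name for i, name in enumerate(fields)}


def find_columns(header_line, wanted, first_index=1):
    """Return the positions of every field whose name contains `wanted`."""
    cols = number_columns(header_line, first_index)
    return [idx for idx, name in cols.items() if wanted in name]


# Hypothetical, heavily shortened header; a real one has many more fields.
sample = "#Time [CPU:0]User% [CPU:0]Idle% [CPU:1]User% [CPU:1]Idle%"

if __name__ == "__main__":
    for idx, name in number_columns(sample).items():
        print(idx, name)
    print(find_columns(sample, "Idle%"))
```

To use it, paste the real header line from a single-host run in place of the
sample and search for the field you want, for example "Idle%".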