Re: ravirin down

2016-10-31 Thread James Clarke
The tool works by each server sending an HTTP PUT every 5 minutes, so it’s
possible it was dying and a login shell was too much to ask for; odd though.
Anyway, back up now.
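
For anyone curious how a heartbeat check of this kind decides "up" versus
"down", here is a minimal sketch. Only the 5-minute PUT interval comes from
the description above; the function name `is_up`, the one-minute `GRACE`
allowance, and the timestamps are assumptions for illustration, not the actual
tool.

```python
from datetime import datetime, timedelta

# From the message above: each server sends one HTTP PUT every 5 minutes.
HEARTBEAT_INTERVAL = timedelta(minutes=5)
# Assumption: a little slack for clock skew and network delay.
GRACE = timedelta(minutes=1)


def is_up(last_put, now, interval=HEARTBEAT_INTERVAL, grace=GRACE):
    """Return True if the most recent heartbeat is fresh enough to call the host up."""
    return now - last_put <= interval + grace


if __name__ == "__main__":
    now = datetime(2016, 10, 31, 14, 18)  # hypothetical refresh time
    print(is_up(now - timedelta(minutes=4), now))  # heartbeat 4 minutes old
    print(is_up(now - timedelta(minutes=7), now))  # heartbeat 7 minutes old
```

A check like this only says that the PUTs stopped arriving; it cannot tell a
hung machine from one that is simply too overloaded to send them.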

James

> On 31 Oct 2016, at 14:20, rod  wrote:
> 
> I tried to get a prompt, which I usually do when I see seg fault errors,
> and it wouldn't return one.  Maybe it's a ghost.  Today's the day for them.
> 
> Rod
> 
> On 10/31/2016 9:18 AM, James Clarke wrote:
>> Hi Rod,
>> Don’t worry; those are all down to me (a kernel module to dump vmap_lazy_nr
>> and friends, and the DEBUG ones are because I added a printk to see if a
>> particular condition is reached). Then of course the tst_qtgraphical segfault
>> is just a normal user-space SIGSEGV.
>> 
>> I’m curious as to whether raverin actually needed rebooting; my monitoring
>> tool says it was still up 7 minutes ago, but not 2 minutes later at the
>> next refresh (presumably because you’ve rebooted). Being idle is expected;
>> there are no packages queued for building.
>> 
>> Thanks,
>> James
>> 
>>> On 31 Oct 2016, at 14:12, rod  wrote:
>>> 
>>> Good morning,
>>> 
>>> I saw ravirin was not building again so in the process of rebooting him
>>> I found this snippet.
>>> 
>>> [ 1208.832559] vln_init: lazy_max_pages() is 4096
>>> [ 1208.918323] vln_init: vmap_lazy_nr is caeb1c
>>> [ 1208.974405] vln_init: *vmap_lazy_nr is 145
>>> [ 1209.030340] vln_init: lazy_max_pages() is 4096
>>> [ 1209.294257] vln_init: vmap_lazy_nr is caeb1c
>>> [ 1209.350412] vln_init: *vmap_lazy_nr is 174
>>> [ 1209.406334] vln_init: lazy_max_pages() is 4096
>>> [ 1209.834246] vln_init: vmap_lazy_nr is caeb1c
>>> [ 1209.890347] vln_init: *vmap_lazy_nr is 203
>>> [ 1209.946274] vln_init: lazy_max_pages() is 4096
>>> [ 1210.553105] vln_init: vmap_lazy_nr is caeb1c
>>> [ 1210.609279] vln_init: *vmap_lazy_nr is 232
>>> [ 1210.665157] vln_init: lazy_max_pages() is 4096
>>> [ 1211.349159] vln_init: vmap_lazy_nr is caeb1c
>>> [ 1211.405310] vln_init: *vmap_lazy_nr is 261
>>> [ 1211.461203] vln_init: lazy_max_pages() is 4096
>>> [24537.258928] TSB[kworker/0:5:5958]: DEBUG flush_tsb_kernel_range
>>> start=1000c000 end=f000 PAGE_SIZE=2000
>>> [24537.404378] TSB[kworker/0:5:5958]: DEBUG flush_tsb_kernel_range
>>> start=0001 end=0001020d6000 PAGE_SIZE=2000
>>> [217632.451540] TSB[kworker/0:1:17055]: DEBUG flush_tsb_kernel_range
>>> start=00014000 end=000102126000 PAGE_SIZE=2000
>>> [230284.512421] tst_qtgraphical[476]: segfault at 0 ip fff1013e8644
>>> (rpc fff100841884) sp 07feffc275e1 error 30001 in
>>> libc-2.24.so[fff10135c000+15e000]
>>> 
>>> Not sure what it means nor what I can do to help fix it.  Thoughts?
>>> 
>>> Rod
>>> 
>>> 
>>> On 10/27/2016 7:07 AM, James Clarke wrote:
>>>> Hi Rod,
>>>> It seems ravirin has been down for at least the past few days. Could you
>>>> please give it some love? If it’s crashed/hung with CPU lockups, I’ve built
>>>> a 4.9 kernel with a patch to fix this which I’d like to test.
>>>>
>>>> Thanks,
>>>> James
>>>>



Re: ravirin down

2016-10-31 Thread rod
Good morning,

I saw ravirin was not building again so in the process of rebooting him
I found this snippet.

[ 1208.832559] vln_init: lazy_max_pages() is 4096
[ 1208.918323] vln_init: vmap_lazy_nr is caeb1c
[ 1208.974405] vln_init: *vmap_lazy_nr is 145
[ 1209.030340] vln_init: lazy_max_pages() is 4096
[ 1209.294257] vln_init: vmap_lazy_nr is caeb1c
[ 1209.350412] vln_init: *vmap_lazy_nr is 174
[ 1209.406334] vln_init: lazy_max_pages() is 4096
[ 1209.834246] vln_init: vmap_lazy_nr is caeb1c
[ 1209.890347] vln_init: *vmap_lazy_nr is 203
[ 1209.946274] vln_init: lazy_max_pages() is 4096
[ 1210.553105] vln_init: vmap_lazy_nr is caeb1c
[ 1210.609279] vln_init: *vmap_lazy_nr is 232
[ 1210.665157] vln_init: lazy_max_pages() is 4096
[ 1211.349159] vln_init: vmap_lazy_nr is caeb1c
[ 1211.405310] vln_init: *vmap_lazy_nr is 261
[ 1211.461203] vln_init: lazy_max_pages() is 4096
[24537.258928] TSB[kworker/0:5:5958]: DEBUG flush_tsb_kernel_range
start=1000c000 end=f000 PAGE_SIZE=2000
[24537.404378] TSB[kworker/0:5:5958]: DEBUG flush_tsb_kernel_range
start=0001 end=0001020d6000 PAGE_SIZE=2000
[217632.451540] TSB[kworker/0:1:17055]: DEBUG flush_tsb_kernel_range
start=00014000 end=000102126000 PAGE_SIZE=2000
[230284.512421] tst_qtgraphical[476]: segfault at 0 ip fff1013e8644
(rpc fff100841884) sp 07feffc275e1 error 30001 in
libc-2.24.so[fff10135c000+15e000]

Not sure what it means nor what I can do to help fix it.  Thoughts?

Rod


On 10/27/2016 7:07 AM, James Clarke wrote:
> Hi Rod,
> It seems ravirin has been down for at least the past few days. Could you
> please give it some love? If it’s crashed/hung with CPU lockups, I’ve built a
> 4.9 kernel with a patch to fix this which I’d like to test.
> 
> Thanks,
> James
> 



Re: ravirin down

2016-10-31 Thread James Clarke
Hi Rod,
Don’t worry; those are all down to me (a kernel module to dump vmap_lazy_nr
and friends, and the DEBUG ones are because I added a printk to see if a
particular condition is reached). Then of course the tst_qtgraphical segfault
is just a normal user-space SIGSEGV.
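
To make that last point concrete, here is a minimal sketch of how the segfault
line can be decoded. The field layout (fault address, ip, then
object[base+size]) is the usual kernel format; the regular expression and the
helper name `decode_segfault` are assumptions for illustration.

```python
import re

# The segfault line from the log, rejoined onto one logical line.
LINE = (
    "tst_qtgraphical[476]: segfault at 0 ip fff1013e8644 "
    "(rpc fff100841884) sp 07feffc275e1 error 30001 in "
    "libc-2.24.so[fff10135c000+15e000]"
)

# Hypothetical pattern: fault address, instruction pointer, mapped object,
# mapping base and mapping size, all hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) .* in "
    r"(?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the faulting address and the ip's offset inside the mapped object."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    addr, ip, base, size = (int(m[k], 16) for k in ("addr", "ip", "base", "size"))
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return {"object": m["obj"], "fault_addr": addr, "offset": offset}


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["object"], hex(info["fault_addr"]), hex(info["offset"]))
```

"segfault at 0" means the process touched address 0, i.e. a NULL pointer
dereference, and the ip falls inside libc's mapping, so the offset is what you
would look up in libc-2.24.so (with addr2line or similar) to find the function.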

I’m curious as to whether raverin actually needed rebooting; my monitoring
tool says it was still up 7 minutes ago, but not 2 minutes later at the
next refresh (presumably because you’ve rebooted). Being idle is expected;
there are no packages queued for building.

Thanks,
James

> On 31 Oct 2016, at 14:12, rod  wrote:
> 
> Good morning,
> 
> I saw ravirin was not building again so in the process of rebooting him
> I found this snippet.
> 
> [ 1208.832559] vln_init: lazy_max_pages() is 4096
> [ 1208.918323] vln_init: vmap_lazy_nr is caeb1c
> [ 1208.974405] vln_init: *vmap_lazy_nr is 145
> [ 1209.030340] vln_init: lazy_max_pages() is 4096
> [ 1209.294257] vln_init: vmap_lazy_nr is caeb1c
> [ 1209.350412] vln_init: *vmap_lazy_nr is 174
> [ 1209.406334] vln_init: lazy_max_pages() is 4096
> [ 1209.834246] vln_init: vmap_lazy_nr is caeb1c
> [ 1209.890347] vln_init: *vmap_lazy_nr is 203
> [ 1209.946274] vln_init: lazy_max_pages() is 4096
> [ 1210.553105] vln_init: vmap_lazy_nr is caeb1c
> [ 1210.609279] vln_init: *vmap_lazy_nr is 232
> [ 1210.665157] vln_init: lazy_max_pages() is 4096
> [ 1211.349159] vln_init: vmap_lazy_nr is caeb1c
> [ 1211.405310] vln_init: *vmap_lazy_nr is 261
> [ 1211.461203] vln_init: lazy_max_pages() is 4096
> [24537.258928] TSB[kworker/0:5:5958]: DEBUG flush_tsb_kernel_range
> start=1000c000 end=f000 PAGE_SIZE=2000
> [24537.404378] TSB[kworker/0:5:5958]: DEBUG flush_tsb_kernel_range
> start=0001 end=0001020d6000 PAGE_SIZE=2000
> [217632.451540] TSB[kworker/0:1:17055]: DEBUG flush_tsb_kernel_range
> start=00014000 end=000102126000 PAGE_SIZE=2000
> [230284.512421] tst_qtgraphical[476]: segfault at 0 ip fff1013e8644
> (rpc fff100841884) sp 07feffc275e1 error 30001 in
> libc-2.24.so[fff10135c000+15e000]
> 
> Not sure what it means nor what I can do to help fix it.  Thoughts?
> 
> Rod
> 
> 
> On 10/27/2016 7:07 AM, James Clarke wrote:
>> Hi Rod,
>> It seems ravirin has been down for at least the past few days. Could you
>> please give it some love? If it’s crashed/hung with CPU lockups, I’ve built a
>> 4.9 kernel with a patch to fix this which I’d like to test.
>> 
>> Thanks,
>> James
>> 



Re: ravirin down

2016-10-27 Thread James Clarke
Thanks! I'll put the new kernel on there.

James

> On 27 Oct 2016, at 15:11, rod  wrote:
> 
> James,
> 
> He's back up.  Let me know if you need anything else.
> 
> Rod
> 
>> On 10/27/2016 7:07 AM, James Clarke wrote:
>> Hi Rod,
>> It seems ravirin has been down for at least the past few days. Could you
>> please give it some love? If it’s crashed/hung with CPU lockups, I’ve built a
>> 4.9 kernel with a patch to fix this which I’d like to test.
>> 
>> Thanks,
>> James
>>