Re: [arch-general] Process 13696 (systemctl) of user 0 dumped core ??

2015-08-24 Thread David C. Rankin

On 08/24/2015 06:17 PM, Damjan Georgievski wrote:

   I haven't seen or noticed this happening before, but obviously the first
core dump was back in April related to cups. The question is What should I
check? and Does any of this look related to BIOS settings and the new disk
controller? (that looks more doubtful after looking over all the
information)

   Anybody have experience with this type thing?


are you running everything Arch up-to-date vanilla or do you have some
custom stuff?
if you're vanilla, run memtest on the machine.


All vanilla. I'll double-check with memtest. After putting the PCI SATA
controller in, I noticed an IRQ 13 error on boot related to the failed
onboard disk controller. I suspect disabling the onboard controller completely
will eliminate that error.


(next snip)
 are you running everything Arch up-to-date vanilla or do you have some
 custom stuff?
 if you're vanilla, run memtest on the machine.
 also, make sure to:
 update the bios
 and
 do you have intel-ucode installed and configured (this is very
 important for certain cpus)?
 https://wiki.archlinux.org/index.php/Microcode

Thankfully, this box has an older AMD Phenom 9850 Black Edition in it, so
intel-ucode doesn't apply and linux-firmware should catch it. Thanks for your
reply.
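
As a rough illustration of the vendor check implied here, the Python sketch
below reads the vendor_id field from /proc/cpuinfo-style text and maps it to
the microcode source mentioned in this thread. The names cpu_vendor,
microcode_source and MICROCODE_SOURCE, and the sample text, are assumptions
made for illustration and not part of any Arch tool; the AMD entry only
reflects the statement above that linux-firmware should cover this CPU.

```python
# Hypothetical helper names; only the /proc/cpuinfo "vendor_id" field is real.
MICROCODE_SOURCE = {
    "GenuineIntel": "intel-ucode",     # package named in this thread
    "AuthenticAMD": "linux-firmware",  # per this thread, as of 2015
}


def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id value found, or None if absent."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def microcode_source(cpuinfo_text):
    """Map the CPU vendor to a microcode source, or None if unknown."""
    return MICROCODE_SOURCE.get(cpu_vendor(cpuinfo_text))


if __name__ == "__main__":
    # Illustrative sample; on a real system, read /proc/cpuinfo instead.
    sample = "processor\t: 0\nvendor_id\t: AuthenticAMD\n"
    print(cpu_vendor(sample), "->", microcode_source(sample))
```

On a live machine the same functions could be fed the contents of
/proc/cpuinfo; an unknown vendor simply yields None rather than a guess.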


--
David C. Rankin, J.D.,P.E.


Re: [arch-general] Process 13696 (systemctl) of user 0 dumped core ??

2015-08-24 Thread Damjan Georgievski
 Mon 2015-08-24 15:32:05 CDT   13580 0 0   7 * /usr/bin/systemctl
 Mon 2015-08-24 15:53:37 CDT   13696 0 0   7 * /usr/bin/systemctl


   I haven't seen or noticed this happening before, but obviously the first
 core dump was back in April related to cups. The question is What should I
 check? and Does any of this look related to BIOS settings and the new disk
 controller? (that looks more doubtful after looking over all the
 information)

   Anybody have experience with this type thing?


are you running everything Arch up-to-date vanilla or do you have some
custom stuff?
if you're vanilla, run memtest on the machine.


-- 
damjan


[arch-general] Process 13696 (systemctl) of user 0 dumped core ??

2015-08-24 Thread David C. Rankin

All,

  After band-aiding my server back together by putting a 4-port PCI SATA
controller in it to work around the failed onboard disk controller, the system
is up and running fine. In the BIOS, the onboard SATA controller is currently
'Enabled', but each of the SATA ports is 'Disabled'. When I check the status of
something with systemctl, I get an odd error at the end of each command, e.g.:


[15:47 phoinix:~/.ssh] # sc status smbd
● smbd.service - Samba SMB/CIFS server
   Loaded: loaded (/usr/lib/systemd/system/smbd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2015-08-22 22:57:26 CDT; 1 day 16h ago
 Main PID: 542 (smbd)
   CGroup: /system.slice/smbd.service
   ├─542 /usr/bin/smbd -D
   └─559 /usr/bin/smbd -D
Bus error (core dumped)

  Looking at the journal and the core dumps, the only other unit that is
implicated is:


  Cannot add dependency job for unit cups.socket, ignoring: Unit cups.socket failed to load: No such file or directory.


  Nothing else is generating a core dump. But each time I check the status of a 
process, it ends with:


Bus error (core dumped)

  The only other thing I see in the journal that may or may not be related is:

Aug 24 14:21:58 phoinix systemd[13187]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
Aug 24 14:21:58 phoinix systemd[13187]: Unit type .busname is not supported on this system.


  I don't know if that's related, but it was the only other thing even
tangentially related to 'bus'.


  Looking at the core dump list with 'coredumpctl list' shows a handful of files:

[17:46 phoinix:~/.ssh] # coredumpctl list
TIME                            PID   UID   GID SIG PRESENT EXE
Mon 2015-04-06 19:00:15 CDT     342     0     0  11         /usr/bin/cupsd
Tue 2015-05-26 13:15:01 CDT   23265     0     0  11         /usr/bin/crond
Tue 2015-05-26 14:01:01 CDT   23563     0     0  11         /usr/bin/crond
Tue 2015-05-26 14:05:01 CDT   23593     0     0  11         /usr/bin/crond
Sun 2015-08-23 05:51:43 CDT    3151     0     0   7 *       /usr/bin/systemctl
Sun 2015-08-23 05:52:16 CDT    3179     0     0   7 *       /usr/bin/systemctl
Sun 2015-08-23 07:11:33 CDT    3639     0     0   7 *       /usr/bin/systemctl
Sun 2015-08-23 07:12:31 CDT    3652     0     0   7 *       /usr/bin/systemctl
Mon 2015-08-24 15:30:11 CDT   13565     0     0   7 *       /usr/bin/systemctl
Mon 2015-08-24 15:32:05 CDT   13580     0     0   7 *       /usr/bin/systemctl
Mon 2015-08-24 15:53:37 CDT   13696     0     0   7 *       /usr/bin/systemctl
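
To make the pattern in that listing easier to see, here is a minimal Python
sketch that groups the rows by executable and signal. The SAMPLE rows are
taken from the listing above with whitespace normalized; the ROW pattern, the
summarize function and the two-entry SIGNAL_NAMES table are assumptions for
illustration (signal 7 is shown as BUS in the coredumpctl output for PID 13696,
and 11 is SIGSEGV on Linux), not something coredumpctl itself provides.

```python
import re
from collections import Counter

# Rows taken from the `coredumpctl list` output in this message.
SAMPLE = """\
Mon 2015-04-06 19:00:15 CDT     342 0 0  11   /usr/bin/cupsd
Tue 2015-05-26 13:15:01 CDT   23265 0 0  11   /usr/bin/crond
Sun 2015-08-23 05:51:43 CDT    3151 0 0   7 * /usr/bin/systemctl
Mon 2015-08-24 15:53:37 CDT   13696 0 0   7 * /usr/bin/systemctl
"""

# Only the two signals that appear in the listing (Linux numbering).
SIGNAL_NAMES = {7: "SIGBUS", 11: "SIGSEGV"}

# Hypothetical row pattern: timestamp, PID, UID, GID, SIG, optional "*", EXE.
ROW = re.compile(
    r"^\w{3} \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [A-Z]+\s*"
    r"(?P<pid>\d+)\s+(?P<uid>\d+)\s+(?P<gid>\d+)\s+(?P<sig>\d+)\s+"
    r"(?P<present>\*?)\s*(?P<exe>/\S+)$"
)


def summarize(text):
    """Count core dumps per (executable, signal name); skip unparsable lines."""
    counts = Counter()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        sig = int(match.group("sig"))
        counts[(match.group("exe"), SIGNAL_NAMES.get(sig, str(sig)))] += 1
    return counts


if __name__ == "__main__":
    for (exe, sig), n in sorted(summarize(SAMPLE).items()):
        print(f"{exe}: {n} x {sig}")
```

Grouped this way, the older cupsd and crond dumps are all signal 11, while
every recent systemctl dump is signal 7.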

  Looking at the dumps in gdb shows:

[17:47 phoinix:~/.ssh] # coredumpctl gdb 13696
           PID: 13696 (systemctl)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 7 (BUS)
     Timestamp: Mon 2015-08-24 15:53:37 CDT (1h 54min ago)
  Command Line: systemctl status smbd
    Executable: /usr/bin/systemctl
 Control Group: /user.slice/user-1000.slice/session-c2.scope
          Unit: session-c2.scope
         Slice: user-1000.slice
       Session: c2
     Owner UID: 1000 (david)
       Boot ID: aeecdf7479ea4b43aae7f1b9b83b2502
    Machine ID: 8d32bcc3152b4a1f87c4d71f948f93fb
      Hostname: phoinix
      Coredump: /var/lib/systemd/coredump/core.systemctl.0.aeecdf7479ea4b43aae7f1b9b83b2502.13696.144044961700.lz4
       Message: Process 13696 (systemctl) of user 0 dumped core.
snip
(gdb) bt
#0  0x7f353981becf in ?? ()
#1  0x7f3539801c09 in ?? ()
#2  0x7f3539801d38 in ?? ()
#3  0x7f3539801b64 in ?? ()
#4  0x7f3539801d38 in ?? ()
#5  0x7f3539801b64 in ?? ()
#6  0x7f353980310e in ?? ()
#7  0x7f35397f4080 in ?? ()
#8  0x7f353983340b in ?? ()
#9  0x7f35397ed1d1 in ?? ()
#10 0x7f35397e2414 in ?? ()
#11 0x7f35386f5790 in __libc_start_main () from /usr/lib/libc.so.6
#12 0x7f35397e3049 in ?? ()
(gdb) frame 0
#0  0x7f353981becf in ?? ()
(gdb) info frame
Stack level 0, frame at 0x7ffed3907080:
 rip = 0x7f353981becf; saved rip = 0x7f3539801c09
 called by frame at 0x7ffed3907160
 Arglist at 0x7ffed3906fd8, args:
 Locals at 0x7ffed3906fd8, Previous frame's sp is 0x7ffed3907080
 Saved registers:
  rbx at 0x7ffed3907048, rbp at 0x7ffed3907050, r12 at 0x7ffed3907058, r13 at 0x7ffed3907060, r14 at 0x7ffed3907068,
  r15 at 0x7ffed3907070, rip at 0x7ffed3907078
(gdb) quit

  I haven't noticed this happening before, although the first core dump was
back in April and involved cups. The questions are: what should I check, and
does any of this look related to the BIOS settings and the new disk
controller? (That looks more doubtful after looking over all the information.)

  Does anybody have experience with this type of thing?

--
David C. Rankin, J.D.,P.E.


Re: [arch-general] Process 13696 (systemctl) of user 0 dumped core ??

2015-08-24 Thread Damjan Georgievski
On 25 August 2015 at 01:17, Damjan Georgievski gdam...@gmail.com wrote:
 Mon 2015-08-24 15:32:05 CDT   13580 0 0   7 * /usr/bin/systemctl
 Mon 2015-08-24 15:53:37 CDT   13696 0 0   7 * /usr/bin/systemctl


   I haven't seen or noticed this happening before, but obviously the first
 core dump was back in April related to cups. The question is What should I
 check? and Does any of this look related to BIOS settings and the new disk
 controller? (that looks more doubtful after looking over all the
 information)

   Anybody have experience with this type thing?


 are you running everything Arch up-to-date vanilla or do you have some
 custom stuff?
 if you're vanilla, run memtest on the machine.

also, make sure to:
update the bios
and
do you have intel-ucode installed and configured (this is very
important for certain cpus)?
https://wiki.archlinux.org/index.php/Microcode

-- 
damjan