from:"Dmitry Sivachenko"

Re: fsck_ufs dumps core

2016-08-15 Thread Dmitry Sivachenko


> On 12 Aug 2016, at 08:51, Konstantin Belousov  wrote:
> 
> On Wed, Aug 10, 2016 at 06:11:39PM +0300, Dmitry Sivachenko wrote:
>> 
>>> On 10 Aug 2016, at 17:55, Konstantin Belousov  wrote:
>>> 
>>> On Wed, Aug 10, 2016 at 05:29:31PM +0300, Dmitry Sivachenko wrote:
>>>> Hello,
>>>> 
>>>> I am running FreeBSD 10.3-STABLE #0 r299261M
>>>> 
>>>> After unclean reboot I am unable to fsck my UFS filesystem:
>>>> 
>>>> # fsck  /dev/mfid0p1
>>>> ** /dev/mfid0p1
>>>> ** Last Mounted on /opt
>>>> ** Phase 1 - Check Blocks and Sizes
>>>> fsck: /dev/mfid0p1: Segmentation fault
>>>> 
>>>> pid 482 (fsck_ufs), uid 0: exited on signal 11 (core dumped)
>>>> 
>>>> # gdb -c fsck_ufs.482 /sbin/fsck_ufs 
>>>> GNU gdb 6.1.1 [FreeBSD]
>>>> Copyright 2004 Free Software Foundation, Inc.
>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>> Type "show copying" to see the conditions.
>>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>>> This GDB was configured as "amd64-marcel-freebsd"...
>>>> Core was generated by `fsck_ufs'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> Reading symbols from /lib/libufs.so.6...done.
>>>> Loaded symbols for /lib/libufs.so.6
>>>> Reading symbols from /lib/libc.so.7...done.
>>>> Loaded symbols for /lib/libc.so.7
>>>> Reading symbols from /libexec/ld-elf.so.1...done.
>>>> Loaded symbols for /libexec/ld-elf.so.1
>>>> #0  0x00409a8b in pass1 () at /place/WRK/src/sbin/fsck_ffs/pass1.c:83
>>>> 83  setbmap(i);
>>>> (gdb) bt
>>>> #0  0x00409a8b in pass1 () at /place/WRK/src/sbin/fsck_ffs/pass1.c:83
>>>> #1  0x00409050 in main (argc=, 
>>>>   argv=) at /place/WRK/src/sbin/fsck_ffs/main.c:447
>>>> Current language:  auto; currently minimal
>>>> (gdb) 
>>>> 
>>> 
>>> Try to use alternative superblock (-b switch).  You can get the list of
>>> the possible values for -b by 'newfs -N' invocation, but you have to know
>>> the parameters which were used for formatting.
>> 
>> 
>> Yes, I tried several different backup superblocks, with the same result.  (I 
>> created this FS a few years ago so I can't be 100% sure about the parameters, 
>> but I usually only use a larger -i NN for big filesystems, and I can guess the 
>> exact value by examining df -ik).
>> 
>> 
>> BTW I just noticed that when I use larger values for the backup superblock, it 
>> reports an error which looks like an overflow:
>> 
>> # fsck_ufs -b 7437746112 /dev/mfid0p1
>> Alternate super block location: -1152188480
>> ** /dev/mfid0p1
>> 
>> CANNOT SEEK BLK: -1152188480
>> CONTINUE? [yn] 
> 
> Well, it seems that the beginning of your volume got obliterated.
> Fsck_ffs cannot convert a random sequence of bytes into a valid FFS
> volume.
> 
> The only other way to try is to restore content of the cylinder groups
> which are farther away from the start.  Create a scratch volume of the
> same size, newfs it with the same parameters.  Then dd from the broken
> volume to the new one, with some offset.  Offset should be large enough
> to not include initial superblock, and if the zero cg is damaged, skip
> it as well.  You should use seek=n skip=n (i.e. the same initial offsets
> both for input and output).


Okay, then it was simpler for me to back up the vital data from this volume and 
run newfs on it (rather than dd 145TB of data).

But fsck_ufs -b still does not work (after a fresh newfs):

# fsck_ufs -b 343748128704 /dev/mfid0p1 
Alternate super block location: 150745024
** /dev/mfid0p1
150745024 is not a file system superblock


343748128704 was taken from the output of the freshly run newfs.
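
Both locations printed by fsck_ufs in this thread match what you get when the -b argument is cut down to a signed 32-bit integer. The sketch below only checks that arithmetic; it is an inference from the numbers, not something confirmed against the fsck_ffs source, and as_int32 is a hypothetical helper name.

```python
def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    low = value & 0xFFFFFFFF
    return low - 0x100000000 if low >= 0x80000000 else low

# -b arguments from the messages in this thread, mapped to the
# "Alternate super block location" that fsck_ufs printed for them.
reported = {
    7437746112: -1152188480,
    343748128704: 150745024,
}

for requested, printed in reported.items():
    print(requested, "->", as_int32(requested), "(fsck_ufs printed", printed, ")")
```

If the truncation reading is right, any backup superblock location at or above 2^31 would be affected in the same way, which would explain why the value from a fresh newfs is also rejected.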
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: fsck_ufs dumps core

2016-08-10 Thread Dmitry Sivachenko


> On 10 Aug 2016, at 17:55, Konstantin Belousov  wrote:
> 
> On Wed, Aug 10, 2016 at 05:29:31PM +0300, Dmitry Sivachenko wrote:
>> Hello,
>> 
>> I am running FreeBSD 10.3-STABLE #0 r299261M
>> 
>> After unclean reboot I am unable to fsck my UFS filesystem:
>> 
>> # fsck  /dev/mfid0p1
>> ** /dev/mfid0p1
>> ** Last Mounted on /opt
>> ** Phase 1 - Check Blocks and Sizes
>> fsck: /dev/mfid0p1: Segmentation fault
>> 
>> pid 482 (fsck_ufs), uid 0: exited on signal 11 (core dumped)
>> 
>> # gdb -c fsck_ufs.482 /sbin/fsck_ufs 
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>> Core was generated by `fsck_ufs'.
>> Program terminated with signal 11, Segmentation fault.
>> Reading symbols from /lib/libufs.so.6...done.
>> Loaded symbols for /lib/libufs.so.6
>> Reading symbols from /lib/libc.so.7...done.
>> Loaded symbols for /lib/libc.so.7
>> Reading symbols from /libexec/ld-elf.so.1...done.
>> Loaded symbols for /libexec/ld-elf.so.1
>> #0  0x00409a8b in pass1 () at /place/WRK/src/sbin/fsck_ffs/pass1.c:83
>> 83  setbmap(i);
>> (gdb) bt
>> #0  0x00409a8b in pass1 () at /place/WRK/src/sbin/fsck_ffs/pass1.c:83
>> #1  0x00409050 in main (argc=, 
>>argv=) at /place/WRK/src/sbin/fsck_ffs/main.c:447
>> Current language:  auto; currently minimal
>> (gdb) 
>> 
> 
> Try to use alternative superblock (-b switch).  You can get the list of
> the possible values for -b by 'newfs -N' invocation, but you have to know
> the parameters which were used for formatting.


Yes, I tried several different backup superblocks, with the same result.  (I 
created this FS a few years ago so I can't be 100% sure about the parameters, but 
I usually only use a larger -i NN for big filesystems, and I can guess the exact 
value by examining df -ik).


BTW I just noticed that when I use larger values for the backup superblock, it 
reports an error which looks like an overflow:

# fsck_ufs -b 7437746112 /dev/mfid0p1
Alternate super block location: -1152188480
** /dev/mfid0p1

CANNOT SEEK BLK: -1152188480
CONTINUE? [yn] 

fsck_ufs dumps core

2016-08-10 Thread Dmitry Sivachenko

Hello,

I am running FreeBSD 10.3-STABLE #0 r299261M

After unclean reboot I am unable to fsck my UFS filesystem:

# fsck  /dev/mfid0p1
** /dev/mfid0p1
** Last Mounted on /opt
** Phase 1 - Check Blocks and Sizes
fsck: /dev/mfid0p1: Segmentation fault

pid 482 (fsck_ufs), uid 0: exited on signal 11 (core dumped)

# gdb -c fsck_ufs.482 /sbin/fsck_ufs 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Core was generated by `fsck_ufs'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libufs.so.6...done.
Loaded symbols for /lib/libufs.so.6
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x00409a8b in pass1 () at /place/WRK/src/sbin/fsck_ffs/pass1.c:83
83  setbmap(i);
(gdb) bt
#0  0x00409a8b in pass1 () at /place/WRK/src/sbin/fsck_ffs/pass1.c:83
#1  0x00409050 in main (argc=, 
argv=) at /place/WRK/src/sbin/fsck_ffs/main.c:447
Current language:  auto; currently minimal
(gdb) 




Re: Failed to write core file (error 14)

2016-05-19 Thread Dmitry Sivachenko


> On 19 May 2016, at 19:42, Erik  wrote:
> 
> 
> On 05/19/2016 06:29 PM, Dmitry Sivachenko wrote:
>> 
>> 
>> gdb does not show stack:
>> 
>> (gdb) bt
>> #0  0x000800bffb9b in ?? ()
>> Cannot access memory at address 0x7fffd588
>> 
>> It started several months ago after an OS update to a fresh 10/stable (but I do 
>> not remember the details: which version was installed before and from which 
>> version it started).
>> 
>> Does anyone observe something similar?
> 
> 
> This sounds like:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204426
> and
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764
> 
> 
> This problem exists since 10.2.
> 10.1 is fine.



Oh, yes, thanks, somehow Google missed that for me.

Failed to write core file (error 14)

2016-05-19 Thread Dmitry Sivachenko

Hello,

On our 10-stable boxes sometimes processes crash with the following errors:

Failed to write core file for process check_ssh (error 14)
pid 81441 (check_ssh), uid 181: exited on signal 11
Failed to write core file for process nagios (error 14)
pid 30255 (nagios), uid 181: exited on signal 11
Failed to write core file for process sh (error 14)
pid 59267 (sh), uid 181: exited on signal 11
Failed to write core file for process nagios (error 14)
pid 99102 (nagios), uid 181: exited on signal 11


gdb does not show stack:

(gdb) bt
#0  0x000800bffb9b in ?? ()
Cannot access memory at address 0x7fffd588

It started several months ago after an OS update to a fresh 10/stable (but I do not 
remember the details: which version was installed before and from which version it 
started).

Does anyone observe something similar?

Thanks.
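
Assuming the number in "Failed to write core file ... (error 14)" is an ordinary errno value, it can be decoded with a couple of lines. This is only a lookup sketch; it does not explain why the kernel hit that error while writing the core.

```python
import errno
import os

# The number from "Failed to write core file for process ... (error 14)".
code = 14

# errno.errorcode maps the number to its symbolic name on this platform;
# os.strerror gives the human-readable description.
print(errno.errorcode[code], "-", os.strerror(code))
```

On FreeBSD and Linux alike, 14 is EFAULT ("Bad address").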

Re: nfs_getpages: error 4

2016-03-09 Thread Dmitry Sivachenko


> On 05 Mar 2016, at 23:10, Dmitry Sivachenko  wrote:
> 
> 
>> On 05 Mar 2016, at 21:35, Konstantin Belousov  wrote:
>> 
>> But I suspect that you do have enough free or reclaimable pages for OOM
>> to not trigger, e.g. because you demonstrated commands output from the
>> live system after the situation occurred.  It more likely was a temporary
>> free page shortage, after which the system recovered.
>> 
>> I more believe in a bug in the handling of killed process in vm_fault().
>> Could you get the p_flag value for the hung process ?  Like
>>  ps -o flags 
> 
> 
> Unfortunately I already rebooted this machine because our developers needed 
> it and processes did not stop after kill -9.
> 
> When this repeats, I will try to keep this server up for longer time and 
> provide any necessary information.


So far I got the same error:

Mar  8 07:13:08 skazka4 kernel: nfs_getpages: error 4
Mar  8 07:13:08 skazka4 kernel: vm_fault: pager read error, pid 58483 (decodcmd)

But the process in question finished successfully.

Re: nfs_getpages: error 4

2016-03-05 Thread Dmitry Sivachenko

> On 05 Mar 2016, at 21:35, Konstantin Belousov  wrote:
> 
> But I suspect that you do have enough free or reclaimable pages for OOM
> to not trigger, e.g. because you demonstrated commands output from the
> live system after the situation occurred.  It more likely was a temporary
> free page shortage, after which the system recovered.
> 
> I more believe in a bug in the handling of killed process in vm_fault().
> Could you get the p_flag value for the hung process ?  Like
>   ps -o flags 

Unfortunately I already rebooted this machine because our developers needed it 
and processes did not stop after kill -9.

When this repeats, I will try to keep this server up for longer time and 
provide any necessary information.

Re: nfs_getpages: error 4

2016-03-05 Thread Dmitry Sivachenko


> On 05 Mar 2016, at 19:27, Konstantin Belousov  wrote:
> 
> On Sat, Mar 05, 2016 at 05:24:26PM +0300, Dmitry Sivachenko wrote:
>>> 
>>> Again, error 4 is EINTR so you could disable both "soft" and "intr" options 
>>> for test.
>> 
>> 
>> "soft" is meaningless in such setup, because "file system calls will fail 
>> after retrycnt round trip timeout intervals" but "The default is a retry 
>> count of zero, which means to keep retrying forever".
>> 
>> If I understand "intr" correctly, it matters only when server becomes 
>> unresponsive, that is "server is not responding" message should be in my 
>> logs.  But I have no such message.
>> 
>> 
> 
> The intr NFS mount option allows signals to interrupt NFS waits for the
> RPC responses.  This is almost certainly the reason for the EINTR error
> you get from the pager.
> 
> You should at least get the
> vm_fault: pager read error, pid ...
> messages as well.  Is this true ?


That is true, see my initial post.


>  The end result would be SIGSEGV
> delivered to the process.
> 
> OTOH, I do not quite understand why did your threads requesting page-in
> fall into the wait for a free page.  I assume that there is enough free
> pages in the system ?
> 


I have no swap configured, but it is possible that running processes eat all 
RAM (I expect them to be killed with OOM rather than stuck?)

Re: nfs_getpages: error 4

2016-03-05 Thread Dmitry Sivachenko

> On 05 Mar 2016, at 17:01, Eugene Grosbein  wrote:
> 
> 05.03.2016 20:42, Dmitry Sivachenko wrote:
> 
>>> and to discover what version is broken. And show full mount command/option 
>>> set.
>> I already included mount flags from fstab in my original e-mail:
>> 
>> rw,bg,intr,soft
> 
> If those are the only options you use, there is another workaround: add options
> rsize=1024,wsize=1024 to avoid possible packet reassembly/defragmentation
> related bugs.

I wonder how rsize=wsize=1024 will affect performance?  I have a 10 Gbit network 
and I expect to achieve comparable throughput.

> 
> Again, error 4 is EINTR so you could disable both "soft" and "intr" options 
> for test.

"soft" is meaningless in such setup, because "file system calls will fail after 
retrycnt round trip timeout intervals" but "The default is a retry count of 
zero, which means to keep retrying forever".

If I understand "intr" correctly, it matters only when the server becomes 
unresponsive, that is, a "server is not responding" message should be in my logs.  
But I have no such message.

> Anyway, re-read mount_nfs(8) manual page, section BUGS before switch to NFSv4.

That is why I chose to use NFSv3: I thought it was the more mature and stable 
implementation.
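
A rough back-of-the-envelope sketch for the rsize=wsize=1024 question: how many read/write requests per second it would take to fill a 10 Gbit link at a given transfer size. It ignores protocol overhead and request pipelining. The 65536 figure is only an illustrative larger size, not a value taken from this thread, and requests_per_second is a hypothetical helper name.

```python
LINK_BITS_PER_SECOND = 10 * 10**9   # the 10 Gbit network mentioned above
payload_bytes_per_second = LINK_BITS_PER_SECOND // 8

def requests_per_second(transfer_size):
    """Requests per second needed to fill the link at this transfer size."""
    return payload_bytes_per_second / transfer_size

# 1024 is the suggested workaround value; 65536 is an illustrative larger size.
for size in (1024, 65536):
    print(size, "bytes per request:", round(requests_per_second(size)), "requests/s")
```

At 1024 bytes per request the link needs on the order of a million requests per second to stay full, so the workaround looks more useful as a diagnostic than as a permanent setting.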


Re: nfs_getpages: error 4

2016-03-05 Thread Dmitry Sivachenko


> On 05 Mar 2016, at 16:33, Eugene Grosbein  wrote:
> 
> 05.03.2016 19:32, Dmitry Sivachenko wrote:
> 
>>>> I am running a number of machines with /home mounted via nfs (FreeBSD 
>>>> 10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).
>>>> 
>>>> Sometimes I get the following messages in syslog:
>>>> 
>>>> nfs_getpages: error 4
>>>> vm_fault: pager read error, pid NNN (myprog)
>>>> 
>>>> After that I see a lot of processes stuck in "pfault" state (these are 
>>>> computational processes which use some files from the NFS mount); they use 0% 
>>>> of CPU after that.
>>>> 
>>>> On NFS server machine I see nothing strange in logs.  procstat -kk for 
>>>> such stuck processes shows:
>>>>   PID    TID COMM   TDNAME KSTACK
>>>> 85274 102056 myprog -      mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 trap_pfault+0x180 trap+0x52c calltrap+0x8
>>>> 
>>>> 
>>>> What can be the reason of this?
>>> 
>>> For example, if some processes running on NFS server box modify some files 
>>> "in-place"
>>> and these files are opened by processes running on NFS client, that could 
>>> be the reason.
>>> If so, change this so processes updating such files create new temporary 
>>> versions of them first
>>> and then rename them atomically.
>>> 
>> 
>> This should not be the case: users are working only on NFS clients.
>> Moreover, the nature of the computations is such that each process uses its own 
>> set of files.
>> 
>> (Forgot to mention in my previous e-mail that these processes can't be 
>> stopped even with kill -9)
> 
> Make sure you use TCP mounts and TSO is disabled.


I do use TCP mount (this is the default).  I will try to disable TSO.


> Try switching between NFSv3/NFSv4 to avoid this bug

As far as I understand, the default is NFSv3 (which should be more stable?).

I can try to switch to NFSv4.


> and to discover what version is broken. And show full mount command/option 
> set.


I already included mount flags from fstab in my original e-mail:

rw,bg,intr,soft


Re: nfs_getpages: error 4

2016-03-05 Thread Dmitry Sivachenko


> On 05 Mar 2016, at 15:13, Eugene Grosbein  wrote:
> 
> 05.03.2016 18:21, Dmitry Sivachenko wrote:
>> Hello,
>> 
>> I am running a number of machines with /home mounted via nfs (FreeBSD 
>> 10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).
>> 
>> Sometimes I get the following messages in syslog:
>> 
>> nfs_getpages: error 4
>> vm_fault: pager read error, pid NNN (myprog)
>> 
>> After that I see a lot of processes stuck in "pfault" state (these are 
>> computational processes which use some files from the NFS mount); they use 0% of 
>> CPU after that.
>> 
>> On NFS server machine I see nothing strange in logs.  procstat -kk for such 
>> stuck processes shows:
>>   PID    TID COMM   TDNAME KSTACK
>> 85274 102056 myprog -      mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 trap_pfault+0x180 trap+0x52c calltrap+0x8
>> 
>> 
>> What can be the reason of this?
> 
> For example, if some processes running on NFS server box modify some files 
> "in-place"
> and these files are opened by processes running on NFS client, that could be 
> the reason.
> If so, change this so processes updating such files create new temporary 
> versions of them first
> and then rename them atomically.
> 

This should not be the case: users are working only on NFS clients.
Moreover, the nature of the computations is such that each process uses its own set 
of files.

(Forgot to mention in my previous e-mail that these processes can't be stopped 
even with kill -9)


nfs_getpages: error 4

2016-03-05 Thread Dmitry Sivachenko

Hello,

I am running a number of machines with /home mounted via nfs (FreeBSD 
10.3-PRERELEASE #0 r294799, rw,bg,intr,soft).

Sometimes I get the following messages in syslog:

nfs_getpages: error 4
vm_fault: pager read error, pid NNN (myprog)

After that I see a lot of processes stuck in "pfault" state (these are 
computational processes which use some files from the NFS mount); they use 0% of 
CPU after that.

On NFS server machine I see nothing strange in logs.  procstat -kk for such 
stuck processes shows:
  PID    TID COMM   TDNAME KSTACK
85274 102056 myprog -      mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vm_waitpfault+0x8a vm_fault_hold+0xdd0 vm_fault+0x77 trap_pfault+0x180 trap+0x52c calltrap+0x8


What can be the reason of this?

Thanks.

Re: Regression on 10/STABLE (Was: SOL_TCP def?)

2016-01-02 Thread Dmitry Sivachenko


> On 29 Dec 2015, at 23:48, Jonathan Chen  wrote:
> 
> On 30 December 2015 at 07:50, Dmitry Sivachenko  wrote:
>>> Patch gsoap to use IPPROTO_IP instead of SOL_TCP.
>> I meant IPPROTO_TCP, sorry.
> 
> Thanks for the quick-fix. However, IMHO this should be classed as a
> regression on 10/STABLE.
> 

I think it is non-portable code in gsoap, which was revealed by the introduction of 
TCP_FASTOPEN to 10/stable.


Re: SOL_TCP def?

2015-12-29 Thread Dmitry Sivachenko


> On 29 Dec 2015, at 21:49, Dmitry Sivachenko  wrote:
> 
> 
>> On 29 Dec 2015, at 21:38, Jonathan Chen  wrote:
>> 
>> On 30 December 2015 at 07:28, Jonathan Chen  wrote:
>> [...]
>>> devel/gsoap will build on older version. My installed devel/gsoap was
>>> last installed on 6-Dec-2015.
>> 
>> Rephrasing: devel/gsoap will build on an older snapshot of 10/STABLE.
>> My currently installed devel/gsoap was last installed on 6-Dec-2015.
>> -- 
> 
> 
> Patch gsoap to use IPPROTO_IP instead of SOL_TCP.
> 


I meant IPPROTO_TCP, sorry.
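
A minimal sketch of the portable form, in Python rather than gsoap's own C++: the "level" argument for a TCP option is IPPROTO_TCP, which exists on FreeBSD and Linux alike, whereas SOL_TCP is a Linux-specific alias. TCP_NODELAY is used here only as an illustrative option; the actual gsoap code is not shown in this thread.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # IPPROTO_TCP is the portable "level" for TCP-level socket options.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    print("TCP_NODELAY enabled:", bool(nodelay))
finally:
    sock.close()
```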

Re: SOL_TCP def?

2015-12-29 Thread Dmitry Sivachenko


> On 29 Dec 2015, at 21:38, Jonathan Chen  wrote:
> 
> On 30 December 2015 at 07:28, Jonathan Chen  wrote:
> [...]
>> devel/gsoap will build on older version. My installed devel/gsoap was
>> last installed on 6-Dec-2015.
> 
> Rephrasing: devel/gsoap will build on an older snapshot of 10/STABLE.
> My currently installed devel/gsoap was last installed on 6-Dec-2015.
> -- 


Patch gsoap to use IPPROTO_IP instead of SOL_TCP.


Re: process scheduling and cpuset

2015-09-14 Thread Dmitry Sivachenko


> On 13 Sep 2015, at 19:40, Slawa Olhovchenkov  wrote:
> 
> On Sun, Sep 13, 2015 at 04:44:40PM +0300, Dmitry Sivachenko wrote:
> 
>> 
>>> On 13 Sep 2015, at 16:09, Slawa Olhovchenkov  wrote:
>>> 
>>> On Sun, Sep 13, 2015 at 02:52:08PM +0300, Dmitry Sivachenko wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I have 32 processor machine (2x CPU E5-2650) running several CPU-bound 
>>>> processes (ULE scheduler).
>>>> 3 processes are 32-threaded, and 8 are single threaded.
>>>> 
>>>> I bind all 3 32-threaded processes to CPUs 0-24 (cpuset -C -l 0-24 -p XXX).
>>>> 
>>>> I expect that the remaining 8 single-threaded processes will (mostly) run 
>>>> on the remaining 25-31 CPU cores and use (almost) 100% cpu each.
>>>> 
>>>> But this is not the case (according to top(1)):  they spend a lot of time 
>>>> on 0-24 CPUs and CPU Idle time is about 10%.
>>>> 
>>>> These are all purely computational programs, in idle system 
>>>> single-threaded programs steadily consume 100% of a core, and 32-threaded 
>>>> programs consume all 32 cores and idle time is zero.
>>>> 
>>>> Is it an ULE scheduler feature or am I doing something wrong?
>>>> 
>>>> The goal is to give a single-threaded program a chance to run when 
>>>> somebody started several 32-threaded processes.
>>> 
>>> You don't have a 32-processor machine, you have only a 16-processor
>>> machine.
>>> SMT/hyperthreading doesn't give you a real processor; an SMT "CPU" has
>>> unpredictable power and its load depends on the load of the parent CPU.
>>> 
>>> For example, in my case I see the following (simplified) on CPU 0 and 1
>>> (SMT of one real core) with rising load:
>>> 
>>> load 0.1  0.1
>>> load 0.2  0.2
>>> load 0.3  0.3
>>> load 0.4  0.4
>>> load 0.45 0.45
>>> load 0.48 0.48
>>> load 1.00 1.00
>> 
>> 
>> Yes I know about HT.  But how does this explain why I have 10% of CPU idle?
>> 
>> If I explicitly bind my single-threaded processes to the remaining CPU cores 
>> (25-31), they start to receive the expected 100% of CPU and overall idle time 
>> decreases.
>> 
>> I just expect the scheduler to do the same for me.
>> 
> 
> Idle is not the goal; the goal is reducing task execution time.


Thanks for the explanation.

In my example SMT pairs are numbered sequentially, so 0+1 is one SMT group, 
2+3 is the second SMT group, and so on.

So in the 25-31 range there are several real CPU cores which remain idle while 
processes are fighting for the overloaded 0-24.

When I explicitly pin my single-threaded processes to the 25-31 range, they start 
to receive 100% of CPU (and finish faster, to be clear).
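
The core layout described above can be worked out explicitly. This sketch assumes 32 logical CPUs numbered 0-31 with SMT siblings paired sequentially (0+1, 2+3, ...), as stated in the message; core_of is a hypothetical helper name, and nothing here queries the real topology.

```python
LOGICAL_CPUS = 32        # CPUs 0..31, as on the machine described above
THREADS_PER_CORE = 2     # SMT pairs numbered sequentially: 0+1, 2+3, ...

def core_of(cpu):
    """Physical core index of a logical CPU under sequential pair numbering."""
    return cpu // THREADS_PER_CORE

bound = set(range(0, 25))              # cpuset -C -l 0-24
rest = set(range(25, LOGICAL_CPUS))    # the remaining logical CPUs

bound_cores = {core_of(c) for c in bound}
rest_cores = {core_of(c) for c in rest}

shared_cores = sorted(bound_cores & rest_cores)   # cores split between both sets
free_cores = sorted(rest_cores - bound_cores)     # cores untouched by 0-24

print("logical CPUs outside 0-24:", len(rest))
print("cores shared with 0-24:", shared_cores)
print("cores entirely outside 0-24:", free_cores)
```

Under these assumptions the range outside 0-24 holds 7 logical CPUs: three whole cores (13, 14, 15) plus one sibling of core 12, whose other thread belongs to the 0-24 set.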


Re: process scheduling and cpuset

2015-09-13 Thread Dmitry Sivachenko


> On 13 Sep 2015, at 16:09, Slawa Olhovchenkov  wrote:
> 
> On Sun, Sep 13, 2015 at 02:52:08PM +0300, Dmitry Sivachenko wrote:
> 
>> Hello,
>> 
>> I have 32 processor machine (2x CPU E5-2650) running several CPU-bound 
>> processes (ULE scheduler).
>> 3 processes are 32-threaded, and 8 are single threaded.
>> 
>> I bind all 3 32-threaded processes to CPUs 0-24 (cpuset -C -l 0-24 -p XXX).
>> 
>> I expect that the remaining 8 single-threaded processes will (mostly) run on 
>> the remaining 25-31 CPU cores and use (almost) 100% cpu each.
>> 
>> But this is not the case (according to top(1)):  they spend a lot of time on 
>> 0-24 CPUs and CPU Idle time is about 10%.
>> 
>> These are all purely computational programs, in idle system single-threaded 
>> programs steadily consume 100% of a core, and 32-threaded programs consume 
>> all 32 cores and idle time is zero.
>> 
>> Is it an ULE scheduler feature or am I doing something wrong?
>> 
>> The goal is to give a single-threaded program a chance to run when somebody 
>> started several 32-threaded processes.
> 
> You don't have a 32-processor machine, you have only a 16-processor
> machine.
> SMT/hyperthreading doesn't give you a real processor; an SMT "CPU" has
> unpredictable power and its load depends on the load of the parent CPU.
> 
> For example, in my case I see the following (simplified) on CPU 0 and 1
> (SMT of one real core) with rising load:
> 
> load 0.1  0.1
> load 0.2  0.2
> load 0.3  0.3
> load 0.4  0.4
> load 0.45 0.45
> load 0.48 0.48
> load 1.00 1.00


Yes I know about HT.  But how does this explain why I have 10% of CPU idle?

If I explicitly bind my single-threaded processes to the remaining CPU cores 
(25-31), they start to receive the expected 100% of CPU and overall idle time decreases.

I just expect the scheduler to do the same for me.


process scheduling and cpuset

2015-09-13 Thread Dmitry Sivachenko

Hello,

I have a 32-processor machine (2x CPU E5-2650) running several CPU-bound 
processes (ULE scheduler).
3 processes are 32-threaded, and 8 are single-threaded.

I bind all 3 32-threaded processes to CPUs 0-24 (cpuset -C -l 0-24 -p XXX).

I expect that the remaining 8 single-threaded processes will (mostly) run on 
the remaining 25-31 CPU cores and use (almost) 100% cpu each.

But this is not the case (according to top(1)):  they spend a lot of time on 
0-24 CPUs and CPU Idle time is about 10%.

These are all purely computational programs: on an idle system, single-threaded 
programs steadily consume 100% of a core, and 32-threaded programs consume all 
32 cores and idle time is zero.

Is it a ULE scheduler feature or am I doing something wrong?

The goal is to give a single-threaded program a chance to run when somebody 
started several 32-threaded processes.

Thanks!

Re: panic: wm_page_unwire

2015-06-20 Thread Dmitry Sivachenko


> On 20 Jun 2015, at 13:01, Konstantin Belousov  wrote:
> 
> 
> I was able to reproduce something related, this may be very well your
> problem.  Take the attached program.  Select a scratch file on UFS mount
> point, say x.  Run the following commands:
> mlock_modify x&
> dd if=/dev/zero of=x bs=1 count=1
> fg
> ^C <- system might panic at this point, if buffers are in short supply
> dd if=/dev/zero of=x bs=1 count=1 <- at this point, the system must panic


Yes, those are exactly the two cases in which I was able to reproduce the panic, so 
it is apparently my issue.

I tried your patch and I can confirm that it does fix the problem.

Thanks!



Re: panic: wm_page_unwire

2015-06-20 Thread Dmitry Sivachenko

> On 19 Jun 2015, at 22:57, Dmitry Sivachenko  wrote:
> 
> Hello,
> 
> got this panic today on my 10.1-STABLE #0 r279956  box:
> 
> 

Well, I tracked this down a bit.  There is a rather easy way to panic a -stable box 
(mine is r279956), but I can't reproduce it reliably.

It happens when there is a running process which mmap()s and mlock()s some file, and 
while it is running this file is modified on disk
(not rm+mv, but opening the same file, truncating it and writing some other data into it).

After the process exits, the system will panic with high probability.

So far I got 2 cases:

1) run process which mlock()'s a file;  modify that file;  stop process and 
system panics
2) run process which mlock()'s a file;  modify that file;  stop process [no 
panic so far];  modify that file again and system panics.

Panic message is the same: panic: vm_page_unwire: page 's wire count is 
zero
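
The distinction between modifying the file in place and rm+mv can be shown with inode numbers. This is only a sketch of the two update styles, with hypothetical helper names; it does not call mmap() or mlock() and makes no attempt to reproduce the panic.

```python
import os
import tempfile

def rewrite_in_place(path, data):
    """Open the file by name, truncate it and write new data (same inode)."""
    with open(path, "wb") as f:      # "wb" truncates an existing file
        f.write(data)

def replace_by_rename(path, data):
    """Write a new file next to the old one, then rename it over the old name."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)            # the old file's data is not modified

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "x")
    rewrite_in_place(path, b"original")      # first call just creates the file
    inode_before = os.stat(path).st_ino

    rewrite_in_place(path, b"changed in place")
    same_inode_after_truncate = os.stat(path).st_ino == inode_before

    replace_by_rename(path, b"changed by rename")
    same_inode_after_rename = os.stat(path).st_ino == inode_before

print("in place keeps inode:", same_inode_after_truncate)
print("rename keeps inode:", same_inode_after_rename)
```

Only the in-place rewrite touches the object that a running process would still have mapped, which fits the observation that rm+mv does not trigger the problem.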

panic: wm_page_unwire

2015-06-19 Thread Dmitry Sivachenko

Hello,

got this panic today on my 10.1-STABLE #0 r279956  box:


Re: pkg 1.5.0 is out

2015-04-19 Thread Dmitry Sivachenko

> On 14 Apr 2015, at 23:05, Baptiste Daroussin  wrote:
> 
> Final pkg 1.5.0 has been released.
> 

Thanks a lot for working on pkg!

> 
> For pkg 1.6.0 among other things and depending on the time, here is what we do
> plan to work on:
> - 
> 

What I really miss is support for package "profiles": the ability to build 
the same port with different combinations of OPTIONS.
For example:
a minimal nginx version;
an nginx version with the passenger module (for a puppet server);
an nginx version with some other rare options turned on for a custom application.

Right now I achieve this by manually renaming /var/db/ports/*/options files 
and doing some manipulations in /usr/ports/packages/All.
But a framework to handle this automatically would be very useful.

Thanks.


Re: dev.cpu.0.freq disapeared

2015-03-23 Thread Dmitry Sivachenko


> On 23 Mar 2015, at 9:03, Ian Smith  wrote:
> 
> 
> Do you have Enhanced Speedstep (EST), disabled in your BIOS settings?  
> If so, just turn it on.  Then you should also be able to set running 
> frequency to 'MAX performance' or similar there.
> 
> If not disabled, ie you have EST enabled in BIOS, that points to a real 
> issue of EST detection.  And it still seems strange that enabling p4tcc 
> is enough to have cpufreq(4) include OIDs for freq and freq_levels?
> 


Thanks to all who replied.  This is called Intel SpeedStep Tech in that BIOS 
and it was indeed disabled.

I enabled it, and now dmesg shows
est0:  on cpu0
for each CPU, even with hint.p4tcc.0.disabled="1", and dev.cpu.0.freq is back.

Thanks for your help.


Re: dev.cpu.0.freq disapeared

2015-03-22 Thread Dmitry Sivachenko


> On 22 Mar 2015, at 17:11, Ian Smith wrote:
> 
> Dmitry Sivachenko wrote:
>>> On 22 Mar 2015, at 8:53, Peter Jeremy wrote:
>>> On 2015-Mar-22 00:58:55 +0300, Dmitry Sivachenko wrote:
>>>> I have a machine with the following processor:
>>>> CPU: Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz (2400.14-MHz
>>>> K8-class CPU) Origin="GenuineIntel"  Id=0x206c2  Family=0x6 Model=0x2c  
>>>> Stepping=2
>>> ...
>>>> After I upgraded to 10.1-STABLE #0 r279956, this sysctl disappeared.
>>>> % sysctl dev.cpu.0.freq
>>>> sysctl: unknown oid 'dev.cpu.0.freq': No such file or directory
>>>> %
> 
>>> What OIDs do you have?  Does dev.cpu.0 exist?  How about dev.cpu?
>> dev.cpu.0 does exist.
> 
> It could be helpful to show all of:
> 
> % sysctl dev.cpu
> % sysctl dev.est  # if you have that?
> % sysctl -a | grep freq | grep -v time
> 
> both before and after re-enabling p4tcc.

Hello,

With #hint.p4tcc.0.disabled="1"  commented out:

% sysctl dev.cpu
dev.cpu.%parent: 
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.P001
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.coretemp.delta: 67
dev.cpu.0.coretemp.resolution: 1
dev.cpu.0.coretemp.tjmax: 95.0C
dev.cpu.0.coretemp.throttle_log: 0
dev.cpu.0.temperature: 28.0C
dev.cpu.0.freq: 2400
dev.cpu.0.freq_levels: 2400/-1 2100/-1 1800/-1 1500/-1 1200/-1 900/-1 600/-1 300/-1
dev.cpu.0.cx_supported: C1/1/32 C2/3/96 C3/3/128
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 261us
dev.cpu.1.%desc: ACPI CPU
dev.cpu.1.%driver: cpu
dev.cpu.1.%location: handle=\_PR_.P002
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%parent: acpi0
dev.cpu.1.coretemp.delta: 67
dev.cpu.1.coretemp.resolution: 1
dev.cpu.1.coretemp.tjmax: 95.0C
dev.cpu.1.coretemp.throttle_log: 0
dev.cpu.1.temperature: 28.0C
dev.cpu.1.cx_supported: C1/1/32 C2/3/96 C3/3/128
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_usage: 100.00% 0.00% 0.00% last 71201us
dev.cpu.2.%desc: ACPI CPU
dev.cpu.2.%driver: cpu
dev.cpu.2.%location: handle=\_PR_.P003
dev.cpu.2.%pnpinfo: _HID=none _UID=0
dev.cpu.2.%parent: acpi0
dev.cpu.2.coretemp.delta: 62
dev.cpu.2.coretemp.resolution: 1
dev.cpu.2.coretemp.tjmax: 95.0C
dev.cpu.2.coretemp.throttle_log: 0
dev.cpu.2.temperature: 33.0C
dev.cpu.2.cx_supported: C1/1/32 C2/3/96 C3/3/128
dev.cpu.2.cx_lowest: C1
dev.cpu.2.cx_usage: 100.00% 0.00% 0.00% last 124614us
dev.cpu.3.%desc: ACPI CPU
dev.cpu.3.%driver: cpu
dev.cpu.3.%location: handle=\_PR_.P004
dev.cpu.3.%pnpinfo: _HID=none _UID=0
dev.cpu.3.%parent: acpi0
dev.cpu.3.coretemp.delta: 62
dev.cpu.3.coretemp.resolution: 1
dev.cpu.3.coretemp.tjmax: 95.0C
dev.cpu.3.coretemp.throttle_log: 0
dev.cpu.3.temperature: 33.0C
dev.cpu.3.cx_supported: C1/1/32 C2/3/96 C3/3/128
dev.cpu.3.cx_lowest: C1
dev.cpu.3.cx_usage: 100.00% 0.00% 0.00% last 101864us
dev.cpu.4.%desc: ACPI CPU
dev.cpu.4.%driver: cpu
dev.cpu.4.%location: handle=\_PR_.P005
dev.cpu.4.%pnpinfo: _HID=none _UID=0
dev.cpu.4.%parent: acpi0
dev.cpu.4.coretemp.delta: 62
dev.cpu.4.coretemp.resolution: 1
dev.cpu.4.coretemp.tjmax: 95.0C
dev.cpu.4.coretemp.throttle_log: 0
dev.cpu.4.temperature: 33.0C
dev.cpu.4.cx_supported: C1/1/32 C2/3/96 C3/3/128
dev.cpu.4.cx_lowest: C1
dev.cpu.4.cx_usage: 100.00% 0.00% 0.00% last 127376us
dev.cpu.5.%desc: ACPI CPU
dev.cpu.5.%driver: cpu
dev.cpu.5.%location: handle=\_PR_.P006
dev.cpu.5.%pnpinfo: _HID=none _UID=0
dev.cpu.5.%parent: acpi0
dev.cpu.5.coretemp.delta: 62
dev.cpu.5.coretemp.resolution: 1
dev.cpu.5.coretemp.tjmax: 95.0C
dev.cpu.5.coretemp.throttle_log: 0
dev.cpu.5.temperature: 33.0C
dev.cpu.5.cx_supported: C1/1/32 C2/3/96 C3/3/128
dev.cpu.5.cx_lowest: C1
dev.cpu.5.cx_usage: 100.00% 0.00% 0.00% last 107493us
dev.cpu.6.%desc: ACPI CPU
dev.cpu.6.%driver: cpu
dev.cpu.6.%location: handle=\_PR_.P007
dev.cpu.6.%pnpinfo: _HID=none _UID=0
dev.cpu.6.%parent: acpi0
dev.cpu.6.coretemp.delta: 63
dev.cpu.6.coretemp.resolution: 1
dev.cpu.6.coretemp.tjmax: 95.0C
dev.cpu.6.coretemp.throttle_log: 0
dev.cpu.6.temperature: 32.0C
dev.cpu.6.cx_supported: C1/1/32 C2/3/96 C3/3/128
dev.cpu.6.cx_lowest: C1
dev.cpu.6.cx_usage: 100.00% 0.00% 0.00% last 155573us
dev.cpu.7.%desc: ACPI CPU
dev.cpu.7.%driver: cpu
dev.cpu.7.%location: handle=\_PR_.P008
dev.cpu.7.%pnpinfo: _HID=none _UID=0
dev.cpu.7.%parent: acpi0
dev.cpu.7.coretemp.delta: 63
dev.cpu.7.coretemp.resolution: 1
dev.cpu.7.coretemp.tjmax: 95.0C
dev.cpu.7.coretemp.throttle_log: 0
dev.cpu.7.temperature: 32.0C
dev.cpu.7.cx_supported: C1/1/32 C2/3/96 C3/3/128
dev.cpu.7.cx_lowest: C1
dev.cpu.7.cx_usage: 100.00% 0.00% 0.00% last 32278us
dev.cpu.8.%desc: ACPI CPU
dev.cpu.8.%driver: cpu
dev.cpu.8.%location: handle=\_PR_.P009
dev.cpu.8.%pnpinfo: _HID=none _UID=0
dev.cpu.8.%parent: acpi0
dev.cpu.8.coretemp.delta: 72
dev.cpu.8.coretemp.resolution: 1
dev.cpu.8.co

Re: dev.cpu.0.freq disapeared

2015-03-22 Thread Dmitry Sivachenko

> On 22 Mar 2015, at 8:53, Peter Jeremy wrote:
> 
> On 2015-Mar-22 00:58:55 +0300, Dmitry Sivachenko  wrote:
>> I have a machine with the following processor:
>> 
>> CPU: Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz (2400.14-MHz K8-class 
>> CPU)
>> Origin="GenuineIntel"  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2
> ...
>> After I upgraded to 10.1-STABLE #0 r279956, this sysctl disappeared.
>> % sysctl dev.cpu.0.freq
>> sysctl: unknown oid 'dev.cpu.0.freq': No such file or directory
>> %
> 
> What OIDs do you have?  Does dev.cpu.0 exist?  How about dev.cpu?

dev.cpu.0 does exist.  

I found the problematic change:

Author: nwhitehorn
Date: Sun Jan 11 17:10:07 2015
New Revision: 276986
URL: https://svnweb.freebsd.org/changeset/base/276986

Log:
 MFC r265329:
 Disable ACPI and P4TCC throttling by default, following discussion on
 freebsd-current. These CPU speed control techniques are usually unhelpful
 at best. For now, continue building the relevant code into GENERIC so that
 it can trivially be re-enabled at runtime if anyone wants it.

Modified: stable/10/sys/amd64/conf/GENERIC.hints
==
--- stable/10/sys/amd64/conf/GENERIC.hints  Sun Jan 11 17:00:24 2015
(r276985)
+++ stable/10/sys/amd64/conf/GENERIC.hints  Sun Jan 11 17:10:07 2015
(r276986)
@@ -31,3 +31,5 @@ hint.attimer.0.at="isa"
hint.attimer.0.port="0x40"
hint.attimer.0.irq="0"
hint.wbwd.0.at="isa"
+hint.acpi_throttle.0.disabled="1"
+hint.p4tcc.0.disabled="1"

If I remove hint.p4tcc.0.disabled="1" from device.hints, dev.cpu.0.freq 
reappears.

I use dev.cpu.0.freq to make sure that the processor is running at the expected 
frequency (with some buggy BIOSes or buggy combinations of BIOS options it is 
possible to end up with a machine running at half frequency).
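
A check of this kind could be sketched as below. It is a minimal illustration, assuming the current frequency and the level list have already been read from dev.cpu.0.freq and dev.cpu.0.freq_levels, and that each level has the form "<MHz>/<power>" separated by spaces. The helper names max_freq_level() and running_at_full_speed() are hypothetical.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: return the highest frequency (in MHz) found in a
 * freq_levels string such as "2400/-1 2100/-1 300/-1", or -1 if the
 * string is empty or an entry is malformed.
 */
long
max_freq_level(const char *levels)
{
	const char *p = levels;
	long best = -1;

	while (*p != '\0') {
		char *end;
		long mhz;

		/* Skip the whitespace separating entries. */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		if (*p == '\0')
			break;

		mhz = strtol(p, &end, 10);
		if (end == p || *end != '/')
			return (-1);	/* malformed entry */
		if (mhz > best)
			best = mhz;

		/* Skip the power field that follows the slash. */
		p = end;
		while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
			p++;
	}
	return (best);
}

/* Nonzero if the current frequency equals the highest listed level. */
int
running_at_full_speed(long current_mhz, const char *levels)
{
	long max = max_freq_level(levels);

	return (max > 0 && current_mhz == max);
}
```

A machine stuck at half frequency would then show up as running_at_full_speed() returning 0.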

Does it really hurt to have this sysctl available?  Why was it disabled by 
default?

(I am not discussing  hint.acpi_throttle.0.disabled here, just 
hint.p4tcc.0.disabled).

Thanks.


Re: dev.cpu.0.freq disapeared

2015-03-22 Thread Dmitry Sivachenko

> On 22 Mar 2015, at 3:27, Kevin Oberman wrote:
> 
> 
> # uname -a
> FreeBSD rogue 10.1-STABLE FreeBSD 10.1-STABLE #0 r280293: Fri Mar 20 11:28:08 PDT 2015 root@rogue:/usr/obj/usr/src/sys/GENERIC  amd64
> # sysctl dev.cpu.0.freq
> dev.cpu.0.freq: 2501
> # 
> No idea why it is not working for you. I'm guessing that something is not 
> starting up properly, but I have no idea what.

This problem seems to be processor-specific: I have a lot of E5-2660 machines 
that do not suffer from this issue.

dev.cpu.0.freq disapeared

2015-03-21 Thread Dmitry Sivachenko

Hello!

I have a machine with the following processor:

CPU: Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz (2400.14-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2

When running 10.1-STABLE #5 r276908  I have:
% sysctl dev.cpu.0.freq
dev.cpu.0.freq: 2400
%

After I upgraded to 10.1-STABLE #0 r279956, this sysctl disappeared.
% sysctl dev.cpu.0.freq
sysctl: unknown oid 'dev.cpu.0.freq': No such file or directory
%

I did not change the kernel config file.

What can be the cause of this problem?

Thanks.

Re: 9-STABLE panic on intensive fork

2013-09-01 Thread Dmitry Sivachenko

On 29.08.2013, at 22:45, Konstantin Belousov  wrote:

> On Wed, Aug 28, 2013 at 06:20:29PM +0400, Dmitry Sivachenko wrote:
>> Hello!
>> 
>> I am using very recent FreeBSD-9-STABLE snapshot:
>> 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0 r254986: Wed Aug 28 17:18:57 MSK 
>> 2013
>> 
>> I run uwsgi program (ports/www/uwsgi) on that machine.
>> 
>> When uwsgi starts, it forks pre-configured number of worker processes.
>> If I raise workers parameter high enough (128), I get kernel panic (100% 
>> reproducible):
>> 
>> Fatal trap 12: page fault while in kernel mode
>> 
>> If I compile kernel with KDB enabled, I get the following stack:
>> 
>> pmap_demote_pde_locked()
>> pmap_copy()
>> vmspace_fork()
>> fork1()
>> sys_fork()
>> 
>> I have only remote console for that machine, so I made 2 screenshots:
>> 
>> 1) http://people.freebsd.org/~demon/screen1.jpg
>> Panic screen when kernel has no KDB support compiled in
>> 
>> 2) http://people.freebsd.org/~demon/screen2.jpg
>> Panic screen (2nd part) with the above stack shown.
> Look up the source line for the pmap_demote_pde_locked()+0x471 for your
> kernel.  Dump the core from the panic.

A kernel dump is not generated (although it is configured at boot); there is no 
"Dumping" message on the console.
These screenshots show everything I see on the console.

I investigated this some more:
I have several (14) identically configured machines running exactly the 
same software.
The hardware differs a bit, though.  I tried to analyze the motherboard differences 
but failed to find anything common to the affected machines.

Under the conditions described in my initial e-mail, some of them crash (in exactly 
the same way) and some do not.
I am confident there are no hardware problems: these machines run for months 
without a reboot, and so far this is the only way I have found to crash them.

I updated one of the affected servers to 10-current and I can state that it no longer 
crashes under the same usage scenario.


9-STABLE panic on intensive fork

2013-08-28 Thread Dmitry Sivachenko

Hello!

I am using a very recent FreeBSD-9-STABLE snapshot:
9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0 r254986: Wed Aug 28 17:18:57 MSK 2013

I run the uwsgi program (ports/www/uwsgi) on that machine.

When uwsgi starts, it forks a preconfigured number of worker processes.
If I raise the workers parameter high enough (128), I get a kernel panic (100% 
reproducible):

Fatal trap 12: page fault while in kernel mode

If I compile the kernel with KDB enabled, I get the following stack:

pmap_demote_pde_locked()
pmap_copy()
vmspace_fork()
fork1()
sys_fork()

I have only remote console for that machine, so I made 2 screenshots:

1) http://people.freebsd.org/~demon/screen1.jpg
Panic screen when kernel has no KDB support compiled in

2) http://people.freebsd.org/~demon/screen2.jpg
Panic screen (2nd part) with the above stack shown.

I can provide any additional information if needed.  
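
For reference, the fork pattern described above (a parent forking a fixed number of workers at startup) looks roughly like the sketch below. It is not uwsgi's code, the helper name spawn_workers() is hypothetical, and the children here exit immediately, so this only illustrates the sys_fork() path from the stack and is not expected to reproduce the panic by itself.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork `n` worker processes that exit at once, then
 * reap them.  Returns the number of workers that exited with status 0.
 */
int
spawn_workers(int n)
{
	int i, started = 0, reaped = 0;

	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid == 0)
			_exit(0);	/* child: a real worker would serve requests */
		if (pid > 0)
			started++;
	}
	while (started-- > 0) {
		int status;

		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			reaped++;
	}
	return (reaped);
}
```

With the setting from the report, the call would be spawn_workers(128).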

Thanks!

Re: RELENG_7 if_nve panic

2010-01-26 Thread Dmitry Sivachenko

On Tue, Jan 26, 2010 at 09:49:45AM -0500, John Baldwin wrote:
> On Tuesday 26 January 2010 4:29:05 am Dmitry Sivachenko wrote:
> > Hello!
> > 
> > I recompiled recent RELENG_7 and I get the following panic after
> > trying to kldload if_nve (interesting stack frames are 12, 13, 14 I guess).
> > Previous version of RELENG_7 (compiled in the middle of December)
> > worked fine.  Last few days I was trying to re-cvsup and always get the
> > same panic.  I get FreeBSD sources via cvsup (cvsup5.freebsd.org).
> > 
> > Any suggestions?
> > 
> > Thanks in advance!
> 
> The bug is perhaps in e1000phy in that it expects all callers to have called
> if_initname() before the miibus is probed.  Try this patch:


That patch solves the problem, thanks!


> 
> Index: if_nve.c
> ===
> --- if_nve.c  (revision 202705)
> +++ if_nve.c  (working copy)
> @@ -526,14 +526,6 @@
>   goto fail;
>   }
>  
> - /* Probe device for MII interface to PHY */
> - DEBUGOUT(NVE_DEBUG_INIT, "nve: do mii_phy_probe\n");
> - if (mii_phy_probe(dev, &sc->miibus, nve_ifmedia_upd, nve_ifmedia_sts)) {
> - device_printf(dev, "MII without any phy!\n");
> - error = ENXIO;
> - goto fail;
> - }
> -
>   /* Setup interface parameters */
>   ifp->if_softc = sc;
>   if_initname(ifp, device_get_name(dev), device_get_unit(dev));
> @@ -549,6 +541,14 @@
>   ifp->if_capabilities |= IFCAP_VLAN_MTU;
>   ifp->if_capenable |= IFCAP_VLAN_MTU;
>  
> + /* Probe device for MII interface to PHY */
> + DEBUGOUT(NVE_DEBUG_INIT, "nve: do mii_phy_probe\n");
> + if (mii_phy_probe(dev, &sc->miibus, nve_ifmedia_upd, nve_ifmedia_sts)) {
> + device_printf(dev, "MII without any phy!\n");
> + error = ENXIO;
> + goto fail;
> + }
> +
>   /* Attach to OS's managers. */
>   ether_ifattach(ifp, eaddr);
>  
> 
> -- 
> John Baldwin

Re: RELENG_7 if_nve panic

2010-01-26 Thread Dmitry Sivachenko

On Tue, Jan 26, 2010 at 01:00:29PM +0100, Bartosz Stec wrote:
> On 2010-01-26 10:29, Dmitry Sivachenko wrote:
> > Hello!
> >
> > I recompiled recent RELENG_7 and I get the following panic after
> > trying to kldload if_nve (interesting stack frames are 12, 13, 14 I guess).
> > Previous version of RELENG_7 (compiled in the middle of December)
> > worked fine.  Last few days I was trying to re-cvsup and always get the
> > same panic.  I get FreeBSD sources via cvsup (cvsup5.freebsd.org).
> >
> > Any suggestions?
> >
> >
> As far as I know, the nve driver is based on nvidia binaries (and it's 
> buggy), and that's why it was replaced by the nfe driver as the default for 
> nvidia-based NICs as soon as it was ported from OpenBSD.
> So my suggestion: if you just need the NIC working, use nfe, not nve.
> 

Thanks for reminding me about nfe.

I just tried it and it does work.

I tried nfe sometime in the summer and it did not work on my hardware.
That is why I was sticking to nve.

Now it seems I can switch to nfe.

(but nve is still broken if someone cares).

RELENG_7 if_nve panic

2010-01-26 Thread Dmitry Sivachenko

Hello!

I recompiled recent RELENG_7 and I get the following panic after
trying to kldload if_nve (interesting stack frames are 12, 13, 14 I guess).
Previous version of RELENG_7 (compiled in the middle of December)
worked fine.  Last few days I was trying to re-cvsup and always get the
same panic.  I get FreeBSD sources via cvsup (cvsup5.freebsd.org).

Any suggestions?

Thanks in advance!


nve0:  port 0xc800-0xc807 mem 
0xfe02b000-0xfe02bfff irq 22 at device 20.0 on pci0
nve0: Ethernet address 00:18:f3:f4:73:1c
miibus0:  on nve0
e1000phy0:  PHY 1 on miibus0


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code  = supervisor read data, page not present
instruction pointer = 0x8:0x803259cd
stack pointer   = 0x10:0xff80210ed3e0
frame pointer   = 0x10:0xff80210ed3f0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 845 (kldload)
panic: from debugger
cpuid = 0
KDB: stack backtrace:
Uptime: 33s


(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0x8028b1d8 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:418
#2  0x8028b63c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0x80183567 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:446
#4  0x80183bcf in db_command (last_cmdp=0x806414e8, 
cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:413
#5  0x80183de0 in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:466
#6  0x801859c9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:228
#7  0x802bb235 in kdb_trap (type=12, code=0, tf=0xff80210ed330)
at /usr/src/sys/kern/subr_kdb.c:524
#8  0x8044a3f0 in trap_fatal (frame=0xff80210ed330, eva=Variable 
"eva" is not available.
)
at /usr/src/sys/amd64/amd64/trap.c:772
#9  0x8044a7c4 in trap_pfault (frame=0xff80210ed330, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:693
#10 0x8044b0da in trap (frame=0xff80210ed330)
at /usr/src/sys/amd64/amd64/trap.c:464
#11 0x804335fe in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:218
#12 0x803259cd in strcmp (s1=0x0, s2=0x80496f30 "msk")
at /usr/src/sys/libkern/strcmp.c:45
#13 0x801caa3d in e1000phy_attach (dev=0xff0001532900)
at /usr/src/sys/dev/mii/e1000phy.c:153
#14 0x802b54e9 in device_attach (dev=0xff0001532900)
at device_if.h:178
#15 0x802b6bca in bus_generic_attach (dev=Variable "dev" is not 
available.
)
at /usr/src/sys/kern/subr_bus.c:2923
#16 0x801ce1ee in miibus_attach (dev=0xff00016a6900)
at /usr/src/sys/dev/mii/mii.c:186
#17 0x802b54e9 in device_attach (dev=0xff00016a6900)
at device_if.h:178
#18 0x802b6bca in bus_generic_attach (dev=Variable "dev" is not 
available.
)
at /usr/src/sys/kern/subr_bus.c:2923


Re: RELENG_7_1: bce driver change generating too much interrupts ?

2008-12-03 Thread Dmitry Sivachenko

On Tue, Dec 02, 2008 at 04:44:46PM -0800, Xin LI wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Hi guys,
> 
> I think I got a real fix.
> 


I tried that patch with very recent 7-STABLE.
It does fix the problem for me.


Thanks a lot!



> Cheers,
> - --
> Xin LI <[EMAIL PROTECTED]>http://www.delphij.net/
> FreeBSD - The Power to Serve!
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v2.0.9 (FreeBSD)
> 
> iEYEARECAAYFAkk11n0ACgkQi+vbBBjt66Dy6wCfSl3eLRhj5TVs24Q+8ao5Mcz0
> FNQAoK8KvziiXFoanhSlWv636o+HfYIj
> =AixC
> -END PGP SIGNATURE-

> Index: if_bce.c
> ===
> --- if_bce.c  (revision 185565)
> +++ if_bce.c  (working copy)
> @@ -7030,13 +7030,14 @@
>  
>   /* Was it a link change interrupt? */
>   if ((status_attn_bits & STATUS_ATTN_BITS_LINK_STATE) !=
> - (sc->status_block->status_attn_bits_ack & STATUS_ATTN_BITS_LINK_STATE))
> + (sc->status_block->status_attn_bits_ack & STATUS_ATTN_BITS_LINK_STATE)) {
>   bce_phy_intr(sc);
>  
> - /* Clear any transient status updates during link state change. */
> - REG_WR(sc, BCE_HC_COMMAND,
> - sc->hc_command | BCE_HC_COMMAND_COAL_NOW_WO_INT);
> - REG_RD(sc, BCE_HC_COMMAND);
> + /* Clear any transient status updates during link state change. */
> + REG_WR(sc, BCE_HC_COMMAND,
> + sc->hc_command | BCE_HC_COMMAND_COAL_NOW_WO_INT);
> + REG_RD(sc, BCE_HC_COMMAND);
> + }
>  
>   /* If any other attention is asserted then the chip is toast. */
>   if (((status_attn_bits & ~STATUS_ATTN_BITS_LINK_STATE) !=


SVR4 problem

2001-03-28 Thread Dmitry Sivachenko


Hello!

On my recent 4-STABLE:

# kldload svr4
kldload: can't load svr4: Exec format error

from dmesg:

link_elf: symbol svr4_stream_get undefined


Is it a known problem?  Or what am I doing wrong?

Thank you in advance,
Dima.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message
