Re: input/output error @boot

2017-03-09 Thread Roberto Rodriguez Jr
Hello all,

I am booting from the CURRENT USB img.
For the life of me I cannot comprehend why I cannot see my /boot directory after:
Choose Shell from the bsdinstall welcome screen
mkdir /tmp/mnt
zpool import
zpool import -fR /tmp/mnt mypool
zfs mount -a
ls
mypool var usr tmp

Where is boot, etc, bin, root?

zfs list
mypool/ROOT none

How do I mount my drive?
ZFS on root stripe0
314495 12-CURRENT
amd64 HP 15 Laptop A6-5200

I would love to learn so I can be a better tester.

I could then compile the new loader.efi after svn update. Thanks to all for
this fix. You are awesome! I love this community.

-R
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: input/output error @boot

2017-03-09 Thread Pete Wright



On 3/9/17 8:10 AM, Pete Wright wrote:



On 3/9/17 4:42 AM, Dexuan Cui wrote:

From: owner-freebsd-curr...@freebsd.org [mailto:owner-freebsd-
curr...@freebsd.org] On Behalf Of Pete Wright
Sent: Thursday, March 9, 2017 14:04
To: freebsd-current@freebsd.org
Subject: Re: input/output error @boot
On 3/8/17 10:00 PM, Dexuan Cui wrote:

For now, I suggest we only apply the idea "reduce the size of the
staging area if necessary" to VMs running on Hyper-V, and restore the
old behavior on physical machines, since that has been working for
people for a long time, though it's potentially unsafe.


+1

i'd like to see the old behaviour for physical machines to be restored
as well since this has rendered my drm-next test rig broken :(

-pete


Eventually I committed 314956 for the issue:
https://svnweb.freebsd.org/base?view=revision&revision=314956
The old behaviour for physical machines is restored.

PS, I understand I should usually put the patch on Phabricator for
review before it's committed, but since the issue here is critical, I
committed it directly to unblock people first. Sorry.
Please comment on the patch if you think it needs rework -- I hope
not. :-)



Thank you Dexuan - I will do a build today and reboot when I am home
from work tonight.  FWIW I verified that if I boot my system in
"classic" BIOS mode I am able to load the kernel and go multi-user, so
this is probably the fix for me.



Happy to report that 314956 addresses the issue I was having with UEFI 
not booting.  Thanks for the fix!


-pete

--
Pete Wright
p...@nomadlogic.org
@nomadlogicLA
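
Dexuan's proposal above is a policy split: shrink the loader's staging area only when running on Hyper-V, and keep the old fixed-size behaviour on physical machines. The sketch below is not the code from 314956; it is a hypothetical illustration of that policy in which the halving strategy, the size units, and the names `reserve_staging()` and `reserve_fn` are invented, and the Hyper-V check is simply passed in as a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocator hook: returns true if `size` units could be reserved. */
typedef bool (*reserve_fn)(size_t size);

/*
 * Hypothetical sketch of the policy, not loader.efi code.
 * On Hyper-V, halve the requested staging size until a reservation
 * succeeds or the size drops below min_size.  Elsewhere, try only the
 * old default size once.  Returns the reserved size, or 0 on failure.
 */
static size_t reserve_staging(bool on_hyperv, size_t default_size,
    size_t min_size, reserve_fn reserve)
{
    size_t size = default_size;

    if (!on_hyperv)
        return reserve(size) ? size : 0;    /* old behaviour */

    while (size > 0 && size >= min_size) {
        if (reserve(size))
            return size;
        size /= 2;                          /* "reduce if necessary" */
    }
    return 0;
}
```

With a reservation hook that only succeeds for 1000 units or less, `reserve_staging(true, 4096, 256, hook)` settles on 512, while `reserve_staging(false, 4096, 256, hook)` fails outright, as the old behaviour would.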


Re: LOR on RPi3 r314894

2017-03-09 Thread Benjamin Kaduk
On Thu, Mar 09, 2017 at 07:01:07AM +0100, Gergely Czuczy wrote:
> 
> 
> On 2017. 03. 09. 6:59, Benjamin Kaduk wrote:
> > On Wed, Mar 08, 2017 at 02:06:33PM +0100, Gergely Czuczy wrote:
> >> On 2017. 03. 08. 13:06, Hans Petter Selasky wrote:
> >>> You might check the links on this page to see if your LOR is already
> >>> listed:
> >>>
> >>> https://wiki.freebsd.org/LOR
> >> Thank you, I wasn't aware of this page. It turns out it's already listed:
> >> http://sources.zabbadoz.net/freebsd/lor/238.html
> >>
> >> However, the last-reported date should no longer be 2008-09-30 :)
> > The page is no longer actively maintained, unfortunately.
> I couldn't find it among the PRs, should I open one? Or since it was on 
> this page, is it already known? I have no idea whether it's a false 
> positive, but I'm surely not getting a panic.

I would say hold off on filing a PR unless there is a clear
panic/deadlock/etc.

-Ben


Re: r314708: panic: tdsendsignal: ksi on queue

2017-03-09 Thread Bryan Drewery
On 3/9/2017 4:59 PM, Bryan Drewery wrote:
> On 3/9/2017 3:57 PM, Bryan Drewery wrote:
>> On 3/9/2017 3:47 PM, Bryan Drewery wrote:
>>> On 3/9/2017 3:11 PM, Jilles Tjoelker wrote:
 On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
> Yes, there is a race, apparently, with the child zombie still not 
> finishing
> sending the SIGCHLD to the parent and parent exiting.  The following 
> should
> fix the issue, but I do not think that reproducing the problem is easy.

> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
> index c524fe5df37..ba5ff84e9de 100644
> --- a/sys/kern/kern_exit.c
> +++ b/sys/kern/kern_exit.c
> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>  {
>   struct proc *p, *nq, *q, *t;
>   struct thread *tdt;
> + ksiginfo_t ksi;
>  
>   mtx_assert(&Giant, MA_NOTOWNED);
>   KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>   proc_reparent(q, q->p_reaper);
>   if (q->p_state == PRS_ZOMBIE) {
>   PROC_LOCK(q->p_reaper);
> - pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
> + if (q->p_ksi != NULL) {
> + ksiginfo_init(&ksi);
> + ksiginfo_copy(q->p_ksi, &ksi);
> + }
> + pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
> + NULL ? &ksi : NULL);
>   PROC_UNLOCK(q->p_reaper);
>   }
>   } else {
>>>
>>> I just got something weird with this patch that wasn't happening before:
>>>
>>> /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j
>>> exp-10amd64 -p commit -z test devel/ccache
>>> [poudriere runs and completes with exit status 0]
 time: command terminated abnormally
        28.08 real         9.92 user        10.38 sys
        23464  maximum resident set size
         4996  average shared memory size
           88  average unshared data size
          127  average unshared stack size
       282705  page reclaims
         5623  page faults
            0  swaps
         2673  block input operations
         4836  block output operations
           33  messages sent
            0  messages received
           37  signals received
        11226  voluntary context switches
          780  involuntary context switches
 zsh: alarm  /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j exp-10amd64
>>> exit status: 142 (SIGALRM).
>>>
>>> I don't see time(1) using SIGALRM or proc reaper at all.
>>>
>>> Rerunning it, and trying other simpler test cases, does not produce the
>>> same result.  It may be some race unrelated to this patch, dunno.
>>>
>>
>> I'm consistently getting foreground processes getting the wrong signals
>> now. I'm removing this patch for now.
> 
> It wasn't this patch doing this. Something else is very wrong with
> signal handling right now.
> 

False alarm.  This spurious SIGALRM issue is purely my fault. Ignore
those reports.

>>
>>>

 This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
 should always be the zombie's p_ksi; otherwise, the siginfo may be lost
 if there are too many signals pending for the target process or in the
 system. If the siginfo is lost and the reaper normally passes si_pid to
 waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
 will remain until the reaper terminates.

 Conceptually the siginfo is sent to one process at a time only, so the
 bug is an artifact of the implementation. Perhaps the piece of code
 added in r309886 can be moved or the ksiginfo can be removed from the

Re: r314708: panic: tdsendsignal: ksi on queue

2017-03-09 Thread Bryan Drewery
On 3/9/2017 3:57 PM, Bryan Drewery wrote:
> On 3/9/2017 3:47 PM, Bryan Drewery wrote:
>> On 3/9/2017 3:11 PM, Jilles Tjoelker wrote:
>>> On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
 Yes, there is a race, apparently, with the child zombie still not finishing
 sending the SIGCHLD to the parent and parent exiting.  The following should
 fix the issue, but I do not think that reproducing the problem is easy.
>>>
 diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
 index c524fe5df37..ba5ff84e9de 100644
 --- a/sys/kern/kern_exit.c
 +++ b/sys/kern/kern_exit.c
 @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
  {
struct proc *p, *nq, *q, *t;
struct thread *tdt;
 +  ksiginfo_t ksi;
  
mtx_assert(&Giant, MA_NOTOWNED);
KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
 @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
proc_reparent(q, q->p_reaper);
if (q->p_state == PRS_ZOMBIE) {
PROC_LOCK(q->p_reaper);
 -  pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
 +  if (q->p_ksi != NULL) {
 +  ksiginfo_init(&ksi);
 +  ksiginfo_copy(q->p_ksi, &ksi);
 +  }
 +  pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
 +  NULL ? &ksi : NULL);
PROC_UNLOCK(q->p_reaper);
}
} else {
>>
>> I just got something weird with this patch that wasn't happening before:
>>
>> /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j
>> exp-10amd64 -p commit -z test devel/ccache
>> [poudriere runs and completes with exit status 0]
>>> time: command terminated abnormally
>>>        28.08 real         9.92 user        10.38 sys
>>>        23464  maximum resident set size
>>>         4996  average shared memory size
>>>           88  average unshared data size
>>>          127  average unshared stack size
>>>       282705  page reclaims
>>>         5623  page faults
>>>            0  swaps
>>>         2673  block input operations
>>>         4836  block output operations
>>>           33  messages sent
>>>            0  messages received
>>>           37  signals received
>>>        11226  voluntary context switches
>>>          780  involuntary context switches
>>> zsh: alarm  /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j exp-10amd64
>> exit status: 142 (SIGALRM).
>>
>> I don't see time(1) using SIGALRM or proc reaper at all.
>>
>> Rerunning it, and trying other simpler test cases, does not produce the
>> same result.  It may be some race unrelated to this patch, dunno.
>>
> 
> I'm consistently getting foreground processes getting the wrong signals
> now. I'm removing this patch for now.

It wasn't this patch doing this. Something else is very wrong with
signal handling right now.

> 
>>
>>>
>>> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
>>> should always be the zombie's p_ksi; otherwise, the siginfo may be lost
>>> if there are too many signals pending for the target process or in the
>>> system. If the siginfo is lost and the reaper normally passes si_pid to
>>> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
>>> will remain until the reaper terminates.
>>>
>>> Conceptually the siginfo is sent to one process at a time only, so the
>>> bug is an artifact of the implementation. Perhaps the piece of code
>>> added in r309886 can be moved or the ksiginfo can be removed from the
>>> parent's queue.
>>>
>>> If such a fix is not possible, it may be better to send a bare SIGCHLD
>>> (si_code is SI_KERNEL or 0, depending on how many signals are pending)
>>> in this situation and document that reapers must use 

Re: r314708: panic: tdsendsignal: ksi on queue

2017-03-09 Thread Bryan Drewery
On 3/9/2017 3:47 PM, Bryan Drewery wrote:
> On 3/9/2017 3:11 PM, Jilles Tjoelker wrote:
>> On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
>>> Yes, there is a race, apparently, with the child zombie still not finishing
>>> sending the SIGCHLD to the parent and parent exiting.  The following should
>>> fix the issue, but I do not think that reproducing the problem is easy.
>>
>>> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
>>> index c524fe5df37..ba5ff84e9de 100644
>>> --- a/sys/kern/kern_exit.c
>>> +++ b/sys/kern/kern_exit.c
>>> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>>>  {
>>> struct proc *p, *nq, *q, *t;
>>> struct thread *tdt;
>>> +   ksiginfo_t ksi;
>>>  
>>> mtx_assert(&Giant, MA_NOTOWNED);
>>> KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
>>> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>>> proc_reparent(q, q->p_reaper);
>>> if (q->p_state == PRS_ZOMBIE) {
>>> PROC_LOCK(q->p_reaper);
>>> -   pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
>>> +   if (q->p_ksi != NULL) {
>>> +   ksiginfo_init(&ksi);
>>> +   ksiginfo_copy(q->p_ksi, &ksi);
>>> +   }
>>> +   pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
>>> +   NULL ? &ksi : NULL);
>>> PROC_UNLOCK(q->p_reaper);
>>> }
>>> } else {
> 
> I just got something weird with this patch that wasn't happening before:
> 
> /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j
> exp-10amd64 -p commit -z test devel/ccache
> [poudriere runs and completes with exit status 0]
>> time: command terminated abnormally
>>        28.08 real         9.92 user        10.38 sys
>>        23464  maximum resident set size
>>         4996  average shared memory size
>>           88  average unshared data size
>>          127  average unshared stack size
>>       282705  page reclaims
>>         5623  page faults
>>            0  swaps
>>         2673  block input operations
>>         4836  block output operations
>>           33  messages sent
>>            0  messages received
>>           37  signals received
>>        11226  voluntary context switches
>>          780  involuntary context switches
>> zsh: alarm  /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j exp-10amd64
> exit status: 142 (SIGALRM).
> 
> I don't see time(1) using SIGALRM or proc reaper at all.
> 
> Rerunning it, and trying other simpler test cases, does not produce the
> same result.  It may be some race unrelated to this patch, dunno.
> 

I'm consistently getting foreground processes getting the wrong signals
now. I'm removing this patch for now.

> 
>>
>> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
>> should always be the zombie's p_ksi; otherwise, the siginfo may be lost
>> if there are too many signals pending for the target process or in the
>> system. If the siginfo is lost and the reaper normally passes si_pid to
>> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
>> will remain until the reaper terminates.
>>
>> Conceptually the siginfo is sent to one process at a time only, so the
>> bug is an artifact of the implementation. Perhaps the piece of code
>> added in r309886 can be moved or the ksiginfo can be removed from the
>> parent's queue.
>>
>> If such a fix is not possible, it may be better to send a bare SIGCHLD
>> (si_code is SI_KERNEL or 0, depending on how many signals are pending)
>> in this situation and document that reapers must use WAIT_ANY or P_ALL.
>> (However, compared to the pre-r309886 situation they can still use
>> SIGCHLD to get notified when to call waitpid() or similar.)
>>
> 
> 


-- 
Regards,
Bryan Drewery




Re: r314708: panic: tdsendsignal: ksi on queue

2017-03-09 Thread Bryan Drewery
On 3/9/2017 3:11 PM, Jilles Tjoelker wrote:
> On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
>> Yes, there is a race, apparently, with the child zombie still not finishing
>> sending the SIGCHLD to the parent and parent exiting.  The following should
>> fix the issue, but I do not think that reproducing the problem is easy.
> 
>> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
>> index c524fe5df37..ba5ff84e9de 100644
>> --- a/sys/kern/kern_exit.c
>> +++ b/sys/kern/kern_exit.c
>> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>>  {
>>  struct proc *p, *nq, *q, *t;
>>  struct thread *tdt;
>> +ksiginfo_t ksi;
>>  
>>  mtx_assert(&Giant, MA_NOTOWNED);
>>  KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
>> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>>  proc_reparent(q, q->p_reaper);
>>  if (q->p_state == PRS_ZOMBIE) {
>>  PROC_LOCK(q->p_reaper);
>> -pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
>> +if (q->p_ksi != NULL) {
>> +ksiginfo_init(&ksi);
>> +ksiginfo_copy(q->p_ksi, &ksi);
>> +}
>> +pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
>> +NULL ? &ksi : NULL);
>>  PROC_UNLOCK(q->p_reaper);
>>  }
>>  } else {

I just got something weird with this patch that wasn't happening before:

/usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j
exp-10amd64 -p commit -z test devel/ccache
[poudriere runs and completes with exit status 0]
> time: command terminated abnormally
>        28.08 real         9.92 user        10.38 sys
>        23464  maximum resident set size
>         4996  average shared memory size
>           88  average unshared data size
>          127  average unshared stack size
>       282705  page reclaims
>         5623  page faults
>            0  swaps
>         2673  block input operations
>         4836  block output operations
>           33  messages sent
>            0  messages received
>           37  signals received
>        11226  voluntary context switches
>          780  involuntary context switches
> zsh: alarm  /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j exp-10amd64
exit status: 142 (SIGALRM).

I don't see time(1) using SIGALRM or proc reaper at all.

Rerunning it, and trying other simpler test cases, does not produce the
same result.  It may be some race unrelated to this patch, dunno.


> 
> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
> should always be the zombie's p_ksi; otherwise, the siginfo may be lost
> if there are too many signals pending for the target process or in the
> system. If the siginfo is lost and the reaper normally passes si_pid to
> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
> will remain until the reaper terminates.
> 
> Conceptually the siginfo is sent to one process at a time only, so the
> bug is an artifact of the implementation. Perhaps the piece of code
> added in r309886 can be moved or the ksiginfo can be removed from the
> parent's queue.
> 
> If such a fix is not possible, it may be better to send a bare SIGCHLD
> (si_code is SI_KERNEL or 0, depending on how many signals are pending)
> in this situation and document that reapers must use WAIT_ANY or P_ALL.
> (However, compared to the pre-r309886 situation they can still use
> SIGCHLD to get notified when to call waitpid() or similar.)
> 


-- 
Regards,
Bryan Drewery





Re: r314708: panic: tdsendsignal: ksi on queue

2017-03-09 Thread Jilles Tjoelker
On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
> Yes, there is a race, apparently, with the child zombie still not finishing
> sending the SIGCHLD to the parent and parent exiting.  The following should
> fix the issue, but I do not think that reproducing the problem is easy.

> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
> index c524fe5df37..ba5ff84e9de 100644
> --- a/sys/kern/kern_exit.c
> +++ b/sys/kern/kern_exit.c
> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>  {
>   struct proc *p, *nq, *q, *t;
>   struct thread *tdt;
> + ksiginfo_t ksi;
>  
>   mtx_assert(&Giant, MA_NOTOWNED);
>   KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>   proc_reparent(q, q->p_reaper);
>   if (q->p_state == PRS_ZOMBIE) {
>   PROC_LOCK(q->p_reaper);
> - pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
> + if (q->p_ksi != NULL) {
> + ksiginfo_init(&ksi);
> + ksiginfo_copy(q->p_ksi, &ksi);
> + }
> + pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
> + NULL ? &ksi : NULL);
>   PROC_UNLOCK(q->p_reaper);
>   }
>   } else {

This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
should always be the zombie's p_ksi; otherwise, the siginfo may be lost
if there are too many signals pending for the target process or in the
system. If the siginfo is lost and the reaper normally passes si_pid to
waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
will remain until the reaper terminates.

Conceptually the siginfo is sent to one process at a time only, so the
bug is an artifact of the implementation. Perhaps the piece of code
added in r309886 can be moved or the ksiginfo can be removed from the
parent's queue.

If such a fix is not possible, it may be better to send a bare SIGCHLD
(si_code is SI_KERNEL or 0, depending on how many signals are pending)
in this situation and document that reapers must use WAIT_ANY or P_ALL.
(However, compared to the pre-r309886 situation they can still use
SIGCHLD to get notified when to call waitpid() or similar.)

-- 
Jilles Tjoelker


Re: r314708: panic: tdsendsignal: ksi on queue

2017-03-09 Thread Bryan Drewery
On 3/9/2017 6:46 AM, Konstantin Belousov wrote:
> On Wed, Mar 08, 2017 at 09:00:17PM -0800, Bryan Drewery wrote:
>> I'm on r314708.  I hit ^C while running 'kyua test' in /usr/tests/bin/pwait.
>>
>>> panic: tdsendsignal: ksi on queue
>>> cpuid = 10
>>
>>> #10 kdb_enter (why=0x814488f5 "panic", msg=) at 
>>> /usr/src/sys/kern/subr_kdb.c:444
>>> #11 0x80a577f3 in vpanic (fmt=, 
>>> ap=0xfe35601a3620) at /usr/src/sys/kern/kern_shutdown.c:772
>>> #12 0x80a5764f in _kassert_panic (fatal=1, fmt=0x81448fd7 
>>> "%s: ksi on queue") at /usr/src/sys/kern/kern_shutdown.c:669
>>> #13 0x80a5c843 in tdsendsignal (p=0xf80c39389a80, td=0x0, 
>>> sig=20, ksi=0xf803888a2bd0) at /usr/src/sys/kern/kern_sig.c:2095
>>> #14 0x80a13828 in exit1 (td=, rval=, 
>>> signo=) at /usr/src/sys/kern/kern_exit.c:459
>>> #15 0x80a5b28c in sigexit (td=0xf802f0bee000, sig=9) at 
>>> /usr/src/sys/kern/kern_sig.c:3081
>>> #16 0x80a5b88e in postsig (sig=9) at 
>>> /usr/src/sys/kern/kern_sig.c:2992
>>> #17 0x80a5b56b in kern_sigsuspend (td=0xf802f0bee000, mask=...) 
>>> at /usr/src/sys/kern/kern_sig.c:1515
>>> #18 0x80a5b441 in sys_sigsuspend (td=0xf802f0bee000, 
>>> uap=) at /usr/src/sys/kern/kern_sig.c:1479
>>> #19 0x80ee04da in syscallenter (td=0xf802f0bee000, 
>>> sa=) at 
>>> /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
>>> #20 amd64_syscall (td=0xf802f0bee000, traced=0) at 
>>> /usr/src/sys/amd64/amd64/trap.c:902
>>
>>
>>> (kgdb) frame 18
>>> #18 0x80a5b441 in sys_sigsuspend (td=0xf802f0bee000, 
>>> uap=) at /usr/src/sys/kern/kern_sig.c:1479
>>> 1479return (kern_sigsuspend(td, mask));
>>
>>> (kgdb) p td->td_proc->p_comm
>>> $3 = "timeout", '\000' 
>>
>>> (kgdb) frame 13
>>> #13 0x80a5c843 in tdsendsignal (p=0xf80c39389a80, td=0x0, 
>>> sig=20, ksi=0xf803888a2bd0) at /usr/src/sys/kern/kern_sig.c:2095
>>> 2095KASSERT(ksi == NULL || !KSI_ONQ(ksi), ("%s: ksi on queue", 
>>> __func__));
>>> (kgdb) p *ksi
>>> $4 = {ksi_link = {tqe_next = 0x0, tqe_prev = 0xf80c39389c58}, ksi_info 
>>> = {si_signo = 20, si_errno = 0, si_code = 2, si_pid = 90903, si_uid = 0, 
>>> si_status = 9, si_addr = 0x0, si_value = {
>>>   sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, 
>>> _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, 
>>> _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {
>>> __spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0, ksi_flags = 
>>> 6, ksi_sigq = 0xf80c39389c28}
>>
>>> (kgdb) p *ksi->ksi_sigq
>>> $6 = {sq_signals = {__bits = {524288, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 
>>> 0, 0}}, sq_ptrace = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 
>>> 0xf803888a2bd0, tqh_last = 0xf803888a2bd0},
> 
> Yes, there is a race, apparently, with the child zombie still not finishing
> sending the SIGCHLD to the parent and parent exiting.  The following should
> fix the issue, but I do not think that reproducing the problem is easy.
> 

Thanks, I've applied it locally.

> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
> index c524fe5df37..ba5ff84e9de 100644
> --- a/sys/kern/kern_exit.c
> +++ b/sys/kern/kern_exit.c
> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>  {
>   struct proc *p, *nq, *q, *t;
>   struct thread *tdt;
> + ksiginfo_t ksi;
>  
>   mtx_assert(&Giant, MA_NOTOWNED);
>   KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>   proc_reparent(q, q->p_reaper);
>   if (q->p_state == PRS_ZOMBIE) {
>   PROC_LOCK(q->p_reaper);
> - pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
> + if (q->p_ksi != NULL) {
> + ksiginfo_init(&ksi);
> + ksiginfo_copy(q->p_ksi, &ksi);
> + }
> + pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
> + NULL ? &ksi : NULL);
>   PROC_UNLOCK(q->p_reaper);
>   }
>   } else {
> 


-- 
Regards,
Bryan Drewery
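
The kgdb output above shows the assertion firing because the SIGCHLD ksiginfo handed to tdsendsignal() was still linked into a signal queue (KSI_ONQ). Kostik's patch therefore sends a fresh copy instead of the zombie's own p_ksi. The userland model below only illustrates why an entry that is already queued must not be queued a second time, and why a clean copy can be; `struct model_ksi`, `struct model_sigqueue` and the functions are hypothetical stand-ins, not kernel types. It does not model the issue Jilles raised, namely that the copy is not preallocated and can be lost when too many signals are pending.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_sigqueue;

/* Hypothetical stand-in for ksiginfo_t: a payload plus queue linkage. */
struct model_ksi {
    int signo;
    int pid;
    struct model_ksi *next;
    struct model_sigqueue *owner;   /* non-NULL while queued, like KSI_ONQ() */
};

struct model_sigqueue {
    struct model_ksi *head;
    struct model_ksi **tailp;
};

static void model_sigqueue_init(struct model_sigqueue *q)
{
    q->head = NULL;
    q->tailp = &q->head;
}

/*
 * Queue ksi on q.  Refuse an entry that is already on some queue:
 * relinking it would corrupt the first queue, which is what the
 * KASSERT in tdsendsignal() guards against.
 */
static bool model_enqueue(struct model_sigqueue *q, struct model_ksi *ksi)
{
    if (ksi->owner != NULL)
        return false;
    ksi->next = NULL;
    *q->tailp = ksi;
    q->tailp = &ksi->next;
    ksi->owner = q;
    return true;
}

/* Copy the payload only and start with clean linkage, similar in
 * spirit to ksiginfo_init() followed by ksiginfo_copy(). */
static void model_copy(const struct model_ksi *src, struct model_ksi *dst)
{
    memset(dst, 0, sizeof(*dst));
    dst->signo = src->signo;
    dst->pid = src->pid;
}
```

Using the si_signo (20) and si_pid (90903) values from the dump, queueing the same entry on the reaper's queue is refused while it sits on the parent's queue, but a copy made with `model_copy()` can be queued there without disturbing the parent's list.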





r314708: panic: Assertion err == 0 failed at /usr/src/sys/net/iflib.c:2241

2017-03-09 Thread Bryan Drewery
This came up at shutdown in r314708. I don't yet know if I will have a
core to diagnose.

> panic: Assertion err == 0 failed at /usr/src/sys/net/iflib.c:2241
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe349a7f9940
> vpanic() at vpanic+0x186/frame 0xfe349a7f99c0
> _kassert_panic() at _kassert_panic+0x12f/frame 0xfe349a7f9a40
> _task_fn_rx() at _task_fn_rx+0x19d/frame 0xfe349a7f9b20
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x139/frame 
> 0xfe349a7f9b80
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 
> 0xfe349a7f9bb0
> fork_exit() at fork_exit+0x84/frame 0xfe349a7f9bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe349a7f9bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 0 tid 100038 ]
> Stopped at  kdb_enter+0x3b: movq $0,kdb_why

-- 
Regards,
Bryan Drewery





Re: hang in dlclose() on rtld lock (powerpc64)

2017-03-09 Thread Justin Hibbits
On Thursday, March 9, 2017, Konstantin Belousov  wrote:

> On Thu, Mar 09, 2017 at 09:59:00AM -0600, Justin Hibbits wrote:
> > When building ports in poudriere, I see gdk-pixbuf-query-modules and
> > gio-querymodules hanging on r314676, but working in r305820.  I took a
> > backtrace on both in gdb, and see the following (identical between both):
> >
> > Program received signal SIGINT, Interrupt.
> > 0x506831d8 in .__sys.umtx_op () from /lib/libc.so.7
> > (gdb) bt
> > #0  0x506831d8 in .__sys.umtx_op () from /lib/libc.so.7
> > #1  0x50588010 in _umtx_op_err (obj=0x4, op=13, val=0, uaddr=0x0,
> > uaddr2=0x0)
> > at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_umtx.c:37
> > #2  0x505881b8 in __thr_rwlock_wrlock (rwlock=,
> > tsp=)
> > at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_umtx.c:325
> > #3  0x505965f0 in _thr_rwlock_wrlock (tsp=,
> > rwlock=)
> > at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_umtx.h:239
> > #4  _thr_rtld_wlock_acquire (lock=0x505bdd00)
> > at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_rtld.c:141
> > #5  0x50026bf4 in wlock_acquire (lock=0x5004cf20 ,
> > lockstate=0xcab0)
> > at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld_lock.c:222
> > #6  0x50022b1c in dlclose (handle=0x51d62000)
> > at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:3021
> > #7  0x50022c90 in free_needed_filtees (n=0x509f7420)
> > at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:2113
> > #8  0x50022d18 in unload_filtees (obj=0x509fa800)
> > at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:2129
> > #9  0x50022e54 in unload_object (root=)
> > at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:4464
> > #10 0x50022c20 in dlclose (handle=0x50054000)
> > at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:3044
> > ---Type <return> to continue, or q <return> to quit---q
> >
> >
> > This happens on powerpc64.  I haven't tested on powerpc or any other arch.
>
> Please test the following patch.  It avoids recursing on the bind lock.
>
> diff --git a/libexec/rtld-elf/rtld.c b/libexec/rtld-elf/rtld.c
> index a7c61b2d13f..880cf100c45 100644
> --- a/libexec/rtld-elf/rtld.c
> +++ b/libexec/rtld-elf/rtld.c
> @@ -77,6 +77,7 @@ static void digest_dynamic2(Obj_Entry *, const Elf_Dyn *, const Elf_Dyn *,
>  static void digest_dynamic(Obj_Entry *, int);
>  static Obj_Entry *digest_phdr(const Elf_Phdr *, int, caddr_t, const char *);
>  static Obj_Entry *dlcheck(void *);
> +static int dlclose_locked(void *, RtldLockState *);
>  static Obj_Entry *dlopen_object(const char *name, int fd, Obj_Entry *refobj,
>  int lo_flags, int mode, RtldLockState *lockstate);
>  static Obj_Entry *do_load_object(int, const char *, char *, struct stat *, int);
> @@ -98,7 +99,7 @@ static void initlist_add_objects(Obj_Entry *, Obj_Entry *, Objlist *);
>  static void linkmap_add(Obj_Entry *);
>  static void linkmap_delete(Obj_Entry *);
>  static void load_filtees(Obj_Entry *, int flags, RtldLockState *);
> -static void unload_filtees(Obj_Entry *);
> +static void unload_filtees(Obj_Entry *, RtldLockState *);
>  static int load_needed_objects(Obj_Entry *, int);
>  static int load_preload_objects(void);
>  static Obj_Entry *load_object(const char *, int fd, const Obj_Entry *, int);
> @@ -142,7 +143,7 @@ static int symlook_obj1_sysv(SymLook *, const Obj_Entry *);
>  static int symlook_obj1_gnu(SymLook *, const Obj_Entry *);
>  static void trace_loaded_objects(Obj_Entry *);
>  static void unlink_object(Obj_Entry *);
> -static void unload_object(Obj_Entry *);
> +static void unload_object(Obj_Entry *, RtldLockState *lockstate);
>  static void unref_dag(Obj_Entry *);
>  static void ref_dag(Obj_Entry *);
>  static char *origin_subst_one(Obj_Entry *, char *, const char *,
> @@ -2104,13 +2105,13 @@ initlist_add_objects(Obj_Entry *obj, Obj_Entry *tail, Objlist *list)
>  #endif
>
>  static void
> -free_needed_filtees(Needed_Entry *n)
> +free_needed_filtees(Needed_Entry *n, RtldLockState *lockstate)
>  {
>  Needed_Entry *needed, *needed1;
>
>  for (needed = n; needed != NULL; needed = needed->next) {
> if (needed->obj != NULL) {
> -   dlclose(needed->obj);
> +   dlclose_locked(needed->obj, lockstate);
> needed->obj = NULL;
> }
>  }
> @@ -2121,14 +2122,14 @@ free_needed_filtees(Needed_Entry *n)
>  }
>
>  static void
> -unload_filtees(Obj_Entry *obj)
> +unload_filtees(Obj_Entry *obj, RtldLockState *lockstate)
>  {
>
> -free_needed_filtees(obj->needed_filtees);
> -obj->needed_filtees = NULL;
> -free_needed_filtees(obj->needed_aux_filtees);
> -obj->needed_aux_filtees = NULL;
> -obj->filtees_loaded = false;
> +   free_needed_filtees(obj->needed_filtees, lockstate);
> +   obj->needed_filtees = NULL;
> +   

Re: process killed: text file modification

2017-03-09 Thread Gergely Czuczy



On 2017. 03. 09. 20:47, Gergely Czuczy wrote:



On 2017. 03. 09. 19:44, John Baldwin wrote:

On Thursday, March 09, 2017 03:31:56 PM Gergely Czuczy wrote:

[+freebsd-fs]


On 2017. 03. 09. 14:20, Gergely Czuczy wrote:

On 2017. 03. 09. 11:27, Gergely Czuczy wrote:

Hello,

I'm trying to build a few things from ports on an rpi3; the ports
collection is mounted over NFS from another machine. When it tries
to build pkg, I get this error message in syslog:

rpi3 kernel: pid 4451 (sh), uid 0, was killed: text file modification

The report to pkg@:
https://lists.freebsd.org/pipermail/freebsd-pkg/2017-March/002048.html 



In ports-mgmt/pkg's config.log, it fails at the following entry:
configure:3726: checking whether we are cross compiling
configure:3734: cc -o conftest -O2 -pipe  -Wno-error
-fno-strict-aliasing   conftest.c  >&5
configure:3738: $? = 0
configure:3745: ./conftest
configure:3749: $? = 137
configure:3756: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.0':

configure:3760: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details

# uname -a
FreeBSD rpi3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314949: Thu Mar 9
08:58:46 CET 2017
ae...@marvin.harmless.hu:/tank/rpi3/crochet/work/obj/arm64.aarch64/tank/rpi3/src/sys/AEGIR 


arm64

So far, a few additions:
  - Time is synced between the NFS server and the client.
  - It's an open() call that triggers the kill, and what gets killed is
    not the file being opened but the process executing it.
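Here's a simple program that reproduces it:
#include <stdio.h>

int main(void) {
   /* The kill happens during this open; it is the process running the
    * binary that gets killed, not the file being opened. */
   FILE *f = fopen("/bar", "w");

   if (f == NULL) {
      perror("fopen");
      return 1;
   }
   fclose(f);
   return 0;
}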

Conditions to reproduce it:
  - The resulting binary must be executed from the NFS mount.
  - The binary must be built after mounting the NFS share.

I haven't tried building it on a different host, as I don't have access
to multiple RPis. Also, if I build the binary, then unmount and remount the
NFS mount point holding the binary and execute it, it works.

I've also tried this with the raspbsd.org image, and I could reproduce
it there as well.

Another interesting thing: when I first booted the RPi, the NFS
server was running 10.2-STABLE; it was later updated to 11-STABLE. While it
was on 10.2 I tried to build some ports, and I don't remember having
this issue.

So, could someone please help me figure this out and fix it? This
really should work.


So, this error message comes from here:
https://svnweb.freebsd.org/base/head/sys/fs/nfsclient/nfs_clbio.c?revision=314436=markup#l1674 



It's the NFS_TIMESPEC_COMPARE(&np->n_mtime, &np->n_vattr.na_mtime)
comparison that fails. np should be the NFS node structure from the
vnode's v_data, and n_vattr is the attribute cache. As far as I can see, these
two are updated together, so from the code I don't see why
they might differ. Could someone with more experience in the NFS code
please take a look at it? -czg

Can you print out the two mtimes?  I wonder if what's happening is that
your server uses different granularity (for example just seconds) than
your client, so on the client we generate a timestamp with non-zero
nanoseconds, but when the server receives that timestamp it "truncates"
it.  During open() we forcefully re-fetch the timestamp (for CTO
consistency) and then notice it doesn't match.  For now I would start
with comparing the timestamps and maybe the vfs.timestamp_precision
sysctls on client and server (if server is a FreeBSD box).

Here are the time values:
Mar  9 19:46:01 rpi3 kernel: np->n_mtime: -3298114786344 + -3298114786336  &np->n_vattr.na_mtime: -3298114786616 + -3298114786608
Mar  9 19:46:01 rpi3 kernel: pid 912 (csh), uid 0, was killed: text file modification
Mar  9 19:46:01 rpi3 kernel: np->n_mtime: -3298114786344 + -3298114786336  &np->n_vattr.na_mtime: -3298114786616 + -3298114786608
Mar  9 19:46:01 rpi3 kernel: pid 912 (csh), uid 0, was killed: text file modification


Printed this way:
 printf("np->n_mtime: %ji + %ji  &np->n_vattr.na_mtime: %ji + %ji",
     (intmax_t)(&np->n_mtime.tv_sec),
     (intmax_t)(&np->n_mtime.tv_nsec),
     (intmax_t)(&np->n_vattr.na_mtime.tv_sec),
     (intmax_t)(&np->n_vattr.na_mtime.tv_nsec));

Sorry, I made a typo there. Here it is now:
Mar  9 20:05:35 rpi3 kernel: np->n_mtime: 1489089935 + 219323000  &np->n_vattr.na_mtime: 1489089935 + 221438000
Mar  9 20:05:35 rpi3 kernel: pid 847 (csh), uid 0, was killed: text file modification
Mar  9 20:05:35 rpi3 kernel: np->n_mtime: 1489089935 + 219323000  &np->n_vattr.na_mtime: 1489089935 + 221438000
Mar  9 20:05:35 rpi3 kernel: pid 847 (csh), uid 0, was killed: text file modification


That's a difference of 2115 microseconds.
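Two small observations on the numbers above, written as a user-space sketch. First, in the garbage run each printed pair differs by 8, which would match the offset of tv_nsec after a 64-bit tv_sec; that suggests (an inference from the numbers, not something stated in the thread) that the first printf cast the fields' addresses rather than their values. Second, subtracting the cached np->n_mtime from the fetched na_mtime gives the 2,115,000 ns quoted. `format_ts()` and `ts_diff_ns()` are hypothetical helpers, not NFS client code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: write "sec + nsec" into buf.  The casts apply to
 * the field values; casting &ts->tv_sec would print an address instead.
 * Returns snprintf()'s result.
 */
static int
format_ts(char *buf, size_t len, const struct timespec *ts)
{
	return (snprintf(buf, len, "%ji + %ji",
	    (intmax_t)ts->tv_sec, (intmax_t)ts->tv_nsec));
}

/* Hypothetical helper: return b - a in nanoseconds. */
static int64_t
ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return ((int64_t)(b->tv_sec - a->tv_sec) * 1000000000 +
	    (int64_t)(b->tv_nsec - a->tv_nsec));
}
```

With the second log's values, format_ts() renders np->n_mtime as "1489089935 + 219323000", and ts_diff_ns() from np->n_mtime to na_mtime returns 2115000 ns, i.e. 2115 microseconds.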



Re: process killed: text file modification

2017-03-09 Thread Gergely Czuczy



On 2017. 03. 09. 19:44, John Baldwin wrote:

On Thursday, March 09, 2017 03:31:56 PM Gergely Czuczy wrote:

[+freebsd-fs]


On 2017. 03. 09. 14:20, Gergely Czuczy wrote:

On 2017. 03. 09. 11:27, Gergely Czuczy wrote:

Hello,

I'm trying to build a few things from ports on an rpi3; the ports
collection is mounted over NFS from another machine. When it tries
to build pkg, I get this error message in syslog:

rpi3 kernel: pid 4451 (sh), uid 0, was killed: text file modification

The report to pkg@:
https://lists.freebsd.org/pipermail/freebsd-pkg/2017-March/002048.html

In ports-mgmt/pkg's config.log, it fails at the following entry:
configure:3726: checking whether we are cross compiling
configure:3734: cc -o conftest -O2 -pipe  -Wno-error
-fno-strict-aliasing   conftest.c  >&5
configure:3738: $? = 0
configure:3745: ./conftest
configure:3749: $? = 137
configure:3756: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.0':
configure:3760: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details

# uname -a
FreeBSD rpi3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314949: Thu Mar 9
08:58:46 CET 2017
ae...@marvin.harmless.hu:/tank/rpi3/crochet/work/obj/arm64.aarch64/tank/rpi3/src/sys/AEGIR
arm64

So far, a few additions:
Time is synced between the NFS server and the client.
The kill happens at an open() call, and it's not the file being
opened that is affected, but the process executing the call.
Here's a simple program that reproduces it:
#include <stdio.h>

int main(void) {

   FILE *f = fopen("/bar", "w");

   if (f == NULL)   /* avoid calling fclose(NULL) */
      return 1;
   fclose(f);
   return 0;
}

Conditions to reproduce it:
  - The resulting binary must be executed from the NFS mount.
  - The binary must be built after mounting the NFS share.

I haven't tried building it on a different host, as I don't have access
to multiple RPis. Also, if I build the binary, then unmount and remount the
NFS mount point holding the binary and execute it, it works.

I've also tried this with the raspbsd.org image, and I could reproduce
it there as well.

Another interesting thing: when I first booted the RPi, the NFS
server was running 10.2-STABLE; it was later updated to 11-STABLE. While it
was on 10.2 I tried to build some ports, and I don't remember having
this issue.

So, could someone please help me figure this out and fix it? This
really should work.


So, this error message comes from here:
https://svnweb.freebsd.org/base/head/sys/fs/nfsclient/nfs_clbio.c?revision=314436=markup#l1674

It's the NFS_TIMESPEC_COMPARE(&np->n_mtime, &np->n_vattr.na_mtime)
comparison that fails. np should be the NFS node structure from the
vnode's v_data, and n_vattr is the attribute cache. As far as I can see, these
two are updated together, so from the code I don't see why
they might differ. Could someone with more experience in the NFS code
please take a look at it? -czg

Can you print out the two mtimes?  I wonder if what's happening is that
your server uses different granularity (for example just seconds) than
your client, so on the client we generate a timestamp with non-zero
nanoseconds, but when the server receives that timestamp it "truncates"
it.  During open() we forcefully re-fetch the timestamp (for CTO
consistency) and then notice it doesn't match.  For now I would start
with comparing the timestamps and maybe the vfs.timestamp_precision
sysctls on client and server (if server is a FreeBSD box).

Here are the time values:
Mar  9 19:46:01 rpi3 kernel: np->n_mtime: -3298114786344 + -3298114786336  &np->n_vattr.na_mtime: -3298114786616 + -3298114786608
Mar  9 19:46:01 rpi3 kernel: pid 912 (csh), uid 0, was killed: text file modification
Mar  9 19:46:01 rpi3 kernel: np->n_mtime: -3298114786344 + -3298114786336  &np->n_vattr.na_mtime: -3298114786616 + -3298114786608
Mar  9 19:46:01 rpi3 kernel: pid 912 (csh), uid 0, was killed: text file modification


Printed this way:
 printf("np->n_mtime: %ji + %ji  &np->n_vattr.na_mtime: %ji + %ji",
     (intmax_t)(&np->n_mtime.tv_sec),
     (intmax_t)(&np->n_mtime.tv_nsec),
     (intmax_t)(&np->n_vattr.na_mtime.tv_sec),
     (intmax_t)(&np->n_vattr.na_mtime.tv_nsec));




Re: how to SVN regenerate [ man awk ]

2017-03-09 Thread Ryan Stone
On Thu, Mar 9, 2017 at 10:00 AM, Jeffrey Bouquet 
wrote:

> For $giggles$ I ran svn up in /usr/src/usr.bin/awk (or wherever), and then
> man awk displays not the newer import from a recent SVN update but
> the older 2015 one [ so it says ].  Is it a stale file? Were not all parts
> of the man page updated to include the latest revision date? Or is there
> some other command (to [g]unzip or whatever), besides 320.whatis
> in periodic weekly, that updates the compressed installed
> files from /usr/obj to what one expects after having just
> recompiled the man page?
>

Any chance that there is an obsolete copy of the manpage in
/usr/share/man/cat1?
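Ryan's question can be answered just by looking for such a file (for example awk.1.gz under /usr/share/man/cat1). If one is there, a minimal sketch for comparing its modification time with the installed source page follows; the paths in the usage line are assumptions, and the sketch does not model how man(1) itself decides between a catpage and the source page:

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Return 1 if the file `cat` has a newer modification time than the file
 * `src`, 0 if it does not, and -1 if either path cannot be stat()ed.
 * Only whole seconds (st_mtime) are compared.
 */
static int
catpage_is_newer(const char *cat, const char *src)
{
	struct stat cs, ss;

	if (stat(cat, &cs) != 0 || stat(src, &ss) != 0)
		return (-1);
	return (cs.st_mtime > ss.st_mtime ? 1 : 0);
}
```

For example, `catpage_is_newer("/usr/share/man/cat1/awk.1.gz", "/usr/share/man/man1/awk.1.gz")` returning 1 would mean a preformatted page exists and is newer than the installed source page.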


Re: process killed: text file modification

2017-03-09 Thread John Baldwin
On Thursday, March 09, 2017 03:31:56 PM Gergely Czuczy wrote:
> [+freebsd-fs]
> 
> 
> On 2017. 03. 09. 14:20, Gergely Czuczy wrote:
> > On 2017. 03. 09. 11:27, Gergely Czuczy wrote:
> >> Hello,
> >>
> >> I'm trying to build a few things from ports on an rpi3; the ports
> >> collection is mounted over NFS from another machine. When it tries
> >> to build pkg, I get this error message in syslog:
> >>
> >> rpi3 kernel: pid 4451 (sh), uid 0, was killed: text file modification
> >>
> >> The report to pkg@:
> >> https://lists.freebsd.org/pipermail/freebsd-pkg/2017-March/002048.html
> >>
> >> In ports-mgmt/pkg's config.log, it fails at the following entry:
> >> configure:3726: checking whether we are cross compiling
> >> configure:3734: cc -o conftest -O2 -pipe  -Wno-error 
> >> -fno-strict-aliasing   conftest.c  >&5
> >> configure:3738: $? = 0
> >> configure:3745: ./conftest
> >> configure:3749: $? = 137
> >> configure:3756: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.0':
> >> configure:3760: error: cannot run C compiled programs.
> >> If you meant to cross compile, use `--host'.
> >> See `config.log' for more details
> >>
> >> # uname -a
> >> FreeBSD rpi3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314949: Thu Mar 9 
> >> 08:58:46 CET 2017 
> >> ae...@marvin.harmless.hu:/tank/rpi3/crochet/work/obj/arm64.aarch64/tank/rpi3/src/sys/AEGIR
> >>  
> >> arm64
> > So far, a few additions:
> > Time is synced between the NFS server and the client.
> > The kill happens at an open() call, and it's not the file being
> > opened that is affected, but the process executing the call.
> > Here's a simple program that reproduces it:
> > #include <stdio.h>
> >
> > int main(void) {
> >
> >   FILE *f = fopen("/bar", "w");
> >
> >   if (f == NULL)   /* avoid calling fclose(NULL) */
> >      return 1;
> >   fclose(f);
> >   return 0;
> > }
> >
> > Conditions to reproduce it:
> >  - The resulting binary must be executed from the NFS mount.
> >  - The binary must be built after mounting the NFS share.
> >
> > I haven't tried building it on a different host, as I don't have access
> > to multiple RPis. Also, if I build the binary, then unmount and remount
> > the NFS mount point holding the binary and execute it, it works.
> >
> > I've also tried this with the raspbsd.org image, and I could reproduce
> > it there as well.
> >
> > Another interesting thing: when I first booted the RPi, the NFS
> > server was running 10.2-STABLE; it was later updated to 11-STABLE. While
> > it was on 10.2 I tried to build some ports, and I don't remember having
> > this issue.
> >
> > So, could someone please help me figure this out and fix it? This
> > really should work.
> >
> So, this error message comes from here:
> https://svnweb.freebsd.org/base/head/sys/fs/nfsclient/nfs_clbio.c?revision=314436=markup#l1674
> 
> It's the NFS_TIMESPEC_COMPARE(&np->n_mtime, &np->n_vattr.na_mtime)
> comparison that fails. np should be the NFS node structure from the
> vnode's v_data, and n_vattr is the attribute cache. As far as I can see,
> these two are updated together, so from the code I don't see why
> they might differ. Could someone with more experience in the NFS code
> please take a look at it? -czg

Can you print out the two mtimes?  I wonder if what's happening is that
your server uses different granularity (for example just seconds) than
your client, so on the client we generate a timestamp with non-zero
nanoseconds, but when the server receives that timestamp it "truncates"
it.  During open() we forcefully re-fetch the timestamp (for CTO
consistency) and then notice it doesn't match.  For now I would start
with comparing the timestamps and maybe the vfs.timestamp_precision
sysctls on client and server (if server is a FreeBSD box).
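John's hypothesis can be illustrated in a few lines: if the client caches a timestamp with nanosecond detail and the server stores it at a coarser granularity, the value fetched back on open() no longer matches the cached one. A user-space sketch; `truncate_ts()` and `ts_differ()` are hypothetical helpers, and the latter only follows the spirit of the NFS_TIMESPEC_COMPARE() check quoted above, not its actual definition:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: round ts down to a multiple of gran_ns nanoseconds
 * (1000000000 for whole seconds, 1000 for microseconds), the way a server
 * with coarser timestamps might store it.
 */
static struct timespec
truncate_ts(struct timespec ts, long gran_ns)
{
	ts.tv_nsec -= ts.tv_nsec % gran_ns;
	return (ts);
}

/* Hypothetical helper: true if the two timestamps differ in any field. */
static bool
ts_differ(const struct timespec *a, const struct timespec *b)
{
	return (a->tv_sec != b->tv_sec || a->tv_nsec != b->tv_nsec);
}
```

A client-side mtime of 1489089935 s + 219323456 ns comes back as 1489089935 + 0 from a seconds-granularity server, so ts_differ() reports a mismatch; truncating to microseconds instead yields 219323000.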

-- 
John Baldwin


Re: how to SVN regenerate [ man awk ]

2017-03-09 Thread Jeffrey Bouquet


On Thu, 9 Mar 2017 08:43:29 -0800, David Wolfskill  wrote:

> On Thu, Mar 09, 2017 at 07:00:13AM -0800, Jeffrey Bouquet wrote:
> > For $giggles$ I ran svn up in /usr/src/usr.bin/awk (or wherever), and then
> > man awk displays not the newer import from a recent SVN update but
> > the older 2015 one [ so it says ].  Is it a stale file? Were not all parts
> > of the man page updated to include the latest revision date? Or is there
> > some other command (to [g]unzip or whatever), besides 320.whatis
> > in periodic weekly, that updates the compressed installed
> > files from /usr/obj to what one expects after having just
> > recompiled the man page?
> 
> If you intend to use "svn up", you should probably review, and
> follow the instructions in, /usr/src/UPDATING.

But just for one binary and one man page update?
It is only two files; how do I update them individually, if that does not
require a buildworld...

> 
> > This crops up quite a lot on this machine, so I am unschooled in
> > some principle of updating this operating system.  
> > 
> > If it matters, I receive a 
> > 
> > WARNING manpath environment variable set
> > 
> > when starting an additional
> > xterm & ... 

manpath
(Warning: MANPATH environment variable set)

/usr/share/man:/usr/local/man:/usr/share/openssl/man:/usr/local/lib/node_modules/npm/man:/usr/local/lib/perl5/site_perl/man:/usr/local/lib/perl5/5.24/perl/man:/usr/local/share/xpdf/man

> 
> You may have code in your login shell's initialization file that sets
> MANPATH
> 

MANPATH="`manpath`"

> Peace,
> david
> -- 
> David H. Wolfskillda...@catwhisker.org
> How could one possibly "respect" a misogynist, racist, bullying con-man??!?
> 
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: localedef broken on current amd64

2017-03-09 Thread Pedro Giffuni

Yes, I am looking at it,

I suspect there is an underlying bug but I will be reverting the latest 
change.


Thank you,

Pedro.


On 3/9/2017 12:25 PM, Manfred Antar wrote:

I rebuilt localedef on current this morning.
doing a make buildworld:

===> colldef (all)
localedef -D -U -i /usr/src/share/colldef/af_ZA.UTF-8.src  -f 
/usr/src/tools/tools/locale/etc/final-maps/map.UTF-8 
/usr/obj/usr/src/share/colldef/af_ZA.UTF-8
/usr/src/share/colldef/af_ZA.UTF-8.src: 2421: error: Bad file descriptor
*** Error code 4

worked fine yesterday




localedef broken on current amd64

2017-03-09 Thread Manfred Antar
I rebuilt localedef on current this morning.
doing a make buildworld:

===> colldef (all)
localedef -D -U -i /usr/src/share/colldef/af_ZA.UTF-8.src  -f 
/usr/src/tools/tools/locale/etc/final-maps/map.UTF-8 
/usr/obj/usr/src/share/colldef/af_ZA.UTF-8
/usr/src/share/colldef/af_ZA.UTF-8.src: 2421: error: Bad file descriptor
*** Error code 4

worked fine yesterday


Re: how to SVN regenerate [ man awk ]

2017-03-09 Thread David Wolfskill
On Thu, Mar 09, 2017 at 07:00:13AM -0800, Jeffrey Bouquet wrote:
> For $giggles$ I ran svn up in /usr/src/usr.bin/awk (or wherever), and then
> man awk displays not the newer import from a recent SVN update but
> the older 2015 one [ so it says ].  Is it a stale file? Were not all parts
> of the man page updated to include the latest revision date? Or is there
> some other command (to [g]unzip or whatever), besides 320.whatis
> in periodic weekly, that updates the compressed installed
> files from /usr/obj to what one expects after having just
> recompiled the man page?

If you intend to use "svn up", you should probably review, and
follow the instructions in, /usr/src/UPDATING.

> This crops up quite a lot on this machine, so I am unschooled in
> some principle of updating this operating system.  
> 
> If it matters, I receive a 
> 
> WARNING manpath environment variable set
> 
> when starting an additional
> xterm & ... 

You may have code in your login shell's initialization file that sets
MANPATH

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
How could one possibly "respect" a misogynist, racist, bullying con-man??!?

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: hang in dlclose() on rtld lock (powerpc64)

2017-03-09 Thread Konstantin Belousov
On Thu, Mar 09, 2017 at 09:59:00AM -0600, Justin Hibbits wrote:
> When building ports in poudriere, I see gdk-pixbuf-query-modules and
> gio-querymodules hanging on r314676, but working in r305820.  I took a
> backtrace on both in gdb, and see the following (identical between both):
> 
> Program received signal SIGINT, Interrupt.
> 0x506831d8 in .__sys.umtx_op () from /lib/libc.so.7
> (gdb) bt
> #0  0x506831d8 in .__sys.umtx_op () from /lib/libc.so.7
> #1  0x50588010 in _umtx_op_err (obj=0x4, op=13, val=0, uaddr=0x0,
> uaddr2=0x0)
> at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_umtx.c:37
> #2  0x505881b8 in __thr_rwlock_wrlock (rwlock=,
> tsp=)
> at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_umtx.c:325
> #3  0x505965f0 in _thr_rwlock_wrlock (tsp=,
> rwlock=)
> at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_umtx.h:239
> #4  _thr_rtld_wlock_acquire (lock=0x505bdd00)
> at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_rtld.c:141
> #5  0x50026bf4 in wlock_acquire (lock=0x5004cf20 ,
> lockstate=0xcab0)
> at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld_lock.c:222
> #6  0x50022b1c in dlclose (handle=0x51d62000)
> at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:3021
> #7  0x50022c90 in free_needed_filtees (n=0x509f7420)
> at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:2113
> #8  0x50022d18 in unload_filtees (obj=0x509fa800)
> at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:2129
> #9  0x50022e54 in unload_object (root=)
> at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:4464
> #10 0x50022c20 in dlclose (handle=0x50054000)
> at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:3044
> ---Type  to continue, or q  to quit---q
> 
> 
> This happens on powerpc64.  I haven't tested on powerpc or any other arch.

Please test the following patch.  It avoids recursing on the bind lock.

diff --git a/libexec/rtld-elf/rtld.c b/libexec/rtld-elf/rtld.c
index a7c61b2d13f..880cf100c45 100644
--- a/libexec/rtld-elf/rtld.c
+++ b/libexec/rtld-elf/rtld.c
@@ -77,6 +77,7 @@ static void digest_dynamic2(Obj_Entry *, const Elf_Dyn *, const Elf_Dyn *,
 static void digest_dynamic(Obj_Entry *, int);
 static Obj_Entry *digest_phdr(const Elf_Phdr *, int, caddr_t, const char *);
 static Obj_Entry *dlcheck(void *);
+static int dlclose_locked(void *, RtldLockState *);
 static Obj_Entry *dlopen_object(const char *name, int fd, Obj_Entry *refobj,
 int lo_flags, int mode, RtldLockState *lockstate);
 static Obj_Entry *do_load_object(int, const char *, char *, struct stat *, int);
@@ -98,7 +99,7 @@ static void initlist_add_objects(Obj_Entry *, Obj_Entry *, Objlist *);
 static void linkmap_add(Obj_Entry *);
 static void linkmap_delete(Obj_Entry *);
 static void load_filtees(Obj_Entry *, int flags, RtldLockState *);
-static void unload_filtees(Obj_Entry *);
+static void unload_filtees(Obj_Entry *, RtldLockState *);
 static int load_needed_objects(Obj_Entry *, int);
 static int load_preload_objects(void);
 static Obj_Entry *load_object(const char *, int fd, const Obj_Entry *, int);
@@ -142,7 +143,7 @@ static int symlook_obj1_sysv(SymLook *, const Obj_Entry *);
 static int symlook_obj1_gnu(SymLook *, const Obj_Entry *);
 static void trace_loaded_objects(Obj_Entry *);
 static void unlink_object(Obj_Entry *);
-static void unload_object(Obj_Entry *);
+static void unload_object(Obj_Entry *, RtldLockState *lockstate);
 static void unref_dag(Obj_Entry *);
 static void ref_dag(Obj_Entry *);
 static char *origin_subst_one(Obj_Entry *, char *, const char *,
@@ -2104,13 +2105,13 @@ initlist_add_objects(Obj_Entry *obj, Obj_Entry *tail, Objlist *list)
 #endif
 
 static void
-free_needed_filtees(Needed_Entry *n)
+free_needed_filtees(Needed_Entry *n, RtldLockState *lockstate)
 {
 Needed_Entry *needed, *needed1;
 
 for (needed = n; needed != NULL; needed = needed->next) {
if (needed->obj != NULL) {
-   dlclose(needed->obj);
+   dlclose_locked(needed->obj, lockstate);
needed->obj = NULL;
}
 }
@@ -2121,14 +2122,14 @@ free_needed_filtees(Needed_Entry *n)
 }
 
 static void
-unload_filtees(Obj_Entry *obj)
+unload_filtees(Obj_Entry *obj, RtldLockState *lockstate)
 {
 
-free_needed_filtees(obj->needed_filtees);
-obj->needed_filtees = NULL;
-free_needed_filtees(obj->needed_aux_filtees);
-obj->needed_aux_filtees = NULL;
-obj->filtees_loaded = false;
+   free_needed_filtees(obj->needed_filtees, lockstate);
+   obj->needed_filtees = NULL;
+   free_needed_filtees(obj->needed_aux_filtees, lockstate);
+   obj->needed_aux_filtees = NULL;
+   obj->filtees_loaded = false;
 }
 
 static void
@@ -3015,15 +3016,23 @@ search_library_pathfds(const char *name, const char *path, int *fdp)
 int
 dlclose(void *handle)
 {
+   RtldLockState lockstate;
+   int 

Re: input/output error @boot

2017-03-09 Thread Pete Wright



On 3/9/17 4:42 AM, Dexuan Cui wrote:

From: owner-freebsd-curr...@freebsd.org [mailto:owner-freebsd-
curr...@freebsd.org] On Behalf Of Pete Wright
Sent: Thursday, March 9, 2017 14:04
To: freebsd-current@freebsd.org
Subject: Re: input/output error @boot
On 3/8/17 10:00 PM, Dexuan Cui wrote:

For now, I suggest we only apply the idea "reduce the size of the
staging area if necessary" to VMs running on Hyper-V, and restore the
old behavior on physical machines, since that has been working for people
for a long time, though it's potentially unsafe.


+1

i'd like to see the old behaviour for physical machines to be restored
as well since this has rendered my drm-next test rig broken :(

-pete


Eventually I committed 314956 for the issue:
https://svnweb.freebsd.org/base?view=revision=314956
The old behaviour for physical machines is restored.

PS: I understand that I should usually put the patch on Phabricator for
review before committing it, but since the issue here is critical, I
committed it directly to unblock people first. Sorry.
Please comment on the patch if you think it needs rework -- I hope not. :-)



Thank you Dexuan - I will do a build today and reboot when I am home
from work tonight.  FWIW, I verified that if I boot my system in
"classic" BIOS mode I am able to load the kernel and go multi-user, so
this is probably the fix for me.


Cheers!
-pete

--
Pete Wright
p...@nomadlogic.org
@nomadlogicLA


how to SVN regenerate [ man awk ]

2017-03-09 Thread Jeffrey Bouquet
For $giggles$ I ran svn up in /usr/src/usr.bin/awk (or wherever), and then
man awk displays not the newer import from a recent SVN update but
the older 2015 one [ so it says ].  Is it a stale file? Were not all parts
of the man page updated to include the latest revision date? Or is there
some other command (to [g]unzip or whatever), besides 320.whatis
in periodic weekly, that updates the compressed installed
files from /usr/obj to what one expects after having just
recompiled the man page?

This crops up quite a lot on this machine, so I am unschooled in
some principle of updating this operating system.  

If it matters, I receive a 

WARNING manpath environment variable set

when starting an additional
xterm & ... 



hang in dlclose() on rtld lock (powerpc64)

2017-03-09 Thread Justin Hibbits
When building ports in poudriere, I see gdk-pixbuf-query-modules and
gio-querymodules hanging on r314676, but working in r305820.  I took a
backtrace on both in gdb, and see the following (identical between both):

Program received signal SIGINT, Interrupt.
0x506831d8 in .__sys.umtx_op () from /lib/libc.so.7
(gdb) bt
#0  0x506831d8 in .__sys.umtx_op () from /lib/libc.so.7
#1  0x50588010 in _umtx_op_err (obj=0x4, op=13, val=0, uaddr=0x0,
uaddr2=0x0)
at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_umtx.c:37
#2  0x505881b8 in __thr_rwlock_wrlock (rwlock=,
tsp=)
at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_umtx.c:325
#3  0x505965f0 in _thr_rwlock_wrlock (tsp=,
rwlock=)
at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_umtx.h:239
#4  _thr_rtld_wlock_acquire (lock=0x505bdd00)
at /home/chmeee/freebsd/pristine/lib/libthr/thread/thr_rtld.c:141
#5  0x50026bf4 in wlock_acquire (lock=0x5004cf20 ,
lockstate=0xcab0)
at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld_lock.c:222
#6  0x50022b1c in dlclose (handle=0x51d62000)
at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:3021
#7  0x50022c90 in free_needed_filtees (n=0x509f7420)
at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:2113
#8  0x50022d18 in unload_filtees (obj=0x509fa800)
at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:2129
#9  0x50022e54 in unload_object (root=)
at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:4464
#10 0x50022c20 in dlclose (handle=0x50054000)
at /home/chmeee/freebsd/pristine/libexec/rtld-elf/rtld.c:3044
---Type  to continue, or q  to quit---q


This happens on powerpc64.  I haven't tested on powerpc or any other arch.

- Justin


Re: smp_rendezvous_action: Are atomics correctly used ?

2017-03-09 Thread Alexandre Martins
On Thursday, March 9, 2017, 16:25:17, Konstantin Belousov wrote:
> On Thu, Mar 09, 2017 at 02:52:09PM +0100, Alexandre Martins wrote:
> > On Thursday, March 9, 2017, 15:07:54, Konstantin Belousov wrote:
> > > On Thu, Mar 09, 2017 at 10:59:27AM +0100, Alexandre Martins wrote:
> > > > I have the same question for the cpu_ipi_pending here:
> > > > 
> > > > https://svnweb.freebsd.org/base/head/sys/x86/x86/mp_x86.c?view=annotate#l1080
> > > > 
> > > > On Thursday, March 9, 2017, 10:43:14, Alexandre Martins wrote:
> > > > > Hello,
> > > > > 
> > > > > I'm currently reading the code of the function smp_rendezvous_action,
> > > > > in the kern/subr_smp.c file. In that function, I see that the variable
> > > > > smp_rv_waiters is read in some while() loops in a non-atomic way.
> > > > > 
> > > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l412
> > > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l458
> > > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l472
> > > > > 
> > > > > I suspect one of my freezes is due to that.
> > > 
> > > You should provide either evidence or, at least, some reasoning
> > > supporting
> > > your claims.
> > 
> > I currently have a software watchdog that triggers and takes a coredump.
> > In the coredumps, I always see a CPU trying to write-lock an "rm lock".
> > Every time, that CPU is spinning in smp_rendezvous_action (in the first
> > while loop) while the others are in their idle threads.
> > 
> > The cause of the freeze is not clear, so I have started searching for
> > "exotic" causes to explain it.
> 
> This sounds like the 'usual' deadlock, where some other thread owns the
> rmlock in read mode.  I recommend that you follow
> https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

As usual, with these options in our test environment, it never happens. But
at customers, in production... :-D

The only thing I have is the coredump. In it, the rm_lock seems free of
readers and writers: there is nothing in pcpu->pc_rm_queue (on any CPU) and
nothing in rm->rm_activeReaders.
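On the atomics question quoted above: what a spin loop like this needs is that every iteration really re-reads the shared counter. In C11 terms that is an atomic load; a plain, non-volatile read may legally be hoisted out of the loop by the compiler. A minimal user-space sketch of a counting rendezvous using <stdatomic.h> and POSIX threads, illustrating the pattern only; it is not FreeBSD's smp_rendezvous code, and NTHREADS and the function names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4

static atomic_int waiters;	/* threads that have arrived */
static atomic_int passed;	/* threads that got past the rendezvous */

static void *
rendezvous(void *arg)
{
	(void)arg;
	/* Announce arrival with an atomic read-modify-write. */
	atomic_fetch_add(&waiters, 1);
	/* Each iteration performs a fresh atomic load of the counter. */
	while (atomic_load(&waiters) < NTHREADS)
		;
	atomic_fetch_add(&passed, 1);
	return (NULL);
}

/* Run NTHREADS threads through the rendezvous; return how many passed. */
static int
run_rendezvous(void)
{
	pthread_t t[NTHREADS];
	int created;

	atomic_store(&waiters, 0);
	atomic_store(&passed, 0);
	for (created = 0; created < NTHREADS; created++)
		if (pthread_create(&t[created], NULL, rendezvous, NULL) != 0)
			break;
	/* On a creation failure, release any threads already spinning. */
	if (created < NTHREADS)
		atomic_fetch_add(&waiters, NTHREADS);
	for (int i = 0; i < created; i++)
		pthread_join(t[i], NULL);
	return (created == NTHREADS ? atomic_load(&passed) : -1);
}
```

The default memory order for these calls is sequentially consistent, which is stronger than such a barrier strictly needs but keeps the sketch simple.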

Thank you. It's really nice of you to try to help me!
-- 
Alexandre Martins
STORMSHIELD





Re: r314708: panic: tdsendsignal: ksi on queue

2017-03-09 Thread Konstantin Belousov
On Wed, Mar 08, 2017 at 09:00:17PM -0800, Bryan Drewery wrote:
> I'm on r314708.  I hit ^C while running 'kyua test' in /usr/tests/bin/pwait.
> 
> > panic: tdsendsignal: ksi on queue
> > cpuid = 10
> 
> > #10 kdb_enter (why=0x814488f5 "panic", msg=) at 
> > /usr/src/sys/kern/subr_kdb.c:444
> > #11 0x80a577f3 in vpanic (fmt=, 
> > ap=0xfe35601a3620) at /usr/src/sys/kern/kern_shutdown.c:772
> > #12 0x80a5764f in _kassert_panic (fatal=1, fmt=0x81448fd7 
> > "%s: ksi on queue") at /usr/src/sys/kern/kern_shutdown.c:669
> > #13 0x80a5c843 in tdsendsignal (p=0xf80c39389a80, td=0x0, 
> > sig=20, ksi=0xf803888a2bd0) at /usr/src/sys/kern/kern_sig.c:2095
> > #14 0x80a13828 in exit1 (td=, rval=, 
> > signo=) at /usr/src/sys/kern/kern_exit.c:459
> > #15 0x80a5b28c in sigexit (td=0xf802f0bee000, sig=9) at 
> > /usr/src/sys/kern/kern_sig.c:3081
> > #16 0x80a5b88e in postsig (sig=9) at 
> > /usr/src/sys/kern/kern_sig.c:2992
> > #17 0x80a5b56b in kern_sigsuspend (td=0xf802f0bee000, mask=...) 
> > at /usr/src/sys/kern/kern_sig.c:1515
> > #18 0x80a5b441 in sys_sigsuspend (td=0xf802f0bee000, 
> > uap=) at /usr/src/sys/kern/kern_sig.c:1479
> > #19 0x80ee04da in syscallenter (td=0xf802f0bee000, 
> > sa=) at 
> > /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
> > #20 amd64_syscall (td=0xf802f0bee000, traced=0) at 
> > /usr/src/sys/amd64/amd64/trap.c:902
> 
> 
> > (kgdb) frame 18
> > #18 0x80a5b441 in sys_sigsuspend (td=0xf802f0bee000, 
> > uap=) at /usr/src/sys/kern/kern_sig.c:1479
> > 1479return (kern_sigsuspend(td, mask));
> 
> > (kgdb) p td->td_proc->p_comm
> > $3 = "timeout", '\000' 
> 
> > (kgdb) frame 13
> > #13 0x80a5c843 in tdsendsignal (p=0xf80c39389a80, td=0x0, 
> > sig=20, ksi=0xf803888a2bd0) at /usr/src/sys/kern/kern_sig.c:2095
> > 2095KASSERT(ksi == NULL || !KSI_ONQ(ksi), ("%s: ksi on queue", 
> > __func__));
> > (kgdb) p *ksi
> > $4 = {ksi_link = {tqe_next = 0x0, tqe_prev = 0xf80c39389c58}, ksi_info 
> > = {si_signo = 20, si_errno = 0, si_code = 2, si_pid = 90903, si_uid = 0, 
> > si_status = 9, si_addr = 0x0, si_value = {
> >   sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, 
> > _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, 
> > _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {
> > __spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0, ksi_flags = 
> > 6, ksi_sigq = 0xf80c39389c28}
> 
> > (kgdb) p *ksi->ksi_sigq
> > $6 = {sq_signals = {__bits = {524288, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 
> > 0, 0}}, sq_ptrace = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 
> > 0xf803888a2bd0, tqh_last = 0xf803888a2bd0},

Yes, there is apparently a race between the zombie child, which has not yet
finished sending SIGCHLD to its parent, and the parent exiting.  The following
should fix the issue, but I do not think that reproducing the problem is easy.

diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
index c524fe5df37..ba5ff84e9de 100644
--- a/sys/kern/kern_exit.c
+++ b/sys/kern/kern_exit.c
@@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
 {
struct proc *p, *nq, *q, *t;
struct thread *tdt;
+   ksiginfo_t ksi;
 
mtx_assert(, MA_NOTOWNED);
KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
@@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
proc_reparent(q, q->p_reaper);
if (q->p_state == PRS_ZOMBIE) {
PROC_LOCK(q->p_reaper);
-   pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
+   if (q->p_ksi != NULL) {
+   ksiginfo_init();
+   ksiginfo_copy(q->p_ksi, );
+   }
+   pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
+   NULL ?  : NULL);
PROC_UNLOCK(q->p_reaper);
}
} else {
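To see why copying helps, here is a toy model in plain C. All names (toy_ksi, toy_sigq, toy_enqueue, toy_send_copy) are hypothetical stand-ins, not kernel APIs. A record may sit on only one queue at a time, mirroring the KSI_ONQ() assertion in tdsendsignal(), so re-sending the zombie's still-queued record to the reaper is refused, while sending a private copy, as the patch does with its stack-local ksi, succeeds. The model is deliberately simplified; the real queueing path in kern_sig.c is more involved.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sigq;

/* Stand-in for a ksiginfo: onq is non-NULL while the record is linked
 * on a queue (the role KSI_ONQ() plays in the kernel). */
struct toy_ksi {
    int signo;
    struct toy_sigq *onq;
    struct toy_ksi *next;
};

/* Stand-in for a per-process signal queue (singly linked for brevity). */
struct toy_sigq {
    struct toy_ksi *head;
};

/* Queue ksi on q. Returns -1 if ksi is already on some queue; that is the
 * situation the KASSERT in tdsendsignal() catches (the kernel panics). */
static int toy_enqueue(struct toy_sigq *q, struct toy_ksi *ksi)
{
    if (ksi->onq != NULL)
        return -1;
    ksi->onq = q;
    ksi->next = q->head;
    q->head = ksi;
    return 0;
}

/* The idea of the patch: copy the (possibly queued) original into private
 * storage that is not on any queue, and send the copy instead. */
static int toy_send_copy(struct toy_sigq *q, const struct toy_ksi *orig,
    struct toy_ksi *copy)
{
    assert(orig != copy);
    copy->signo = orig->signo;
    copy->onq = NULL;
    copy->next = NULL;
    return toy_enqueue(q, copy);
}
```

The original record stays on the old parent's queue untouched; only the copy is linked onto the reaper's queue.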
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: process killed: text file modification

2017-03-09 Thread Gergely Czuczy

[+freebsd-fs]


On 2017. 03. 09. 14:20, Gergely Czuczy wrote:

On 2017. 03. 09. 11:27, Gergely Czuczy wrote:

Hello,

I'm trying to build a few things from ports on an rpi3; the ports 
collection is mounted over NFS from another machine. When it tries 
to build pkg, I get the following error message in syslog:


rpi3 kernel: pid 4451 (sh), uid 0, was killed: text file modification

The report to pkg@:
https://lists.freebsd.org/pipermail/freebsd-pkg/2017-March/002048.html

In ports-mgmt/pkg's config.log, it fails at the following entry:
configure:3726: checking whether we are cross compiling
configure:3734: cc -o conftest -O2 -pipe  -Wno-error 
-fno-strict-aliasing   conftest.c  >&5

configure:3738: $? = 0
configure:3745: ./conftest
configure:3749: $? = 137
configure:3756: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.0':
configure:3760: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
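The `$? = 137` at configure:3749 is itself a clue. POSIX shells report a child terminated by signal n as 128 + n, so 137 corresponds to signal 9, SIGKILL, which fits the kernel's "was killed" message rather than a failure of conftest itself. The sketch below (POSIX C; the function name is hypothetical) reproduces that arithmetic by sending SIGKILL to a child and decoding its wait status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, kill it with SIGKILL, and return the value a POSIX shell
 * would put in $?: 128 + the signal number if the child was killed by a
 * signal, otherwise its exit status. Returns -1 on fork/wait failure. */
static int shell_status_of_killed_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: sleep until a signal arrives. SIGKILL cannot be caught,
         * so the child never leaves this loop normally. */
        for (;;)
            pause();
    }

    int rc = kill(pid, SIGKILL);
    assert(rc == 0);
    (void)rc;

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return WEXITSTATUS(status);
}
```

The function returns 137, the same value configure logged.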

# uname -a
FreeBSD rpi3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314949: Thu Mar 9 
08:58:46 CET 2017 
ae...@marvin.harmless.hu:/tank/rpi3/crochet/work/obj/arm64.aarch64/tank/rpi3/src/sys/AEGIR 
arm64

A few additions so far:
 - Time is synced between the NFS server and the client.
 - The kill is triggered during an open() call, and the text file involved 
is not the file being opened but the binary of the process making the call.

Here is a simple program that reproduces it:
#include <stdio.h>

int main() {

  FILE *f = fopen ("/bar", "w");

  if (f != NULL)
    fclose(f);
  return 0;
}

Conditions to reproduce it:
 - The resulting binary must be executed from the NFS mount.
 - The binary must be built after mounting the NFS share.

I haven't tried building it on a different host, as I don't have access 
to multiple RPis. Also, if I build the binary, unmount and remount the NFS 
mount point containing it, and then execute it, it works.


I have also tried this with the raspbsd.org image and could reproduce 
it there as well.


Another interesting detail: when I first booted the RPi, the NFS 
server was running 10.2-STABLE, and it was later updated to 11-STABLE. 
While it was still on 10.2, I built some ports, and I don't remember 
having this issue.


So, could someone please help me figure this out and fix it? This 
should simply work.



So, this error message comes from here:
https://svnweb.freebsd.org/base/head/sys/fs/nfsclient/nfs_clbio.c?revision=314436=markup#l1674

It's the NFS_TIMESPEC_COMPARE() comparison of np's n_mtime against its 
n_vattr.na_mtime that fails. np should be the NFS node structure, taken 
from the vnode's v_data, and n_vattr is the attribute cache. As far as I 
can see, these two are updated together, so from the code I don't really 
see why they might differ. Could someone with more experience in the NFS 
code please take a look at it? -czg
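A toy model of the check being described may help when reading that code. It assumes, as the message above says, that the process is killed when the vnode is in use as an executable (tracked with VV_TEXT in this era's kernel, as I understand it) and np's n_mtime differs from the cached n_vattr.na_mtime. It also assumes NFS_TIMESPEC_COMPARE() is a plain inequality test on both seconds and nanoseconds. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Assumed semantics of NFS_TIMESPEC_COMPARE(): true when the two
 * timestamps differ in either seconds or nanoseconds. */
static bool timespec_differs(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec != b->tv_sec || a->tv_nsec != b->tv_nsec;
}

/* Hypothetical, simplified view of an NFS client node. */
struct toy_nfsnode {
    bool is_text;                /* vnode is being executed (cf. VV_TEXT) */
    struct timespec n_mtime;     /* mtime the client last recorded */
    struct timespec attr_mtime;  /* mtime in the attribute cache */
};

/* True if the described check would kill the process running the file. */
static bool would_kill_executor(const struct toy_nfsnode *np)
{
    assert(np != NULL);
    return np->is_text && timespec_differs(&np->n_mtime, &np->attr_mtime);
}
```

Under these assumptions, even a nanosecond-only difference between the two timestamps is enough to trigger the kill, so it may be worth printing both full timespecs when debugging.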

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: smp_rendezvous_action: Are atomics correctly used ?

2017-03-09 Thread Konstantin Belousov
On Thu, Mar 09, 2017 at 02:52:09PM +0100, Alexandre Martins wrote:
> On Thursday 9 March 2017, 15:07:54, Konstantin Belousov wrote:
> > On Thu, Mar 09, 2017 at 10:59:27AM +0100, Alexandre Martins wrote:
> > > I have the same question for the cpu_ipi_pending here:
> > > 
> > > https://svnweb.freebsd.org/base/head/sys/x86/x86/mp_x86.c?view=annotate#l1080
> > > 
> > > On Thursday 9 March 2017, 10:43:14, Alexandre Martins wrote:
> > > > Hello,
> > > > 
> > > > I'm currently reading the code of the function smp_rendezvous_action in
> > > > the kern/subr_smp.c file. In that function, I see that the variable
> > > > smp_rv_waiters is read in some while() loops in a non-atomic way.
> > > > 
> > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l412
> > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l458
> > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l472
> > > > 
> > > > I suspect one of my freezes is due to that.
> > 
> > You should provide either evidence or, at least, some reasoning supporting
> > your claims.
> 
> I currently have a software watchdog that triggers and produces a coredump. In the
> coredumps, I always see a CPU trying to write-lock an "rm lock". Every time,
> that CPU is spinning in smp_rendezvous_action (in the first while loop),
> while the others are in the idle threads.
> 
> The cause of the freeze is not clear, so I have started looking for "exotic"
> causes to explain it.
This sounds like the 'usual' deadlock, where some other thread owns the rmlock
in read mode.  I recommend you follow
https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html


Re: start-up failure at SVN r314889

2017-03-09 Thread Michael Butler

Per Hans, this should fix it:

r314953 | hselasky | 2017-03-09 04:17:43 -0500 (Thu, 09 Mar 2017) | 9 lines
Changed paths:
   M /head/sys/compat/linuxkpi/common/src/linux_work.c

Don't create any threads before SI_SUB_INIT_IF in the LinuxKPI. Else
kthread_add() will assert it is called too soon. This fixes a startup
issue when COMPAT_LINUXKPI is enabled in the kernel configuration
file.
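The ordering rule in this commit message can be illustrated with a toy boot sequence. Everything below is hypothetical: the stage names and values are made up and only their relative order matters, and toy_kthread_add() returns an error where the real kthread_add() panics with "kthread_add called too soon". Registering the work-queue startup at a stage before threads are ready fails; moving it to a later stage, which is the idea behind the commit's SI_SUB_INIT_IF rule, lets the boot complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical boot stages; only their relative order matters here. */
enum toy_stage { TOY_EARLY = 1, TOY_THREADS_READY = 2, TOY_LATE = 3 };

static bool threads_ready;

/* Stand-in for kthread_add(): fails if called before threads are ready. */
static int toy_kthread_add(void)
{
    return threads_ready ? 0 : -1;
}

static int toy_enable_threads(void)
{
    threads_ready = true;
    return 0;
}

/* Stand-in for a work-queue init that starts a worker thread. */
static int toy_start_workqueue(void)
{
    return toy_kthread_add();
}

struct toy_sysinit {
    enum toy_stage stage;
    int (*fn)(void);
};

static int toy_cmp_stage(const void *a, const void *b)
{
    const struct toy_sysinit *x = a;
    const struct toy_sysinit *y = b;
    return (int)x->stage - (int)y->stage;
}

/* Run the init functions in stage order; -1 on the first failure. */
static int toy_boot(struct toy_sysinit *list, size_t n)
{
    assert(list != NULL || n == 0);
    threads_ready = false;
    qsort(list, n, sizeof(*list), toy_cmp_stage);
    for (size_t i = 0; i < n; i++) {
        if (list[i].fn() != 0)
            return -1;
    }
    return 0;
}
```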

imb


On 3/8/17 6:02 PM, Michael Butler wrote:

The difference between a kernel that boots and another that won't is ..

imb@toshi:/home/imb> diff -cw /sys/amd64/conf/TOSHI~ /sys/amd64/conf/TOSHI
*** /sys/amd64/conf/TOSHI~  Wed Mar  8 10:05:09 2017
--- /sys/amd64/conf/TOSHI   Wed Mar  8 17:33:25 2017
***
*** 373,379 
   # Enable Linux ABI emulation
   #options  COMPAT_LINUX32
   # Enable Linux KPI
! #options  COMPAT_LINUXKPI

   # Enable the linux-like proc filesystem support (requires COMPAT_LINUX
   # and PSEUDOFS)
--- 373,379 
   # Enable Linux ABI emulation
   #options  COMPAT_LINUX32
   # Enable Linux KPI
! options   COMPAT_LINUXKPI

   # Enable the linux-like proc filesystem support (requires COMPAT_LINUX
   # and PSEUDOFS)

Seems to point at something in SVN r314843 :-(

imb


On 03/08/17 17:10, Eric Camachat wrote:

I have the same issue on Dell Precision M4800.

On Wed, Mar 8, 2017 at 6:26 AM, David Wolfskill  wrote:

On Wed, Mar 08, 2017 at 07:55:44AM -0500, Michael Butler wrote:

My laptop usually starts like this ..

FreeBSD 12.0-CURRENT #21 r314812M: Mon Mar  6 19:34:51 EST 2017
 i...@toshi.auburn.protected-networks.net:/usr/obj/usr/src/sys/TOSHI amd64
FreeBSD clang version 4.0.0 (branches/release_40 296509) (based on LLVM
4.0.0)
...

This morning, I get this :-(

FreeBSD 12.0-CURRENT #27 r314889M: Tue Mar  7 19:55:25 EST 2017
 i...@toshi.auburn.protected-networks.net:/usr/obj/usr/src/sys/TOSHI
FreeBSD clang version 4.0.0 (branches/release_40 296509) (based on LLVM
4.0.0)
VT(vga): resolution 640x480
panic: kthread_add called too soon
  [ .. ]

Any thoughts?


"uname -vp" output from my last several (successful) build/smoke-tests
for head:

FreeBSD 12.0-CURRENT #274  r314653M/314653:1200023: Sat Mar  4 06:46:18 PST 
2017 r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

FreeBSD 12.0-CURRENT #275  r314700M/314700:1200023: Sun Mar  5 07:45:20 PST 
2017 r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

FreeBSD 12.0-CURRENT #276  r314770M/314770:1200023: Mon Mar  6 05:45:44 PST 
2017 r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

FreeBSD 12.0-CURRENT #277  r314842M/314842:1200023: Tue Mar  7 05:55:58 PST 
2017 r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

FreeBSD 12.0-CURRENT #278  r314906M/314906:1200024: Wed Mar  8 06:05:49 PST 
2017 r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

Sorry it's not more help.

Peace,
david
--
David H. Wolfskill  da...@catwhisker.org
How could one possibly "respect" a misogynist, racist, bullying con-man??!?

See http://www.catwhisker.org/~david/publickey.gpg for my public key.






Re: smp_rendezvous_action: Are atomics correctly used ?

2017-03-09 Thread Alexandre Martins
On Thursday 9 March 2017, 15:07:54, Konstantin Belousov wrote:
> On Thu, Mar 09, 2017 at 10:59:27AM +0100, Alexandre Martins wrote:
> > I have the same question for the cpu_ipi_pending here:
> > 
> > https://svnweb.freebsd.org/base/head/sys/x86/x86/mp_x86.c?view=annotate#l1080
> > 
> > On Thursday 9 March 2017, 10:43:14, Alexandre Martins wrote:
> > > Hello,
> > > 
> > > I'm currently reading the code of the function smp_rendezvous_action in
> > > the kern/subr_smp.c file. In that function, I see that the variable
> > > smp_rv_waiters is read in some while() loops in a non-atomic way.
> > > 
> > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l412
> > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l458
> > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l472
> > > 
> > > I suspect one of my freezes is due to that.
> 
> You should provide either evidence or, at least, some reasoning supporting
> your claims.

I currently have a software watchdog that triggers and produces a coredump. In the
coredumps, I always see a CPU trying to write-lock an "rm lock". Every time,
that CPU is spinning in smp_rendezvous_action (in the first while loop),
while the others are in the idle threads.

The cause of the freeze is not clear, so I have started looking for "exotic"
causes to explain it.

> 
> > > Should this function be patched to use
> > > "atomic_load_acq_int(_rv_waiters[])" ?
> 
> There too.
> 
> As a side note, any read or write of a naturally aligned integer type
> with a size less than or equal to the machine word is atomic on all
> supported architectures.  The meaning of the word atomic here is
> that when reading, you always get a complete value that was written by
> a writer into this location, not some out-of-thin-air value.  Similarly,
> when writing, you are guaranteed that any observer of the write will see
> the value you have written.
> 
> The guarantees above hold both for C-level code and for the assembler
> accesses.
> 
> atomic_load_acq() provides additional guarantees which do not affect the
> value read from the variable itself, but establish the ordering on the
> visibility of the related operations.

OK, I got it. Thank you !

-- 
Alexandre Martins
STORMSHIELD





Re: input/output error @boot

2017-03-09 Thread Toomas Soome

> On 9 March 2017, at 15:03, Dexuan Cui  wrote:
> 
>> From: owner-freebsd-curr...@freebsd.org [mailto:owner-freebsd-
>> curr...@freebsd.org] On Behalf Of Toomas Soome
>> 
>> IMO there are multiple issues around this problem and workaround.
>> 
>> First of all, to control UEFI memory allocation, AllocatePages() has these options:
>> 
>> AllocateAnyPages,
>> AllocateMaxAddress,
>> AllocateAddress
>> 
>> On x86, we use:
>> 
>>staging = 1024*1024*1024;
>>status = BS->AllocatePages(AllocateMaxAddress, EfiLoaderData,
>>nr_pages, );
>> 
>> Which means:
>> 
>> "Allocation requests of Type AllocateMaxAddress allocate any available range of
>> pages whose uppermost address is less than or equal to the address pointed to
>> by Memory on input."
>> 
>> So, we are asking for an amount of memory (64MB), with the condition that all
>> the pages should be below 1GB.
>> 
>> And we get it. If Hyper-V is in fact returning memory from an already occupied
>> area, there can be exactly one conclusion: it is a bug in Hyper-V.
> 
> Hyper-V has no bug here: Hyper-V doesn't return memory from an already occupied
> area. The issue is that the loader tries to write the 64MB staging area (BTW,
> it's 48MB in 10.3) into the physical memory range [2MB, 2MB+64MB), assuming
> this range is writable. However, this is not true with the Hyper-V EFI
> firmware: there is a read-only BootServicesData memory block starting at
> about 47.449MB, which causes a crash in the loader.
> 
> If you're interested, the whole long story is at the link below.  :-)
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211746, e.g. please see the
> screenshot in comment #8.
> 


Ah, right, so it already does the relocation and gets busted there; sorry, I
missed that. :D


> 
>> Note that this allocation method does *not* set the starting point for the
>> allocation; it can return *any* chunk of memory of the given size below 1GB.
> Yes. This can potentially cause new issues...
> 
>> So the attempt to control such an allocation by size is unfortunately flawed;
>> it really does not control the allocation.
> Yes, you're correct.
> The patch is flawed. I only expect (or hope) that it can work around the issues
> with typical Hyper-V UEFI firmware.
> In my tests, it works with Hyper-V 2012 R2 and 2016.
> I hope it will work with future Hyper-V versions too...
> 
>> Note that I have also seen AllocateAddress failures: there was a nicely
>> available chunk of memory, but the firmware just did not allocate at the given
>> address (this happened with OVMF + qemu).
>> 
>> The second flaw is also in the firmware. Sure, with UEFI you can have
>> “random” allocations, and actual control over memory is a real problem, but
>> planting an “egg” in the 1MB-1GB range, where any OS is most likely to
>> live, is IMO just stupid.
>> 
>> The only real solution here is to either raise the MaxAddress limit or use
>> AllocateAnyPages, get the kernel loaded into memory, and, after switching off
>> the boot services and before jumping to the kernel, relocate the kernel to an
>> available location below 1GB…
> Yes. IMO the biggest issue is that the kernel currently can't be relocated... :-(
> Making it relocatable is long-term work, I'm afraid.
> 
> Thanks,
> -- Dexuan

True, and there are other systems with the same issue; relocatable kernels are
not really that common even today ;) Anyhow, good work on your side ;)

rgds,
toomas


Re: process killed: text file modification

2017-03-09 Thread Gergely Czuczy

On 2017. 03. 09. 11:27, Gergely Czuczy wrote:

Hello,

I'm trying to build a few things from ports on an rpi3; the ports 
collection is mounted over NFS from another machine. When it tries 
to build pkg, I get the following error message in syslog:


rpi3 kernel: pid 4451 (sh), uid 0, was killed: text file modification

The report to pkg@:
https://lists.freebsd.org/pipermail/freebsd-pkg/2017-March/002048.html

In ports-mgmt/pkg's config.log, it fails at the following entry:
configure:3726: checking whether we are cross compiling
configure:3734: cc -o conftest -O2 -pipe  -Wno-error 
-fno-strict-aliasing   conftest.c  >&5

configure:3738: $? = 0
configure:3745: ./conftest
configure:3749: $? = 137
configure:3756: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.0':
configure:3760: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details

# uname -a
FreeBSD rpi3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314949: Thu Mar 9 
08:58:46 CET 2017 
ae...@marvin.harmless.hu:/tank/rpi3/crochet/work/obj/arm64.aarch64/tank/rpi3/src/sys/AEGIR 
arm64

A few additions so far:
 - Time is synced between the NFS server and the client.
 - The kill is triggered during an open() call, and the text file involved 
is not the file being opened but the binary of the process making the call.

Here is a simple program that reproduces it:
#include <stdio.h>

int main() {

  FILE *f = fopen ("/bar", "w");

  if (f != NULL)
    fclose(f);
  return 0;
}

Conditions to reproduce it:
 - The resulting binary must be executed from the NFS mount.
 - The binary must be built after mounting the NFS share.

I haven't tried building it on a different host, as I don't have access to 
multiple RPis. Also, if I build the binary, unmount and remount the NFS mount 
point containing it, and then execute it, it works.


I have also tried this with the raspbsd.org image and could reproduce it 
there as well.


Another interesting detail: when I first booted the RPi, the NFS 
server was running 10.2-STABLE, and it was later updated to 11-STABLE. 
While it was still on 10.2, I built some ports, and I don't remember 
having this issue.


So, could someone please help me figure this out and fix it? This 
should simply work.





I have no idea what's causing it; it should pretty much work out of 
the box. Could someone please explain to me what's going on here, what's 
causing it, and how I can fix it?


Best regards,
-czg



Re: smp_rendezvous_action: Are atomics correctly used ?

2017-03-09 Thread Konstantin Belousov
On Thu, Mar 09, 2017 at 10:59:27AM +0100, Alexandre Martins wrote:
> I have the same question for the cpu_ipi_pending here:
> 
> https://svnweb.freebsd.org/base/head/sys/x86/x86/mp_x86.c?view=annotate#l1080
> 
> On Thursday 9 March 2017, 10:43:14, Alexandre Martins wrote:
> > Hello,
> > 
> > I'm currently reading the code of the function smp_rendezvous_action in
> > the kern/subr_smp.c file. In that function, I see that the variable
> > smp_rv_waiters is read in some while() loops in a non-atomic way.
> > 
> > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l412
> > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l458
> > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l472
> > 
> > I suspect one of my freezes is due to that.
You should provide either evidence or, at least, some reasoning supporting
your claims.

> > 
> > Should this function be patched to use
> > "atomic_load_acq_int(_rv_waiters[])" ?
There too.

As a side note, any read or write of a naturally aligned integer type
with a size less than or equal to the machine word is atomic on all
supported architectures.  The meaning of the word atomic here is
that when reading, you always get a complete value that was written by
a writer into this location, not some out-of-thin-air value.  Similarly,
when writing, you are guaranteed that any observer of the write will see
the value you have written.

The guarantees above hold both for C-level code and for the assembler
accesses.

atomic_load_acq() provides additional guarantees which do not affect the
value read from the variable itself, but establish the ordering on the
visibility of the related operations.
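The last paragraph can be illustrated with the classic message-passing pattern, sketched here in C11 under the assumption that FreeBSD's atomic_load_acq_int() and atomic_store_rel_int() behave like C11 acquire loads and release stores (see atomic(9)). The acquire load returns the same flag value a relaxed load would; what it adds is the guarantee that the payload written before the release store is visible once the flag is seen. The names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* ordinary data, published via the flag */
static atomic_int ready;   /* the flag */

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release store: everything written before it becomes visible to a
     * thread whose acquire load observes the new flag value. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Spin on the flag with an acquire load (cf. atomic_load_acq_int()) and
 * return the payload observed afterwards. */
static int read_after_flag(void)
{
    pthread_t t;

    payload = 0;
    atomic_store(&ready, 0);
    int rc = pthread_create(&t, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    int seen = payload;    /* ordered after the acquire load */
    pthread_join(t, NULL);
    return seen;
}
```

If the spin used memory_order_relaxed instead, the loop would still terminate, but the later read of payload would no longer be ordered after the writer's store (in C11 terms it would be a data race).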


RE: input/output error @boot

2017-03-09 Thread Dexuan Cui
> From: owner-freebsd-curr...@freebsd.org [mailto:owner-freebsd-
> curr...@freebsd.org] On Behalf Of Toomas Soome
>
> IMO there are multiple issues around this problem and workaround.
>
> First of all, to control UEFI memory allocation, AllocatePages() has these options:
>
> AllocateAnyPages,
> AllocateMaxAddress,
> AllocateAddress
>
> On x86, we use:
>
> staging = 1024*1024*1024;
> status = BS->AllocatePages(AllocateMaxAddress, EfiLoaderData,
> nr_pages, );
>
> Which means:
>
> "Allocation requests of Type AllocateMaxAddress allocate any available range of
> pages whose uppermost address is less than or equal to the address pointed to
> by Memory on input."
>
> So, we are asking for an amount of memory (64MB), with the condition that all
> the pages should be below 1GB.
>
> And we get it. If Hyper-V is in fact returning memory from an already occupied
> area, there can be exactly one conclusion: it is a bug in Hyper-V.

Hyper-V has no bug here: Hyper-V doesn't return memory from an already occupied
area. The issue is that the loader tries to write the 64MB staging area (BTW,
it's 48MB in 10.3) into the physical memory range [2MB, 2MB+64MB), assuming
this range is writable. However, this is not true with the Hyper-V EFI
firmware: there is a read-only BootServicesData memory block starting at
about 47.449MB, which causes a crash in the loader.

If you're interested, the whole long story is at the link below.  :-)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211746, e.g. please see the
screenshot in comment #8.
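The overlap described above can be checked with simple arithmetic, using only the numbers from this message: a 64MB staging area at 2MB spans [2MB, 66MB), and a read-only block starting at about 47.449MB falls inside it, whereas the area would have to stay under roughly 45.4MB to end below that block. The helpers are hypothetical, and since the block's length is not given here, only its start address is tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)

/* True if addr lies in the half-open range [base, base + size). */
static bool in_range(uint64_t addr, uint64_t base, uint64_t size)
{
    return addr >= base && addr - base < size;
}

/* Largest size a region starting at base can have while still ending
 * at or below addr; 0 if addr is not above base. */
static uint64_t max_size_below(uint64_t base, uint64_t addr)
{
    return addr > base ? addr - base : 0;
}
```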


> Note that this allocation method does *not* set the starting point for the
> allocation; it can return *any* chunk of memory of the given size below 1GB.
Yes. This can potentially cause new issues...
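As a sketch of the point about AllocateMaxAddress: the type only bounds the uppermost address of the returned range, so very different placements are equally valid. The helper below is hypothetical and assumes the UEFI page size of 4 KiB; it converts the 64MB request into pages and checks candidate start addresses against the 1GB limit from the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_EFI_PAGE_SIZE UINT64_C(4096)   /* UEFI page size (4 KiB) */

/* Pages needed to cover size bytes, rounded up. */
static uint64_t pages_for(uint64_t size)
{
    return (size + TOY_EFI_PAGE_SIZE - 1) / TOY_EFI_PAGE_SIZE;
}

/* AllocateMaxAddress only requires the uppermost address of the returned
 * range to be <= max_addr; it does not fix where the range starts. */
static bool satisfies_max_address(uint64_t start, uint64_t nr_pages,
    uint64_t max_addr)
{
    assert(nr_pages > 0);
    uint64_t last = start + nr_pages * TOY_EFI_PAGE_SIZE - 1;
    return last <= max_addr;
}
```

With a 1GB limit, a 16384-page range at 2MB and one at 900MB both qualify, so nothing in the request itself keeps the firmware's choice near 2MB.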

> So the attempt to control such an allocation by size is unfortunately flawed;
> it really does not control the allocation.
Yes, you're correct.
The patch is flawed. I only expect (or hope) that it can work around the issues
with typical Hyper-V UEFI firmware.
In my tests, it works with Hyper-V 2012 R2 and 2016.
I hope it will work with future Hyper-V versions too...

> Note that I have also seen AllocateAddress failures: there was a nicely
> available chunk of memory, but the firmware just did not allocate at the given
> address (this happened with OVMF + qemu).
>
> The second flaw is also in the firmware. Sure, with UEFI you can have
> “random” allocations, and actual control over memory is a real problem, but
> planting an “egg” in the 1MB-1GB range, where any OS is most likely to
> live, is IMO just stupid.
>
> The only real solution here is to either raise the MaxAddress limit or use
> AllocateAnyPages, get the kernel loaded into memory, and, after switching off
> the boot services and before jumping to the kernel, relocate the kernel to an
> available location below 1GB…
Yes. IMO the biggest issue is that the kernel currently can't be relocated... :-(
Making it relocatable is long-term work, I'm afraid.

Thanks,
-- Dexuan

RE: input/output error @boot

2017-03-09 Thread Dexuan Cui
> From: owner-freebsd-curr...@freebsd.org [mailto:owner-freebsd-
> curr...@freebsd.org] On Behalf Of Pete Wright
> Sent: Thursday, March 9, 2017 14:04
> To: freebsd-current@freebsd.org
> Subject: Re: input/output error @boot
> On 3/8/17 10:00 PM, Dexuan Cui wrote:
> > For now, I suggest we should only apply the idea "reduce the size of the
> > staging area if necessary" to VM running on Hyper-V, we should restore the
> > old behavior on physical machines since that has been working for people
> > for a long period of time, though it's potentially unsafe.
> >
> +1
> 
> i'd like to see the old behaviour for physical machines to be restored
> as well since this has rendered my drm-next test rig broken :(
> 
> -pete

I eventually committed r314956 to address the issue:
https://svnweb.freebsd.org/base?view=revision=314956
The old behaviour for physical machines is restored.

PS: I understand that I should usually put a patch on Phabricator for review
before committing it, but since this issue is critical, I committed it
directly to unblock people first. Sorry.
Please comment on the patch if you think it needs rework; I hope not. :-)

Thanks,
-- Dexuan


process killed: text file modification

2017-03-09 Thread Gergely Czuczy

Hello,

I'm trying to build a few things from ports on an rpi3; the ports 
collection is mounted over NFS from another machine. When it tries 
to build pkg, I get the following error message in syslog:


rpi3 kernel: pid 4451 (sh), uid 0, was killed: text file modification

The report to pkg@:
https://lists.freebsd.org/pipermail/freebsd-pkg/2017-March/002048.html

In ports-mgmt/pkg's config.log, it fails at the following entry:
configure:3726: checking whether we are cross compiling
configure:3734: cc -o conftest -O2 -pipe  -Wno-error 
-fno-strict-aliasing   conftest.c  >&5

configure:3738: $? = 0
configure:3745: ./conftest
configure:3749: $? = 137
configure:3756: error: in `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.0':
configure:3760: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details

# uname -a
FreeBSD rpi3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314949: Thu Mar 9 
08:58:46 CET 2017 
ae...@marvin.harmless.hu:/tank/rpi3/crochet/work/obj/arm64.aarch64/tank/rpi3/src/sys/AEGIR 
arm64


I have no idea what's causing it; it should pretty much work out of the 
box. Could someone please explain to me what's going on here, what's 
causing it, and how I can fix it?


Best regards,
-czg



Re: smp_rendezvous_action: Are atomics correctly used ?

2017-03-09 Thread Alexandre Martins
I have the same question for the cpu_ipi_pending here:

https://svnweb.freebsd.org/base/head/sys/x86/x86/mp_x86.c?view=annotate#l1080

On Thursday 9 March 2017, 10:43:14, Alexandre Martins wrote:
> Hello,
> 
> I'm currently reading the code of the function smp_rendezvous_action in
> the kern/subr_smp.c file. In that function, I see that the variable
> smp_rv_waiters is read in some while() loops in a non-atomic way.
> 
> https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l412
> https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l458
> https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l472
> 
> I suspect one of my freezes is due to that.
> 
> Should this function be patched to use
> "atomic_load_acq_int(_rv_waiters[])" ?
> 
> Best regards

-- 
Alexandre Martins
STORMSHIELD





smp_rendezvous_action: Are atomics correctly used ?

2017-03-09 Thread Alexandre Martins
Hello,

I'm currently reading the code of the function smp_rendezvous_action in the
kern/subr_smp.c file. In that function, I see that the variable smp_rv_waiters
is read in some while() loops in a non-atomic way.

https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l412
https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l458
https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?view=annotate#l472

I suspect one of my freezes is due to that.

Should this function be patched to use 
"atomic_load_acq_int(_rv_waiters[])" ?
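For readers following the question, here is a hedged user-space analogue of the counter barrier in smp_rendezvous_action(), written with C11 atomics and POSIX threads; it is not the kernel code, and the names are hypothetical. Each participant atomically increments a shared counter and then spins reading it until every participant has arrived. In C11 a plain non-atomic read of a concurrently modified int would be a data race, so the spin uses a relaxed atomic load, the closest C11 counterpart of the plain read in the loop. It always returns a whole value, which is enough for the loop to terminate, but it imposes no ordering on other memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NPARTICIPANTS 4

/* Hypothetical analogue of one smp_rv_waiters[] slot. */
static atomic_int waiters;
static atomic_int passed;

static void *participant(void *arg)
{
    (void)arg;
    /* Announce arrival with an atomic read-modify-write... */
    atomic_fetch_add(&waiters, 1);
    /* ...then spin with relaxed loads until everyone has arrived. */
    while (atomic_load_explicit(&waiters, memory_order_relaxed) < NPARTICIPANTS)
        ;
    atomic_fetch_add(&passed, 1);
    return NULL;
}

/* Run all participants through the barrier; return how many got past it. */
static int run_barrier(void)
{
    pthread_t t[NPARTICIPANTS];

    atomic_store(&waiters, 0);
    atomic_store(&passed, 0);
    for (int i = 0; i < NPARTICIPANTS; i++) {
        int rc = pthread_create(&t[i], NULL, participant, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NPARTICIPANTS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&passed);
}
```

Whether the spin also needs acquire semantics depends on whether the code after the loop reads data that other CPUs wrote before incrementing the counter; the counter value itself is read correctly either way.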

Best regards

-- 
Alexandre Martins
STORMSHIELD


