Re: kmem API to allocate arrays

2017-07-30 Thread Kamil Rytarowski
On 30.07.2017 15:45, Taylor R Campbell wrote:
>> Date: Sun, 30 Jul 2017 10:22:11 +0200
>> From: Kamil Rytarowski 
>>
>> I think we should go for kmem_reallocarr(). It has been designed to
>> handle overflows like reallocarray(3), with the option of resizing a
>> table from 1 to N elements and back from N to 0, including freeing.
> 
> Initially I was reluctant to do that because (a) we don't even have a
> kmem_realloc, perhaps for some particular reason, and (b) it requires
> an extra parameter for the old size.  But I don't know any particular
> reason in (a), and perhaps (b) is not so bad after all.  Here's a draft:
> 
> int
> kmem_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt, int flags)
> {
>     void *optr, *nptr;
> 
>     KASSERT(size != 0);
>     if (__predict_false((size|ncnt) >= SQRT_SIZE_MAX &&
>         ncnt > SIZE_MAX/size))
>         return ENOMEM;
> 
>     memcpy(&optr, ptrp, sizeof(void *));
>     KASSERT((ocnt == 0) == (optr == NULL));
>     if (ncnt == 0) {
>         nptr = NULL;
>     } else {
>         nptr = kmem_alloc(size*ncnt, flags);
>         KASSERT(nptr != NULL || flags == KM_NOSLEEP);
>         if (nptr == NULL)
>             return ENOMEM;
>     }
>     KASSERT((ncnt == 0) == (nptr == NULL));
>     /* Copy only when both counts are nonzero (logical, not bitwise, AND). */
>     if (ocnt != 0 && ncnt != 0)
>         memcpy(nptr, optr, size*MIN(ocnt, ncnt));
>     if (ocnt != 0)
>         kmem_free(optr, size*ocnt);
>     memcpy(ptrp, &nptr, sizeof(void *));
> 
>     return 0;
> }
> 

I would allow size to be 0, like with the original reallocarr(3). It
might be less pretty, but more compatible with the original model and
less vulnerable to accidental panics for no good reason.
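
To make the size == 0 suggestion concrete, here is a minimal userland sketch of the draft's semantics. It is only a model under stated assumptions: the name model_reallocarr is hypothetical, malloc(3)/free(3) stand in for kmem_alloc/kmem_free, and there is no flags argument. A zero element size is treated as an empty array rather than asserted against.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userland model of the draft: resize *ptrp from ocnt to ncnt
 * elements of `size` bytes each.  size == 0 is accepted and yields an empty
 * (NULL) array.  Returns 0 on success or ENOMEM; on failure *ptrp is left
 * untouched.
 */
static int
model_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt)
{
	void *optr, *nptr = NULL;
	size_t obytes, nbytes;

	assert(ptrp != NULL);

	/* Reject ncnt * size if the product would not fit in size_t. */
	if (size != 0 && ncnt > SIZE_MAX / size)
		return ENOMEM;

	memcpy(&optr, ptrp, sizeof(optr));
	obytes = size * ocnt;	/* assumed valid: it was allocated earlier */
	nbytes = size * ncnt;

	if (nbytes != 0) {
		nptr = malloc(nbytes);
		if (nptr == NULL)
			return ENOMEM;
		/* Preserve the common prefix of the old and new arrays. */
		if (obytes != 0)
			memcpy(nptr, optr, obytes < nbytes ? obytes : nbytes);
	}
	free(optr);		/* free(NULL) is a no-op */
	memcpy(ptrp, &nptr, sizeof(nptr));
	return 0;
}
```

A caller would pass the address of its array pointer, e.g. model_reallocarr(&tbl, sizeof(*tbl), oldn, newn), and shrink to zero elements to free the table.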



signature.asc
Description: OpenPGP digital signature


Re: kmem API to allocate arrays

2017-07-30 Thread Kamil Rytarowski
On 29.07.2017 16:19, Taylor R Campbell wrote:
> It's stupid that we have to litter drivers with
> 
>     if (SIZE_MAX/sizeof(struct xyz_cookie) < iocmd->ncookies) {
>         error = EINVAL;
>         goto out;
>     }
>     cookies = kmem_alloc(iocmd->ncookies*sizeof(struct xyz_cookie),
>         KM_SLEEP);
>     ...
> 
> and as you can tell from some recent commits, it hasn't been done
> everywhere.  It's been a consistent source of problems in the past.
> 
> This multiplication overflow check, which is all that most drivers do,
> also doesn't limit the amount of wired kmem that userland can request,
> and there's no way for kmem to say `sorry, I can't satisfy this
> request: it's too large' other than to panic or wedge.
> 
> In userland we now have reallocarr(3).  I propose that we add
> something to the kernel, but I'm not immediately sure what it should
> look like because kernel is a little different.  Solaris/illumos
> doesn't seem to have anything we could obviously parrot, from a
> cursory examination.
> 
> We could add something like
> 
>     void *kmem_allocarray(size_t n, size_t size, int flags);
>     void kmem_freearray(size_t n, size_t size);
> 
> That wouldn't address bounding the amount of wired kmem userland can
> request.  Perhaps that's OK: perhaps it's enough to have drivers put
> limits on the number of (say) array elements at the call site,
> although then there's not as much advantage to having a new API.
> Instead, we could make it
> 
>     void *kmem_allocarray(size_t n, size_t size, size_t maxn,
>         int flags);
> or
>     void *kmem_allocarray(size_t n, size_t size, size_t maxbytes,
>         int flags);
> 
> It's not clear that the call site is exactly the right place to
> compute a bound on the number of bytes a user can allocate.  On the
> other hand, if it's not clear up front what the bound is, then that
> makes a foot-oriented panic gun, or an instawedge, if the kernel
> decides at run-time how many bytes is more than it can possibly ever
> satisfy, which is not so great either.  If you specify up front in the
> source, at least you can say by examination of the source whether it
> has a chance of working or not on some particular platform.  And maybe
> we can make it easier to write an expression for `no more than 10% of
> the machine's current RAM' or something.
> 
> Either way, kmem_allocarray would have to have the option of returning
> NULL, unlike kmem_alloc(..., KM_SLEEP), which is a nontrivial change
> to the contract now that chuq@ recently dove in deep to make sure it
> never returns NULL.
> 
> Thoughts?
> 

I think we should go for kmem_reallocarr(). It has been designed to
handle overflows like reallocarray(3), with the option of resizing a
table from 1 to N elements and back from N to 0, including freeing.
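
For comparison, the bounded variant quoted above (the one taking maxn) could look roughly like this in userland. This is a sketch, not a proposed implementation: bounded_allocarray is a hypothetical name, malloc(3) stands in for kmem_alloc, and returning NULL for n == 0 is my assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userland analogue of the proposed
 * kmem_allocarray(n, size, maxn, flags): refuse the request, by returning
 * NULL, when n exceeds the caller's bound or when n * size would overflow.
 */
static void *
bounded_allocarray(size_t n, size_t size, size_t maxn)
{
	assert(size != 0);

	/* Caller-supplied bound on the element count (assumption: 0 is refused). */
	if (n == 0 || n > maxn)
		return NULL;

	/* Multiplication overflow check, same idea as the driver idiom. */
	if (n > SIZE_MAX / size)
		return NULL;

	return malloc(n * size);
}
```

The point of the extra argument is that the limit is visible at the call site, so a reader can tell from the source how much memory a user-controlled count can ever request.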





Re: rwlock

2017-06-28 Thread Kamil Rytarowski
On 28.06.2017 19:05, co...@sdf.org wrote:
> Hi,
> does rwlock RW_WRITER prevent new readers?
> 
> thanks
> 

I recommend Solaris Internals on this topic. I'm not sure offhand how the
mechanism works in NetBSD, but it should be close to it.





Re: Questions of lock usage in NetBSD kernel code

2017-06-25 Thread Kamil Rytarowski
On 25.06.2017 20:34, Taylor R Campbell wrote:
> Sleeping with a spin lock held is absolutely prohibited and does not
> work.
> 

An example of this abuse is described here:

https://mail-index.netbsd.org/current-users/2014/07/19/msg025295.html

Jia-Ju, can you detect bugs like this one?





Re: kernel aslr: someone interested?

2017-06-17 Thread Kamil Rytarowski
On 17.06.2017 12:25, Maxime Villard wrote:
> Le 23/03/2017 à 18:30, Maxime Villard a écrit :
>> I have some plans to implement kernel aslr on amd64. Actually, a few
>> months
>> ago I wrote set of patches for the bootloader and the kernel, and also a
>> complete kernel relocator. As far as I can test, everything works
>> correctly
>> and reliably; the whole implementation can relocate and jump into a
>> PIE binary
>> in kernel mode with a proper page tree.
>>
>> But the thing is, I don't quite see how to have the kernel itself
>> compiled as
>> PIE. My attempts so far have been unfruitful, so I thought I could ask
>> here.
>> Ideally, we would have a kernel that has the same binary layout as our
>> kernel
>> modules.
>>
>> Is there someone interested in working on that? This is a toolchain
>> work, but
>> I don't know that stuff.
> 
> This still stands; beyond aslr, there are several new features that we
> could implement - such as live kernel patching - and they imply being able
> to build a PIE kernel in the first place.
> 
> Perhaps add the "Toolchain: Build kernel as PIE" idea in the projects list?

I noted that kernel ASLR is treated as an industry standard now. Fuchsia
(Magenta) implemented it from the get-go and enables it when possible.

I have dreams of getting sanitizers (asan, msan, ubsan, ...) into the kernel
at some point. It should significantly reduce the time required to shake
out bugs from the kernel. However, first I need to get these debugging
facilities to a usable point in userland.





Re: panic(9) with insufficient RAM or disk space

2017-06-09 Thread Kamil Rytarowski
On 09.06.2017 13:08, Kamil Rytarowski wrote:
> On 09.06.2017 13:07, Martin Husemann wrote:
>> One of the panics sounds like PR 52110.
>>
>> Martin
>>
> 
> This looks similar; I have another kernel core dump with exactly the
> same backtrace. I observed it a few times on my amd64 laptop with 3GB of RAM.
> 
> Crash version 7.99.59, image version 7.99.66.
> WARNING: versions differ, you may not be able to examine this image.
> System panicked: kernel diagnostic assertion "uvmexp.swpgonly + npages
> <= uvmexp.swpginuse" failed: file "/usr/src/sys/uvm/uvm_pager.c", line 472
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> ?() at fe809f632740
> vpanic() at vpanic+0x149
> ch_voltag_convert_in() at ch_voltag_convert_in
> uvm_aio_aiodone_pages() at uvm_aio_aiodone_pages+0x541
> uvm_aio_aiodone() at uvm_aio_aiodone+0x97
> workqueue_worker() at workqueue_worker+0xbc
> 

I've faced the panic again.

panic: kernel diagnostic assertion "(so2->so_options &
SO_ACCEPTCWONANR)N I=N=G :0  S|P|L  sNoO2T- >LsOoW_ElRoEckD  =O= Nu
iSpYcS_ClAoLckL"  f2WaW5AiAl ReRNd
2:N I fINEiNlXGGeI:: T"   /SSuP1sPLL9r /  Ns7NrO
cOT/T  sLyLsOO/WkWWEeAERrRRnEE/NDDIu  iNOOpGNcN :_ u SSsSYrPYSrCLSeAqC
L.AcLNL"OL T,1l12Li nO52eW65 6E 1 E1RE8XX4E II
TDT   cO0fpf Nu 4720T
:6 8RBA2eP0g i nE7 X
trIaTcW eA6b Ra0Nc
IkN.GW.A:.R
SNIPWLN AGN:RO NTSIP NLLG O:NW OESTRP EL LDN OOOWT NEL ROSWEEYDR SEODC
NAO LNT LR TRA1AP P  2EE5XXI6IT vT E Xp66a I 0nT
i0 c
(1)  7aW
tA RNING: SPnLe tNbOsTd :LOvpWWAWaEnARRRiENNcIIDN+N G0GO::x N1S 4 SPS0LYP
 NLSOC TA NLLOLOT W 1ELR EO2DW 5EO6RN E EDTX RIOATNP   fSEfYX9SI6CTaA
eL60L 0 00
 70
 EWXAIcTRh N_fIvfNo9Gl1:ta acSgdP_0Lc  o8NWnA
OvReNT rINWLtGAO_:RWi NnESI(PRL)NE G ND:a Ot TO  SLNPO LWS EYNRSOECTnDA
Le LOtLON Wb 1ETs1RdRA6E:P D c4 hE_X OIvENT oX 6lIS t0TYa
 gSf_CfcA9oL6nLav ee00r 00t  _E6iX
nI
T ffW9A1RaN1I3N0G :8
SPL NWOATR NWLIOANWGRE:RNE IDNSu OGnPNL:p  _ TNRcOSAPoTP En XLnLI OeNTW
cOE6tRT (0E
)DLO  WaOEtNR  ESDY SOCNA LnSLYe St5Cb1As5LWd AL:1RN u2IN0n G: pE X0S_I
PTLc E 2XoN0nIOTn3T   Le6fOc
WtfE+R9E01DWx63 6AO0bRN8 N0
TI NR8A
GP: E XWSIPTA LR6 N NI0ON
TG :L OSWPELR ENDO TO NLd OoSW_YEsSRyCEsAD_L cLOo Nn3 n0WSAe YRcN2StI
CN(EGX)A:I LT LS aP Lt11 N  Oe2T  57L6
O WEnEXIeREWTtDA b 1RsO NdN7I:N d
GTo:RW_ SAAsRPPyLNs I _NEcNXGoO:In TnTS e LPc6Ot LW+ 00EN
OxRT8E eDL
 OOWNE RSEYDS COANL LS Y2S CA2L56L  3E9X I6T  E0Xs Iy7Ts
 _6c2odn4ne50cW0Wt(A0A) RR N6NaI
ItNN GG::W  ASSPRPnLLN eI NNtNbGOOs:TdT  :S sLLPyLOO WsWNE_EORcRTEoE
DnDLn  OOeOcNWN tE +TRT0REARxDP4A  9OPEN
 XE XISITYT S 6C6 A 0L0
L
 1WW0AA2RNR4I NNG4I: N SEGP:XL  INSOTTP  LfLO fWN4sEyO0REs6TD  c8LO2NaO0
WSl EYlR7S
(EC)ADL  La OWt 1ANR  2NSYI5S6NC AGELX:IL nT  Se01P t 7Lb
2s 5d6N:O TsE yXLsIOcTWa ElfRlfE+9D06 xaO1eNd0 80S
 Y7S-
C-A-L WLs Ay1Rs Nc2Ia5Nl6Gl : E (XSnIPuTL m 1bN eO7rT
  9WLWA8ORWA)EN RRI-NENG-DI:-  NS
OGP:LN  N SOSPTY LSL CONAWOLETL7R E 31DLe 3OdONW18 dE Te4RR3A eEPE DX1E
7IXOITNaT  : Sf
Y6f S90c
C6ApauLe2L0: 0 3 E87n
d6  tEWrXAaIRcTNe Ib6Na2Gcd:k4 .5S.0P.W0LA
0 R NN6ION
TG :L SOPWL ENROETD  LOOWENR EDS OYNS CATLRLA P1  25E6 E
XIXdTIum Tpi6 n 0 g07

to dev 0,1 (offset=8, size=7864191):
dump failed: insufficient space (4097249 < 8067552)


It looks like KASSERT(9) in sys/kern/uipc_usrreq.c:unp_connect() here:

   1174     solock(so);
   1175     unp_resetlock(so);
   1176     mutex_exit(vp->v_interlock);
   1177     if ((so->so_proto->pr_flags & PR_CONNREQUIRED) != 0) {
   1178         /*
   1179          * This may seem somewhat fragile but is OK: if we can
   1180          * see SO_ACCEPTCONN set on the endpoint, then it must
   1181          * be locked by the domain-wide uipc_lock.
   1182          */
   1183         KASSERT((so2->so_options & SO_ACCEPTCONN) == 0 ||
   1184             so2->so_lock == uipc_lock);
   1185         if ((so2->so_options & SO_ACCEPTCONN) == 0 ||
   1186             (so3 = sonewconn(so2, false)) == NULL) {
   1187             error = ECONNREFUSED;
   1188             sounlock(so);
   1189             goto bad;
   1190         }
   1191         unp2 = sotounpcb(so2);
   1192         unp3 = sotounpcb(so3);
   1193         if (unp2->unp_addr) {
   1194             unp3->unp_addr = malloc(unp2->unp_addrlen,
   1195                 M_SONAME, M_WAITOK);
   1196             memcpy(unp3->unp_addr, unp2->unp_addr,
   1197                 unp2->unp_addrlen);
   1198             unp3->unp_addrlen = unp2->unp_addrlen;
   1199         }
   1200         unp3->unp_flags = unp2->unp_flags;
   1201         so2 = so3;
   1202     }

https://nxr.netbsd.o

Re: panic(9) with insufficient RAM or disk space

2017-06-09 Thread Kamil Rytarowski
On 09.06.2017 13:07, Martin Husemann wrote:
> One of the panics sounds like PR 52110.
> 
> Martin
> 

This looks similar; I have another kernel core dump with exactly the
same backtrace. I observed it a few times on my amd64 laptop with 3GB of RAM.

Crash version 7.99.59, image version 7.99.66.
WARNING: versions differ, you may not be able to examine this image.
System panicked: kernel diagnostic assertion "uvmexp.swpgonly + npages
<= uvmexp.swpginuse" failed: file "/usr/src/sys/uvm/uvm_pager.c", line 472
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
?() at fe809f632740
vpanic() at vpanic+0x149
ch_voltag_convert_in() at ch_voltag_convert_in
uvm_aio_aiodone_pages() at uvm_aio_aiodone_pages+0x541
uvm_aio_aiodone() at uvm_aio_aiodone+0x97
workqueue_worker() at workqueue_worker+0xbc





panic(9) with insufficient RAM or disk space

2017-06-09 Thread Kamil Rytarowski
I observe panics on busy machines that are short of RAM and/or disk space.

Disk space (?) panic:

uid 1000, pid 1975, command irc-20151120, on /: file system full
panic: kernel diagnostic assertiWonA "(so2->so_oRpNtiIoNGns:  &S PSLO
_NAOCTC ELPTOCWOERNEND)  =O=N  0T R|A|P  sEoX2I-T >6s o0_
loWAcRkN I=N=G :u iSpPc_L lNoOcTk "L OfaWiElReEd:D  fOiN lSeY S"C/AuLsLr
0/ s0r cE/XsIyTs /fkfe7r6n61/7u0i p8c
_usrreWq.AcR"NI,NG :l iSnPeL  1N1O8T4  L
OWERED ON SYSCALL 0 0 EXIT ff7c6p7ud2f:0  B8e
ginW AtRrNaIcNGe:b SaPcLk .N.O.T
 LOWERED ON SYSCALL 0 0 EXIT ff767df0 8
WARNING: SPL NOT LOWERED ON TRAP EXIT 6 0
WARNING: SPL NOT LOWERED ON SYSCALL 0 0 EXIT ff769530 8
WARNING: SPL NOT LOWERED ON TRAP EXIT 6 0
WARNING:v pSaPnLic ()N OT atLO WERED nONe tbSsYd:SvCpAaLnLi c0 0 EXIT
ff764ed0 8+
0x140
WARNING: SPL NOT LOWERED ON TRAP cEhX_IvoT lt6a g0_
convert_inW()A RaNtI NG: SPL NnOTe tLbOsWdE:RcEhD_ voOlN taSgY_ScCoALnLv
e0r 0t_ iEnX
IT ff764ed0 8
unp_connect() at WAnRNeItNGb:s dS:PuLn pN_OcTo nLnOeWcEtR+0ExD3 0O8N
 TRAP EXIT 6 0
WARNING: dSPoL _NsOyTs _LcOoWEnREnDe cOtN( )T RaAtP  EXIT 6 n0e
tbsd:do_sys_WcAoRNnInNeGc:t +0SxP8L eN
OT LOWERED ON TRAP EXIT 6 0
sys_connWecAtR(NI) NaGt:  SPL NOnT eLtObWsEdR:EsDy sO_Nc oTnRnAePc
tE+X0IxT4 96
 0
WARNING: SPL NOT LOWEREsD ysOcNa lTlR(A)P  aEtX IT 6 0
netbsd:syWscAaRlNlI+N0Gx:1 dS8P
L -N-O-T  sLyOsWcEaRlElD  (OnNu mTbReArP  E9X8IT)  6- -0-

WWAARRNNIINNGG:: 7Sc PLcf S5NP4LO T4 NO3LTeO W1L7EOaRWEEDR EODN :
 TRONcA PpT ERAXuIP2 T: E 6X EIn0Td
  6tra WceA0Rb
NaIcNk...
G: SPL NOT LOWERED ON TRAP EX
ITd 6u m0pi
ngf attoa lde va 0s,y1n c(horfofnsoeuts=8 ,s yssitzeem= 7t8r6a4p1 9i1n) :s
upedurmvpis or mode
trap type 3 code 0x2 rip 0x8021453c cs 0x8 rflags 0x10206 cr2
0x1201000 ilevel 0x8 rsp 0xfe8148b3abd8
curlwp 0xfe83ec75bb40 pid 21375.1 lowest kstack 0xfe8148b372c0
failed: insufficient space (4097249 < 9415259)

RAM shortage panic:

# crash -N ./netbsd.3 -M ./netbsd.3.core
Crash version 7.99.59, image version 7.99.66.
WARNING: versions differ, you may not be able to examine this image.
System panicked: kernel diagnostic assertion "uvmexp.swpgonly + npages
<= uvmexp.swpginuse" failed: file "/usr/src/sys/uvm/uvm_pager.c", line 472
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
?() at fe809f632740
vpanic() at vpanic+0x149
ch_voltag_convert_in() at ch_voltag_convert_in
uvm_aio_aiodone_pages() at uvm_aio_aiodone_pages+0x541
uvm_aio_aiodone() at uvm_aio_aiodone+0x97
workqueue_worker() at workqueue_worker+0xbc





Re: Replacement for cpu_exit(9) and cpu_swapout(9)

2017-05-30 Thread Kamil Rytarowski
On 30.05.2017 09:20, Abhinav Upadhyay wrote:
> On Tue, May 30, 2017 at 12:32 PM, Martin Husemann  wrote:
>> On Tue, May 30, 2017 at 12:23:36PM +0530, Abhinav Upadhyay wrote:
>>> I have tried to look through the CVS logs and searched on nxr.n.o but
>>> can't find if there was any replacements for these functions. Is it ok
>>> to remove them from intro(9)? Better than having dead
>>> references :)
>>
>> I think they just have been removed - MI code now handles all that is
>> necessary, so no need to have MD (i.e. cpu_*) functions.
> 
> Great, thanks for confirming it. I will remove those references from
> intro(9) then.
> 
> -
> Abhinav
> 

While there: cpu_lwp_fork(9) references a dead function, cpu_switch().
cpu_switch() has been refactored into cpu_switchto(9).





Re: Adding ruminit(4)

2017-05-23 Thread Kamil Rytarowski
On 24.05.2017 04:10, Pierre Pronchery wrote:
> Hi tech-kernel@,
> 
> as some of you may have noticed, I just added a USB ID to the rum(4)
> driver [1]. It is for a device called "Windy 31" from Synet Electronics,
> product name MW-P54SS [2] (yes it's old).
> 
> As it happens (and as documented in the manual page) it attaches first
> as a mass storage device, much like u3g(4) has to deal with:
> 
>> umass0 at uhub1 port 2 configuration 1 interface 0
>> umass0: Ralink product 0x2578, rev 2.00/0.01, addr 12
>> umass0: using SCSI over Bulk-Only
>> scsibus0 at umass0: 2 targets, 1 lun per target
>> cd0 at scsibus0 target 0 lun 0:  cdrom
>> removable
> 
> u3g(4) works around it with an intermediate u3ginit(4) driver. So I went
> ahead and added ruminit(4) [3]. It works right away now:
> 
>> ruminit0 at uhub1 port 1: Switching to Wireless mode
>> ruminit0: detached
>> ruminit0: at uhub1 port 1 (addr 11) disconnected
>> rum0 at uhub1 port 1
>> rum0: Ralink , rev 2.00/0.01, addr 11
>> rum0: MAC/BBP RT2573 (rev 0x2573a), RF RT2528, address 00:0e:2e:fb:d1:86
>> rum0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
>> rum0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps
>> 24Mbps 36Mbps 48Mbps 54Mbps
> 
> but I had to copy and paste code from u3g(4) and I don't like it :(
> 

I recall something similar is in sys/dev/usb/uhso.c:uhso_switch_mode().
Does it work for you?





Re: Introducing localcount(9)

2017-05-11 Thread Kamil Rytarowski
On 12.05.2017 00:09, Paul Goyette wrote:
> On Thu, 11 May 2017, Kamil Rytarowski wrote:
> 
>> On 11.05.2017 15:17, Taylor R Campbell wrote:
>>>> Date: Thu, 11 May 2017 20:35:19 +0800 (+08)
>>>> From: Paul Goyette 
>>>>
>>>> On Thu, 11 May 2017, Kengo NAKAHARA wrote:
>>>>
>>>>>(1) Why splsoftserial() is required instead of kpreempt_disable()?
>>>>>localcount_drain() uses xc_broadcast(0, ...), that is, it uses
>>>>>low priority xcall. Low priority xcall would be done by kthread
>>>>>context, so I think kpreempt_disable() would be sufficient to
>>>>>prevent localcount_drain() xcall running.
>>>>
>>>> I think you are correct.  Taylor, do you agree?
>>>
>>> Yes, I think this is fine.  I probably chose splsoftserial because I
>>> was thinking of pserialize(9).
>>>
>>
>> While there, locking.9 is begging to be updated for the new APIs.
> 
> Well, it looks to me like almost everything is listed there, except for
> psref(9) (and now, of course, localcount(9)).
> 
> It's interesting that you added info for pserialize(9) but did not add
> psref(9)!
> 

This file predates psref(9) and SMP-ification of the network stack by IIJ.

> Anyway, I will make a note to add a paragraph for localcount(9).
> 

Thanks!






Re: Introducing localcount(9)

2017-05-11 Thread Kamil Rytarowski
On 11.05.2017 15:17, Taylor R Campbell wrote:
>> Date: Thu, 11 May 2017 20:35:19 +0800 (+08)
>> From: Paul Goyette 
>>
>> On Thu, 11 May 2017, Kengo NAKAHARA wrote:
>>
>>>(1) Why splsoftserial() is required instead of kpreempt_disable()?
>>>localcount_drain() uses xc_broadcast(0, ...), that is, it uses
>>>low priority xcall. Low priority xcall would be done by kthread
>>>context, so I think kpreempt_disable() would be sufficient to
>>>prevent localcount_drain() xcall running.
>>
>> I think you are correct.  Taylor, do you agree?
> 
> Yes, I think this is fine.  I probably chose splsoftserial because I
> was thinking of pserialize(9).
> 

While there, locking.9 is begging to be updated for the new APIs.






Re: New diagnostic routine - mutex_ownable()

2017-04-30 Thread Kamil Rytarowski
On 01.05.2017 00:12, co...@sdf.org wrote:
> On Sun, Apr 30, 2017 at 08:49:04AM +0800, Paul Goyette wrote:
>> While working on getting the localcount(9) stuff whipped into shape, I ran
>> across a situation where it is desirable to ensure that the current
>> process/lwp does not already own a mutex.
>>
>> We cannot use !mutex_owned() since that doesn't return the desired result
>> for a spin mutex, so I'm proposing to add a new routine called
>> mutex_ownable().  This does nothing in normal kernels, but for LOCKDEBUG
>> kernels it does a mutex_enter() followed immediately by mutex_exit(). If the
>> current process already owns the mutex, the system will panic with a
>> "locking against myself" error; otherwise mutex_ownable() just returns 1,
>> enabling its use as
>>
>>  KASSERT(mutex_ownable(mtx));
>>
>> Diffs are attached (including man-page and sets-list updates).
>>
>> Comments?  Any reason why this cannot be committed?
> 
> I have an alternate proposal for the same purpose, but not much of a
> suggestion for yours.
> 
> I find it weird to have two names which kinda mean the same thing but
> also don't, and it's not immediate what the difference is, but not sure
> if I can come up with better names.
> 

I agree with this, it's very odd. Can we just make mutex_ownable() a
private assert in the localcount .c file, without pushing it to other
places?






Re: Keep local symbols of rump libraries

2017-04-10 Thread Kamil Rytarowski
On 10.04.2017 12:22, Ryota Ozaki wrote:
> Hi,
> 
> I'm using ATF tests running rump kernels (rump_server)
> for development and debugging. When a rump kernel
> panics, it dumps a core file from which we can obtain
> a backtrace by using gdb.
> 
> Unfortunately local symbols (i.e., static functions)
> in a backtrace are unresolved because they are stripped
> by the linker (and objdump). That makes debugging a bit
> difficult.
> 
> The patch introduces a compile option for rump kernels
> called RUMP_KEEPSYMBOLS to keep such symbols:
>   http://www.netbsd.org/~ozaki-r/rumplibs-keep-symbols.diff
> 
> The option is disabled by default to not increase
> the size of all rump libraries.
> 
> I'm not so familiar with the NetBSD build system and
> Makefiles, so the patch may be wrong or clumsy.
> 
> Any comments?
> 
> Thanks,
>   ozaki-r
> 

The following options are already available:
MKDEBUG
MKDEBUGLIB
MKKDEBUG

How about MKDEBUGRUMP or MKRUMPDEBUG?





Re: ELFOSABI_NETBSD

2017-04-07 Thread Kamil Rytarowski
On 07.04.2017 23:10, Christos Zoulas wrote:
> On Apr 7,  7:17pm, n...@gmx.com (Kamil Rytarowski) wrote:
> -- Subject: ELFOSABI_NETBSD
> 
> | Currently we set e_ident[EI_OSABI] to ELFOSABI_SYSV. This makes parsing
> | NetBSD core(5) files across systems a little bit less obvious. In LLDB
> | we are recognized as generic or unknown unix.
> | 
> | Function ELFNAMEEND(coredump):
> | 
> | 161 /* XXX Should be the OSABI/ABI version of the executable. */
> | 162 ehdr.e_ident[EI_OSABI] = ELFOSABI_SYSV;
> 
> It matches the OSAbi of the binaries:
> 
> $ file /bin/sleep
> /bin/sleep: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), 
> dynamically linked, interpreter /libexec/ld.elf_so, for NetBSD 7.99.59, not 
> stripped
> 
> You need to look at the notes...
> 
> $ file sleep.core 
> sleep.core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), NetBSD-style, 
> from 'sleep', pid=17264, uid=10080, gid=10080, nlwps=1, lwp=0 (signal 3/code 
> 32767)
> 
> You always need to look at the notes :-)
> 
> christos
> 

NetBSD isn't the only system to set ELFOSABI_SYSV (value 0), so it's the
way to go (nothing to be changed).

Thanks for the confirmation!
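
For reference, this is roughly what a consumer sees when it looks only at the header. The sketch below reads the EI_OSABI byte from an ELF identification block; the EX_-prefixed names and the function elf_osabi are hypothetical, while the numeric values (index 7, SYSV 0, NetBSD 2) are the ones from the ELF specification. It does not parse the notes, which is where NetBSD binaries and cores are actually identified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values from the ELF specification; the EX_ names are local to this sketch. */
#define EX_EI_NIDENT		16
#define EX_EI_OSABI		7
#define EX_ELFOSABI_SYSV	0
#define EX_ELFOSABI_NETBSD	2

/*
 * Return the EI_OSABI byte of an ELF image, or -1 if the buffer is too
 * short or does not start with the ELF magic.
 */
static int
elf_osabi(const unsigned char *buf, size_t len)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

	assert(buf != NULL);
	if (len < EX_EI_NIDENT || memcmp(buf, magic, sizeof(magic)) != 0)
		return -1;
	return buf[EX_EI_OSABI];
}
```

For a NetBSD core file written as described in this thread, this returns EX_ELFOSABI_SYSV, so a tool that wants to recognize NetBSD has to go on and read the notes.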





ELFOSABI_NETBSD

2017-04-07 Thread Kamil Rytarowski
What's the purpose of ELFOSABI_NETBSD on NetBSD?

Currently we set e_ident[EI_OSABI] to ELFOSABI_SYSV. This makes parsing
NetBSD core(5) files across systems a little bit less obvious. In LLDB
we are recognized as generic or unknown unix.

Function ELFNAMEEND(coredump):

161 /* XXX Should be the OSABI/ABI version of the executable. */
162 ehdr.e_ident[EI_OSABI] = ELFOSABI_SYSV;


 --- src/sys/kern/core_elf32.c





Re: Interested in ext3fs for GSoC

2017-03-29 Thread Kamil Rytarowski


> Sent: Wednesday, March 29, 2017 at 9:51 PM
> From: "Christos Zoulas" 
> To: tech-kern@netbsd.org
> Subject: Re: Interested in ext3fs for GSoC
>
> In article 
> ,
> Miles Fertel   wrote:
> >-=-=-=-=-=-
> >
> >Hello,
> >
> >I'm a computer science undergraduate at Harvard studying Operating Systems
> >and I'm interested in implementing ext3 for a Google Summer of Code
> >project.
> >
> >First off, the project page states that several years of ext3 GSoC projects
> >have failed. I'm interested in finding out why, and figuring out what I can
> >do to avoid their fate. The only documentation I could find was from the
> >2008 GSoC student that just vanished.
> >
> >Where can I find specific information about previous attempts at ext3fs in
> >NetBSD?
> >
> >I'm currently reading through the documentation in
> >http://netbsd-soc.sourceforge.net/projects/ext3/ but it is likely a bit out
> >of date.
> 
> That page is very much out of date. There was a successful ext{2,3,4} project
> last GSoC (2016) by Hrishikesh Goyal. Here's the very nice writeup about it:
> 
> http://hrishikeshgoyal.blogspot.com/
> 
> See at the bottom in "Future work to be done" for a possible project for
> this year.
> 
> Best,
> 
> christos
> 
> 


There was also some work done by Jaromir Dolecek:

http://mail-index.netbsd.org/tech-kern/2016/07/31/msg020946.html


Re: Prospective project for Summer of Code.

2017-03-21 Thread Kamil Rytarowski
On 21.03.2017 13:43, Joerg Sonnenberger wrote:
> On Tue, Mar 21, 2017 at 01:38:09PM +0100, Kamil Rytarowski wrote:
>> I planned to spend my time on kqueue(2)/kevent(2) and aio(3) after May
>> (full-time effort, but not GSoC). There is a list of bug reports around
>> these calls and a few missing features (I need SIGEV_KEVENT).
> 
> The point of the project is not to implement aio on top of kqueue, but
> to provide a real kernel implementation.
> 
> Joerg
> 

This part is clear: kevent(2)/kqueue(2) and aio(3) are two distinct
projects. There is just the option that I focus entirely on the
former and Raunaq on the latter during GSoC (and his master's thesis).
Raunaq then does not need to care about kevent(2)/kqueue(2) improvements,
and I can stop caring about implementing real aio(2) myself.





Re: Prospective project for Summer of Code.

2017-03-21 Thread Kamil Rytarowski
On 20.03.2017 23:43, Raunaq Kochar wrote:
> Hi,
> I am a Masters in Computer Science student at Stony Brook University
> looking to work as part of the Google Summer of Code on the Real
> Asynchronous I/O or Parallelize Page Queues projects.

Welcome!

I planned to spend my time on kqueue(2)/kevent(2) and aio(3) after May
(full-time effort, but not GSoC). There is a list of bug reports around
these calls and a few missing features (I need SIGEV_KEVENT).

My plan was to start with regression tests in the ATF framework,
synchronize the features with other BSDs, catch and analyze differences
in behavior, update the documentation, research the bugs (add tests in
the ATF framework, file PRs) and squash as many of them as possible.

My primary motivation is to reliably run VirtualBox on a NetBSD host.

If you plan to take the aio(3) part, I will happily focus on the
kqueue(2)/kevent(2) interface and at some point we will join with
SIGEV_KEVENT. Some of the mentioned bugs require rather diverse
environments to trigger anomalies, e.g. golang on evbarm.





Re: PCU vs. ptrace

2017-03-04 Thread Kamil Rytarowski
On 04.03.2017 18:25, Chuck Silvers wrote:
> in the case of PT_GETFPREGS or PT_SETFPREGS, the target thread will be
> stopped, but for PT_DUMPCORE the target thread supposedly may be running.
> it appears to me that the PCU save/discard code does the right thing even for
> a running thread, though the thread may change its PCU state again 
> immediately.
> this is ok for PT_DUMPCORE, since PT_DUMPCORE is documented as possibly
> producing an inconsistent image.  ... actually, I looked at the code some more
> and PT_DUMPCORE does require that the target be stopped.

Thank you for looking at it. The PT_DUMPCORE description has been fixed
in src/lib/libc/sys/ptrace.2 r.1.41

http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/sys/ptrace.2.diff?r1=1.40&r2=1.41&only_with_tag=MAIN&f=h

The new form is:

PT_DUMPCORE   Makes the process specified in the pid pid generate a core
  dump.  The addr argument should contain the name of the
  core file to be generated and the data argument should
  contain the length of the core filename.





Porting ptrace(2) software to NetBSD

2017-02-27 Thread Kamil Rytarowski
I've prepared slides to illustrate the concepts of porting ptrace(2)
software from Linux and other BSDs to NetBSD.

http://netbsd.org/~kamil/ptrace-netbsd/presentation.html

I used www/py-landslide to generate the HTML presentation, my input was
typed in markdown:

http://netbsd.org/~kamil/ptrace-netbsd/ptrace-netbsd.md


These slides are prepared for our internal usage.

License CC0 (Creative-Commons Public Domain) + the NetBSD flag copied
from Wikipedia.


Credit to spz@ for proof-reading!


Re: PAX mprotect and JIT

2017-02-26 Thread Kamil Rytarowski
On 26.02.2017 16:03, Joerg Sonnenberger wrote:
> On Sun, Feb 26, 2017 at 02:52:39PM +0100, Kamil Rytarowski wrote:
>> At first sight, the need to "reinvent" malloc(3) with this approach
>> looks difficult to understand.
> 
> The point here is to have strict segregation between code and non-code. It
> doesn't work perfectly due to the additional book keeping pointers, but
> pretty well.
> 
>> Can we have something like MAP_NOMPROTECT?  Something like it would be
>> used to mmap(2) RWX region:
>>
>> void *mapping = mmap(NULL, rounded_size, PROT_READ | PROT_WRITE |
>> PROT_EXEC, MAP_ANON | MAP_PRIVATE | MAP_NOMPROTECT, -1, 0);
>>
>> Are doubled mappings more secure than this?
> 
> Yes, they are. It means you have to at least guess the second location.
> 
> Joerg
> 

While I'm not judging the technical parts of the diffs, the
general idea looks reasonable.





Re: PAX mprotect and JIT

2017-02-26 Thread Kamil Rytarowski
On 26.02.2017 15:05, co...@sdf.org wrote:
> On Sun, Feb 26, 2017 at 02:52:39PM +0100, Kamil Rytarowski wrote:
>> Can we have something like MAP_NOMPROTECT?  Something like it would be
>> used to mmap(2) RWX region:
>>
>> void *mapping = mmap(NULL, rounded_size, PROT_READ | PROT_WRITE |
>> PROT_EXEC, MAP_ANON | MAP_PRIVATE | MAP_NOMPROTECT, -1, 0);
>>
>> Are doubled mappings more secure than this?
>>
> 
> what pax mprotect does is silently turn RWX mapping to RW.
> 

What's the [security] difference between fooling and disabling mprotect
for a memory region?

Is there room to add this nomprotect allocator to libutil(3) to make
it convenient to reuse outside of libffi?





Re: PAX mprotect and JIT

2017-02-26 Thread Kamil Rytarowski
On 25.02.2017 22:35, Joerg Sonnenberger wrote:
> Hi all,
> at the moment, PAX mprotect makes it very expensive to implement any
> form of JIT. It is not possible to switch a page from writeable to
> executable. It is not possible to use anonymous memory for JIT in
> multi-threaded applications as you can't have distinct writable and
> executable mappings. The only ways JIT can work right now is by either
> disabling PAX mprotect or creating a temporary file on disk. That's not
> only silly, but IMO actively harmful. Considering that some important
> components like libffi fall into this category, the answer can't be
> "disable PAX mprotect on bin/python*" and various other places.
> 
> I've attached three patches to this mail:
> (1) Implement a new flag for mremap to allow duplicating a mapping
> (M_REMAPDUP). This patch is functional by itself.
> (2) A hack to allow mprotect to switch between W and X, but still
> honoring W^X. This is a hack and needs to be carefully rethought,
> since I believe the way pax is currently implemented is wrong. Consider
> it a PoC.
> (3) A patch for devel/libffi to show how the first two parts can be
> implemented to obtain JIT. With this patch the libffi test suite passes
> with active PAX mprotect and ASLR.
> 
> I find the availability of two separate mappings quite an acceptable
> compromise. It would allow us to easily improve security for most
> binaries by mapping the GOT read-only as far as possible by keeping the
> write mapping separate. But that's a separate topic.
> 
> Joerg
> 

Thank you for working on it.

At first sight, it is difficult to understand the need to "reinvent"
malloc(3) with this approach.

Can we have something like MAP_NOMPROTECT?  Something like it would be
used to mmap(2) RWX region:

void *mapping = mmap(NULL, rounded_size, PROT_READ | PROT_WRITE |
PROT_EXEC, MAP_ANON | MAP_PRIVATE | MAP_NOMPROTECT, -1, 0);

Are doubled mappings more secure than this?
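
For readers comparing the two approaches, here is a rough sketch of the
"doubled mapping" idea using the temporary-file route that the original
mail names as one of the few ways that works today. It is an illustration
only: dual_map and dual_map_create() are hypothetical names, not part of
any patch in this thread, and a real JIT would pass PROT_READ | PROT_EXEC
for the second view.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Two views of the same pages (hypothetical helper type). */
struct dual_map {
    void  *w;    /* writable view, never executable */
    void  *x;    /* second view, never writable */
    size_t len;
};

/*
 * Map one unlinked temporary file twice. `xprot` is the protection of
 * the second view; a JIT would ask for PROT_READ | PROT_EXEC.
 * Returns 0 on success, -1 on failure.
 */
int
dual_map_create(struct dual_map *dm, size_t len, int xprot)
{
    char path[] = "/tmp/dualmapXXXXXX";
    int fd;

    assert(dm != NULL && len != 0);
    assert((xprot & PROT_WRITE) == 0);    /* keep W and X apart */

    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                 /* the file now lives only via fd */
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    dm->w = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    dm->x = mmap(NULL, len, xprot, MAP_SHARED, fd, 0);
    close(fd);                    /* the mappings keep the pages alive */
    if (dm->w == MAP_FAILED || dm->x == MAP_FAILED) {
        if (dm->w != MAP_FAILED)
            munmap(dm->w, len);
        if (dm->x != MAP_FAILED)
            munmap(dm->x, len);
        return -1;
    }
    dm->len = len;
    return 0;
}
```

Anything stored through dm.w becomes visible through dm.x, yet no single
mapping is ever both writable and executable, so an attacker has to find
the second address as well.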





Re: CVS commit: src/sys/sys

2017-02-23 Thread Kamil Rytarowski
On 23.02.2017 09:48, Robert Elz wrote:
> Date:Thu, 23 Feb 2017 15:32:16 +0800 (PHT)
> From:Paul Goyette 
> Message-ID:  
> 
>   | On Thu, 23 Feb 2017, Kamil Rytarowski wrote:
>   | 
>   | > I'm evaluating it from the osabi (pkgsrc term) point of view. I'm
>   | > targeting LLDB for 7.99.62+.
> 
> The kernel version number is a horribly blunt, and very ineffective tool,
> to use for this purpose - much better is to use a feature test, where the
> user level code tests for the presence of a symbol (#define) in the
> appropriate header file.
> 
> If run time testing is required (which is useful far less often than you
> might think) then a run time feature test (either just try the interface
> and see if it works, or add an option/flag to the API which allows an
> explicit test).
> 
>   | Other reasons might include
>   | 
>   | * changes to the contents of prop-libs that are passed between kernel
>   |and userland, or kernel and modules
> 
> Not sure about that one, but I'd expect that kernel-user proplib interactions
> need versioning if some incompatibility is introduced, I don't think a
> kernel version bump will normally achieve anything.
> 
>   | * changes to structs that might be included in ioctl args
> 
> That one definitely needs versioning, and not a kernel version bump.
> 
> Of course, for both of those, if the interface changed is a very new one
> (as in "this ioctl/proplib was introduced last week, but we forgot...")
> then just change it - anything that was built to use the (previous version
> of) the new interface can just get recompiled.
> 
>   | * changes to things that kmem grovelers chase
> 
> I don't think kernel version bumps really help there either - for those
> I think it is normal just to expect that a kmem groveller (of which there
> are not many left, fortunately) might need to match the kernel version if
> it is to be expected to work.   That is, if you boot a new kernel, you're
> likely to have to rebuild those things (if you need them - it has been a
> long time since I was last bothered by one of those failing, whatever the
> difference between the version of user level code and the kernel)
> 
> The internal interface between kernel and modules (ie: making sure the
> correct modules get loaded for the kernel) is 99% of the demand for kernel
> version bumps.   Changing the sizes of kernel structs is the most common
> change that requires a bump, and altering the prototype of a non-static
> function (adding/deleting an arg, or changing types) is another fairly
> common one.
> 
> kre
> 

Thank you for your explanation.

My bump was still legitimate, as I changed the size of struct lwp on
amd64 and i386: I removed one MD field.

Testing for runtime features in the kernel is not needed in my case, as
I don't target NetBSD older than 8.0.

I'm now working on PT_SYSCALL to dot the ptrace(2) API in my project.
This shouldn't alter any structure.
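
For reference, a small sketch of how a version such as 7.99.62 maps to
the number that userland can compare against __NetBSD_Version__ from
<sys/param.h>. The helper names are made up for illustration; the
MMmmrrpp00 layout is the one used by that macro, and a symbol-based
feature test as suggested above remains preferable where one exists.

```c
#include <assert.h>

#if defined(__NetBSD__)
#include <sys/param.h>    /* provides __NetBSD_Version__ */
#endif

/*
 * Encode major.minor.patch in the MMmmrrpp00 layout, with the "rr"
 * field left at 00. Hypothetical helper, for illustration only.
 */
unsigned long
netbsd_version(unsigned major, unsigned minor, unsigned patch)
{
    assert(minor < 100 && patch < 100);
    return major * 100000000UL + minor * 1000000UL + patch * 100UL;
}

/* Compile-time check: was this built on 7.99.62 or newer? */
int
built_on_7_99_62_or_newer(void)
{
#if defined(__NetBSD_Version__)
    return __NetBSD_Version__ >= 799006200UL;
#else
    return 0;    /* not NetBSD, so the question does not apply */
#endif
}
```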





Re: CVS commit: src/sys/sys

2017-02-22 Thread Kamil Rytarowski


On 23.02.2017 08:32, Paul Goyette wrote:
> On Thu, 23 Feb 2017, Kamil Rytarowski wrote:
> 
>> I'm evaluating it from the osabi (pkgsrc term) point of view. I'm
>> targeting LLDB for 7.99.62+. If the kernel bump approach is reserved for
>> loadable kernel modules, I will follow this in future changes.
> 
> Modules (and specifically, their interfaces to the rest of the kernel)
> are only one reason for a kernel bump.
> 
> Other reasons might include
> 
> * changes to the contents of prop-libs that are passed between kernel
>   and userland, or kernel and modules
> 
> * changes to structs that might be included in ioctl args
> 
> * changes to things that kmem grovelers chase
> 
> 
> New values for existing struct members or in enums generally wouldn't
> need a bump, unless they're accompanied by other changes in data size or
> content.
> 

It's clear now, thank you!

> And I'm sure that other folks can provide more reasons for having a
> kernel bump.
> 
> 
> 
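> +------------------+--------------------------+------------------------+
> | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
> | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
> | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
> +------------------+--------------------------+------------------------+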





Re: CVS commit: src/sys/sys

2017-02-22 Thread Kamil Rytarowski


On 23.02.2017 07:23, Robert Elz wrote:
> Date:Thu, 23 Feb 2017 05:29:41 +
> From:Martin Husemann 
> Message-ID:  <20170223052941.ga29...@homeworld.netbsd.org>
> 
>   | Does this kind of change really require a version bump?
> 
> That one didn't, but there was another checkin, 5 or 6 mins earlier,
> which changed (added to) struct mdproc (on i386 and amd64 anyway),
> and as that struct is included in struct proc, struct proc got bigger,
> and that does require a version bump.
> 
> However, another change, a few hours earlier, carried a note ...
> 

I'm evaluating it from the osabi (pkgsrc term) point of view. I'm
targeting LLDB for 7.99.62+. If the kernel bump approach is reserved for
loadable kernel modules, I will follow this in future changes.

>  | Kernel bump delayed till introduction of PT_GETDBREGS/PT_SETDBREGS soon.
> 
> and as best I can see, neither that change, nor the ones around it,
> had any reason to require a bump.
> 
> kre
> 
> ps: kernel version bumps are good - we will soon approach 7.77.99 and at
> that point, a branch of netbsd-8 is kind of forced!
> 





Locking in cpu_lwp_free()

2017-02-20 Thread Kamil Rytarowski
I've noticed that cpu_lwp_free() in the current design must not sleep:


src/src/sys/kern/kern_lwp.c:1138

/*
 * We can no longer block.  At this point, lwp_free() may already
 * be gunning for us.  On a multi-CPU system, we may be off p_lwps.
 *
 * Free MD LWP resources.
 */
cpu_lwp_free(l, 0);


src/src/sys/kern/kern_exit.c:587

/* Verify that we hold no locks other than the kernel lock. */
LOCKDEBUG_BARRIER(&kernel_lock, 0);
/*
 * NOTE: WE ARE NO LONGER ALLOWED TO SLEEP!
 */
/*
 * Give machine-dependent code a chance to free any MD LWP
 * resources.  This must be done before uvm_lwp_exit(), in
 * case these resources are in the PCB.
 */
cpu_lwp_free(l, 1);


In the following ports, cpu_lwp_free() takes sleepable locks:

1. HPPA

sys/arch/hppa/hppa/vm_machdep.c:

void
cpu_lwp_free(struct lwp *l, int proc)
{
    struct pcb *pcb = lwp_getpcb(l);

    /*
     * If this thread was using the FPU, disable the FPU and record
     * that it's unused.
     */

    hppa_fpu_flush(l);
    pool_put(&hppa_fppl, pcb->pcb_fpregs);
}

pool_put() takes a mutex internally (src/src/sys/kern/subr_pool.c):

void
pool_put(struct pool *pp, void *v)
{
    struct pool_pagelist pq;

    LIST_INIT(&pq);
    mutex_enter(&pp->pr_lock);
    pool_do_put(pp, v, &pq);
    mutex_exit(&pp->pr_lock);
    pr_pagelist_free(pp, &pq);
}

2. SPARC

src/sys/arch/sparc/sparc/vm_machdep.c:

/*
 * Cleanup FPU state.
 */
void
cpu_lwp_free(struct lwp *l, int proc)
{
    struct fpstate *fs;

    if ((fs = l->l_md.md_fpstate) != NULL) {
        struct cpu_info *cpi;
        int s;

        FPU_LOCK(s);
        if ((cpi = l->l_md.md_fpu) != NULL) {
            if (cpi->fplwp != l)
                panic("FPU(%d): fplwp %p",
                    cpi->ci_cpuid, cpi->fplwp);
            if (l == cpuinfo.fplwp)
                savefpstate(fs);
#if defined(MULTIPROCESSOR)
            else
                XCALL1(ipi_savefpstate, fs, 1 << cpi->ci_cpuid);
#endif
            cpi->fplwp = NULL;
        }
        l->l_md.md_fpu = NULL;
        FPU_UNLOCK(s);
    }
}

FPU_LOCK() and FPU_UNLOCK() wrap a regular mutex
(src/sys/arch/sparc/include/proc.h):

/*
 * FPU context switch lock
 * Prevent interrupts that grab the kernel lock
 * XXX mrg: remove (s) argument
 */
extern kmutex_t fpu_mtx;

#define FPU_LOCK(s)     do {                    \
    (void)&(s);                                 \
    mutex_enter(&fpu_mtx);                      \
} while (/* CONSTCOND */ 0)

#define FPU_UNLOCK(s)   do {                    \
    mutex_exit(&fpu_mtx);                       \
} while (/* CONSTCOND */ 0)
#endif

My understanding is that these calls should be moved to cpu_lwp_free2.

I followed an approach similar to hppa's on amd64 with 8 CPUs, and during
a distribution build I was able to trigger a crash: an LWP went to sleep
on a lock and, once it was woken up, it hit the LSZOMB kernel assertions.
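
To make the proposed split concrete, here is a tiny userland model of it.
This is not kernel code: may_sleep, lwp_model, sleepable_put() and the
model_* functions are invented stand-ins, and it assumes (as suggested
above, not verified here) that cpu_lwp_free2() runs in a context where
blocking is still acceptable.

```c
#include <assert.h>
#include <stdlib.h>

/* Models "is blocking still allowed at this point?" */
static int may_sleep = 1;

struct lwp_model {
    void *fpregs;         /* stands in for pcb->pcb_fpregs */
    int   fpu_flushed;    /* set once the FPU state was flushed */
};

/* Stand-in for pool_put(): it takes a lock, so it may have to sleep. */
static void
sleepable_put(void *p)
{
    assert(may_sleep);    /* would be a kernel assertion failure */
    free(p);
}

/* Phase 1: called by the exiting LWP, which can no longer block. */
void
model_cpu_lwp_free(struct lwp_model *l)
{
    l->fpu_flushed = 1;   /* only non-blocking cleanup happens here */
}

/* Phase 2: called later, from a context assumed to allow blocking. */
void
model_cpu_lwp_free2(struct lwp_model *l)
{
    sleepable_put(l->fpregs);
    l->fpregs = NULL;
}
```

The point of the model is only the ordering: the release that may sleep
is deferred to the second phase, and the first phase touches nothing that
needs a lock.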





Re: LWP resume and suspend ptrace(2) API

2017-02-11 Thread Kamil Rytarowski
On 11.02.2017 17:18, Christos Zoulas wrote:
> In article <897028fd-f18a-9ec4-bd5f-3930f40dc...@gmx.com>,
> Kamil Rytarowski   wrote:
>>
>> There is one nit... this code (at least to my tests) cannot unstop a
>> thread that was created by a tracee with LWP_SUSPENDED.
>>
>> http://netbsd.org/~kamil/patch-00028-pt_suspend-pt_resume.txt-resume2
>> (man pages will be applied in next patch)
>>
>> Are there needed more actions to be performed? I had some trouble to
>> call lwp_continue from ptrace(2) and not managed to make it work.
> 
> Write a test and I will take a look.
> 

patch-00028-pt_suspend-pt_resume.txt-resume2 ships with a resume2 test
that triggers this. The test hangs waiting on a thread that was not
correctly unstopped.

I'm also not fully sure that resume1 always behaves correctly. When I
added more synchronization handshakes I got nondeterministic results:
one time the thread exited silently and another time it was detectable
with _lwp_wait(2).

>> There is undocumented behavior that passing LWP ID when a process has
>> single thread results in ignoring the passed value and detecting the
>> proper one. For example this code will always return proper value for a
>> single-threaded process, no matter what the LWP ID is.
>>
>> ptrace(PT_GETREGS, child, &r, 123)
>>
>> Can I streamline it and remove fallback to an existing LWP? I don't see
>> a "feature" in this "bug".
> 
> I think this was for backwards compatibility with unthreaded code. If
> it bothers you, remove it. It does not matter since the pthread(2) consumers
> are few, and since you are changing already a lot in interface...
> 

I will try to dig for historical references. If there were consumers of
this behavior, I will leave it as it is. So far new interfaces are
mostly backwards-compatible in terms of not breaking existing software
(I'm not aware of anything that was broken).

>> Can a user-space process contain in-kernel threads? Is this used for
>> compat with old userland? I was thinking about switching branch for
>> in-kernel-thread to KASSERT(). It would make the code cleaner.
> 
> What does that mean exactly? Do you mean kernel lwp's that are dedicated
> to kernel tasks, instead of kernel lwp started from userland to be used
> as in-process threads? You mean the LW_SYSTEM tests? Look in kern_lwp.c,
> there are KASSERTS for that already...
> 

Yes, LW_SYSTEM checks. I will try to dig for it.

>> Thank you for the initial review.
> 
> You are always welcome :-)
> 
> christos
> 






Re: LWP resume and suspend ptrace(2) API

2017-02-10 Thread Kamil Rytarowski


On 11.02.2017 04:15, Christos Zoulas wrote:
> On Feb 11, 12:49am, krytarow...@gmail.com (Kamil Rytarowski) wrote:
> -- Subject: LWP resume and suspend ptrace(2) API
> 
> | I'm proposing an API to restore the functionality to resume or suspend a
> | specified thread from execution.
> | 
> | This interface was implemented in the past in user-space inside
> | pthread(3) with the M:N thread model (with help from removed pthread_dbg).
> | 
> | http://netbsd.org/~kamil/patch-00028-pt_suspend-pt_resume.txt
> 
> - The second _lwp_continue should be _lwp_suspend in the man page.
> - I don't like the phrasing "Lock" and "Unlock". What are you locking here?
>   Why not suspend/resume execution; or prevent/allow execution.
> 

I will apply these notes.

> | This code is close to FreeBSD and shares the same request names
> | (PT_RESUME and PT_SUSPEND), however on NetBSD we pass the full pair of
> | tracee's pid_t and thread's lwpid_t. FreeBSD specifies just thread ID,
> | which is insufficient on NetBSD, as a single tracer can control multiple
> | tracees and face duplicated lwpid_t.
> | 
> | I've added an interface to detect if a specific LWP has been suspended
> | (or not) with extending the PT_LWPINFO interface with a new pl_event
> | value PL_EVENT_SUSPENDED (next to PL_EVENT_NONE and PL_EVENT_SIGNAL).
> | 
> | There is a new check preventing deadlocks and ptrace(2) can set with
> | this patch new errno EDEADLK. I haven't checked the existing code but it
> | appears that we can deadlock tracee with current PT_CONTINUE and friends.
> 
> Otherwise nicely done :-)
> 

There is one nit... this code (at least in my tests) cannot unstop a
thread that was created by a tracee with LWP_SUSPENDED.

http://netbsd.org/~kamil/patch-00028-pt_suspend-pt_resume.txt-resume2
(man pages will be applied in next patch)

Are more actions needed? I had some trouble calling lwp_continue from
ptrace(2) and did not manage to make it work.


There is an undocumented behavior: passing an LWP ID when a process has
a single thread results in the passed value being ignored and the proper
one being detected. For example, this code will always return the proper
value for a single-threaded process, no matter what the LWP ID is.

ptrace(PT_GETREGS, child, &r, 123)

Can I streamline it and remove fallback to an existing LWP? I don't see
a "feature" in this "bug".


Can a user-space process contain in-kernel threads? Is this used for
compat with old userland? I was thinking about turning the branch for
in-kernel threads into a KASSERT(). It would make the code cleaner.

Thank you for the initial review.





LWP resume and suspend ptrace(2) API

2017-02-10 Thread Kamil Rytarowski
I'm proposing an API to restore the functionality to resume or suspend a
specified thread from execution.

This interface was implemented in the past in user-space inside
pthread(3) with the M:N thread model (with help from the since-removed
pthread_dbg).

http://netbsd.org/~kamil/patch-00028-pt_suspend-pt_resume.txt

This code is close to FreeBSD and shares the same request names
(PT_RESUME and PT_SUSPEND); however, on NetBSD we pass the full pair of
the tracee's pid_t and the thread's lwpid_t. FreeBSD specifies just a
thread ID, which is insufficient on NetBSD, as a single tracer can control
multiple tracees and encounter duplicate lwpid_t values.
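
A minimal sketch of why the pair is needed, from the point of view of a
tracer that keeps its own table of threads. The names thread_key,
lwpid_model_t and the two functions are invented for this illustration;
lwpid_model_t merely stands in for NetBSD's lwpid_t.

```c
#include <sys/types.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int lwpid_model_t;    /* stand-in for lwpid_t */

/* A thread is identified by the tracee's PID plus its LWP ID. */
struct thread_key {
    pid_t         pid;
    lwpid_model_t lwpid;
};

bool
thread_key_equal(struct thread_key a, struct thread_key b)
{
    return a.pid == b.pid && a.lwpid == b.lwpid;
}

/* Linear search; returns the index of `want`, or -1 if it is absent. */
int
thread_key_find(const struct thread_key *keys, size_t n,
    struct thread_key want)
{
    assert(keys != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (thread_key_equal(keys[i], want))
            return (int)i;
    }
    return -1;
}
```

Two tracees usually both have an LWP with ID 1, so a lookup by LWP ID
alone could not tell them apart, while the pair is unambiguous.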

I've added an interface to detect whether a specific LWP has been
suspended, by extending the PT_LWPINFO interface with a new pl_event
value PL_EVENT_SUSPENDED (next to PL_EVENT_NONE and PL_EVENT_SIGNAL).

There is a new check preventing deadlocks; with this patch ptrace(2) can
set a new errno, EDEADLK. I haven't checked the existing code, but it
appears that we can deadlock a tracee with the current PT_CONTINUE and
friends.


PT_[GS]ET_SIGINFO in ptrace(2)

2017-01-03 Thread Kamil Rytarowski
The current implementation of ptrace(2) lacks an interface to retrieve
and to fake the siginfo_t value of a signal that was intercepted by a
tracer. The former is required to help determine the exact event that
happened in the code, and the latter to programmatically fake the signal
routed to the tracee in terms of si_code and other values as described in
siginfo(5). Both accessors are useful in debuggers.

Code:
http://netbsd.org/~kamil/patch-00026-pl_siginfo.5.txt

The PT_GET_SIGINFO call is intended to be used now in LLDB
(pkgsrc-wip/lldb-netbsd) in the NetBSD Process Plugin. The functionality
of PT_SET_SIGINFO is planned to be used (by myself) long-term.

I've added two new dedicated ptrace(2) calls for this
#define PT_SET_SIGINFO  19  /* set signal state, defined below */
#define PT_GET_SIGINFO  20  /* get signal state, defined below */

I've added a new structure, ptrace_siginfo, used to communicate between
user space and kernel space, with the following shape:
/*
 * Signal Information structure
 */
typedef struct ptrace_siginfo {
    siginfo_t   psi_siginfo;    /* signal information structure */
    lwpid_t     psi_lwpid;      /* destination LWP of the signal
                                 * value 0 means the whole process
                                 * (route signal to all LWPs) */
} ptrace_siginfo_t;
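
A hedged sketch of how a tracer might read this structure. The wrapper
name stop_signal_of() is invented, and the calling convention (addr
pointing at the structure, data carrying its size) is an assumption
rather than something stated above; on systems whose headers do not
define PT_GET_SIGINFO the function simply reports ENOSYS.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Fetch the signal number and destination LWP of the signal that
 * stopped `pid`. Returns 0 on success, -1 on error with errno set.
 */
int
stop_signal_of(pid_t pid, int *signo, int *lwpid)
{
    assert(signo != NULL && lwpid != NULL);
#ifdef PT_GET_SIGINFO
    struct ptrace_siginfo psi;

    /* Assumed convention: addr = &psi, data = sizeof(psi). */
    if (ptrace(PT_GET_SIGINFO, pid, &psi, (int)sizeof(psi)) == -1)
        return -1;
    *signo = psi.psi_siginfo.si_signo;
    *lwpid = (int)psi.psi_lwpid;    /* 0 means the whole process */
    return 0;
#else
    (void)pid;
    errno = ENOSYS;                 /* interface not available here */
    return -1;
#endif
}
```

The #ifdef doubles as a compile-time feature test, so the same source
builds whether or not the request exists.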


This interface is close to the Linux one:

  PTRACE_GETSIGINFO (since Linux 2.3.99-pre6)
Retrieve information about the signal that caused the stop.  Copy a
siginfo_t structure (see sigaction(2)) from the tracee to  the
address data in the tracer.  (addr is ignored.)

  PTRACE_SETSIGINFO (since Linux 2.3.99-pre6)
Set  signal  information: copy a siginfo_t structure from the
address data in the tracer to the tracee.  This will affect only
signals that would normally be delivered to the tracee and were
caught by the tracer.  It may be difficult to tell these normal
signals  from  synthetic signals generated by ptrace() itself.
(addr is ignored.)

On FreeBSD there is only an interface to retrieve siginfo_t in a
per-thread manner, as the pl_siginfo member of the ptrace_lwpinfo
structure. This approach isn't applicable to the current NetBSD design,
as PT_LWPINFO is used to iterate over all threads, not just to retrieve
the one that caused the process to be interrupted. It also has no
interface to inject a new faked signal.

I'm attaching three basic ATF tests:
 - siginfo1 - test PT_GET_SIGINFO
 - siginfo2 - test PT_GET_SIGINFO and PT_SET_SIGINFO without changing
signal's information
 - siginfo3 - test PT_GET_SIGINFO and PT_SET_SIGINFO with faking
signal's information

All the ATF tests are passing correctly.


Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Kamil Rytarowski
On 15.12.2016 19:49, Andrew Cagney wrote:
> 
> On 15 December 2016 at 13:23, Kamil Rytarowski wrote:
> 
> BTW. I'm having some DWARF related questions, if I may reach you in a
> private mail?
> 
> 
> Better is
> http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org so you
> don't have me as a bottle neck.
> 
> Andrew
> 

OK, I will keep it in mind, thanks!





Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Kamil Rytarowski
On 15.12.2016 19:30, Andrew Cagney wrote:
> 
> On 13 December 2016 at 12:16, Kamil Rytarowski wrote:
> 
> >> 5. Do not allow to mix PT_STEP and hardware watchpoint, in case of
> >> single-stepping the code, disable (it means: don't set) hardware
> >> watchpoints for threads. Some platforms might implement single-step
> >> with hardware watchpoints and managing both at the same time is
> >> generating extra pointless complexity.
> 
> 
> Is this wise?  I suspect it might be better to just expose all the hairy
> details and let the client decide if the restriction should apply.
> (to turn this round, if the details are not exposed, then clients will
> wonder why their platform is being crippled).

This is subject to change. I'm discussing it with the debugger developers
on LLDB. They wish to have as much data available about a
breakpoint/watchpoint as possible. This implies a request for a dedicated
si_code for hardware-assisted traps.





Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Kamil Rytarowski
On 15.12.2016 18:45, Andrew Cagney wrote:
> [see end]


Please see inline.

> 
> On 13 December 2016 at 12:16, Kamil Rytarowski wrote:
> 
> On 13.12.2016 04:12, Valery Ushakov wrote:
> > On Tue, Dec 13, 2016 at 02:04:36 +0100, Kamil Rytarowski wrote:
> >
> >> The design is as follows:
> >>
> >> 1. Accessors through:
> >>  - PT_WRITE_WATCHPOINT - write new watchpoint's state (set, unset, ...),
> >>  - PT_READ_WATCHPOINT - read watchpoints's state,
> >>  - PT_COUNT_WATCHPOINT - receive the number of available watchpoints.
> >
> > Gdb supports hardware assisted watchpoints.  That implies that other
> > OSes have existing designs for them.  Have you studied those existing
> > designs?  Why do you think they are not suitable to be copied?
> >
> 
> They are based on the concept of exporting debug registers to tracee's
> context (machine context/userdata/etc). FreeBSD exposes MD-specific
> DBREGS to be set/get by a user, similar with Linux and with MacOSX.
> 
> GDB supports hardware and software assisted watchpoints. Software ones
> are stepping the code and checking each instruction, hardware ones make
> use of the registers.
> 
> I propose to export an interface that is not limited to one type of
> hardware assisted action, while it can be fully used for hardware
> watchpoints (if CPU supports it). This interface will abstract
> underlying hardware specific capabilities with a MI ptrace(2) calls (but
> MD-specific ptrace_watchpoint structure).
> 
> These interfaces are already platform specific and aren't shared between
> OSes.
> 
> 
> That isn't true (or at least it shouldn't).
> 
> While access to the registers is OS specific, the contents of the
> registers, and their behaviour is not.  Instead, that is specified by
> the Instruction Set Architecture.
> For instance, FreeBSD's gdb/i386fbsd-nat.c uses the generic
> gdb/x86-nat.c:x86_use_watchpoints() code.
> 
>  

Thank you for your pointer.

It's similar in LLDB, in that there are some assumptions that I consider
fragile (like dbregs in mcontext). However, in the end I think the result
is the same.

One of my motivations was to use a simplified interface at the user level
and to integrate it more easily with existing applications that have
built-in debuggers, like radare2. Another motivation was to use
breakpoints without mprotect restrictions: handling a few small
breakpoints is easier with hardware-assisted watchpoints, and trickier
with software ones.

> 
> Some time ago I checked and IIRC the only two users of these interfaces
> were GDB and LLDB, I implied from this that there is no danger from
> heavy patching 3rd party software.
> 
> 
> I'm not sure how to interpret this.  Is the suggestion that, because
> there are only two consumers, hacking them both will be easy; or
> something else?  I hope it isn't.  Taking on such maintenance has a
> horrendous cost.
> 

So please help to bring a local debugger developer aboard to take on the
maintenance cost, sensu largo.

> Anyway, lets look at the problem space.  It might help to understand why
> kernel developers tend to throw up their hands.
> 
> First lets set the scene:
> 
> - if we're lucky we have one hardware watch-point, if we're really lucky
> there's more than one
> - if we're lucky it does something, if we're really lucky it does what
> the documentation says
> 
> which reminds me:
> 
> - if we're lucky we've got documentation, if we're really lucky we've
> correct and up-to-date errata explaining all the hair brained
> interactions these features have with other hardware events
> 
> and now lets consider this simple example, try to watch c.a in:
> 
> struct { char c; char a[3]; int32_t i; int64_t j; } c;
> 
> Under the proposed model (it looks a lot like gdb's remote protocol's Z
> packet) it's assumed this will allocate one watch-point:
> 
> address=&c.a, size=3
> 
> but wait, the hardware watch-point registers have a few, er, standard
> features:
> 
> - I'll be kind, there are two registers
> - size must be power-of-two (lucky size==4 isn't fixed)
> - address must be size aligned (lucky addr & 3 == 0 isn't fixed)
> - there are separate read/write bits (lucky r+w isn't fixed)
> 
> so what to do?  With this hardware we can:
> 
> - use two watch-point registers (making your count meaningless), so that
> accesses only apply to the address in question
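
The constraints listed above (power-of-two size, size-aligned address)
can be turned into a small worked example. This is only a sketch:
hw_watch, split_watch() and cover_watch() are invented names, not part of
the proposed ptrace(2) interface or of any debugger. For c.a, which sits
at offset 1 with length 3, the exact split needs two slots, while a
single slot has to over-watch the whole aligned word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hardware watch slot: naturally aligned, power-of-two sized. */
struct hw_watch {
    uintptr_t addr;
    size_t    size;
};

/*
 * Split [addr, addr + len) into aligned power-of-two pieces no larger
 * than max_size. Returns the number of pieces needed; at most `cap`
 * of them are stored in `out`.
 */
size_t
split_watch(uintptr_t addr, size_t len, size_t max_size,
    struct hw_watch *out, size_t cap)
{
    size_t n = 0;

    assert(max_size != 0 && (max_size & (max_size - 1)) == 0);
    while (len != 0) {
        size_t s = max_size;

        /* Shrink until the piece is aligned and fits what is left. */
        while ((addr & (s - 1)) != 0 || s > len)
            s >>= 1;
        if (n < cap) {
            out[n].addr = addr;
            out[n].size = s;
        }
        n++;
        addr += s;
        len -= s;
    }
    return n;
}

/* Smallest single aligned power-of-two block covering the range. */
struct hw_watch
cover_watch(uintptr_t addr, size_t len)
{
    struct hw_watch w = { addr, 1 };

    assert(len != 0);
    while (w.addr + w.size < addr + len) {
        w.size <<= 1;
        w.addr = addr & ~(uintptr_t)(w.size - 1);
    }
    return w;
}
```

With addr=1 and len=3, split_watch() yields the pieces (1, size 1) and
(2, size 2), whereas cover_watch() yields (0, size 4), which also traps
on accesses to the neighbouring member c.c.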
> 
> 

Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Kamil Rytarowski
Hello,

Please see inline, I tried to refer to other questions offlist.

On 15.12.2016 16:42, Valery Ushakov wrote:
> Again, you don't provide any details.  What extra logic?  Also, what
> are these few dozens of instructions you are talking about?  I.e. what
> is that extra work you have to do for a process-wide watchpoint that
> you don't have to do for an lwp-specific watchpoint on each return to
> userland?
> 
> 

1. The complexity is in adding an extra case to the ptrace_watchpoint
structure: a way to specify per-thread or per-process. If someone wanted
to set per-thread watchpoints inside the process structure, there would
need to be a list of available watchpoints, scaling as the number of
possible watchpoints times the number of threads.

2. Complexity on returning to userland: the process structure needs to
be locked in userret(9), and every watchpoint checked for whether it is
process-wide or dedicated to the thread.

I implemented it originally per process and I finally decided to throw
the per-process vs per-thread logic away, out of the kernel, and expose
watchpoints (or technically bitmasks of available debug registers) to
userland.

It's easier to check a per-LWP local structure with up to 4 fields than
to lock a list and iterate over N elements. Every thread also has a
dedicated bit in its properties indicating whether it has watchpoints
attached.

From the userland and management point of view it's equivalent, with the
difference that the debugger needs to catch thread creation and apply
the desired watchpoint to the new thread.

Why bitmasks and not raw registers? At some level the kernel needs to
check whether the composed combination is valid, so separating the
user-settable register bits into bitmasks is needed anyway; and while
that could be done in the kernel, why not export it to userland?

I've found it easier to reuse in 3rd party software.





Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Kamil Rytarowski
On 15.12.2016 19:45, Andrew Cagney wrote:
> 
> 
> On 15 December 2016 at 13:22, Eduardo Horvath wrote:
> 
> 
> On Thu, 15 Dec 2016, Andrew Cagney wrote:
> 
> > Might a better strategy be to first get the registers exposed, and then,
> > if there's still time start to look at an abstract interface?
> 
> That's one way of looking at it.
> 
> Another way is to consider that watchpoints can be implemented through
> careful use of the MMU.
> 
> 
> Yes, HP implemented something like this with wildebeast (gdb fork) on HP-UX.

Can it work for short data fields like shorts and integers? I know there
are features like mprotect(2) that perhaps could be used for the same
purpose... to some extent.






Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-13 Thread Kamil Rytarowski
On 13.12.2016 04:12, Valery Ushakov wrote:
> On Tue, Dec 13, 2016 at 02:04:36 +0100, Kamil Rytarowski wrote:
> 
>> The design is as follows:
>>
>> 1. Accessors through:
>>  - PT_WRITE_WATCHPOINT - write new watchpoint's state (set, unset, ...),
>>  - PT_READ_WATCHPOINT - read watchpoints's state,
>>  - PT_COUNT_WATCHPOINT - receive the number of available watchpoints.
> 
> Gdb supports hardware assisted watchpoints.  That implies that other
> OSes have existing designs for them.  Have you studied those existing
> designs?  Why do you think they are not suitable to be copied?
> 

They are based on the concept of exporting debug registers to the
tracee's context (machine context/userdata/etc). FreeBSD exposes
MD-specific DBREGS to be set/get by a user, as do Linux and Mac OS X.

GDB supports hardware and software assisted watchpoints. Software ones
are stepping the code and checking each instruction, hardware ones make
use of the registers.

I propose to export an interface that is not limited to one type of
hardware-assisted action, while it can be fully used for hardware
watchpoints (if the CPU supports them). This interface will abstract the
underlying hardware-specific capabilities behind MI ptrace(2) calls (but
with an MD-specific ptrace_watchpoint structure).

These interfaces are already platform specific and aren't shared between
OSes.

Some time ago I checked, and IIRC the only two users of these interfaces
were GDB and LLDB; I inferred from this that there is no danger of
having to heavily patch 3rd party software.

> 
>> 4. Do not set watchpoints globally per process, limit them to
>> threads (LWP). [...]  Adding process-wide management in the
>> ptrace(2) interface calls adds extra complexity that should be
>> pushed away to user-land code in debuggers.
> 
> 
> I have no idea what amd64 debug registers do, but this smells like you
> are exposing in the MI interface some of those details.  I don't think
> this can be done in hardware on sh3, e.g.  
> 

No, I'm not exposing anything in MI code - except the number of
available watchpoints defined by MD code (but this information goes
through a function called from MD part).

The functions are hidden under __HAVE_PTRACE_WATCHPOINTS ifdefs.

The "watchpoint" terminology can be misleading, but since I couldn't come
up with anything better, I used this word for the interface.

> Also, you quite often have no idea which thread stomps on your data,
> so I'd imagine most of the time you do want a global watchpoint.

This is true.

With the proposed per-thread interface, a debugger can set the same
hardware watchpoint for each LWP and achieve the same result. There are
no performance or synchronization challenges, as watchpoints can be set
only when a process is stopped.

In my older code I had per-process logic to access watchpoints, but it
required extra logic in thread-specific functions to access
process-specific data. I assumed that saving a few dozen CPU cycles
each time a thread enters user-space is precious. (I know it's a small
optimization, but it comes for free.)

A user-interface of a debugger ("from a user point of view") is agnostic
to both approaches.

> Note, that if you want to restrict your watchpoint to one thread, you
> can probably (I don't know and I haven't checked) do this with gdb
> "command" that "continue"s if it's on the wrong thread.
> 

The proposed approach is just at the level of the ptrace(2)
implementation; any debugger is free to implement it in any way possible,
while retaining the option to set watchpoints per thread.

I don't want to appear to be dodging the choice, but I'm trying to
propose an implementation that is easier to apply on the kernel side.
From a userland point of view I think it does not matter.

> 
>> 5. Do not allow to mix PT_STEP and hardware watchpoint, in case of
>> single-stepping the code, disable (it means: don't set) hardware
>> watchpoints for threads. Some platforms might implement single-step with
>> hardware watchpoints and managing both at the same time is generating
>> extra pointless complexity.
> 
> I don't think I see how "extra pointless complexity" follows.
> 

1. At least in the MD x86-specific code, watchpoint traps triggered in
stepped code are reported differently from those reported for plain steps
and also differently from plain hardware watchpoint traps. They are a
third type of trap.

2. Single-stepping can be implemented with hardware-assisted watchpoints
(technically breakpoints) on the kernel side in MD code. If so, trying to
apply watchpoints and single-step at once will conflict, and this will
need additional handling on the kernel side.
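
To illustrate point 1, here is a minimal sketch of how the three cases can
be told apart on x86 from a snapshot of the DR6 status register. It assumes
the architectural DR6 layout (B0..B3 in bits 0..3, BS in bit 14); the enum
and the function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* x86 DR6 status bits (architectural layout). */
#define DR6_B_MASK 0x0000000fu /* B0..B3: debug register 0..3 condition hit */
#define DR6_BS     0x00004000u /* BS: single-step trap (bit 14) */

enum trap_kind {
	TRAP_KIND_NONE,       /* neither condition reported */
	TRAP_KIND_STEP,       /* plain single-step */
	TRAP_KIND_WATCHPOINT, /* plain hardware watchpoint */
	TRAP_KIND_BOTH        /* watchpoint hit while single-stepping */
};

/* Hypothetical helper: classify a debug trap from a DR6 snapshot. */
static enum trap_kind
classify_debug_trap(uint32_t dr6)
{
	int step = (dr6 & DR6_BS) != 0;
	int watch = (dr6 & DR6_B_MASK) != 0;

	if (step && watch)
		return TRAP_KIND_BOTH;
	if (step)
		return TRAP_KIND_STEP;
	if (watch)
		return TRAP_KIND_WATCHPOINT;
	return TRAP_KIND_NONE;
}

/* Index (0..3) of the lowest debug register reported in DR6. */
static int
first_hit_index(uint32_t dr6)
{
	assert((dr6 & DR6_B_MASK) != 0);
	for (int i = 0; i < 4; i++)
		if (dr6 & (1u << i))
			return i;
	return -1; /* only reachable when assertions are compiled out */
}
```

The TRAP_KIND_BOTH case is the "third type" that needs its own handling
once stepping and watchpoints are allowed at the same time.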

To avoid extra complexity I propose to make stepping and watchpoints
sep

ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-12 Thread Kamil Rytarowski
I've prepared an interface for hardware watchpoints:

http://netbsd.org/~kamil/patch-00023-ptrace-watchpoints.txt

For the purpose of this task I propose to call monitoring of data
operations "watchpoints" and monitoring of instruction execution
"breakpoints". However, this interface is not limited to either, as a
port might expose any other type of event to be monitored (like branch
instructions). When I refer to "hardware" watchpoints, I mean any type
of trap realized with hardware assistance, without the need for a
software trap.



My goals for this project:
 - restrict code on the kernel side to the functional minimum,
 - restrict performance impact to the minimum,
 - security - don't expose weaker points,
 - make common MI parts where applicable,
 - if something is doable in userlevel debugger code, don't put
extra functions in the kernel.

Benefits of this project:
 - hardware breakpoints without violating mprotect restrictions,
 - make data changes observable,
 - basic and common interface to set breakpoints within a few lines of
code [1].

[1] Software breakpoints are definitely more complex, as they overwrite
the target's .text section, inserting instructions there to generate
traps, and then they need to move the PC backwards and restore the
original instruction for the target...


The design is as follows:

1. Accessors through:
 - PT_WRITE_WATCHPOINT - write new watchpoint's state (set, unset, ...),
 - PT_READ_WATCHPOINT - read watchpoints's state,
 - PT_COUNT_WATCHPOINT - receive the number of available watchpoints.

2. Hardware watchpoint API is designed to be MI with MD specialization.
MI parts:
 - ptrace(2) calls as mentioned in 1.

 - struct ptrace_watchpoint of the following shape:

/*
 * Hardware Watchpoints
 *
 * MD code handles switch informing whether a particular watchpoint is
 * enabled
 */
typedef struct ptrace_watchpoint {
        int             pw_index;       /* HW Watchpoint ID (count from 0) */
        lwpid_t         pw_lwpid;       /* LWP described */
        struct mdpw     pw_md;          /* MD fields */
} ptrace_watchpoint_t;

 - example specialization for amd64:

/*
 * This MD structure translates into x86_hw_watchpoint
 *
 * pw_address - 0 represents disabled hardware watchpoint
 *
 * conditions:
 * 0b00 - execution
 * 0b01 - data write
 * 0b10 - io read/write (not implemented)
 * 0b11 - data read/write
 *
 * length:
 * 0b00 - 1 byte
 * 0b01 - 2 bytes
 * 0b10 - undefined (8 bytes in modern CPUs - not implemented)
 * 0b11 - 4 bytes
 *
 * Helper symbols for conditions and length are available in 
 *
 */
struct mdpw {
        void    *md_address;
        int      md_condition;
        int      md_length;
};

 - I put md_address and the other fields in the MD part, as they are
purely MD specific. I wanted to leave room for possible watchpoint types
without a specified address.
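
As an illustration of how the condition and length encodings listed above
map onto hardware, here is a minimal sketch that builds the x86 DR7 control
bits for one watchpoint slot. It assumes the architectural DR7 layout; the
helper name and the enum constants are hypothetical and are not the helper
symbols from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings as listed in the mdpw comment. */
enum { COND_EXEC = 0x0, COND_WRITE = 0x1, COND_IO = 0x2, COND_RW = 0x3 };
enum { LEN_1 = 0x0, LEN_2 = 0x1, LEN_8 = 0x2, LEN_4 = 0x3 };

/*
 * Hypothetical helper: build the DR7 bits that locally enable watchpoint
 * `index` (0..3) with the given condition and length encodings.
 * DR7 layout: local-enable bit at 2*index, R/W field at 16 + 4*index,
 * LEN field at 18 + 4*index.
 */
static uint32_t
dr7_bits(int index, int condition, int length)
{
	assert(index >= 0 && index < 4);
	assert(condition >= 0 && condition <= 3);
	assert(length >= 0 && length <= 3);

	uint32_t bits = 1u << (2 * index);               /* L<index> */
	bits |= (uint32_t)condition << (16 + 4 * index); /* R/W<index> */
	bits |= (uint32_t)length << (18 + 4 * index);    /* LEN<index> */
	return bits;
}
```

For example, dr7_bits(0, COND_WRITE, LEN_4) describes a 4-byte write
watchpoint in slot 0. Note that for execution breakpoints the hardware
expects the length field to be 0b00.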


3. Do not expose CPU Debug Registers to userland. I finally decided to
restrict the underlying hardware implementation (based on CPU Debug
Registers) to the kernel only. In FreeBSD these registers are a part of
the machine context (mcontext). I think it's a misdesign, as they should
be limited to the tracer only and have no use-case in the tracee. Exposed
CPU Debug Registers can open privileged gates, and in theory a tracee
could try to alter watchpoints set by a debugger on the fly. AMD64 Debug
Registers are rather a part of the tracer context observing a tracee.

4. Do not set watchpoints globally per process, limit them to threads
(LWPs). In general, kernel hardware watchpoints must be set for all CPUs,
while userland watchpoints must be limited to the CPU running a thread.
Adding process-wide management to the ptrace(2) interface calls adds
extra complexity that should be pushed out to user-land code in
debuggers.
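
A minimal sketch of what "pushed out to user-land" could look like in a
debugger: a process-wide watchpoint emulated by programming the same slot
on every LWP. All types and names here are stand-ins invented for
illustration; the real backend would issue one PT_WRITE_WATCHPOINT call per
LWP, which is replaced by a callback so the sketch runs anywhere.

```c
#include <assert.h>
#include <stddef.h>

typedef int lwpid_model_t;      /* stand-in for lwpid_t */

struct watch_request {          /* stand-in for struct ptrace_watchpoint */
	int           index;    /* hardware watchpoint slot, from 0 */
	lwpid_model_t lwpid;    /* LWP the slot belongs to */
	void         *address;  /* watched address (an MD field on x86) */
};

/* Callback that programs one slot for one LWP; returns 0 on success. */
typedef int (*set_watch_fn)(const struct watch_request *req, void *cookie);

/*
 * Emulate a process-wide watchpoint: program the same slot on every LWP.
 * Returns the number of LWPs programmed, or -1 on the first failure.
 */
static int
set_watch_all_lwps(const lwpid_model_t *lwps, size_t nlwps, int index,
    void *address, set_watch_fn set_one, void *cookie)
{
	assert(lwps != NULL || nlwps == 0);
	assert(set_one != NULL);

	for (size_t i = 0; i < nlwps; i++) {
		struct watch_request req = {
			.index = index,
			.lwpid = lwps[i],
			.address = address,
		};
		if (set_one(&req, cookie) != 0)
			return -1;
	}
	return (int)nlwps;
}

/* Mock backend: counts calls instead of calling ptrace(2). */
static int
count_calls(const struct watch_request *req, void *cookie)
{
	(void)req;
	++*(int *)cookie;
	return 0;
}
```

Since watchpoints can only be set while the process is stopped, the loop
needs no locking against the tracee.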

5. Do not allow mixing PT_STEP and hardware watchpoints: when
single-stepping the code, disable (that is, don't set) hardware
watchpoints for threads. Some platforms might implement single-step with
hardware watchpoints, and managing both at the same time generates
extra pointless complexity.

6. I have no strong opinions on si_code; on amd64 we set TRAP_TRACE for
hardware watchpoints. POSIX specifies two types: TRAP_BRKPT and
TRAP_TRACE. I don't want to introduce a new third type, as it might be
unportable across ports and its usability is questionable (I would limit
it myself to curiosity without significant real-life impact).

7. Linux has interfaces to allocate (reserve) watchpoints for in-kernel
usage and for user-land usage... I think it's unnecessary complexity for
little gain. In case someone uses in-kernel hardware watchpoints, I
just recommend to stop setting them for userland (it's doable with a
single if() condition...).

8. The design for the amd64 port is as follows:
 - track watchpoints in a private LWP structure, within a table,
 - all threads before entering userland call userret() - at the end of
this function check if a thread has active watchpoints and isn

Re: ptrace(2) thoughts and design

2016-12-01 Thread Kamil Rytarowski
On 26.11.2016 04:31, Kamil Rytarowski wrote:
> On 22.11.2016 07:05, Kamil Rytarowski wrote:
>>
>> On 21.11.2016 05:23, Kamil Rytarowski wrote:
>> [...]
>>> My plan for the coming days:
>>>
>>> A. Add introductory man-pages for pthread_dbg, currently just for the
>>> used functions in the existing ATF tests, as other interfaces might be
>>> altered later... or just dropped as unneeded. This library keeps having
>>> dept from Scheduler Activation times, and that shall be just revamped.
>>>
>>
>> I consider this finished (as in good enough for now). I switch to CPU
>> (amd64) debug registers tomorrow.
>>
>> I plan to keep verifying and improving pthread_dbg(3) later during work
>> on LLDB.
>>
>>> B. Implement debug registers, base this code on FreeBSD. Add ATF tests,
>>> commit to master repository.
>>>
>>> C. Implement locally PT_SUSPEND and keep it on a local branch.
>>>
>>> D. Implement locally PTRACE_VFORK (right now just for calling vfork(2)
>>> and for creating a child) and keep it on a local branch.
>>>
>>> ... switch to LLDB
>>>
>>
> 
> I would trade PT_SUSPEND and PTRACE_VFORK for now for more extensive
> implementation of cpu debug registers on x86 - for i386 and amd64 at
> once. It's more invasive than I estimated and since it's started I will
> finish it.
> 
> C. & D. weren't planned to be committed before implementing in lldb anyway.
> 

I've published a draft internally for debug registers on amd64 and I'm
working towards finalizing it. I need to add amd64 (and i386)
specific CPU Debug Registers tests in ATF. Once committed to src I will
publish a summary on the TNF blog.





Re: ptrace(2) thoughts and design

2016-11-25 Thread Kamil Rytarowski
On 22.11.2016 07:05, Kamil Rytarowski wrote:
> 
> On 21.11.2016 05:23, Kamil Rytarowski wrote:
> [...]
>> My plan for the coming days:
>>
>> A. Add introductory man-pages for pthread_dbg, currently just for the
>> used functions in the existing ATF tests, as other interfaces might be
>> altered later... or just dropped as unneeded. This library keeps having
>> dept from Scheduler Activation times, and that shall be just revamped.
>>
> 
> I consider this finished (as in good enough for now). I switch to CPU
> (amd64) debug registers tomorrow.
> 
> I plan to keep verifying and improving pthread_dbg(3) later during work
> on LLDB.
> 
>> B. Implement debug registers, base this code on FreeBSD. Add ATF tests,
>> commit to master repository.
>>
>> C. Implement locally PT_SUSPEND and keep it on a local branch.
>>
>> D. Implement locally PTRACE_VFORK (right now just for calling vfork(2)
>> and for creating a child) and keep it on a local branch.
>>
>> ... switch to LLDB
>>
> 

I would trade PT_SUSPEND and PTRACE_VFORK for now for a more extensive
implementation of CPU debug registers on x86 - for i386 and amd64 at
once. It's more invasive than I estimated, and since it's started I will
finish it.

C. & D. weren't planned to be committed before implementing in lldb anyway.





Re: ptrace(2) thoughts and design

2016-11-21 Thread Kamil Rytarowski

On 21.11.2016 05:23, Kamil Rytarowski wrote:
[...]
> My plan for the coming days:
> 
> A. Add introductory man-pages for pthread_dbg, currently just for the
> used functions in the existing ATF tests, as other interfaces might be
> altered later... or just dropped as unneeded. This library keeps having
> dept from Scheduler Activation times, and that shall be just revamped.
> 

I consider this finished (as in good enough for now). I switch to CPU
(amd64) debug registers tomorrow.

I plan to keep verifying and improving pthread_dbg(3) later during work
on LLDB.

> B. Implement debug registers, base this code on FreeBSD. Add ATF tests,
> commit to master repository.
> 
> C. Implement locally PT_SUSPEND and keep it on a local branch.
> 
> D. Implement locally PTRACE_VFORK (right now just for calling vfork(2)
> and for creating a child) and keep it on a local branch.
> 
> ... switch to LLDB
> 





ptrace(2) thoughts and design

2016-11-20 Thread Kamil Rytarowski
In short, we are already in a good position with the existing ptrace(2)
interfaces, as most of the functions needed by LLDB are representable by
existing NetBSD-specific interfaces.

I didn't want to implement (commit) new interfaces prior to matching them
with real-life software like LLDB in a real and testable code-path.


1. fork(2) events are implementable with EVENT_MASK (PT_SET_EVENT_MASK,
PT_GET_EVENT_MASK, PT_GET_PROCESS_STATE). Event name is PTRACE_FORK.

2. vfork(2) events should be implemented with EVENT_MASK and the event
name PTRACE_VFORK, using the same ptrace(2) functions and structures as
for fork(2).

I'm holding off, as there is an additional event used in the FreeBSD and
Linux world to report the vfork(2) parent's continuation after the
child's termination. Both events can be combined under the same
PTRACE_VFORK type; just set pe_other_pid to an invalid number when the
child exits to distinguish them.

3. New interfaces for PT_LWPINFO are not needed, as this call can
already iterate over a list of all LWPs. This ptrace(2) call has a
different meaning from the FreeBSD one: in FreeBSD it returns the LWP that
stopped the process (and the one that switched to the debugger), while in
NetBSD it retrieves the next LWP according to the following rule:

pl_lwpid contains a thread LWP ID.  Information is
returned for the thread following the one with the
specified ID in the process thread list, or for the first
thread if pl_lwpid is 0.  Upon return pl_lwpid contains the
LWP ID of the thread that was found, or 0 if there is no
thread after the one whose LWP ID was supplied in the call.

This interface is planned to be used when pthread_dbg is
unavailable, as pthread_dbg has dedicated functions for the same purpose
(namely td_thr_iter()).
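
The iteration rule quoted above can be modeled in a few lines. This is
only a userland model of the rule, not the kernel code: next_lwp() is a
hypothetical stand-in for a PT_LWPINFO call, with the thread list passed in
as a plain array so the sketch runs anywhere. How the real call reacts to
an unknown LWP ID is not modeled; the sketch just reports "no more
threads".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the lookup rule: return the LWP ID following `prev` in the
 * thread list, the first ID if `prev` is 0, or 0 if nothing follows.
 */
static int
next_lwp(const int *lwps, size_t n, int prev)
{
	assert(lwps != NULL || n == 0);

	if (prev == 0)
		return n > 0 ? lwps[0] : 0;
	for (size_t i = 0; i < n; i++)
		if (lwps[i] == prev)
			return i + 1 < n ? lwps[i + 1] : 0;
	return 0; /* unknown ID: treated as "no more threads" in this model */
}

/* Walk the whole list the way a debugger would: start at 0, stop at 0. */
static size_t
count_lwps(const int *lwps, size_t n)
{
	size_t count = 0;

	for (int id = next_lwp(lwps, n, 0); id != 0; id = next_lwp(lwps, n, id))
		count++;
	return count;
}
```

A debugger loop over the real interface has the same shape: set pl_lwpid
to 0, call PT_LWPINFO repeatedly, and stop when pl_lwpid comes back as 0.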

4. There is no need to extend struct ptrace_lwpinfo to the size of
FreeBSD's... as pthread_dbg does the job of inspecting threads
(thread name, sigmask, ...).

5. I want to use pthread_dbg for the following operations:
 - inspect threads (retrieve its name, sigmask, whether there are
waiters etc)
 - set/get registers per lwp
 - suspend/resume thread

6. Implementation of the thread resume/suspend operation in pthread_dbg
is trivial (I keep it locally) and it's based on new callback functions
SUSPEND() and RESUME():
 - for a local process, just call _lwp_continue(2) and _lwp_suspend(2)
there,
 - for a remote process, call ptrace(2) with PT_CONTINUE and the
[temporarily missing] PT_SUSPEND.

7. For non-pthread world keep fallback to native ptrace(2) interfaces
with the same features like in pthread_dbg minus inspection of threads.
Keep suspension, resume capability, get/set regs.. skip investigation of
sigmask, name of a process etc.

Usage of pthread_dbg largely simplifies interface between kernel and
user-space and deduplicates information, no need to pass it via the
ptrace_lwpinfo structure... just access it in pthread_t via pthread_dbg.

The main benefit of pthread_dbg is that we are in direct control of
pthread_t state and can read waiters for a thread, thread specific data etc.

8. If there is a need to retrieve more information on an LWP, especially
in a single-threaded program (or one using the lwp interface directly), I
would extend ptrace(2) to retrieve struct lwp in PT_LWPINFO in a new
field, next to pl_lwpid and pl_event... but I want to postpone it
to match real-life needs in real debuggers first.

9. To achieve thread suspension and continuation in 5. and 6., I need to
add PT_SUSPEND - as a counterpart to PT_CONTINUE. I want to add there
two modes analogously to PT_SUSPEND:
 - for positive values, suspend the whole process with pid "value"
 - for negative values, suspend lwp with identity "-value"
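
The sign convention proposed in item 9 can be sketched as a small decoder.
The struct and function names are hypothetical, and the value 0 is rejected
because the proposal does not say what it would mean.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

struct suspend_target {
	bool whole_process; /* true: suspend the whole process `id` */
	int  id;            /* PID if whole_process, otherwise LWP ID */
};

/* Hypothetical decoder for the proposed PT_SUSPEND "value" argument. */
static struct suspend_target
decode_suspend_value(int value)
{
	assert(value != 0);       /* 0 is not covered by the proposal */
	assert(value != INT_MIN); /* -value would overflow */

	struct suspend_target t;
	t.whole_process = value > 0;
	t.id = value > 0 ? value : -value;
	return t;
}
```

With this convention decode_suspend_value(1234) targets process 1234 as a
whole, while decode_suspend_value(-3) targets only LWP 3.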

10. I'm evaluating addition of new types of pl_event (in ptrace_lwpinfo)
- it describes what stopped a thread. Next to PL_EVENT_NONE and
PL_EVENT_SIGNAL, I would add:
 - PL_EVENT_TRACER thread suspended by a debugger calling PT_SUSPEND
 - PL_EVENT_LWP thread suspended by _lwp_suspend(2) from user-space

As before, I'm not sure whether these new PL_EVENT types will be
used and useful at all in real applications, so I will hold off on them.

11. Add support for CPU debug registers. Unlike the above parts, this
one could be clearly ported as is from FreeBSD right now. It's also very
useful and needed to set watchpoints in memory.

These are currently used by all supported LLDB targets but NetBSD, so:
Linux, Android, MacOSX, Windows, FreeBSD.

12. There are code paths for the SIGINFO event in LLDB. I haven't
evaluated them so far, as it looks like a low priority for now. In the
worst case there will be a need to add a new EVENT_TYPE for SIGINFO and
pass the siginfo struct there, but it's not certain and I have deferred
it (if ever). At first glance we can just capture a regular signal,
compare it with regular functions and handle the ksiginfo struct with..
perhaps a wait6(2)-like function. It's not researched and for now skipped.

My plan for the coming days:

A. Add introductory man-pages for pthread_dbg, currently

Re: LLGS for Free/NetBSD (was: Re: [PATCH] D25756: FreeBSD ARM support for software single step.)

2016-10-24 Thread Kamil Rytarowski
On 24.10.2016 20:38, Ed Maste wrote:
> On 24 October 2016 at 06:26, Pavel Labath  wrote:
>>
>> It's not my place to tell you how to work, but I'd recommend a
>> different approach to this. If you base your work on the current
>> FreeBSD in-process plugin, then when you get around to actually
>> implementing remote support, you will find that you will have to
>> rewrite most of what you have already done to work with lldb-server,
>> as it uses completely different class hierarchies and everything. I'd
>> recommend starting with lldb-server right away. It's going to be a bit
>> more work as (I assume) freebsd implementation is closer to what you
>> need than linux, but I think it will save you time in the long run. I
>> can help you with factoring out any linux-specific code that you
>> encounter.
> 
> I definitely second the approach Pavel suggests here, and am happy to
> work with others on refactoring the Linux lldb-server so that we can
> get it to support both FreeBSD and NetBSD at the same time.
> 
> A direct port of the current FreeBSD support probably would result in
> a basic level of support running sooner, but that work would be
> largely thrown away in a future migration to lldb-server.
> 

I will take your recommended path as it will lead to the same goal.

I will try to shorten my initial work on ptrace(2) leaving additional
features+tests for later and jump to lldb-server as soon as possible.

To start, before switching to the process plugin stage, I need to extend
NetBSD ptrace(2) with the following features:

 - PT_LWPINFO - extend struct ptrace_lwpinfo with the additional fields
used by LLDB in the current FreeBSD process code (pl_flags, pl_child_pid,
pl_siginfo),

 - PT_GETNUMLWPS - number of kernel threads associated with the traced
process,

 - PT_GETLWPLIST - get the current thread list,

 - PT_SUSPEND - suspend the specified thread,

 - PT_RESUME - resume the specified thread.

I need to add basic tests for new ptrace(2) calls in our automated test
infrastructure in order to get this code accepted.

I will reschedule debug registers and additional ptrace(2) calls for the
end, if time permits.

I will also add support in LLDB for handling NetBSD Real-Time signals
(SIGRTMIN..SIGRTMAX) as it was already implemented during the latest
GSoC for NetBSD (thanks Google!).

I might need some guidance from LLDB developers (I prefer IRC and
the dedicated LLDB channel) and maybe proofreading of patches and help
debugging issues. I consider that the difficult part is not adapting the
FreeBSD- or Linux-specific implementation for NetBSD, but getting
everything to work.

My ultimate deadline for the overall LLDB work is February 28th, 2017 -
as I'm switching to the Swift port for NetBSD *.



This work is sponsored by The NetBSD Foundation. If you like it, please
consider supporting it by making a donation.

* http://blog.netbsd.org/tnf/entry/funded_contract_2016_2017





Re: Generic crc32c support in kernel?

2016-08-13 Thread Kamil Rytarowski


On 13.08.2016 23:17, Thor Lancelot Simon wrote:
> On Fri, Aug 12, 2016 at 07:37:50PM +, paul_kon...@dell.com wrote:
>>
>> It seems sensible.  It could be done by a common CRC routine that takes a 
>> table pointer argument, then the two specific routines are just wrappers.
>>
> 
> It would be really nice to not pay the function call overhead on modern
> Intel CPUs where this is a single instruction.  Not sure this is possible
> in a generic kernel without resorting to binary patch, but we do that in
> other places...
> 
> Thor
> 
A good resource for CRC32C offload in hardware is the DPDK code. There
are AMD64 (SSE) and ARMv8 implementations.

This code is BSD-licensed.

http://dpdk.org/browse/dpdk/tree/lib/librte_hash/rte_hash_crc.h

It might be taken as-is with adaptions.
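
For the generic (software) fallback, the idea quoted above of one common
routine taking a table pointer, with two thin wrappers, could look roughly
like this. It is a minimal sketch and not the DPDK code: the function names
are made up, the tables are built lazily at run time for brevity (not
thread-safe), and a kernel version would more likely use static const
tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY  0xEDB88320u /* reflected IEEE 802.3 polynomial */
#define CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

/* Fill a 256-entry lookup table for a reflected CRC-32 polynomial. */
static void
crc_table_init(uint32_t table[256], uint32_t poly)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;
		for (int k = 0; k < 8; k++)
			c = (c & 1) ? (c >> 1) ^ poly : c >> 1;
		table[i] = c;
	}
}

/* Common routine: the table pointer selects the CRC variant. */
static uint32_t
crc_generic(const uint32_t table[256], uint32_t crc, const void *buf,
    size_t len)
{
	const uint8_t *p = buf;

	assert(table != NULL);
	assert(buf != NULL || len == 0);
	crc = ~crc;
	while (len--)
		crc = table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
	return ~crc;
}

static uint32_t crc32_tab[256], crc32c_tab[256];
static int crc_tabs_ready;

/* Lazy one-time table setup; not thread-safe, illustration only. */
static void
crc_tabs_setup(void)
{
	if (!crc_tabs_ready) {
		crc_table_init(crc32_tab, CRC32_POLY);
		crc_table_init(crc32c_tab, CRC32C_POLY);
		crc_tabs_ready = 1;
	}
}

/* The two specific routines are just wrappers. */
static uint32_t
crc32_sketch(uint32_t crc, const void *buf, size_t len)
{
	crc_tabs_setup();
	return crc_generic(crc32_tab, crc, buf, len);
}

static uint32_t
crc32c_sketch(uint32_t crc, const void *buf, size_t len)
{
	crc_tabs_setup();
	return crc_generic(crc32c_tab, crc, buf, len);
}
```

Both wrappers take the previous CRC as the first argument (0 for a fresh
computation), so a buffer can be processed in several chunks.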





Re: FWIW: sysrestrict

2016-07-26 Thread Kamil Rytarowski


On 23.07.2016 10:36, Maxime Villard wrote:
> Eight months ago, I shared with a few developers the code for a kernel
> interface [1] that can disable syscalls in user processes.
> 
> The idea is the following: a syscall bitmap is embedded into the ELF binary
> itself (in a note section, like PaX), and each time the binary performs a
> syscall, the kernel checks whether the syscall in question is allowed in
> the bitmap.
> 
> In details:
>  - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>number of syscalls. 0 means allowed, 1 means restricted.
>  - in the proc structure, 64 bytes are present, just a copy of the
>ELF section.
>  - when a syscall is performed, the kernel calls sysrestrict_enforce
>with the proc structure and the syscall number, and gives a look
>at the bitmap to make sure it is allowed. If it isn't, the process
>is killed.
>  - a new syscall is added, sysrestrict, so that programs can restrict
>a syscall at runtime. This might be useful, particularly if a
>program calls a syscall once and wants to make sure it is not
>allowed any longer.
>  - a userland tool (that I didn't write) can add and update such an ELF
>section in the binary.
> 
> This interface has the following advantages over most already-existing
> implementations:
>  - it is system-independent, it could almost be copied as-is in FreeBSD.
>  - it is syscall-independent, we don't need to patch each syscall.
>  - it does not require binaries to be recompiled.
>  - the performance cost is low, if not non-existent.
> 
> I've never tested this code. But in case it inspires or motivates someone.
> 
> [1] http://m00nbsd.net/garbage/sysrestrict/

I like this approach of not shipping an external toolchain for a new ABI
(CloudABI) and not patching and rebuilding software (pledge).

About the restrictions on paths (like prohibiting/permitting $HOME or
/etc access), how about making that a separate interface? It's currently
built into the pledge() interface:

"int pledge(const char *promises, const char *paths[]);"

That way people can use one or the other mechanism, or both. I think it
could also make sense to have compatibility support for the pledge()
interface - with an external libpledge library. This would require
the capability for an executable to drop access to previously allowed
syscalls.
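
A minimal model of the bitmap described in the quoted proposal, including
the one-way "drop" operation that a libpledge-style wrapper would need.
The names are hypothetical and this is not the sysrestrict code; the bit
order within each byte is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSYSCALLS_MODEL 512 /* 64 bytes * 8 bits, as in the proposal */

struct restrict_map {
	uint8_t bits[NSYSCALLS_MODEL / 8]; /* 0 = allowed, 1 = restricted */
};

static void
restrict_map_init(struct restrict_map *m)
{
	memset(m->bits, 0, sizeof(m->bits)); /* everything allowed */
}

/* One-way operation: a restricted syscall can never be re-allowed. */
static void
restrict_map_drop(struct restrict_map *m, unsigned int sysno)
{
	assert(sysno < NSYSCALLS_MODEL);
	m->bits[sysno / 8] |= (uint8_t)(1u << (sysno % 8));
}

/* Check performed on every syscall entry in this model. */
static bool
restrict_map_allowed(const struct restrict_map *m, unsigned int sysno)
{
	if (sysno >= NSYSCALLS_MODEL)
		return false; /* out of range: deny */
	return (m->bits[sysno / 8] & (1u << (sysno % 8))) == 0;
}
```

Because there is no operation that clears a bit, a process can only ever
narrow its own set of permitted syscalls.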





Re: Audio - In kernel audio mixing

2016-05-15 Thread Kamil Rytarowski
On 15.05.2016 22:23, Timo Buhrmester wrote:
>> I believe that the vaudio approach is better and wanted to start a 
>> discussion 
>> about in kernel-mixing and hopefully which approach (if any) should be 
>> included in NetBSD in future.
> A third option would be taking OpenBSD's sndiod (which we have in
> pkgsrc/wip); it seems rather sane and it's probably not all too much
> work to make support our OSS (there are two or three ioctls missing).
> 

Onno was working on it. I got positive feedback on sndio from ardour
developers.


Re: Improvements in amd64

2016-05-13 Thread Kamil Rytarowski
On 13.05.2016 12:53, Maxime Villard wrote:
> I've committed several improvements in amd64 these last days.
> 

Thank you for working on it!

One question, have you got plans for W^X stacks? I'm asking because I
still find trampolines/nested functions useful.


Re: Revamping directory structure of acpica

2015-12-10 Thread Kamil Rytarowski

On 22.10.2015 03:04, Kamil Rytarowski wrote:
> I request to synchronize our in-tree directory structure with
> upstream acpica.
> 
> At the moment we are moving files and directories around, it makes 
> harder to upstream our local patches or compare our version with
> the upstream one. I know well the reason for our current design, it
> used to be unstable in the past, but in recent years it's mostly
> unaltered.
> 

I'm going to perform this over the weekend.


Revamping directory structure of acpica

2015-10-21 Thread Kamil Rytarowski

I request to synchronize our in-tree directory structure with upstream
acpica.

At the moment we are moving files and directories around, which makes it
harder to upstream our local patches or compare our version with the
upstream one. I know well the reason for our current design: it used
to be unstable in the past, but in recent years it's mostly unaltered.


Re: New sysctl entry: proc.PID.realpath

2015-09-30 Thread Kamil Rytarowski

On 07.09.2015 03:50, Kamil Rytarowski wrote:
> I'm proposing a new sysctl(7) entry: proc.PID.realpath.
> 
> It's modeled after FreeBSD's kern.proc.pathname [1].
> 

For reference, the missing sysctl was implemented by Christos
(thanks!) as KERN_PROC_PATHNAME.


Re: Preproc condition for GCC 2.x

2015-09-17 Thread Kamil Rytarowski

On 18.09.2015 00:53, Michael McConville wrote:
> Joerg Sonnenberger wrote:
>> On Mon, Sep 14, 2015 at 11:15:34PM -0400, Michael McConville
>> wrote:
>>> I suspect that this preproc condition isn't necessary anymore?
>>> It's in sys/sys/device.h:246.
>> 
>> We require C99 support for the kernel, so no, just use FMA.
> 
> Would the devs be interested in a bulk patch to remove these
> conditions?
> 

It's reasonable.


Re: New sysctl entry: proc.PID.realpath

2015-09-15 Thread Kamil Rytarowski

On 07.09.2015 17:58, Jean-Yves Migeon wrote:
> Hello there,
> 
> Le 2015-09-07 12:24, Kamil Rytarowski a écrit :
>> I'm here to get the support for it. At the moment it (cache nits)
>> exceeds my comprehension too.
>> 
>> Are the other bits ok? KAUTH usage,
> 
> I wouldn't create an action/subaction (AUTH_PROCESS_REALPATH and 
> KAUTH_REQ_PROCESS_REALPATH_GET) specifically for this sysctl. I 
> think you could get this information through other code paths 
> combined with find(1) (like fstat(1)ing the process and find the 
> dev/inode associated with "text"). Adding access restrictions to 
> this sysctl means you have to kauth-audit the other paths too.
> 

Do you mean that if a user can access (fstat(1)) a file, then they should
see its entry in the exec pathname in this sysctl(7)?

I was following the rules of corename here.

>> colonization kern_resource.c etc.
> 
> Shouldn't it be in kern_proc.c?
> 

Perhaps yes, I was inspired by corename here too.

Thanks!


Re: New sysctl entry: proc.PID.realpath

2015-09-14 Thread Kamil Rytarowski

Thank you for your insight!

On 07.09.2015 13:32, Robert Elz wrote:
> Date:Mon, 07 Sep 2015 12:24:58 +0200 From:    Kamil 
> Rytarowski  Message-ID:  <55ed65fa.1000...@gmx.com>
> 
> | I'm here to get the support for it. At the moment it (cache nits)
> | exceeds my comprehension too.
> 
> What is the semantic you're hoping to provide?   The path that was 
> used to exec the process, or the path that would be used now to
> get the same one?   [The difference is if the binary has been
> (re)moved between when it was exec'd and now.]
> 

The path that was used to execute the process. sysctl(7) can fail
gracefully and its failure is easier to handle (through the proper
interface) than the almost undefined behavior of /proc/PID/exe.

> The exec time path is easier, and must exist (or the exec would 
> have failed), the current path is more useful, but not guaranteed 
> to exist, and much harder to find.
> 

I'm completely OK with possible failure if the file was moved or removed.

I don't see a need to scan the disk for its possible new location or
name. I prefer to receive a failure than to get it unexpectedly located
somewhere else.

> Which is wanted depends upon what the real purpose of this is - if 
> it is just to replace the entry in /proc then before doing that
> ask what use that has - does anything really use it, and if so,
> for what, and how well does it work.
> 

I started to search for its usage (readlink over /proc/PID/exe):
1. gdb
2. lldb
3. valgrind
4. Chromium
5. tup
6. GNUStep-base
7. Wireshark
8. Firefox
9. Openglad
10. Alcextra
11. Cafu
12. Caster
13. Physfs
14. jruby-launcher
15. CDE

And then I stopped as there was a set of over 100 more Open Source
projects. [1]

As for what /proc/PID/exe is used for, let me cite the comment from
CDE [2]:

  // super hack!  if the program is trying to access the special
  // /proc/self/exe file, return perceived_program_fullpath if
  // available, or else cde-exec will ERRONEOUSLY return the path
  // to the dynamic linker (e.g., ld-linux.so.2).
  //
  // programs like 'java' rely on the value of /proc/self/exe
  // being the true path to the executable, in order to dynamically
  // load libraries based on paths relative to that full path!
  char is_proc_self_exe = (strcmp(filename, "/proc/self/exe") == 0);

  // another super hack!  programs like Google Earth
  // ('googleearth-bin') access /proc/self/exe as /proc//exe
  // where  is ITS OWN PID!  be sure to handle that case properly
  // (but don't worry about handling cases where  is the PID of
  // another process).
  //
  // (again, these programs use the real path of /proc//exe as
  // a basis for dynamically loading libraries, so we must properly
  // 'fake' this value)
  char* self_pid_name = format("/proc/%d/exe", tcp->pid);

> Others have been suggesting that the use is for debuggers (gdb or 
> whatever). If that's it, and the only reason, then personally I'd 
> just forget it. If someone is debugging a process, but can't even 
> work out where the binary is (or is too lazy to type it) then I'd 
> suggest that they ought to just give up...
> 
> On the other hand, if it is so the debugger can verify that it is
> working from the correct binary file, then the pathname isn't needed,
> just the  pair, which is the true name of a unix file
> (pathnames are just a human interface improvement ... and a bit of
> added security). With just that, the debugger can verify that the
> file running (assuming that sysctl (or /proc) provide that info) is
> the one that the user told them is to be debugged, and issue a
> warning (or whatever) if they don't match.
> 
> Alternatively, perhaps there's some other use ?
> 
> kre
> 

This sysctl(7) happened to be handy for libproc, as a missing puzzle piece
for our userland DTrace support. I don't want to dwell on every
legitimate usage, as the set of software using it is very wide; I
would rather focus now on providing the software -- in my case lldb(1)
-- with what it expects and let it take over from there.

To sum it up, the overview gave me the idea that proc.PID.realpath:
- must be an absolute path,
- is preferably canonicalized (links, '.' and '..' resolved) - if so then
  call it .realpath, if not then .pathname (to be compatible with FreeBSD),
- is the pathname as of execution time,
- for a missing, moved, removed or replaced file - lets the consumer of
  this interface handle it on his or her own.

There are in-kernel issues with long path names, as we store limited
number of characters. I would leave this as it is, as the current
limits handle sane number of charac

Re: New sysctl entry: proc.PID.realpath

2015-09-07 Thread Kamil Rytarowski

On 07.09.2015 10:47, Stephan wrote:
> Wasn't this the same as with RPATH and the name cache?
> 
> 2015-09-07 9:23 GMT+02:00 Martin Husemann :
>> On Mon, Sep 07, 2015 at 03:50:21AM +0200, Kamil Rytarowski
>> wrote:
>>> + error = vnode_to_path(path, MAXPATHLEN,vp, l, p);
>> 
>> Two nits:
>> 
>> 1) vnode_to_path(9) is undocumented 2) it only works if you are
>> lucky (IIUC) - which you mostly are
>> 
>> The former is easy to fix, the latter IMHO is a killer before we
>> expose this interface prominently and make debuggers depend on
>> it.

I'm here to get the support for it. At the moment it (cache nits)
exceeds my comprehension too.

Are the other bits ok? KAUTH usage, colonization kern_resource.c etc.

>> We then should also make $ORIGIN work in ld.elf_so ;-}
>> 

It's scheduled.


New sysctl entry: proc.PID.realpath

2015-09-06 Thread Kamil Rytarowski

I'm proposing a new sysctl(7) entry: proc.PID.realpath.

It's modeled after FreeBSD's kern.proc.pathname [1].

$ sysctl proc.201.realpath
proc.201.realpath = /usr/pkg/bin/cscope
$ sysctl proc.1.realpath
sysctl: proc.1.realpath: Operation not permitted
$ sysctl proc.curproc.realpath
proc.curproc.realpath = /sbin/sysctl

This sysctl interface is useful for debuggers, namely gdb(1) [2] and
lldb(1) [3], and for virtual machine runtimes (like Dart [4]).

The original NetBSD (and formerly FreeBSD) approach was to call
readlink(2) on /proc/%d/exe (perhaps a Linux-like style):

char *
nbsd_pid_to_exec_file (struct target_ops *self, int pid)
{
  ssize_t len;
  static char buf[PATH_MAX];
  char name[PATH_MAX];

  xsnprintf (name, PATH_MAX, "/proc/%d/exe", pid);
  len = readlink (name, buf, PATH_MAX - 1);
  if (len != -1)
    {
      buf[len] = '\0';
      return buf;
    }

  return NULL;
}

I'm suggesting independence from procfs here, because it gives safer
access (/proc/%d/exe can be read with readlink(1) by anyone), avoids
hardcoded paths, and does not depend on /proc being available.
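
A minimal userland sketch of how a consumer might prefer the sysctl and fall
back to procfs. Assumptions: the node name proc.curproc.realpath is the one
proposed in this mail and may not exist on any given kernel, sysctlbyname(3)
is assumed to resolve it, self_exec_path() is a hypothetical helper name, and
the fallback needs a mounted /proc.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __NetBSD__
#include <sys/param.h>
#include <sys/sysctl.h>
#endif

/*
 * Hypothetical helper: store the calling process's executable path in
 * buf, NUL-terminated.  Returns 0 on success, -1 on failure or when the
 * path may not have fit in the buffer.
 */
int
self_exec_path(char *buf, size_t buflen)
{
	assert(buf != NULL);
	if (buflen == 0)
		return -1;
#ifdef __NetBSD__
	/* Assumption: node name as proposed in this mail. */
	size_t len = buflen;
	if (sysctlbyname("proc.curproc.realpath", buf, &len, NULL, 0) == 0 &&
	    len > 0 && len <= buflen && buf[len - 1] == '\0')
		return 0;
#endif
	/* procfs fallback; readlink(2) does not NUL-terminate. */
	ssize_t n = readlink("/proc/self/exe", buf, buflen - 1);
	if (n == -1 || (size_t)n >= buflen - 1)
		return -1;	/* error, or possibly truncated */
	buf[n] = '\0';
	return 0;
}
```

Unlike the static-buffer version above, the caller owns the buffer, and a
result that fills it completely is treated as a failure rather than returned
silently truncated.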

The kernel patch is attached to this mail.

[1] https://www.freebsd.org/cgi/man.cgi?sysctl%283%29
[2] src/external/gpl3/gdb/dist/gdb/nbsd-nat.c
[3] https://github.com/llvm-mirror/lldb/blob/master/source/Host/freebsd/HostInfoFreeBSD.cpp#L75
[4] https://github.com/dart-lang/sdk/issues/24302
Index: kern/kern_proc.c
===
RCS file: /public/netbsd-rsync/src/sys/kern/kern_proc.c,v
retrieving revision 1.193
diff -u -r1.193 kern_proc.c
--- kern/kern_proc.c12 Jul 2014 09:57:25 -  1.193
+++ kern/kern_proc.c6 Sep 2015 19:26:59 -
@@ -294,6 +294,7 @@
 
case KAUTH_PROCESS_CORENAME:
case KAUTH_PROCESS_STOPFLAG:
+   case KAUTH_PROCESS_REALPATH:
if (proc_uidmatch(cred, p->p_cred) == 0)
result = KAUTH_RESULT_ALLOW;
 
Index: kern/kern_resource.c
===
RCS file: /public/netbsd-rsync/src/sys/kern/kern_resource.c,v
retrieving revision 1.174
diff -u -r1.174 kern_resource.c
--- kern/kern_resource.c18 Oct 2014 08:33:29 -  1.174
+++ kern/kern_resource.c6 Sep 2015 21:22:52 -
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -898,6 +899,65 @@
 }
 
 /*
+ * sysctl_proc_realpath: helper routine to get or set the absolute pathname
+ * for a process specified by PID.
+ */
+static int
+sysctl_proc_realpath(SYSCTLFN_ARGS)
+{
+   struct proc *p;
+   struct sysctlnode node;
+   int error;
+   struct vnode *vp;
+   char path[MAXPATHLEN];
+
+   /* First, validate the request. */
+   if (namelen != 0 || name[-1] != PROC_PID_REALPATH) {
+   return EINVAL;
+   }
+
+   /* Find the process.  Hold a reference (p_reflock), if found. */
+   error = sysctl_proc_findproc(l, (pid_t)name[-2], &p);
+   if (error) {
+   return error;
+   }
+
+   /* XXX-elad */
+   error = kauth_authorize_process(l->l_cred, KAUTH_PROCESS_CANSEE, p,
+   KAUTH_ARG(KAUTH_REQ_PROCESS_CANSEE_ENTRY), NULL, NULL);
+   if (error) {
+   goto done;
+   }
+
+   error = kauth_authorize_process(l->l_cred,
+   KAUTH_PROCESS_REALPATH, p,
+   KAUTH_ARG(KAUTH_REQ_PROCESS_REALPATH_GET), NULL, NULL);
+   if (error) {
+   goto done;
+   }
+
+   vp = p->p_textvp;
+   if (vp == NULL) {
+   error = EINVAL;
+   goto done;
+   }
+
+   error = vnode_to_path(path, MAXPATHLEN,vp, l, p);
+   if (error) {
+   goto done;
+   }
+
+   node = *rnode;
+   node.sysctl_data = path;
+   node.sysctl_size = strlen(path) + 1;
+   error = sysctl_lookup(SYSCTLFN_CALL(&node));
+
+done:
+   rw_exit(&p->p_reflock);
+   return error;
+}
+
+/*
  * sysctl_proc_stop: helper routine for checking/setting the stop flags.
  */
 static int
@@ -1116,4 +1176,10 @@
   SYSCTL_DESCR("Stop process before completing exit"),
  

Re: Interrupt flow in the NetBSD kernel

2015-06-22 Thread Kamil Rytarowski
On 22.06.2015 19:07, David Young wrote:
> On Sun, Jun 21, 2015 at 08:01:47AM -0700, Matt Thomas wrote:
>>
>>> On Jun 21, 2015, at 7:30 AM, Kamil Rytarowski  wrote:
>>>
>>> I have got few questions regarding the interrupt flow in the kernel.
>>> Please tell whether my understanding is correct.
>>
>> You are confusing interrupts with exceptions.  Interrupts are 
>> asynchronous events.  Exceptions are (usually) synchronous and
>> are the result of an instruction.
> 
> I took Kamil's question to be, "When interrupts at the highest priority
> level are blocked, can control flow still be interrupted?  How?"  The
> answer to the question is yes.  Both synchronous events (exceptions,
> such as "data abort" on ARM) and asynchronous events (non-maskable
> interrupts, such as NMI on x86) can interrupt control flow.
> 
> Dave
> 

Thanks! I had already realized that; it's worth putting in locking(9).


Re: New manpage: locking(9)

2015-06-22 Thread Kamil Rytarowski
On 22.06.2015 18:43, Thor Lancelot Simon wrote:
> On Sat, Jun 20, 2015 at 03:18:48AM +0200, Kamil Rytarowski wrote:
>>
>> I see no reason to capitulate and drop the original naming, refreshed
>> for the current kernel design in favor of some invented linuxism.
> 
> You're going to cause massive confusion if you write documentation
> intended for kernel beginners that uses the terms "top half" and
> "bottom half" to mean something different than Linux means.  Like it or
> not, the Linux use of these terms is the prevalent one and has been for
> a decade or more.
> 
> Thor
> 

It's sufficient to describe what happens on the other penguin hemisphere
(where things are upside down) and how it differs from the
daemon side. The comparison will give the full picture.


Re: Interrupt flow in the NetBSD kernel

2015-06-21 Thread Kamil Rytarowski
On 21.06.2015 17:01, Matt Thomas wrote:
> 
>> On Jun 21, 2015, at 7:30 AM, Kamil Rytarowski  wrote:
>>
>> I have got few questions regarding the interrupt flow in the kernel.
>> Please tell whether my understanding is correct.
> 
> You are confusing interrupts with exceptions.  Interrupts are 
> asynchronous events.  Exceptions are (usually) synchronous and
> are the result of an instruction.
> 

Thank you for your clarification!


Interrupt flow in the NetBSD kernel

2015-06-21 Thread Kamil Rytarowski
I have a few questions regarding the interrupt flow in the kernel.
Please tell me whether my understanding is correct.

There are software and hardware interrupts.
Some of the hardware interrupts are maskable with the spl(9) levels.
Some are unmaskable and must be handled unconditionally, like the
data abort exception on ARM.
Hardware interrupts are handled by the hardware interrupt handler.
System calls (syscalls) and softint(9) are software interrupts handled
by the same software interrupt handler.
Syscalls come from userland with the user address space context;
softint(9) come from the kernel with the kernel address space context.

The spl(9) calls mask maskable interrupts, both software and hardware
ones, with the exception of the unmaskable ones, such as data abort on ARM.

There are three contexts in the kernel:
- hardware interrupt (within hardware interrupt handler),
- software interrupt (within software interrupt handler) for syscalls
and softint(9),
- thread context for LWP (lightweight processes).

The bottom half (BSD naming) is responsible for the hardware interrupts; the
top half (BSD naming) is responsible for the software and thread contexts.

A process is heavy, with its own user address space, and runs in
user space; a thread is lightweight, with a kernel address space shared by
all threads. The kernel can access the whole physical memory, but doesn't
know the user address mapping. There is one process running in the
kernel address space -- proc0 = swapper.

How does the spl(9) interrupt masking physically work for software
interrupts? On ARM, svc (or monitor calls) aren't maskable the way IRQ is;
IRQ is a type of exception (ARM naming) and a hardware interrupt
(kernel naming).

I'm trying to get the big picture first, before getting to details.

When I look into the details, there are things I don't get, like line 268
here:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/arm/arm32/exception.S?annotate=1.17.2.2
Is it a leftover from line 252 that should be erased?

Back to the big picture. How does IPL_SOFT technically work; does it mask
syscalls and softint(9) the same way? If they are not maskable (to my
understanding), are we scheduling them in some sort of queue or stack,
waiting for the spl(9) level to change?
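
To make the last question concrete, here is a toy model of the "keep it
pending until the level drops" idea. It is only a sketch of one possible
design, not a description of the NetBSD implementation: every name below
(toy_cpu, toy_splraise, toy_splx, toy_post, the LVL_* values) is hypothetical,
handlers are reduced to counters, and syscalls are not modelled at all.

```c
#include <assert.h>

/* Hypothetical levels for this model only (not NetBSD's IPL_* values). */
enum { LVL_NONE = 0, LVL_SOFT = 1, LVL_HARD = 2, LVL_COUNT = 3 };

struct toy_cpu {
	int cur_level;			/* levels <= this are blocked */
	unsigned pending;		/* bit n set: level n is waiting */
	unsigned handled[LVL_COUNT];	/* times each level's handler ran */
};

/* Run every pending interrupt above the current level, highest first. */
static void
toy_dispatch(struct toy_cpu *cpu)
{
	for (int lvl = LVL_COUNT - 1; lvl > cpu->cur_level; lvl--) {
		if (cpu->pending & (1u << lvl)) {
			cpu->pending &= ~(1u << lvl);
			cpu->handled[lvl]++;
		}
	}
}

/* Raise the blocking level; returns the old level for toy_splx(). */
int
toy_splraise(struct toy_cpu *cpu, int level)
{
	int old = cpu->cur_level;

	assert(level >= 0 && level < LVL_COUNT);
	if (level > old)
		cpu->cur_level = level;
	return old;
}

/* Restore a previous level and deliver whatever was deferred meanwhile. */
void
toy_splx(struct toy_cpu *cpu, int old)
{
	cpu->cur_level = old;
	toy_dispatch(cpu);
}

/* Post an interrupt: it runs now if unblocked, else it stays pending. */
void
toy_post(struct toy_cpu *cpu, int level)
{
	assert(level > LVL_NONE && level < LVL_COUNT);
	cpu->pending |= 1u << level;
	toy_dispatch(cpu);
}
```

In this model a soft interrupt posted while the level is at LVL_SOFT is only
recorded in the pending mask, and it is delivered by toy_splx() once the level
is restored; a LVL_HARD interrupt posted in the same window runs immediately.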


Re: New manpage: locking(9)

2015-06-19 Thread Kamil Rytarowski
On 19.06.2015 01:06, Paul Goyette wrote:
> Great to have this.
> 
> I've attached a diff file with some minor wording/grammar changes.
> 

Thank you.

When I make progress in my research I will post a new version for review.

I will add notes about the halves anyway, as I can learn about them from the
Design and Implementation of the 4.4 BSD Operating System book. The same
term was used in its continuation for FreeBSD (from 2003) and in the 2nd
FreeBSD edition from 2014/2015.

I see no reason to capitulate and drop the original naming, refreshed
for the current kernel design in favor of some invented linuxism.

I will write more information about contexts and interrupt handling.

> WRT to the table of "applicability", I'm not sure I like having it say
> 
> mutex(9)    yes    depends    depends
> 
> Can we maybe specify the dependency?  Perhaps
> 
> mutex(9)    yes    ???        spin-mutex only
> 
> 

Right, I will make it more clear.

I can rename softirq and hardirq to full names, as these shortcuts
aren't used on NetBSD, what do you think?


Re: New manpage: locking(9)

2015-06-19 Thread Kamil Rytarowski
On 18.06.2015 22:33, Christos Zoulas wrote:
> In article 
> ,
> Kamil Rytarowski  wrote:
>> -=-=-=-=-=-
>>
>> I'm attaching a proposition of locking(9).
>>
>> It was inspired by:
>> http://leaf.dragonflybsd.org/cgi/web-man?command=locking&section=ANY
>> https://www.freebsd.org/cgi/man.cgi?query=locking%289%29
>>
>> And by this page:
>> http://www.feyrer.de/NetBSD/bx/blosxom.cgi/nb_20080409_0027.html
>>
>> I included some extra notes about the kernel design and contexts:
>> - thread context vs softirq context vs hardirq context,
>> - process vs kernel thread (LWP),
>> - top kernel half vs bottom kernel half.
>>
>> These details might be off topic, but I need them to understand the
>> overall design and the internal flow.
> 
> That's very nice. I would like to include information on what is the
> typical use for each one and also which ones are obsolete. I also think
> that *tsleep should be included in the docs (at least saying that it has
> been replaced by condvars).
> 

I will do it.

I was told that the mb(9) interface is deprecated, though I don't know why.
And indeed, I see it on just a few archs.

$ grep -r 'mb_write()' .
./arch/hppa/include/mutex.h:mb_write();

$ grep -r 'mb_memory()' .
./arch/alpha/include/mutex.h:#define MUTEX_GIVE(mtx) mb_memory()
./arch/hppa/include/mutex.h:mb_memory();
./arch/m68k/include/mutex.h:#define MUTEX_GIVE(mtx) mb_memory()
./arch/mips/include/lock.h: mb_memory();
./arch/mips/include/lock.h: mb_memory();
./arch/powerpc/include/mutex.h:#define MUTEX_GIVE(mtx) mb_memory()

$ grep -r 'mb_read()' .
./arch/alpha/include/mutex.h:#define MUTEX_RECEIVE(mtx) mb_read()
./arch/m68k/include/mutex.h:#define MUTEX_RECEIVE(mtx) mb_read()
./arch/mips/include/lock.h: mb_read();
./arch/mips/include/lock.h: mb_read();
./arch/powerpc/include/mutex.h:#define MUTEX_RECEIVE(mtx) mb_read()
./arch/sparc/include/lock.h:mb_read();
./arch/sparc64/include/mutex.h:#define MUTEX_RECEIVE(mtx) mb_read()
./arch/sparc64/include/rwlock.h:#define RW_RECEIVE(rw) mb_read()



Re: New manpage: locking(9)

2015-06-18 Thread Kamil Rytarowski
> I'm attaching a proposition of locking(9).
New version attached.

Changes:
1. I was told that kernel halves are not used in NetBSD.
2. ras(9) is for userland only, remove it from USAGE.

There are additional things to be done in intro(9):
1. Remove reference to dropped lock(9).
2. Include reference to pserialize(9).
.\" $NetBSD$
.\"
.\" Copyright (c) 2015 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Kamil Rytarowski.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"notice, this list of conditions and the following disclaimer in the
.\"documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd June 17, 2015
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd introduction to the kernel synchronization and interrupt control
.Sh DESCRIPTION
The
.Nx
kernel provides several synchronization and interrupt control primitives.
This manpage aims at giving an overview of these interfaces and their proper
application.
This document also covers basic kernel thread control primitives and gives a
rough overview of the
.Nx
kernel design.
.Sh KERNEL OVERVIEW
The aims of kernel synchronization, kernel thread and interrupt control are:
.Bl -bullet -offset indent
.It
To control concurrent access to shared resources (critical sections).
.It
To spawn tasks from an interrupt in the thread context.
.It
To mask interrupts from threads.
.It
To scale to multiple CPUs.
.El
.Pp
There are three types of contexts in the
.Nx
kernel:
.Bl -bullet -offset indent
.It
.Em Thread context
- here run processes (represented by
.Dv struct proc )
and light-weight processes (represented by
.Dv struct lwp
and known as kernel threads).
Code in this context can sleep, block resources and possess an address space.
.It
.Em Software interrupt context
- a limited thread context.
Code in this context must complete quickly.
These interrupts don't possess any address space context.
Software interrupts are a way of deferring hardware interrupts to do more
expensive processing at a lower interrupt priority.
.It
.Em Hard interrupt context
- code must complete as quickly as possible.
Code here is forbidden to sleep or to wait on resources that may take long to
become available.
.El
.Pp
The main differences between processes and kernel threads are:
.Bl -bullet -offset indent
.It
A single process can own multiple kernel threads (LWPs).
.It
A process possesses an address space context to map the userland address space.
.It
Processes are designed for userland executables and kernel threads for
in-kernel tasks.
The only process running in the kernel-space is
.Dv proc0
(called swapper).
.El
.Sh INTERFACES
The
.Nx
kernel is written to run across multiple unicore and multicore CPUs.
The following sections list the available interfaces alphabetically.
.Ss Atomic memory operations
The
.Nm atomic_ops
family of functions provides atomic memory operations.
There are 7 classes of atomic memory operations available:
addition, logical
.Dq and ,
compare-and-swap, decrement, increment, logical
.Dq or ,
swap.
.Pp
See
.Xr atomic_ops 3 .
.Ss Condition variables
Condition variables (CVs) are used in the kernel to synchronize access to
resources that are limited (for example, memory) and to wait for pending I/O
operations to complete.
.Pp
See
.Xr condvar 9 .
.Ss Memory access barrier operations
The
.Nm membar_ops
family of functions provides memory access barrier operations necessary for
synchronization in multiprocessor execution environments that have relaxed load
and store order.
.Pp
See
.Xr membar_ops 3 .
.Ss Memory barriers
The memory barriers can be used to control the order in which memory accesses
occur, and thus the order in which those accesse

New manpage: locking(9)

2015-06-18 Thread Kamil Rytarowski
I'm attaching a proposition of locking(9).

It was inspired by:
http://leaf.dragonflybsd.org/cgi/web-man?command=locking&section=ANY
https://www.freebsd.org/cgi/man.cgi?query=locking%289%29

And by this page:
http://www.feyrer.de/NetBSD/bx/blosxom.cgi/nb_20080409_0027.html

I included some extra notes about the kernel design and contexts:
- thread context vs softirq context vs hardirq context,
- process vs kernel thread (LWP),
- top kernel half vs bottom kernel half.

These details might be off topic, but I need them to understand the
overall design and the internal flow.

Please review!

.\" $NetBSD$
.\"
.\" Copyright (c) 2015 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Kamil Rytarowski.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"notice, this list of conditions and the following disclaimer in the
.\"documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd June 17, 2015
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd introduction to the kernel synchronization and interrupt control
.Sh DESCRIPTION
The
.Nx
kernel provides several synchronization and interrupt control primitives.
This manpage aims at giving an overview of these interfaces and their proper
application.
This document also covers basic kernel thread control primitives and gives a
rough overview of the
.Nx
kernel design.
.Sh KERNEL OVERVIEW
The aims of kernel synchronization, kernel thread and interrupt control are:
.Bl -bullet -offset indent
.It
To control concurrent access to shared resources (critical sections).
.It
To spawn tasks from the bottom half in the thread context.
.It
To scale to multiple CPUs.
.El
.Pp
There are three types of contexts in the
.Nx
kernel:
.Bl -bullet -offset indent
.It
.Em Thread context
- here run processes (represented by
.Dv struct proc )
and light-weight processes (represented by
.Dv struct lwp
and known as kernel threads).
Code in this context can sleep, block resources and possess an address space.
.It
.Em Software interrupt context
- a limited thread context.
Code in this context must complete quickly.
These interrupts don't possess any address space context.
Software interrupts are a way of deferring hardware interrupts to do more
expensive processing at a lower interrupt priority.
.It
.Em Hard interrupt context
- code must complete as quickly as possible.
Code here is forbidden to sleep or to wait on resources that may take long to
become available.
.El
.Pp
There are two model halves in the
.Nx
kernel:
.Bl -bullet -offset indent
.It
.Em Top half
- here runs code in the thread and soft interrupt context.
This half may be treated as a library for the userland processes.
.It
.Em Bottom half
- here runs code in the hard interrupt context.
The bottom half handles asynchronous hardware interrupts.
.El
.Pp
The main differences between processes and kernel threads are:
.Bl -bullet -offset indent
.It
A single process can own multiple kernel threads (LWPs).
.It
A process possesses an address space context to map the userland address space.
.It
Processes are designed for userland executables and kernel threads for
in-kernel tasks.
The only process running in the kernel-space is
.Dv proc0
(called swapper).
.El
.Sh INTERFACES
The
.Nx
kernel is written to run across multiple unicore and multicore CPUs.
The following sections list the available interfaces alphabetically.
.Ss Atomic memory operations
The
.Nm atomic_ops
family of functions provides atomic memory operations.
There are 7 classes of atomic memory operations available:
addition, logical
.Dq and ,
compare-and-swap, decrement, increment, logical
.Dq or ,
swap.
.Pp
See
.Xr atomic_ops 3 .
.Ss Condition variables
Condition variables (

Re: Removing ARCNET stuffs

2015-05-31 Thread Kamil Rytarowski
Antti Kantee wrote:
> On 31/05/15 06:05, matthew green wrote:
> > hi Andrew! :)
> >
> >> Who is appalled to discover that pc532 support has been removed!
> 
> In addition to toolchain support, the hardware was near-extinct at the 
> time of removal.
> 
> Now, the hardware is no longer near-extinct:
> http://cpu-ns32k.net/
> 
> I used the FPGA pc532 running NetBSD 1.5.x(?) a few weeks back. 
> Unbelievable experience, especially since I spent quite some time and 
> effort trying to get a pc532 I had on loan 10+ years ago to function.
> 
> > get your GCC and binutils and GDB pals to put the support back
> > in the toolchain and we'll have something to talk about :-)
> 
> Didn't know that things to *talk* about were short in supply...
> 

I was looking for the so-called open-source resources for the pc532. Can we
put them somewhere on the web again? The last time I checked (not so long
ago) they were off-line.

At the moment I'm focused on acorn26; I obtained RISCOS. This port should
be saved to keep the fun in computing. At work I deal with newer
ARM CPUs, but the basic ideas are the same in ARMv2/v3 (IRQ, FIQ, RISC,
etc.) - this isn't only fun but also good for learning - even though I
will be working just in the platform emulator.

Personally (sorry for this) I don't like pcc, so I started clank [1], a
clang/llvm clone in plain C with the goals of the smallest possible footprint,
supporting exclusively the C language, being portable (-lnbcompat), and
having parts interchangeable with the clang/llvm compiler - staying as close
to the original as possible in order to track upstream...
I want to save sun2, de-C++ the base (as an MK option) and in general get
acquainted with the llvm internals. This is a rather long-term project
done just in my spare time. A waste of time? Probably, but even at this very
early stage I can catch places for enhancement in the big brother
clang/llvm [2].

David, forking NetBSD? Teams are usually more powerful than individuals
and they win in the end. Forks are valuable only when they attract
new people and submit patches back upstream -- this is the relation
between DragonflyBSD and FreeBSD. There is also EdgeBSD [3], a good project
for prototyping features in git.

I can see market share to be gained on desktops; I believe we need better
support for recent DEs and a graphical installer (calamares.io is my choice
for a NetBSD livecd with preinstalled packages). As someone already stated,
if we aren't on desktops we aren't anywhere else. I have seen companies put
Ubuntu (ubuntu-core) even on PCI peripherals, just because people are
familiar with that system. This is the reason why I'm for winning back
the desktop.

What can be done to make the project easier for newcomers, from the
standpoint of a new wannabe developer? Improve the contribution platform -
well, I am aware that people are used to the same techniques as
in the nineties and have their standards set in stone.

I can see the following problems:
- no central location for patches -- several mailing-lists, PR,
  private-mails..
- no way to track pending patches -- except pinging developers..
- no standard patch format - extra work on developers to maintain it

My proposal (already expressed before) is to set up an official GitHub
mirror and accept patches and issues there, on dedicated git branches.
I have almost stopped contributing patches to projects that are outside
GitHub; mailing lists are an extra burden: registering an account, tracking
traffic, spam and general noise. With the GitHub platform the cost of
maintainership will be higher at the start and quite low later. There is no
need for extra internal infrastructure except a scheduled cvs2git script.

[1] https://github.com/krytarowski/clank
[2] 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150209/258953.html
[3] http://edgebsd.org/


Re: Removing ARCNET stuffs

2015-05-30 Thread Kamil Rytarowski
Johnny Billquist
> On 2015-05-28 21:19, Tom Ivar Helbekkmo wrote:
> > paul_kon...@dell.com writes:
> >
> >> And DECnet nodes exist around the Internet; the “Hobbyist DECnet”
> >> group (“hecnet”) is the main focus of that activity as far as I know.
> >
> > ...and while I'm sure Johnny Billquist can supply more details, and
> > correct me if I'm wrong, DECnet on NetBSD seems to me to be an active
> > component of the hecnet environment.
> 
> Nope. NetBSD do not run DECnet. I run a bridge program, which I 
> initially developed on NetBSD, but it runs on pretty much anything.
> 
> I also hack NetBSD/VAX on and off, but it's becoming more and more "off" 
> with every new development within NetBSD. But that's a different story.
> 
> Oh, and DECnet on Linux is not so great either, and I believe it has 
> been dropped from the main tree.
> But if anyone wants to try and get NetBSD to talk DECnet, Paul and me 
> can certainly help in many ways.
> 

I expressed my concerns regarding the DECnet removal to Ryota in a private mail.
I regret that I don't have enough resources to keep this code in the repos.



Re: (patch) Improved documentation and examples of dynamic modules

2015-05-13 Thread Kamil Rytarowski
Paul Goyette wrote:
>
> I have added the EXAMPLES sections.
> 

Thank you!

If it gets pulled up to -7, then please change "first appeared in NetBSD 8.0"
to 7.0 in src/sys/modules/examples/README.


Re: (patch) Improved documentation and examples of dynamic modules

2015-05-13 Thread Kamil Rytarowski
Joerg Sonnenberger wrote:
> On Wed, May 13, 2015 at 01:52:05PM +0200, Kamil Rytarowski wrote:
> > A FreeBSD developer studying our examples told me that we traditionally use
> > u_int in place of unsigned [int] - calling the usage of 'unsigned' a 
> > linuxism.
> > I have no opinions on it. How is it?
> 
> Personally, I consider the use of u_int a historic mistake, but it is
> widely used. Why the use of plain standard C is a Linuxism is beyond me.
> 

I sympathize with being free to use ISO C.

> > I think that src/sys/modules/examples/ping/cmd_ping.c has malformed RCSID,
> > there is missing ':' after NetBSD.
> 
> More like there should be no space. Fixed.
> 
Thanks!


The last thing, the original patch contained the following lines:
.Sh EXAMPLES
A list of example modules is located in
.Pa sys/modules/examples .

Please embed them (before the SEE ALSO section) in module.9,
driver.9 and intro.9lua this way:

.Sh EXAMPLES
A list of example modules is located in
.Pa sys/modules/examples .


.Sh EXAMPLES
A list of example drivers is located in
.Pa sys/modules/examples .


.Sh EXAMPLES
A list of example Lua modules is located in
.Pa sys/modules/examples .


There will be more examples, but not for the netbsd-7 branch (at least from me).


Re: (patch) Improved documentation and examples of dynamic modules

2015-05-13 Thread Kamil Rytarowski
Paul Goyette wrote:
> On Mon, 11 May 2015, Kamil Rytarowski wrote:
> 
> > I'm OK with this.
> > Could you please integrate my patch this way with src/sys?
> 
> The modules have been committed.
> 

Many thanks!

A FreeBSD developer studying our examples told me that we traditionally use
u_int in place of unsigned [int] - calling the usage of 'unsigned' a linuxism.
I have no opinions on it. How is it?

I can see both variations used in the kernel.

I think that src/sys/modules/examples/ping/cmd_ping.c has malformed RCSID,
there is missing ':' after NetBSD.

> > I'm attaching a small patch for the man-pages. It's:
> > - adding new linked pages for MODULE(9) -> module(9), modcmd(9) -> module(9)
> >  and uio(9) -> uiomove(9),
> > - including lua(9lua) reference in module(9).
> 
> I'm not so sure about the extra links.  Certainly you should be able to 
> find the right man pages with apropos?
> 
> 'apropos -C -9 MODULE'   first result is module(9)
> 'apropos modcmd'         returns only one result - module(9)
> 'apropos uio'            first returned result is uiomove(9)
> 

I wrote an article in which I attach a $function($SECTION) note to each function
or structure, to point out that documentation exists. The only ones still
lacking a page (not including Lua) are: MODULE, modcmd, uio and devsw_attach +
devsw_detach.

For the devsw pair please review and integrate the attached file.

That will be everything for now, excluding of course the Lua modules.

> I think it will be reasonable to add the lua cross reference.  But it 
> goes at the end of the list.  Sort order for SEE ALSO uses the 
> section number as the most significant sort field, then lexicographic 
> (alphabetic) sort on the name.  :)
> 

I see, my mistake!
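
For my own reference, the ordering rule quoted above can be sketched as a
qsort(3) comparator. This is only an illustration in userland C; the struct
and function names are made up, and comparing the section with strcmp(3)
assumes single-digit sections with an optional suffix (so "9" sorts before
"9lua"):

```c
#include <stdlib.h>
#include <string.h>

/* One SEE ALSO cross reference, e.g. { "module", "9" } (hypothetical type). */
struct xref {
	const char *name;
	const char *section;
};

/*
 * Section is the most significant key, the name breaks ties.
 * Assumption: sections are a single digit plus optional suffix,
 * so a plain string comparison gives the expected order.
 */
static int
xref_cmp(const void *a, const void *b)
{
	const struct xref *x = a;
	const struct xref *y = b;
	int c = strcmp(x->section, y->section);

	return c != 0 ? c : strcmp(x->name, y->name);
}

/* Sort an array of cross references into SEE ALSO order. */
static void
sort_see_also(struct xref *refs, size_t n)
{
	qsort(refs, n, sizeof(refs[0]), xref_cmp);
}
```

Applied to the list from the README, this puts modctl(2) first and
lua(9lua) last, with module(9) right before it.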

> > I have a request to back-port it to NetBSD-7 as I want to depend on their
> > existence. Thanks!
> 
> I have requested a pull-up for 7.0.
> 

Thank you!

If the man pages are enhanced, please include them in the pull-up request.

devsw_attach.9
Description: Binary data


Re: (patch) Improved documentation and examples of dynamic modules

2015-05-11 Thread Kamil Rytarowski
Paul Goyette wrote:
> On Mon, 11 May 2015, Kamil Rytarowski wrote:
> 
> > I've attached new patch.
> 
> Thanks - I will review as soon as I can get to it.
> 

Thank you!

> > For now I will schedule share/man pages for later.
> > After committing it please remove sys/modules/example.
> >
> > I went for share/examples/sys/kmodule.
> 
> Well, I've done some more thinking on that topic!  Greg Troxel has a 
> good comment about continuing to build all the example modules as part 
> of normal 'build.sh release'.  In order to do that, and to get all the 
> build goop right (Makefiles, Makefiles.inc, etc.) it will actually be 
> easier to place them in the sys/modules/ hierarchy.  So, one might say 
> that I've "had second thoughts."
> 
> I would now propose that we remove the existing sys/modules/example/ 
> code and place the several new example modules as subdirectories of the
> sys/modules/example/ directory.  Then the main sys/modules/Makefile can 
> descend into modules/examples, and a new Makefile there can further 
> descend into the multiple subdirectories.
> 
> I'll wait a few days (at least) before committing anything, to give the 
> rest of the community an opportunity to respond.
> 
> 

I'm OK with this.
Could you please integrate my patch this way with src/sys?


I'm attaching a small patch for the man-pages. It's:
- adding new linked pages for MODULE(9) -> module(9), modcmd(9) -> module(9)
  and uio(9) -> uiomove(9),
- including lua(9lua) reference in module(9).

I have a request to back-port it to NetBSD-7 as I want to depend on their
existence. Thanks!

Index: distrib/sets/lists/comp/mi
===
RCS file: /home/kamil/netbsd-cvs/netbsd/src/distrib/sets/lists/comp/mi,v
retrieving revision 1.1957
diff -u -r1.1957 mi
--- distrib/sets/lists/comp/mi  6 May 2015 15:57:07 -   1.1957
+++ distrib/sets/lists/comp/mi  11 May 2015 12:24:20 -
@@ -9741,6 +9741,7 @@
 ./usr/share/man/cat9/MGET.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/MGETHDR.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/MH_ALIGN.0  comp-sys-catman  .cat
+./usr/share/man/cat9/MODULE.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/M_ALIGN.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/M_COPY_PKTHDR.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/M_LEADINGSPACE.0  comp-sys-catman  .cat
@@ -10529,6 +10530,7 @@
 ./usr/share/man/cat9/microseq.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/microtime.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/microuptime.0  comp-sys-catman  .cat
+./usr/share/man/cat9/modcmd.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/module.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/module_autoload.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/module_builtin_require_force.0  comp-sys-catman  .cat
@@ -11001,6 +11003,7 @@
 ./usr/share/man/cat9/ubc_uiomove.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/ucas.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/ucom.0  comp-sys-catman  .cat
+./usr/share/man/cat9/uio.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/uiomove.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/ungetnewvnode.0  comp-sys-catman  .cat
 ./usr/share/man/cat9/untimeout.0  comp-sys-catman  .cat
@@ -16554,6 +16557,7 @@
 ./usr/share/man/html9/MGET.html  comp-sys-htmlman  html
 ./usr/share/man/html9/MGETHDR.html  comp-sys-htmlman  html
 ./usr/share/man/html9/MH_ALIGN.html  comp-sys-htmlman  html
+./usr/share/man/html9/MODULE.html  comp-sys-htmlman  html
 ./usr/share/man/html9/M_ALIGN.html  comp-sys-htmlman  html
 ./usr/share/man/html9/M_COPY_PKTHDR.html  comp-sys-htmlman  html
 ./usr/share/man/html9/M_LEADINGSPACE.html  comp-sys-htmlman  html
@@ -17307,6 +17311,7 @@
 ./usr/share/man/html9/microseq.html  comp-sys-htmlman  html
 ./usr/share/man/html9/microtime.html  comp-sys-htmlman  html
 ./usr/share/man/html9/microuptime.html  comp-sys-htmlman  html
+./usr/share/man/html9/modcmd.html  comp-sys-htmlman  html
 ./usr/share/man/html9/module.html  comp-sys-htmlman  html
 ./usr/share/man/html9/module_autoload.html  comp-sys-htmlman  html
 ./usr/share/man/html9/module_builtin_require_force.html  comp-sys-htmlman  html
@@ -17768,6 +17773,7 @@
 ./usr/share/man/html9

Re: (patch) Improved documentation and examples of dynamic modules

2015-05-11 Thread Kamil Rytarowski
Paul Goyette wrote:
>
> OK, I looked at all of the examples, except for the lua one (I am 
> lua-clueless).  Everything looks OK to me, and I don't see any reason 
> not to commit these.
> 
> I do, however, have a request...
> 
> The "happy" module makes a claim that "4 digit numbers cannot cycle", 
> and uses a cache[] table for all numbers below 1000.  Can you please 
> provide a reference to back up the "cannot cycle" claim?  :)  And please 
> initialize (or reinitialize) the entire cache[] array in your modcmd's 
> INIT routine, rather than a single static initialization.  (It could 
> make a difference if the module were ever "built-in" to a kernel ...)
> 
> Also, for all tests that create a cdevsw, please insert a comment to 
> ensure that the reader should verify that the number should be changed 
> if needed to avoid conflict with "real" devices.  And maybe also note 
> this in the README file?  Definitely include a comment that these 
> modules should not be used on a "production" machine, just in case...
> 
> Finally, I would appreciate it if other folks would weigh in on the 
> question of the directory hierarchy into which these examples should be 
> placed...
> 

I've attached new patch.

For now I will schedule share/man pages for later.
After committing it please remove sys/modules/example.

I went for share/examples/sys/kmodule.

Index: share/examples/sys/kmodule/README
===
RCS file: share/examples/sys/kmodule/README
diff -N share/examples/sys/kmodule/README
--- /dev/null   1 Jan 1970 00:00:00 -
+++ share/examples/sys/kmodule/README   11 May 2015 09:47:18 -
@@ -0,0 +1,58 @@
+   $NetBSD: $
+
+   Kernel Developer's Manual
+
+DESCRIPTION
+ The kernel example dynamic modules.
+
+ This directory contains the following example modules:
+ * hello       - the simplest `hello world' module
+ * properties  - handle incoming properties during the module load
+ * happy       - basic implementation of read(9) with happy numbers
+ * ping        - basic ioctl(9)
+ * hellolua    - the simplest `hello world' Lua module
+
+ To build the examples you need a local copy of NetBSD sources. You also
+ need the comp set with toolchain. To build the module just enter a
+ directory with example modules and use make(1):
+
+ # make
+
+ To load, unload and stat the module use modload(8), modunload(8) and
+ modstat(8).
+
+ The S parameter in the Makefile files points to src/sys and it can be
+ overloaded in this way:
+
+ # make S=/data/netbsd/src/sys
+
+ The code of a module does not need to be in src/sys unless you use
+ the autoconf(9) framework.
+
+ A cross-build of a module for a target platform is possible with the
+ build.sh framework. You need to generate the toolchain and set PATH
+ appropriately to point to bin/ in the TOOLDIR path. An example command
+ to cross-build a module with the amd64 toolchain is as follows:
+
+# nbmake-amd64 S=/data/netbsd/src/sys
+
+
+ The example modules should not be used on a production machine.
+
+ For every module that creates a cdevsw, verify that its major number
+ does not conflict with a real device.
+
+SEE ALSO
+ lua(9lua), modctl(2), modload(8), module(7), module(9), modstat(8),
+ modunload(8)
+
+HISTORY
+ An example of handling incoming properties first appeared in NetBSD 5.0
+ and was written by Julio Merino with further modifications by Martin
+ Husemann, Adam Hamsik, John Nemeth and Mindaugas Rasiukevicius.
+
+ This document and additional modules (hello, happy and ping, hellolua)
+ first appeared in NetBSD 8.0 and they were written by Kamil Rytarowski.
+
+AUTHORS
+ This document was written by Kamil Rytarowski.
Index: share/examples/sys/kmodule/happy/Makefile
===
RCS file: share/examples/sys/kmodule/happy/Makefile
diff -N share/examples/sys/kmodule/happy/Makefile
--- /dev/null   1 Jan 1970 00:00:00 -
+++ share/examples/sys/kmodule/happy/Makefile   11 May 2015 09:41:59 -
@@ -0,0 +1,7 @@
+#  $NetBSD: Makefile,v 1.2 2008/02/10 10:51:18 jmmv Exp $
+
+S?=/usr/src/sys
+KMOD=  happy
+SRCS=  happy.c
+
+.include <bsd.kmodule.mk>
Index: share/examples/sys/kmodule/happy/happy.c
===
RCS file: share/examples/sys/kmodule/happy/happy.c
diff -N share/examples/sys/kmodule/happy/happy.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ share/examples/sys/kmodule/happy/happy.c11 May 2015 10:05:58 -
@@ -0,0 +1,182 @@
+/* $NetBSD: $  */
+
+/*-
+ * Copyright (c) 2015 The NetBSD Foundation, Inc.
+ * All rights reserved.
+ 

Re: (patch) Improved documentation and examples of dynamic modules

2015-05-10 Thread Kamil Rytarowski
Greg Troxel wrote:
> "Kamil Rytarowski"  writes:
> 
> >> Also, "happy" doesn't seem like a useful name; examples should have
> >> names that suggest the kinds of things they do with respect to the
> >> module system, to guide people choosing which ones to read.
> >
> > Well, I like /dev/happy for a Happy Number generator. What would be a
> > better name?
> 
> random_generator?  The point is that someone looking at examples does
> not care what particular generator you are using; the point is about the
> glue code.  It would be fair, arguably better, to have a "random
> generator" that emits sequential numbers. Anything other than module
> glue is a distraction.
> 

For this reason, I took one of the simplest integral sequences and for the
joy of math I would like to stick to it.

It's common to use existing algorithms in basic examples, like Fibonacci
numbers, primes etc.

http://www.home.unix-ag.org/bmeurer/NetBSD/howto-lkm.html uses Fibonacci.
Personally I have a reason to not use it in my patch, I will eventually
unveil why after a month or so. Thanks!

> >> > I like it. Maybe place them in `share/examples/kmodules'?
> >> 
> >> share/examples/sys/modules would be better; kmodules is not the name of
> >> a program in base and would be confusing to someone expecting examples
> >> to be about programs in base.
> >
> > kmodule was inspired by (/usr/share/mk/)bsd.kmodule.mk and it was so
> > natural for me to pick up this name. Maybe share/examples/sys/kmodule?
> 
> I am ok with sys/kmodules (if this is mis-installed in share/examples).
> 

Therefore share/examples/sys/kmodules (plural form)?

> > I'm improving this single example adding to it 4 additional modules and
> > README. There is no revolution, except moving the original example to a
> > new location.
> 
> So why not just example-foo, example-bar, etc.
> 

Modules named 'example' in a directory 'examples' read as redundant to me
(is it a pleonasm?).

I'm listing them in README this way:
* hello       - the simplest `hello world' module
* properties  - handle incoming properties during the module load
* happy       - basic implementation of read(9) with happy numbers
* ping        - basic ioctl(9)
* hellolua    - the simplest `hello world' Lua module

The description of what they do is in the README file, not embedded in
their names. I designed each one to be minimal and fun (doing something
quasi-functional) :) It's a matter of taste: a module "dots" with a device
/dev/dots printing ".\n" would be enough too, but (to me) boring.
With happy numbers people can play with it a bit, easily change the algorithm
to something else (like redefining the happy (1) and sad (4) numbers) and check
the results just for the fun of experimenting with read(2).

> > I'm trying to improve the documentation and add more enlightening examples
> > for newcomers (like myself!).
> 
> > I'm aware of the fact that experienced kernel programmers familiar with
> > NetBSD internals know how to handle all kinds of things, dependencies and
> > for them following 'ab uno disces omnis' (single example just parsing the
> > parameters) is sufficient.
> 
> > Please let it be easier for people starting to hack in the kernel from
> > the modules (people like me) and not assuming they mastered the kernel
> > internals.
> 
> That's great that you are spiffing up docs and I didn't mean to be
> critical of it.  I just meant that 1) examples belong in the source tree
> 2) example name should be about the kind of function and 3) examples
> should have minimal semantic distraction besides their glue functions.
> So for example a fs transformation module could be gzip or just
> rot13, with just barely enough code to see that it is doing something,
> and a generator could just return 0, 1, 2, 3, 4, etc. enough that you
> can tell it is working.
> 

I understood your argument.

I plan to rediff and send my new patch tomorrow.


Re: (patch) Improved documentation and examples of dynamic modules

2015-05-10 Thread Kamil Rytarowski
Greg Troxel wrote:
> "Kamil Rytarowski"  writes:
> 

Thank you Greg for your reply!

> > Paul Goyette wrote:
> >> The "happy" module makes a claim that "4 digit numbers cannot cycle", 
> >> and uses a cache[] table for all numbers below 1000.  Can you please 
> >> provide a reference to back up the "cannot cycle" claim?  :)  And please 
> >> initialize (or reinitialize) the entire cache[] array in your modcmd's 
> >> INIT routine, rather than a single static initialization.  (It could 
> >> make a difference if the module were ever "built-in" to a kernel ...)
> >
> > There are two algorithms: naive and optimal. The naive one uses caching
> > and the optimal one checks whether we are in the cycle: 4, 16, 37, 58, 89,
> > 145, 42, 20. The claim comes from the observation that the largest number
> > in the cycle has 3 digits.
> >
> > I can go for the optimal one and cut the caching design from the code.
> > In the module example the algorithm implementation shouldn't divert 
> > attention from the important part - how to implement read(2).
> 
> My reaction to this is that this entire algorithm is a distraction to
> the module example.
> 

It will be simplified.

> Also, "happy" doesn't seem like a useful name; examples should have
> names that suggest the kinds of things they do with respect to the
> module system, to guide people choosing which ones to read.
> 

Well, I like /dev/happy for a Happy Number generator. What would be a
better name?

> > I like it. Maybe place them in `share/examples/kmodules'?
> 
> share/examples/sys/modules would be better; kmodules is not the name of
> a program in base and would be confusing to someone expecting examples
> to be about programs in base.
> 

kmodule was inspired by (/usr/share/mk/)bsd.kmodule.mk and it was so
natural for me to pick up this name. Maybe share/examples/sys/kmodule?

> There's already an example module in the kernel source tree, too.  I
> don't follow why that just can't be improved.
> 
I'm improving this single example adding to it 4 additional modules and
README. There is no revolution, except moving the original example to a
new location.

The approach 'Ab uno disces omnis' (from one [example] learn everything)
was insufficient for me and I had to do research, now I'm going to make
this knowledge available for everybody.

I'm trying to improve the documentation and add more enlightening examples
for newcomers (like myself!).

I'm aware of the fact that experienced kernel programmers familiar with
NetBSD internals know how to handle all kinds of things, dependencies and
for them following 'ab uno disces omnis' (single example just parsing the
parameters) is sufficient.

Please let it be easier for people starting to hack in the kernel from
the modules (people like me) and not assuming they mastered the kernel
internals.


Re: (patch) Improved documentation and examples of dynamic modules

2015-05-10 Thread Kamil Rytarowski
Paul Goyette wrote:
> OK, I looked at all of the examples, except for the lua one (I am 
> lua-clueless).  Everything looks OK to me, and I don't see any reason 
> not to commit these.
> 

Please try to follow the code, it should illuminate you. If it's too
difficult it should be corrected.

> I do, however, have a request...
> 
> The "happy" module makes a claim that "4 digit numbers cannot cycle", 
> and uses a cache[] table for all numbers below 1000.  Can you please 
> provide a reference to back up the "cannot cycle" claim?  :)  And please 
> initialize (or reinitialize) the entire cache[] array in your modcmd's 
> INIT routine, rather than a single static initialization.  (It could 
> make a difference if the module were ever "built-in" to a kernel ...)
> 

There are two algorithms: naive and optimal. The naive one uses caching
and the optimal one checks whether we are in the cycle: 4, 16, 37, 58, 89,
145, 42, 20. The claim comes from the observation that the largest number
in the cycle has 3 digits.

I can go for the optimal one and cut the caching design from the code.
In the module example the algorithm implementation shouldn't divert 
attention from the important part - how to implement read(2).
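
To illustrate the cycle check, here is a minimal userland sketch (this is
not the code from the patch; the names digit_square_sum and is_happy are
made up for this example). It relies on the well-known property that
iterating the sum of squared digits from any positive integer ends either
at 1 or in the cycle containing 4. The "4 digit numbers cannot cycle" remark
follows from a simple bound: a 4-digit number maps to at most
4 * 9 * 9 = 324, so the sequence always drops below 1000.

```c
#include <stdbool.h>

/* Sum of the squares of the decimal digits of n. */
static unsigned int
digit_square_sum(unsigned int n)
{
	unsigned int sum = 0;

	while (n != 0) {
		unsigned int d = n % 10;

		sum += d * d;
		n /= 10;
	}
	return sum;
}

/*
 * Return true if n is a happy number.
 * Every positive integer reaches either 1 (happy) or the cycle
 * 4, 16, 37, 58, 89, 145, 42, 20 (unhappy), so it is enough to
 * stop at 1 or 4; no cache is needed.
 */
static bool
is_happy(unsigned int n)
{
	if (n == 0)
		return false;	/* only positive integers are considered */
	while (n != 1 && n != 4)
		n = digit_square_sum(n);
	return n == 1;
}
```

A read routine of a device like /dev/happy could then simply walk the
integers and emit those for which is_happy() returns true.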

> Also, for all tests that create a cdevsw, please insert a comment to 
> ensure that the reader should verify that the number should be changed 
> if needed to avoid conflict with "real" devices.

I will do it.

> And maybe also note this in the README file?

Good idea.

> Definitely include a comment that these 
> modules should not be used on a "production" machine, just in case...
> 

I will put a line about it in the README file.

> Finally, I would appreciate it if other folks would weigh in on the 
> question of the directory hierarchy into which these examples should be 
> placed...
> 

I like it. Maybe place them in `share/examples/kmodules'?


Re: (patch) Improved documentation and examples of dynamic modules

2015-05-09 Thread Kamil Rytarowski
Paul Goyette wrote:
> To: "Kamil Rytarowski" 
> Cc: tech-kern@netbsd.org
> Subject: Re: (patch) Improved documentation and examples of dynamic modules
>
> I'd like to suggest that perhaps the example modules should belong in 
> /usr/share/examples/ or similar?
> 

Looks saner. My motivation for putting them into sys/modules/example was the
already existing sys/modules/example.

Just please make it clear where it is in the module.9 file.


(patch) Improved documentation and examples of dynamic modules

2015-05-08 Thread Kamil Rytarowski
RCS file: sys/modules/examples/README
diff -N sys/modules/examples/README
--- /dev/null   1 Jan 1970 00:00:00 -
+++ sys/modules/examples/README 8 May 2015 13:33:26 -
@@ -0,0 +1,52 @@
+   $NetBSD: $
+
+   Kernel Developer's Manual
+
+DESCRIPTION
+ The kernel example dynamic modules.
+
+ This directory contains the following example modules:
+ * hello       - the simplest `hello world' module
+ * properties  - handle incoming properties during the module load
+ * happy       - basic implementation of read(9) with happy numbers
+ * ping        - basic ioctl(9)
+ * hellolua    - the simplest `hello world' Lua module
+
+ To build the examples you need a local copy of NetBSD sources. You also
+ need the comp set with toolchain. To build the module just enter a
+ directory with example modules and use make(1):
+
+ # make
+
+ To load, unload and stat the module use modload(8), modunload(8) and
+ modstat(8).
+
+ The S parameter in the Makefile files points to src/sys and it can be
+ overloaded in this way:
+
+ # make S=/data/netbsd/src/sys
+
+ The code of a module does not need to be in src/sys unless you use
+ the autoconf(9) framework.
+
+ A cross-build of a module for a target platform is possible with the
+ build.sh framework. You need to generate the toolchain and set PATH
+ appropriately to point to bin/ in the TOOLDIR path. An example command
+ to cross-build a module with the amd64 toolchain is as follows:
+
+# nbmake-amd64 S=/data/netbsd/src/sys
+
+SEE ALSO
+ lua(9lua), modctl(2), modload(8), module(7), module(9), modstat(8),
+ modunload(8)
+
+HISTORY
+ An example of handling incoming properties first appeared in NetBSD 5.0
+ and was written by Julio Merino with further modifications by Martin
+ Husemann, Adam Hamsik, John Nemeth and Mindaugas Rasiukevicius.
+
+ This document and additional modules (hello, happy and ping, hellolua)
+ first appeared in NetBSD 8.0 and they were written by Kamil Rytarowski.
+
+AUTHORS
+ This document was written by Kamil Rytarowski.
Index: sys/modules/examples/happy/Makefile
===
RCS file: sys/modules/examples/happy/Makefile
diff -N sys/modules/examples/happy/Makefile
--- /dev/null   1 Jan 1970 00:00:00 -
+++ sys/modules/examples/happy/Makefile 7 May 2015 20:03:49 -
@@ -0,0 +1,7 @@
+#  $NetBSD: Makefile,v 1.2 2008/02/10 10:51:18 jmmv Exp $
+
+S?=/usr/src/sys
+KMOD=  happy
+SRCS=  happy.c
+
+.include <bsd.kmodule.mk>
Index: sys/modules/examples/happy/happy.c
===
RCS file: sys/modules/examples/happy/happy.c
diff -N sys/modules/examples/happy/happy.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ sys/modules/examples/happy/happy.c  8 May 2015 13:25:09 -
@@ -0,0 +1,176 @@
+/* $NetBSD: $  */
+
+/*-
+ * Copyright (c) 2015 The NetBSD Foundation, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+__KERNEL_RCSID(0, "$NetBSD: $");
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Create a device /dev/happy to generate happy numbers.
+ *
+ * To use this device you need to do:
+ * mknod /dev/happy c 210 0
+ *
+ * A happy number is a number defined by the following process: Starting with
+ * any positive integer, replace the number by the sum of the squares of its
+ * digits, and repeat the process until the number equals 1 (where it will
+ * sta

Re: Dynamic modules

2015-05-04 Thread Kamil Rytarowski


> Sent: Monday, May 04, 2015 at 11:39 AM
> From: "Paul Goyette" 
> To: "Kamil Rytarowski" 
> Cc: tech-kern@netbsd.org
> Subject: Re: Dynamic modules
>
> On Mon, 4 May 2015, Kamil Rytarowski wrote:
> 
> 
> > 3. Is it possible to automatically create a device file in /dev from a
> >module?
> 
> Not really, at least not from a loaded kernel module.
> 
I see.

> On the other hand, modules are also applicable to "rump" environments, 
> and you _can_ dynamically create the /dev entry (in a rump'd file 
> system, of course).  For an example, please have a look at
> 
>   src/sys/rump/dev/lib/libsysmon/sysmon_component.c

Interesting note!

> 
> > 4. What is the best way to extract the major from devsw_attach call in
> >case we let it to automatically generate the major? For now I'm
> >using printf(9) and I'm checking dmesg(8). This gives me an output
> >for mknod(8) with an appropriate value.
> 
> You could always create a sysctl and make the return values from 
> devsw_attach() available to user-land.
> 
> BTW, devsw_attach() doesn't "automatically generate the major".  It does 
> a lookup in the majors table.  If there is no entry in the table, you 
> get an error.
> 
> (At least, that's how I read that code.  If I've misinterpreted, I'd be 
> happy to have someone explain how it really works!)
> 

I see, so there is no good mechanism for what I called automatic major
pickup.

So my best option is to choose the major number and hard-code it in the module.

Thanks!


Dynamic modules

2015-05-04 Thread Kamil Rytarowski
I have got a few questions.

1. luapmf and luasystm
src/sys/modules/luapmf/luapmf.c
src/sys/modules/luasystm/luasystm.c

These modules are empty for the !_MODULE build. Why? Can we make them available
for the builtin mode?

2. luactl(8)

$ sudo luactl help
usage: luactl [-cq]
   luactl [-cq] create name [desc]
   luactl [-cq] destroy name
   luactl [-cq] require name module
   luactl [-cq] load name path

When I first looked at it, "name" was unclear to me; could we rename it to
"state-name"?
"luactl load name path" suggests to me the name of a file under path, but it's
the name of the state!

3. Is it possible to automatically create a device file in /dev from a module?

4. What is the best way to extract the major from the devsw_attach call in case
we let it automatically generate the major? For now I'm using printf(9) and
checking dmesg(8). This gives me the appropriate value to pass to mknod(8).


Re: Note entry types (core dumps)

2015-04-28 Thread Kamil Rytarowski


> Sent: Tuesday, April 28, 2015 at 12:13 PM
> From: "Joerg Sonnenberger" 
> To: tech-kern@netbsd.org
> Subject: Re: Note entry types (core dumps)
>
> On Tue, Apr 28, 2015 at 01:45:13PM +0200, Kamil Rytarowski wrote:
> > From what I can see these numbers are standardized across systems (Linux, 
> > FreeBSD).
> 
> Not really.
> 
> > I need libunwind (a dependency of quite few runtimes) for NetBSD and I just 
> > landed here.
> 
> I still maintain that any runtime depending on HP's libunwind is broken.
> But people don't want to listen.
> 
Well, I added support for NetBSD in a piece of software; in the critical part it
was mostly using the ptrace(2) interface... it was a real mess, because the same
kernel with a different CPU can on some platforms result in different APIs. So I
can understand why people prefer to have the mess abstracted in a building block
rather than implementing it locally.

I'm not experienced in runtime development, so I cannot judge whether all that 
is needed:

http://www.nongnu.org/libunwind/docs.html

Can you propose an alternative to making libunwind work on NetBSD?

> > At least partial implementation doesn't look too heavy.
> 
> I've been there. It's a mess.
> 

OK, for now it's not a critical feature and my runtime doesn't make use of it.

> Joerg
> 


Note entry types (core dumps)

2015-04-28 Thread Kamil Rytarowski
Hello,

What's the status of the note entry types in core dumps in NetBSD?

They are extra notes saved in an additional section during the memory dump,
mostly holding the contents of structures.

Just to list their types:
#define NT_PRSTATUS     1       /* prstatus_t */
#define NT_PRFPREG      2       /* prfpregset_t */
#define NT_PRPSINFO     3       /* prpsinfo_t */
#define NT_PRXREG       4       /* prxregset_t */
#define NT_PLATFORM     5       /* string from sysinfo(SI_PLATFORM) */
#define NT_AUXV         6       /* auxv_t array */
#define NT_GWINDOWS     7       /* gwindows_t   SPARC only */
#define NT_ASRS         8       /* asrset_t     SPARC V9 only */
#define NT_LDT          9       /* ssd array    IA32 only */
#define NT_PSTATUS      10      /* pstatus_t */
#define NT_PSINFO       13      /* psinfo_t */
#define NT_PRCRED       14      /* prcred_t */
#define NT_UTSNAME      15      /* struct utsname */
#define NT_LWPSTATUS    16      /* lwpstatus_t */
#define NT_LWPSINFO     17      /* lwpsinfo_t */
#define NT_PRPRIV       18      /* prpriv_t */
#define NT_PRPRIVINFO   19      /* priv_impl_info_t */
#define NT_CONTENT      20      /* core_content_t */
#define NT_ZONENAME     21      /* string from getzonenamebyid(3C) */
#define NT_NUM          21

http://www.opensource.apple.com/source/dtrace/dtrace-90/sys/elf.h
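
To show how a consumer such as a debugger would look these entries up, here
is a rough sketch of walking a note segment that has already been read into
memory. It assumes the generic ELF note layout (three 32-bit words namesz,
descsz and type, followed by the name and the descriptor, each padded to 4
bytes) in host byte order; the names note_hdr and note_find are made up for
this example and it is not taken from any existing tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic ELF note header: three 32-bit words (hypothetical type name). */
struct note_hdr {
	uint32_t namesz;	/* length of the name, including the NUL */
	uint32_t descsz;	/* length of the descriptor */
	uint32_t type;		/* NT_* value */
};

/* Round a length up to the 4-byte alignment assumed here. */
#define NOTE_ALIGN(x)	(((size_t)(x) + 3) & ~(size_t)3)

/*
 * Find the first note of the given type in buf[0..len).
 * Returns a pointer to its descriptor and stores the descriptor
 * length in *desclen, or returns NULL if no such note exists or
 * the buffer is truncated.
 */
static const void *
note_find(const void *buf, size_t len, uint32_t type, size_t *desclen)
{
	const unsigned char *p = buf;
	size_t off = 0;

	while (len - off >= sizeof(struct note_hdr)) {
		struct note_hdr h;
		size_t name_len, desc_len;

		memcpy(&h, p + off, sizeof(h));	/* avoid unaligned access */
		off += sizeof(h);

		if (h.namesz > len - off)
			return NULL;
		name_len = NOTE_ALIGN(h.namesz);
		if (name_len > len - off || h.descsz > len - off - name_len)
			return NULL;
		desc_len = NOTE_ALIGN(h.descsz);
		if (desc_len > len - off - name_len)
			return NULL;

		if (h.type == type) {
			*desclen = h.descsz;
			return p + off + name_len;
		}
		off += name_len + desc_len;
	}
	return NULL;
}
```

With the list above, a caller would pass for example 6 (NT_AUXV) as the
type to locate the auxiliary vector. A real implementation would also have
to compare the note name, since the same type number can mean different
things under different names.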

From what I can see these numbers are standardized across systems (Linux,
FreeBSD).

They are used widely e.g.: gdb, gcore (FreeBSD), file(1), lldb, binutils.

I need libunwind (a dependency of quite a few runtimes) for NetBSD and I just
landed here.

At least a partial implementation doesn't look too heavy.

PS. libexecinfo vs libunwind debates are irrelevant here.


ACPICA-20150410, request for review

2015-04-12 Thread Kamil Rytarowski
tech-pkg:

I request a review of wip/acpica-utils and an upgrade of sysutils/acpica-utils
to it.

Changes against sysutils/acpica-utils
- Upgrade from 20090625 to 20150410
  Changelog: https://github.com/acpica/acpica/blob/master/documents/changes.txt

- New dual license: modified-bsd OR gnu-gpl-v2

- Add new category: sysutils

- NetBSD has sem_timedwait(). Hey I miss the .Sh HISTORY notes about it!

- Drop local tweaks for DragonflyBSD. For this platform please see new
  upstream patch:
  
https://github.com/acpica/acpica/commit/3e93431674abe947202b0f9a0afa7b625b17caa6

CVS commit: 
https://mail-index.netbsd.org/pkgsrc-wip-cvs/2015/04/12/msg037946.html


The package could be pushed during the freeze for 2015Q1,
as at least for me acpica-utils from pkgsrc is broken.

Tested on NetBSD-current/amd64.

tech-kern:

I request an upgrade of the in-kernel ACPICA subsystem.
This will help me to debug ACPI-related problems and upstream more local
changes.


Re: Removal of compat-FreeBSD

2015-02-13 Thread Kamil Rytarowski
Greg Troxel wrote:
> Maxime Villard  writes:
> 
> > Apparently, compat-FreeBSD is needed by tw_cli users.
> >
> > Therefore I think I will just disable it by default in the GENERIC kernels,
> > unless anyone disagrees.
> 
> Our norms for significant changes are more or less about consensus or
> preponderance of opinion.  So far you've said that you want to
> remove/disable this, and a number of people have said they use it.  No
> one else has spoke up in favor of disabling.  We don't have evidence
> that anyone (besides you) is disabling this in their kernels.
> 
> So I think the opinions to date fall well short of enough to support
> even disabling by default.
> 
> 

I find this offer to solve problems with FreeBSD-compat nerve-wracking.
I would prefer to see it solved this way:
- add FreeBSD-compat to the list of projects at the NetBSD website,
- notify users and developers that the FreeBSD-compat layer suffers from
serious issues, is not usable and is unmaintained,
- announce that FreeBSD-compat development will be tracked for 3-6 months
and that, if nothing improves considerably, it will be disabled and
pushed to the queue of code-parts to be removed.

With this approach of (somewhat) sustainable development, the developers and
users who care could plan their time for their next NetBSD projects
without being alarmed into dropping all their current work and putting all
hands on this or that.

Personally, I can't start hacking on the FreeBSD-compat immediately, as
there is a set of 5 patches waiting for peer-review and acceptance/
rejection from a NetBSD developer. Then it's a matter of days to submit
next patches in the same field (some are already developed) and switch
to usual projects.


Re: Removal of compat-FreeBSD

2015-02-07 Thread Kamil Rytarowski
Maxime Villard wrote:
> Hi,
> I intend to remove the compat-FreeBSD support from the system.
> 

Please don't do it.


Tru64 AdvFS porting to NetBSD - 4. status 2014-12-25

2014-12-25 Thread Kamil Rytarowski
Merry Christmas!

Over the last month I was busy... cleaning up and merging improvements with 
upstream NetBSD.
My intention was not just to propose a plain import of new code, but to give a 
reference patch and to research possible improvements to the existing 
solution.

In particular, I paid attention to two stdlib.h functions, which happened to be 
a bit controversial. I wish to see them accepted and merged soon, as some 
scheduled patches to NetBSD and pkgsrc either assume these functions are merged 
or reuse them.

So time to dive again in the AdvFS code!

AdvFS in-kernel code [1] is going to be rebased to the -current and all 
branches merged.

1. What is done
- N/A
2. What is in progress
- cleaning up branches
3. Issues
- N/A
4. Next steps
- UVM, VFS.. and all things from the previous mail
5. Pushed to NetBSD
- little pieces here and there

Previous porting status: 
http://mail-index.netbsd.org/tech-kern/2014/11/16/msg018007.html

[1] https://github.com/krytarowski/netbsd-current-src-sys


sys/clock.h finalization

2014-12-21 Thread Kamil Rytarowski
Hello,

Please let us finalize src/sys/sys/clock.h:

1. Please review and add the man page to src/share/man/man9
Thomas K. is responsible for man pages according to src/doc/RESPONSIBLE

2. src/tools/compat/dev/clock_subr.h defines:

/* Some handy constants. */
#define SECDAY  (24 * 60 * 60)
#define SECYR   (SECDAY * 365)

/* Traditional POSIX base year */
#define POSIX_BASE_YEAR 1970

Please nuke them, as these defines are now in src/sys/sys/clock.h and they 
aren't needed/used elsewhere.

3. src/sys/dev/clock_subr.c

Please change this code from clock_secs_to_ymdhms():
/* Hours, minutes, seconds are easy */
dt->dt_hour = rsec / 3600;
rsec = rsec % 3600;
dt->dt_min  = rsec / 60;
rsec = rsec % 60;
dt->dt_sec  = rsec;

to:
/* Hours, minutes, seconds are easy */
dt->dt_hour = rsec / SECS_PER_HOUR;
rsec = rsec % SECS_PER_HOUR;
dt->dt_min  = rsec / SECS_PER_MINUTE;
rsec = rsec % SECS_PER_MINUTE;
dt->dt_sec  = rsec;
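
The requested change can be tried outside the kernel with a minimal sketch 
like the one below. SECS_PER_HOUR and SECS_PER_MINUTE are the names used 
above; their values, the struct sketch_tod type and the sketch_split_tod() 
helper are assumptions made for illustration only and are not the code in 
src/sys/dev/clock_subr.c.

```c
/* Names from the mail above; the values are assumed to be the obvious ones. */
#define SECS_PER_MINUTE	60
#define SECS_PER_HOUR	(60 * SECS_PER_MINUTE)

/* Hypothetical stand-in holding only the time-of-day fields. */
struct sketch_tod {
	int dt_hour;
	int dt_min;
	int dt_sec;
};

/* Split the seconds remaining within one day into hours, minutes, seconds. */
static void
sketch_split_tod(long rsec, struct sketch_tod *dt)
{
	/* Hours, minutes, seconds are easy */
	dt->dt_hour = (int)(rsec / SECS_PER_HOUR);
	rsec = rsec % SECS_PER_HOUR;
	dt->dt_min = (int)(rsec / SECS_PER_MINUTE);
	rsec = rsec % SECS_PER_MINUTE;
	dt->dt_sec = (int)rsec;
}
```

For example, passing 3725 seconds fills in 1 hour, 2 minutes and 5 seconds.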

Regards,

clock.9
Description: Binary data


Re: Reuse strtonum(3) and reallocarray(3) from OpenBSD

2014-12-19 Thread Kamil Rytarowski
Alan Barrett wrote:
> To: tech-userle...@netbsd.org, tech-kern@netbsd.org
> Subject: Re: Reuse strtonum(3) and reallocarray(3) from OpenBSD
>
> On Sat, 29 Nov 2014, Kamil Rytarowski wrote:
> > My proposition is to add a new header in src/sys/sys/overflow.h 
> > (/usr/include/sys/overflow.h) with the following content:
> >
> > operator_XaddY_overflow()
> > operator_XsubY_overflow()
> > operator_XmulY_overflow()
> >
> > X = optional s (signed)
> > Y = optional l,ll, etc
> > [* see comment]
> 
> OK, so you have told us the names of the proposed functions.  But what
> are their semantics, and why would they be useful?
> 

The purpose was to make a library for arithmetic operations with checking for 
overflow (it doesn't matter whether it is undefined behavior or not). The first 
consumers are malloc(3) and realloc(3).
Grepping our sources for malloc and realloc shows that there is often a 
multiplication passed as the size parameter, and it could be replaced with 
reallocarray(3) that checks for overflow.
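
As a rough illustration of what such a checked allocation looks like, here is 
a minimal sketch. The name sketch_reallocarray() is hypothetical, and this is 
neither the OpenBSD implementation nor the patch from the problem report; it 
only shows the division-based test that rejects a count/size pair whose 
product would not fit in size_t.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: resize ptr to hold nmemb elements of size bytes
 * each, refusing requests whose byte count would overflow size_t.
 */
static void *
sketch_reallocarray(void *ptr, size_t nmemb, size_t size)
{
	/* nmemb * size overflows exactly when nmemb > SIZE_MAX / size. */
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(ptr, nmemb * size);
}
```

A call such as malloc(n * sizeof(*p)) would then become 
sketch_reallocarray(NULL, n, sizeof(*p)), and an oversized n fails with ENOMEM 
instead of silently allocating a short buffer.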

I've failed to write them in C99 without compiler extensions... The closest 
example is libo (written entirely with extensions), with usage:

if (overflow_mul(&c, a, b))
printf("overflow!\n");

For more details see: https://github.com/xiw/libo
and an article of the author: 
http://kqueue.org/blog/2012/03/16/fast-integer-overflow-detection/

It would be way easier to do it with C11 _Generic...

My initial intention was to make it available for the kernel.
I've decided not to go for a dozen variants of overflow_XmulY, as it 
wouldn't be obvious how to handle functions that take typedefed parameters 
(size_t, ssize_t etc). A user would have to know whether size_t is ulong or 
ulonglong... etc. Making it anything other than type-agnostic won't help.

> > Last but not least please stop enforcing 
> > programmers' fancy to produce this kind of art: 
> > https://github.com/ivmai/bdwgc/commit/83231d0ab5ed60015797c3d1ad9056295ac3b2bb
> >  
> > :-)
> 
> Please don't assume that people reading your email messages have
> convenient internet access.  It's fine to give URLs that expand on what
> you have said, but if you give the URL without any explanation then I
> have no idea what you are talking about.
> 
> --apb (Alan Barrett)
> 

OK!

Back to the topic, I've posted a problem-report with the patches that merge 
reallocarray(3) with our sources.

http://mail-index.netbsd.org/netbsd-bugs/2014/12/19/msg039485.html

Tested and verified to work! Please review and apply!

As a bonus I've extended our malloc(3) man-page :) partly with information 
taken from OpenBSD's malloc(3).

Thank you very much,


Re: shipping processes between ttys

2014-12-08 Thread Kamil Rytarowski
Hello,

I used to use reptyr on RPI with ArchLinux and it worked well. There were two 
caveats:
- processes sleeping in background,
- portability - it used to work only on 80x86, amd64 and ARM.

I loved it as it was a way to capture a long running process and put it to a 
screen session.

I'm not sure, but a proper implementation might need support from the kernel 
(for some reason there were the mentioned restrictions). There is also a 
(different but similar) process "watch" in FreeBSD that is missing in NetBSD.

With regards,


Re: Reuse strtonum(3) and reallocarray(3) from OpenBSD

2014-11-28 Thread Kamil Rytarowski
Hello,

+ tech-kern@

I've revisited my idea of reallocarray(3). As it's an emerging standard 
(quickly merged with libbsd, developed in glibc) I won't discuss the facts 
:) and leave its benefits for interested readers to:

http://www.lteo.net/blog/2014/10/28/reallocarray-in-openbsd-integer-overflow-detection-for-free/

Thank you, mainly Joerg, for your constructive comments. I hope we can agree 
that, due to pointer-aliasing traps (in the proposed alternative), reinventing 
a new reallocarray-like function is not worth it.


So back to the revisited idea.
Current CLANG and GCC (5.0 [1]) support a set of basic operations with checks 
for overflows, namely the '+', '-' and '*' operations.

My proposition is to add a new header in src/sys/sys/overflow.h 
(/usr/include/sys/overflow.h) with the following content:

operator_XaddY_overflow()
operator_XsubY_overflow()
operator_XmulY_overflow()

X = optional s (signed)
Y = optional l,ll, etc
[* see comment]

These functions will be static-inlined and by design fully MI, with discovery 
guards for features of GCC / CLANG (I presume that PCC still needs a patch 
contribution to add this feature). In case of missing in-compiler/platform 
support there will be a fall-back to a pure (and simple) C implementation.
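
A minimal sketch of how one such helper could look, assuming the type-generic 
__builtin_mul_overflow() available in GCC 5 and in recent Clang. The names 
SKETCH_HAVE_MUL_OVERFLOW and sketch_size_mul_overflow() are hypothetical and 
only cover the size_t multiplication case; the add and sub variants would 
follow the same pattern.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Discovery guard: Clang (and newer GCC) via __has_builtin, else GCC >= 5. */
#if defined(__has_builtin)
# if __has_builtin(__builtin_mul_overflow)
#  define SKETCH_HAVE_MUL_OVERFLOW 1
# endif
#endif
#if !defined(SKETCH_HAVE_MUL_OVERFLOW) && defined(__GNUC__) && \
    !defined(__clang__) && __GNUC__ >= 5
# define SKETCH_HAVE_MUL_OVERFLOW 1
#endif

/*
 * Hypothetical helper: store a * b in *res and return true if the
 * mathematical product does not fit in size_t.
 */
static inline bool
sketch_size_mul_overflow(size_t a, size_t b, size_t *res)
{
#ifdef SKETCH_HAVE_MUL_OVERFLOW
	return __builtin_mul_overflow(a, b, res);
#else
	/* Pure C fall-back: unsigned wraparound is well defined. */
	*res = a * b;
	return a != 0 && b > SIZE_MAX / a;
#endif
}
```

Both paths return the wrapped product in *res, so callers see the same result 
whichever branch the compiler selects.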

Here is an interesting article, "We Need Hardware Traps for Integer Overflow", 
reflecting my ideas:
http://blog.regehr.org/archives/1154

BTW. Actually it's possible to optimize overflow checks on i386/amd64 -- with a 
conditional jump on overflow a compiler will do it well; I'm not sure about ARM 
right now.

After wrapping up overflow.h (and merging with the current sources) I will 
propose reallocarray(3) that is compatible with OpenBSD, simpler to read and 
reusing our operator_mulY_overflow().

Last but not least please stop enforcing programmers' fancy to produce this 
kind of art: 
https://github.com/ivmai/bdwgc/commit/83231d0ab5ed60015797c3d1ad9056295ac3b2bb 
:-)

What do you think?

Best regards!

[1] https://gcc.gnu.org/gcc-5/changes.html
[*] I would like to see C99 type-generic macro here... but would it be allowed 
for portability reasons?


Tru64 AdvFS porting to NetBSD - 3. status 2014-11-16

2014-11-16 Thread Kamil Rytarowski
Hello,

This is the third status [1] of porting AdvFS to NetBSD.

Thanks especially to the rump team for help! The world is small, as I still meet 
new people who tried to get open pieces of Tru64 or Alpha (like the SRM code) 
in the past. Lately I was more reading than porting, as I have the occasion to 
learn things new to me regarding virtual memory subsystems.

1. What is done
- Basic locking is done [2] (I could say that 80% of code is adapted, and this 
required 20% of time and effort for locking)
2. What is in progress
- Studying NetBSD specific bits of VFS, UVM, Virtual Memory, Pager, UBC etc, to 
be prepared for porting virtual-memory logic
- Analyzing Tru64 / AdvFS usage of VM (available documentation is helping here)
- Cleaning the code to stop the flood of annoying errors from compiler -- it 
started to be really clean when comparing to the initial stage
3. Issues
- This time mostly time and knowledge shortage (regarding VM and VFS internals)
4. Next steps
- Migrate VM, VFS for NetBSD's API
- Squash as many trivial compiler warnings as possible, to stop the flood of 
errors (GCC 4.8.x and clang 3.5.x do the job well)
5. Pushed to NetBSD
- sys/time.h patches waiting for review/comments (at problem-reports)
- Proposed small patches regarding improvement of documentation (NVNODE, 
uvn_findpages())

Help and motivation support is appreciated.

Code is here: https://github.com/krytarowski/netbsd-current-src-sys

[1] Previous report is at 
http://mail-index.netbsd.org/tech-kern/2014/10/11/msg017782.html
[2] Mostly with this patch set 
https://github.com/krytarowski/netbsd-current-src-sys/compare/c07c47d556...75dc491d86


Re: link-set

2014-11-13 Thread Kamil Rytarowski
From: Masao Uebayashi
> This is what I've learned about link-set.
> 
> TL;DR - link-set is fine except already unused sections are exposed
> after final link
> 
> [...]

Hello Masao,

Thank you for your research!

Maybe irrelevant here, but with your work could we consider having room for a 
possible implementation of FDT [1]?

FDT (Flattened Device Tree) is a binary configuration file for kernel, heavily 
used in embedded Linux and FreeBSD solutions.

[1] https://wiki.freebsd.org/FlattenedDeviceTree


Re: kernel constructor

2014-11-11 Thread Kamil Rytarowski
From: Masao Uebayashi
> 
> The biggest problem of constructors (and indirect function call in
> general), I am aware of, is, static code analysis (code reading, tag
> jump, ...) becomes difficult (or impossible).
> 

Limited static code analysis is not a bigger problem than a machine broken at 
boot due to the introduction of a new class of bugs, most notably:
- circular dependencies,
- races in boot dependencies,
- obfuscated source of running code (something is being messed with by an 
unknown caller).


Re: kernel constructor

2014-11-11 Thread Kamil Rytarowski
From David Holland
> Please don't do that. Nothing good can come of it - you are asking for
> a thousand weird problems where undisclosed ordering dependencies
> silently manifest as strange bugs.
> 
> Furthermore, the compiler can and probably will assume that
> constructor functions get called before all non-constructor code, and
> owing to unavoidable issues in early initialization this will not be
> the case in some contexts. (I first hit this problem back in about
> 1995ish when some more gung-ho colleagues were trying to use C++
> global constructors in a C++ kernel, and we eventually had to declare
> a moratorium on all global constructors.)
> 
> init_main.c could use some tidying, but there's nothing fundamentally
> wrong with it that will be improved by adding a lot of implicit magic
> that doesn't do what the average passerby expects.
> 
> -- 
> David A. Holland
> dholl...@netbsd.org
> 

Hello,

Introducing kernel constructors and destructors (in the sense of the GNU 
extension) does not look appropriate to me:
- it's not standardized in the C language, so people and tool-chains have the 
right to be unfamiliar with it, to implement it differently, or not to preserve 
the same functionality in the long term...,
- we can lose control over the function calling order,
- if we really want such constructors, it's doable with simple callback 
functions (a module registers a callback to be executed by a master process).
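
The callback alternative from the last point can be sketched as follows. All 
names here (sketch_init_register(), sketch_init_run() and the two example 
callbacks) are hypothetical and do not exist in the NetBSD tree; the point is 
only that the calling order is the visible registration order rather than 
something chosen by the tool-chain.

```c
#include <stddef.h>

#define SKETCH_MAX_INIT	16	/* arbitrary limit for the sketch */

typedef void (*sketch_init_fn)(void);

static sketch_init_fn sketch_init_tab[SKETCH_MAX_INIT];
static size_t sketch_init_cnt;

/* A module registers its callback; 0 on success, -1 if rejected. */
static int
sketch_init_register(sketch_init_fn fn)
{
	if (fn == NULL || sketch_init_cnt == SKETCH_MAX_INIT)
		return -1;
	sketch_init_tab[sketch_init_cnt++] = fn;
	return 0;
}

/* The master routine runs callbacks in registration order; returns the count. */
static size_t
sketch_init_run(void)
{
	size_t i;

	for (i = 0; i < sketch_init_cnt; i++)
		(*sketch_init_tab[i])();
	return sketch_init_cnt;
}

/* Example callbacks that record the order in which they ran. */
static int sketch_trace[SKETCH_MAX_INIT];
static size_t sketch_trace_len;

static void
sketch_console_init(void)
{
	sketch_trace[sketch_trace_len++] = 1;
}

static void
sketch_disk_init(void)
{
	sketch_trace[sketch_trace_len++] = 2;
}
```

Registering sketch_console_init before sketch_disk_init guarantees that the 
console callback runs first, which is exactly the ordering control that 
implicit constructors give up.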

Privately I always considered __attribute__((constructor)) and 
__attribute__((destructor)) a sign of bad design that needs to be propped up by 
extensions. Normally (always?) constructors are called before entering 
main()... I prefer to get the console/uart ready as soon as possible, not after 
a set of procedures initialized by a list of modules, where we can get frozen 
without legible output.

And last but not least... what's wrong with init_main.c? It must be clear for a 
developer adding a new platform or debugging hardware bring-up. It gives me the 
big picture of what's going on step by step, even when I was looking into the 
assembly of our kernel... call it, call that, call this... making it all clear.


Re: Tru64 AdvFS porting to NetBSD - 2. status 2014-10-11

2014-10-12 Thread Kamil Rytarowski
Justin Cormack wrote
> The Linux LTTng performance tool is basically compatible with NetBSD
> dtrace hooks, ie you can use them on rump kernel code in Linux
> userspace.
>
> See 
> https://github.com/rumpkernel/wiki/wiki/Howto%3A-Profiling-the-TCP-IP-stack-with-LTTng

Thank you!

Actually some admin-oriented tracing is desired, please see advfsstat(8) [1]. 
Developer-oriented tracing IMO should be done with external tools, not with 
inherited code. For the latter LTTng looks promising!

[1] 
http://h50146.www5.hp.com/products/software/oe/tru64unix/manual/v51a_ref/HTML/MAN/MAN8/0351.HTM


Re: Tru64 AdvFS porting to NetBSD - 2. status 2014-10-11

2014-10-12 Thread Kamil Rytarowski
Hello,

> Are you sure a new dtrace provider would not be the way to go?

Yes. I believe that internal performance tracing and tracking of locks is a 
misdesign (also not very portable). I'm not familiar with DTrace, more with 
Linux tools (perf, valgrind), and my general plan was to debug it with librump 
in Linux.

Of course the first milestone is not performance optimization but getting it 
well integrated with NetBSD and stable. That's why I took Tru64 long-term 
tested release, not HP/UX that was never tested in production.

Some checks will be needed anyway, e.g. when a function assumes that a lock has 
already been acquired by its caller; with asserts such bugs will be narrowed 
down quickly.
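
A minimal user-space sketch of that kind of check is shown below. The toy 
struct sketch_lock and all sketch_* functions are hypothetical and only track 
whether the lock is held; in NetBSD kernel code the same idea is normally 
written with KASSERT() and mutex_owned(9).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock that only records whether it is held. */
struct sketch_lock {
	bool held;
};

static void
sketch_lock_enter(struct sketch_lock *l)
{
	assert(!l->held);	/* catches recursive entry */
	l->held = true;
}

static void
sketch_lock_exit(struct sketch_lock *l)
{
	assert(l->held);	/* catches release without acquire */
	l->held = false;
}

struct sketch_counter {
	struct sketch_lock lock;
	int value;
};

/* Caller must hold c->lock; the assert documents and enforces it. */
static int
sketch_counter_bump_locked(struct sketch_counter *c)
{
	assert(c->lock.held);
	return ++c->value;
}
```

Calling sketch_counter_bump_locked() without first entering the lock aborts 
immediately at the assert, which points straight at the offending caller.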


Tru64 AdvFS porting to NetBSD - 2. status 2014-10-11

2014-10-11 Thread Kamil Rytarowski
Hello,

This is the second status [1] of porting AdvFS to NetBSD.

Thank you for your motivation support, including mails from outside the NetBSD 
world.

1. What is done
- Moved AdvFS files from src/sys/fs/msfs to src/sys/external/gpl2/msfs and 
updated the build machinery
- Picked missing dyn_hashtable functionality from the HP/UX port of AdvFS
- Designed new debugging & tracing system, with changeable levels (none, fatal 
asserts, debug asserts, extensive checks) it's intended to replace the existing 
fine-grained debugging that is placed in the original work almost everywhere 
and it's impractically difficult to port 1:1, as it's utilizing Tru64-specific 
features -- the HP/UX port went with similar path; most debugging code (most 
notably related to locking) will be gone
- Stopped using indent(1) as it introduces a lot of harm because of extensive 
usage of macros (missing semicolons etc..)
- Overall: cleaned and squashed 377 proof-of-concept (aka throw-away) commits 
[2] into 115 cleaned revisions (aka throw-away later) [3]
2. What is in progress
- Adapting locking code, with verification of the right path with the HP/UX port
- Adapting debugging for new design, removing unneeded code-complication and 
Tru64-specific debug solutions
- Converting macros, used as in-lined functions with side-effects, to functions
- Removing alternative compilation paths (exception for _KERNEL in general and 
MSFS_DEBUG in msfs/ms_assert.h)
- Other compatibility patches for a modern compiler and NetBSD
3. Issues
- Missing subsystems' details from Tru64, still no idea about definitions of 
functions from overlap.h, missing quota's code (but not looked at it closely)
4. Next steps
- Virtual Memory porting
5. Pushed to NetBSD
- Proposed new  patch [4] still pending :-( and blocking further 
reduction of MD code..
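
The debugging levels listed under "What is done" (none, fatal asserts, debug 
asserts, extensive checks) could be arranged roughly as in the sketch below. 
This is not the code from the advfs branch; every SKETCH_* and sketch_* name 
is hypothetical, and the level is assumed to be a run-time variable.

```c
#include <stdio.h>
#include <stdlib.h>

/* The four levels named in the status above, in increasing strictness. */
#define SKETCH_DBG_NONE		0
#define SKETCH_DBG_FATAL	1
#define SKETCH_DBG_DEBUG	2
#define SKETCH_DBG_EXTENSIVE	3

/* Changeable at run time; the default chosen here is arbitrary. */
static int sketch_dbg_level = SKETCH_DBG_DEBUG;

/* True if checks of the given level are currently active. */
static int
sketch_dbg_enabled(int level)
{
	return level != SKETCH_DBG_NONE && sketch_dbg_level >= level;
}

/* Evaluate expr only when its level is active; abort if it is false. */
#define SKETCH_ASSERT_AT(level, expr)					\
	do {								\
		if (sketch_dbg_enabled(level) && !(expr)) {		\
			fprintf(stderr, "assertion failed: %s (%s:%d)\n", \
			    #expr, __FILE__, __LINE__);			\
			abort();					\
		}							\
	} while (0)
```

With the default level, SKETCH_ASSERT_AT(SKETCH_DBG_EXTENSIVE, ...) costs one 
comparison and never evaluates its expression, while fatal and debug asserts 
stay active.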

Help and motivation support is still appreciated.

[1] Previous status at 
http://mail-index.netbsd.org/tech-kern/2014/09/17/msg017684.html
[2] 
https://github.com/krytarowski/netbsd-current-src-sys/tree/advfs_2014-09-27_old
[3] https://github.com/krytarowski/netbsd-current-src-sys/tree/advfs
[4] http://mail-index.netbsd.org/netbsd-bugs/2014/10/08/msg038523.html & 
http://mail-index.netbsd.org/netbsd-bugs/2014/10/09/msg038531.html


Re: Unification of common date/time macros

2014-10-08 Thread Kamil Rytarowski
Hello,

For your interest there are already patches against current:
http://mail-index.netbsd.org/netbsd-bugs/2014/10/08/msg038523.html

Best regards,


Re: Unification of common date/time macros

2014-09-22 Thread Kamil Rytarowski
Hello,

Good point with reducing the (U)L modifiers, also reducing possible 
side-effects of 16-bit int, so going for 86400 explicitly is a good idea.

I've already proposed patches with this bug-report:
http://mail-index.netbsd.org/netbsd-bugs/2014/09/16/msg038315.html

My idea was to extract verbatim defines from clock_subr.h, put them to a file 
inside 'sys/' and keep 100% compatibility with files that depend on the 
original defines inside clock_subr.h (so add #include in it to the new .h file).

With regards,


Re: Tru64 AdvFS porting to NetBSD - 1. status 2014-09-17

2014-09-17 Thread Kamil Rytarowski
Hello,

Thank you for your feedback.

I will put the moving directories around onto my TODO stack.

Please let me do it after resolving thousands of lines of compilation errors 
(just the kernel-part). I will go for a 'msfs' external name.

When I am done with making it buildable, I will contact the core team 
/ foundation to help me organize the steps to relicense it.

With regards,


Tru64 AdvFS porting to NetBSD - 1. status 2014-09-17

2014-09-17 Thread Kamil Rytarowski
Hello,

This is the first status of significant efforts of porting AdvFS [1] [2] to 
NetBSD.

Long term primary goals:
- complete port of AdvFS to NetBSD,
- relicense the original work under a BSD-friendly license.

1. What is done
- Put all sbin, usr.sbin and lib files into the NetBSD tree
- Add AdvFS user-space parts to initial Makefiles of build.sh
- Add initial variable MKADVFS to stop or start building user-land pieces
- Put kernel-space code into the tree (sys/fs/msfs/)
- Add msfs (aka AdvFS) to the kernel build-machinery and include it for amd64 
kernel "ALL"
- Adapted for NetBSD or nuked missing #includes of files from sys/fs/msfs
2. What is in progress
- Partly adapted locking dialect (mutex, krwmutex, condvar) for NetBSD (*)
- List of compatibility patches adapting for modern GCC and NetBSD (*)
- Researched and touched compatibility of catgen from NetBSD and Tru64
3. Issues
- Missing subsystems from Tru64: evm.h, devio, overlap.h (added an empty stub), 
disklabel details (*)
4. Next steps
- Tru64 Mach VM dialect porting to NetBSD UVM (*)
5. Pushed to NetBSD
- Proposed  patch [3]

(*) Considered significant effort and/or difficult.

Help and motivation support is appreciated.

References
https://github.com/krytarowski/netbsd-current-src-sbin/tree/advfs
https://github.com/krytarowski/netbsd-current-src-usr.sbin/tree/advfs
https://github.com/krytarowski/netbsd-current-src-lib/tree/advfs
https://github.com/krytarowski/netbsd-current-src-include/tree/advfs
https://github.com/krytarowski/netbsd-current-src-share/tree/advfs
https://github.com/krytarowski/netbsd-current-src-sys/tree/advfs

[1] http://advfs.sourceforge.net/
[2] http://en.wikipedia.org/wiki/AdvFS
[3] http://mail-index.netbsd.org/netbsd-bugs/2014/09/16/msg038315.html


Re: Unification of common date/time macros

2014-09-15 Thread Kamil Rytarowski
Hello,

I did some further investigation:
- FreeBSD provides NetBSD's src/sys/dev/clock_subr.h as /usr/include/sys/clock.h [1]
- OpenBSD merged src/sys/dev/clock_subr.h with src/sys/sys/time.h [2]
- Linux kernel nothing (?)
- Tru64 as mentioned before, clock.h inside several paths:
include/alpha/clock.h
include/machine/clock.h
include/sys/machine/clock.h
sys/include/arch/alpha/clock.h
sys/include/machine/clock.h
sys/include/sys/machine/clock.h

My proposition is to go for a new file src/sys/sys/clock.h. Normalize naming 
with /usr/include/tzfile.h, then uniformly export the file for reuse across the 
kernel.

#define SECSPERMIN  60L
#define MINSPERHOUR 60L
#define HOURSPERDAY 24L
#define DAYSPERWEEK 7L
#define DAYSPERNYEAR 365L
#define DAYSPERLYEAR 366L
#define SECSPERHOUR (SECSPERMIN * MINSPERHOUR)
#define SECSPERDAY  (SECSPERHOUR * HOURSPERDAY)
#define MONSPERYEAR 12L
#define EPOCH_YEAR  1970L

+ macros/defines for leap-year, day-of-week etc.
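
As a sketch of what those additional macros might look like: DAYSPERNYEAR and 
DAYSPERLYEAR are the names proposed above, while the SKETCH_* macros are 
hypothetical names used only for this illustration. The leap-year test is the 
Gregorian rule, and the day-of-week macro relies on 1 January 1970 having been 
a Thursday.

```c
/* Constants as proposed above. */
#define DAYSPERNYEAR	365L
#define DAYSPERLYEAR	366L

/* Gregorian rule: every 4th year, except centuries not divisible by 400. */
#define SKETCH_IS_LEAP_YEAR(y) \
	(((y) % 4 == 0 && (y) % 100 != 0) || (y) % 400 == 0)

#define SKETCH_DAYS_IN_YEAR(y) \
	(SKETCH_IS_LEAP_YEAR(y) ? DAYSPERLYEAR : DAYSPERNYEAR)

/* Day 0 of the epoch was a Thursday (Sunday = 0, so Thursday = 4). */
#define SKETCH_EPOCH_WDAY	4

/* Day of week for a non-negative count of days since the epoch. */
#define SKETCH_DAY_OF_WEEK(days) \
	(((days) + SKETCH_EPOCH_WDAY) % 7)
```

For instance, 2000 is a leap year and 1900 is not, and three days after the 
epoch (4 January 1970) falls on a Sunday.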

Maybe avoid name-clashes with tzfile.h and go for SECSMIN etc.?

What do you think? Is it worth adding?

Thanks in advance,

[1] http://fxr.watson.org/fxr/source/sys/clock.h
[2] 
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/sys/time.h.diff?r1=1.22&r2=1.23&f=h

