Re: kernel panic on ibm4xx-based powerpc box with DDB

2016-12-28 Thread Matt Thomas

> On Dec 27, 2016, at 11:59 PM, Rin Okuyama  wrote:
> 
> Thank you for your kind explanation. I'm starting to understand.
> I will read again the reference manual from this point of view.
> So, could I commit the patch?

Go commit it.

Re: kernel panic on ibm4xx-based powerpc box with DDB

2016-12-27 Thread Matt Thomas

> On Dec 27, 2016, at 10:49 PM, Rin Okuyama  wrote:
> 
> Thank you very much for your reply. I revised the patch accordingly, and
> it passed some stress tests on my OPENBLOCKS266.
> 
> However, sorry for bothering you, but I don't understand why this works.
> The original DDB/IPKDB handlers use ddbstk/ipkdbstk, that clearly do not
> support nested traps, as you pointed out. The patched version uses
> CI_{DDB,IPKDB}SAVE, that are save areas in cpu_info. It seems to me that
> they also do not support nested traps; a succeeding trap overwrites
> save areas already used by a preceding trap, doesn't it? I'm a beginner
> of assembler programming, and maybe I misunderstand something...

Much nicer.  It does support nested traps because %r1 (sp) isn't loaded if we 
are already in kernel mode.  So the trapframe is just saved further down the 
stack.

You can't get an exception while saving into the saveareas so that part doesn't 
need to stack.  Only after the saveareas are moved into a trapframe will 
exceptions be reenabled.  That's what PSL_RI enables.

Re: kernel panic on ibm4xx-based powerpc box with DDB

2016-12-27 Thread Matt Thomas

> On Dec 27, 2016, at 1:26 AM, Rin Okuyama  wrote:
> 
> I would like to fix port-powerpc/51367,
> 
> http://gnats.netbsd.org/51367
> 
> where an ibm4xx-based machine is unstable when the DDB option is specified.
> 
> DDB hooks the program interrupt (EXC_PGM). In the privileged mode,
> this is OK. However, in the user mode, it must dispatch directly
> to the usual trap handler, in the same manner as OEA:
> 
> http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/powerpc/powerpc/trap_subr.S#rev1.51
> 
> Otherwise, for example, a FPU instruction in the user mode triggers
> the program interrupt (ibm4xx does not have FPU), which results in
> inappropriate execution of DDB handler and kernel panic.
> 
> With the attached patch, the DDB and IPKDB handlers check whether they run
> in privileged or user mode, in a similar manner to
> powerpc/trap_subr.S rev >= 1.51, mentioned above. I've confirmed that
> kernel panics are avoided on my OPENBLOCKS266. Can I commit this?
> 
> Thanks,
> Rin

Why don't you use

ACCESS_PROLOG(CI_DDBSAVE)
bla ddbtrap

and just get rid of ddbstk, since ddbstk/ipkdbstk don't support nested traps?

Ditto for ipkdbtrap.



Re: Portmasters alert: was => Re: uvm physseg - put the lid on it.

2016-12-19 Thread Matt Thomas

> On Dec 19, 2016, at 12:19 PM, Cherry G. Mathew  wrote:
> 
>Cherry> I believe, a more formal way to do this, would be to use the extent(9)
>Cherry> API to manage all of the paddr_t space, thus formalising this
>Cherry> relationship with consumers. I'm not inclined to look at this in the
>Cherry> context of hotplug, due to "mission creep".
> 
> Committed a preview on -current:
> http://mail-index.netbsd.org/source-changes/2016/12/19/msg080026.html
> 
> Here's the "switch over" patch:
> http://ftp.netbsd.org/pub/NetBSD/misc/cherry/uvm_physmem/src.diff
> 
> These have been tested on various architectures by a few people, so
> unless there's vigorous disagreement, I will commit this once -current
> is stable again.

Don't add another uvm_* call to all ports.  It's just another thing to get
wrong.
Instead add a uvm_md_init and make it call uvm_setpagesize and uvm_physseg_init.
Thus if something new gets added we don't have to mung all the ports again.

You can't use extent for managing paddr_t space since extent uses longs and on
some ports paddr_t is a long long.

Change uvm_physseg_valid to uvm_physseg_valid_p to show it's a predicate test.

Re: UVM and the NULL page

2016-07-28 Thread Matt Thomas

> On Jul 28, 2016, at 12:21 PM, Maxime Villard  wrote:
> 
> Le 28/07/2016 à 19:45, Eduardo Horvath a écrit :
>> On Thu, 28 Jul 2016, Maxime Villard wrote:
>> 
>>> Currently, there is no real way to make sure a userland process won't be
>>> able to allocate the NULL page. There is this attempt [1], but it has two
>>> major issues.
>> 
>> I don't think this is a good idea.  You should leave this to the pmap
>> layer rather than polluting UVM.  There are some architectures that need
>> to have page zero mapped in for various reasons.
>> 
> 
> No. Quite on the contrary. UVM should handle that, instead of polluting
> pmap.
> 
> You are saying that some architectures need the NULL page. That's fine,
> since #define __USER_VA0_IS_SAFE is architecture-dependent.
> 
> For your information, __USER_VA0_IS_SAFE is never set, so clearly, no
> one  needs the NULL page.

That is not true.  Older ARM processors use a vector page located at 0 which
has to be mapped into each process's address space.

Now, while the VA is mapped, it does not need to be writable.




Re: FWIW: sysrestrict

2016-07-23 Thread Matt Thomas

> On Jul 23, 2016, at 1:36 AM, Maxime Villard  wrote:
> 
> Eight months ago, I shared with a few developers the code for a kernel
> interface [1] that can disable syscalls in user processes.
> 
> The idea is the following: a syscall bitmap is embedded into the ELF binary
> itself (in a note section, like PaX), and each time the binary performs a
> syscall, the kernel checks whether the syscall in question is allowed in
> the bitmap.
> 
> In details:
> - the ELF section is a bitmap of 64 bytes, which means 512 bits, the
>   number of syscalls. 0 means allowed, 1 means restricted.

Seems you only need the number of bytes needed to encode the highest
restricted syscall.  However, I think I'd prefer a level of indirection.  Have
the name of a bitmap embedded, which refers to a bitmap already loaded.
These would be visible via kern.restriction_sets, which would contain the
bitmaps.
There would also be a sysctl controlling what happens if you try to run a
program with an unknown bitmap set, which would only take effect where
securelevel is non-zero.

> - in the proc structure, 64 bytes are present, just a copy of the
>   ELF section.
> - when a syscall is performed, the kernel calls sysrestrict_enforce
>   with the proc structure and the syscall number, and gives a look
>   at the bitmap to make sure it is allowed. If it isn't, the process
>   is killed.

What happens when we get more than 512 syscalls?  Is this for NetBSD
binaries only?  

> - a new syscall is added, sysrestrict, so that programs can restrict
>   a syscall at runtime. This might be useful, particularly if a
>   program calls a syscall once and wants to make sure it is not
>   allowed any longer.

I assume it can't unrestrict.  Do you pass the size of the array(s)?

> - a userland tool (that I didn't write) can add and update such an ELF
>   section in the binary.
> 
> This interface has the following advantages over most already-existing
> implementations:
> - it is system-independent, it could almost be copied as-is in FreeBSD.
> - it is syscall-independent, we don't need to patch each syscall.
> - it does not require binaries to be recompiled.
> - the performance cost is low, if not non-existent.

If a syscall is restricted, what error is returned?  EPERM?  ENOSYS?

> I've never tested this code. But in case it inspires or motivates someone.
> 
> [1] http://m00nbsd.net/garbage/sysrestrict/




Re: SOSEND_LOAN problems in MIPS

2016-07-10 Thread Matt Thomas

> On Jun 21, 2016, at 8:30 AM, Michael  wrote:
> 
> Hello,
> 
> On Tue, 21 Jun 2016 08:49:44 +
> co...@sdf.org wrote:
> 
>> Replying to myself on tech-kern for the sake of completeness, in case
>> anyone has similar issues.
>> 
>> For the failure of `cat hugefile.gz | gunzip -` I must use both
>> PIPE_SOCKETPAIR and SOSEND_LOAN (default on)
> 
> That explains why it didn't trigger on sgimips.

It's very CPU dependent.  Maybe the CPUs used on your sgimips
don't have the problem.  It's related to virtual cache aliasing.


Re: Introduce curlwp_bind and curlwp_unbind for psref(9)

2016-06-16 Thread Matt Thomas

> On Jun 13, 2016, at 5:53 PM, Ryota Ozaki  wrote:
> 
> On Mon, Jun 13, 2016 at 11:21 PM, Taylor R Campbell
>  wrote:
>>   Date: Mon, 13 Jun 2016 14:00:16 +0200
>>   From: Joerg Sonnenberger 
>> 
>>   On Mon, Jun 13, 2016 at 07:36:31PM +0900, Ryota Ozaki wrote:
>>> Currently we do it by open-coding in each place,
>>> but we should provide some API to simplify codes.
>>> riastradh@ suggested curlwp_bind and curlwp_unbind
>>> some time ago (*1) and this patch (*2) just follows
>>> the idea.
>> 
>>   The primary question for me is whether nesting should be allowed or not.
>>   That would mean a reference count behind the flag.
>> 
>> This `reference count' gets stored on the stack.  The caller does:
>> 
>>int bound = curlwp_bind();
>> 
>>... psref_wotsit ...
>> 
>>curlwp_unbind(bound);
>> 
>> If it was already bound, bound = 1 and curlwp_unbind does nothing; if
>> it was not already bound, bound = 0 and curlwp_unbind unbinds it.
>> 
>> Perhaps the name should be `curlwp_bound_restore' or something else to
>> emphasize this, but I haven't come up with one that I like better on
>> aesthetic grounds.
> 
> - curlwp_bind and curlwp_unbind
> - curlwp_bound_set and curlwp_bound_restore
> - curlwp_bound and curlwp_boundx
> 
> Any other ideas? :)

Since we already use preempt_disable() to force an lwp to stick to a cpu,
doesn't that solve the problem?  If need be, we can enforce that
non-preemptable lwps don't migrate.


Re: device-major question

2016-05-12 Thread Matt Thomas

> On May 11, 2016, at 4:18 AM, SODA Noriyuki  wrote:
> 
> There are two points which need to be clarified.
> 
> 1) The conf/majors files contains the following comment:
>   # Majors 160-255 are used for the MI drivers.
>   This "255" has to be changed, because the majors.storage file is
>   already using 332.
> 
>   What is the new limit for MI drivers?  1023?

511 for now.


> 2) I think the majors.{ws,usb,std,tty,storage} files are only used
>   for the device drivers which were made before the sys/conf/majors
>   file was introduced.  And new MI devices should go to sys/conf/majors
>   instead of majors.{ws,usb,std,tty,storage} even if the device
>   is a ws/usb/std/tty/storage device.
> 
>   Is this right?

Not quite.  They are for new ports which don't have existing majors, so they can 
use a common MI scheme.  Existing ports should still use what they have.  New 
ports can just use the new definitions, leaving the port majors mostly empty.


>   If so, How about changing the following comments in sys/conf/majors
>   # 210-219 reserved for MI ws devices
>   # 220-239 reserved for MI usb devices
>   # 240-259 reserved for MI "std" devices
>   # 260-269 reserved for MI tty devices
>   # 310-339 reserved for MI storage devices
>   to
>   # 210-219 reserved for old MI ws devices
>   # 220-239 reserved for old MI usb devices
>   # 240-259 reserved for old MI "std" devices
>   # 260-269 reserved for old MI tty devices
>   # 310-339 reserved for old MI storage devices
>   # NOTE: new MI devices should go to this file instead of above
> 
> P.S.
> I added Matt to the "To:" field, because he made the majors.* files.
> -- 
> soda



Re: NFS writes being corrupted?

2015-08-09 Thread Matt Thomas

 On Aug 9, 2015, at 4:01 PM, Jeff Rizzo r...@tastylime.net wrote:
 
 This would seem to indicate a problem with the particular interface (awge0), 
 perhaps specific to the odroid-c1, as opposed to some l2 cache controller 
 issue, which is kind of where I was leaning before.  However, my banana pi 
 has awge0 as well, but does not exhibit this corruption.

The l2 cache flushing routines are different between the two.

The awge on the a5 may be using the coherent interface to the pl310
and cache flushing may not be even needed.  USB probably doesn’t use
the coherent interface so that might be why it works.



Re: 2*(void *) atomic swap?

2015-07-30 Thread Matt Thomas

 On Jul 30, 2015, at 12:11 PM, Dennis Ferguson dennis.c.fergu...@gmail.com 
 wrote:
 
 I know arm does double word ll/sc, but what else does?
 I don't know of a way to use single register ll/sc to do an
 atomic swap of two pointers.

Not many.  Mostly when running 32-bit on 64-bit CPUs.

You can do an atomic swap of a pointer to two pointers.
Ya, indirection.

Genericizing sys/compat/netbsd32

2015-07-11 Thread Matt Thomas

sys/compat/netbsd32 is great at running 32-bit NetBSD on a 64-bit kernel.
But with a little tweaking, it could do so much more.

For example, aarch64 will need multiple instances of compat_netbsd32
(one for arm32 eabi, one for arm32 oabi, and possibly one for aarch64
ilp32 unless it can use the arm32 eabi)

This requires being able to change the netbsd32_ that starts every function
to something unique.

Now if we are going that far, with a little more work we can separate out
the netbsd32 specific pieces and have a generic netbsd on netbsd compat
layer.  This could be used on ARM or some MIPS, or even PowerPC to run a
reverse endian userland (big endian user program on little endian kernel
for example).  Or improve the efficiency of running ARM OABI programs on
an EABI kernel (since much of the netbsd32 compatibility isn’t needed and
could be skipped).

I have started some effort towards this and have a set of diffs at
http://www.netbsd.org/~matt/netbsd32-diff.txt showing how syscalls could
be handled.  netbsd32_wait.c is the furthest along in being genericized.

I particularly like how the NETBSDX_SYSCALL(foo) and 
NETBSDX_COMPAT_SYSCALL(n, foo) macros simplify things.




Re: drop volatile from __cpu_simplelock_t typedef

2015-06-26 Thread Matt Thomas

 On Jun 26, 2015, at 8:17 AM, Antti Kantee po...@iki.fi wrote:
 
 Such as?  I can only think of the C debugging version of simple_lock ;)
 
 Can't those be fixed by making them call __SIMPLELOCK_LOCKED_P()?  They 
 arguably should have been doing that in the first place anyway.  Or are you 
 worried that we won't be able to catch all of them?

Atomic instructions typically have a lot of overhead, so you loop until the 
variable changes and then retry the atomic instruction.

Re: drop volatile from __cpu_simplelock_t typedef

2015-06-26 Thread Matt Thomas

 On Jun 26, 2015, at 6:55 AM, Antti Kantee po...@iki.fi wrote:
 
 __cpu_simplelock_t was born 15+ years ago with the following commit message:
 
 === snip ===
 Let each platform typedef the new __cpu_simple_lock_t, which should
 be the most efficient type used for the atomic operations in the
 simplelock structure, and should also be __volatile.
 === snip ===
 
 So, thinking about fixing lib/49989, I started wondering why volatile is 
 necessary in the simplelock typedefs.  should also be doesn't explain much, 
 and may just be there because that's what the pre-simplelock_t definitions 
 used.  Shouldn't simplelocks always be operated on with atomic instructions 
 and instruction barriers or some non-SMP equivalent thereof?  Assuming so, 
 volatile in the typedef doesn't do anything except probably throw compilers 
 off and therefore we should drop volatile from the typedefs.
 
 RAS might need volatile (not sure yet), but that can probably be pushed 
 inside the RAS sequence instead of exposing it everywhere.
 
 Thoughts?  Seems like the right thing to do irrespective of lib/49989.

__cpu_simple_lock_unlock concerns me without volatile.

Also, many have loops that count on the variable changing.
Without volatile those will become infinite loops.

For RISC-V, I used the builtin C11-ish gcc atomics to implement
the __cpu_simple_lock_t operations.  I just moved it to
sys/common_lock.h so other ports could use it.

Re: Interrupt flow in the NetBSD kernel

2015-06-24 Thread Matt Thomas

 On Jun 23, 2015, at 12:17 PM, Reinoud Zandijk rein...@netbsd.org wrote:
 
 Hi Matt,
 
 On Sun, Jun 21, 2015 at 01:42:38PM -0700, Matt Thomas wrote:
 On Jun 21, 2015, at 12:02 PM, Reinoud Zandijk rein...@netbsd.org wrote:
 On Sun, Jun 21, 2015 at 08:01:47AM -0700, Matt Thomas wrote:
 IMO, softints are an aberration and should really be thread priorities
 and dealt with by the thread scheduler.
 
 Each level of softint as a kernel thread that gets woken up by condition
 variables?
 
 I envision them being hard realtime kernel threads that would preempt lower
 priority threads.
 
 as in kernel priority threads that get scheduled immediately when they can?
 without preemption?

Yes.  In theory it could be any thread, but they would most likely be 
kernel threads.

 Could in a virtualisation context those threads also be used and be woken
 by signalling the relevant condition variable on reception of say an
 virtio push?
 
 Could be.
 
 But my goal is something intrinsically different.  In the interrupt, you
 signal a condition variable or some other method of making a thread
 runnable.  A run through the scheduler happens and a new thread is selected to
 run.
 
 In addition to exceptions and interrupts using a common trapframe,
 cpu_switchto would also need to use a trapframe to store the lwp's context.
 When restoring a trapframe, switchto will need to know which type of
 trapframe it was.
 
 When the interrupt is about to restore the trapframe, if the scheduler
 decided to switch to another lwp, it will do so using that lwp's trapframe.
 
 The goal is to have near instant context switching without the hackery of
 the current preemption code.
 
 This sounds good, esp. if that means we are leaving the concept of `interrupt
 context' other than the short time before it makes its handling thread
 runnable.

That’s my goal.  Fast context switch would go away since it would no longer be
a special mechanism.  

 Leaves us with inventorying and porting the drivers to use the new mechanism.
 Would maybe also be a good idea to clean up our forest of primitives as so
 far they are still used.

We keep around too many kernel threads which are kept idle for most of the time.
Instead we should use a work-crew approach and select an idle worker to do a 
task.  When it’s done, it goes looking for some more work.

 How much time do you reckon it's going to take to do it properly? Could TNF be
 tempted to sponsor someone, maybe you even ? :)

Depends on the port.  I’ll probably use MIPS to test the initial work since
gxemul makes it easy to test on.  I’d take money to work on this.



Re: Guidelines for choosing MACHINE MACHINE_ARCH?

2015-06-24 Thread Matt Thomas

 On Jun 24, 2015, at 10:27 AM, Jeff Rizzo r...@tastylime.net wrote:
 
 On 6/24/15 7:13 AM, matthew green wrote:
 David Holland writes:
 
 I think keeping evb* for boards makes sense, though.
 i dunno.
 
 i don't see what it adds.  in particular, evb means evaluation
 board, and there are heaps of things in evb* that are *not*
 evaluation boards, but stuff that might have once been once.
 
 i wish we'd just collapse as much as possible back to plain old
 MACHINE=MACHINE_ARCH=whatever.  i just don't see any value or
 validity in evb.  is ERLITE an evaluation board?  what about
 the RPI or CUBIE* systems?  they come pretty complete AFAICT,
 designed as end-user systems, not what we used to consider as
 being evalation boards.
 
 
 
 I agree that evb* is confusing and increasingly meaningless and would like to 
 see us transition away from it.

I contend that moving to sys/arch/cpu is incorrect when there are multiple 
MACHINE values for that CPU.  Maybe sys/tem/mips (haha!) or sys/platform/mips (yuk) 
or sys/arch/cpusys or something better.

Re: Interrupt flow in the NetBSD kernel

2015-06-22 Thread Matt Thomas

 On Jun 21, 2015, at 12:02 PM, Reinoud Zandijk rein...@netbsd.org wrote:
 
 Hi Matt,
 
 On Sun, Jun 21, 2015 at 08:01:47AM -0700, Matt Thomas wrote:
 IMO, softints are an aberration and should really be thread priorities and
 dealt with by the thread scheduler.
 
 Each level of softint as a kernel thread that gets woken up by condition
 variables?

I envision them being hard realtime kernel threads that would preempt 
lower priority threads.

 Could in a virtualisation context those threads also be used and be woken by
 signalling the relevant condition variable on reception of say an virtio push?

Could be.

But my goal is something intrinsically different.  In the interrupt, you signal
a condition variable or some other method of making a thread runnable.
A run through the scheduler happens and a new thread is selected to run.

In addition to exceptions and interrupts using a common trapframe, 
cpu_switchto would also need to use a trapframe to store the lwp’s
context.  When restoring a trapframe, switchto will need to know which type
of trapframe it was.

When the interrupt is about to restore the trapframe, if the scheduler decided
to switch to another lwp, it will do so using that lwp’s trapframe.  

The goal is to have near instant context switching without the hackery of the
current preemption code.



Re: Interrupt flow in the NetBSD kernel

2015-06-21 Thread Matt Thomas

 On Jun 21, 2015, at 7:30 AM, Kamil Rytarowski n...@gmx.com wrote:
 
 I have got few questions regarding the interrupt flow in the kernel.
 Please tell whether my understanding is correct.

You are confusing interrupts with exceptions.  Interrupts are 
asynchronous events.  Exceptions are (usually) synchronous and
are the result of an instruction.

 There are software and hardware interrupts.
 Part of the hardware interrupts are maskable with the spl(9) levels.
 Some are unmaskable and must be handled unconditionally, like the
 exception data abort from ARM.

data abort is a synchronous exception, not an interrupt.

 Hardware interrupts are handled by the hardware interrupt handler.
 System calls (syscalls) and softint(9) are software interrupts handled
 by the same software interrupt handler.

syscalls are synchronous exceptions, softint can be either a real
interrupt (like mips or VAX) or emulated in the SPL code (ARM).

 Syscalls come from the userland with the user address space context,

Currently, only syscalls from user mode are handled.

 softint(9) come from the kernel with kernel address space context.

But softint(9) uses interrupts as a mechanism; it doesn’t require them.
In fact, I’d like to see that die.

 The spl(9) calls mask maskable interrupts, both software and hardware
 ones - with the exception to the unmaskable ones -- like data abort on ARM.

Again, data abort is an exception, not an interrupt.

 There are three contexts in the kernel:
 - hardware interrupt (within hardware interrupt handler),
 - software interrupt (within software interrupt handler) for syscalls
 and softint(9),
 - thread context for LWP (lightweight processes).
 
 Bottom half (BSD naming) is responsible for the hardware interrupts, top
 half (BSD naming) is responsible for the software and thread contexts.

Bottom half talks to the hardware and processor.  The top half deals with
requests from userland.  The pmap is bottom half even though it’s only invoked by
the top half (UVM).

 Process is heavy with user address space oneness running in the
 user-space, thread is lightweight with shared kernel address space for
 all threads. Kernel can access the whole physical memory, but doesn't
 know the user address mapping. There is one process running in the
 kernel address space -- proc0 = swapper.

A process is a collection of threads sharing the same address space.
That address space can be a user address space or a kernel address space.

 How physically works the spl(9) interrupt masking for software
 interrupts? On ARM svc (or monitors) aren't maskable, like IRQ
 (exception), a type of (ARM naming) exception and (kernel naming)
 hardware interrupt.

That depends on the implementation and the underlying hardware.

 I'm trying to get the big picture first, before getting to details.
 
 When I look into details, I don't get the things, like the line 268
 here:
 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/arm/arm32/exception.S?annotate=1.17.2.2
 Is it a leftover from line 252 and should be erased?

It’s gone now.

 Back to the big picture. How technically works IPL_SOFT, does it mask
 syscalls and softint(9) the same way? If it's not maskable (to my
 understanding) are we scheduling it in some sort of queue or stack
 waiting for the spl(9) level change?

IMO, softints are an aberration and should really be thread priorities and
dealt with by the thread scheduler.


Re: Groff

2015-06-04 Thread Matt Thomas

 On Jun 3, 2015, at 10:05 PM, Aleksej Saushev a...@inbox.ru wrote:
 
 Just in case you don't know, nearly any user has libxml2 and libxslt
 installed anyway.

None of my systems do.


Re: retrocomputing NetBSD style

2015-06-03 Thread Matt Thomas

Ultrix stopped supporting VAX with CVAX.  NetBSD/vax supports later systems 
like the VS4000s that are still quite zippy systems.  A VS4000/96 is a really 
nice box.

This is where retrocomputing NetBSD style shines.  

The nice thing about old hardware is that bad design choices will likely be more 
visible on it than on current machines, which are orders of magnitude faster.

Re: Removing ARCNET stuffs

2015-05-30 Thread Matt Thomas

 On May 29, 2015, at 1:31 PM, paul_kon...@dell.com wrote:
 
 No, transfering a whole file is a single stream of stuff; reading individual 
 records is a more complex handshake.  And apart from that, things get 
 significantly simpler if you only support Sequential files.  Simpler still if 
 you limit it to just two cases: fixed:512, and stream.

Alas, that means most text files on VMS won’t be transferred since they are 
record based.
Can you get EDT or TPU to emit stream files?



Re: Removing ARCNET stuffs

2015-05-29 Thread Matt Thomas

 On May 28, 2015, at 4:15 PM, Johnny Billquist b...@update.uu.se wrote:
 
 On 2015-05-28 21:19, Tom Ivar Helbekkmo wrote:
 paul_kon...@dell.com writes:
 
 And DECnet nodes exist around the Internet; the “Hobbyist DECnet”
 group (“hecnet”) is the main focus of that activity as far as I know.
 
 ...and while I'm sure Johnny Billquist can supply more details, and
 correct me if I'm wrong, DECnet on NetBSD seems to me to be an active
 component of the hecnet environment.
 
 Nope. NetBSD do not run DECnet. I run a bridge program, which I initially 
 developed on NetBSD, but it runs on pretty much anything.

Yours don’t. :)

 I also hack NetBSD/VAX on and off, but it's becoming more and more off with 
 every new development within NetBSD. But that's a different story.

Sigh.

 Oh, and DECnet on Linux is not so great either, and I believe it has been 
 dropped from the main tree.
 But if anyone wants to try and get NetBSD to talk DECnet, Paul and me can 
 certainly help in many ways.

I have a Phase IV+ (so I didn’t have to muck with the physical address) 
implementation but never got around to writing the apps.  The socket interface is 
identical to DECnet-ULTRIX.  DAP is a beast, as is CTERM.  I could run IP 
protocols over it, but then I have IP for that. :)

I never committed it because I doubted there was interest.  It’s probably bit 
rotted but I could resurrect it.

In a world a long time ago I was one of the kernel engineers for DECnet for 
ULTRIX and OSF/1 (nee Digital UNIX). It was one reason I could stand the netiso 
code because it was so horrible.

Re: Inter-driver #if dependencies

2015-05-17 Thread Matt Thomas

 On May 17, 2015, at 3:40 PM, Paul Goyette p...@vps1.whooppee.com wrote:
 
 My crusade for modularity has arrived at the pcppi(4) driver, and I've 
 discovered that there are a number of places in the code where a #if is used 
 to determine whether or not some _other_ driver is available to provide 
 certain routines.  For pcppi(4), these dependencies are for the attimer(4) 
 and pckbd(4) drivers.  (While I haven't yet gone searching, I'd be willing to 
 wager that there are other similar examples in other drivers.)
 
 These #if constructs make it very difficult to modularize these drivers.
 
 I'd like to propose the following new kernel mechanism that will allow us to 
 remove these #if dependencies.
 
 1. Extend the struct cfattach to have an additional member, and create
  a new CFATTACH_DECL4_NEW macro to initialize it (and updates to the
  existing CFATTACH_* family to default the value to NULL).
 
   int (*ca_callback)(device_t, int, void *);
 
  (This will require a kernel version bump.)

Ewww.  Gross.

It might be better to use weak symbols and then fix them up later.



Re: change MSI/MSI-X APIs

2015-05-15 Thread Matt Thomas

 On May 14, 2015, at 8:17 AM, Christos Zoulas chris...@zoulas.com wrote:
 
 On May 14,  7:40pm, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote:
 -- Subject: Re: change MSI/MSI-X APIs
 
 Thanks!
 
 Here's a slightly modified version that gets rid of the flags and simplifies
 the more common case.
 
 The following could be an enum:
 
 typedef int pci_intr_type_t;
 
 #define PCI_INTR_TYPE_INTX 0
 #define PCI_INTR_TYPE_MSI  1
 #define PCI_INTR_TYPE_MSIX 2
 #define PCI_INTR_TYPE_MAX  3
 
 int
 pci_intr_alloc(const struct pci_attach_args *, pci_intr_handle_t **ihps,
int *counts, size_t ncounts);
 
 If the counts[i] == 0, then you don't allocate, if it is -1, you allocate max,
 otherwise you allocate up to count.
 
 NULL in the counts means do whatever you think is best.
 (I am passing the ncounts so that the API does not need to be versioned
 if there are more interrupt flavors added)
 
 pci_intr_type_t pci_intr_type(pci_intr_handle_t *ihp);
 
 _attach()
 {
 
 We return a negative errno and a positive nintrs, so the common case where
 the driver does not care:
 
   if ((nintrs = pci_intr_alloc(*pa, &sc->sc_intrs, NULL, 0)) < 0) {
           error("foo %d\n", -nintrs);
           return;
   }
 
   for (i = 0; i < nintrs; i++)
           sc->sc_ihs[i] = pci_intr_establish(pc, sc->sc_intrs[i], level,
               func, arg);
 
 Otherwise if we want to be selective:
 
   int counts[PCI_INTR_TYPE_MAX];
   memset(counts, 0, sizeof(counts));
 
   // Don't want msi, leave it 0
   counts[PCI_INTR_TYPE_MSIX] = -1;
   counts[PCI_INTR_TYPE_INTX] = -1;
 
   if ((nintrs = pci_intr_alloc(*pa, &sc->sc_intrs, counts,
       PCI_INTR_TYPE_MAX)) < 0)
   
 
 What do you think?

Don’t you need a pci_intr_free(pc, sc->sc_intrs, nintrs); ?



Re: Guidelines for choosing MACHINE MACHINE_ARCH?

2015-05-01 Thread Matt Thomas

 On May 1, 2015, at 10:53 AM, David Holland dholland-t...@netbsd.org wrote:
 
 On Fri, May 01, 2015 at 07:48:37PM +0200, Joerg Sonnenberger wrote:
 On Fri, May 01, 2015 at 01:58:34PM -0300, Leandro Santi wrote:
 A quick look at build.sh shows that one of the first things that
 needs to be done is to map the MACHINE name to the CPU architecture
 name, i.e. MACHINE_ARCH. I noticed that some ports set
 MACHINE=MACHINE_ARCH, but some others don't. Which leads me to the
 following: what are the guidelines for choosing these names?
 
 If reasonably possible, use MACHINE=MACHINE_ARCH=whatever config.guess
 uses for the platform. A lot of the different MACHINE values are due to
 historical reasons and wouldn't happen again.
 
 I think keeping evb* for boards makes sense, though.

I agree.  MACHINE=evbavr32 and MACHINE_ARCH=avr32 make the most sense.

Re: Removing if_type switches in if_vlan.c

2015-04-21 Thread Matt Thomas

 On Apr 21, 2015, at 12:47 AM, Ryota Ozaki ozak...@netbsd.org wrote:
 
 Hi,
 
 There are several if_type switches in if_vlan.c,
 which were introduced to support other hardware
 types such as FDDI many years ago. However, no
 such implementation has appeared since then.

Doesn’t mean they aren’t correct.  Leave them.

 I think there is no reason to keep them and
 I want to get rid of them to improve code
 readability.
 
 Any objections?

Yes.  The one in vlan_config must stay for
correctness.  The one in vlan_unconfig doesn’t
hurt.  If you remove the other switches, at
least add a KASSERT() to make sure it’s IFT_ETHER.



Re: x86 MD MSI/MSI-X implementation

2015-04-09 Thread Matt Thomas

 On Apr 9, 2015, at 3:44 AM, Kengo NAKAHARA k-nakah...@iij.ad.jp wrote:
 
 Hi,
 
 I implement x86 MD MSI/MSI-X support code. Here is the patch:
   http://www.netbsd.org/~knakahara/md-msi-msix/x86-md-msi-msix.patch
 
 Furthermore, here is the usage example of if_wm:
   http://www.netbsd.org/~knakahara/md-msi-msix/if_wm-msi-msix.patch
 
 I believe this MD implementation helps to decide the MI APIs.
 One of MI MSI/MSI-X API is dyoung@n.o's bus_msi(9),
   http://mail-index.netbsd.org/tech-kern/2014/06/06/msg017209.html
 another is matt@n.o's API,
   http://mail-index.netbsd.org/tech-kern/2011/08/05/msg011130.html
 The other is mine, above patch includes my API manual.
 
 I want feedback from various device driver MSI/MSI-X implementations,
 such as usability, portability, performance, and potential issues.
 So, I would like to commit the above patch. If no one objects, I will commit
 the above patch after one or two weeks.
 # Of course, I will commit by dividing each component.
 
 Could you comment this implementation?

PCI_MSI_MSIX should be __HAVE_PCI_MSI_MSIX 

void *vih;

That should be pci_msix_handle_t or pci_msi_handle_t unless you
still want to use pci_intr_handle_t.

+   mutex_enter(cpu_lock);
+   error = intr_distribute(vih, affinity, NULL);
+   mutex_exit(cpu_lock);


Why the cpu_lock?  shouldn't intr_distribute handle that itself?

This should be a pci_intr_distribute and be done before the establish, not after.

Why do you assume you will only allocate one slot per msi/msix call?
Some devices will use multiple contiguous slots so you have to allocate
them as a set.

Re: Removal of compat-FreeBSD

2015-02-13 Thread Matt Thomas

 On Feb 13, 2015, at 2:14 PM, Christos Zoulas chris...@astron.com wrote:
 
 In article 20150213192419.gb5...@britannica.bec.de,
 Joerg Sonnenberger  jo...@britannica.bec.de wrote:
 
 I have asked the same question a long time ago when we pruned a bunch of
 other obsolete emulations. From a security stand point, I fully agree
 with Maxime. The usefulness of the FreeBSD emulation is *very* limited,
 it can't even handle most FreeBSD 4 binaries. I find it highly
 questionable to keep a non-trivial attack surface for the sake of a
 single device driver, which most people likely don't even have. I don't
 see any evidence in the tree of COMPAT_FREEBSD improving or any of the
 users of tw_cli working on improving the situation by removing the need
 for it. As such I find disabling COMPAT_FREEBSD by default a very good
 idea for increasing the visibility of the problem. Maybe someone who
 should be caring actually starts to...
 
 I agree with joerg here. I think that reducing the footprint of
 GENERIC for the benefit of security is the right approach to this
 matter... We have the ALL kernel to test compilation, and the
 approach should be that GENERIC should be appropriate for all
 normal uses and I think COMPAT_FREEBSD belongs in the fringe
 users side (or at least in the limited number of users). I.e.
 If you want to run FreeBSD binaries, you can build your own kernel.

Also, shouldn't the compat_freebsd module be autoloaded if you need it?
If so, not having it in the kernel shouldn't really affect things.


Re: pserialize hw interrupt

2014-12-09 Thread Matt Thomas

 On Dec 9, 2014, at 6:41 PM, Ryota Ozaki ozak...@netbsd.org wrote:
 
 On Wed, Dec 10, 2014 at 11:29 AM, Thor Lancelot Simon t...@panix.com wrote:
 On Wed, Dec 10, 2014 at 11:19:52AM +0900, Ryota Ozaki wrote:
 On Tue, Dec 9, 2014 at 1:17 PM, Thor Lancelot Simon t...@panix.com wrote:
 
 Can you try increasing HZ?
 
 Thank you for the suggestion. Which HZ is good for the purpose?
 1000 was not good for my environment (KVM for now) and I'm trying
 other HZ (500, 200, etc.).
 
 I couldn't get good results on this approach...
 
 It may be poorly suited to virtualized platforms.
 
 Okay, I'll try it on a physical machine.
 
 BTW, could you tell me how increasing HZ affects vioif and softint?

If you have __HAVE_FAST_SOFTINTS it shouldn’t affect it at all.

Re: L2 cache evbarm

2014-12-08 Thread Matt Thomas

 On Dec 8, 2014, at 4:58 PM, Frank Zerangue frank.zeran...@gmail.com wrote:
 
 Can anyone tell me what work would be needed to add L2 cache support to the 
 evbarm port?

L2 cache support is very SoC specific.  Unless you tell us the target, we can't 
help you.
Chances are that the L2 cache is already supported.

Re: Critical section

2014-11-26 Thread Matt Thomas

 On Nov 26, 2014, at 8:04 AM, Masao Uebayashi uebay...@gmail.com wrote:
 
 Critical section must stop soft interrupt which may block & sleep
 (using the preempted lwp).  Thus critical sections must be at least
 IPL_SOFTSERIAL.

That is not true.  If the softint thread sleeps, control is returned back
to the preempted lwp.  


Re: kernel constructor

2014-11-10 Thread Matt Thomas

 On Nov 8, 2014, at 11:16 PM, Masao Uebayashi uebay...@tombi.co.jp wrote:
 
 Ideally the long hardcoded sequence of init functions in init_main:main() is
 converted to a single vector whose order is resolved by modular dependency.
 But for the moment such a hardcoded priority should be good enough to improve
 modularity.
 
 Question - where to put the declarations (typedef, __link_set_decl())?

No more link sets please.
Can’t we use __attribute__((__constructor__))
and __attribute__((__destructor__))?
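
A rough userland illustration of the attributes being suggested, as GCC and Clang accept them. This is not kernel code: whether and how the kernel's startup path would run such functions is not settled in this thread, and the names example_init, example_fini and example_init_count are made up for the sketch.

```c
#include <stdio.h>

/* Counts how many constructors have run; hypothetical, for illustration. */
static int init_calls;

/* Run by the C runtime before main(); stands in for a subsystem init hook. */
__attribute__((__constructor__))
static void
example_init(void)
{
	init_calls++;
}

/* Run at normal process exit; stands in for a teardown hook. */
__attribute__((__destructor__))
static void
example_fini(void)
{
	fprintf(stderr, "example_fini: %d constructor call(s) seen\n",
	    init_calls);
}

/* Accessor so callers can observe that the constructor already ran. */
int
example_init_count(void)
{
	return init_calls;
}
```

Both compilers also accept a priority argument, e.g. __attribute__((constructor(200))), with lower numbers running first, which is one possible way to express the ordering that the hardcoded init sequence encodes today.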


Re: MI linker script

2014-11-07 Thread Matt Thomas

 On Nov 7, 2014, at 9:10 AM, Masao Uebayashi uebay...@gmail.com wrote:
 
 On Sat, Nov 8, 2014 at 1:47 AM, Matt Thomas m...@3am-software.com wrote:
 linker scripts aren't used when doing -r.
 
 Not used implicitly.  You can explicitly specify one.
 
 (I didn't know this fact one week ago.  I understand your feeling.)

Again, why?  The final linker script is the important one.  Using one with
-r doesn't really do anything significant.


Re: Re HDMI transmitter interface

2014-09-23 Thread Matt Thomas

On Sep 23, 2014, at 6:47 AM, Taylor R Campbell riastr...@netbsd.org wrote:

   Date: Tue, 23 Sep 2014 15:41:25 +0200
   From: Martin Husemann mar...@duskware.de
 
   On Tue, Sep 23, 2014 at 01:38:53PM +, Taylor R Campbell wrote:
 If it's GPL, we can make kernel modules.
 
   That is not very practical for display drivers on platforms like Manuel
   is working on.
 
 Why not?  Can't the boot loader load modules?

No, arm is typically only monolithic kernels.  There is no NetBSD bootloader,
just u-boot or similar.


Re: Re HDMI transmitter interface

2014-09-23 Thread Matt Thomas

On Sep 23, 2014, at 11:24 AM, Christoph Egger christoph_eg...@gmx.de wrote:

 Am 23.09.14 um 17:26 schrieb Matt Thomas:
 
 On Sep 23, 2014, at 6:47 AM, Taylor R Campbell riastr...@netbsd.org wrote:
 
 Date: Tue, 23 Sep 2014 15:41:25 +0200
 From: Martin Husemann mar...@duskware.de
 
 On Tue, Sep 23, 2014 at 01:38:53PM +, Taylor R Campbell wrote:
 If it's GPL, we can make kernel modules.
 
 That is not very practical for display drivers on platforms like Manuel
 is working on.
 
 Why not?  Can't the boot loader load modules?
 
 No, arm is typically only monolithic kernels.  There is no NetBSD bootloader,
 just u-boot or similar.
 
 
 AFAIK arm has a device tree in the firmware you use to load modules.

That device tree, if present, is presented only to linux kernels.  And it
will not help you load modules since u-boot will not assist with that.

Re: detect valid fd

2014-09-15 Thread Matt Thomas

On Sep 15, 2014, at 4:59 PM, Patrick Welche pr...@cam.ac.uk wrote:

 On Tue, Sep 16, 2014 at 12:51:24AM +0100, Justin Cormack wrote:
 On Tue, Sep 16, 2014 at 12:20 AM, Patrick Welche pr...@cam.ac.uk wrote:
 Given a filedescriptor, how can you tell that it is valid and has been
 opened?
 
 In the attached simple program, a file and a directory are opened
 (with CLOEXEC set). I then call fcntl(fd, F_GETFD) on the range
 fd = [3..15].  fd = {3,4} correspond to the open file and directory.
 Why don't I get fcntl(4):
 
 [EBADF]fildes is not a valid open file descriptor.
 
 for fd = [5..15], but only for some of them?
 
 $ ./cloexec
 fd  3 testfile.txt flags = 0x1 (0x1)
 fd  4 testdir flags = 0x1 (0x1)
 fd  3's flags = 0x1 (0x1)
 fd  4's flags = 0x1 (0x1)
 fd  5's flags = 0x0 (0x0)
 fd  6's flags = 0x0 (0x0)
 fd  7's flags = 0x0 (0x0)
 fd  8's flags = 0x0 (0x0)
 fd  9's flags = 0x0 (0x0)
 fd 10's flags = 0x0 (0x0)
 cloexec: fcntl 11: Bad file descriptor
 cloexec: fcntl 12: Bad file descriptor
 fd 13's flags = 0x0 (0x0)
 fd 14's flags = 0x0 (0x0)
 cloexec: fcntl 15: Bad file descriptor
 
 I get
 
 fd  3 testfile.txt flags = 0x1 (0x1)
 fd  4 testdir flags = 0x1 (0x1)
 fd  3's flags = 0x1 (0x1)
 fd  4's flags = 0x1 (0x1)
 cloexec: fcntl 5: Bad file descriptor
 cloexec: fcntl 6: Bad file descriptor
 cloexec: fcntl 7: Bad file descriptor
 cloexec: fcntl 8: Bad file descriptor
 cloexec: fcntl 9: Bad file descriptor
 cloexec: fcntl 10: Bad file descriptor
 cloexec: fcntl 11: Bad file descriptor
 cloexec: fcntl 12: Bad file descriptor
 cloexec: fcntl 13: Bad file descriptor
 cloexec: fcntl 14: Bad file descriptor
 cloexec: fcntl 15: Bad file descriptor
 
 Which looks fine, on netbsd6.1.4 and 7-pre, both on amd64.
 
 What NetBSD version are you testing on?
 
 So for both of you, things look correct!
 
 This is on Sunday's NetBSD 7.99.1 amd64, but this is an old problem for
 me...

What does fstat show for your shell or add a pause to the program and fstat it?

fstat -p $$ 
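
For reference, the check from the quoted question can be sketched in C as below. fcntl(fd, F_GETFD) failing with EBADF is the standard way to detect a closed descriptor; descriptors that report as open without the program having opened them were most likely inherited from the parent, which is what the fstat output would show. The helper names fd_is_open and count_open_fds are made up for this sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Return 1 if fd refers to an open descriptor, 0 if it does not.
 * F_GETFD only reads the descriptor flags, so the probe has no side effects.
 */
int
fd_is_open(int fd)
{
	return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Count the open descriptors in the inclusive range [lo, hi]. */
int
count_open_fds(int lo, int hi)
{
	int n = 0;

	for (int fd = lo; fd <= hi; fd++)
		n += fd_is_open(fd);
	return n;
}
```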




Re: Which should kcpuset use cpuid or cpu index?

2014-09-07 Thread Matt Thomas

On Sep 7, 2014, at 7:28 PM, Kengo NAKAHARA k-nakah...@iij.ad.jp wrote:

 I have a question about kcpuset. Referring man 9 kcpuset, the APIs
 need cpuid (the type is cpuid_t) to identify CPU. However, the callers
 such as idle_loop() sys/kern/kern_idle.c use cpu index (the type is u_int).
 
 cpu index means return value of cpu_index(), and cpu index is sequential
 number. On the other hand, cpuid means struct cpu_info.ci_cpuid and cpuid
 is non-sequential number at least x86 and sparc architectures. So, cpu index
 is different from cpuid. Which should kcpuset use cpuid or cpu index?

ci_cpuid is MD and has no standard semantics.

Use cpu_index(ci) (curcpu()->ci_index)
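
A small mock of why the dense index is the safer key for a CPU set. Everything here (the sk_ names, the four-CPU table, the sparse ID values) is invented for illustration and is not the kcpuset(9) implementation; the only point is that a set sized by CPU count can be indexed by ci_index but not by a sparse, machine-dependent ci_cpuid.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NCPU 4

/* Mock of the two identifiers discussed: sparse MD id, dense MI index. */
struct sk_cpu_info {
	unsigned long ci_cpuid;	/* e.g. an APIC id; may have gaps */
	unsigned ci_index;	/* 0 .. SK_NCPU-1, no gaps */
};

struct sk_cpu_info sk_cpus[SK_NCPU] = {
	{ 0, 0 }, { 2, 1 }, { 4, 2 }, { 6, 3 },
};

unsigned
sk_cpu_index(const struct sk_cpu_info *ci)
{
	return ci->ci_index;
}

/* Add a CPU to a set that only has room for SK_NCPU bits. */
void
sk_set_add(uint32_t *set, const struct sk_cpu_info *ci)
{
	unsigned i = sk_cpu_index(ci);

	assert(i < SK_NCPU);	/* would fail for ci_cpuid 4 or 6 */
	*set |= UINT32_C(1) << i;
}
```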

Re: Which should kcpuset use cpuid or cpu index?

2014-09-07 Thread Matt Thomas

On Sep 7, 2014, at 8:51 PM, Kengo NAKAHARA k-nakah...@iij.ad.jp wrote:

 Hi,
 
 Thank you for answer.
 
 (2014/09/08 12:34), Matt Thomas wrote:
 
 On Sep 7, 2014, at 7:28 PM, Kengo NAKAHARA k-nakah...@iij.ad.jp wrote:
 
 I have a question about kcpuset. Referring man 9 kcpuset, the APIs
 need cpuid (the type is cpuid_t) to identify CPU. However, the callers
 such as idle_loop() sys/kern/kern_idle.c use cpu index (the type is u_int).
 
 cpu index means return value of cpu_index(), and cpu index is sequential
 number. On the other hand, cpuid means struct cpu_info.ci_cpuid and cpuid
 is non-sequential number at least x86 and sparc architectures. So, cpu index
 is different from cpuid. Which should kcpuset use cpuid or cpu index?
 
 ci_cpuid is MD and has no standard semantics.
 
 Use cpu_index(ci) (curcpu()->ci_index)
 
 I see. So, should kcpuset APIs use u_int (not cpuid_t)? As cpu_index()
 which is MI function in sys/sys/cpu.h returns u_int type.

I don't think it matters since it's a u_long.

Re: RFC: IRQ affinity (aka interrupt routing)

2014-08-27 Thread Matt Thomas

On Aug 26, 2014, at 11:16 PM, Kengo NAKAHARA k-nakah...@iij.ad.jp wrote:

 It seems good, except return value. IRQ affinity may fail (e.g. when
 all cpus are set nointr flag), so return value should not be void.

then we should have a kcpuset_interruptable which is kcpuset_running
minus those cpus which have nointr.

we also need a callback to the interrupt subsystem when intr changes
on a cpu.
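
The set arithmetic behind that suggestion, sketched with plain bitmasks rather than the real kcpuset(9) API. The sk_ names and the EINVAL return are assumptions made for the example; the thread only establishes that the interruptible set is "running minus nointr" and that distribution can fail when that set is empty.

```c
#include <errno.h>
#include <stdint.h>

typedef uint64_t sk_cpuset_t;	/* one bit per CPU index; mock type */

/* CPUs that may take interrupts: running ones without the nointr flag. */
sk_cpuset_t
sk_cpuset_interruptible(sk_cpuset_t running, sk_cpuset_t nointr)
{
	return running & ~nointr;
}

/*
 * Restrict the requested affinity to interruptible CPUs.  Returns 0 and
 * stores the usable set, or EINVAL if no requested CPU can take interrupts.
 */
int
sk_intr_distribute(sk_cpuset_t wanted, sk_cpuset_t running,
    sk_cpuset_t nointr, sk_cpuset_t *result)
{
	sk_cpuset_t ok = wanted & sk_cpuset_interruptible(running, nointr);

	if (ok == 0)
		return EINVAL;
	*result = ok;
	return 0;
}
```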


Re: Marvell 88SE9230 AHCI?

2014-07-28 Thread Matt Thomas

On Jul 27, 2014, at 7:09 AM, Thor Lancelot Simon t...@panix.com wrote:
 
 Same system I have; same symptom I have.  Though I don't have any drive
 that works.  We do correctly probe that a target is present and we get
 the link speed right at 6.0 Gbit/sec (though we seem to also probe another
 spurious target with speed of 1.5Gbit/sec) but IDENTIFY fails.

When I got the ASMedia ASM1061, I had a similar problem. 

ahci_cmd_complete channel 1 CMD 0x4c017 CI 0x0
ahci_cmd_done channel 1
wd1: IDENTIFY failed
wd1: unable to open device, error = 19
ahci_exec_command port 1 CI 0x0
ahci_cmd_start CI 0x0
ahcisata0 port 1 tbl 0xc5ecb000
ahcisata0 port 1 header 0xc5ec4400
ahci_cmd_complete channel 1 CMD 0x4c017 CI 0x0
ahci_cmd_done channel 1

This was solved by adding AHCI_P_IX_PSS to list of enabled interrupts.

I would suggest looking at what the status is when it times out.


Re: icache sync private rump component

2014-07-20 Thread Matt Thomas

On Jul 19, 2014, at 2:02 AM, Alexander Nasonov al...@yandex.ru wrote:

 To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1
 to Makefile.rump for mips platforms. This is the only change outside
 of sljit scope.

the cache instructions are privileged.  There's a sysarch interface
that you can use to clean the cache.


Re: wm_intr may lead wm_start unexpectedly

2014-06-24 Thread Matt Thomas

On Jun 24, 2014, at 10:43 PM, Ryota Ozaki ozak...@netbsd.org wrote:

 Hi,
 
 I found a strange behavior of if_wm that
 its interrupt handler may call its if_start
 (xmit function) eventually. I don't think
 it's sane. It makes difficult to use mutex for MP.
 
 Here is a call trace:
 wm_intr => wm_linkintr => wm_linkintr_gmii =>
 mii_pollstat => makphy_service => mii_phy_update =>
 if_link_state_change => in6_if_link_up =>
 nd6_dad_start => nd6_dad_ns_output => ... =>
 => ether_output => ... => wm_start
 
 The interrupt handler calls mii, mii notifies
 a link state change to inet6, and inet6 tries DAD.
 This IPv6 DAD code (nd6_dad_start and
 nd6_dad_ns_output) is the main issue of my claim.
 nd6_dad_start normally sets up a callout for
 nd6_dad_ns_output, however, it may call
 nd6_dad_ns_output directly at random.

Simple, set IFF_OACTIVE before calling mii
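
A mock of that idea, assuming the start routine bails out early while the "output active" flag is set, as many drivers do. The sk_ structures and flag values are stand-ins invented for the sketch, not the real struct ifnet or if_wm code, and a real driver would still have to restart transmission once the flag is cleared.

```c
/* Mock flag bits; the real IFF_* values live in <net/if.h>. */
#define SK_IFF_RUNNING	0x1u
#define SK_IFF_OACTIVE	0x2u

struct sk_ifnet {
	unsigned if_flags;
	int tx_started;		/* how often the start routine did work */
};

/* Transmit entry point: does nothing unless running and not "active". */
void
sk_start(struct sk_ifnet *ifp)
{
	if ((ifp->if_flags & (SK_IFF_RUNNING | SK_IFF_OACTIVE)) !=
	    SK_IFF_RUNNING)
		return;
	ifp->tx_started++;
}

/* Link interrupt path: mark output active around the mii call chain. */
void
sk_linkintr(struct sk_ifnet *ifp)
{
	unsigned saved = ifp->if_flags & SK_IFF_OACTIVE;

	ifp->if_flags |= SK_IFF_OACTIVE;
	sk_start(ifp);	/* stands in for mii => ... => if_start */
	ifp->if_flags = (ifp->if_flags & ~SK_IFF_OACTIVE) | saved;
}
```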


Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)

2014-06-22 Thread Matt Thomas

On Jun 21, 2014, at 9:48 PM, Darren Reed darr...@netbsd.org wrote:

 On 22/06/2014 8:13 AM, Matt Thomas wrote:
 On Jun 21, 2014, at 4:56 AM, Darren Reed darr...@netbsd.org wrote:
 On 21/06/2014 11:00 AM, Matt Thomas wrote:
 On Jun 20, 2014, at 5:57 AM, Ryota Ozaki ozak...@iij.ad.jp wrote:
 
 Hi,
 
 I've prepared a trial patch of MPSAFE networking.
 
 http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff
 
 The kmutex_t in ifqueue, etc. should be pointers and not in the structure 
 themselves.
 That can simplify the macros to test for a NULL pointer for the locks in the 
 non-WM case.
 
 Consider using mutex_obj_alloc to get mutexes instead of the embedding 
 them in the
 structures.
 
 This coding pattern goes against what is done almost everywhere else.
 
 What's the rationale behind it?
 
 That's not true.  uvm_object's, sockets, etc. all use external kmutex's.
 In this instance it allows for the ifqueue to use the device kmutex if
 there is one. Also, mutexes should be cacheline aligned and that is
 difficult to do in a structure where mutex_obj_alloc does that
 automatically.
 
 Ah, I didn't know that this type of approach was used elsewhere in
 NetBSD. Are there any compiler hints that can be embedded in structs
 that will result in padding and correct alignment? My solution has
 been to always put the lock(s) at the front of a structure.
 Can __attribute__((aligned(X))) be used appropriately here or is it
 likely to be ignored in the middle of a structure?

The problem is that the memory allocators won't honor it since they
will return a maximum alignment of ALIGNBYTES+1, where COHERENCY_UNIT
can be much higher than that.
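
To make the allocator point concrete, here is a userland sketch. The aligned attribute fixes the type's size and its alignment requirement, but a general-purpose allocator is free to return memory aligned to less than that, so the caller has to over-allocate and round up (or use an allocator that aligns for you, which is what the thread says mutex_obj_alloc does). The 64-byte figure and all sk_ names are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_COHERENCY_UNIT 64	/* assumed cache line size */

/* The attribute pads sizeof() to a full line; it cannot steer malloc(). */
struct sk_padded_lock {
	int lock_word;
} __attribute__((aligned(SK_COHERENCY_UNIT)));

/*
 * Allocate one cache-line-aligned lock from an allocator that does not
 * promise that alignment.  *rawp receives the pointer to pass to free().
 */
struct sk_padded_lock *
sk_padded_lock_alloc(void **rawp)
{
	size_t sz = sizeof(struct sk_padded_lock) + SK_COHERENCY_UNIT - 1;
	void *raw = calloc(1, sz);
	uintptr_t p;

	if (raw == NULL)
		return NULL;
	p = ((uintptr_t)raw + SK_COHERENCY_UNIT - 1) &
	    ~(uintptr_t)(SK_COHERENCY_UNIT - 1);
	*rawp = raw;
	return (struct sk_padded_lock *)p;
}
```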

Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)

2014-06-21 Thread Matt Thomas

On Jun 21, 2014, at 4:56 AM, Darren Reed darr...@netbsd.org wrote:

 On 21/06/2014 11:00 AM, Matt Thomas wrote:
 On Jun 20, 2014, at 5:57 AM, Ryota Ozaki ozak...@iij.ad.jp wrote:
 
 Hi,
 
 I've prepared a trial patch of MPSAFE networking.
 
 http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff
 
 The kmutex_t in ifqueue, etc. should be pointers and not in the structure 
 themselves.
 That can simplify the macros to test for a NULL pointer for the locks in the 
 non-WM case.
 
 Consider using mutex_obj_alloc to get mutexes instead of the embedding them 
 in the
 structures.
 
 This coding pattern goes against what is done almost everywhere else.
 
 What's the rationale behind it?

That's not true.  uvm_object's, sockets, etc. all use external kmutex's.  In 
this instance it allows for the ifqueue to use the device kmutex if there is one.  
Also, mutexes should be cacheline aligned and that is difficult to do in a 
structure where mutex_obj_alloc does that automatically.




Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)

2014-06-21 Thread Matt Thomas

On Jun 21, 2014, at 5:56 AM, Ryota Ozaki ozak...@iij.ad.jp wrote:

 On Sat, Jun 21, 2014 at 10:00 AM, Matt Thomas m...@3am-software.com wrote:
 
 
 On Jun 20, 2014, at 5:57 AM, Ryota Ozaki ozak...@iij.ad.jp wrote:
 
 Hi,
 
 I've prepared a trial patch of MPSAFE networking.
 
 http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff
 
 
 The kmutex_t in ifqueue, etc. should be pointers and not in the structure 
 themselves.
 That can simplify the macros to test for a NULL pointer for the locks in the 
 non-WM case.
 
 Consider using mutex_obj_alloc to get mutexes instead of the embedding them 
 in the
 structures.
 
 Well...do you mean that the macros should be like these?
 
 #define WM_LOCK(_sc)   if ((_sc)->sc_txrx_lock) 
 mutex_enter((_sc)->sc_txrx_lock)
 #define WM_UNLOCK(_sc) if ((_sc)->sc_txrx_lock) 
 mutex_exit((_sc)->sc_txrx_lock)

I was thinking more of the ifq macros.



Re: RFC: mpsafe bridge and NIC drivers (vioif and wm)

2014-06-20 Thread Matt Thomas

On Jun 20, 2014, at 5:57 AM, Ryota Ozaki ozak...@iij.ad.jp wrote:

 Hi,
 
 I've prepared a trial patch of MPSAFE networking.
 
  http://www.netbsd.org/~ozaki-r/mpsafe-wm.diff
 

The kmutex_t in ifqueue, etc. should be pointers and not in the structure 
themselves.
That can simplify the macros to test for a NULL pointer for the locks in the 
non-WM case.

Consider using mutex_obj_alloc to get mutexes instead of the embedding them in 
the
structures.

Some of your macros are missing 'do's :)
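
A self-contained sketch of both points: a lock macro that tolerates a NULL lock pointer, wrapped in do { } while (0) so it behaves as a single statement. The sk_ types are mocks standing in for kmutex_t and struct ifqueue, and the macro names are invented here. Without the wrapper, the else in sk_ifq_lock_if() would silently attach to the if hidden inside the macro.

```c
#include <stddef.h>

/* Mock mutex: only tracks nesting depth so the example can be checked. */
struct sk_mutex { int depth; };

void sk_mutex_enter(struct sk_mutex *m) { m->depth++; }
void sk_mutex_exit(struct sk_mutex *m) { m->depth--; }

/* Queue whose lock is a pointer; NULL means "no locking needed". */
struct sk_ifqueue {
	struct sk_mutex *ifq_lock;
	int ifq_len;
};

#define SK_IFQ_LOCK(ifq)						\
	do {								\
		if ((ifq)->ifq_lock != NULL)				\
			sk_mutex_enter((ifq)->ifq_lock);		\
	} while (/*CONSTCOND*/0)

#define SK_IFQ_UNLOCK(ifq)						\
	do {								\
		if ((ifq)->ifq_lock != NULL)				\
			sk_mutex_exit((ifq)->ifq_lock);			\
	} while (/*CONSTCOND*/0)

/* Enqueue under the (optional) lock; returns -1 when the queue is full. */
int
sk_ifq_enqueue(struct sk_ifqueue *ifq, int maxlen)
{
	int error = 0;

	SK_IFQ_LOCK(ifq);
	if (ifq->ifq_len >= maxlen)
		error = -1;
	else
		ifq->ifq_len++;
	SK_IFQ_UNLOCK(ifq);
	return error;
}

/* Unbraced if/else: only correct because the macro is one statement. */
void
sk_ifq_lock_if(struct sk_ifqueue *ifq, int want, int *skipped)
{
	if (want)
		SK_IFQ_LOCK(ifq);
	else
		(*skipped)++;
}
```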

 It enables the interrupt handler of if_wm to run without
 KERNEL_LOCK; an interrupt context and a LWP context (e.g.,
 wm_start) run in parallel safely.
 
 You can try it by applying the patch to -current
 and commenting in NET_MPSAFE in sys/net/if.h.
 
 A complete patch of my work can be found at usual places:
 - http://www.netbsd.org/~ozaki-r/mpsafe-bridge-wm-vioif.diff
 - 
 https://github.com/ozaki-r/netbsd-src/tree/experimental/mpsafe-bridge-wm-vioif
 
  ozaki-r



Re: RFC: add MSI/MSI-X support to NetBSD

2014-06-06 Thread Matt Thomas

On Jun 6, 2014, at 10:40 AM, David Young dyo...@pobox.com wrote:

 1 An MI API for establishing mailboxes (or doorbells or whatever
  we may call them).  A mailbox is a special physical address (PA) or
  PA/data-pair in correspondence with a callback (function, argument).
 
  An MI API for mapping the mailbox into various address spaces,
  but especially the message-signalling devices.  In this way, the
  mailbox API is a use or an extension of bus_dma(9).
 
  Somewhere I have a draft proposal for this MI API, I will try to
  dig it up.

Note that some systems need to point the MSI at a specific address
(PPC PQIII for instance).


Re: RFC: add MSI/MSI-X support to NetBSD

2014-06-06 Thread Matt Thomas

On Jun 6, 2014, at 10:40 AM, David Young dyo...@pobox.com wrote:

 On Fri, May 30, 2014 at 05:55:25PM +0900, Kengo NAKAHARA wrote:
 Hello,
 
 I'm going to add MSI/MSI-X support to NetBSD. I list tasks about this.
 Would you comment following task list?
 
 I think that MSI/MSI-X logically separates into a few pieces, what do
 you think about these pieces?

Don't forget being able to supply a kcpuset_t *

In fact, now that we have SMP we need to rethink the intr_establish stuff
to better deal with it instead of hacking around it.


Re: RFC: add MSI/MSI-X support to NetBSD

2014-06-06 Thread Matt Thomas

On Jun 6, 2014, at 12:06 PM, Taylor R Campbell 
campbell+netbsd-tech-k...@mumble.net wrote:

   Date: Fri, 6 Jun 2014 12:56:53 -0500
   From: David Young dyo...@pobox.com
 
   Here is the proposal that I came up with many months (a few years?) ago
   with input from Matt Thomas.  I have tried to account for Matt's
   requirements, but I'm not sure that I have done so.
 
 For those ignoramuses among us who remain perplexed by the apparent
 difficulty of using a new interrupt delivery mechanism, could you add
 some notes to your proposal about what driver authors would need to
 know about it and when & how one would use it in a driver?
 
 Would all architectures with PCI support bus_msi(9), or would PCI
 device drivers need to conditionally use it?  Why isn't it just a
 matter of modifying pci_intr_map, or calling pci_intr_map_msi like in
 OpenBSD?  Would there be other non-PCI buses with message-signalled
 interrupts too?

Those that support PCIe or PCIX and support MSIs should change.  But
we can continue the legacy INT[A-D].  For performance, MSIs should
be more efficient.

 (Still not having done my homework to study what this MSI business is
 all about, I'll note parenthetically that it seems FreeBSD and OpenBSD
 have supported MSI for a while, and I understand neither why it was so
 easy for them nor what advantage they lack by not having bus_msi(9).)



Re: Making tmpfs reserved memory configurable

2014-06-05 Thread Matt Thomas

On Jun 5, 2014, at 8:47 AM, Martin Husemann mar...@duskware.de wrote:

 On Thu, Jun 05, 2014 at 03:36:37PM +, Eduardo Horvath wrote:
 Have you tested this?
 
 I ran an install on a 8 MB simh VAX, and it worked good enough for that.
 No, I wouldn't call that serious testing.

can you try using freetarg?


Re: uvm objects with physical address constraints

2014-05-20 Thread Matt Thomas

On May 20, 2014, at 1:40 PM, Taylor R Campbell riastr...@netbsd.org wrote:

 DRM/GEM uses uvm_aobj for long-term pageable graphics buffers, but
 when these buffers are assigned physical pages whose addresses can be
 programmed into the GPU's page tables, only certain physical pages are
 allowed -- specifically, Intel GPUs can handle only 32-bit, 36-bit, or
 40-bit physical addresses, depending on the model.  Normally we use
 bus_dmamem_alloc and bus_dmatag_subregion to impose these constraints,
 but bus_dmamem memory is not pageable.
 
 When I wrote the code to hook GEM objects up to uvm_aobj last summer I
 kinda quietly hoped this wouldn't be a problem, but it turns out this
 is a problem in practice.
 
 The attached patch
 
 (a) implements a uvm page allocation strategy UVM_PGA_STRAT_LIMITED
 which lets the caller specify low and high addresses, for which
 uvm_pagealloc defers to uvm_pglistalloc;

Wrong approach.  These should be on dedicated vm freelists instead.
Look at how mips64 has first512m, first4g, etc.  You could have 
first4g, first64g, first1t.  Then you can use UVM_PGA_STRAT_ONLY.
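
A toy version of "choose a freelist", assuming three lists bounded at 4 GB, 64 GB and 1 TB as named above. The enum, the table and sk_freelist_for_device() are invented for illustration; the real selection would be machine-dependent, and the chosen list would then be handed to the page allocator together with UVM_PGA_STRAT_ONLY.

```c
#include <stddef.h>

/* Mock freelist identifiers, named after the lists suggested above. */
enum sk_freelist {
	SK_FL_NONE = -1,
	SK_FL_FIRST4G,
	SK_FL_FIRST64G,
	SK_FL_FIRST1T
};

/* Upper bound of each list, as a number of physical address bits. */
static const struct {
	enum sk_freelist fl;
	unsigned limit_bits;
} sk_freelists[] = {
	{ SK_FL_FIRST4G, 32 },
	{ SK_FL_FIRST64G, 36 },
	{ SK_FL_FIRST1T, 40 },
};

/*
 * Pick the largest list that lies entirely below what the device can
 * address; SK_FL_NONE if even the smallest list reaches too high.
 */
enum sk_freelist
sk_freelist_for_device(unsigned dev_addr_bits)
{
	enum sk_freelist best = SK_FL_NONE;

	for (size_t i = 0; i < sizeof(sk_freelists) / sizeof(sk_freelists[0]);
	    i++) {
		if (sk_freelists[i].limit_bits <= dev_addr_bits)
			best = sk_freelists[i].fl;
	}
	return best;
}
```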

 (b) rearranges locking in uvm_pglistalloc a little so this works;

Don't need this.

 (c) adds a uao_limit_paddr(uao, low, high) to let a uao client specify
 bounds on the allowed physical addresses; and

Choose a freelist.

 (d) uses uao_limit_paddr in i915drmkms.
 
 It doesn't change page allocation in any other case: uao still uses
 the normal page allocation strategy if you don't call uao_limit_paddr,
 and other calls to uvm_pagealloc are not affected.
 
 Comments?  Objections?  Lewd Spenserian sonnets?

see above.


Re: uvm objects with physical address constraints

2014-05-20 Thread Matt Thomas

On May 20, 2014, at 4:20 PM, Taylor R Campbell riastr...@netbsd.org wrote:

   Date: Tue, 20 May 2014 13:54:44 -0700
   From: Matt Thomas m...@3am-software.com
 
   Wrong approach.  These should be on dedicated vm freelists instead.
   Look at how mips64 has first512m, first4g, etc.  You could have 
   first4g, first64g, first1t.  Then you can use UVM_PGA_STRAT_ONLY.
 
 How about the attached patch to add uao_set_pgfl(uao, freelist)
 instead?
 uao_freelist.patch

Looks reasonable.


Re: Patch: cprng_fast performance - please review.

2014-04-18 Thread Matt Thomas

On Apr 18, 2014, at 11:23 AM, Markku-Juhani Olavi Saarinen m...@iki.fi wrote:

 It has been there on all new systems purchased in some last 3 years,
 so I would *guess* that it would be > 50% of systems fielded out
 there.

Not everything is x86 based.

Re: nanosleep accuracy

2014-03-26 Thread Matt Thomas

On Mar 26, 2014, at 6:03 PM, David Holland dholland-t...@netbsd.org wrote:

 http://www.dragonflybsd.org/presentations/nanosleep/
 
 Can someone who's familiar with the timecounter code (that is, not me)
 look at this and see if we can steal their fixes?

The problem isn't timecounter, it's hardclock and the callout interface.
Ticks must die :)

callout_nschedule(callout *, u_long secs, u_long nsecs)

or just pass a uint64_t nsecs to callout_schedule.

Then the callout code can call a md routine with the amount of time
until the next callout is scheduled.  

of course hardclock would need to be invoked with a second argument to
indicate how much time passed.  This would affect profiling accuracy
but increase battery life.
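
Some back-of-the-envelope arithmetic behind "ticks must die", as a sketch. With tick-based callouts a request is rounded up to whole ticks, so a 1 ms sleep at HZ=100 cannot complete in less than 10 ms; with nanosecond deadlines the timer only needs to be programmed for the earliest pending one. The function names are invented and nothing here is the actual callout(9) code.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_NSEC_PER_SEC 1000000000ULL

/* Ticks needed to cover ns nanoseconds at hz ticks/second, rounded up. */
uint64_t
sk_ns_to_ticks(uint64_t ns, unsigned hz)
{
	uint64_t ns_per_tick = SK_NSEC_PER_SEC / hz;

	return (ns + ns_per_tick - 1) / ns_per_tick;
}

/* The delay a tick-based callout actually schedules for a request. */
uint64_t
sk_tick_delay_ns(uint64_t ns, unsigned hz)
{
	return sk_ns_to_ticks(ns, hz) * (SK_NSEC_PER_SEC / hz);
}

/*
 * Tickless view: nanoseconds until the earliest deadline, 0 if one has
 * already expired, UINT64_MAX if nothing is pending.
 */
uint64_t
sk_next_timeout_ns(const uint64_t *deadlines, size_t n, uint64_t now)
{
	uint64_t best = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		uint64_t left = deadlines[i] > now ? deadlines[i] - now : 0;

		if (left < best)
			best = left;
	}
	return best;
}
```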


asymmetric smp

2014-03-26 Thread Matt Thomas

I recently ordered an ODROID-XU Lite to help beat on my ARM MP code.

However, it has a quirk that I don't think our scheduler will deal with.

It has 4 Cortex-A15 cores @ 1.4GHz and 4 Cortex-A7 cores @ 1.2GHz.  Even if the 
frequencies weren't different, the A15 cores are at least twice as fast per cycle 
as the A7.  That asymmetry is going to cause havoc with the scheduler.  In 
terms of power, the A7s use a lot less than the A15s so if you can keep the 
work on the A7s and leave the A15s sleeping, you'll extend your battery life a 
lot.

It should also be noted the A15s have a different cache structure than the A7s as 
well.

There is no hyperthreading, so at least there isn't that headache.

Imagine doing a build, you might want to keep the shells and the like on the A7 
while letting A15s get the compiler/assembler/loader (more compute intensive).




Re: recent sysctl changes

2014-03-07 Thread Matt Thomas

On Mar 7, 2014, at 8:32 AM, Andreas Gustafsson g...@netbsd.org wrote:

 Thor Lancelot Simon wrote:
 An application could, for example, maintain a single, shared,
 malloc'ed buffer that is reused for multiple sysctl() calls and only
 resized on ENOMEM returns.  IMO, this is allowed by the API, but with
 your change, a read of a CTLTYPE_QUAD variable will return the wrong
 result if the buffer happened to be left with a size of 4 by a
 previous read of a CTLTYPE_INT variable.
 
 But such a program would already have been buggy -- you can't read 
 CTLTYPE_QUAD
 into a buffer of size 4.
 
 You can *attempt* to read CTLTYPE_QUAD into a buffer of size 4, 
 get ENOMEM, increase the buffer size, and retry.

That's a specious argument.  If you know the type is QUAD, and you
use less than a buffer size of 8, it's a bug (and you're an idiot).

That's like saying I can supply a bogus address, get EFAULT, and try
with a good address.

pmap_kenter_pa pmap_kremove

2014-02-22 Thread Matt Thomas

I've been wondering...

Should pmap_kenter_pa overwrite an existing entry should it be operating
on an unmapped VA.  I think that if you want to change a mapping, you
should do a pmap_kremove first.




Re: pmap_kenter_pa pmap_kremove

2014-02-22 Thread Matt Thomas

On Feb 22, 2014, at 2:04 PM, Mindaugas Rasiukevicius rm...@netbsd.org wrote:

 Matt Thomas m...@3am-software.com wrote:
 
 I've been wondering...
 
 Should pmap_kenter_pa overwrite an existing entry should it be operating
 on an unmapped VA.
 
 You mean already mapped VA?

I do.

 I think that if you want to change a mapping, you
 should do a pmap_kremove first.
 
 I tend to agree.  I have not seen a need for such re-mapping (overwriting),
 but even if there is, it can be done efficiently by removing, entering and
 then calling pmap_update().  With the deferred update, that would result in
 a single TLB flush/invalidation.
 
 In x86 pmap, there is a printf() for overwriting case:
 
 http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c?r=1.181#1005
 
 Having this converted to an assert might catch something interesting.

My common page code has:

KASSERT(!pte_valid_p(*ptep));

in this instance.
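
The same rule as a tiny userland mock, with assert() standing in for KASSERT(). The page-table array, the bit layout and the sk_ names are all invented for the example; the only thing carried over from the thread is that entering a mapping over a valid PTE is treated as a bug, so a remap has to go through remove first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_NPTE		8
#define SK_PTE_VALID	0x1u
#define SK_PAGE_SHIFT	12

typedef uint32_t sk_pte_t;

sk_pte_t sk_ptes[SK_NPTE];	/* mock kernel page table, all invalid */

bool
sk_pte_valid_p(sk_pte_t pte)
{
	return (pte & SK_PTE_VALID) != 0;
}

/* Enter a mapping; overwriting a live entry trips the assertion. */
void
sk_kenter(unsigned slot, uint32_t pfn)
{
	assert(slot < SK_NPTE);
	assert(!sk_pte_valid_p(sk_ptes[slot]));
	sk_ptes[slot] = (pfn << SK_PAGE_SHIFT) | SK_PTE_VALID;
}

/* Remove a mapping so the slot may be entered again. */
void
sk_kremove(unsigned slot)
{
	assert(slot < SK_NPTE);
	sk_ptes[slot] = 0;
}
```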

Re: pcb offset into uarea

2014-02-19 Thread Matt Thomas

On Feb 19, 2014, at 8:34 AM, David Holland dholland-t...@netbsd.org wrote:

 On Mon, Feb 17, 2014 at 09:25:49PM +, David Laight wrote:
 I'm adding code to i386 and amd64 to save the ymm registers on process
 switch - allowing userspace to use the AVX instructions.
 [ensuing crap about the u area]
 
 Why put it in the u area at all? It's a legacy concept of little
 continuing value.
 
 Certainly most of the stuff that is in the pcb could be put into the lwp
 structure. Apart from the fp save area it isn't even very big.
 
 Putting the FP save area at the low address of the kernel stack pages
 saves you having to worry about how big it is.
 (for 'stack grows down' systems).
 
 On the other hand, varying the size of the kernel stack unpredictably
 doesn't sound like such a great plan. I suppose it's not by that much
 though.
 
 I dunno, I just tend to think the u area is an anachronism and we'd be
 better off abolishing it.

For the aarch64 port, the only thing in the PCB is the fpu register set.
Everything else is in mdlwp.  Now the context switch code can ignore
the PCB entirely.  I've been thinking of doing something similar for
other ports I maintain.


Re: [Milkymist port] virtual memory management

2014-02-18 Thread Matt Thomas

On Feb 18, 2014, at 3:54 AM, Yann Sionneau yann.sionn...@gmail.com wrote:

 Le 10/02/14 23:00, Yann Sionneau a écrit :
 Thanks for all your explanations, if everything I said here is correct 
 (which would mean I understood correctly your answer) then I think I'm ready 
 to implement all this :)
 
 Hi,
 
 I have made good progress on the NetBSD port, it is now booting up to 
 enabling interrupts and cpu_initclocks() call, see the boot log [0].
 
 But then I am wondering how I can map the memory mapped registers of the 
 timer0 of Milkymist SoC in order to use it as the main ticking clock.
 
 Basically, I need to map physical address 0xe000.1000 somewhere in kernel 
 virtual memory.
 
 Is there somewhere a function like vaddr_t map_paddr(paddr_t, prot)?
 
 I could indeed walk the kernel page table and insert somewhere in a free PTE 
 (a NULL one) a reference to the 0xe000.1000 physical address, but then how to 
 be sure that the vm subsystem will not allocate this virtual address twice?
 
 Is there an iomapping mechanism?
 
 Thank you for your help :)
 
 [0] -- http://pastebin.com/MYitt9L4

see bus_space(9), specifically bus_space_map.

internally, it allocates some KVA via uvm and uses pmap_kenter_pa to 
map the I/O address via the allocated KVA.



Re: pcb offset into uarea

2014-02-16 Thread Matt Thomas

On Feb 16, 2014, at 1:41 PM, David Laight da...@l8s.co.uk wrote:

 I'm adding code to i386 and amd64 to save the ymm registers on process
 switch - allowing userspace to use the AVX instructions.
 
 I also don't want to have to do it all again when the next set of
 extensions appear.
 This means that the size of the FPU save area (currently embedded in
 the pcb) can't be determined until runtime.
 
 Plan A is to move the FPU save area to the end of the pcb, and then
 locate the pcb at the correct offset in the uarea so that the written
 region ends at the end of the page.
 The problem with this is that the offset of the pcb in the uarea
 is set by MI code based on some #defines - and there seem to be
 several related values.
 
 Now on x86 (like most systems) the cpu stack advances into low memory.
 The pcb is placed at the end of the uarea with the initial stack pointer
 just below it.
 I suspect that a long time ago (when the uarea had a fixed KVA) an
 additional memory page was placed below the uarea to give interrupts
 more stack space. I don't think this happens any more.
 
 As an aside: The uarea used to be pageable, whereas (what is now) the
 lwp structure isn't. Paging of uarea's was disabled a few years back
 - so there is no real difference between the lifetimes of an lwp and a uarea.
 (zombies probably lose the uarea before the lwp).
 
 An alternative would be to place the FP save area at the start of the uarea.
 This would mean that, on stack overflow, the FP save area would be trashed
 before some random piece of memory.
 It might even be worth putting the pcb at the start of the uarea - so that
 stack overflow crashes out the failing process, and probably earlier
 than the random corruption would.

For most ports, the pcb is at the start of the uarea.

 This gives me three options:
 A) Put the save area at the end of the pcb and dynamically adjust the pcb
   offset.
 B) Put the save area at the start of the uarea, with the pcb at a fixed
   offset at the end of the uarea.
 C) Put the save area at the end of the pcb, and put the pcb at the start
   of the uarea.
 
 Votes?
 What have I missed?

Keep a default mmx/sse save area in the pcb along with a pointer to it.
If a variant is used that needs a larger save area, dynamically allocate
it and save it in the pcb pointer.

Since it's unlikely most processes will use AVX, why waste the space?
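
The suggestion above (an embedded default save area plus a pointer that is redirected to a larger allocation on demand) could look roughly like the following userland sketch. The struct, the function names and the use of calloc() are all hypothetical stand-ins, not the real NetBSD pcb or kernel allocator; only the 512-byte size of the legacy FXSAVE area is taken as given, and the alignment requirements of real save areas are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FPU_DEFAULT_SAVE_SIZE 512	/* size of the legacy mmx/sse (FXSAVE) area */

/* Hypothetical, simplified pcb; not the real struct pcb. */
struct sketch_pcb {
	void		*pcb_fpsave;		/* save area currently in use */
	size_t		 pcb_fpsave_size;	/* size of that area in bytes */
	unsigned char	 pcb_fpdefault[FPU_DEFAULT_SAVE_SIZE]; /* embedded default */
};

/* Start out pointing at the embedded default area. */
static void
sketch_pcb_init(struct sketch_pcb *pcb)
{
	assert(pcb != NULL);
	memset(pcb, 0, sizeof(*pcb));
	pcb->pcb_fpsave = pcb->pcb_fpdefault;
	pcb->pcb_fpsave_size = sizeof(pcb->pcb_fpdefault);
}

/*
 * Switch to a larger, dynamically allocated area the first time a bigger
 * state is needed.  Returns 0 on success, -1 if the allocation fails.
 */
static int
sketch_pcb_grow_fpsave(struct sketch_pcb *pcb, size_t needed)
{
	void *area;

	if (needed <= pcb->pcb_fpsave_size)
		return 0;		/* current area is already big enough */
	area = calloc(1, needed);
	if (area == NULL)
		return -1;
	/* Preserve whatever state was saved so far. */
	memcpy(area, pcb->pcb_fpsave, pcb->pcb_fpsave_size);
	if (pcb->pcb_fpsave != pcb->pcb_fpdefault)
		free(pcb->pcb_fpsave);
	pcb->pcb_fpsave = area;
	pcb->pcb_fpsave_size = needed;
	return 0;
}

/* Release a dynamically allocated area, if any. */
static void
sketch_pcb_fini(struct sketch_pcb *pcb)
{
	if (pcb->pcb_fpsave != pcb->pcb_fpdefault)
		free(pcb->pcb_fpsave);
	pcb->pcb_fpsave = pcb->pcb_fpdefault;
	pcb->pcb_fpsave_size = sizeof(pcb->pcb_fpdefault);
}
```

With this layout a process that never touches the wider registers pays only for the embedded default area, and the pcb itself keeps a fixed size and offset.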



Re: [Milkymist port] virtual memory management

2014-02-10 Thread Matt Thomas

On Feb 10, 2014, at 9:10 AM, Eduardo Horvath e...@netbsd.org wrote:

 On Sun, 9 Feb 2014, Yann Sionneau wrote:
 
 Thank you for your answer Matt,
 
 Le 09/02/14 19:49, Matt Thomas a écrit :
 On Feb 9, 2014, at 10:07 AM, Yann Sionneau yann.sionn...@gmail.com wrote:
 
 
 Since the kernel runs with MMU on, using virtual addresses, it cannot
 dereference physical pointers then it cannot add/modify/remove PTEs,
 right?
 Wrong.  See above.
 You mean that the TLB contains entries which map a physical address to 
 itself?
 like 0xabcd. is mapped to 0xabcd.? Or you mean all RAM is always
 mapped but to the (0xa000.000+physical_pframe) kind of virtual address you
 mention later in your reply?
 
 What I did for BookE is reserve the low half of the kernel address space 
 for VA=PA mappings.  The kernel resides in the high half of the address 
 space.  I did this because the existing PPC port did strange things with 
 BAT registers to access physical memory and copyin/copyout operations and 
 I couldn't come up with a better way to do something compatible with the 
 BookE MMU.  It did limit the machine to 2GB RAM, which wasn't a problem 
 for the 405GP.

For the MPC85xx, I did something similar but I used fixed TLB1 entries
to map the physical ram 1:1 with VA=PA mappings.

 Also, the user address space is not shared with the kernel address space 
 as on most machines.  Instead, user processes get access to their own 4GB 
 address space, and the kernel has 2GB to play with when you deduct the 2GB 
 VA==PA region.  (It's the same sort of thing I did for sparc64 way back 
 when it was running 32-bit userland.  But it doesn't need VA==PA mappings 
 and can access physical and userland addresses while the kernel address 
 space is active.  Much nicer design.)

BookE could have avoided the VA==PA mappings but it simplifies a lot of things.

 When a BookE machine takes an MMU miss fault, the fault handler examines 
 the faulting address; if the high bit is zero, it synthesizes a TLB entry 
 where the physical address is the same as the virtual address.  If the 
 high bit is set, it walks the page tables to find the TLB entry.

Not true for the MPC85xx BookE.  When a trap happens, the PSL[PR] bit gets
reset along with the PSL[DS] and PSL[IS] bits.  This forces the MMU to only
match TLB entries with MAS1[TS] == 0 (i.e. kernel TLB entries).

 This did make the copyin/copyout operations a bit complicated since it 
 requires flipping the MMU between two contexts while doing the copy 
 operation.

Well, for the MPC85xx, I only set the PSL[DS] bit so I can access the
user address space, load/store stuff to/from register, and then reset it back.
It's not as fast as I'd like but it works.

 Also, is it possible to make sure that everything (in kernel space) is
 mapped so that virtual_addr = physical_addr - RAM_START_ADDR +
 virtual_offset
 In my case RAM_START_ADDR is 0x40000000 and I am trying to use
 virtual_offset of 0xc0000000 (everything in my kernel ELF binary is mapped
 at virtual address starting at 0xc0000000)
 If I can ensure that this formula is always correct I can then use a very
 simple macro to translate statically a physical address to a virtual
 address.
 Not knowing how much ram you have, I can only speak in generalities.
 I have 128 MB of RAM.
 But in general you reserve a part of the address space for direct mapped
 memory and then place the kernel above that.
 
 For instance, you might have 512MB of RAM which you map at 0xa000.0000
 and then have the kernel's mapped va space start at 0xc000.0000.
 So if I understand correctly, the first page of physical ram (0x4000.0000) is
 mapped at virtual address 0xa000.0000 *and* at 0xc000.0000 ?
 Isn't it a problem that a physical address is mapped twice in the same 
 process
 (here the kernel)?
 My caches are VIPT, couldn't it generate cache aliases issues?
 
 If the MMU is always on while the kernel is running, and covers all of the 
 KVA, then you could relocate the kernel text and data segments wherever 
 you want them to be.  If you want to put the kernel text and data segments 
 in the direct-mapped range, you can easily do that.  If you want it 
 elsewhere, that should work too.  

In fact, I'd recommend doing that, leaving the mapped KVA space for those
things that need to be dynamically mapped.

 The cache aliasing issues in VIPT caches only occur if the cache way size 
 is larger than the page size.  If you're designing your own hardware, 
 don't do that.  Otherwise, remember to only access a page through a single 
 mapping and you won't have aliasing issues.  And flush the page from the 
 cache whenever establishing a new mapping.

I agree.  You can play other games with VIPT but it becomes complex quickly.

Re: [Milkymist port] virtual memory management

2014-02-10 Thread Matt Thomas

On Feb 10, 2014, at 2:00 PM, Yann Sionneau yann.sionn...@gmail.com wrote:

 So if I understand correctly I could implement the following scheme:
 
 Let my linker put the kernel ELF virtual addresses to 0xc000.0000. Load the 
 kernel at base of RAM (0x4000.0000)
 Then reserve this memory region as a window over physical ram : 
 0xc000.0000-0xc800.0000 (ram size is 128 MB) by doing something like 
 physseg[0].avail_start = 0xc800.0000; in pmap_bootstrap()
 
 Then in my tlb miss handlers I could do:
 
 if (fault happened in kernel mode) /* we don't want user space to access all
 ram through the kernel window */
 {
     if ((miss_vaddr < 0xc8000000) && (miss_vaddr >= 0xc0000000)) /* <= this
 would be kind of like your test of the high bit of the faulty vaddr */
     {
         reload_tlb_with(atop(miss_vaddr), atop(miss_vaddr - 0xc0000000 +
 0x40000000) | some_flags); /* <= create the mapping for accessing the window
 */

atop(miss_vaddr ^ 0x80000000) :)

         return_from_exception;
     }
 } else {
     - access the page table to reload tlb
     - page table contains only physical addresses
     - but I can dereference those using the 0xc000.0000-0xc800.0000 window
 knowing that a nested tlb miss can happen
 }

I'd leave the page_table using virtual addresses but since they will all
be direct mapped, simply convert them to physical before using them.

That way, when the kernel is running with the TLB enabled (which is most
of the time), it can traverse the page tables normally.
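
The XOR shortcut suggested for the window translation can be checked with a small sketch. It assumes the layout discussed in this thread (window at virtual 0xc0000000, RAM at physical 0x40000000, 128 MB of RAM); the macro and function names are made up for illustration. Because the two bases differ only in the top address bit and the window is smaller than 1 GB, subtracting one base and adding the other gives the same result as flipping that bit.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_VA_BASE	UINT32_C(0xc0000000)	/* assumed start of the kernel window */
#define RAM_PA_BASE	UINT32_C(0x40000000)	/* assumed physical start of RAM */
#define RAM_SIZE	UINT32_C(0x08000000)	/* 128 MB */
#define WINDOW_XOR_MASK	(WINDOW_VA_BASE ^ RAM_PA_BASE)	/* 0x80000000: top bit only */

/* True if va falls inside the direct window over RAM. */
static int
va_in_window(uint32_t va)
{
	return va >= WINDOW_VA_BASE && va < WINDOW_VA_BASE + RAM_SIZE;
}

/* Translation written as subtract-and-add, as in the quoted pseudo-code. */
static uint32_t
window_va_to_pa_arith(uint32_t va)
{
	assert(va_in_window(va));
	return va - WINDOW_VA_BASE + RAM_PA_BASE;
}

/* Same translation written as a single XOR of the top bit. */
static uint32_t
window_va_to_pa_xor(uint32_t va)
{
	assert(va_in_window(va));
	return va ^ WINDOW_XOR_MASK;
}
```

The same XOR converts a physical address in RAM back to its window address, which is handy for turning direct-mapped page table pointers into physical addresses and back.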

 Does this sound reasonable ?
 
 
 The cache aliasing issues in VIPT caches only occur if the cache way size
 is larger than the page size.  If you're designing your own hardware,
 don't do that.  Otherwise, remember to only access a page through a single
 mapping and you won't have aliasing issues.  And flush the page from the
 cache whenever establishing a new mapping.
 Well, lm32 caches are configurable but for the Milkymist SoC they are not 
 configured too big, so there is no alias problem.
 In order to handle all cases of lm32 cache sizes I guess I need to add a 
 macro that the machine headers will define if there are cache alias issues 
 possible.
 But then if I am using the 0xc000.0000-0xc800.0000 window during my tlb miss 
 handler I guess I will have no choice but to invalidate the cache because any 
 fault while reading this window will then add a tlb entry to this window 
 which would possibly cause a physical page to be mapped twice and then could 
 cause alias issues (in the scenario where caches are too big).

Hopefully, if they make the caches larger they increase the number of ways.
I wouldn't add code to flush.  Just add a panic if you detect you can have
aliases and deal with it if it ever happens.
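
The condition quoted above (aliases are only possible when the cache way size exceeds the page size) is easy to test at boot. A minimal sketch, with a hypothetical function name and parameters; how the cache geometry is discovered on a given lm32 configuration is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if a VIPT cache with this geometry can alias, i.e. if the
 * size of one way (total size divided by associativity) is larger than
 * the page size, so that index bits come from the virtual page number.
 */
static bool
vipt_cache_can_alias(size_t cache_size, unsigned int ways, size_t page_size)
{
	assert(ways > 0 && page_size > 0);
	return (cache_size / ways) > page_size;
}
```

In a kernel, the machine-dependent startup code could call such a check once and panic if it returns true, rather than carrying flush code that is never exercised.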



Re: [Milkymist port] virtual memory management

2014-02-09 Thread Matt Thomas

On Feb 9, 2014, at 10:07 AM, Yann Sionneau yann.sionn...@gmail.com wrote:

 This seems like the easiest thing to do (because I won't have to think about 
 recursive faults) but then if I put physical addresses in my 1st level page 
 table, how does the kernel manage the page table entries?

BookE always has the MMU on and contains fixed TLB entries to make sure
all of physical ram is always mapped.

 Since the kernel runs with MMU on, using virtual addresses, it cannot 
 dereference physical pointers then it cannot add/modify/remove PTEs, right?

Wrong.  See above.  Note that on BookE, PTEs are purely a software 
construction and the H/W never reads them directly.

 I'm sure there is some kernel internal mechanism that I don't know about 
 which could help me getting the virtual address from the physical one, do you 
 know which mechanism it would be?

Look at __HAVE_MM_MD_DIRECT_MAPPED_PHYS and/or PMAP_{MAP,UNMAP}_POOLPAGE.


 Also, is it possible to make sure that everything (in kernel space) is mapped 
 so that virtual_addr = physical_addr - RAM_START_ADDR + virtual_offset
 In my case RAM_START_ADDR is 0x40000000 and I am trying to use virtual_offset 
 of 0xc0000000 (everything in my kernel ELF binary is mapped at virtual 
 address starting at 0xc0000000)
 If I can ensure that this formula is always correct I can then use a very 
 simple macro to translate statically a physical address to a virtual 
 address.

Not knowing how much ram you have, I can only speak in generalities. 
But in general you reserve a part of the address space for direct mapped
memory and then place the kernel above that.

For instance, you might have 512MB of RAM which you map at 0xa000.0000
and then have the kernel's mapped va space start at 0xc000.0000.

Then conversion from PA to VA is just adding a constant while getting
the PA from a direct mapped VA is just subtraction.
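
As a rough illustration of that add/subtract conversion, here is a sketch that assumes a 512 MB direct-mapped region at virtual 0xa0000000 and RAM starting at physical 0x40000000 (the RAM start comes from the question, not from the example above). All names are hypothetical; a real port would typically hide this behind its own macros.

```c
#include <assert.h>
#include <stdint.h>

#define DIRECTMAP_VA_BASE	UINT32_C(0xa0000000)	/* assumed start of direct map */
#define DIRECTMAP_PA_BASE	UINT32_C(0x40000000)	/* assumed physical start of RAM */
#define DIRECTMAP_SIZE		UINT32_C(0x20000000)	/* 512 MB */
#define DIRECTMAP_OFFSET	(DIRECTMAP_VA_BASE - DIRECTMAP_PA_BASE)

/* Physical address -> direct-mapped virtual address: add a constant. */
static uint32_t
directmap_pa_to_va(uint32_t pa)
{
	assert(pa >= DIRECTMAP_PA_BASE && pa - DIRECTMAP_PA_BASE < DIRECTMAP_SIZE);
	return pa + DIRECTMAP_OFFSET;
}

/* Direct-mapped virtual address -> physical address: subtract it again. */
static uint32_t
directmap_va_to_pa(uint32_t va)
{
	assert(va >= DIRECTMAP_VA_BASE && va - DIRECTMAP_VA_BASE < DIRECTMAP_SIZE);
	return va - DIRECTMAP_OFFSET;
}
```

With these numbers the direct map ends exactly where the kernel's mapped VA space would begin (0xa0000000 + 512 MB = 0xc0000000).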

 Then I have another question, who is supposed to build the kernel's page 
 table? pmap_bootstrap()?

Some part of MD code.  pmap_bootstrap() could be that.

 If so, then how do I allocate pages for that purpose? using 
 pmap_pte_pagealloc() and pmap_segtab_init() ?

Usually you use pmap_steal_memory to do that.
But for mpc85xx I just allocate the kernel initial segmap in the .bss.
But the page tables were allocated using uvm, which can do prebootstrap
allocations.

 
 FYI I am using those files for my pmap:
 
 uvm/pmap/pmap.c
 uvm/pmap/pmap_segtab.c
 uvm/pmap/pmap_tlb.c
 
 I am taking inspiration from the PPC Book-E (mpc85xx) code.



Re: [PATCH] netbsd32 swapctl, round 3

2014-02-01 Thread Matt Thomas

On Feb 1, 2014, at 12:41 AM, Emmanuel Dreyfus m...@netbsd.org wrote:

 + int count = SCARG(uap, misc);
 + int i, error;
 +
 + sep = kmem_alloc(sizeof(*sep) * count, KM_SLEEP);
 + sep32 = kmem_alloc(sizeof(*sep32) * count, KM_SLEEP);

Before using count, one must limit it using:

	if ((size_t)count > (size_t)uvmexp.nswapdev)
		count = uvmexp.nswapdev;

or a user could exhaust all memory by supplying bogus counts.

You only need one sep32 and then copyout each entry:

	for (i = 0, error = 0; i < count && error == 0; i++) {
		struct netbsd32_swapent sep32;
		sep32.se_dev = sep[i].se_dev;
		sep32.se_flags = sep[i].se_flags;
		sep32.se_nblks = sep[i].se_nblks;
		sep32.se_inuse = sep[i].se_inuse;
		sep32.se_priority = sep[i].se_priority;
		size_t len = strlcpy(sep32.se_path, sep[i].se_path,
		    sizeof(sep32.se_path));

		/* Copy out only the fixed fields plus the NUL-terminated path. */
		error = copyout(&sep32, SCARG(uap, arg + i),
		    offsetof(struct netbsd32_swapent, se_path) + len + 1);
	}
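
The two points above (clamp the user-supplied count, and copy out only the bytes that are actually used) can be tried in userland with a sketch like this. The struct is a hypothetical stand-in for struct netbsd32_swapent with guessed field types, and the helper names are invented; no kernel interfaces are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PATH_MAX 1024	/* stand-in for PATH_MAX */

/* Hypothetical stand-in for struct netbsd32_swapent; field types are guesses. */
struct sketch_swapent32 {
	uint32_t se_dev;
	int32_t	 se_flags;
	int32_t	 se_nblks;
	int32_t	 se_inuse;
	int32_t	 se_priority;
	char	 se_path[SKETCH_PATH_MAX + 1];
};

/*
 * Clamp a user-supplied count to the number of configured swap devices.
 * Comparing as size_t also catches negative counts, which become huge.
 */
static int
clamp_swap_count(int count, int nswapdev)
{
	assert(nswapdev >= 0);
	if ((size_t)count > (size_t)nswapdev)
		count = nswapdev;
	return count;
}

/* Bytes worth copying out for one entry: fixed fields, path, and its NUL. */
static size_t
swapent32_copy_len(const struct sketch_swapent32 *e)
{
	assert(e != NULL);
	return offsetof(struct sketch_swapent32, se_path) + strlen(e->se_path) + 1;
}
```

Measuring the stored string with strlen() rather than reusing strlcpy()'s return value keeps the length within the buffer even if the path was truncated, since strlcpy() returns the length of the source string.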




Re: [PATCH] netbsd32 swapctl, round 3

2014-02-01 Thread Matt Thomas

On Feb 1, 2014, at 4:49 PM, Emmanuel Dreyfus m...@netbsd.org wrote:

 Matt Thomas m...@3am-software.com wrote:
 
 You only need one sep32 and then copyout each entry:
 
 Isn't there a performance impact to call copyout several times instead
 of one?

Compared to kmem_alloc/kmem_free?  Notice we are only copying out
what we need, not the whole path name.

So we copy a lot less.


RFC: stop having a single global page size

2014-01-31 Thread Matt Thomas

Instead of the system sharing a common page size, the page size would be
dependent on what the pmap for that address range wants.  Note that different
processes (vmspaces) could have different page sizes.  The kernel could have
a different page size than user processes.  

NBPG/PAGE_SIZE as globals would go away.  

Sharing of pages would become more difficult, since pages of different
sizes are hard to share.

Why would anyone want this?  Say you have a system in which the MMU can have
per translation table page sizes.  A 16KB page size might be desirable for the
kernel and for LP64 processes.  If you are running an ILP32 process, possibly
of an older architecture, you might want to use a 4KB page size.

Just a thought for pondering...

Re: RFC: stop having a single global page size

2014-01-31 Thread Matt Thomas

On Jan 31, 2014, at 11:46 AM, Martin Husemann mar...@duskware.de wrote:

 On Fri, Jan 31, 2014 at 03:03:21PM +, Justin Cormack wrote:
 Linux maps the kernel with 4MB pages to save TLB entries too, I believe.
 
 Yes, we do that too, but I guess wired kernel memory does not really
 count for Matt's proposal.

Correct.  The above is of a more dynamic nature.

 Sparc64 actually uses 4M, 64kb and 8kb pages for various wired kernel
 mappings.

I actually proposed adding a size argument to pmap_kenter so MD code could
use larger pages for wired kernel pages.


Re: [patch] put ptrdiff_t in the kernel and create sys/stddef.h

2013-12-04 Thread Matt Thomas

On Dec 4, 2013, at 1:33 PM, Alan Barrett a...@cequrux.com wrote:

 On Wed, 04 Dec 2013, David Holland wrote:
 (*) A complete scheme for doing it right removes all the _BSD_FOO_T_
 drivel and ifdefs scattered in userland headers in favor of:
  - a single header file that defines all the needed types prefixed
with __, which can be included anywhere;
  - in userland, include-guarded header files akin to sys/null.h
that define single or common groups of the names without the
__ prefixes, e.g. types/size_t.h;
  - including these header files in the proper places, such as in
standard userland header files like stddef.h;
  - in the kernel, a single header file that defines all the types
without the __, that is or is exposed to sys/types.h but does
not affect userland.
 
 Yes, that's one way of doing it right.
 
 Until such time as somebody does it right, please follow the pattern of 
 what's done already.

which is what my suggested patch does.

Re: ptrdiff_t in the kernel

2013-12-03 Thread Matt Thomas

On Dec 3, 2013, at 7:25 PM, Lourival Vieira Neto lourival.n...@gmail.com 
wrote:

 Hi Matt,
 
 Is there a reason not to have ptrdiff_t defined in the kernel?
 Shouldn't it be OK to define it in sys/cdefs.h? Or even to have
 stddef.h itself in the kernel?
 
 It is defined in the kernel and comes from machine/ansi.h via
 sys/types.h.
 
 Actually, it isn't. Only _BSD_PTRDIFF_T_ is defined by machine/ansi.h.
 The ptrdiff_t type is defined only in stddef.h.

That surprises me.  Easy enough to add.  

http://www.netbsd.org/~matt/ptrdiff-diff.txt

 No, stddef.h is not allowed in the kernel.  Symbols from it are
 provided via other means.
 
 I know. In fact, I'm asking if it would be alright to allow that.
 AFAIK, it would be inoffensive if available in the kernel.

Actually, it would be offensive.  

Re: ptrdiff_t in the kernel

2013-12-02 Thread Matt Thomas

On Dec 2, 2013, at 5:58 PM, Lourival Vieira Neto lourival.n...@gmail.com 
wrote:

 Hi Folks,
 
 Is there a reason not to have ptrdiff_t defined in the kernel?
 Shouldn't it be OK to define it in sys/cdefs.h? Or even to have
 stddef.h itself in the kernel?

It is defined in the kernel and comes from machine/ansi.h via
sys/types.h.

No, stddef.h is not allowed in the kernel.  Symbols from it are
provided via other means.

Re: bus_dmamap_destroy no longer callable from interrupt context?

2013-11-15 Thread Matt Thomas

On Nov 15, 2013, at 6:22 AM, Christoph Badura b...@bsd.de wrote:

 While trying to port BCM586x support I discovered that I get the following
 panic under -current.  The same code works fine on -6.  What gives?
 
 panic: kernel diagnostic assertion ((!cpu_intr_p() && !cpu_softintr_p()) || 
 (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr != NULL) failed: file 
 ../../../../kern/subr_pool.c, line 2209 pool 'vmmpepl' is IPL_NONE, but 
 called from interrupt context
 
 backtrace:
 ...
 kern_assert
 pool_cache_get_paddr
 _uvm_mapent_alloc.clone.2
 uvm_map_dup_start
 uvm_unmap_remove
 uvm_unmap1
 _bus_dmamap_destroy_clone.8
 ubsec_callback+0x9d
 ubsec_intr+0x100
 intr_biglock_wrapper
 ...
 
 This is generic ubsec code that is used by our currently supported devices.
 I.e. ubsec(4) should be completely busted.

It's intentional.  dmamap create/destroy can't be done from interrupt context
because they allocate memory.  Besides mbufs, memory can't be allocated there.
I don't agree with the softintr restriction (where else can drivers allocate?).

Re: bus_dmamap_destroy no longer callable from interrupt context?

2013-11-15 Thread Matt Thomas

On Nov 15, 2013, at 10:56 AM, Lars Heidieker l...@heidieker.de wrote:

 Matt, you mean allocating and freeing memory from softint context should
 be ok? That's something that went through my mind as well and I think
 it's the right way.

I do.  softint routines can wait for mutexes so allocation in them should
be safe.  Adding the complexity of needing a real thread context to do
safe allocations seems more prone to problems.


Re: hf/sf [Was Re: CVS commit: pkgsrc/misc/raspberrypi-userland]

2013-11-12 Thread Matt Thomas

On Nov 12, 2013, at 9:33 AM, Dennis Ferguson dennis.c.fergu...@gmail.com 
wrote:

 
 On 11 Nov, 2013, at 15:31 , Justin Cormack jus...@specialbusservice.com 
 wrote:
 On Mon, Nov 11, 2013 at 10:56 PM, Michael van Elst mlel...@serpens.de 
 wrote:
 m...@3am-software.com (Matt Thomas) writes:
 
 Exactly.  with hf, floating point values are passed in floating point
 registers.  That can not be hidden via a library (this works on x86
 since the stack has all the arguments).
 
 It could be hidden by emulating the floating point hardware.
 
 That's not sane. The slowdown would be enormous. You are emulating
 registers as well as operations.
 
 I'm not positive, but isn't this how the original ARM ABI works?
 I thought the reason they replaced this with the earm ABI is that
 almost no CPUs of that vintage had floating point units and with
 eabi the soft float binaries don't have to pay the emulated
 instruction cost for function calls.  And I thought the reason we
 got earmhf is that most modern processors now have floating point
 units (though not the same instruction set that the original ABI
 assumed) but the instructions for copying values between the floating
 point and integer registers, which get used a lot if you compile
 hardware floating point with the earm ABI, are abysmally slow (and
 there aren't a whole lot of integer registers anyway, so the integer
 argument registers get filled up fast if you are passing doubles).

The original arm abi had optional support for FPA but netbsd never used
it even though we had an FPA emulator.  That bitrotted into uselessness
since everyone just used softfloat. 



Re: hf/sf [Was Re: CVS commit: pkgsrc/misc/raspberrypi-userland]

2013-11-12 Thread Matt Thomas

On Nov 12, 2013, at 9:33 AM, Dennis Ferguson dennis.c.fergu...@gmail.com 
wrote:

 - Some attention should be given to figuring out what runs on what.  Even
  if I've compiled the base system for my BeagleBone for earmv7hf myself,
  it would be nice to still be able to install pkgsrc binaries built for
  the RPi if that's what is available (though installing pkgsrc binaries
  built for armv7 on an RPi might not be a good idea).

I’ve been thinking about supporting a “list” of compatible architectures for 
pkg_add. 

earmv7hf:earmv6hf:earmhf
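
A minimal sketch of how such a colon-separated compatibility list might be matched against the architecture a binary package was built for. This is only an illustration of the idea; the function name is invented and nothing here reflects how pkg_add is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if pkg_arch is exactly one of the elements of the
 * colon-separated compat_list, e.g. "earmv7hf:earmv6hf:earmhf".
 */
static bool
arch_is_compatible(const char *compat_list, const char *pkg_arch)
{
	size_t want;
	const char *p;

	assert(compat_list != NULL && pkg_arch != NULL);
	want = strlen(pkg_arch);
	if (want == 0)
		return false;
	for (p = compat_list; *p != '\0'; ) {
		size_t len = strcspn(p, ":");	/* length of this element */

		if (len == want && strncmp(p, pkg_arch, len) == 0)
			return true;
		p += len;
		if (*p == ':')
			p++;			/* skip the separator */
	}
	return false;
}
```

Listing the most specific architecture first would let an installer prefer the best match while still accepting packages built for an older compatible one.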




Re: hf/sf [Was Re: CVS commit: pkgsrc/misc/raspberrypi-userland]

2013-11-12 Thread Matt Thomas

On Nov 11, 2013, at 10:08 PM, Michael van Elst mlel...@serpens.de wrote:

 The slowdown is already enormous due to lack of floating point
 hardware. That's why emulating the FP hardware is a very common
 way to handle this situation, just look at the other platforms.

The exception handling is much costlier than doing a softfloat call.
It’s also adding kernel bloat.



Re: hf/sf [Was Re: CVS commit: pkgsrc/misc/raspberrypi-userland]

2013-11-12 Thread Matt Thomas

On Nov 12, 2013, at 11:47 AM, Michael van Elst mlel...@serpens.de wrote:

 m...@3am-software.com (Matt Thomas) writes:
 
 
 On Nov 11, 2013, at 10:08 PM, Michael van Elst mlel...@serpens.de wrote:
 
 The slowdown is already enormous due to lack of floating point
 hardware. That's why emulating the FP hardware is a very common
 way to handle this situation, just look at the other platforms.
 
 The exception handling is much costlier than doing a softfloat call.
 
 You missed the second paragraph.

No I didn’t.

 It’s also adding kernel bloat.
 
 Indeed, a little bit of kernel bloat compared to a dozen userlands
 and a dozen package repositories that require building and testing.

There are a lot of floating point and load/store instruction variants on ARM.  
It’s not “a little bit” of code, it’s a lot.  I have 

from http://mail-index.netbsd.org/port-powerpc/2012/09/26/msg003275.html

 On a P2020, a build.sh distribution for evbppc took 6.2% less time on a 
 softfloat userland .vs. hardfloat userland with kernel-emulation.
 
 10h55m32s (soft) vs. 11h38m50s (hard)

Doing a release build is not a floating point intensive workload yet it 
incurred a significant amount of overhead.  The platforms without FP are going 
to be the slowest so emulating FP would make them even slower.
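
For reference, the 6.2% figure follows directly from the two build times quoted above. A small sketch of the arithmetic, with hypothetical helper names:

```c
#include <assert.h>

/* Convert hours, minutes, seconds to a number of seconds. */
static long
hms_to_seconds(long h, long m, long s)
{
	return h * 3600 + m * 60 + s;
}

/* Percentage by which the faster run is shorter than the slower run. */
static double
percent_less(long fast, long slow)
{
	assert(slow > 0 && fast <= slow);
	return 100.0 * (double)(slow - fast) / (double)slow;
}
```

Here 10h55m32s is 39332 seconds and 11h38m50s is 41930 seconds, a difference of 2598 seconds, or about 6.2% of the hardfloat-with-emulation time.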

When it comes to building a system I am willing to incur a slower build or 
consume more resources in order to get a faster system.  Spend the time upfront 
to get things as fast as possible.  My builds might be slower or need more 
space for packages, but the resultant system will run faster.  That seems to be 
a tradeoff worth making.

Re: hf/sf [Was Re: CVS commit: pkgsrc/misc/raspberrypi-userland]

2013-11-11 Thread Matt Thomas

On Nov 11, 2013, at 8:33 PM, Warner Losh i...@bsdimp.com wrote:

 Is there a complete write up of the conventions here?

Conventions?

earm{v[4567],}{hf,}{eb}  except earmv4hf isn’t valid.
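
To make the brace pattern concrete, the sketch below expands it into the individual names. It assumes the trailing {eb} is an optional suffix (the thread itself mentions little-endian names such as earmv7hf with no suffix) and that the earmv4hf exclusion also covers the big-endian variant; both are assumptions, and the function name is invented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARCH_NAME_LEN 16	/* long enough for "earmv7hfeb" plus NUL */

/*
 * Fill names[] with every name the pattern allows and return how many
 * were written (at most max).
 */
static int
earm_arch_names(char names[][ARCH_NAME_LEN], int max)
{
	static const char *const versions[] = { "", "v4", "v5", "v6", "v7" };
	static const char *const floats[] = { "", "hf" };
	static const char *const endians[] = { "", "eb" };	/* assumed optional */
	int n = 0;

	assert(names != NULL && max >= 0);
	for (size_t v = 0; v < sizeof(versions) / sizeof(versions[0]); v++) {
		for (size_t f = 0; f < sizeof(floats) / sizeof(floats[0]); f++) {
			/* earmv4hf is called out as invalid in the text. */
			if (strcmp(versions[v], "v4") == 0 &&
			    strcmp(floats[f], "hf") == 0)
				continue;
			for (size_t e = 0; e < sizeof(endians) / sizeof(endians[0]); e++) {
				if (n == max)
					return n;
				snprintf(names[n], ARCH_NAME_LEN, "earm%s%s%s",
				    versions[v], floats[f], endians[e]);
				n++;
			}
		}
	}
	return n;
}
```

Under those assumptions the pattern expands to 18 names, from earm through earmv7hfeb.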

Due to recent GCC changes, the earmv6* and earmv7* binaries not only will have 
instructions that do not execute on pre-armv6 CPUs, they will also do unaligned 
accesses, which will be handled by the CPU transparently.  These unaligned 
accesses are not supported by pre-armv6 CPUs.

That’s yet another ABI permutation.  The kernel could fix them up but at 
significant cost. 



Re: hf/sf [Was Re: CVS commit: pkgsrc/misc/raspberrypi-userland]

2013-11-10 Thread Matt Thomas

On Nov 10, 2013, at 12:57 PM, Justin Cormack jus...@specialbusservice.com 
wrote:

 On Sun, Nov 10, 2013 at 7:38 PM, Alistair Crooks a...@pkgsrc.org wrote:
 On Sun, Nov 10, 2013 at 04:56:04AM +, Jun Ebihara wrote:
 Module Name:  pkgsrc
 Committed By: jun
 Date: Sun Nov 10 04:56:04 UTC 2013
 
 Modified Files:
 pkgsrc/misc/raspberrypi-userland: Makefile
 
 Log Message:
 support earmhf.
 ONLY_FOR_PLATFORM=  NetBSD-*-*arm*
 oked by jmcneill.
 
 Thanks for doing this, Jun-san.
 
 But in the big picture, having hf and sf versions of a platform's
 userland, in the year 2013, is, well, sub-optimal.  I don't think the
 ramifications of the change were considered in enough detail, and we
 need to discuss it, before we have to start growing new architectures
 in pkgsrc for this and that.
 
 Can't we lean on what was done for i386/i387 twenty years ago, and
 use a userland library to decide whether to use softfloat in the
 absence of hardware?
 
 So let's discuss...
 
 armhf is not just about whether there is or is not hardfloat, it is
 also a different ABI. It's more like mips o32 vs n32 in that it is an
 ABI change that requires some hardware requirements too.

Exactly.  with hf, floating point values are passed in floating point
registers.  That can not be hidden via a library (this works on x86
since the stack has all the arguments).  

Re: hf/sf [Was Re: CVS commit: pkgsrc/misc/raspberrypi-userland]

2013-11-10 Thread Matt Thomas

On Nov 10, 2013, at 1:39 PM, Alistair Crooks a...@pkgsrc.org wrote:

 On Sun, Nov 10, 2013 at 01:20:41PM -0800, Matt Thomas wrote:
 Exactly.  with hf, floating point values are passed in floating point
 registers.  That can not be hidden via a library (this works on x86
 since the stack has all the arguments).  
 
 Thanks, I understand.  But...  there has to be a different way of
 doing this that does not require such wholesale changes, especially
 when they were made without discussion.
 
 + use virtual registers which get mapped onto the real thing, either
 through compilation or JIT

Doesn’t help since there are also FP instructions.

 + optimise for one passing scheme, and translate the other dynamically

We already have a libc_vfp.so for earm which will use real FP
instructions to do the softfloat ops.

 + have both sets of passing conventions in a fat binary, and select
 accordingly

ELF doesn’t really support fat binaries.  

 I'm sure there are way more than I've outlined above, and that others
 have much better ideas than I have.
 
 At the moment, this has been optimised for the kernel architecture,
 with the userlevel changes assumed to be collateral damage.  Since the
 users are what matters, that needs to be changed.

I strongly disagree with that.  I specifically chose to use different machine
arches so that the hard/soft float binary packages would be separate.  
From using soft/hard float userlands on PPC, I already knew that mixing
them was wrong.  

 How do you propose to fix this (interim) mess for pkgsrc?  This is a
 real issue for us, and you should send your proposal to
 tech-...@netbsd.org.

Is it just the multiplicity of arm packages or something else?



Re: hf/sf [Was Re: CVS commit: pkgsrc/misc/raspberrypi-userland]

2013-11-10 Thread Matt Thomas

On Nov 10, 2013, at 2:24 PM, Justin Cormack jus...@specialbusservice.com 
wrote:

 On Sun, Nov 10, 2013 at 9:48 PM, Matt Thomas m...@3am-software.com wrote:
 I strongly disagree with that.  I specifically chose to use different machine
 arches so that the hard/soft float binary packages would be separate.
 From using soft/hard float userlands on PPC, I already knew that mixing
 them was wrong.
 
 What's so wrong with it?

The claim that I never considered the userlevel problems when I made the
choice for a separate MACHINE_ARCH.  I made the decision precisely for 
userlevel concerns.  The kernel doesn’t care at all.

Re: pulse-per-second API status

2013-11-01 Thread Matt Thomas

On Nov 1, 2013, at 12:53 PM, Mouse mo...@rodents-montreal.org wrote:

 Also, see below - 1ms strikes me as pretty bad for PPS.
 Sure, but it's vastly better than nmea timecode, which is what the
 other choice is.
 
 Oh, sure; as I said in other words upthread, I'm not arguing that it's
 not a good choice for you - I'm arguing that it's not a good choice for
 the main tree.

I'm thinking of a middle ground.  Maybe only make it enabled
if you enabled a sysctl variable.

sysctl -w hw.ucom0.enable_pps=1

That way you have to explicitly enable it, and its problems can be
documented in the ucom man page.


Re: MACHINE_ARCH on NetBSD/evbearmv6hf-el current

2013-10-26 Thread Matt Thomas

On Oct 26, 2013, at 5:45 AM, Izumi Tsutsui tsut...@ceres.dti.ne.jp wrote:

 By static MACHINE_ARCH, or dynamic sysctl(3)?
 If dynamic sysctl(3) is preferred, which node?

hw.machine_arch

which has been defined for a long long time.


Re: MACHINE_ARCH on NetBSD/evbearmv6hf-el current

2013-10-26 Thread Matt Thomas

On Oct 26, 2013, at 10:54 AM, Izumi Tsutsui tsut...@ceres.dti.ne.jp wrote:

 By static MACHINE_ARCH, or dynamic sysctl(3)?
 If dynamic sysctl(3) is preferred, which node?
 
 hw.machine_arch
 
 which has been defined for a long long time.
 
 Yes, defined before the sf vs hf issue arose, and
 you have changed the definition (i.e. make it dynamic)
 without public discussion.  That's the problem.

It was already dynamic (it changes for compat_netbsd32).



Re: Why do we need lua in-tree again? Yet another call for actual evidence, please. (was Re: Moving Lua source codes)

2013-10-19 Thread Matt Thomas

On Oct 19, 2013, at 12:26 AM, Marc Balmer m...@msys.ch wrote:

 Am 19.10.13 09:03, schrieb Alan Barrett:
 On Sat, 19 Oct 2013, Marc Balmer wrote:
 The inclusion and use of Lua in base, for use in userland and the
 kernel, [...] has, last but not least, core's blessing.
 
 Would you please either present some evidence for that claim, or stop
 making the claim.
 
 I am not making a claim.  And what is this, a trial, that you ask me to
 present evidence?  You were not a core team member at the time, so I
 really can't blame you that you don't remember it.  But I blame you for
 making this up as if it was sweeping kernel change or so.  It's a tiny
 device driver that uses source code that has already been in the tree for
 about three years.  I will eventually dig out the email exchange, but
 that will have to wait, I am at a trade show right now.

Well, I've been on core a lot longer (over a decade now) and I don't 
remember approving in-kernel Lua either.  I checked my mail archives.
The relevant mail is from around October 24th, 2010.
  
The only kernel references are for things like exec_script support
and to make sure userland Lua does not conflict with the kernel Lua 
from his [Lourival Neto] GSoC project.  That strongly implies that you
were only asking for userland lua support and that's what core granted
permission for.

Looking through past mail, it saddens me to note that the bozohttpd 
changes took nearly 4 years to get into the tree.  





Re: storage-class memory (was: Re: state of XIP?)

2013-10-18 Thread Matt Thomas

On Oct 17, 2013, at 10:41 PM, David Holland dholland-t...@netbsd.org wrote:

 If the XIP code is not mergeable, what's entailed in doing a different
 implementation that would be? Also, is the getpages/putpages interface
 expressive enough to allow doing this without major UVM surgery? For
 now I'm assuming a file system that knows about storage-class memory
 and can fetch the device physical page that corresponds to any
 particular file and offset. ISTM that at least in theory it ought to
 be sufficient to implement getpages by doing this, and putpages by
 doing nothing at all, but I don't know that much specifically about
 UVM or the pager interface.

IMO, no, the getpages interface is not sufficient.  You also have the 
problem that the pages to be mapped are not managed pages.  
Additionally, you know these pages are almost certainly going to
be physically contiguous so you really want to use large page sizes to
map them.  So you don't want UVM allocating pages nor do you want
to deal with the unified buffer cache.  

Indeed, it might be cheaper to avoid uvm_fault to map the pages
and just map them.  The only problem is marking data as copy-on-write
but again these pages aren't managed so the current COW code won't
be happy.


Re: storage-class memory (was: Re: state of XIP?)

2013-10-18 Thread Matt Thomas

On Oct 18, 2013, at 1:06 AM, David Holland dholland-t...@netbsd.org wrote:

 The only problem is marking data as copy-on-write
 but again these pages aren't managed so the current COW code won't
 be happy.
 
 We shouldn't have to care about that unless we want to move to
 MAP_COPY from MAP_PRIVATE.

Huh?  I was talking about an executable's .data section.
Since we are talking about execute-in-place.


Re: CVS commit: src/sys/lib/libunwind

2013-10-17 Thread Matt Thomas

On Oct 17, 2013, at 4:57 AM, Izumi Tsutsui tsut...@ceres.dti.ne.jp wrote:

 Anyway, our commit guidelines explicitly require Core's approval
 before adding a new package into base.  You violate the rule.
 That's the enough reason to revert your commit without discussion.

He did get core's approval and core did require some changes to the
import. We did ask why not sys/external and other things.  Given the
reasons given, we thought sys/lib/libunwind was the best place so it
can be used both in the kernel and userland.

If you've ever looked at the stacktrace code for ARM or MIPS, any
alternative would be better than the existing code.  Being able to
-fomit-frame-pointer for performance and still get a ddb stacktrace
is also a win.


Re: state of XIP?

2013-10-15 Thread Matt Thomas

On Oct 14, 2013, at 11:41 PM, David Holland dholland-t...@netbsd.org wrote:

 Did uebayasi@'s XIP work get finished/committed? Which things does it
 work with? And (other than UTSL) where am I supposed to look to find
 out more?

It was not committed since core felt that it needed too many kludges
to properly work.


Re: Getting the device name from a struct tty *

2013-10-15 Thread Matt Thomas

On Oct 15, 2013, at 12:09 AM, Marc Balmer m...@msys.ch wrote:

 In a tty line discipline, I want to get the name of the tty driver
 instance, e.g. dtyU0.  The line disciplines are called with a struct
 tty * as argument, is there any (halfway sane) way to get at the name
 of the driver instance?  I need the name of the instance, not only the
 name of the driver.
 
 struct tty * contains a dev_t element, fwiw.
 
 Is that possible?

No.


processor abstraction

2013-10-07 Thread Matt Thomas

A lot of systems are coming with compute/peripheral processors with
limited ram, etc.

I was wondering what the abstraction should be?

Obviously, mmap()'ing their memory would be nice.  But what about
stopping/starting?  Messaging?

Ideas are welcome.



Re: cpu_intr_p() related question

2013-09-01 Thread Matt Thomas

On Sep 1, 2013, at 4:24 PM, Yann Sionneau yann.sionn...@gmail.com wrote:

 On 01/09/13 23:42, Yann Sionneau wrote :
 Hello NetBSD hackers,
 
 I have read over there [0] that cpu_intr_p() should return true if curcpu() 
 is currently in the context of a hardware interrupt.
 
 Does this mean that cpu_intr_p() should return true when CPU is in exception 
 handler (tlb miss handler, tlb fault handler, division by zero etc) as well 
 as in exception caused by an interruption? Or Should it only return true for 
 true interruption like UART or timer interrupt and not for exception 
 handlers not related to an interruption?

Only those that are caused by an interrupt (think asynchronous event).
Synchronous exceptions (tlb miss, syscall etc) do not affect cpu_intr_p.
cpu_intr_p() in a tlb miss in an interrupt handler would return true.
 
 I also had a look at this thread [1] but it did not provide me with much 
 information, except that I could just implement it as return false;, is it 
 still true?

Not really since that means there will be no reporting of interrupt time.
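
A rough userland model of the rule above, assuming the port keeps a
per-CPU interrupt nesting counter.  The field name ci_idepth and the
helper names are assumptions for illustration, not any particular
port's code: only asynchronous interrupt entry/exit touches the
counter, so a synchronous exception sees whatever state it was taken in.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU structure; ci_idepth is an assumed name. */
struct cpu_info {
	int ci_idepth;	/* number of nested hardware interrupts in progress */
};

static struct cpu_info cpu0;

/* True only while at least one hardware interrupt is being serviced. */
static bool
cpu_intr_p(void)
{
	return cpu0.ci_idepth > 0;
}

/* Asynchronous interrupt entry and exit are the only places the depth changes. */
static void
intr_enter(void)
{
	cpu0.ci_idepth++;
}

static void
intr_exit(void)
{
	cpu0.ci_idepth--;
}

/* A synchronous exception (tlb miss, syscall) leaves the depth alone. */
static bool
tlb_miss_handler(void)
{
	return cpu_intr_p();
}
```

With this model, tlb_miss_handler() reports false when called from
normal context and true when called between intr_enter() and
intr_exit().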

How to deal with nand controller with multiple CE.

2013-08-28 Thread Matt Thomas

I have a SoC with a smart nand controller with 8 CE (chip enables).
The board using the SoC has a MT29F16G which has 2 CEs (each
CE accesses 8 Gb).  As far as I can tell, I can only attach one
nand device to the controller since struct nand_interface doesn't
have a cookie that could be used to disambiguate the children.

One solution is to fake up a controller for each CE but that seems
like a lot of overhead.

Ideas?


Re: Sending ATA commands?

2013-08-11 Thread Matt Thomas

On Aug 11, 2013, at 9:37 PM, Mouse mo...@rodents-montreal.org wrote:

 [...], I wonder if you could attach the HPA area as an additional
 partition on the default disklabel, or, if the disk is gpt
 partitioned, fake up another partition in the gpt table.
 
 I don't see any reason why not.  I'm not sure whether you're proposing
 that the HPA not be accessible any other way or whether this is just a
 default.

or make it a ld device attached to the wd device.


Re: Use of the PC value in interrupt/exception handlers

2013-08-02 Thread Matt Thomas

On Aug 2, 2013, at 4:43 AM, Martin Husemann mar...@duskware.de wrote:

 On Fri, Aug 02, 2013 at 10:46:31AM +, Piyus Kedia wrote:
 Dear all,
 
 We are working on developing a dynamic binary translator for the kernel.
 Towards this, we wanted to confirm if the interrupted PC value pushed on
 stack by an interrupt/exception is used by the interrupt/exception handlers?
 
 You are assuming some special architecture here, aren't you?
 
 This is all very much machine dependent, and can not be answered for NetBSD
 in general, please give a bit more details. It might be better to ask
 on arch specific mailing lists, if you only care about a certain arch.

In general, interrupts do not care about the PC unless they are returning back
to usermode and an ast has been requested. This will cause all the registers
to be saved and a context switch to happen.  The clock handler cares about the 
PC to do profiling.

The code checking for interrupting a RAS handler in cpu_switchto will modify 
the PC back to the start of the RAS if the PC was in the middle of a RAS.

For copying faults, NetBSD uses pcb_onfault, whose implementation varies by
architecture but involves saving the PC (and maybe other registers) and 
using the onfault information to return/react to the fault by restoring the 
saved registers.  This is only applicable to kernel PC addresses.
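
The onfault idea can be modelled in userland with setjmp/longjmp.
This is only a sketch under stated assumptions: struct pcb here is a
stand-in with a single slot, trap_fault() and copyin_model() are
hypothetical names, and a NULL source address plays the role of an
unmapped page.  Real ports save and restore registers in assembly.

```c
#include <errno.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the PCB; only the onfault slot is modelled. */
struct pcb {
	jmp_buf *pcb_onfault;	/* saved PC/registers to resume at, or NULL */
};

static struct pcb curpcb;

/* Stand-in for the trap handler, called when a "fault" is detected. */
static void
trap_fault(void)
{
	if (curpcb.pcb_onfault != NULL)
		longjmp(*curpcb.pcb_onfault, 1);	/* resume at recovery point */
	/* A real kernel would panic here; the model just returns. */
}

/* Copy n bytes; a NULL source simulates an unmapped user address. */
static int
copyin_model(const void *uaddr, void *kaddr, size_t n)
{
	jmp_buf env;

	if (setjmp(env) != 0) {
		/* We got here via longjmp from trap_fault(). */
		curpcb.pcb_onfault = NULL;
		return EFAULT;
	}
	curpcb.pcb_onfault = &env;	/* arm the recovery point */
	if (uaddr == NULL)
		trap_fault();		/* does not return while armed */
	memcpy(kaddr, uaddr, n);
	curpcb.pcb_onfault = NULL;	/* disarm on the normal path */
	return 0;
}
```

A copy from a valid buffer returns 0; a copy from NULL comes back
through the saved context and returns EFAULT with the slot cleared.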

Re: Use of the PC value in interrupt/exception handlers

2013-08-02 Thread Matt Thomas

On Aug 2, 2013, at 9:41 AM, Piyus Kedia piyuske...@gmail.com wrote:

 Hi,
 
 I can see a system call sys_rasctl which install a RAS area. So I assume that 
 the RAS PC's will be only user PC's and the cpu_switchto will modify the user 
 PC if it is interrupted in the middle of a RAS.

No. Some architectures use RAS against kernel addresses 
but those comparisons against the PC are explicitly done.
(MIPS is an example for implementing the atomic_ops on
MIPS processors which don't have the LL/SC instructions.)
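
The PC comparison for a restartable atomic sequence can be sketched
as below.  The struct and function names are hypothetical and the
sequences are treated as half-open ranges [start, end), which is an
assumption; the point is only that an interrupted PC inside a
sequence is rewound to the start of that sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one restartable sequence: [start, end). */
struct ras_model {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Return the PC to resume at: the start of the sequence if pc falls
 * inside one, otherwise pc unchanged.
 */
static uintptr_t
ras_fixup(const struct ras_model *ras, size_t nras, uintptr_t pc)
{
	for (size_t i = 0; i < nras; i++) {
		if (pc >= ras[i].start && pc < ras[i].end)
			return ras[i].start;
	}
	return pc;
}
```

For a sequence covering 0x1000..0x1010, a PC of 0x1008 is rewound to
0x1000 while 0x1010 and anything below 0x1000 are left alone.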

A simpler tty driver model

2013-07-28 Thread Matt Thomas

I have several SoC targets that I've stalled on due to the need of writing
a tty driver.  Sure I could cut & paste from another driver but having to
do that 3+ times seems inordinately stupid.  So I've been thinking of 
making that cp'ed code into a common tty driver and exporting a small set of
h/w specific functions.

int (*t_enable)(void *);// powerup device
int (*t_disable)(void *);   // powerdown device
int (*t_detach)(void *);// detach device

int (*t_status)(void *v);
if t_getbuf returns -1, call this to get the reason for the input error
(framing error, parity error, etc.)

int (*t_getbuf)(void *v, uint8_t *buf, size_t n);
returns up to n characters at a time into buf.  returns # of characters
returned (0..n) or -1 if an error happened.

int (*t_putbuf)(void *v, const uint8_t *buf, size_t n);
write n characters to tty; if there is no space for n, return the number
of characters placed in the fifo.

int (*t_flush)(void *v, u_int mask);
flush the receive and/or transmit fifo depending on mask.

int (*t_active)(void *v, u_int signals);
Set the signal(s) in signals to active (CTS,RTS,DTR,DSR,CD,etc).

int (*t_inactive)(void *v, u_int signals);
Set the signal(s) in signals to inactive (CTS,RTS,DTR,DSR,CD,etc).

int (*t_setmode)(void *v, u_int mode);
mode contains the char width, stop bits, speed, parity (even, odd, none).
speed is in the low 24 bits and everything else in the top 8.
if the top 8 bits are 0, that will mean 8 bit, 1 stop, no parity.
bits 25:24 are char size (8-[25:24]), 27:26 stop bits (1 + [27:26]/2)
bits 31:30 are parity (3 = odd, 2 = even, 0 = none).

I would like to be able to make the h/w driver < 200 lines if possible.
This is only a rough guess at what it would take.
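
The hooks and the t_setmode bit layout described above could be
written down roughly as follows.  The struct name, macro names, and
the encoder are hypothetical; only the member names and the bit
positions come from the list.  One assumption to flag: the stop-bit
field is read as 1 + [27:26]/2 with a fractional result, so the
macros work in half-bit units (2 = 1 stop bit, 3 = 1.5, 4 = 2).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the h/w specific hooks listed above. */
struct tty_hwops {
	int (*t_enable)(void *);
	int (*t_disable)(void *);
	int (*t_detach)(void *);
	int (*t_status)(void *);
	int (*t_getbuf)(void *, uint8_t *, size_t);
	int (*t_putbuf)(void *, const uint8_t *, size_t);
	int (*t_flush)(void *, unsigned int);
	int (*t_active)(void *, unsigned int);
	int (*t_inactive)(void *, unsigned int);
	int (*t_setmode)(void *, unsigned int);
};

/* Decoders for the mode word; names are made up for this sketch. */
#define TTYMODE_SPEED(m)	((m) & 0x00ffffffu)		/* bits 23:0 */
#define TTYMODE_CHARBITS(m)	(8u - (((m) >> 24) & 3u))	/* bits 25:24 */
#define TTYMODE_STOP_HALF(m)	(2u + (((m) >> 26) & 3u))	/* bits 27:26, half-bits */
#define TTYMODE_PARITY(m)	(((m) >> 30) & 3u)		/* bits 31:30 */

#define TTYPARITY_NONE	0u
#define TTYPARITY_EVEN	2u
#define TTYPARITY_ODD	3u

/* Pack the fields; stop_half is the stop-bit count in half-bit units. */
static inline uint32_t
ttymode_encode(uint32_t speed, unsigned int charbits, unsigned int stop_half,
    unsigned int parity)
{
	return (speed & 0x00ffffffu)
	    | ((uint32_t)((8u - charbits) & 3u) << 24)
	    | ((uint32_t)((stop_half - 2u) & 3u) << 26)
	    | ((uint32_t)(parity & 3u) << 30);
}
```

With this layout, 9600 baud 8N1 encodes to plain 9600, which matches
the rule that a zero top byte means 8 bit, 1 stop, no parity.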


Re: How to compile the kernel identical to the vanilla one?

2013-07-25 Thread Matt Thomas

On Jul 24, 2013, at 8:02 PM, Edwina Ng edwinan...@gmail.com wrote:

 Hi,
 
 I have extracted the configuration of /netbsd after a vanilla installation 
 with config(1), then untar the syssrc.tgz from source/sets and build the 
 kernel. The file netbsd generated in the build directory is not the same file 
 size as the one in the root directory.
 
 I save the kernel that I have built, delete the build directory and started 
 again afresh. The resultant netbsd has the same file size of the previous one 
 yet, cmp(1) says there are some differences.
 
 So my question is, given the same source, same kernel config file, same tool 
 chain and same platform, is that possible for two systems to produce 
 identical compiled kernel images? If not, what is the reason? If yes, which 
 step did I miss?

You probably need MKREPRO=yes when building the kernel to remove things like
dates and stuff.



Re: DTrace syscall provider - please test/comment

2013-06-25 Thread Matt Thomas

On Jun 25, 2013, at 5:25 AM, chris...@zoulas.com (Christos Zoulas) wrote:

 On Jun 24,  6:12pm, m...@3am-software.com (Matt Thomas) wrote:
 -- Subject: Re: DTrace syscall provider - please test/comment
 
 | 
 | On Jun 24, 2013, at 6:01 PM, Christos Zoulas chris...@astron.com wrote:
 | 
 |  Can't this be done as an addition/enhancement to the trace_enter()/
 |  trace_exit() facility instead of having to enter each syscall entry?
 | 
 | that only gets called if p->p_trace_enabled is set.  So now you need
 | a hook to set that on every lwp switch if the provider is tracing.
 
 Right, and it (dtrace) can set a different (or the same flag) to enable
 it.


How does it set the same flag since that's per-proc and will need to be changed
on context switch?

A different flag is more overhead per syscall.



Re: DTrace syscall provider - please test/comment

2013-06-25 Thread Matt Thomas

On Jun 25, 2013, at 10:19 AM, Jeff Rizzo r...@tastylime.net wrote:

 On 6/25/13 10:06 AM, Christos Zoulas wrote:
 On Jun 25,  9:32am, m...@3am-software.com (Matt Thomas) wrote:
 -- Subject: Re: DTrace syscall provider - please test/comment
 
 |
 | On Jun 25, 2013, at 5:25 AM, chris...@zoulas.com (Christos Zoulas) wrote:
 |
 |  On Jun 24,  6:12pm, m...@3am-software.com (Matt Thomas) wrote:
 |  -- Subject: Re: DTrace syscall provider - please test/comment
 | 
 |  |
 |  | On Jun 24, 2013, at 6:01 PM, Christos Zoulas chris...@astron.com wrote:
 |  |
 |  |  Can't this be done as an addition/enhancement to the trace_enter()/
 |  |  trace_exit() facility instead of having to enter each syscall entry?
 |  |
 |  | that only gets called if p->p_trace_enabled is set.  So now you need
 |  | a hook to set that on every lwp switch if the provider is tracing.
 | 
 |  Right, and it (dtrace) can set a different (or the same flag) to enable
 |  it.
 |
 |
 | How does it set the same flag since that's per-proc and will need to be changed
 | on context switch?
 |
 | A different flag is more overhead per syscall.
 
 I am trying to balance that against adding of two more conditionals per
 syscall per architecture and touching dozens of source files adding the
 same code in each one. Perhaps the syscall_plain/syscall_fancy idea
 was not that bad after all :-( Perhaps a different bit on the same flag.
 If any of them is set, you call trace enter, and you clear/move the
 bit on context switch.
 
 christos
 
 I am by no means an expert on this part of the kernel, but during the course 
 of this project, I noticed that FreeBSD seems to have made a 
 kern/subr_syscall.c which has an MI place where they put their entry/exit.  
 It seemed a lot cleaner than what we have; I can't speak to performance. Can 
 someone with more design clue than I comment on this setup?

I added an inline to sys/syscall.h.

int sy_invoke(const struct sysent *, struct lwp *, const void *, register_t *,
   int code);

which does the trace_enter/trace_exit dance, that can be modified to do
the dtrace dance as well.
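
The shape of such a wrapper can be sketched in userland as below.
Everything with a _model suffix is a hypothetical stand-in (the real
struct sysent, struct lwp, trace_enter and trace_exit are not
reproduced here), and the counters exist only so the effect can be
observed.  It shows one MI place bracketing the call, which is where a
dtrace probe could be added next to the trace calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lwp/proc state the wrapper looks at. */
struct lwp_model {
	bool trace_enabled;	/* models p->p_trace_enabled */
	int enters;		/* times the entry hook ran */
	int exits;		/* times the exit hook ran */
};

/* Hypothetical stand-in for a syscall table entry. */
struct sysent_model {
	int (*sy_call)(struct lwp_model *, const void *, long *);
};

static int
trace_enter_model(struct lwp_model *l, int code)
{
	(void)code;
	l->enters++;
	return 0;
}

static void
trace_exit_model(struct lwp_model *l, int code, int error)
{
	(void)code;
	(void)error;
	l->exits++;
}

/* Single place that brackets every syscall with the trace hooks. */
static inline int
sy_invoke_model(const struct sysent_model *sy, struct lwp_model *l,
    const void *uap, long *rval, int code)
{
	const bool do_trace = l->trace_enabled;
	int error = 0;

	if (do_trace)
		error = trace_enter_model(l, code);
	if (error == 0) {
		rval[0] = 0;
		error = sy->sy_call(l, uap, rval);
	}
	if (do_trace)
		trace_exit_model(l, code, error);
	return error;
}

/* Trivial syscall body used to exercise the wrapper. */
static int
sys_answer_model(struct lwp_model *l, const void *uap, long *rval)
{
	(void)l;
	(void)uap;
	rval[0] = 42;
	return 0;
}
```

In the untraced case the only extra cost in this model is the test of
the per-process flag, which is the overhead being argued about in this
thread.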



