Re: Proposal: Restore malloc(9) interface

2024-01-09 Thread Masanobu SAITOH
Hi.

On 2023/12/31 7:44, Taylor R Campbell wrote:
> I propose to deprecate the kmem(9) interface and go back to the
> malloc(9) interface.
> 
> 
> 1. The main difference is that the malloc(9) interface enables
>attribution of memory usage: how many bytes have been used for this
>purpose vs that purpose, partitioned by named malloc tags like
>M_MBUF or M_ACPI.  The conversion to the kmem(9) interface lost all
>this valuable diagnostic information.
> 
>I've personally spent probably dozens of hours over the last year
>or two puzzling over `vmstat -m' output to guess which subsystem
>might be leaking memory based on allocation sizes and which
>kmem-N pool looks fishy.  This is extremely frustrating and a
>huge waste of time to recover information we used to gather and
>report systematically.

I think adding malloc type again is good thing.

> 2. A secondary difference is reduced diffs from FreeBSD and OpenBSD
>drivers if we use malloc(9).
> 
> 3. A small difference is that kmem(9) distinguishes legacy allocation
>from interrupt context, kmem_intr_alloc/free, from front ends that
>forbid that, kmem_alloc/free.
> 
>I'm not sure this has provided much valuable diagnostic
>information, but it has provided a lot of frustrating crashes.  If
>we want the same frustrating crashes we could introduce an M_INTR
>flag which is mandatory when calling malloc from interrupt context.

Current kmen_(intr_)alloc()'s return address is usually aligned with
CACHE_LINE_SIZE. (Note that the smallest number is KMEM_ALIGN(== 8)
and the max is PAGE_SIZE.)
On kern_malloc(), the memory is first allocated with kmem_intr_alloc()
and put the malloc_header to the head. As a result, the return address
usually aligned with lower than CACHE_LINE_SIZE (at least on amd64).

I'm glad if the new malloc(9) API usually returns pointer aligned
with CACHE_LINE_SIZE or more.

> Note: I am NOT proposing any substantive changes to the implementation
> of the allocator -- I'm just proposing that we go back to the old
> _interface_, using the new pool-cache-based _implementation_, and to
> add lightweight per-CPU, per-tag usage counting to the malloc and free
> paths.
> 
> Nor am I suggesting changing anything about uvm_km(9), pool_cache(9),
> or anything else -- just changing kmem_alloc(N, KM_[NO]SLEEP) back to
> malloc(N, T, M_NOWAIT/WAITOK) and kmem_free(P, N) back to free(P, T),
> or possibly free(P, T, N) like OpenBSD does.
> 
> Thoughts?
> 
> 
> I asked for rationale for the kmem(9) interface last year, and none of
> the answers gave any compelling reason to have changed interfaces in
> the first place or to finish the conversion now:
> 
> https://mail-index.netbsd.org/tech-kern/2022/10/29/msg028498.html
> 
> As far as I can tell, we just spent umpteen hundreds of hours on
> engineering effort over the last decade to convert various drivers and
> subsystems from malloc(9) to kmem(9), in exchange for the loss of
> valuable diagnostic information about leaks, for increased cost to
> porting drivers, and for crashes when old subsystems newly converted
> to kmem(9) still allocate from interrupt context.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: proposed cpuctl modification

2023-03-10 Thread Masanobu SAITOH



On 2023/03/09 19:21, Robert Elz wrote:
> Date:Thu, 9 Mar 2023 16:21:53 +0900
> From:    Masanobu SAITOH 
> Message-ID:  <38ae66bd-1b37-c0ef-5a43-52e0c0a2a...@execsw.org>
> 
>   | Alder Lake-N? 4 E-cores share one microcode image. I have i7-12700 and it
>   | has 4 E-cores. Those 4 cores share one microcode image.
> 
> Mine is an i9-12900KS which has 8 of them (2 groups of 4).
> 
> Thanks for the confirmation, that is what looked to be happening, but
> I was just guessing from what I observed.  I just use intel processors
> (and others on occasion) I don't even pretend to understand them.
> 
>   | I think your idea is the best. Thank you for your commit.
> 
> No problem.  It was not a difficult change to make!
> 
>   | Another solutions is that the kernel returns 0 instead of EEXIST if the
>   | version number is the same as the running microcode's version.
> 
> Yes, I considered that one as well, but as you indicate, doing that just
> loses information, and gains nothing - the same number of sys calls (ioctl's)
> would be performed, all that would be saved is the check to see if the
> error is EEXIST when that happens (ie: peanuts).
> 
> kre
> 
> ps: do your E-cores ever just turn themselves off?   On mine, occasionally,
> and for no reason I can fathom, the BIOS reports there are none of them.
> (and NetBSD doesn't see them either).

I've never seen such problem on my machine.

> They come back after a power cycle.
> This is probably a BIOS issue, but ?

It might be...

> 

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Late MCU, was: proposed cpuctl modification

2023-03-09 Thread Masanobu SAITOH
Hello, all.

On 2023/03/05 11:01, Robert Elz wrote:
> Date:Fri, 3 Mar 2023 21:46:22 -0800
> From:"William 'Cryo' Coldwell" 
> Message-ID:  <7ce92f54-3746-4106-bd63-16e5e4cbc...@netbsd.org>
> 
>   | To throw some extra fun mixture into this discussion:  As of 5.19
>   | Linux will no longer allow late microcode loading:
>   |
>   | ref: https://www.phoronix.com/news/Linux-5.19-Late-ucode-Loading

We _SHOULD_ support the early microcode update function in the boot loader
or the very early stage of the kernel initialization.

One of the big reason is that a new microcode may disable a function.
If the kerne use the function, the kernel may work incorrectly.
Such type of change usually can be identified by the correcponding
CPUID bit. It may be possible to reconfigure some kernel functions by
identifying such change, but it may not possible, and it's simpler to use
the early micorocode update function than to write such complicated code.

I tried to implement the early microcode loading stuff in our boot loader
but I couldn't finish the work. NetBSD developers may read my previos mail
I wrote 2018/01/19 subjected with "x86 microcode early update"...

> I suspect this message was not really aimed at me, but at tech-kern
> my only specific knowledge about any of this is that I dislike spurious
> (seeming) error messages during the boot sequence - and so intend to
> make them go away (as there has been no objection, I will commit that
> change soon).
> 
> But:
>   | Any thoughts on the best approach to this? (boot? EFI?)
> 
> Best?  No idea.   But one approach might be to only start cpu0 in the
> kernel during bootstrap, and then have the rest started by an rc.d
> script, which could update microcode on them (if needed) first.
> 
> By moving cpuctl from /usr/sbin to /sbin and placing the firmware in
> /libdata instead of /usr/pkg that could be made to happen very early
> in the boot sequence (perhaps even before the fsck of / and rw mount).
> 
> I'm not sure how that would work wrt other things that have to happen
> (like arranging interrupt routing) - as clearly the microcode needs to
> be read from the filesystem (whether that's by the kernel, as now, or
> by cpuctl, then passing a memory image of it to the kernel doesn't seem
> as if it would make a lot of difference).

We can take two types of early loading.

 a) Update the microcode in the boot loader. Additional code is not required
in the kerne.

 b) Load the micorocode image in the boot loader and copy it to a memory
(buffer). Pass the pointer as a new boot argument. The kerne updates
all CPU's microcode as soon as possible.

Please someone(TM) try to implement..



> Longer term this could be coupled with more userland control of the
> cpu configuration - all being done at once as part of the startup
> sequence, but from userland code.
> 
> Whatever the issues really are (according to the Linux people) with
> doing the microcode update as we now do it, even assuming that is more
> or less the same as they do it, this should be safe, as code running
> on a CPU has to do it, I don't see it can make any real difference
> whether than is bios code, boot code, or early kernel code.
> 
> I don't much like the idea of extra magic blobs, or more hackery in
> the boot code however.
> 
> kre
> 

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: proposed cpuctl modification

2023-03-08 Thread Masanobu SAITOH
Hello.

I'm sorry for the late of this reply. I was busy for a few weeks.

On 2023/03/03 6:59, Robert Elz wrote:
> This message is about a proposed userland modification, but it seems
> more kernelish to me, hence I am asking here on tech-kern, rather than
> on tech-userlevel
> 
> When my system boots (intel cpu) it runs the intel-microcode (from pkgsrc)
> microcode update.
> 
> Since it is an Intel cpu, that means running the ioctl to perform the
> microcode update on every core.   The cpu has hyperthreading, and while
> I sometimes am inclined to turn that off, I haven't so far.
> 
> Naturally, as they're really the same core, while the base core ucode
> update works fine, the hyperthread companion always fails, as there the
> microcode is already up to the expected version (surprise surprise, since
> we just did that).I guess we could look and skip the update on
> the hyperthread companion cores (pseudo-cores) but that's not what I am
> proposing, partly because I'd have to work out how to do that, but also
> because that by itself would only solve half the problem.
> 
> My processor also has 8 cores, with no hyperthreading, where it looks as
> if internally, there's just 2 sets of ucode, one for the first 4, and one
> for the second 4.

Alder Lake-N? 4 E-cores share one microcode image. I have i7-12700 and it
has 4 E-cores. Those 4 cores share one microcode image.

>   Updates of the other 3 of each group find the ucode
> version already installed, and error out.
> 
> So, what I am proposing is to have cpuctl simply ignore EEXIST errors from
> the update the microcode" ioctl. unless the -v option (which already exists)
> is given.
> 
> Note that this isn't fixing any real problem - everything works as it is now,
> and everything (from what I can tell) gets updated correctly.   The issue is
> more cosmetic - the ucode update currently issues error messages for more than
> half of my (apparent) cores, which looks ugly during the boot process, and
> could lead to people wondering if there is a problem.

I think your idea is the best. Thank you for your commit.

Another solutions is that the kernel returns 0 instead of EEXIST if the
version number is the same as the running microcode's version. I think it's not
good because it hides the information that the running micocode is the latest.

> When I first installed the package from pgksrc to make this happen (a while
> ago now) I immediately disabled it (in rc.conf) so it ran no more, as the
> microcode version offered by the package was what my CPU came equipped with,
> and every core complained about the version not actually being updated.
> 
> pkgsrc was updated to a newer version, with newer microcode, I installed
> that, and enabled it again - and now I'm stuck with the ugly errors.
> 
> Perhaps someone has a better solution than this, perhaps checking which
> microcode version is installed on each core, and not attemmpting to install
> the update if it is the same version, but that's beyond my knowledge.
> It also seems likely to me that it is simpler to just install anyway, and
> ignore the "already there" error if it happens.   Probably.

I agree with you. Since we do not know what kind of sharing scheme future
CPUs will have, I think simple is the best.

 Thanks.


> My proposed patch is appended - it builds, installs, and seems to work
> just fine (not too much of a surprise, it is a trivial change).
> 
> Opinions?
> 
> kre
> 
> Index: cpuctl.8
> ===
> RCS file: /cvsroot/src/usr.sbin/cpuctl/cpuctl.8,v
> retrieving revision 1.20
> diff -u -r1.20 cpuctl.8
> --- cpuctl.8  17 May 2019 23:51:35 -  1.20
> +++ cpuctl.8  2 Mar 2023 21:36:06 -
> @@ -72,6 +72,10 @@
>  .Op Ar file
>  .Xc
>  This applies the microcode patch to CPUs.
> +Unless
> +.Fl v
> +was given, errors indicating that the microcode
> +already exists on the CPU in question are ignored.
>  If
>  .Ar cpu
>  is not specified or \-1, all CPUs are updated.
> Index: cpuctl.c
> ===
> RCS file: /cvsroot/src/usr.sbin/cpuctl/cpuctl.c,v
> retrieving revision 1.32
> diff -u -r1.32 cpuctl.c
> --- cpuctl.c  1 Feb 2022 10:45:02 -   1.32
> +++ cpuctl.c  2 Mar 2023 21:36:06 -
> @@ -247,7 +247,7 @@
>   cpuset_destroy(cpuset);
>   }
>   error = ioctl(fd, IOC_CPU_UCODE_APPLY, );
> - if (error < 0) {
> + if (error < 0 && (verbose || errno != EEXIST)) {
>   if (uc.fwname[0])
>   err(EXIT_FAILURE, "%s", uc.fwname);
>   else
> 
> 

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Lock of NetBSD-current with ifconfig down / up

2022-09-19 Thread Masanobu SAITOH
Hi.

On 2022/09/18 2:55, John Klos wrote:
> Hi,
> 
> Here's a nice issue :)
> 
> Plug in ure* USB ethernet to amd64 machine running NetBSD-current (9.99.99, 
> 22-August-2022):
> 
> [ 1791670.446266] ure0 at uhub8 port 4
> [ 1791670.446266] ure0: Realtek (0x0bda) USB 10/100/1000 LAN (0x8153), rev 
> 2.10/30.00, addr 6
> [ 1791670.446266] ure0: RTL8153 ver 5c30
> [ 1791670.566267] rgephy0 at ure0 phy 0: RTL8251 1000BASE-T media interface, 
> rev. 0
> [ 1791670.586267] rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 
> 1000baseT-FDX, auto
> [ 1791670.586267] ure0: Ethernet address a0:ce:c8:e7:88:5f
> [ 1791673.256299] ugen1 at uhub8 port 5
> [ 1791673.256299] ugen1: VIA Labs, Inc. (0x2109) PD3.0 USB-C Device (0x), 
> rev 2.01/0.01, addr 7
> 
> ifconfig ure0 up
> 
> No problem:
> 
> ure0: flags=0x8943 mtu 1500
> capabilities=0x3ff00
> capabilities=0x3ff00
> capabilities=0x3ff00
> enabled=0
> ec_capabilities=0x1
> ec_enabled=0
> address: a0:ce:c8:e7:88:5f
> media: Ethernet autoselect (100baseTX full-duplex)
> status: active
> inet6 fe80::a2ce:c8ff:fee7:885f%ure0/64 flags 0 scopeid 0x9
> 
> ifconfig ure0 down
> 
> Locks the machine. I couldn't get more information because it's 3000 miles 
> away. There's nothing in dmesg because the machine was power cycled.

One of my environment:

# ifconfig ure0 down
[  63.2217114] panic: kernel diagnostic assertion "mii_locked(mii)" failed: 
file "../../../../dev/mii/mii.c", line 381
[  63.2332664] cpu0: Begin traceback...
[  63.2332664] vpanic() at netbsd:vpanic+0x183
[  63.2418447] kern_assert() at netbsd:kern_assert+0x4b
[  63.2418447] mii_down() at netbsd:mii_down+0x9a
[  63.2521575] usbnet_if_stop() at netbsd:usbnet_if_stop+0x134
[  63.2521575] ether_ioctl_reinit() at netbsd:ether_ioctl_reinit+0x78
[  63.2623342] usbnet_if_ioctl() at netbsd:usbnet_if_ioctl+0x80
[  63.2717104] doifioctl() at netbsd:doifioctl+0x322
[  63.2717104] sys_ioctl() at netbsd:sys_ioctl+0x56d
[  63.2817903] syscall() at netbsd:syscall+0x196
[  63.2817903] --- syscall (number 54) ---
[  63.2817903] netbsd:syscall+0x196:
[  63.2930753] cpu0: End traceback...








> Initially I imagined it might be due to the ure* driver, but then it happened 
> locally.
> 
> On an amd64 system running 9.99.98 from 16-July-2022, I ran "ifconfig re1 
> down" and the machine locked - no ICMP, nothing for SIGINFO, no response to 
> keyboard cnmagic.
> 
> It doesn't appear to be hardware, but here's this just because.
> 
> [ 1.044097] re1 at pci2 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit 
> Ethernet (rev. 0x0c)
> [ 1.044097] re1: interrupting at msix2 vec 0
> [ 1.044097] re1: RTL8168G/8111G (0x4c00)
> [ 1.044097] re1: Ethernet address 4c:cc:6a:01:a5:e0
> [ 1.044097] re1: using 256 tx descriptors
> [ 1.044097] rgephy1 at re1 phy 7: RTL8251 1000BASE-T media interface, 
> rev. 0
> 
> I've ordered some PS/2 keyboards, because I take it that's the only way to 
> reliably get in to the kernel debugger on amd64, unless someone knows a trick 
> to make USB keyboards usable.
> 
> send-pr?
> 
> Thanks,
> John

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: panic in sysmon_envsys_unregister

2022-09-14 Thread Masanobu SAITOH


On 2022/09/14 19:01, Edgar Fuß wrote:
>> I need to build a new install image (since I have no discs).
> I applied your fix to -8 and the panic disappeared.
> Thanks for the quick fix. Maybe it should be pulled up?

I'll send pullup requests.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: panic in sysmon_envsys_unregister

2022-09-13 Thread Masanobu SAITOH
This should be fixed by mfii.c rev. 1.26. Please update it and retry.

On 2022/09/14 6:14, Edgar Fuß wrote:
> I get a panic on shutdown:
> 
> netbsd:sysmon_envsys_unregister+0x128:cmpq0(%rdx),%r12
> sysmon_envsys_unregister
> mfii_detach
> config_detach
> config_detach_all
> cpu_reboot
> kern_reboot
> sys_reboot
> syscall
> ds4da0
> es0
> fs1
> gsc632
> rdi   818f0510sme_global_mtx
> rsi   
> rbp   9008514e4da0
> rbx   90003d04c000
> rdx   0
> rcx   d9e26f07b700
> rax   0
> r80
> r90
> r10   0
> r11   0
> r12   d9e26d5b1c40
> r13   d9e26d7a5a00
> r14   1
> r15   81802c60mfii_ca
> rip   80a8c41esysmon_envsys_unregister+0x128
> cs8
> rflags10246
> rsp   9008514e4d90
> ss10
> 
> This is -current from around yesterday.
> I guess the problem is related to
>   mfii0: autoconfiguration error: unable to register with sysmon (rv = 86)
>   mfii0: autoconfiguration error: unable to create sensors
> So probably someone is trying to un-resgister something not registered.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mfii(4) and Dell PERC

2022-08-08 Thread Masanobu SAITOH
Hi.

On 2022/08/08 19:41, Edgar Fuß wrote:
> Thanks for your answers.
> 
>> Some people reported that kern/56669 (and perhaps kern/55192) still exist 
>> on some systems :-(
> Hm.
> 
>> bioctl mfi(i)X show
> Ah, thanks.
> What do I do in case a drive fails? Will adding a hot spare automagically 
> start a reconstruction?

It will do.
(I've never done "bioctl mfii0 hotspare add". I usually add a hotspare
via BIOS setup menu.)

>> If your system has other number, please let me know.
> I don't have such a system yet. I wanted to find out about NetBSD 
> compatibility 
> before buying one.

For the BBU monitoring, it's easy to add a new device support.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mfii(4) and Dell PERC

2022-08-08 Thread Masanobu SAITOH



On 2022/08/08 18:37, Edgar Fuß wrote:
> I'm unsure whether this is the right list, is port-amd64 more appropriate?
> 
> Does anyone use a Dell PERC H730P or similar RAID controller in RAID mode?

It seems PERC H730P's device ID is 0x005d.

https://pci-ids.ucw.cz/read/PC/1000/005d

It's the Invader series. I have Fujitsu PRAID EP400i. It's also Invader.
It works with one of my machines, but it doesn't guarantee that
your machine with PERC H730P works. Some people reported that kern/56669
(and perhaps kern/55192) still exist on some systems :-(


> mfii(4) says all configuration is done via the controller's BIOS.
> Does that mean I need to shut down in case a drive fails an I need to rebuild?
> 
> Can I monitor the RAID state?

bioctl mfi(i)X show

> Can I monitor the BBU Battery health?

Maybe. If your machine's battery_type is 1, 2, 5 or 6, you can do by envstat(8).

#define MFI_BBU_TYPE_IBBU   1
#define MFI_BBU_TYPE_BBU2
#define MFI_BBU_TYPE_IBBU09 5
#define MFI_BBU_TYPE_CVPM02 6

If your system has other number, please let me know.

> Thanks in advance.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mfii hanging on boot

2022-06-27 Thread Masanobu SAITOH


On 2022/06/23 18:33, Edgar Fuß wrote:
>> I committed the change yesterday.
> I don't get what the #if defined(__LP64__) && 0 is for.

It's just to make the same line as of bus_space_write_8().
I don't know why it's "&& 0"ed.

---
static void
mfii_start(struct mfii_softc *sc, struct mfii_ccb *ccb)
{
#if defined(__LP64__) && 0
u_long *r = (u_long *)>ccb_req;
#else
uint32_t *r = (uint32_t *)>ccb_req;
#endif

bus_dmamap_sync(sc->sc_dmat, MFII_DMA_MAP(sc->sc_requests),
ccb->ccb_request_offset, MFII_REQUEST_SIZE,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);

#if defined(__LP64__) && 0
bus_space_write_8(sc->sc_iot, sc->sc_ioh, MFI_IQPL, *r);
#else
mutex_enter(>sc_post_mtx);
bus_space_write_4(sc->sc_iot, sc->sc_ioh, MFI_IQPL, r[0]);
bus_space_barrier(sc->sc_iot, sc->sc_ioh,
MFI_IQPL, 8, BUS_SPACE_BARRIER_WRITE);

bus_space_write_4(sc->sc_iot, sc->sc_ioh, MFI_IQPH, r[1]);
bus_space_barrier(sc->sc_iot, sc->sc_ioh,
MFI_IQPH, 8, BUS_SPACE_BARRIER_WRITE);
mutex_exit(>sc_post_mtx);
#endif
}
---

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: wm workqueue and agr device

2022-06-22 Thread Masanobu SAITOH
Hi.

On 2022/06/02 19:08, BERTRAND Joël wrote:
>   Hello,
> 
>   In a recent mail (if_wm.c workqueue), I have seen that it is possible
> to activate workqueue on all ethernet devices contained in my server:
> 
> [ 1.002997] wm0 at pci2 dev 0 function 0: I350 Gigabit Network
> Connection (rev. 0x01)
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 0 affinity to 0
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 1 affinity to 1
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 2 affinity to 2
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 3 affinity to 3
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 4 affinity to 4
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 5 affinity to 5
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 6 affinity to 6
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 7 affinity to 7
> [ 1.002997] wm0: for LINK interrupting at msix0 vec 8
> [ 1.002997] wm0: PCI-Express bus
> [ 1.002997] wm0: 8192 words (16 address bits) SPI EEPROM, version
> 1.63, Image Unique ID 0ffd
> [ 1.002997] wm0: Ethernet address b4:96:91:92:77:6e
> [ 1.002997] wm0: Copper
> [ 1.002997] wm0:
> 0x40674040
> [ 1.002997] ihphy0 at wm0 phy 1: I350 10/100/1000 media interface,
> rev. 1
> [ 1.002997] wm1 at pci2 dev 0 function 1: I350 Gigabit Network
> Connection (rev. 0x01)
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 0 affinity to 0
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 1 affinity to 1
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 2 affinity to 2
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 3 affinity to 3
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 4 affinity to 4
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 5 affinity to 5
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 6 affinity to 6
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 7 affinity to 7
> [ 1.002997] wm1: for LINK interrupting at msix1 vec 8
> [ 1.002997] wm1: PCI-Express bus
> [ 1.002997] wm1: 8192 words (16 address bits) SPI EEPROM, version
> 1.63, Image Unique ID 0ffd
> [ 1.002997] wm1: Ethernet address b4:96:91:92:77:6f
> [ 1.002997] wm1: Copper
> [ 1.002997] wm1:
> 0x40474040
> [ 1.002997] ihphy1 at wm1 phy 1: I350 10/100/1000 media interface,
> rev. 1
> [ 1.002997] wm2 at pci0 dev 25 function 0: I218 V Ethernet
> Connection (rev. 0x00)
> [ 1.002997] wm2: interrupting at msi4 vec 0
> [ 1.002997] wm2: PCI-Express bus
> [ 1.002997] wm2: 2048 words FLASH, version 0.1.4
> [ 1.002997] wm2: Ethernet address 08:62:66:47:63:99
> [ 1.002997] wm2: 0x6a4080
> [ 1.002997] ihphy2 at wm2 phy 2: i217 10/100/1000 media interface,
> rev. 5
> [ 1.002997] wm3 at pci6 dev 0 function 0: Intel i82574L (rev. 0x00)
> [ 1.002997] wm3: for TX and RX interrupting at msix6 vec 0 affinity to 1
> [ 1.002997] wm3: for TX and RX interrupting at msix6 vec 1 affinity to 2
> [ 1.002997] wm3: for LINK interrupting at msix6 vec 2
> [ 1.002997] wm3: PCI-Express bus
> [ 1.002997] wm3: 2048 words FLASH, version 1.8.0, Image Unique ID
> 
> [ 1.002997] wm3: ASPM L0s and L1 are disabled to workaround the errata.
> [ 1.002997] wm3: Ethernet address 68:05:ca:02:b2:59
> [ 1.002997] wm3: 0x224080
> [ 1.002997] makphy0 at wm3 phy 1: Marvell 88E1149 Gigabit PHY, rev. 1
> [ 1.002997] wm4 at pci7 dev 0 function 0: Intel i82574L (rev. 0x00)
> [ 1.002997] wm4: for TX and RX interrupting at msix7 vec 0 affinity to 1
> [ 1.002997] wm4: for TX and RX interrupting at msix7 vec 1 affinity to 2
> [ 1.002997] wm4: for LINK interrupting at msix7 vec 2
> [ 1.002997] wm4: PCI-Express bus
> [ 1.002997] wm4: 2048 words FLASH, version 1.8.0, Image Unique ID
> 
> [ 1.002997] wm4: ASPM L0s and L1 are disabled to workaround the errata.
> [ 1.002997] wm4: Ethernet address 68:05:ca:44:cf:30
> [ 1.002997] wm4: 0x224080
> [ 1.002997] makphy1 at wm4 phy 1: Marvell 88E1149 Gigabit PHY, rev. 1
> 
> wm0 and wm1 are bridged (bridge0) and connected to two NAS.
> wm2 is connected to WAN
> wm3 and wm4 are aggregated (agr0) and connected to LAN.
> 
>   I have tried to set in /etc/sysctl.conf:
> 
> hw.wm0.txrx_workqueue=1
> hw.wm1.txrx_workqueue=1
> hw.wm2.txrx_workqueue=1
> hw.wm3.txrx_workqueue=1
> hw.wm4.txrx_workqueue=1
> 
> and system quickly hangs (no panic, no internal debugger, no serial
> console available, only a hard lock). Before lock, agr0 was very very slow.

agr(4) uses softint. Scheduling priority of softint is higher
than workqueue. It might cause the problem.

>   If I set:
> 
> hw.wm0.txrx_workqueue=1
> hw.wm1.txrx_workqueue=1
> hw.wm2.txrx_workqueue=1
> hw.wm3.txrx_workqueue=0
> hw.wm4.txrx_workqueue=0
> 
> system seems to run as expected. Thus 

Re: mfii hanging on boot

2022-06-22 Thread Masanobu SAITOH


On 2022/06/22 4:55, c...@sdf.org wrote:
> Hello Masa -

Hi, Stephen.

> I was able to apply your patch to the 9.2 tree and built a kernel and
> pxebooted it.  Unfortunately the status now looks like this:
> 
> mfii0 at pci5 dev 0 function 0mfii0: firmware fault
> 
> https://sdf.org/eu   (full boot sequence)
> 
> The machine is 8412km away in a data center and I don't have direct
> access to it so I cannot unplug the battery, move the board to another
> slot, et cetera.

I saw almost the same situation. To recover from the error, I had to power-down
the machine, unplug the battery, keep a few minutes, plug the battery and
power-up again.

I committed the change yesterday. I guess that it fixes kern/55192 and 
kern/56669.

 Thanks.


> I can, however, power cycle/reset the machine and access the MegaCLI to
> look at the two raids.  Both appear to be intact and the controller
> reports no errors, no bad disks or any other issues.
> 
> I'm hessitant to make any changes physically to the machine until I'm
> absolutely sure that this is a recovery situation.
> 
> Thank you for your help on this, I appreciate it!
> 
> Stephen
> 
>>  I have 9272-8i and it have been working well on netbsd-9 since several
>> weeks ago.
>>
>>> mfii0 at pci1 dev 0 function 0: "LSI MegaRAID SAS 9272-8i", firmware
>>> 23.34.0-0019, 512MB cache
>>> mfii0: interrupting at ioapic0 pin 16
>>
>> Could you all who have problem with mfii(4) test with the following diff?
>>
>> Index: mfii.c
>> ===
>> RCS file: /cvsroot/src/sys/dev/pci/mfii.c,v
>> retrieving revision 1.15
>> diff -u -p -r1.15 mfii.c
>> --- mfii.c   13 May 2022 10:44:38 -  1.15
>> +++ mfii.c   21 Jun 2022 07:46:51 -
>> @@ -1840,7 +1840,11 @@ mfii_load_mfa(struct mfii_softc *sc, str
>>  static void
>>  mfii_start(struct mfii_softc *sc, struct mfii_ccb *ccb)
>>  {
>> +#if defined(__LP64__) && 0
>>  u_long *r = (u_long *)>ccb_req;
>> +#else
>> +uint32_t *r = (uint32_t *)>ccb_req;
>> +#endif
>>
>>  bus_dmamap_sync(sc->sc_dmat, MFII_DMA_MAP(sc->sc_requests),
>>  ccb->ccb_request_offset, MFII_REQUEST_SIZE,
>>
>>
>>
>>
>> --
>> ---
>> SAITOH Masanobu (msai...@execsw.org
>>  msai...@netbsd.org)
>>
> 
> 

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mfii hanging on boot

2022-06-21 Thread Masanobu SAITOH
 I have 9272-8i and it have been working well on netbsd-9 since several
weeks ago.

> mfii0 at pci1 dev 0 function 0: "LSI MegaRAID SAS 9272-8i", firmware 
> 23.34.0-0019, 512MB cache
> mfii0: interrupting at ioapic0 pin 16

Could you all who have problem with mfii(4) test with the following diff?

Index: mfii.c
===
RCS file: /cvsroot/src/sys/dev/pci/mfii.c,v
retrieving revision 1.15
diff -u -p -r1.15 mfii.c
--- mfii.c  13 May 2022 10:44:38 -  1.15
+++ mfii.c  21 Jun 2022 07:46:51 -
@@ -1840,7 +1840,11 @@ mfii_load_mfa(struct mfii_softc *sc, str
 static void
 mfii_start(struct mfii_softc *sc, struct mfii_ccb *ccb)
 {
+#if defined(__LP64__) && 0
u_long *r = (u_long *)>ccb_req;
+#else
+   uint32_t *r = (uint32_t *)>ccb_req;
+#endif

bus_dmamap_sync(sc->sc_dmat, MFII_DMA_MAP(sc->sc_requests),
ccb->ccb_request_offset, MFII_REQUEST_SIZE,




-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: TSC improvement

2020-06-10 Thread Masanobu SAITOH
Hi.

On 2020/06/10 4:37, Joerg Sonnenberger wrote:
> On Tue, Jun 09, 2020 at 05:16:27PM +, Taylor R Campbell wrote:
>> It's great to see improvements to our calibration of the TSC (and I
>> tend to agree that cpu_counter should be serializing, so that, e.g.,
>> cpu_counter(); ...; cpu_counter() reliably measures time taken in the
>> ellipsis).
> 
> I'm pretty sure we want to have both variants.
> 
> Joerg

You mean we should have both cpu_counter() and cpu_counter_serializing()
(or cpu_counter() and cpu_counter_unserializing())? If so, why?

I have no strong opinion about this. When I wrote that change, I added
cpu_counter_serializing(). We already have tsc_get_timecount(), cpu_counter()
and cpu_counter32() as MI API. In addition, x86 also has rdtsc(and will
have some serializing functions). There are so many similar functions,
so I thought it would be good to avoid code duplication and to be simple.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: [PATCH] Re: EFI memory map

2020-01-06 Thread Masanobu SAITOH
On 2020/01/06 17:06, Emmanuel Dreyfus wrote:
> On Thu, Jan 02, 2020 at 03:55:31PM +, Emmanuel Dreyfus wrote:
>> And indeed, studying the crash in ddb shows it happens when
>> accessing a physical address that is excluded by x86_fake_clusters() 
>> but included by EFI memory map.
> 
> I think the problem lies in sys/arch/x86/x86/x86_machdep.c:init_x86_vm()
> The function loops on the memory segments provided by EFI/BIOS and 
> call x86_load_region() on the ones that can be used.
> 
> The function considers the case of a segment that spans the
> addresses used by the kernel and splits the segments in that case,
> but it does not consider the cases where the segment is included in
> kernel, or if there is a leading or trailing part that overlaps the
> kernel. 
> 
> The bug happens because a segment overlapping the kernel is used
> by x86_load_region(). 
> 
> Attached patch adds code for that three extra cases, and it makes
> the bug vanish. Is it a reasonable fix? Or did I miss something?
> 

The output:
[   1.000] MEMMAP: UEFI MEMORY MAP (17 ENTRIES):
[   1.000] MEMMAP: 0x-0x0003efff
[   1.000]  size=0x0003f000, type=1(Memory)
[   1.000] MEMMAP: 0x0003f000-0x0003
[   1.000]  size=0x1000, type=2(Reserved)
[   1.000] MEMMAP: 0x0004-0x0009
[   1.000]  size=0x0006, type=1(Memory)
[   1.000] MEMMAP: 0x000a-0x000f
[   1.000]  size=0x0006, type=2(Reserved)
[   1.000] MEMMAP: 0x0010-0x7769efff
[   1.000]  size=0x7759f000, type=1(Memory)
[   1.000] MEMMAP: 0x7769f000-0x776aefff
[   1.000]  size=0x0001, type=2(Reserved)
[   1.000] MEMMAP: 0x776af000-0x7d619fff
[   1.000]  size=0x05f6b000, type=1(Memory)
[   1.000] MEMMAP: 0x7d61a000-0x7e0f6fff
[   1.000]  size=0x00add000, type=2(Reserved)
[   1.000] MEMMAP: 0x7e0f7000-0x7e12dfff
[   1.000]  size=0x00037000, type=1(Memory)
[   1.000] MEMMAP: 0x7e12e000-0x7e1e2fff
[   1.000]  size=0x000b5000, type=4(NVS)
[   1.000] MEMMAP: 0x7e1e3000-0x7f23
[   1.000]  size=0x0105d000, type=2(Reserved)
[   1.000] MEMMAP: 0x7f24-0x7f7f
[   1.000]  size=0x005c, type=1(Memory)
[   1.000] MEMMAP: 0x7f80-0x7fff
[   1.000]  size=0x0080, type=2(Reserved)
[   1.000] MEMMAP: 0xe000-0xefff
[   1.000]  size=0x1000, type=2(Reserved)
[   1.000] MEMMAP: 0xfd00-0xfe7f
[   1.000]  size=0x0180, type=2(Reserved)
[   1.000] MEMMAP: 0xff00-0x
[   1.000]  size=0x0100, type=2(Reserved)
[   1.000] MEMMAP: 0x0001-0x00027fff
[   1.000]  size=0x00018000, type=1(Memory)
[   1.000] segment 0 - 3f000
[   1.000] loading freelist 5 0x8000-0x3f000 (0x8-0x3f)
[   1.000] segment 4 - a
[   1.000] loading freelist 5 0x4-0xa (0x40-0xa0)
[   1.000] segment 10 - 7769f000
[   1.000] split kernel overlapping to 10 - 20 and 1d4e000 - 
7769f000
[   1.000] loading freelist 5 0x10-0x20 (0x100-0x200)
[   1.000] loading freelist 4 0x1d4e000-0x4000 (0x1d4e-0x4)
[   1.000] loading freelist 3 0x4000-0x7769f000 (0x4-0x7769f)
[   1.000] segment 776af000 - 7d61a000
[   1.000] loading freelist 3 0x776af000-0x7d61a000 (0x776af-0x7d61a)
[   1.000] segment 7e0f7000 - 7e12e000
[   1.000] loading freelist 3 0x7e0f7000-0x7e12e000 (0x7e0f7-0x7e12e)
[   1.000] segment 7f24 - 7f80
[   1.000] loading freelist 3 0x7f24-0x7f80 (0x7f240-0x7f800)
[   1.000] segment 1 - 28000
[   1.000] loading default 0x1-0x28000 (0x10-0x28)
[   1.000] pool redzone disabled for 'pdppl'
[   1.000] pool redzone disabled for 'kmem-4096'
[   1.000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 
2004, 2005,
[   1.000] 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 
2016, 2017,
[   1.000] 2018, 2019, 2020 The NetBSD Foundation, Inc.  All rights 
reserved.
[   1.000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.000] The Regents of the University of California.  All rights 
reserved.

[   1.000] NetBSD 9.99.34 (GENERIC) #105: Tue Jan  7 15:21:44 JST 2020
[   1.000]  
msai...@foo.execsw.org:/disk/sources/NetBSD-current/src/sys/arch/amd64/compile/GENERIC
[   1.000] total memory = 8155 MB
[   1.000] avail memory = 7890 MB
[   1.000] pool redzone disabled for 'buf4k'
[   1.000] pool redzone disabled for 'buf64k'
[   1.000] cpu_rng: RDSEED
[   1.000] 

Re: To get net ioctl number

2019-02-28 Thread Masanobu SAITOH

On 2019/02/28 16:54, matthew green wrote:

Iain Hibbert writes:

On Thu, 28 Feb 2019, Masanobu SAITOH wrote:


  I'd like to get new number for new ioctl. How should I find unused number
for it? I'm going to add new SIOCXXX. It may not enough to grep sys/net/*.h,
so I made usr.bin/kdump-ioctl.c and did

grep \'i kdump-ioctl.c | sort -n -k 5,5 | uniq | column -t

I think it might not enough because kdump-ioctl.c has no compat-related
code.

  What should I do?


In most cases, an ioctl value is only valid when performed on a handle
leading to a specific subsystem, meaning that it does not really need to
be globally distinct.


Yes, you're correct.


There is a list of the letters used in ioctl(9)

true that that don't _need_ to be globally unique, but it sure
is nice for kdump(1).


Yes, it's also correct. It's better to unique than overlap.


.mrg.


 Currently, kdump-ioctl.c doesn't include compat ioctl. If I mistakenly
choose new value which is used in the compat code, it would cause trouble.
If we can include compat stuff into kdump, it would be good for user
to know about the old syscall's behavior and it also good to check
system-wide ioctl number usage...

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


To get net ioctl number

2019-02-27 Thread Masanobu SAITOH

 Hi.

 This might be FAQ...

 I'd like to get new number for new ioctl. How should I find unused number
for it? I'm going to add new SIOCXXX. It may not enough to grep sys/net/*.h,
so I made usr.bin/kdump-ioctl.c and did

grep \'i kdump-ioctl.c | sort -n -k 5,5 | uniq | column -t

I think it might not enough because kdump-ioctl.c has no compat-related
code.

 What should I do?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: kernel frameworks for jumbo frame

2019-01-31 Thread Masanobu SAITOH

Hi, all.

 I'm sorry for being little late.

On 2019/01/31 14:21, Rin Okuyama wrote:

On 2019/01/30 23:47, Jason Thorpe wrote:

On Jan 30, 2019, at 3:29 PM, Robert Elz  wrote:

| Yes, I think something like:
  |
  | size_t mbuf_cluster_size_for_size(size_t desired_size);

whatever the name, is unlikely to be needed, that would be
used in the "must specify a supportyed size" variant of the
former, rather than the "pick a supported size that will handle
the request" version that you thought was better (as do I.)


I suppose you're right ... I was thinking that there might be some utility to 
exposing the information, in the event that it could be useful for some 
drivers.  I could certainly be left out for now, and exposed at a later time if 
there was a concrete use for it.


OK. I will write a draft, possibly in this weekend.

On 2019/01/30 22:29, Robert Elz wrote:

 Date:    Wed, 30 Jan 2019 13:07:02 +0200
 From:    Jason Thorpe 
 Message-ID:  
...

   | The idea would be that instead of adding additional property fields
   | to struct ifnet, you could add either a prop_dictionary_t or nvlist

Yes, we know that you like property lists ...

They have the advantage that the structs don't need to
keep changing nearly as often when new data is needed,
with the consequential kernel version bumps (and so, it
becomes possible to pull up any changes which require a
new property with new date) but the disadantage that it
is easy for every developer, of every driver (or whatever),
their dog, and each of the dog's fleas, to all add new
properties, ignoring what was done by each of the others,
fail to document them properly (UTSL!) and we end up with
a completely incomprehensible mess, which no-one can
use or understand.


 I'm also one of the man who don't want to use property carelessly.
The reason is almost the same as kre's.

 A few months ago, I surveyed the diversity of "struct ifreq",
"struct if_media", if_flags, and ifcaps among {Net,Open,Free,DragonFly}BSD
and MacOS(xnu). It was caos. One of big problem and the reason of the
caos is that almost all OSes don't do correct layering.

 I thinks an interface's MTU and the Layer 2's frame size should be
distinct.

 For Ethernet, the frame format is shared with some other protocols,
so such type of structures should be shared (e.g. our ec_cap*).
(XXX Last month, I added ETHERCAP_EEE into ec_cap{abilities,enable}.
I think it's not good because 802.11 doesn't have it. If we have
a good framework under the layer, it would be good to move EEE stuff
into it, but I think moving EEE stuff into if_media is not good.)

 if_media is used for more than 20 years. In this two decades, a lot
of media and a lot of optional features have been added. It's too
difficult to add optional features depend on a media, so I think
if_media should be simple and move some feature options to some other
location (new data structure).

 If someone design a good frame work I'll help to implement it.

 Thanks.


Well, I also feel it little bit too much... Anyway, I will write a
patch to see what it looks like.

On 2019/01/30 21:59, Christos Zoulas wrote:

In article <20190130110317.gc28...@mail.duskware.de>,
Martin Husemann   wrote:

On Wed, Jan 30, 2019 at 07:59:13PM +0900, Rin Okuyama wrote:

I think we agree to add API something like

On 2019/01/29 17:32, Maxime Villard wrote:

  int
  m_addjcl(struct mbuf *m, int how, int size)


Just a minor nit: avoid "JCL" in any names - it causes great pain for some
readers.


It is the function that adds support for the Job Control Language to mbufs.


Oops...

Thanks,
rin



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


ppb(4) related changes

2019-01-28 Thread Masanobu SAITOH

 Hi.

 I'm now working to modify PCI bridge stuff. I'd like to do the following
two modifications:

 0) ppbreg.h definitions

All of PCI configuration space's register definitions are in pcireg.h
The ppbreg.h file also defines the same config registers. I think it's
good to remove ppbreg.hs' definitions an use pcireg.h's definitions.
Is it OK?

 1) Mis-configuration of prefetchable memory in pciconf.c::configure_bridge()
(only for PCI_NETBSD_CONFIGURE)

In configure_bridge():


mem = pci_conf_read(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHMEM_REG);


Read register


#if ULONG_MAX > 0x
if (!PCI_BRIDGE_PREFETCHMEM_64BITS(mem) && mem_limit > 0xULL) {


Test if it's marked as 64bit (the maring is in lower 4bit)


printf("Bus %d bridge does not support 64-bit PMEM.  ",
pb->busno);
printf("Disabling prefetchable-MEM accesses\n");
mem_base  = 0x10;   /* 1M */
mem_limit = 0x00;
}
#endif
mem = (((mem_base >> 20) & PCI_BRIDGE_PREFETCHMEM_BASE_MASK)
<< PCI_BRIDGE_PREFETCHMEM_BASE_SHIFT);
mem |= (((mem_limit >> 20) & PCI_BRIDGE_PREFETCHMEM_LIMIT_MASK)
<< PCI_BRIDGE_PREFETCHMEM_LIMIT_SHIFT);
pci_conf_write(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHMEM_REG, mem);


Regenerate mem value from scratch _WITHOUT_SETTING_LOWER_4BITS_
(It's OK because the lower 4bit is read only).


/*
 * XXX -- 64-bit systems need a lot more than just this...
 */
if (PCI_BRIDGE_PREFETCHMEM_64BITS(mem)) {


Test if it's marked as 64bit or not. It must be wrong because "mem"
is not the original value, newly generated and low 4bits are 0.
It should always be false.


mem_base  = (uint64_t) mem_base  >> 32;
mem_limit = (uint64_t) mem_limit >> 32;
pci_conf_write(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHBASE32_REG,
mem_base & 0x);
pci_conf_write(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHLIMIT32_REG,
mem_limit & 0x);
}


So, proposed patch:
--
Index: pciconf.c
===
RCS file: /cvsroot/src/sys/dev/pci/pciconf.c,v
retrieving revision 1.37
diff -u -p -r1.37 pciconf.c
--- pciconf.c   5 Sep 2014 05:29:16 -   1.37
+++ pciconf.c   28 Jan 2019 10:30:13 -
@@ -855,6 +855,7 @@ configure_bridge(pciconf_dev_t *pd)
pciconf_bus_t   *pb;
pcireg_tio, iohigh, mem, cmd;
int rv;
+   boolisprefetchmem64;
 
 	pb = pd->ppb;

/* Configure I/O base & limit*/
@@ -919,8 +920,9 @@ configure_bridge(pciconf_dev_t *pd)
mem_limit = 0x00;
}
mem = pci_conf_read(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHMEM_REG);
+   isprefetchmem64 = PCI_BRIDGE_PREFETCHMEM_64BITS(mem);
 #if ULONG_MAX > 0x
-   if (!PCI_BRIDGE_PREFETCHMEM_64BITS(mem) && mem_limit > 0xULL) {
+   if (!isprefetchmem64 && mem_limit > 0xULL) {
printf("Bus %d bridge does not support 64-bit PMEM.  ",
pb->busno);
printf("Disabling prefetchable-MEM accesses\n");
@@ -936,7 +938,7 @@ configure_bridge(pciconf_dev_t *pd)
/*
 * XXX -- 64-bit systems need a lot more than just this...
 */
-   if (PCI_BRIDGE_PREFETCHMEM_64BITS(mem)) {
+   if (isprefetchmem64) {
mem_base  = (uint64_t) mem_base  >> 32;
mem_limit = (uint64_t) mem_limit >> 32;
pci_conf_write(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHBASE32_REG,
--

I can't test this diff because I have no any machine which can test
PCI_NETBSD_CONFIGURE now. Is it OK to commit without test?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Tidy up / simplify "com" register initialization

2018-12-05 Thread Masanobu SAITOH

On 2018/12/04 13:53, Jason Thorpe wrote:

I'm working on an application where I need some additional run-time flexibility


PLEASE!
sys/dev/ic/com.c has some compile time #ifdef 's and it cause bad
effect to usb serial and/or puc com devices.



in how the "com" driver register map is setup.  COM_REGMAP cam do the job, but 
there is some untidiness around how com_regs is initialized.  This patch is a small step 
in cleaning that up, with some other minor changes to follow later.

This has been compiled-tested everywhere, and I will spot-check on a few 
different devices that I have handy.  Most of the changes are purely mechanical.

I'll commit in a couple of days unless there are serious objections.

Thx.

https://www.dropbox.com/s/xl4y3jv2u68bnr2/com_regs-patch.txt

-- thorpej




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-04 Thread Masanobu SAITOH

On 2018/12/05 4:53, Jaromír Doleček wrote:

I've now disabled MSI-X for ahcisata(4), need to figure what needs to
be setup there.

Le lun. 3 déc. 2018 à 22:41, Jared McNeill  a écrit :

IIUC we don't actually have confirmation that ThunderX AHCI works yet..
Nick is having issues with his board.


Okay - I misunderstood, thought it was confirmed working.

The BAR used for Cavium is slightly suspicious. AHCI spec mentions
(v1.3.1 section 2.4 Serial ATA Capability) 0x10 BAR as one of 'legacy'
access methods. It specifically only talks about 0x24 as the proper
BAR for AHCI, it doesn't mention option to have it elsewhere.

Jaromir


 I suspect Serial ATA AHCI 1.2.1 specification page 111 has the hint.

 Figure 23: Port/CCC and MSI Message Mapping, Example 1
 Figure 24: Port and MSI Message Mapping, Example 2

I suspect MSI-X also assume this layout. pci_msix_alloc_map() might
be used.

 My SuperMicro A2SDi-H-TP4F's disk is connected to not channel 0
but channel 6. I also verified that GHC.MRSM is 0.

ahcisata1 at pci0 dev 20 function 0: vendor 8086 product 19c2 (rev. 0x11)
ahcisata1: interrupting at msix2 vec 0
ahcisata1: 64-bit DMA
ahcisata1: AHCI revision 1.31, 4 ports, 32 slots, CAP 
0xc3349f43
X GHC = 8002
atabus8 at ahcisata1 channel 3
atabus9 at ahcisata1 channel 4
atabus10 at ahcisata1 channel 5
atabus11 at ahcisata1 channel 6



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-04 Thread Masanobu SAITOH

On 2018/12/04 16:44, Masanobu SAITOH wrote:

On 2018/12/04 4:55, Jaromír Doleček wrote:

Le lun. 3 déc. 2018 à 12:09, Masanobu SAITOH  a écrit :

C3000's AHCI has multi-vector MSI-X table and it doesn't work since
Nobember 20th...


Can you try if by chance this code adapted nvme(4) changes anything on
your system?

http://www.netbsd.org/~jdolecek/ahcisata_msixoff.diff


It panics:

[   1.0427481] panic: extent_alloc_region: bad size
[   1.0427481] cpu0: Begin traceback...
[   1.0427481] vpanic() at netbsd:vpanic+0x16f
[   1.0427481] snprintf() at netbsd:snprintf
[   1.0427481] extent_alloc_region() at netbsd:extent_alloc_region+0x29e
[   1.0427481] bus_space_reserve() at netbsd:bus_space_reserve+0x95
[   1.0427481] bus_space_map() at netbsd:bus_space_map+0x85
[   1.0427481] ahci_pci_attach() at netbsd:ahci_pci_attach+0xde
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] config_found_sm_loc() at netbsd:config_found_sm_loc+0x48
[   1.0427481] pci_probe_device() at netbsd:pci_probe_device+0x582
[   1.0427481] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x197
[   1.0427481] pcirescan() at netbsd:pcirescan+0x47
[   1.0427481] pciattach() at netbsd:pciattach+0x193
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] config_found_sm_loc() at netbsd:config_found_sm_loc+0x48
[   1.0427481] mp_pci_scan() at netbsd:mp_pci_scan+0xa3
[   1.0427481] mainbus_attach() at netbsd:mainbus_attach+0x332
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] cpu_configure() at netbsd:cpu_configure+0x2b
[   1.0427481] main() at netbsd:main+0x331
[   1.0427481] cpu0: End traceback...
[   1.0427481] fatal breakpoint trap in supervisor mode
[   1.0427481] trap type 1 code 0 rip 0x8021de45 cs 0x8 rflags 0x202 
cr2 0 ilevel 0x8 rsp 0x81d027f0
[   1.0427481] curlwp 0x81858900 pid 0.1 lowest kstack 
0x81cfe2c0
Stopped in pid 0.1 (system) at  netbsd:breakpoint+0x5:  leave


pcictl dump shows:

PCI configuration registers:
  Common header:
    0x00: 0x19c28086 0x02b00147 0x01060110 0x

    Vendor Name: Intel (0x8086)
    Device Name: C3000 SATA Controller 1 (0x19c2)
    Command register: 0x0147
  I/O space accesses: on
  Memory space accesses: on
  Bus mastering: on
  Special cycles: off
  MWI transactions: off
  Palette snooping: off
  Parity error checking: on
  Address/data stepping: off
  System error (SERR): on
  Fast back-to-back transactions: off
  Interrupt disable: off
    Status register: 0x02b0
  Immediate Readiness: off
  Interrupt status: inactive
  Capability List support: on
  66 MHz capable: on
  User Definable Features (UDF) support: off
  Fast back-to-back capable: on
  Data parity error detected: off
  DEVSEL timing: medium (0x1)
  Slave signaled Target Abort: off
  Master received Target Abort: off
  Master received Master Abort: off
  Asserted System Error (SERR): off
  Parity error detected: off
    Class Name: mass storage (0x01)
    Subclass Name: SATA (0x06)
    Interface Name: AHCI 1.0 (0x01)
    Revision ID: 0x10
    BIST: 0x00
    Header Type: 0x00 (0x00)
    Latency Timer: 0x00
    Cache Line Size: 0bytes (0x00)

  Type 0 ("normal" device) header:
    0x10: 0xdffb4000 0xdffbd000 0xe071 0xe061
    0x20: 0xe021 0xdffbc000 0x 0x72708086
    0x30: 0x 0x0080 0x 0x010e

    Base address register at 0x10
  type: 32-bit nonprefetchable memory
  base: 0xdffb4000
    Base address register at 0x14
  type: 32-bit nonprefetchable memory
  base: 0xdffbd000
    Base address register at 0x18
  type: I/O
  base: 0xe070
    Base address register at 0x1c
  type: I/O
  base: 0xe060
    Base address register at 0x20
  type: I/O
  base: 0xe020
    Base address register at 0x24
  type: 32-bit nonprefetchable memory
  base: 0xdffbc000
    Cardbus CIS Pointer: 0x
    Subsystem vendor ID: 0x8086
    Subsystem ID: 0x7270
    Expansion ROM Base Address Register: 0x
  base: 0x
  Expansion ROM Enable: off
  Validation Status: Validation not supported
  Validation Details: 0x0
    Capability list pointer: 0x80
    Reserved @ 0x38: 0x
    Maximum Latency: 0x00
    Minimum Grant: 0x00
    Interrupt pin: 0x01 (pin A)
    Interrupt line: 0x0e

  Capability register at 0x80
    type: 0x05 (MSI)
  Capability register at 0x70
    type: 0x01 (Power Management)
  Capability register at 0xa8
    type: 0x12 (SATA)
  Capability register at 0xd0
    type: 0x11 (MSI-X)

  PCI Power Management Capabilities Register
    Capabilities register: 0x4003
  Version: 1.2
  PME# clock: off
  Device specific initialization: off
  3.3V auxiliary current: self-powered
  D1 power management state support: off
  D2 power management state support: off

Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-03 Thread Masanobu SAITOH

On 2018/12/04 4:55, Jaromír Doleček wrote:

Le lun. 3 déc. 2018 à 12:09, Masanobu SAITOH  a écrit :

C3000's AHCI has multi-vector MSI-X table and it doesn't work since
Nobember 20th...


Can you try if by chance this code adapted nvme(4) changes anything on
your system?

http://www.netbsd.org/~jdolecek/ahcisata_msixoff.diff


It panics:

[   1.0427481] panic: extent_alloc_region: bad size
[   1.0427481] cpu0: Begin traceback...
[   1.0427481] vpanic() at netbsd:vpanic+0x16f
[   1.0427481] snprintf() at netbsd:snprintf
[   1.0427481] extent_alloc_region() at netbsd:extent_alloc_region+0x29e
[   1.0427481] bus_space_reserve() at netbsd:bus_space_reserve+0x95
[   1.0427481] bus_space_map() at netbsd:bus_space_map+0x85
[   1.0427481] ahci_pci_attach() at netbsd:ahci_pci_attach+0xde
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] config_found_sm_loc() at netbsd:config_found_sm_loc+0x48
[   1.0427481] pci_probe_device() at netbsd:pci_probe_device+0x582
[   1.0427481] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x197
[   1.0427481] pcirescan() at netbsd:pcirescan+0x47
[   1.0427481] pciattach() at netbsd:pciattach+0x193
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] config_found_sm_loc() at netbsd:config_found_sm_loc+0x48
[   1.0427481] mp_pci_scan() at netbsd:mp_pci_scan+0xa3
[   1.0427481] mainbus_attach() at netbsd:mainbus_attach+0x332
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] cpu_configure() at netbsd:cpu_configure+0x2b
[   1.0427481] main() at netbsd:main+0x331
[   1.0427481] cpu0: End traceback...
[   1.0427481] fatal breakpoint trap in supervisor mode
[   1.0427481] trap type 1 code 0 rip 0x8021de45 cs 0x8 rflags 0x202 
cr2 0 ilevel 0x8 rsp 0x81d027f0
[   1.0427481] curlwp 0x81858900 pid 0.1 lowest kstack 
0x81cfe2c0
Stopped in pid 0.1 (system) at  netbsd:breakpoint+0x5:  leave


pcictl dump shows:

PCI configuration registers:
  Common header:
0x00: 0x19c28086 0x02b00147 0x01060110 0x

Vendor Name: Intel (0x8086)
Device Name: C3000 SATA Controller 1 (0x19c2)
Command register: 0x0147
  I/O space accesses: on
  Memory space accesses: on
  Bus mastering: on
  Special cycles: off
  MWI transactions: off
  Palette snooping: off
  Parity error checking: on
  Address/data stepping: off
  System error (SERR): on
  Fast back-to-back transactions: off
  Interrupt disable: off
Status register: 0x02b0
  Immediate Readiness: off
  Interrupt status: inactive
  Capability List support: on
  66 MHz capable: on
  User Definable Features (UDF) support: off
  Fast back-to-back capable: on
  Data parity error detected: off
  DEVSEL timing: medium (0x1)
  Slave signaled Target Abort: off
  Master received Target Abort: off
  Master received Master Abort: off
  Asserted System Error (SERR): off
  Parity error detected: off
Class Name: mass storage (0x01)
Subclass Name: SATA (0x06)
Interface Name: AHCI 1.0 (0x01)
Revision ID: 0x10
BIST: 0x00
Header Type: 0x00 (0x00)
Latency Timer: 0x00
Cache Line Size: 0bytes (0x00)

  Type 0 ("normal" device) header:
0x10: 0xdffb4000 0xdffbd000 0xe071 0xe061
0x20: 0xe021 0xdffbc000 0x 0x72708086
0x30: 0x 0x0080 0x 0x010e

Base address register at 0x10
  type: 32-bit nonprefetchable memory
  base: 0xdffb4000
Base address register at 0x14
  type: 32-bit nonprefetchable memory
  base: 0xdffbd000
Base address register at 0x18
  type: I/O
  base: 0xe070
Base address register at 0x1c
  type: I/O
  base: 0xe060
Base address register at 0x20
  type: I/O
  base: 0xe020
Base address register at 0x24
  type: 32-bit nonprefetchable memory
  base: 0xdffbc000
Cardbus CIS Pointer: 0x
Subsystem vendor ID: 0x8086
Subsystem ID: 0x7270
Expansion ROM Base Address Register: 0x
  base: 0x
  Expansion ROM Enable: off
  Validation Status: Validation not supported
  Validation Details: 0x0
Capability list pointer: 0x80
Reserved @ 0x38: 0x
Maximum Latency: 0x00
Minimum Grant: 0x00
Interrupt pin: 0x01 (pin A)
Interrupt line: 0x0e

  Capability register at 0x80
type: 0x05 (MSI)
  Capability register at 0x70
type: 0x01 (Power Management)
  Capability register at 0xa8
type: 0x12 (SATA)
  Capability register at 0xd0
type: 0x11 (MSI-X)

  PCI Power Management Capabilities Register
Capabilities register: 0x4003
  Version: 1.2
  PME# clock: off
  Device specific initialization: off
  3.3V auxiliary current: self-powered
  D1 power management state support: off
  D2 power management state support: off
  PME# support D0: off
  PME# suppo

Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-03 Thread Masanobu SAITOH

On 2018/12/01 5:27, Jared McNeill wrote:

On Fri, 30 Nov 2018, Martin Husemann wrote:


On Fri, Nov 30, 2018 at 02:22:37PM -0400, Jared McNeill wrote:

The driver can call pci_intr_type on the handle returned by pci_intr_alloc
to see what kind of interrupt (INTx/MSI/MSI-X) was negotiated.


Yeah, but the fear was that some hardware could either do a single INT
or (lots of) MSI{X}, and now you end up with (lots of) INT returned
from pci_intr_alloc() and can just use the first.

Does this not happen for real world hardware, or is the driver supposed to
check with pci_*_count() upfront?


I don't think pci_intr_alloc will ever return more than one INTx..


C3000's AHCI has multi-vector MSI-X table and it doesn't work since
Nobember 20th...



[   1.0426355] ahcisata0 at pci0 dev 19 function 0: vendor 8086 product 19b2 
(rev. 0x10)
[   1.0426355] allocated pic msix5 type edge pin 0 level 6 to cpu0 slot 21 idt 
entry 110
[   1.0426355] ahcisata0: interrupting at msix5 vec 0
[   1.0426355] ahcisata0: 64-bit DMA
[   1.0426355] ahcisata0: AHCI revision 1.31, 1 port, 32 slots, CAP 
0xc3369f40
[   1.0426355] atabus0 at ahcisata0 channel 3
[   1.0426355] ahcisata1 at pci0 dev 20 function 0: vendor 8086 product 19c2 
(rev. 0x10)
[   1.0426355] allocated pic msix6 type edge pin 0 level 6 to cpu0 slot 22 idt 
entry 111
[   1.0426355] ahcisata1: interrupting at msix6 vec 0
[   1.0426355] ahcisata1: 64-bit DMA
[   1.0426355] ahcisata1: AHCI revision 1.31, 2 ports, 32 slots, CAP 
0xc3369f41
[   1.0426355] atabus1 at ahcisata1 channel 4
[   1.0426355] atabus2 at ahcisata1 channel 5
(snip)
[   3.4536958] wd0 at atabus2 drive 0
[   6.4561682] wd0: IDENTIFY failed
[   6.4561682] wd0: fixing 0 sector size
[   6.4561682] wd0: secperunit and ncylinders are zero
[   9.4686496] wd0(ahcisata1:5:0): using PIO mode 0
(snip)
[   9.6688146] boot device: 



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


DVF_ATTACH_INPROGRESS

2018-11-29 Thread Masanobu SAITOH

 Hi.

 While I've been working for suspend/resume, I've noticed that some devices
are reported "WARNING: power management not supported" in config_attach_loc()
even though the driver has pmf_device_register() in it. The reason is that
pmf_device_register is deferred by config_interrupts(). It's not a real bug,
but the above message is important for suspend/resume work. I wrote the
following patch:

Index: sys/sys/device.h
===
RCS file: /cvsroot/src/sys/sys/device.h,v
retrieving revision 1.156
diff -u -p -r1.156 device.h
--- sys/sys/device.h18 Sep 2018 01:25:09 -  1.156
+++ sys/sys/device.h29 Nov 2018 08:33:29 -
@@ -202,6 +202,7 @@ struct device {
 #defineDVF_CLASS_SUSPENDED 0x0008  /* device class suspend was 
called */
 #defineDVF_DRIVER_SUSPENDED0x0010  /* device driver suspend was 
called */
 #defineDVF_BUS_SUSPENDED   0x0020  /* device bus suspend was 
called */
+#defineDVF_ATTACH_INPROGRESS   0x0040  /* device attach is in progress 
*/
 #defineDVF_DETACH_SHUTDOWN 0x0080  /* device detaches safely at 
shutdown */
 
 #ifdef _KERNEL

Index: sys/kern/subr_autoconf.c
===
RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
retrieving revision 1.263
diff -u -p -r1.263 subr_autoconf.c
--- sys/kern/subr_autoconf.c18 Sep 2018 01:25:09 -  1.263
+++ sys/kern/subr_autoconf.c29 Nov 2018 08:33:30 -
@@ -446,6 +446,10 @@ config_interrupts_thread(void *cookie)
while ((dc = TAILQ_FIRST(_config_queue)) != NULL) {
TAILQ_REMOVE(_config_queue, dc, dc_queue);
(*dc->dc_func)(dc->dc_dev);
+   dc->dc_dev->dv_flags &= ~DVF_ATTACH_INPROGRESS;
+   if (!device_pmf_is_registered(dc->dc_dev))
+   aprint_debug_dev(dc->dc_dev,
+   "WARNING: power management not supported\n");
config_pending_decr(dc->dc_dev);
kmem_free(dc, sizeof(*dc));
}
@@ -1597,9 +1601,10 @@ config_attach_loc(device_t parent, cfdat
 
 	(*dev->dv_cfattach->ca_attach)(parent, dev, aux);
 
-	if (!device_pmf_is_registered(dev))

-   aprint_debug_dev(dev, "WARNING: power management not "
-   "supported\n");
+   if (((dev->dv_flags & DVF_ATTACH_INPROGRESS) == 0)
+   && !device_pmf_is_registered(dev))
+   aprint_debug_dev(dev,
+   "WARNING: power management not supported\n");
 
 	config_process_deferred(_config_queue, dev);
 
@@ -1996,6 +2001,7 @@ config_interrupts(device_t dev, void (*f

dc->dc_func = func;
TAILQ_INSERT_TAIL(_config_queue, dc, dc_queue);
config_pending_incr(dev);
+   dev->dv_flags |= DVF_ATTACH_INPROGRESS;
 }
 
 /*

Index: sys/external/bsd/drm2/i915drm/intelfb.c
===
RCS file: /cvsroot/src/sys/external/bsd/drm2/i915drm/intelfb.c,v
retrieving revision 1.15
diff -u -p -r1.15 intelfb.c
--- sys/external/bsd/drm2/i915drm/intelfb.c 27 Aug 2018 15:09:35 -  
1.15
+++ sys/external/bsd/drm2/i915drm/intelfb.c 29 Nov 2018 08:33:30 -
@@ -119,6 +119,7 @@ intelfb_attach(device_t parent, device_t
error);
goto fail1;
}
+   self->dv_flags |= DVF_ATTACH_INPROGRESS;
sc->sc_scheduled = true;
 
 	/* Success!  */

@@ -177,6 +178,7 @@ intelfb_attach_task(struct i915drmkms_ta
};
int error;
 
+	sc->sc_dev->dv_flags &= ~DVF_ATTACH_INPROGRESS;

error = drmfb_attach(>sc_drmfb, );
if (error) {
aprint_error_dev(sc->sc_dev, "failed to attach drmfb: %d\n",


(intelfb.c doesn't use config_interrupts)

OK?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: NetBSD-8 kernel too big?

2018-09-02 Thread Masanobu SAITOH

On 2018/08/30 20:54, D'Arcy Cain wrote:

My latest build of NetBSD 8 results in a kernel size of 24,964,488
bytes.  Not sure if "kernel" is really the right term any more.  :-)

I have installed it on all of my servers except for one.  It is a HP
ProLiant DL160se G6.  The failure is something like this:

Read header failed: input/output error
Boot: ad0a:netbsd: input/output error

I did find something on the net about FreeBSD that suggests that this
can be fixed by disabling a memory hole in the bios but this system does
not seem to have that option.

The 7.1 kernel that boots is 17,812,998 bytes making it just over 16MB.
I am not sure how much of the file actually loads into memory but this
sounds close to a boundary.

I tried removing a bunch of stuff but I run the same kernel on multiple
servers and figuring out what to drop isn't that easy.  Removing a bunch
of stuff that seemed safe didn't make much difference.

I saw these two warnings in the dmesg output:

ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block:
128/64 (20131218/tbfadt-634)
ACPI BIOS Warning (bug): Incorrect checksum in table [OEMB] - 0x29,
should be 0x28 (20131218/tbprint-237)

The first is also in anther server's dmesg which boots the new kernel
fine and research on the net suggests that the second doesn't matter.

Anyone know what my next steps are?

Cheers.


On my machines:

netbsd-7:
 > % ls -l /netbsd
 > -rwxr-xr-x  1 root  wheel  17968892 Aug 29 23:04 /netbsd

netbsd-8:
 > % ls -l /netbsd
 > -rwxr-xr-x  1 root  wheel  24925744 Aug 25 00:19 /netbsd

almost the same size as yours.

netbsd-7:

% objdump -h /netbsd

/netbsd: file format elf64-x86-64

Sections:
Idx Name  Size  VMA   LMA   File off  Algn
  0 .text 008f1e76  8010  0010  0010  2**6
  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata   00358f53  809f1e80  009f1e80  009f1e80  2**5
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .eh_frame 00164688  80d4add8  00d4add8  00d4add8  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 link_set_modules 0430  80eaf460  00eaf460  00eaf460  
2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 link_set_sysctl_funcs 01e0  80eaf890  00eaf890  
00eaf890  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 link_set_ah_rfs 0038  80eafa70  00eafa70  00eafa70  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 link_set_ah_chips 0038  80eafaa8  00eafaa8  00eafaa8  
2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 link_set_domains 0058  80eafae0  00eafae0  00eafae0  
2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 link_set_bufq_strats 0020  80eafb38  00eafb38  00eafb38 
 2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 link_set_evcnts 00f8  80eafb58  00eafb58  00eafb58  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 link_set_dkwedge_methods 0010  80eafc50  00eafc50  
00eafc50  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 11 link_set_linux_module_param_desc 0190  80eafc60  
00eafc60  00eafc60  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 12 link_set_linux_module_param_info 01a0  80eafdf0  
00eafdf0  00eafdf0  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 13 link_set_ieee80211_funcs 0020  80eaff90  00eaff90  
00eaff90  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 link_set_prop_linkpools 0040  80eaffb0  00eaffb0  
00eaffb0  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .data 0009eae0  80fb  00fb  00eb  2**12
  CONTENTS, ALLOC, LOAD, DATA
 16 .data.cacheline_aligned 3004  8104eb00  0104eb00  
00f4eb00  2**6
  CONTENTS, ALLOC, LOAD, DATA
 17 .data.read_mostly 03b8  81051b40  01051b40  00f51b40  
2**5
  CONTENTS, ALLOC, LOAD, DATA
 18 .bss  00092000  81052000  01052000  00f51ef8  2**12
  ALLOC
 19 .note.netbsd.ident 0018  810e4000  810e4000  00f51ef8  
2**2
  CONTENTS, READONLY
 20 .comment  001a      00f51f10  2**0
  CONTENTS, READONLY
 21 .ident00019968      00f51f2a  2**0
  CONTENTS, READONLY
 22 .debug_info   4683      00f6b892  2**0
  CONTENTS, READONLY, DEBUGGING
 23 .debug_abbrev 0238      00f6ff15  2**0
  CONTENTS, READONLY, DEBUGGING
 

Re: if_wm.c: is this a bug, or am I missing something?

2018-04-22 Thread Masanobu SAITOH

On 2018/04/21 3:55, Mouse wrote:

I'm trying to write some bare-metal code (no OS) for a chip that
appears to be supported by if_wm.c, but for which the manufacturer's
documentation PDF is cripplingly broken, leading to my reading over
that file, trying to use it as hardware documentation of a sort...and I
see something odd which looks to me like a bug.  I'm wondering whether
it is or whether I'm just missing something.

In wm_get_swfw_semaphore, I find (lines 12961-12966 of 1.562)

 if (sc->sc_type == WM_T_80003)
 timeout = 50;
 else
 timeout = 200;

 for (timeout = 0; timeout < 200; timeout++) {

Given the first part of the for loop ("timeout = 0"), it looks to me as
though the assignments to timeout above it, and the test controlling
which assignment happens, are completely useless.  If I had to guess,
I'd guess the test and assignments were designed for a loop more like
for (;timeout>0;timeout--).

Is the bug in the code?  Or in my brain?


 You're correct!

 It have been fixed now (in if_wm.c rev 1.575).

Thanks!


/~\ The ASCII Mouse
\ / Ribbon Campaign
  X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: amd64: svs

2018-01-15 Thread Masanobu SAITOH

On 2018/01/15 18:12, Emmanuel Dreyfus wrote:

On Mon, Jan 15, 2018 at 05:34:18PM +0900, Masanobu SAITOH wrote:

cpu0: MSR_IA32_ARCH_CAPABILITIES=0x1

I've not written this bit by myself.


Where can this be readen without your patch?



In cpu_ucode_intel.c::cpu_ucode_intel_apply().

-
cvs diff: Diffing .
Index: cpu_ucode_intel.c
===
RCS file: /cvsroot/src/sys/arch/x86/x86/cpu_ucode_intel.c,v
retrieving revision 1.12
diff -u -p -r1.12 cpu_ucode_intel.c
--- cpu_ucode_intel.c   1 Jun 2017 02:45:08 -   1.12
+++ cpu_ucode_intel.c   15 Jan 2018 09:31:34 -
@@ -110,8 +110,13 @@ cpu_ucode_intel_apply(struct cpu_ucode_s
uint32_t ucodetarget, oucodeversion, nucodeversion;
int platformid, cpuid;
struct intel1_ucode_header *uh;
+   struct cpu_info *ci;
+   struct cpu_info oldci;
+   int i;
void *uha;
size_t newbufsize = 0;
+   uint64_t msr = 0;
+   u_int descs[4];
int rv = 0;

if (sc->loader_version != CPU_UCODE_LOADER_INTEL1
@@ -144,15 +149,37 @@ cpu_ucode_intel_apply(struct cpu_ucode_s
intel_getcurrentucode(, );
cpuid = curcpu()->ci_index;

-   kpreempt_enable();
-
if (nucodeversion != ucodetarget) {
+   kpreempt_enable();
rv = EIO;
goto out;
}

+   ci = curcpu();
+   oldci = *ci;
+#if 0
+   /* Update cpu_info */
+   cpu_probe(curcpu());
+#endif
+   if (ci->ci_max_cpuid >= 7) {
+   x86_cpuid(7, descs);
+   if (descs[3] & CPUID_SEF_ARCH_CAP)
+   msr = rdmsr(MSR_IA32_ARCH_CAPABILITIES);
+   }
+   kpreempt_enable();
+
+   if ((ci->ci_max_cpuid >= 7) && (descs[3] & CPUID_SEF_ARCH_CAP))
+   printf("cpu%d: MSR_IA32_ARCH_CAPABILITIES=0x%lx\n", cpuid,
+   msr);
printf("cpu %d: ucode 0x%x->0x%x\n", cpuid,
   oucodeversion, nucodeversion);
+   for (i = 0; i < __arraycount(ci->ci_feat_val); i++) {
+   if (oldci.ci_feat_val[i] != ci->ci_feat_val[i])
+   printf("cpu%d: cpu_feature[%d] changed from "
+   "%08x to %08x\n", cpuid, i, oldci.ci_feat_val[i],
+   ci->ci_feat_val[i]);
+   }
+
 out:
if (newbufsize != 0)
kmem_free(uha, newbufsize);
-

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: amd64: svs

2018-01-15 Thread Masanobu SAITOH

On 2018/01/15 17:42, Masanobu SAITOH wrote:

On 2018/01/15 17:34, Masanobu SAITOH wrote:

On 2018/01/15 3:42, Dave Huang wrote:

On Jan 13, 2018, at 22:20, Emmanuel Dreyfus <m...@netbsd.org 
<mailto:m...@netbsd.org>> wrote:


John Klos <j...@ziaspace.com <mailto:j...@ziaspace.com>> wrote:


FWIW, Intel updated their microcode on 8-Jan-2018:

https://downloadcenter.intel.com/download/27431/

I can't find any documentation which shows what exactly is fixed for each
family of processors.


It does not seems to fix much.

I tried the tests from git repository
http://github.com/IAIK/meltdown.git

This is Linux-centric, but at least, the "secret" case seems to apply on
NetBSD. It reports vulnerability before (cpu0: microcode version 0x58,
platform ID 1) and after (cpu0: microcode version 0x80, platform ID 1)
microcode upgrade.

And I do not have a spectre test case. Anyone has that?


My understanding is that the microcode update doesn't actually *fix*
anything by itself. What it does is enable an OS to more easily fix
things.

The kernel would need to check for the new SPEC_CTRL CPUID bit,
and if present, twiddle with the new speculation control-related MSRs
at the appropriate times.
--
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: k...@azeotrope.org <mailto:k...@azeotrope.org> |  they raise a paw / the 
bat, the cat /
Telegram: @DahanC        |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 42 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++



FYI:

Diffs of "cpuctl -v identify 0" before microcode update and after on
some my machines.

Haswell:

% diff -up o1.log o2.log
--- o1.log  2018-01-11 18:11:36.897983031 +0900
+++ o2.log  2018-01-11 18:11:44.782255154 +0900
@@ -6,7 +6,7 @@ cpu0: 0003:   00
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00042120
 cpu0: 0006: 0077 0002 0009 
-cpu0: 0007:  2fbb  
+cpu0: 0007:  27ab  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300403   0603
@@ -35,8 +35,9 @@ cpu0: features1 0x7ffafbff<X2APIC,MOVBE,
 cpu0: features1 0x7ffafbff<F16C,RDRAND>
 cpu0: features2 0x2c100800
 cpu0: features3 0x21<LAHF,LZCNT>
-cpu0: features5 0x2fbb<FSGSBASE,TSCADJUST,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID>
-cpu0: features5 0x2fbb<RTM,FPUCSDS>
+cpu0: features5 0x27ab<FSGSBASE,TSCADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID>
+cpu0: features5 0x27ab
+cpu0: SEF edx 0xc00<IBRS,STIBP>
 cpu0: xsave features 0x7<x87,SSE,AVX>
 cpu0: xsave instructions 0x1
 cpu0: xsave area size: current 832, maximum 832, xgetbv enabled
@@ -55,4 +56,4 @@ cpu0: SMT ID 0
 cpu0: DSPM-eax 0x77<DTS,IDA,ARAT,PLN,ECMD,PTM>
 cpu0: DSPM-ecx 0x9<HWF,EPB>
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x17, platform ID 1
+cpu0: microcode version 0x23, platform ID 1


  Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Kaby Lake (XXX on xen dom0):

% diff -u xen-0.log xen-1.log
--- xen-0.log   2018-01-15 15:39:37.350775248 +0900
+++ xen-1.log   2018-01-15 15:39:46.135931681 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00142120
 cpu0: 0006: 27f7 0002 0009 
-cpu0: 0007:  029c6fbf  
+cpu0: 0007:  029c6fbf  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300804   0603


oops, the test binary was old.


@ -47,6 +47,7 @@
 cpu0: features5 0x29c6fbf<FSGSBASE,TSCADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS>
 cpu0: features5 0x29c6fbf<INVPCID,RTM,FPUCSDS,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT>
 cpu0: features5 0x29c6fbf
+cpu0: SEF edx 0xc00<IBRS,STIBP>
 cpu0: xsave features 0x1f<x87,SSE,AVX,BNDREGS,BNDCSR>
 cpu0: xsave instructions 0xf<XSAVEOPT,XSAVEC,XGETBV,XSAVES>
 cpu0: xsave area size: current 576, maximum 1088, xgetbv enabled



@@ -67,4 +67,4 @@
 cpu0: DSPM-eax 0x27f7<HWP_EPP,HDC>
 cpu0: DSPM-ecx 0x9<HWF,EPB>
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x48, platform ID 1
+cpu0: microcode version 0x80, platform ID 1


  Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Denverton:

% diff -u dnv-0.log dnv-1.log
--- dnv-0.log   2018-01-15 17:25:58.655815447 +0900
+++ dnv-1.log   2018-01-15 17:26:12.429468878 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 3c000121 0140003f 003f 0001
 cpu0: 0005: 0040 0040 0003 2020
 cpu0: 0006: 0055 0002 0009 
-cpu0: 0007:  2294e283  
+cpu0: 0007:  2294e283  2c00
 cpu0: 0008: 0

Re: amd64: svs

2018-01-15 Thread Masanobu SAITOH

On 2018/01/15 17:34, Masanobu SAITOH wrote:

On 2018/01/15 3:42, Dave Huang wrote:

On Jan 13, 2018, at 22:20, Emmanuel Dreyfus <m...@netbsd.org 
<mailto:m...@netbsd.org>> wrote:


John Klos <j...@ziaspace.com <mailto:j...@ziaspace.com>> wrote:


FWIW, Intel updated their microcode on 8-Jan-2018:

https://downloadcenter.intel.com/download/27431/

I can't find any documentation which shows what exactly is fixed for each
family of processors.


It does not seems to fix much.

I tried the tests from git repository
http://github.com/IAIK/meltdown.git

This is Linux-centric, but at least, the "secret" case seems to apply on
NetBSD. It reports vulnerability before (cpu0: microcode version 0x58,
platform ID 1) and after (cpu0: microcode version 0x80, platform ID 1)
microcode upgrade.

And I do not have a spectre test case. Anyone has that?


My understanding is that the microcode update doesn't actually *fix*
anything by itself. What it does is enable an OS to more easily fix
things.

The kernel would need to check for the new SPEC_CTRL CPUID bit,
and if present, twiddle with the new speculation control-related MSRs
at the appropriate times.
--
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: k...@azeotrope.org <mailto:k...@azeotrope.org> |  they raise a paw / the 
bat, the cat /
Telegram: @DahanC        |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 42 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++



FYI:

Diffs of "cpuctl -v identify 0" before microcode update and after on
some my machines.

Haswell:

% diff -up o1.log o2.log
--- o1.log  2018-01-11 18:11:36.897983031 +0900
+++ o2.log  2018-01-11 18:11:44.782255154 +0900
@@ -6,7 +6,7 @@ cpu0: 0003:   00
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00042120
 cpu0: 0006: 0077 0002 0009 
-cpu0: 0007:  2fbb  
+cpu0: 0007:  27ab  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300403   0603
@@ -35,8 +35,9 @@ cpu0: features1 0x7ffafbff<X2APIC,MOVBE,
 cpu0: features1 0x7ffafbff<F16C,RDRAND>
 cpu0: features2 0x2c100800
 cpu0: features3 0x21<LAHF,LZCNT>
-cpu0: features5 0x2fbb<FSGSBASE,TSCADJUST,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID>
-cpu0: features5 0x2fbb<RTM,FPUCSDS>
+cpu0: features5 0x27ab<FSGSBASE,TSCADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID>
+cpu0: features5 0x27ab
+cpu0: SEF edx 0xc00<IBRS,STIBP>
 cpu0: xsave features 0x7<x87,SSE,AVX>
 cpu0: xsave instructions 0x1
 cpu0: xsave area size: current 832, maximum 832, xgetbv enabled
@@ -55,4 +56,4 @@ cpu0: SMT ID 0
 cpu0: DSPM-eax 0x77<DTS,IDA,ARAT,PLN,ECMD,PTM>
 cpu0: DSPM-ecx 0x9<HWF,EPB>
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x17, platform ID 1
+cpu0: microcode version 0x23, platform ID 1


  Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Kaby Lake (XXX on xen dom0):

% diff -u xen-0.log xen-1.log
--- xen-0.log   2018-01-15 15:39:37.350775248 +0900
+++ xen-1.log   2018-01-15 15:39:46.135931681 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00142120
 cpu0: 0006: 27f7 0002 0009 
-cpu0: 0007:  029c6fbf  
+cpu0: 0007:  029c6fbf  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300804   0603


oops, the test binary was old.


@ -47,6 +47,7 @@
 cpu0: features5 0x29c6fbf<FSGSBASE,TSCADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS>
 cpu0: features5 0x29c6fbf<INVPCID,RTM,FPUCSDS,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT>
 cpu0: features5 0x29c6fbf
+cpu0: SEF edx 0xc00<IBRS,STIBP>
 cpu0: xsave features 0x1f<x87,SSE,AVX,BNDREGS,BNDCSR>
 cpu0: xsave instructions 0xf<XSAVEOPT,XSAVEC,XGETBV,XSAVES>
 cpu0: xsave area size: current 576, maximum 1088, xgetbv enabled



@@ -67,4 +67,4 @@
 cpu0: DSPM-eax 0x27f7<HWP_EPP,HDC>
 cpu0: DSPM-ecx 0x9<HWF,EPB>
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x48, platform ID 1
+cpu0: microcode version 0x80, platform ID 1


  Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Denverton:

% diff -u dnv-0.log dnv-1.log
--- dnv-0.log   2018-01-15 17:25:58.655815447 +0900
+++ dnv-1.log   2018-01-15 17:26:12.429468878 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 3c000121 0140003f 003f 0001
 cpu0: 0005: 0040 0040 0003 2020
 cpu0: 0006: 0055 0002 0009 
-cpu0: 0007:  2294e283  
+cpu0: 0007:  2294e283  2c00
 cpu0: 0008:    
 cpu0: 0

Re: amd64: svs

2018-01-15 Thread Masanobu SAITOH

On 2018/01/15 3:42, Dave Huang wrote:

On Jan 13, 2018, at 22:20, Emmanuel Dreyfus > wrote:


John Klos > wrote:


FWIW, Intel updated their microcode on 8-Jan-2018:

https://downloadcenter.intel.com/download/27431/

I can't find any documentation which shows what exactly is fixed for each
family of processors.


It does not seems to fix much.

I tried the tests from git repository
http://github.com/IAIK/meltdown.git

This is Linux-centric, but at least, the "secret" case seems to apply on
NetBSD. It reports vulnerability before (cpu0: microcode version 0x58,
platform ID 1) and after (cpu0: microcode version 0x80, platform ID 1)
microcode upgrade.

And I do not have a spectre test case. Anyone has that?


My understanding is that the microcode update doesn't actually *fix*
anything by itself. What it does is enable an OS to more easily fix
things.

The kernel would need to check for the new SPEC_CTRL CPUID bit,
and if present, twiddle with the new speculation control-related MSRs
at the appropriate times.
--
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: k...@azeotrope.org  |  they raise a paw / the 
bat, the cat /
Telegram: @DahanC        |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 42 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++



FYI:

Diffs of "cpuctl -v identify 0" before microcode update and after on
some my machines.

Haswell:

% diff -up o1.log o2.log
--- o1.log  2018-01-11 18:11:36.897983031 +0900
+++ o2.log  2018-01-11 18:11:44.782255154 +0900
@@ -6,7 +6,7 @@ cpu0: 0003:   00
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00042120
 cpu0: 0006: 0077 0002 0009 
-cpu0: 0007:  2fbb  
+cpu0: 0007:  27ab  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300403   0603
@@ -35,8 +35,9 @@ cpu0: features1 0x7ffafbff
 cpu0: features2 0x2c100800
 cpu0: features3 0x21
-cpu0: features5 0x2fbb
-cpu0: features5 0x2fbb
+cpu0: features5 0x27ab
+cpu0: features5 0x27ab
+cpu0: SEF edx 0xc00
 cpu0: xsave features 0x7
 cpu0: xsave instructions 0x1
 cpu0: xsave area size: current 832, maximum 832, xgetbv enabled
@@ -55,4 +56,4 @@ cpu0: SMT ID 0
 cpu0: DSPM-eax 0x77
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x17, platform ID 1
+cpu0: microcode version 0x23, platform ID 1


 Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Kaby Lake (XXX on xen dom0):

% diff -u xen-0.log xen-1.log
--- xen-0.log   2018-01-15 15:39:37.350775248 +0900
+++ xen-1.log   2018-01-15 15:39:46.135931681 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00142120
 cpu0: 0006: 27f7 0002 0009 
-cpu0: 0007:  029c6fbf  
+cpu0: 0007:  029c6fbf  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300804   0603
@@ -67,4 +67,4 @@
 cpu0: DSPM-eax 0x27f7
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x48, platform ID 1
+cpu0: microcode version 0x80, platform ID 1


 Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Denverton:

% diff -u dnv-0.log dnv-1.log
--- dnv-0.log   2018-01-15 17:25:58.655815447 +0900
+++ dnv-1.log   2018-01-15 17:26:12.429468878 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 3c000121 0140003f 003f 0001
 cpu0: 0005: 0040 0040 0003 2020
 cpu0: 0006: 0055 0002 0009 
-cpu0: 0007:  2294e283  
+cpu0: 0007:  2294e283  2c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300404   0603
@@ -44,6 +44,7 @@
 cpu0: features3 0x101
 cpu0: features5 0x2294e283
 cpu0: features5 0x2294e283
+cpu0: SEF edx 0x2c00
 cpu0: xsave features 0x1b
 cpu0: xsave instructions 0xf
 cpu0: xsave area size: current 576, maximum 1088, xgetbv enabled
@@ -59,4 +60,4 @@
 cpu0: DSPM-eax 0x55
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 

Re: xcall while cold == 1

2017-12-27 Thread Masanobu SAITOH

On 2017/12/27 1:37, Martin Husemann wrote:

On Tue, Dec 26, 2017 at 07:05:38PM +0900, Masanobu SAITOH wrote:

so checking mp_online is the best, right?


I wonder if we should additionaly check for ncpu >= 2 (or ncpuonline >=
2), but that is probably overoptimization.


I think doing optimization for single CPU machine with options MULTIPROCESSOR
kernel is not so important. It would rather do optimize for non-MULTIPROCESSOR
kernel.


Martin



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: xcall while cold == 1

2017-12-26 Thread Masanobu SAITOH

On 2017/12/26 11:59, Masanobu SAITOH wrote:

On 2017/12/25 20:26, Martin Husemann wrote:

On Mon, Dec 25, 2017 at 03:42:06PM +0900, Masanobu SAITOH wrote:

  Is this intended behavior? Is it possible to use xcall while
cold==1?


Cold is (as you noted) not the right condition (but pretty close).

Xcalls don't really make any sense before cpus have been spun up.
In your case it might be good to do the loop checking for SPCF_RUNNING
and if <= 1 is found, use the code path for single cpu systems (the
current else statatement).


In init_main.c::configure2():


    cold = 0;   /* clocks are running, we're warm now! */
    s = splsched();
    curcpu()->ci_schedstate.spc_flags |= SPCF_RUNNING;
    splx(s);

    /* Boot the secondary processors. */
    for (CPU_INFO_FOREACH(cii, ci)) {
    uvm_cpu_attach(ci);
    }
    mp_online = true;


so checking mp_online is the best, right?


Martin



Updated diff:
-
Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.44
diff -u -p -r1.44 kern_softint.c
--- kern_softint.c  22 Nov 2017 02:20:21 -  1.44
+++ kern_softint.c  26 Dec 2017 07:47:57 -
@@ -177,6 +177,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_softint
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -430,8 +431,10 @@ softint_disestablish(void *arg)
 * it again.  So, we are only looking for handler records with
 * SOFTINT_ACTIVE already set.
 */
-   where = xc_broadcast(0, (xcfunc_t)nullop, NULL, NULL);
-   xc_wait(where);
+   if (__predict_true(mp_online)) {
+   where = xc_broadcast(0, (xcfunc_t)nullop, NULL, NULL);
+   xc_wait(where);
+   }
 
 	for (;;) {

/* Collect flag values from each CPU. */
Index: subr_pserialize.c
===
RCS file: /cvsroot/src/sys/kern/subr_pserialize.c,v
retrieving revision 1.9
diff -u -p -r1.9 subr_pserialize.c
--- subr_pserialize.c   21 Nov 2017 08:49:14 -  1.9
+++ subr_pserialize.c   26 Dec 2017 07:47:57 -
@@ -157,6 +157,11 @@ pserialize_perform(pserialize_t psz)
KASSERT(psz->psz_owner == NULL);
KASSERT(ncpu > 0);
 
+	if (__predict_false(mp_online == false)) {

+   psz_ev_excl.ev_count++;
+   return;
+   }
+
/*
 * Set up the object and put it onto the queue.  The lock
 * activity here provides the necessary memory barrier to
Index: subr_psref.c
===
RCS file: /cvsroot/src/sys/kern/subr_psref.c,v
retrieving revision 1.9
diff -u -p -r1.9 subr_psref.c
--- subr_psref.c14 Dec 2017 05:45:55 -  1.9
+++ subr_psref.c26 Dec 2017 07:47:57 -
@@ -429,8 +429,14 @@ psreffed_p(struct psref_target *target,
.ret = false,
};
 
-	/* Ask all CPUs to say whether they hold a psref to the target.  */

-   xc_wait(xc_broadcast(0, _p_xc, , NULL));
+   if (__predict_true(mp_online)) {
+   /*
+* Ask all CPUs to say whether they hold a psref to the
+* target.
+*/
+   xc_wait(xc_broadcast(0, _p_xc, , NULL));
+   } else
+   psreffed_p_xc(, NULL);
 
 	return P.ret;

 }
-

 It might not be required to increment psz_ev_excl evcnt in 
softint_disestablish()
when mp_online == false.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: xcall while cold == 1

2017-12-25 Thread Masanobu SAITOH

On 2017/12/25 20:26, Martin Husemann wrote:

On Mon, Dec 25, 2017 at 03:42:06PM +0900, Masanobu SAITOH wrote:

  Is this intended behavior? Is it possible to use xcall while
cold==1?


Cold is (as you noted) not the right condition (but pretty close).

Xcalls don't really make any sense before cpus have been spun up.
In your case it might be good to do the loop checking for SPCF_RUNNING
and if <= 1 is found, use the code path for single cpu systems (the
current else statatement).


In init_main.c::configure2():


cold = 0;   /* clocks are running, we're warm now! */
s = splsched();
curcpu()->ci_schedstate.spc_flags |= SPCF_RUNNING;
splx(s);

/* Boot the secondary processors. */
for (CPU_INFO_FOREACH(cii, ci)) {
uvm_cpu_attach(ci);
}
mp_online = true;


so checking mp_online is the best, right?


Martin




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


xcall while cold == 1

2017-12-24 Thread Masanobu SAITOH

 Hi.

 While debugging PR#52820 ("boot -1" panics on systems with ixgX interfaces),
I've noticed that xcall doesn't work while cold == 1.

 When I added softint_disestablish() near the end of the ixgbe_attach().
The following panic occured:


panic: kernel diagnostic assertion "xc->xc_donep < xc->xc_headp" failed: file 
"../../../../kern/subr_xcall.c", line 278


This KASSERT is:


static inline uint64_t
xc_lowpri(xcfunc_t func, void *arg1, void *arg2, struct cpu_info *ci)
{
xc_state_t *xc = _low_pri;
CPU_INFO_ITERATOR cii;
uint64_t where;

mutex_enter(>xc_lock);
while (xc->xc_headp != xc->xc_donep) {
cv_wait(>xc_busy, >xc_lock);
}
xc->xc_arg1 = arg1;
xc->xc_arg2 = arg2;
xc->xc_func = func;
if (ci == NULL) {
xc_broadcast_ev.ev_count++;
for (CPU_INFO_FOREACH(cii, ci)) {
if ((ci->ci_schedstate.spc_flags & SPCF_RUNNING) == 0)
continue;
xc->xc_headp += 1;
ci->ci_data.cpu_xcall_pending = true;
cv_signal(>ci_data.cpu_xcall);
}
} else {
xc_unicast_ev.ev_count++;
xc->xc_headp += 1;
ci->ci_data.cpu_xcall_pending = true;
cv_signal(>ci_data.cpu_xcall);
}
KASSERT(xc->xc_donep < xc->xc_headp);<= Here!
where = xc->xc_headp;
mutex_exit(>xc_lock);

/* Return a low priority ticket. */
KASSERT((where & XC_PRI_BIT) == 0);
return where;
}


So I added the following debug printf:
-
@@ -252,6 +253,7 @@ xc_lowpri(xcfunc_t func, void *arg1, voi
xc_state_t *xc = _low_pri;
CPU_INFO_ITERATOR cii;
uint64_t where;
+   int i = -1;

mutex_enter(>xc_lock);
while (xc->xc_headp != xc->xc_donep) {
@@ -263,8 +265,11 @@ xc_lowpri(xcfunc_t func, void *arg1, voi
if (ci == NULL) {
xc_broadcast_ev.ev_count++;
for (CPU_INFO_FOREACH(cii, ci)) {
-   if ((ci->ci_schedstate.spc_flags & SPCF_RUNNING) == 0)
+   i++;
+   if ((ci->ci_schedstate.spc_flags & SPCF_RUNNING) == 0) {
+   printf("cpu %d: XXX not running\n", i);
continue;
+   }
xc->xc_headp += 1;
ci->ci_data.cpu_xcall_pending = true;
cv_signal(>ci_data.cpu_xcall);
@@ -275,6 +280,9 @@ xc_lowpri(xcfunc_t func, void *arg1, voi
ci->ci_data.cpu_xcall_pending = true;
cv_signal(>ci_data.cpu_xcall);
}
+   if (xc->xc_donep >= xc->xc_headp)
+   printf("XXX donep = %" PRIu64 ", headp = %" PRIu64
+   ", ci = %p\n", xc->xc_donep, xc->xc_headp, ci);
KASSERT(xc->xc_donep < xc->xc_headp);
where = xc->xc_headp;
mutex_exit(>xc_lock);
-

The output says

cpu 0: XXX not running
cpu 1: XXX not running
cpu 2: XXX not running
cpu 3: XXX not running
XXX donep = 0, headp = 0, ci = 0x0
panic: kernel diagnostic assertion "xc->xc_donep < xc->xc_headp" failed: file 
"../../../../kern/subr_xcall.c", line 286


(Yes, the exact reason is not cold==1 but all CPU's SPCF_RUNNING is not set)


 Is this intended behavior? Is it possible to use xcall while
cold==1?

For softint_establish(), the following diff avoid panic:

Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.44
diff -u -p -r1.44 kern_softint.c
--- kern_softint.c  22 Nov 2017 02:20:21 -  1.44
+++ kern_softint.c  25 Dec 2017 06:31:57 -
@@ -177,6 +177,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_softint
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -430,8 +431,10 @@ softint_disestablish(void *arg)
 * it again.  So, we are only looking for handler records with
 * SOFTINT_ACTIVE already set.
 */
-   where = xc_broadcast(0, (xcfunc_t)nullop, NULL, NULL);
-   xc_wait(where);
+   if (__predict_true(cold == 0)) {
+   where = xc_broadcast(0, (xcfunc_t)nullop, NULL, NULL);
+   xc_wait(where);
+   }

for (;;) {
/* Collect flag values from each CPU. */


 I don't know whether this is good fix or not.

 One of the workaround is not call softint_disesablish() in
xxx_attach() and defer it using with config_interrupts(), but
I think it's dirty.

 I'm now writing a code for ixg(4) to fallback from MSI-X to
legacy interrupt when resource 

Re: increase softint_bytes

2017-11-22 Thread Masanobu SAITOH

On 2017/11/21 15:35, Masanobu SAITOH wrote:

On 2017/11/20 17:28, Masanobu SAITOH wrote:

On 2017/11/17 18:42, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


Hi, all.

Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif



I tested the patch. Now the dump comes in another place.

https://suse.uni-leipzig.de/crash/crash-with-patch.jpg

Regards
Uwe


Could you test the following patch?

 http://www.netbsd.org/~msaitoh/vlan-20171120-0.dif


Updated patch

 http://www.netbsd.org/~msaitoh/vlan-20171121-0.dif

 Fix compile error (sorry)

 Revert if_wmreg.h 1.104 and if_wm.c 1.542


 Committed in -current.


--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-21 Thread Masanobu SAITOH

On 2017/11/21 17:03, matthew green wrote:

   It'll take a little time to write this change. And, it's low level
and important code, so it will take a time to test the stability
before sending pullup request. So,

   0) Apply the following change to -current.


Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.43
diff -u -p -r1.43 kern_softint.c
--- kern_softint.c  4 Jul 2016 04:20:14 -   1.43
+++ kern_softint.c  21 Nov 2017 06:41:35 -
@@ -217,7 +217,7 @@ typedef struct softcpu {
   
   static void	softint_thread(void *);
   
-u_int		softint_bytes = 8192;

+u_int  softint_bytes = 16384;
   u_intsoftint_timing;
   static u_int softint_max;
   static kmutex_t  softint_lock;


   1) Sent the pullup request to netbsd-8

   2) Write auto-resize code and commit.

   3) If it's stable, send the pullup request to netbsd-8.


sounds good to me.  i would even be ok with a switch
to 16k or even 32k (platform basis?)) for netbsd-8 or


 Committed with 32K.

 Thanks!



earlier right now and allow the resize code to live in
-current until -9.

thanks!


.mrg.




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: increase softint_bytes

2017-11-20 Thread Masanobu SAITOH



can't this be fixed by making it dynamic?


  It's not easy because the return value of softint_establish() is
made from this area's address. As you know, the value is keep by each driver.


  I'm sorry. I misread kern_softint.c. The return value is not directly point to
the area, but it's offset of the area, so it would be easy to resize it.

  I'll try to modify the code to do auto-resize.


 It'll take a little time to write this change. And, it's low level
and important code, so it will take a time to test the stability
before sending pullup request. So,

 0) Apply the following change to -current.


Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.43
diff -u -p -r1.43 kern_softint.c
--- kern_softint.c  4 Jul 2016 04:20:14 -   1.43
+++ kern_softint.c  21 Nov 2017 06:41:35 -
@@ -217,7 +217,7 @@ typedef struct softcpu {
 
 static void	softint_thread(void *);
 
-u_int		softint_bytes = 8192;

+u_int  softint_bytes = 16384;
 u_int  softint_timing;
 static u_int   softint_max;
 static kmutex_tsoftint_lock;


 1) Sent the pullup request to netbsd-8

 2) Write auto-resize code and commit.

 3) If it's stable, send the pullup request to netbsd-8.


 OK?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-20 Thread Masanobu SAITOH

On 2017/11/17 18:42, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


Hi, all.

Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif



I tested the patch. Now the dump comes in another place.

https://suse.uni-leipzig.de/crash/crash-with-patch.jpg

Regards
Uwe


Could you test the following patch?

http://www.netbsd.org/~msaitoh/vlan-20171120-0.dif


--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-17 Thread Masanobu SAITOH

Hi, all.

On 2017/11/17 15:35, Masanobu SAITOH wrote:

Hi, mrg.

On 2017/11/17 5:05, matthew green wrote:

Masanobu SAITOH writes:

   Hi, all.

   Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif

Summary:

- Increase the default size from 8192 bytes to 32768 bytes.
- Add new option SOFTINT_BYTES to change the value.
- Add two new read-only sysctls kern.softint.{max,count}

Any comment?


can't this be fixed by making it dynamic?


  It's not easy because the return value of softint_establish() is
made from this area's address. As you know, the value is keep by each driver.


 I'm sorry. I misread kern_softint.c. The return value is not directly point to
the area, but it's offset of the area, so it would be easy to resize it.

 I'll try to modify the code to do auto-resize.






 ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.


The code is:

ip_init(void)
{
    const struct protosw *pr;

    in_init();
    sysctl_net_inet_ip_setup(NULL);

    pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
    KASSERT(pr != NULL);

    ip_pktq = pktq_create(IFQ_MAXLEN, ipintr, NULL);
    KASSERT(ip_pktq != NULL); <== This.






the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.


  I think this KASSERT() is reasonable.


  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.


bridge, agr, strip, sl, stf, ppp, l2tp, gre and maybe some other
pseudo interfaces which is created by ifconfig command allocate
softint. Some of them may be created tens, hundreds or more.
So I think it's worth to check the current number.




thanks.


.mrg.







--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-16 Thread Masanobu SAITOH

On 2017/11/17 6:25, Jaromír Doleček wrote:

softint_establish() already fails if there is no space, and prints a WARNING in 
this case. The caller however needs to check for the failure, which ixgbe(4) 
mostly doesn't.


No. ixgbe() checks the return value of if_initialize() and return
correctly. That panic is not in ixgbe(4) driver but in ip_input.c::ip_init().


 Not sure if that is really the root cause, the panic seems to be actually in 
different area, and I didn't actually see the warning in the dmesg.


Making softint size fully dynamic is unfortunately not straighforward, it would 
need to reallocate the softint memory on all CPUs in a way that any serviced 
softints would continue working during this time.


 Each driver which call softint_establish() keep the return value, so
it's very hard without big modification.


It would be simple however to adjust it based on % of physical memory, instead 
of fixed value.


It might be good, but it's only small area, so fixed value might be OK.
And not so many people create a lot of pseudo interface, so I think it's
OK to make the max value as a kernel option to increase.



If I count correctly, with current 8192 bytes the system supports some 100 
softints, which seems to be on quite low side - more advanced hardware and 
drivers usually use queue and softint per cpu, so it can quickly run out on 
system with e.g. 32 cores.

I agree that the sysctl seems somewhat unnecessary, since the limit is only 
relevant during boot anyway,


As I wrote, it's incorrect because ifconfig xxxN create allocates softint.



so it's not very helpful.
Jaromir

2017-11-16 21:05 GMT+01:00 matthew green <m...@eterna.com.au 
<mailto:m...@eterna.com.au>>:

    Masanobu SAITOH writes:
>   Hi, all.
>
>   Some device drivers now allocate a lot of softints.
> See:
>
>   http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html 
<http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html>
>
> To avoid this panic, I wrote the following patch:
>
>   http://www.netbsd.org/~msaitoh/softint-20171116-0.dif 
<http://www.netbsd.org/~msaitoh/softint-20171116-0.dif>
>
> Summary:
>
>       - Increase the default size from 8192 bytes to 32768 bytes.
>       - Add new option SOFTINT_BYTES to change the value.
>       - Add two new read-only sysctls kern.softint.{max,count}
>
> Any comment?

can't this be fixed by making it dynamic?  ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.
the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.

thanks.


.mrg.





--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: increase softint_bytes

2017-11-16 Thread Masanobu SAITOH

Hi, mrg.

On 2017/11/17 5:05, matthew green wrote:

Masanobu SAITOH writes:

   Hi, all.

   Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif

Summary:

- Increase the default size from 8192 bytes to 32768 bytes.
- Add new option SOFTINT_BYTES to change the value.
- Add two new read-only sysctls kern.softint.{max,count}

Any comment?


can't this be fixed by making it dynamic?


 It's not easy because the return value of softint_establish() is
made from this area's address. As you know, the value is keep by each driver.


 ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.


The code is:

ip_init(void)
{
const struct protosw *pr;

in_init();
sysctl_net_inet_ip_setup(NULL);

pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
KASSERT(pr != NULL);

ip_pktq = pktq_create(IFQ_MAXLEN, ipintr, NULL);
KASSERT(ip_pktq != NULL); <== This.






the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.


 I think this KASSERT() is reasonable.


  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.


bridge, agr, strip, sl, stf, ppp, l2tp, gre and maybe some other
pseudo interfaces which is created by ifconfig command allocate
softint. Some of them may be created tens, hundreds or more.
So I think it's worth to check the current number.




thanks.


.mrg.




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Kernel code documentation and pci data structures (was: Problem with Intel WiFi card)

2017-05-22 Thread Masanobu SAITOH

Hi, Rocky.

On 2017/05/18 18:21, Rocky Hotas wrote:

Hi Masanobu and hi all!
After the discussion on this thread,

http://mail-index.netbsd.org/netbsd-users/2017/02/01/msg019280.html

I'm trying to apply this patch

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/ppb.c.diff?r1=1.35=1.36

to NetBSD. The current version of OpenBSD kernel ppb.c file (which
works in my laptop, where I tried to temporarily install OpenBSD)
has some more lines and is based on the rev. 1.40.
I write to you because you made much of the last commits on the
NetBSD ppb.c sourcefile;


 It's one of the thing that I want to have it in our ppb.c.
I'd like to add PCI(e) hot-plug stuff because of nvme(4)
hot-plug. Reasons why I've not doing the work yet are that
I'm now working another job(fixing bugs in ixg(4) and wm(4))
and I'm not familiar with resource allocation near here :-(


I  write also to the list, to let the
community know these updates.
OpenBSD code is based upon the following members of the
`struct pci_attach_args':

struct extent   *pa_ioex;
struct extent   *pa_memex;

They must be added to the NetBSD code, and then correctly handled.
The extent(9) manpages both in NetBSD and in OpenBSDare both too
much concise. These pages

https://www.netbsd.org/docs/internals/en/
https://www.netbsd.org/docs/kernel/

don't mention nor pci neither extents.
So, a couple of questions:

1) What are the extent(9) used for? They were born in NetBSD (as it
is even stated in the OpenBSD manpage). After a quick discussion on
#netbsd-code with coypu, they seem to be related to address spaces.
If there is some guide like this one

http://www.unixag-kl.fh-kl.de/~jkunz/projekte/NetBSD-driver_writing-1.0.1e.pdf.gz

but for extents(9) and memory, please let me know.

2) About these lines (that are both in NetBSD and in OpenBSD):

static int
ppbmatch(device_t parent, cfdata_t match, void *aux)
{
struct pci_attach_args *pa = aux;


driver(9) explicitly states that "The match function would
type-cast the aux argument to its appropriate attachment structure and
use its contents to determine whether it supports the device". But who
initially fills the `aux' data? Whoever he is, he should then take
care about some new members, like `pa_ioex' and `pa_memex'.

Thank you for having read,

Rocky


 To configure ppb's address decoding, some of the code in
sys/dev/pci/pciconf.c can be used. Basically, to use pciconf.c,
"options PCI_NETBSD_CONFIGURE" is required and it's used for
machines that IPL or boot loader of embedded machine doesn't
initialize PCI bus stuff. Perhaps PCI_NETBSD_CONFIGURE is not
intended to use on x86, so some function might be moved outside
of #ifdef PCI_NETBSD_CONFIGURE. I think it's worth for you to
read sys/dev/pci/pciconf.c

 BTW,

http://mail-index.netbsd.org/netbsd-users/2017/02/01/msg019280.html


pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
iwn0 at pci2 dev 0 function 0: vendor 0x8086 product 0x4232 (rev. 0x00)
: can't map mem space
ppb2 at pci0 dev 28 function 1: vendor 0x8086 product 0x2942 (rev. 0x03)
ppb2: PCI Express capability version 1  x1 @ 
2.5GT/s



Did you check the PCI config area of ppb"1" on your machine?
iwn0 is under ppb1.

Could you show me the output of "pcictl pciN dump" of your ppb1?

And, If you are OK, could you me the full output of pcictl
using with the following script?

http://www.netbsd.org/~msaitoh/pcidump

(Note that the output is huge, so please don't send the output to
the ML. Could you put it somewhere and give us the link?)

 Regards.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PCI BAR's prefetchable bit and pci_mapreg_map()

2017-03-29 Thread Masanobu SAITOH

On 2017/03/24 8:13, Michael wrote:

Hello,

On Wed, 22 Mar 2017 16:35:21 +0900
Masanobu SAITOH <msai...@execsw.org> wrote:


  A) modify pci_mapreg_map(). Stop setting BUS_SPACE_MAP_PREFETCHABLE
 by defalut when prefetchable bit is set. If a driver really know
 the whole area of the BAR is prefetchable, set BUS_SPACE_MAP_PREFETCHABLE
 in the 4th argument(busflags) of pci_mapreg_map(). pci_mapreg_map()
 check for both the 4th argument and the prefetchable bit, it sets
 BUS_SPACE_MAP_PREFETCHABLE only when both bits are set.


On 2nd thought, I think we should ignore the prefetchable bit in the
BAR here


 in pci_mapreg_map()?


and just go with whatever the driver wants,


 You mean:

For pci_mapreg_map():

Don't check prefetchable bit and don't set BUS_SPACE_MAP_PREFETCHABLE
in pci_mapreg_map() like OpenBSD.

For each driver side:

Don't use pci_mapreg_map() and use pci_mapreg_info() and
bus_space_map() to use an area with BUS_SPACE_MAP_PREFETCHABLE
(Plan D in my original mail)

Right? Let call this plan E.

The plan E is OK, but it makes code duplication and the come won't
be simple. To make code simple,


 A) modify pci_mapreg_map(). Stop setting BUS_SPACE_MAP_PREFETCHABLE
by defalut when prefetchable bit is set. If a driver really know
the whole area of the BAR is prefetchable, set BUS_SPACE_MAP_PREFETCHABLE
in the 4th argument(busflags) of pci_mapreg_map(). pci_mapreg_map()
check for both the 4th argument and the prefetchable bit, it sets
BUS_SPACE_MAP_PREFETCHABLE only when both bits are set.

 B) Don't modify pci_mapreg_ma(). Add new API like FreeBSD's pmap_change_attr()
and use it when a driver want.

 C) make new function pci_mapreg_map_wc()

 D) Keep. If a driver know a BAR has prefetchable bit and not the
whole area is prefetchable, do pci_mem_find() -> drop 
BUS_SPACE_MAP_PREFETCHABLE
and do bus_space_map().


   E) Don't check prefetchable bit and don't set BUS_SPACE_MAP_PREFETCHABLE
  in any case in pci_mapreg_map() like OpenBSD.
  Don't use pci_mapreg_map() and use pci_mapreg_info() and
  bus_space_map() if a driver want to use an area with
  BUS_SPACE_MAP_PREFETCHABLE.
 
   F) Any other solution(s).


Plan A, B and C are useful than D and E to make code simple.


since you ran into
devices which have the bit set and shouldn't, and I have seen graphics
chips which don't set it for their video memory apertures.


Even if we choose A or E, we have to check all driver
which already use pci_mapreg_map() and modify some of
them.

For A:
- BAR has no prefetchable bit:
not required to modify
- BAR has prefetchable bit and BUS_SPACE_MAP_PREFETCHABLE can't be used:
not required to modify
- really prefetchable:
set BUS_SPACE_MAP_PREFETCHABLE in the 4th argument

For D:
- BAR has no prefetchable bit:
not required to modify
- BAR has prefetchable bit and BUS_SPACE_MAP_PREFETCHABLE can't be used:
not required to modify
- really prefetchable:
Extract it with pci_mapreg_info() and bus_space_map()



have fun
Michael




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


PCI BAR's prefetchable bit and pci_mapreg_map()

2017-03-22 Thread Masanobu SAITOH

 Hi.

 As some people know, our pci(9) had a problem with devices that
a BAR has prefetchable bit. It makes unexpected behavior on read/
write. To avoid this problem some device drivers have common hack.
Instead of using pci_mapreg_map(), it uses pci_mapreg_info() and
pass the result to bus_space_map() without BUS_SPACE_MAPPREFETCHABLE.
You can see it in:
pci/if_fxp_pci.c
pci/if_bge.c
pci/ixgbe.c
pci/ixv.c
x86/pci/msipic.c

Example in if_bge.c:

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_bge.c.diff?r1=1.272=1.275=h

 I found the cause in PCIe's specification. It's required to pay
money to get the specification, but it's not required for ECN and
Errata. You can see the information about prefetchable bit in
"Errata for the PCI Express(R) Base Specification Revision 2.1"
(November 18, 2010):


https://pcisig.com/specifications/pciexpress/specifications?field_technology_value%5B%5D=express_document_type_value%5B%5D=errata=

See

1.3.2.2 PCI Express Endpoint Rules

7.5.2.1 Base Address Register (offset 20h - 24h)

IMPLEMENTATION NOTE in 7.5.2.1 "Additional Guidance on the Prefetchable
Bit in Memory Space BARs" (< very important)

It says that the prefetchable bit can be set even if the range
include read side effect or effect of write combine...

 Our pci_map.c::pci_mapreg_map() first does pci_mem_find() and
the result is used for does bus_space_map() with BUS_SPACE_MAP_PREFETCHABLE
if a BAR has prefetchable bit. For x86, it result in adding
PMAP_WRITE_COMBINE in the map. cardbus_mem_find() does the same thing.

On other OSes:

 Linux:
It has ioremap() and ioremap_wc(). ioremap_wc() is used
in a driver only if the driver know the whole area
specified in a BAR is really prefetchable. ioremap() doesn't
set the area prefetchable.

 FreeBSD:
 After mapping a area with bus_alloc_resources_any(), if
the driver know the whole area is really prefetchable,
pmap_change_attr() is called with PAT_WRITE_COMBINE.

 OpenBSD:
 It ignores prefetchable bit in pci_mapreg_map()

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/pci_map.c.diff?r1=1.28=1.29=h


 So what should we do?

 A) modify pci_mapreg_map(). Stop setting BUS_SPACE_MAP_PREFETCHABLE
by defalut when prefetchable bit is set. If a driver really know
the whole area of the BAR is prefetchable, set BUS_SPACE_MAP_PREFETCHABLE
in the 4th argument(busflags) of pci_mapreg_map(). pci_mapreg_map()
check for both the 4th argument and the prefetchable bit, it sets
BUS_SPACE_MAP_PREFETCHABLE only when both bits are set.

 B) Don't modify pci_mapreg_ma(). Add new API like FreeBSD's pmap_change_attr()
and use it when a driver want.

 C) make new function pci_mapreg_map_wc()

 D) Keep. If a driver know a BAR has prefetchable bit and not the
whole area is prefetchable, do pci_mem_find() -> drop 
BUS_SPACE_MAP_PREFETCHABLE
and do bus_space_map().

 E) Any other solution(s).

 Usually, framebuffer of video really uses prefetchable access.
Not so many driver use pci_mapreg_map() and use pci_mem_find() and
bus_space_map(), so it's not difficult to change them.

I prefer A.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Changing the return value of xxx_attach() from void to int. (take 2)

2016-07-20 Thread Masanobu SAITOH

 Hello, all.

 Thanks all people who replied to my proposal. One of my fault is that
I didn't show any example code. I made patches. As jnemeth@ said, my
change is straightforward and the change should not cause any catastrophic
problem.



 In my last mail,


c) We can add the code to fallback to lower priority device driver
  like ukphy(4), ugen(4) or others.


 This change is not easy (though it also not difficult.), so in this change
doesn't include it. The fallback code will be a future plan.


Current patches:

0) Caller side (sys/device.h, kern/subr_autoconf.c and config(1))

http://www.netbsd.org/~msaitoh/attach-caller-20160720.dif

1) Device drivers

http://www.netbsd.org/~msaitoh/attach-ddrv-20160720.dif

2) detach related changes of wm(4) for an example

http://www.netbsd.org/~msaitoh/attach-wm-20160720.dif

 "./build.sh release" finishes without any error on x86 and the kernel works.

Details:


===
RCS file: /cvsroot/src/sys/sys/device.h,v
retrieving revision 1.149
diff -u -p -r1.149 device.h
--- sys/sys/device.h19 Jun 2016 09:35:06 -  1.149
+++ sys/sys/device.h20 Jul 2016 05:13:49 -
@@ -313,7 +313,7 @@ struct cfattach {
size_tca_devsize;   /* size of dev data (for alloc) */
int   ca_flags; /* flags for driver allocation etc */
int (*ca_match)(device_t, cfdata_t, void *);
-   void(*ca_attach)(device_t, device_t, void *);
+   int (*ca_attach)(device_t, device_t, void *);
int (*ca_detach)(device_t, int);
int (*ca_activate)(device_t, devact_t);
/* technically, the next 2 belong into "struct cfdriver" */
@@ -401,7 +401,7 @@ typedef int (*cfprint_t)(void *, const c
  * Pseudo-device attach information (function + number of pseudo-devs).
  */
 struct pdevinit {
-   void(*pdev_attach)(int);
+   int (*pdev_attach)(int);
int pdev_count;
 };



 It changes the return value from void to int for both nomal and pseudo
device driver.


Index: sys/kern/subr_autoconf.c
===
RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
retrieving revision 1.247
diff -u -p -r1.247 subr_autoconf.c
--- sys/kern/subr_autoconf.c19 Jul 2016 07:44:03 -  1.247
+++ sys/kern/subr_autoconf.c20 Jul 2016 05:13:48 -
@@ -1545,6 +1545,7 @@ config_attach_loc(device_t parent, cfdat
device_t dev;
struct cftable *ct;
const char *drvname;
+   int rv;

dev = config_devalloc(parent, cf, locs);
if (!dev)
@@ -1595,10 +1596,15 @@ config_attach_loc(device_t parent, cfdat
}
device_register(dev, aux);

-   /* Let userland know */
-   devmon_report_device(dev, true);
-
-   (*dev->dv_cfattach->ca_attach)(parent, dev, aux);
+   rv = (*dev->dv_cfattach->ca_attach)(parent, dev, aux);
+   if (rv == 0) {
+   /* Let userland know */
+   devmon_report_device(dev, true);
+   } else {
+   aprint_error("%s: %s: ERROR: failed in attach (rv = %d)\n",
+   __func__, device_xname(dev), rv);
+   dev->dv_flags &= ~DVF_ACTIVE;
+   }

if (!device_pmf_is_registered(dev))
aprint_debug_dev(dev, "WARNING: power management not "


 This part is the main change. We can get three benefit from this change.

1) We can call devmon_report_device() only when an ca_attach() call
   succeeded.

2) We can show a common error message when ca_attach() failed.
   This change has some benefits.

Some drivers don't call aprint_error*() and just return.
We couldn't know fails from such type of drivers. With
this change, we can know it for sure.

Some drivers' error messages aren't clear. It's difficult
for us to know that a ca_attach() call failed in
(the middle of) the attach function. With this change,
we can know it for sure.

mailaprint_error*() increments aprint_error_count. It's
the only difference between aprint_normal*() and
aprint_error*(). We couldn't know what driver's attach
function failed from the caller side. And, some drivers'
attach function call aprint_error*() more than one time,
so the value of aprint_error_count is not the same as
the number of ca_attach()'s fail. With this change,
we can know what driver's ca_attach() failed. And,
if we add a new counter like aprint_error_count, we
can know the exact number of ca_attach()'s fail. It
would reduce the existence value of aprint_error*().
Note that small number of device driver call
 

Re: printing aprint_error_count

2016-07-19 Thread Masanobu SAITOH

On 2016/07/15 18:52, Martin Husemann wrote:

On Fri, Jul 15, 2016 at 06:30:23PM +0900, Masanobu SAITOH wrote:

 Or enclose with #ifdef DIAGNOSTIC?


Assuming we do not use aprint_error when there is no "error",


Some drivers use aprint_error() to warn a problem and continue
attaching.


hiding it for
non-DIAGNOSTIC kernels does not make sense to me (at least in the !silent
or !quiet case).


Me neither though some aprint_error() might make user nervous.


I wonder if we should make the error count available via a read-only sysctl,
so userland could act on it later (especially in quiet or silent boots).


 It would be good, but if we change the return value of the
attach function from void to int(or others), it will become
less required.

 I'll commit without #ifdef DIAGNOSTIC.


Martin




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: printing aprint_error_count

2016-07-15 Thread Masanobu SAITOH

On 2016/07/15 16:45, Masanobu SAITOH wrote:

Hi.

 aprint_error*() increments aprint_error_count.
At the end of config_finalize():


KERNEL_UNLOCK_ONE(NULL);

errcnt = aprint_get_error_count();
if ((boothowto & (AB_QUIET|AB_SILENT)) != 0 &&
(boothowto & AB_VERBOSE) == 0) {
mutex_enter(_misc_lock);
if (config_do_twiddle) {
config_do_twiddle = 0;
printf_nolog(" done.\n");
}
mutex_exit(_misc_lock);
if (errcnt != 0) {
printf("WARNING: %d error%s while detecting hardware; "
"check system log.\n", errcnt,
errcnt == 1 ? "" : "s");
}
}
}


I think it's worth to print this message even if AB_QUIET
and AB_SILENT are not set.

Index: subr_autoconf.c
===
RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
retrieving revision 1.243
diff -u -p -r1.243 subr_autoconf.c
--- subr_autoconf.c11 Jul 2016 07:42:13 -1.243
+++ subr_autoconf.c15 Jul 2016 06:43:39 -
@@ -2182,11 +2182,11 @@ config_finalize(void)
 printf_nolog(" done.\n");
 }
 mutex_exit(_misc_lock);
-if (errcnt != 0) {
-printf("WARNING: %d error%s while detecting hardware; "
-"check system log.\n", errcnt,
-errcnt == 1 ? "" : "s");
-}
+}
+if (errcnt != 0) {
+printf("WARNING: %d error%s while detecting hardware; "
+"check system log.\n", errcnt,
+errcnt == 1 ? "" : "s");
 }
 }

I think it's not required to check AB_VERBOSE


 Or enclose with #ifdef DIAGNOSTIC?


because
not so many people set this flag and it makes difficult
to notice problem(s). Is it OK to commit?

 On my machines ichpcib's TCO timer and acpiec count error.


acpiec0 at acpi0 (H_EC, PNP0C09-1)
acpiec0: unable to evaluate _GPE: AE_NOT_FOUND

(snip)

tco0 at ichlpcib0: TCO (watchdog) timer configured.
tco0: TCO timer reboot disabled by hardware; hope SMBIOS properly handles it.






--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



printing aprint_error_count

2016-07-15 Thread Masanobu SAITOH

Hi.

 aprint_error*() increments aprint_error_count.
At the end of config_finalize():


KERNEL_UNLOCK_ONE(NULL);

errcnt = aprint_get_error_count();
if ((boothowto & (AB_QUIET|AB_SILENT)) != 0 &&
(boothowto & AB_VERBOSE) == 0) {
mutex_enter(_misc_lock);
if (config_do_twiddle) {
config_do_twiddle = 0;
printf_nolog(" done.\n");
}
mutex_exit(_misc_lock);
if (errcnt != 0) {
printf("WARNING: %d error%s while detecting hardware; "
"check system log.\n", errcnt,
errcnt == 1 ? "" : "s");
}
}
}


I think it's worth to print this message even if AB_QUIET
and AB_SILENT are not set.

Index: subr_autoconf.c
===
RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
retrieving revision 1.243
diff -u -p -r1.243 subr_autoconf.c
--- subr_autoconf.c 11 Jul 2016 07:42:13 -  1.243
+++ subr_autoconf.c 15 Jul 2016 06:43:39 -
@@ -2182,11 +2182,11 @@ config_finalize(void)
printf_nolog(" done.\n");
}
mutex_exit(_misc_lock);
-   if (errcnt != 0) {
-   printf("WARNING: %d error%s while detecting hardware; "
-   "check system log.\n", errcnt,
-   errcnt == 1 ? "" : "s");
-   }
+   }
+   if (errcnt != 0) {
+   printf("WARNING: %d error%s while detecting hardware; "
+   "check system log.\n", errcnt,
+   errcnt == 1 ? "" : "s");
}
 }

I think it's not required to check AB_VERBOSE because
not so many people set this flag and it makes difficult
to notice problem(s). Is it OK to commit?

 On my machines ichpcib's TCO timer and acpiec count error.


acpiec0 at acpi0 (H_EC, PNP0C09-1)
acpiec0: unable to evaluate _GPE: AE_NOT_FOUND

(snip)

tco0 at ichlpcib0: TCO (watchdog) timer configured.
tco0: TCO timer reboot disabled by hardware; hope SMBIOS properly handles it.



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Changing the return value of xxx_attach() from void to int.

2016-06-23 Thread Masanobu SAITOH

 Hi, all.

 As you know, the return value of device driver's attach function is void.
I've thought that we should change it to int for many years. I believe I'm
not the only person.

 xxx_attach() may fail the following cases:

got unexpected behavior.

resource allocation error.

driver is really broken.

some others.

 xxx_attach() is void, so the caller can't know the fail. It makes some
problems:

a) The OS may touch the broken device while the OS is running.
   It may causes a panic.

b) The shutdown sequence calls xxx_shutdown() even if
   it's not really attached. It may causes panic. We have met
   this bugs many times.

c) To prevent b), we have added extra code into xxx_shutdown().
   It wastes our time and the code become big and complex.

d) Resource leak. For example, A device_t structure is allocated
   by caller side.

 If we change the return value to int or any other type's value, we can
do the following things.

a) Don't register failed device to avoid problem.

b) If we don't do a), we can add code to not to call xxx_shutdown()
   for broken devices instead.

c) We can add the code to fallback to lower priority device driver
  like ukphy(4), ugen(4) or others.

d) (add some other good features?)

 What do you think about this change? If you're OK, I'll change _ALL_ device
drivers' attach function first by the following way:

0) Change return value to int.

1) Change "return;" to "return 0;" or "return -1;"
   (or bool value or Exxx. See below)

2) I won't modify the caller side for the time being.

In this way, it won't break anything because the caller doesn't check the
value. Even if I mistakenly modified the return value and we add code to
check the return value, it won't be a big problem because almost all drivers
don't go into the failure paths. So, I think it's not required to use new
CFATTACH_() because it makes both the driver and the caller side be
complex.

 Now I'm wondering if I should return "-1" or Exxx. FreeBSD returns Exxx
(e.g. ENOMEN, EIO, ENXIO, etc.). I don't know which one is better.

Changing to -1 is easier than Exxx because it's not required
to wonder what Exxx I should choose.

Changing to Exxx may be good if we check the error code and
do something depend on the value though I can't imagine anything now.

-1 and Exxx can be mixed because both are int. We can detect
error with "if (rv != 0)" even if it's mixed.

 Any objection, advice or idea?

 Thanks.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: PCI extended configuration support

2015-10-01 Thread Masanobu SAITOH
On 2015/09/28 22:41, Christos Zoulas wrote:
> On Sep 28,  7:51pm, msai...@execsw.org (Masanobu SAITOH) wrote:
> -- Subject: Re: PCI extended configuration support
> 
> | On 2015/09/27 11:03, Christos Zoulas wrote:
> | > In article <5604d1be.3080...@execsw.org>,
> | > Masanobu SAITOH  <msai...@execsw.org> wrote:
> | >> On 2015/09/11 19:44, Masanobu SAITOH wrote:
> | >>
> | >> More machines support the extended configuration area than before.
> | >>
> | >>   Is it OK to commit?
> | >
> | > LGTM, but since this is not performance critical can't we centralize the
> 
> | > range test and do it in a wrapper?
> |
> |  We have no good idea... :-(
> 
> I meant instead of editing each of the drivers edit the pci_machdep.h file
> of each arch from:
> 
> #define pci_conf_read(c, t, r)  \
>  (*(c)->pc_conf_read)((c)->pc_conf_v, (t), (r))
> #define pci_conf_write(c, t, r, v)  \
>  (*(c)->pc_conf_write)((c)->pc_conf_v, (t), (r), (v))
> 
> to:
> 
> static __inline pcireg_t pci_conf_read(void *_cpv, pcitag_t _tag, int 
> _offset) {
>   if ((unsigned int)_offset >= PCI_CONF_SIZE)
>   return (pcireg_t)-1;
>   return *((c)->pc_conf_read)(
>   ((struct _pci_chipset *)_cpv)->pc_conf_v, _tag, _offset);
> }
> 
> static __inline void pci_conf_write(void *_cpv, pcitag_t _tag, int _offset) {
>   if ((unsigned int)_offset >= PCI_CONF_SIZE)
>   return;
>   return *((c)->pc_conf_write)(
>   ((struct _pci_chipset *)_cpv)->pc_conf_v, _tag, _offset);
> }
> 
> Or you could even make a macro of it in pci_var.h like:
> 
> #define PCI_ARCH_CONF_READ(c, t, r) \
>   do { \
>   if ((unsigned int)(r) >= PCI_CONF_SIZE) \
>   return (pcireg_t) -1; \
>   return *((c)->pc_conf_read)((c), (t), (r)); \
>   while (/*CONSTCOND*/0)
> 
> And then change the macro in arch/include/pci_machdep.h to:
> 
> #define pci_conf_read(c, t, r) PCI_ARCH_CONF_READ(c, t, r)
> 
> Or you could just centralize the machdep macro to pci_var.h, and put it
> around ifdefs so that the machdep code can override it:
> 
> #ifndef pci_conf_read
> #define pci_conf_read(c, t, r) \
>   do { \
>   if ((unsigned int)(r) >= PCI_CONF_SIZE) \
>   return (pcireg_t) -1; \
>   return *((c)->pc_conf_read)((c), (t), (r)); \
>   while (/*CONSTCOND*/0)
> #endif

 The size of accessible area is not fixed and it's machine
(SoC or implement of PCI bus) dependent, so it should be done
in ((c)->pc_conf_read)(). Or use function like:

if ((unsigned int)(r) >= pci_conf_size()) \
return (pcireg_t) -1; \

 pci_conf_read() have to call chip dependent function. I think
it's not good idea to call chip dependent function in pci_conf_size()
and ((c)->pci_conf_read)() twice. And, NOT all archs have pc_conf_read().

 I think writing pci's commom API function with a macro is not good for
kernel module.

> | > Plus this can't be negative so perhaps
> | > it should be unsigned or checked for < 0 too??
> | >
> | > + if (reg >= PCI_CONF_SIZE)
> | > + return (pcireg_t) -1;
> | > +
> |
> |  Done.
> 
> I think you missed some...

 Fixed. Thanks.

> | > Finally a few more constants could be defined instead
> | > of hard-coded values
> | >
> | > + switch (pcie_devtype) {
> | > + case 0x4:   /* Root Port of PCI Express Root Complex */
> | > + case 0xa:   /* Root Complex Event Collector */
> |
> |  Done.
> 
> Thanks!
> 
> christos

 Final diff:

 
http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150929-nbsd-pci-extconf-support.diff


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PCI extended configuration support

2015-09-25 Thread Masanobu SAITOH

On 2015/09/11 19:44, Masanobu SAITOH wrote:

  Hi, Jonathan.

On 2015/09/09 23:02, Jonathan A. Kollasch wrote:

On Tue, Sep 08, 2015 at 12:58:31PM +0900, MAEKAWA Masahide wrote:

On 2015/09/08 0:59, Jonathan A. Kollasch wrote:

   - abuse of the pcitag_t  (I have patches that fix this.)


Your patches? Where? your local?


Attached.  This has been successfully tested on a machine that uses
Mode 2 configuration access.  Mode 2 is very rare nowadays, and is being
misdetected more often now that firmware is placing IO ports up in the
0xC000ish range.  The Mode 2 tag is incapable of representing a complete
PCI Bus,Device,Function tuple, and thus has no place being a generic
representation of a BDF for use by any configuration access method.
This patch makes the x86 pcitag_t a simple BDF.


  Thanks. Why you haven't committed that change? I think it's OK to commit.


   - pci_conf_size() is unnecessary


Why?


Because there's no need to care.  We already support extended
configuration space on some evbmips/arm, and haven't needed it yet.
Just read back 0x and ignore writes if extended configuration is
inaccessible.  No need to bloat the API and kernel.


 You mean the range check should be modified in all MD part?
And also, MI driver should check the ability of accessing extended
configuration area with:

 a) the return value of pci_get_ext_capability()
or
 b) write code in each driver

Only small number of driver will be required to use b), so it might
be OK to provide MI function to do it now.

And,


 - too much ACPI dependence, there's nothing that says you have to use
   ACPI MCFG to get this information


  I think updated patch solved the prolbem (see below)


 - possibility of too much KVA usage on 32-bit kernels

 - on-demand bus mapping may happen in inappropriate (interrupt) context


  In old patch, all bus area which was accessed were mapped and never unmapped.
In updated patch, configuration areas are mapped when a pci bus are attached 
(pci).
And then, if there is no any devices in the bus, it's unmapped. Is it OK?


 - x86 bus_space_read/write functions do not use the %eax/ax/al register,
 as required by the AMD documentation for Family 10h's (and others) extended
 configuration space access.


  We have not known it. Thanks. It's fixed in the new patch.

 
http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150911-nbsd-pci-extconf-support.diff

Could you check again?

  And, I suspect that you have other diffs locally, could we share your diff?


Jonathan Kollasch



Updated patch:

http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150925-nbsd-pci-extconf-support.diff

More machines support the extended configuration area than before.

 Is it OK to commit?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: PCI extended configuration support

2015-09-11 Thread Masanobu SAITOH

 Hi, Jonathan.

On 2015/09/09 23:02, Jonathan A. Kollasch wrote:

On Tue, Sep 08, 2015 at 12:58:31PM +0900, MAEKAWA Masahide wrote:

On 2015/09/08 0:59, Jonathan A. Kollasch wrote:

   - abuse of the pcitag_t  (I have patches that fix this.)


Your patches? Where? your local?


Attached.  This has been successfully tested on a machine that uses
Mode 2 configuration access.  Mode 2 is very rare nowadays, and is being
misdetected more often now that firmware is placing IO ports up in the
0xC000ish range.  The Mode 2 tag is incapable of representing a complete
PCI Bus,Device,Function tuple, and thus has no place being a generic
representation of a BDF for use by any configuration access method.
This patch makes the x86 pcitag_t a simple BDF.


 Thanks. Why you haven't committed that change? I think it's OK to commit.


   - pci_conf_size() is unnecessary


Why?


Because there's no need to care.  We already support extended
configuration space on some evbmips/arm, and haven't needed it yet.
Just read back 0x and ignore writes if extended configuration is
inaccessible.  No need to bloat the API and kernel.


 You mean the range check should be modified in all MD part?
And also, MI driver should check the ability of accessing extended
configuration area with:

a) the return value of pci_get_ext_capability()
or
b) write code in each driver

Only small number of driver will be required to use b), so it might
be OK to provide MI function to do it now.

And,


 - too much ACPI dependence, there's nothing that says you have to use
   ACPI MCFG to get this information


 I think updated patch solved the prolbem (see below)


 - possibility of too much KVA usage on 32-bit kernels

 - on-demand bus mapping may happen in inappropriate (interrupt) context


 In old patch, all bus area which was accessed were mapped and never unmapped.
In updated patch, configuration areas are mapped when a pci bus are attached 
(pci).
And then, if there is no any devices in the bus, it's unmapped. Is it OK?


 - x86 bus_space_read/write functions do not use the %eax/ax/al register,
 as required by the AMD documentation for Family 10h's (and others) extended
 configuration space access.


 We have not known it. Thanks. It's fixed in the new patch.


http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150911-nbsd-pci-extconf-support.diff

Could you check again?

 And, I suspect that you have other diffs locally, could we share your diff?


Jonathan Kollasch




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: Wake-on-LAN no longer working in netbsd-7

2015-09-10 Thread Masanobu SAITOH

On 2015/09/10 17:24, Martin Husemann wrote:

On Thu, Sep 10, 2015 at 10:21:01AM +0200, Marc Balmer wrote:

I used to start a netbsd-6 machine by sending Wake-on-LAN frames wo it.

After upgrading that machine to netbsd-7, it ignores these WoL packages and
does not start.

Its an i386 with bge(4) interface (a HP micro server).


 Which generation? G8 or others?
Could you show me the dmesg?



Something must have changed in how netbsd shutsdown -p a machine.


Check for acpi errors in dmesg and connect a console to check for errors
printed during shutdown. It should end in something like "entering ACPI sleep
state 5".

Martin




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


PCI extended configuration support

2015-09-07 Thread Masanobu SAITOH
 Hi, all.

 nonaka@ wrote code to access PCI extended configuration area.
Currently, the diff supoorts only on x86.


http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/nbsd-pci-extconf-support.diff

Is it OK to commit?

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Tester(s) needed: ixv(4)

2015-08-13 Thread Masanobu SAITOH

On 2015/08/13 18:14, Bert Kiers wrote:

On Thu, Aug 13, 2015 at 02:16:27PM +0900, Masanobu SAITOH wrote:

  Hi, all.

  I've commited change to support ixv(4), Intel 10G Ethernet's virtual
function. The reason why this driver had not been compilable is that
it supports MSI-X only.

  Currently, I have not environment to test this driver. Could some
people who are familiar with it test with the latest -current
(and fix bugs)?


Sounds interesting, but what is it?  Some advanced function of Intel
X540?


 82599 and newer chips has SR-IOV (Single Root I/O Virtualization) function.
It can generate vritual function(VF)s up to 63 VFs. Each VF can pass
through to an guest OS. Link status, statistics and some others are
managed in the physical function(PF) of the device.


I have read
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/ixgbe/ixv.c?only_with_tag=MAIN

but

http://netbsd.gw.com/cgi-bin/man-cgi?ixv++NetBSD-current is empty.


 Because the original FreeBSD's ixgbe/ixv drivers has ixgbe.4 only :-|


I have X540 (and X710), but don't know how to test.


It's required to use MSI/MSI-X function and SR-IOV. Currently,
x86 except XEN support MSI/MSI-X function. And, to control SR-IOV,
it's required to access the PCIe's extended configuration area.
It's not supported in NetBSD yet (though nonaka@ is working on it).
So, we can't use ixv(4) via NetBSD/xen DOM0 :-|
Amazon EC2's C3 and C4 instance support ixv(ixgbevf in linux), but
it can't be used because NetBSD/xen doesn't support MSI/MSI-X.

 NetBSD's ixv(4) can be test with Linux KVM or VMware vSphere.
Some people other than me must be familar with them :)

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Tester(s) needed: ixv(4)

2015-08-12 Thread Masanobu SAITOH
 Hi, all.

 I've commited change to support ixv(4), Intel 10G Ethernet's virtual
function. The reason why this driver had not been compilable is that
it supports MSI-X only.

 Currently, I have not environment to test this driver. Could some
people who are familiar with it test with the latest -current
(and fix bugs)?

 Regards.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: argument of pci_msi[x]_count()

2015-08-10 Thread Masanobu SAITOH
Hi, Christos.

On 2015/08/07 18:53, Christos Zoulas wrote:
 In article 55c349bf.2000...@execsw.org,
 Masanobu SAITOH  msai...@execsw.org wrote:
 Hi, all.

 Currently, pci_msi_count() and pci_msix_count() take one
 pci_attach_args argument.
 These functions may be used in other than attach function. So, it might be
 better to use pci_chipset_tag_t and pcitag_t.

 Is the following diff better than current specification?
 
 Ok, but this makes them different than their alloc counterparts. Should
 we change those too?

Almost all pci related functions don't take pci_attach_args as
an argument. Some functions are required to take it because
some elements(e.g. pa_iot and pa_memt) in the structure are required.

 Before introducing MSI/MSI-X API, the following functions take
pci_attach_args as an argument.

pci_find_device()
pci_mapreg_map()
pci_mapreg_submap()
pci_intr_map()
pci_aprint_devinfo()
pci_attach_hook()

Some above functions refers pa_iot and pa_memt in them.

And then, the following functions which take pci_attach_args
as an argument are added:

pci_intx_alloc()
pci_intr_alloc()
pci_msi_count()
pci_msi_alloc()
pci_msi_alloc_exact()
pci_msix_count()
pci_msix_alloc()
pci_msix_alloc_exact()
pci_msix_alloc_map()

 pci_intr_map() takes pci_attach_args, so it's consistent.
In reality, it's not required for x86 to take pci_attach_args.
The MSI/MSI-X related alloc function might not required
to take pci_attach_args, but I think it's ok to keep current
API for the consistency and possibility of using pci_attach_args
on other archs.

 So, is it ok to change pci_msi[x]_count() only?


 christos
 


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PR/49328 CVS commit: src/sys/dev/pci/ixgbe

2015-02-17 Thread Masanobu SAITOH

On 2015/02/17 19:06, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Wed, 4 Feb 2015, Christos Zoulas wrote:
...

christos


I have tested NetBSD 7.99.x

The problem is now solved. The PR can be closed.


Regards
Uwe


 Thanks. I'll send the pullup requet to netbsd-7.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ixg(4) mutex problem (PR#49328)

2015-02-05 Thread Masanobu SAITOH

On 2015/01/29 16:31, Masanobu SAITOH wrote:
(snip)

  Some of Ethernet drivers avoid using splnet()/splx() and use mutexes.
wm, bnx, ixg, powerpc/booke/dev/pq3etsec.c and others. I think it's better
to know the common solution and puse the way.


 I wrote the following page:


https://wiki.netbsd.org/users/msaitoh/Comparison_of_implementations_of_Ethernet_drivers/

Please add, modify and fix this page :-)

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PR/49328 CVS commit: src/sys/dev/pci/ixgbe

2015-02-03 Thread Masanobu SAITOH
 Hi, all.

 I fixed and committed some smallproblem to -current.

 The rest is:

- Revert ixgbe_netbsd.c rev. 1.2
- make CORE_LOCK adaptive
- Release spin lock while reinitializing the jumb buffer structure.

 Is it OK to commit?


Index: ixgbe.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.c,v
retrieving revision 1.17
diff -u -r1.17 ixgbe.c
--- ixgbe.c 4 Feb 2015 04:37:13 -   1.17
+++ ixgbe.c 4 Feb 2015 04:38:10 -
@@ -3933,12 +3933,16 @@
/* Free current RX buffer structs and their mbufs */
ixgbe_free_receive_ring(rxr);
 
+   IXGBE_RX_UNLOCK(rxr);
+
/* Now reinitialize our supply of jumbo mbufs.  The number
 * or size of jumbo mbufs may have changed.
 */
ixgbe_jcl_reinit(adapter-jcl_head, rxr-ptag-dt_dmat,
2 * adapter-num_rx_desc, adapter-rx_mbuf_sz);
 
+   IXGBE_RX_LOCK(rxr);
+
/* Configure header split? */
if (ixgbe_header_split)
rxr-hdr_split = TRUE;
Index: ixgbe.h
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.h,v
retrieving revision 1.1
diff -u -r1.1 ixgbe.h
--- ixgbe.h 12 Aug 2011 21:55:29 -  1.1
+++ ixgbe.h 4 Feb 2015 04:38:10 -
@@ -476,7 +476,7 @@
 
 
 #define IXGBE_CORE_LOCK_INIT(_sc, _name) \
-mutex_init((_sc)-core_mtx, MUTEX_DEFAULT, IPL_NET)
+mutex_init((_sc)-core_mtx, MUTEX_DEFAULT, IPL_SOFTNET)
 #define IXGBE_CORE_LOCK_DESTROY(_sc)  mutex_destroy((_sc)-core_mtx)
 #define IXGBE_TX_LOCK_DESTROY(_sc)mutex_destroy((_sc)-tx_mtx)
 #define IXGBE_RX_LOCK_DESTROY(_sc)mutex_destroy((_sc)-rx_mtx)
Index: ixgbe_netbsd.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe_netbsd.c,v
retrieving revision 1.2
diff -u -r1.2 ixgbe_netbsd.c
--- ixgbe_netbsd.c  28 Jan 2015 00:30:25 -  1.2
+++ ixgbe_netbsd.c  4 Feb 2015 04:38:10 -
@@ -56,7 +56,7 @@
 
*dtp = NULL;
 
-   if ((dt = kmem_zalloc(sizeof(*dt), KM_NOSLEEP)) == NULL)
+   if ((dt = kmem_zalloc(sizeof(*dt), KM_SLEEP)) == NULL)
return ENOMEM;
 
dt-dt_dmat = dmat;
@@ -136,19 +136,19 @@
ixgbe_extmem_t *em;
int nseg, rc;
 
-   em = kmem_zalloc(sizeof(*em), KM_NOSLEEP);
+   em = kmem_zalloc(sizeof(*em), KM_SLEEP);
 
if (em == NULL)
return NULL;
 
rc = bus_dmamem_alloc(dmat, size, PAGE_SIZE, 0, em-em_seg, 1, nseg,
-   BUS_DMA_NOWAIT);
+   BUS_DMA_WAITOK);
 
if (rc != 0)
goto post_zalloc_err;
 
rc = bus_dmamem_map(dmat, em-em_seg, 1, size, em-em_vaddr,
-   BUS_DMA_NOWAIT);
+   BUS_DMA_WAITOK);
 
if (rc != 0)
goto post_dmamem_err;


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PR/49328 CVS commit: src/sys/dev/pci/ixgbe

2015-02-03 Thread Masanobu SAITOH
On 2015/02/04 14:08, Christos Zoulas wrote:
 On Feb 4,  1:49pm, msai...@execsw.org (Masanobu SAITOH) wrote:
 -- Subject: Re: PR/49328 CVS commit: src/sys/dev/pci/ixgbe
 
 |  Hi, all.
 |
 |  I fixed and committed some smallproblem to -current.
 |
 |  The rest is:
 |
 | - Revert ixgbe_netbsd.c rev. 1.2
 | - make CORE_LOCK adaptive
 | - Release spin lock while reinitializing the jumb buffer structure.
 |
 |  Is it OK to commit?
 
 Does it work with LOCKDEBUG?

 Yes, it works with LOCKDEBUG.

ifconfig up works
ifconfig down works
ifconfig up - down repeatedly works while forwarding works
reboot works
reboot while forwarding works

If yes, go for it.
 
 christos
 


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: if_wm between netbsd-6 and netbsd-7 issue

2015-02-03 Thread Masanobu SAITOH

On 2015/01/29 20:39, Edgar Fuß wrote:

EF I even get a message Please update the Bootagent, whatever that means,
EF exactly.
MS That code is a workaround code for old boot rom which doesn't release
MS semaphore.
Ah, thanks. Do you happen to know how to perform that update using NetBSD?


bootutil.exe? though I've not used.

https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=19186

 Andother solution is to make that message DPRINTF()-based and
forget about it :-)
That message was taken from FreeBSD and it uses DEBUGOUT():


http://svnweb.freebsd.org/base/head/sys/dev/e1000/e1000_82571.c?annotate=267935#l431


MS I have commited the change now. Thank you all.
Thank you for fixing it in the first place!


 You're welcome.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


ixg(4) mutex problem (PR#49328)

2015-01-29 Thread Masanobu SAITOH

 Hi.

 Uwe Toenjes found a bug in ixg(4). See:

http://mail-index.netbsd.org/current-users/2014/10/11/msg025932.html

Then, PR#49328 was submitted by him.

http://gnats.netbsd.org/49328

This problem is can be reproduced when

LOCKDEBUG is defined
ifconfig ixg0 up

ixg(4) take a spin mutex before accessing MAC core:

-- ixgbe.h ---
#define IXGBE_CORE_LOCK_INIT(_sc, _name) \
mutex_init((_sc)-core_mtx, MUTEX_DEFAULT, IPL_NET)

--- ixgbe.c::ixbge_ioctl() --
case SIOCADDMULTI:
case SIOCDELMULTI:
case SIOCSIFFLAGS:
case SIOCSIFMTU:
default:
if ((error = ether_ioctl(ifp, command, data)) != ENETRESET)
return error;
if ((ifp-if_flags  IFF_RUNNING) == 0)
;
else if (command == SIOCSIFCAP || command == SIOCSIFMTU) {
IXGBE_CORE_LOCK(adapter);
ixgbe_init_locked(adapter);
IXGBE_CORE_UNLOCK(adapter);

In this case, IXGBE_CORE_LOCK() is used to prevent interrupt.

I borrowed an old Intel 10G card(82598 based) from my colleague and tested.
I got a panic like as follows:


Mutex error: lockdebug_barrier: spin lock held

lock address : 0x80001d0eaa08 type :   spin
initialized  : 0x805cb716
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  0
current cpu  :  7 last held:  7
current lwp  : 0xfe840d2049c0 last held: 0xfe840d2049c0
last locked* : 0x805ca63e unlocked : 00
owner field  : 0x00010600 wait/spin:0/1

panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
cpu7: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
mutex_enter() at netbsd:mutex_enter+0x45e
pmap_growkernel() at netbsd:pmap_growkernel+0x2d
uvm_map_prepare() at netbsd:uvm_map_prepare+0x24d
uvm_map() at netbsd:uvm_map+0x97
uvm_km_alloc() at netbsd:uvm_km_alloc+0xec
_bus_dmamem_map.isra.7() at netbsd:_bus_dmamem_map.isra.7+0x69
ixgbe_jcl_reinit() at netbsd:ixgbe_jcl_reinit+0x180
ixgbe_init_locked() at netbsd:ixgbe_init_locked+0x510
ixgbe_init() at netbsd:ixgbe_init+0x22
ether_ioctl() at netbsd:ether_ioctl+0xc8
ixgbe_ioctl() at netbsd:ixgbe_ioctl+0x41
in6_update_ifa1() at netbsd:in6_update_ifa1+0x70e
in6_update_ifa() at netbsd:in6_update_ifa+0x36
in6_ifattach_linklocal() at netbsd:in6_ifattach_linklocal+0x2eb
in6_ifattach() at netbsd:in6_ifattach+0x15d
in6_if_up() at netbsd:in6_if_up+0xf
ifioctl_common() at netbsd:ifioctl_common+0x343
ether_ioctl() at netbsd:ether_ioctl+0x162
ixgbe_ioctl() at netbsd:ixgbe_ioctl+0x41
doifioctl() at netbsd:doifioctl+0x493
soo_ioctl() at netbsd:soo_ioctl+0x2af
sys_ioctl() at netbsd:sys_ioctl+0x17e
syscall() at netbsd:syscall+0x9a
--- syscall (number 54) ---
7f7ff6cced0a:
cpu7: End traceback...
--

 ifconfig ixg0 up
 - ioctl()
   - take spin mutex
   - alloc a lot of dmamem and dmamap.
   - it got a shortage of the kernel address space and called pmap_growkernel()
 - tried to get adaptive mutex.
   - panic

 My stack trace is different from Uew's traces, but the root of the
problem is the same.

 How should we fix this problem?

 Some of Ethernet drivers avoid using splnet()/splx() and use mutexes.
wm, bnx, ixg, powerpc/booke/dev/pq3etsec.c and others. I think it's better
to know the common solution and puse the way.

 I'm not so familiar with mutex and I have no experience using mutex...
Any patches are welcome. I'll try them if you send me :-)

 Thanks.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: if_wm between netbsd-6 and netbsd-7 issue

2015-01-28 Thread Masanobu SAITOH

On 2015/01/29 3:16, Edgar Fuß wrote:

Could you try the following patch?

Ah, looks much better, thanks!
I haven't tried to actually use the interface, though.

I even get a message Please update the Bootagent, whatever that means, 
exactly.


 That code is a workaround code for old boot rom which doesn't release 
semaphore.

 I have commited the change now. Thank you all.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: if_wm between netbsd-6 and netbsd-7 issue

2015-01-27 Thread Masanobu SAITOH

Hi, all


On 2015/01/12 22:38, Edgar Fuß wrote:

I have a similar problem with a IBM PRO/1000 PT Dual Port Server Adapter Low 
Profile - 39Y6128 (in a HP MicroServer) that was at least recognized with 6, but on 
7, I get:

wm0 at pci2 dev 0 function 0: Intel PRO/1000 PT (82571EB) (rev. 0x06)
wm0: interrupting at ioapic0 pin 18
wm0: PCI-Express bus
wm0: could not acquire SWSM SMBI
wm0: wm_nvm_acquire: failed to get semaphore
wm0: could not acquire SWSM SMBI
wm0: wm_nvm_acquire: failed to get semaphore
wm0: No EEPROM
wm0: unable to read Ethernet address
wm1 at pci2 dev 0 function 1: Intel PRO/1000 PT (82571EB) (rev. 0x06)
wm1: interrupting at ioapic0 pin 19
wm1: PCI-Express bus
wm1: could not acquire SWSM SMBI
wm1: wm_nvm_acquire: failed to get semaphore
wm1: could not acquire SWSM SMBI
wm1: wm_nvm_acquire: failed to get semaphore
wm1: No EEPROM
wm1: unable to read Ethernet address


Could you try the following patch?

Index: if_wm.c
===
RCS file: /cvsroot/src/sys/dev/pci/if_wm.c,v
retrieving revision 1.309
diff -u -r1.309 if_wm.c
--- if_wm.c 16 Jan 2015 10:36:14 -  1.309
+++ if_wm.c 27 Jan 2015 09:11:03 -
@@ -1844,7 +1844,7 @@
case WM_T_82571:
case WM_T_82572:
reg = CSR_READ(sc, WMREG_SWSM2);
-   if ((reg  SWSM2_LOCK) != 0) {
+   if ((reg  SWSM2_LOCK) == 0) {
CSR_WRITE(sc, WMREG_SWSM2, reg | SWSM2_LOCK);
force_clear_smbi = true;
} else



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ixg(4) performances

2014-09-03 Thread Masanobu SAITOH

On 2014/09/04 0:40, Emmanuel Dreyfus wrote:

On Wed, Sep 03, 2014 at 04:11:29PM +0200, Bert Kiers wrote:

NetBSD 6.1 says:
vendor 0x8086 product 0x1528 (ethernet network, revision 0x01) at pci1 dev 0 
function 0 not configured


In src/sys/dev/pci/ixgbe/ we know about producct Id 0x1529 and 0x152A but
not 0x1528. But this can probably be easily borrowed from FreeBSD:
http://svnweb.freebsd.org/base/head/sys/dev/ixgbe/

They call it IXGBE_DEV_ID_X540T, You can try to add in ixgbe_type.h:
#define IXGBE_DEV_ID_82599_X540T 0x1528

Then in ixgbe.c add a IXGBE_DEV_ID_82599_X540T line in
ixgbe_vendor_info_array[]

In ixgbe_82599.c you need a case IXGBE_DEV_ID_82599_X540T
next to case IXGBE_DEV_ID_82599_BACKPLANE_FCOE if media
is indeed backplane. Otherwise add it at the appropriate place
in the switch statement.

And finally you need to add a case IXGBE_DEV_ID_82599_X540T
next to case IXGBE_DEV_ID_82599_BACKPLANE_FCOE in ixgbe_api.c


Our ixg(4) driver doesn't support X520. At least there is no
file ixgbe/ixgbe_x540.c of FreeBSD.


Bus FreeBSD NetBSD
 82597  PCI-X   ixgbdge
 82598  PCIeixgbe   ixg
 82599(X520)PCIeixgbe   ixg
 X540   PCIeixgbe   (ixg)(not yet)


--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ixg(4) performances

2014-09-03 Thread Masanobu SAITOH

On 2014/09/04 11:24, Masanobu SAITOH wrote:

On 2014/09/04 0:40, Emmanuel Dreyfus wrote:

On Wed, Sep 03, 2014 at 04:11:29PM +0200, Bert Kiers wrote:

NetBSD 6.1 says:
vendor 0x8086 product 0x1528 (ethernet network, revision 0x01) at pci1 dev 0 
function 0 not configured


In src/sys/dev/pci/ixgbe/ we know about producct Id 0x1529 and 0x152A but
not 0x1528. But this can probably be easily borrowed from FreeBSD:
http://svnweb.freebsd.org/base/head/sys/dev/ixgbe/

They call it IXGBE_DEV_ID_X540T, You can try to add in ixgbe_type.h:
#define IXGBE_DEV_ID_82599_X540T 0x1528

Then in ixgbe.c add a IXGBE_DEV_ID_82599_X540T line in
ixgbe_vendor_info_array[]

In ixgbe_82599.c you need a case IXGBE_DEV_ID_82599_X540T
next to case IXGBE_DEV_ID_82599_BACKPLANE_FCOE if media
is indeed backplane. Otherwise add it at the appropriate place
in the switch statement.

And finally you need to add a case IXGBE_DEV_ID_82599_X540T
next to case IXGBE_DEV_ID_82599_BACKPLANE_FCOE in ixgbe_api.c


Our ixg(4) driver doesn't support X520. At least there is no


s/X520/X540/


file ixgbe/ixgbe_x540.c of FreeBSD.


 BusFreeBSD   NetBSD
  82597PCI-Xixgbdge
  82598PCIeixgbeixg
  82599(X520)PCIeixgbeixg
  X540PCIeixgbe(ixg)(not yet)





--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: FW: ixg(4) performances

2014-08-31 Thread Masanobu SAITOH
Hi, Emmanuel.

On 2014/09/01 11:10, Emmanuel Dreyfus wrote:
 Terry Moore t...@mcci.com wrote:
 
 Since you did a dword read, the extra 0x9 is the device status register.
 This makes me suspicious as the device status register is claiming that you
 have unsupported request detected) [bit 3] and correctable error
 detected [bit 0].  Further, this register is RW1C for all these bits -- so
 when you write 94810, it should have cleared the 9 (so a subsequent read
 should have returned 4810).

 Please check.
 
 You are right;
 # pcictl /dev/pci5 read -d 0 -f 1 0xa8
 00092810
 # pcictl /dev/pci5 write -d 0 -f 1 0xa8 0x00094810
 # pcictl /dev/pci5 read -d 0 -f 1 0xa8
 4810
 
 Might be good to post a pcictl dump of your device, just to expose all the
 details.
 
 It explicitely says 2.5 Gb/s x 8 lanes
 
 # pcictl /dev/pci5 dump -d0 -f 1
 PCI configuration registers:
Common header:
  0x00: 0x10fb8086 0x00100107 0x0201 0x00800010
 
  Vendor Name: Intel (0x8086)
  Device Name: 82599 (SFI/SFP+) 10 GbE Controller (0x10fb)
  Command register: 0x0107
I/O space accesses: on
Memory space accesses: on
Bus mastering: on
Special cycles: off
MWI transactions: off
Palette snooping: off
Parity error checking: off
Address/data stepping: off
System error (SERR): on
Fast back-to-back transactions: off
Interrupt disable: off
  Status register: 0x0010
Interrupt status: inactive
Capability List support: on
66 MHz capable: off
User Definable Features (UDF) support: off
Fast back-to-back capable: off
Data parity error detected: off
DEVSEL timing: fast (0x0)
Slave signaled Target Abort: off
Master received Target Abort: off
Master received Master Abort: off
Asserted System Error (SERR): off
Parity error detected: off
  Class Name: network (0x02)
  Subclass Name: ethernet (0x00)
  Interface: 0x00
  Revision ID: 0x01
  BIST: 0x00
  Header Type: 0x00+multifunction (0x80)
  Latency Timer: 0x00
  Cache Line Size: 0x10
 
Type 0 (normal device) header:
  0x10: 0xdfe8000c 0x 0xbc01 0x
  0x20: 0xdfe7c00c 0x 0x 0x00038086
  0x30: 0x 0x0040 0x 0x0209
 
  Base address register at 0x10
type: 64-bit prefetchable memory
base: 0xdfe8, not sized
  Base address register at 0x18
type: i/o
base: 0xbc00, not sized
  Base address register at 0x1c
not implemented(?)
  Base address register at 0x20
type: 64-bit prefetchable memory
base: 0xdfe7c000, not sized
  Cardbus CIS Pointer: 0x
  Subsystem vendor ID: 0x8086
  Subsystem ID: 0x0003
  Expansion ROM Base Address: 0x
  Capability list pointer: 0x40
  Reserved @ 0x38: 0x
  Maximum Latency: 0x00
  Minimum Grant: 0x00
  Interrupt pin: 0x02 (pin B)
  Interrupt line: 0x09
 
Capability register at 0x40
  type: 0x01 (Power Management, rev. 1.0)
Capability register at 0x50
  type: 0x05 (MSI)
Capability register at 0x70
  type: 0x11 (MSI-X)
Capability register at 0xa0
  type: 0x10 (PCI Express)
 
PCI Message Signaled Interrupt
  Message Control register: 0x0180
MSI Enabled: no
Multiple Message Capable: no (1 vector)
Multiple Message Enabled: off (1 vector)
64 Bit Address Capable: yes
Per-Vector Masking Capable: yes
  Message Address (lower) register: 0x
  Message Address (upper) register: 0x
  Message Data register: 0x
  Vector Mask register: 0x
  Vector Pending register: 0x
 
PCI Power Management Capabilities Register
  Capabilities register: 0x4823
Version: 1.2
PME# clock: off
Device specific initialization: on
3.3V auxiliary current: self-powered
D1 power management state support: off
D2 power management state support: off
PME# support: 0x09
  Control/status register: 0x2000
Power state: D0
PCI Express reserved: off
No soft reset: off
PME# assertion disabled
PME# status: off
 
PCI Express Capabilities Register
  Capability version: 2
  Device type: PCI Express Endpoint device
  Interrupt Message Number: 0
  Link Capabilities Register: 0x00027482
Maximum Link Speed: unknown 2 value
Maximum Link Width: x8 lanes
Port Number: 0
  Link Status Register: 0x1081
Negotiated Link Speed: 2.5Gb/s
*

Which Version of NetBSD are you using?

 I committed some changes fixing Gb/s to GT/s in pci_sbur.c.
It was in April, 2013. I suspect you are using netbsd-6, or
you are using -current with old /usr/lib/libpci.so.


  

Re: Marvell 88SE9230 AHCI?

2014-07-23 Thread Masanobu SAITOH

On 2014/07/24 1:31, Manuel Bouyer wrote:

On Wed, Jul 23, 2014 at 08:46:34AM -0400, Thor Lancelot Simon wrote:

I just ordered a Supermicro low-power rackmount x86 machine which -- I
discovered after ordering it -- uses a Marvell 88SE9230 as a SATA controller.

From FreeBSD's ahci.c, I see this is approximately an AHCI compatible
controller but that it has some quirks.  Is anyone using NetBSD with this
controller or the similar 88SE9128, 9128, or 9130?


There is an entry 88SE9128 in netbsd-6's ahcisata_pci.c, so at last this
one works I guess.


 I have SuperMicro X9SBAA-F which has 88SE9128.
I tested with two different disks. One works fine
but another didn't work with IDENTIFY faied message.
I've not investigated the detail yet.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Marvell 88SE9230 AHCI?

2014-07-23 Thread Masanobu SAITOH

On 2014/07/24 14:16, Masanobu SAITOH wrote:

On 2014/07/24 1:31, Manuel Bouyer wrote:

On Wed, Jul 23, 2014 at 08:46:34AM -0400, Thor Lancelot Simon wrote:

I just ordered a Supermicro low-power rackmount x86 machine which -- I
discovered after ordering it -- uses a Marvell 88SE9230 as a SATA controller.

From FreeBSD's ahci.c, I see this is approximately an AHCI compatible
controller but that it has some quirks.  Is anyone using NetBSD with this
controller or the similar 88SE9128, 9128, or 9130?


There is an entry 88SE9128 in netbsd-6's ahcisata_pci.c, so at last this
one works I guess.


  I have SuperMicro X9SBAA-F which has 88SE9128.


 s/88SE9128/88SE9230/


I tested with two different disks. One works fine
but another didn't work with IDENTIFY faied message.
I've not investigated the detail yet.




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Marvell 88SE9230 AHCI?

2014-07-23 Thread Masanobu SAITOH

On 2014/07/24 14:17, Masanobu SAITOH wrote:

On 2014/07/24 14:16, Masanobu SAITOH wrote:

On 2014/07/24 1:31, Manuel Bouyer wrote:

On Wed, Jul 23, 2014 at 08:46:34AM -0400, Thor Lancelot Simon wrote:

I just ordered a Supermicro low-power rackmount x86 machine which -- I
discovered after ordering it -- uses a Marvell 88SE9230 as a SATA controller.

From FreeBSD's ahci.c, I see this is approximately an AHCI compatible
controller but that it has some quirks.\


I noticed that quirk, too. I checked their change one month ago.
Perhaps the following changes might help:


http://svnweb.freebsd.org/base/head/sys/dev/ahci/ahci.c?r1=85r2=224498


 Is anyone using NetBSD with this
controller or the similar 88SE9128, 9128, or 9130?


There is an entry 88SE9128 in netbsd-6's ahcisata_pci.c, so at last this
one works I guess.


  I have SuperMicro X9SBAA-F which has 88SE9128.


  s/88SE9128/88SE9230/


I tested with two different disks. One works fine
but another didn't work with IDENTIFY faied message.
I've not investigated the detail yet.







--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


AsiaBSDCon 2014 WIP session: Improving MII PHY

2014-03-19 Thread Masanobu SAITOH
 And,

 I put my slide of AsiaBSDCon 2014 Working In Progress session at:

http://www.netbsd.org/~msaitoh/Improving_MII_PHY.pdf

Thanks.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: 4byte aligned com(4) and PCI_MAPREG_TYPE_MEM

2014-02-16 Thread Masanobu SAITOH

Hi.

(2014/02/15 1:58), Izumi Tsutsui wrote:

I'd suggest to clarify what problems you are trying to solve
and consider how it should be solved, before updating your patch.

The problems you mentioned are:
(1) merge initialization of sparse register mappings (with 4 byte stride)


Right.


(2) defer consinit() for puc com to use uvm_km_alloc() in it


Right though the real goal is not to defer consinit but to support
memory mapped I/O in com(4).

And,

  (3) Almost all drivers don't clear struct com_regs regs. It's not
 a real bug but should be cleard for further change.


Your patch is trying to solve them by:
(1) change COM_INIT_REGS() (and comprobe1()) APIs to pass stride byte
(2) add an MI extent_initted global variable and check it in MD consinit()

My vote is:
(1) leave existing APIs and handle the quirk in MD attachment
(2) add an x86 specific MD variable to defer consinit() till cpu_startup()


 I think static int iniited variable is used to do that, but there
is no way to know whether extent_init() (or uvm_init())  is called
or not.


Because:
(1)
  - it's really pain to change the MI APIs
(so many attachments and most of them will rarely be tested unfortunately)
  - only three or four attachments can share the new API
while such embedded devices often might have more other quirks


 Before writing that patch, I wrote another patch which did't change
the arguments of COM_INIT_REGS(). We have two choices

a) Use COM_INIT_REGS() for all drivers.
b) Copy COM_INIT_REGS() and modify to support 4 byte stride.

I couldn't decice which one was better or not. Now I think not change
the argument and make a new macro is better than adding new argument,
because considering the byte order is not required for 1 byte slide
devices.


  - even if stride handling is really necessary in MI part,
it's much better to prepare new wrap functions,
like wdcprobe1() and wdcprobe() in wdc.c
(i.e. prepare a new COM_INIT_REGS_STRIDE() macro with a new arg
 and make exiting COM_INIT_REGS() macro use it)

(2)
  - it's unclear what functions actually require the extent_init()
(I guess uvm_init() is enough to call uvm_km_alloc())


 Me neither... I had thought that uvm_init() was enough, but someone
advived me that exten_init() should be called.


  - in general MI code assumes that console devices are properly
mapped by MD bootstrap or its firmware


 Yes. But, it's little hard to make such mapping as PCI bus_space_tag
and handle in eary boot stage.


  - some ports already has MD flags to switch malloc(9) or static
memory in early device mappings and initialization


 It can be used, but IMHO, checking whether bus_space_map() can be
called is more generic than it.



Just my two cents,

---
Izumi Tsutsui



 Please give me a little time to write another patch.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: RFC: stop having a single global page size

2014-01-31 Thread Masanobu SAITOH

(2014/01/31 18:14), Martin Husemann wrote:

On Fri, Jan 31, 2014 at 12:59:15AM -0800, Matt Thomas wrote:

Why would anyone want this?  Say you have a system in which the MMU can have
per translation table page sizes.  a 16KB page size might be desirable for the
kernel and for LP64 processes.  If you are running a ILP32 process, possibly
of an older architecture, you might want to use a 4KB page size.


I guess another usage is a (huge) framebuffer/aperture mmap'd by the
userland driver, where you could easily save a lot TLB entries by using
1 MB pages (or bigger) instead of 4/8k ones. I think Solaris does this.

Martin


 and so called the super pages that FreeBSD does.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ~5 percent kernel performance loss in the last 3 weeks

2013-11-19 Thread Masanobu SAITOH

(2013/11/20 0:02), Christoph Badura wrote:

After updating my -current kernel from 6.99.24 to 6.99.27 so I could
commit my ubsec(4) changes I noticed that under 6.99.27 I get between
3 and 8 percent less throughput on accelerated crypto ops.

Note that I am using the exact same ubsec(4) code[1] with both kernels, so
I think it is unlikely a problem with ubsec(4).
I did not change userland.

The old kernel is 6.99.24 from Oct, 27th and the 6.99.27 is from Nov, 17th.
The machine is an ML110 G6 w/32G RAM and Intel Xeon X3430 4 core @2.4 GHz
running amd64.

There's nothing obvious to me in the dmesg diff that would give me a clue:

--- dmesg.6.99.24   2013-11-19 15:44:24.0 +0100
+++ dmesg.6.99.27   2013-11-19 15:39:29.0 +0100
@@ -4,7 +4,7 @@
  Copyright (c) 1982, 1986, 1989, 1991, 1993
  The Regents of the University of California.  All rights reserved.

-NetBSD 6.99.24 (GENERIC) #4: Mon Oct 28 18:58:32 CET 2013
+NetBSD 6.99.27 (GENERIC) #9: Sun Nov 17 17:47:24 CET 2013

bad@flexible-demeanour:/home/bad/work/nb/src/sys/arch/amd64/compile/GENERIC
  total memory = 32759 MB
  avail memory = 31792 MB
@@ -133,11 +133,14 @@
  acpicpu0: T5: FFH, lat   1 us, pow   285 mW,  38 %
  acpicpu0: T6: FFH, lat   1 us, pow   190 mW,  25 %
  acpicpu0: T7: FFH, lat   1 us, pow95 mW,  13 %
+coretemp0 at cpu0: thermal sensor, 1 C resolution
  acpicpu1 at cpu1: ACPI CPU
+coretemp1 at cpu1: thermal sensor, 1 C resolution
  acpicpu2 at cpu2: ACPI CPU
+coretemp2 at cpu2: thermal sensor, 1 C resolution
  acpicpu3 at cpu3: ACPI CPU
+coretemp3 at cpu3: thermal sensor, 1 C resolution
  timecounter: Timecounter clockinterrupt frequency 100 Hz quality 0
-timecounter: Timecounter TSC frequency 2394079440 Hz quality 3000


 It's strange.

Could you revert the following change and test again?

and, show me the output of cpuctl identify 0

http://mail-index.netbsd.org/source-changes/2013/11/15/msg049232.html

Module Name:src
Committed By:   msaitoh
Date:   Fri Nov 15 08:47:55 UTC 2013

Modified Files:
src/sys/arch/x86/acpi: acpi_cpu_md.c
src/sys/arch/x86/include: specialreg.h
src/sys/arch/x86/pci: amdtemp.c
src/sys/arch/x86/x86: coretemp.c cpu.c cpu_topology.c cpu_ucode_amd.c
cpu_ucode_intel.c est.c identcpu.c intel_busclock.c lapic.c odcm.c
patch.c powernow.c tprof_amdpmi.c tprof_pmi.c tsc.c viac7temp.c
src/usr.sbin/cpuctl/arch: i386.c

Log Message:
 Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid)(not changed)

CPUID_TO_FAMILY(cpuid)  (new)
CPUID_TO_MODEL(cpuid)   (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid)  (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid)   (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid)   (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid)(The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html


To generate a diff of this commit:
cvs rdiff -u -r1.72 -r1.73 src/sys/arch/x86/acpi/acpi_cpu_md.c
cvs rdiff -u -r1.71 -r1.72 src/sys/arch/x86/include/specialreg.h
cvs rdiff -u -r1.17 -r1.18 src/sys/arch/x86/pci/amdtemp.c
cvs rdiff -u -r1.30 -r1.31 src/sys/arch/x86/x86/coretemp.c
cvs rdiff -u -r1.105 -r1.106 src/sys/arch/x86/x86/cpu.c
cvs rdiff -u -r1.7 -r1.8 src/sys/arch/x86/x86/cpu_topology.c \
src/sys/arch/x86/x86/powernow.c
cvs rdiff -u -r1.6 -r1.7 src/sys/arch/x86/x86/cpu_ucode_amd.c \
src/sys/arch/x86/x86/viac7temp.c
cvs rdiff -u -r1.3 -r1.4 src/sys/arch/x86/x86/cpu_ucode_intel.c \
src/sys/arch/x86/x86/tprof_amdpmi.c
cvs rdiff -u -r1.27 -r1.28 src/sys/arch/x86/x86/est.c
cvs rdiff -u -r1.37 -r1.38 src/sys/arch/x86/x86/identcpu.c
cvs rdiff -u -r1.16 -r1.17 src/sys/arch/x86/x86/intel_busclock.c
cvs rdiff -u -r1.46 -r1.47 src/sys/arch/x86/x86/lapic.c
cvs rdiff -u -r1.2 -r1.3 src/sys/arch/x86/x86/odcm.c
cvs rdiff -u -r1.21 -r1.22 src/sys/arch/x86/x86/patch.c
cvs rdiff -u -r1.12 -r1.13 src/sys/arch/x86/x86/tprof_pmi.c
cvs rdiff -u -r1.32 -r1.33 src/sys/arch/x86/x86/tsc.c
cvs rdiff -u -r1.49 -r1.50 src/usr.sbin/cpuctl/arch/i386.c







  uhub0 at usb0: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
  uhub0: 2 ports with 2 removable, self powered
  uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
@@ -175,7 +178,9 @@
  audio0 at pad0: half duplex, playback, capture
  boot device: wd0
  root on wd0a dumps on wd0b
+/: replaying log to memory
  root file system type: ffs
+/: replaying log to disk
  ipmi0: version 2.0 interface KCS iobase 0xca2/2 spacing 1
  wsdisplay0: screen 1 added (80x25, vt100 emulation)
  wsdisplay0: screen 2 added (80x25, vt100 emulation)

Results from openssl speed -evp 

Re: broadcom 57766 and jumbo frames

2013-07-01 Thread Masanobu SAITOH

Hello, Emmanuel.

(2013/06/28 19:31), Emmanuel Dreyfus wrote:

Hi

New iMac have a broadcom 57766, which is handled out of the box by the
bge(4) driver in -current. The driver almosts treats it as a 57765,
which seems almost correct when looking at Linux driver contributed
by Broadcom:
http://code.google.com/p/linux-picosam9g45/source/diff?spec=svn55086ad95d740577def0b4e6ecc2c0ae9b0d6decr=55086ad95d740577def0b4e6ecc2c0ae9b0d6decformat=sidepath=/drivers/net/ethernet/broadcom/tg3.c

Unfortunately, jumbo frames do not work either on 57766 nor on 57765
on NetBSD. I tried telling it was jumbo capable without much success.
I also tried to incoporate some tweaks from the Linux commit above,
but it does not help.

It seems impossible to send or receive anything beyond around 1500
bytes with mtu set to 9000. I note that ifconfig still reports ec_enabled=0
I wonder if it is a problem with 57766 specific adjustements, or if it is the
bge(4) driver itelf that is jumbo frame incapable for now.

Anyone had jumbo frames actually working for another bge(4) board?


 As I wrote about wm(4) a few weeks ago, I don't use jumbo frame
function, so I don't know whether other bge's jumb frame function
work or not. Sorry :-|


 I have no any 577xx card, but I have something come to mind and it's
 easy to test, so I'll test with my cards.


 Could you compile a kernel with BGE_DEBUG and show me the dmesg?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Another MII PHY fix.

2013-06-06 Thread Masanobu SAITOH

 Hi, all.

 Without following patch. Some drivers which don't use link
interrput check the link status only every sc-mii_anegticks.
It's late. The patch is a part of FreeBSD's mii_physubr.c r158649:

 
http://svnweb.freebsd.org/base/head/sys/dev/mii/mii_physubr.c?r1=150756r2=158649

 Any objection to commit?

Index: mii_physubr.c
===
RCS file: /cvsroot/src/sys/dev/mii/mii_physubr.c,v
retrieving revision 1.76
diff -u -r1.76 mii_physubr.c
--- mii_physubr.c   6 Jun 2013 03:10:48 -   1.76
+++ mii_physubr.c   6 Jun 2013 06:07:47 -
@@ -324,23 +324,43 @@
/*
 * If we're not doing autonegotiation, we don't need to do
 * any extra work here.  However, we need to check the link
-* status so we can generate an announcement if the status
-* changes.
+* status so we can generate an announcement by returning
+* with 0 if the status changes.
 */
if ((IFM_SUBTYPE(ife-ifm_media) != IFM_AUTO) 
-   (IFM_SUBTYPE(ife-ifm_media) != IFM_1000_T))
+   (IFM_SUBTYPE(ife-ifm_media) != IFM_1000_T)) {
+   /*
+* Reset autonegotiation timer to 0 to make sure
+* the future autonegotiation start with 0.
+*/
+   sc-mii_tick = 0;
return (0);
+   }
 
/* Read the status register twice; BMSR_LINK is latch-low. */
reg = PHY_READ(sc, MII_BMSR) | PHY_READ(sc, MII_BMSR);
if (reg  BMSR_LINK) {
/*
-* See above.
+* Reset autonegotiation timer to 0 in case the link is down
+* in the next tick.
 */
+   sc-mii_tick = 0;
+   /* See above. */
return (0);
}
 
/*
+* mii_tick == 0 means it's the first tick after changing the media or
+* the link became down since the last tick (see above), so return with
+* 0 to update the status.
+*/
+   if (sc-mii_ticks == 0)
+   return (0);
+
+   /* Now increment the tick */
+   sc-mii_ticks++;
+
+   /*
 * Only retry autonegotiation every N seconds.
 */
KASSERT(sc-mii_anegticks != 0);


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mii_ticks = 0 problem

2013-06-05 Thread Masanobu SAITOH

(2013/06/03 11:41), Masanobu SAITOH wrote:

  Hi.

  The following patch fix a problem that a lot of phys misunderstand
the auto negotiation timer. Usually the autonegotiaton timer is set
to 10(MII_ANEGTICKS_GIGE), but a lot of dirver don't work as expected.

e.g.:

 (mii_ticks == 0)
 ifconfig bge0 media auto up
 (after 2 seconds, the link was up)
 ifconfig bge0 down
 ifconfig bge0 up
 (timer start from 2!!!)

  This problem is not serious but we should fix to work as we expected.

  Currently, some drivers do sc-mii_ticks = 0; BEFORE calling
*mii_phy_auto(). Some drivers don't do it. To not to forget it and to
reduce the duplication, I think it's better to move sc-mii_ticks = 0;
into *mii_phy_auto().

  Any objection?


Index: atphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/atphy.c,v
retrieving revision 1.12
diff -u -p -r1.12 atphy.c
--- atphy.c 23 Jul 2012 00:09:49 -  1.12
+++ atphy.c 3 Jun 2013 02:34:03 -
@@ -289,7 +289,6 @@ done:
if (sc-mii_ticks = sc-mii_anegticks)
break;

-   sc-mii_ticks = 0;
atphy_mii_phy_auto(sc);
break;
}
@@ -405,6 +404,7 @@ atphy_mii_phy_auto(struct mii_softc *sc)
  {
uint16_t anar;

+   sc-mii_ticks = 0;
anar = BMSR_MEDIA_TO_ANAR(sc-mii_capabilities) | ANAR_CSMA;
if (sc-mii_flags  MIIF_DOPAUSE)
anar |= ANAR_X_PAUSE_TOWARDS;
Index: brgphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/brgphy.c,v
retrieving revision 1.63
diff -u -p -r1.63 brgphy.c
--- brgphy.c1 Apr 2013 13:41:37 -   1.63
+++ brgphy.c3 Jun 2013 02:34:04 -
@@ -581,6 +581,7 @@ brgphy_mii_phy_auto(struct mii_softc *sc
  {
int anar, ktcr = 0;

+   sc-mii_ticks = 0;
brgphy_loop(sc);
PHY_RESET(sc);

Index: ciphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/ciphy.c,v
retrieving revision 1.19
diff -u -p -r1.19 ciphy.c
--- ciphy.c 12 May 2009 14:31:27 -  1.19
+++ ciphy.c 3 Jun 2013 02:34:04 -
@@ -275,7 +275,6 @@ setit:
if (++sc-mii_ticks = MII_ANEGTICKS)
break;

-   sc-mii_ticks = 0;
mii_phy_auto(sc, 0);
return (0);
}
Index: mii_physubr.c
===
RCS file: /cvsroot/src/sys/dev/mii/mii_physubr.c,v
retrieving revision 1.75
diff -u -p -r1.75 mii_physubr.c
--- mii_physubr.c   3 Oct 2012 07:08:58 -   1.75
+++ mii_physubr.c   3 Jun 2013 02:34:04 -
@@ -202,6 +202,7 @@ mii_phy_auto(struct mii_softc *sc, int w
struct mii_data *mii = sc-mii_pdata;
struct ifmedia_entry *ife = mii-mii_media.ifm_cur;

+   sc-mii_ticks = 0;
if ((sc-mii_flags  MIIF_DOINGAUTO) == 0) {
/*
 * Check for 1000BASE-X.  Autonegotiation is a bit
@@ -346,7 +347,6 @@ mii_phy_tick(struct mii_softc *sc)
if (++sc-mii_ticks = sc-mii_anegticks)
return (EJUSTRETURN);

-   sc-mii_ticks = 0;
PHY_RESET(sc);

if (mii_phy_auto(sc, 0) == EJUSTRETURN)
Index: rgephy.c
===
RCS file: /cvsroot/src/sys/dev/mii/rgephy.c,v
retrieving revision 1.29
diff -u -p -r1.29 rgephy.c
--- rgephy.c18 Jul 2010 03:00:39 -  1.29
+++ rgephy.c3 Jun 2013 02:34:04 -
@@ -319,7 +319,6 @@ rgephy_service(struct mii_softc *sc, str
if (sc-mii_ticks = sc-mii_anegticks)
return 0;

-   sc-mii_ticks = 0;
rgephy_mii_phy_auto(sc);
break;
}
@@ -428,6 +427,7 @@ rgephy_mii_phy_auto(struct mii_softc *mi
  {
int anar;

+   mii-mii_ticks = 0;
rgephy_loop(mii);
rgephy_reset(mii);

Index: urlphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/urlphy.c,v
retrieving revision 1.25
diff -u -p -r1.25 urlphy.c
--- urlphy.c31 Jan 2009 05:44:05 -  1.25
+++ urlphy.c3 Jun 2013 02:34:04 -
@@ -196,7 +196,6 @@ urlphy_service(struct mii_softc *sc, str
if (++sc-mii_ticks = sc-mii_anegticks)
return (0);

-   sc-mii_ticks = 0;
PHY_RESET(sc);

if (mii_phy_auto(sc, 0) == EJUSTRETURN)




 Done.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: NFS vs jumbograms?

2013-06-03 Thread Masanobu SAITOH

(2013/06/04 5:50), David Holland wrote:

On Mon, Jun 03, 2013 at 11:37:29AM -0400, Mouse wrote:
   At $JOB, we have two i386 machines with wm interfaces connected
   back-to-back with a short patch cable and a /30 subnet of 192.168.  One
   is NFS-serving some disk space to the other over this link.  They are
   running 4.0.1 (with a few tweaks of mine, but nothing touching
   sys/nfs); while it's fallen off the end of official support, I thought
   someone might happen to recall enough to be useful

nfs hasn't changed much, so if the problem is in nfs (rather than in
netinet) it's probably still in -current.

not that this actually helps.

Also note that there are a fair number of cases in nfs where lost
packets result in timeouts that don't get handled properly causing
things to wedge. If you haven't yet, try using tcp mounts as that
tends to avoid exercising some of those cases.


 I sometimes heard that wm had some problems with jumbo MTU, but I've
never checked whether it really has a bug or not. I suspect the bug
is not in NFS stack but in wm.

 I don't use jumbo MTU, so usually don't test the function when I add
the support of a new chip. Sorry :(
If someone show me a reproduceable configuraton, I'll try to fix it.

 Regards.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


mii_ticks = 0 problem

2013-06-02 Thread Masanobu SAITOH
 Hi.

 The following patch fix a problem that a lot of phys misunderstand
the auto negotiation timer. Usually the autonegotiaton timer is set
to 10(MII_ANEGTICKS_GIGE), but a lot of dirver don't work as expected.

e.g.:

(mii_ticks == 0)
ifconfig bge0 media auto up
(after 2 seconds, the link was up)
ifconfig bge0 down
ifconfig bge0 up
(timer start from 2!!!)

 This problem is not serious but we should fix to work as we expected.

 Currently, some drivers do sc-mii_ticks = 0; BEFORE calling
*mii_phy_auto(). Some drivers don't do it. To not to forget it and to
reduce the duplication, I think it's better to move sc-mii_ticks = 0;
into *mii_phy_auto().

 Any objection?


Index: atphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/atphy.c,v
retrieving revision 1.12
diff -u -p -r1.12 atphy.c
--- atphy.c 23 Jul 2012 00:09:49 -  1.12
+++ atphy.c 3 Jun 2013 02:34:03 -
@@ -289,7 +289,6 @@ done:
if (sc-mii_ticks = sc-mii_anegticks)
break;
 
-   sc-mii_ticks = 0;
atphy_mii_phy_auto(sc);
break;
}
@@ -405,6 +404,7 @@ atphy_mii_phy_auto(struct mii_softc *sc)
 {
uint16_t anar;
 
+   sc-mii_ticks = 0;
anar = BMSR_MEDIA_TO_ANAR(sc-mii_capabilities) | ANAR_CSMA;
if (sc-mii_flags  MIIF_DOPAUSE)
anar |= ANAR_X_PAUSE_TOWARDS;
Index: brgphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/brgphy.c,v
retrieving revision 1.63
diff -u -p -r1.63 brgphy.c
--- brgphy.c1 Apr 2013 13:41:37 -   1.63
+++ brgphy.c3 Jun 2013 02:34:04 -
@@ -581,6 +581,7 @@ brgphy_mii_phy_auto(struct mii_softc *sc
 {
int anar, ktcr = 0;
 
+   sc-mii_ticks = 0;
brgphy_loop(sc);
PHY_RESET(sc);
 
Index: ciphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/ciphy.c,v
retrieving revision 1.19
diff -u -p -r1.19 ciphy.c
--- ciphy.c 12 May 2009 14:31:27 -  1.19
+++ ciphy.c 3 Jun 2013 02:34:04 -
@@ -275,7 +275,6 @@ setit:
if (++sc-mii_ticks = MII_ANEGTICKS)
break;
 
-   sc-mii_ticks = 0;
mii_phy_auto(sc, 0);
return (0);
}
Index: mii_physubr.c
===
RCS file: /cvsroot/src/sys/dev/mii/mii_physubr.c,v
retrieving revision 1.75
diff -u -p -r1.75 mii_physubr.c
--- mii_physubr.c   3 Oct 2012 07:08:58 -   1.75
+++ mii_physubr.c   3 Jun 2013 02:34:04 -
@@ -202,6 +202,7 @@ mii_phy_auto(struct mii_softc *sc, int w
struct mii_data *mii = sc-mii_pdata;
struct ifmedia_entry *ife = mii-mii_media.ifm_cur;
 
+   sc-mii_ticks = 0;
if ((sc-mii_flags  MIIF_DOINGAUTO) == 0) {
/*
 * Check for 1000BASE-X.  Autonegotiation is a bit
@@ -346,7 +347,6 @@ mii_phy_tick(struct mii_softc *sc)
if (++sc-mii_ticks = sc-mii_anegticks)
return (EJUSTRETURN);
 
-   sc-mii_ticks = 0;
PHY_RESET(sc);
 
if (mii_phy_auto(sc, 0) == EJUSTRETURN)
Index: rgephy.c
===
RCS file: /cvsroot/src/sys/dev/mii/rgephy.c,v
retrieving revision 1.29
diff -u -p -r1.29 rgephy.c
--- rgephy.c18 Jul 2010 03:00:39 -  1.29
+++ rgephy.c3 Jun 2013 02:34:04 -
@@ -319,7 +319,6 @@ rgephy_service(struct mii_softc *sc, str
if (sc-mii_ticks = sc-mii_anegticks)
return 0;
 
-   sc-mii_ticks = 0;
rgephy_mii_phy_auto(sc);
break;
}
@@ -428,6 +427,7 @@ rgephy_mii_phy_auto(struct mii_softc *mi
 {
int anar;
 
+   mii-mii_ticks = 0;
rgephy_loop(mii);
rgephy_reset(mii);
 
Index: urlphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/urlphy.c,v
retrieving revision 1.25
diff -u -p -r1.25 urlphy.c
--- urlphy.c31 Jan 2009 05:44:05 -  1.25
+++ urlphy.c3 Jun 2013 02:34:04 -
@@ -196,7 +196,6 @@ urlphy_service(struct mii_softc *sc, str
if (++sc-mii_ticks = sc-mii_anegticks)
return (0);
 
-   sc-mii_ticks = 0;
PHY_RESET(sc);
 
if (mii_phy_auto(sc, 0) == EJUSTRETURN)


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ETHERCAP_* ioctl()

2012-10-31 Thread Masanobu SAITOH
 Hi, all.

 I sent the followin mail more than two years ago.

 http://mail-index.netbsd.org/tech-kern/2010/07/28/msg008613.html

 As the starting point to solve this problem, I committed the change to
add SIOCGETHERCAP stuff.

 Example:
 msk0: flags=8802BROADCAST,SIMPLEX,MULTICAST mtu 1500
 ec_capabilities=5VLAN_MTU,JUMBO_MTU
 ec_enabled=0
 address: 00:50:43:00:4b:c5
 media: Ethernet autoselect
 status: no carrier
 wm0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500
 
 capabilities=7ff80TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6
 
 enabled=7ff80TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6
 ec_capabilities=7VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU
 ec_enabled=0
 address: 00:1b:21:58:68:34
 media: Ethernet autoselect (1000baseT 
 full-duplex,flowcontrol,rxpause,txpause)
 status: active
 inet 192.168.1.5 netmask 0xff00 broadcast 192.168.1.255
 inet6 fe80::21b:21ff:fe58:6834%wm0 prefixlen 64 scopeid 0x2
 inet6 2001:240:694:1:21b:21ff:fe58:6834 prefixlen 64


 What do you think about this output?

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)