Re: FW: ixg(4) performances
Hi, Emmanuel.

On 2014/09/01 11:10, Emmanuel Dreyfus wrote:
> Terry Moore wrote:
>
>> Since you did a dword read, the extra 0x9 is the device status register.
>> This makes me suspicious, as the device status register is claiming that you
>> have "unsupported request detected" [bit 3] and "correctable error
>> detected" [bit 0]. Further, this register is RW1C for all these bits -- so
>> when you write 94810, it should have cleared the 9 (so a subsequent read
>> should have returned 4810).
>>
>> Please check.
>
> You are right:
>
> # pcictl /dev/pci5 read -d 0 -f 1 0xa8
> 00092810
> # pcictl /dev/pci5 write -d 0 -f 1 0xa8 0x00094810
> # pcictl /dev/pci5 read -d 0 -f 1 0xa8
> 4810
>
>> Might be good to post a "pcictl dump" of your device, just to expose all the
>> details.
>
> It explicitly says 2.5 Gb/s x 8 lanes:
>
> # pcictl /dev/pci5 dump -d 0 -f 1
> PCI configuration registers:
> [...]
>     Link Capabilities Register: 0x00027482
>       Maximum Link Speed: unknown 2 value
>       Maximum Link Width: x8 lanes
>       Port Number: 0
>     Link Status Register: 0x1081
>       Negotiated Link Speed: 2.5Gb/s

Which version of NetBSD are you using? I committed some changes fixing Gb/s
Re: Unallocated inode
On Fri, 29 Aug 2014 09:43:39 +, Christos Zoulas wrote:

>On Fri, Aug 29, 2014 at 04:56:46PM +1000, Paul Ripke wrote:
>> I'm currently running kernel:
>> NetBSD slave 6.1_STABLE NetBSD 6.1_STABLE (SLAVE) #4: Fri May 23 23:42:30 EST 2014
>>     stix@slave:/home/netbsd/netbsd-6/obj.amd64/home/netbsd/netbsd-6/src/sys/arch/amd64/compile/SLAVE amd64
>> Built from netbsd-6 branch synced around the build time. Over the
>> last year, I have seen 2 instances where I've had cleared inodes,
>> causing obvious errors:
>>
>> slave:ksh$ sudo find /home -xdev -ls > /dev/null
>> find: /home/netbsd/cvsroot/pkgsrc/japanese/p5-Jcode/pkg/Attic/PLIST,v: Bad file descriptor
>> find: /home/netbsd/cvsroot/pkgsrc/print/texlive-pdftools/patches/Attic/patch-ac,v: Bad file descriptor
>>
>> fsdb tells me they're "unallocated inode"s, which I can easily fix,
>> but does anyone have any idea what might be causing them? This
>> appears similar to issues reported previously:
>> http://mail-index.netbsd.org/tech-kern/2013/10/19/msg015770.html
>>
>> My filesystem is FFSv2 with wapbl, sitting on a raidframe mirror
>> over two SATA drives.
>
>Try unmounting it, and then running fsck -fn on it. Does it report
>errors?
>
>christos

Oh, yes, indeed. And fixes them fine:

** /dev/rraid0g
** File system is already clean
** Last Mounted on /home
** Phase 1 - Check Blocks and Sizes
PARTIALLY ALLOCATED INODE I=106999488
CLEAR? [yn] y
PARTIALLY ALLOCATED INODE I=106999489
CLEAR? [yn] y
** Phase 2 - Check Pathnames
UNALLOCATED  I=106999489  OWNER=0 MODE=0
SIZE=0 MTIME=Jan  1 10:00 1970
NAME=/netbsd/cvsroot/pkgsrc/japanese/p5-Jcode/pkg/Attic/PLIST,v
REMOVE? [yn] y
UNALLOCATED  I=106999488  OWNER=0 MODE=0
SIZE=0 MTIME=Jan  1 10:00 1970
NAME=/netbsd/cvsroot/pkgsrc/print/texlive-pdftools/patches/Attic/patch-ac,v
REMOVE? [yn] y
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [yn] y
SUMMARY INFORMATION BAD
SALVAGE? [yn] y
BLK(S) MISSING IN BIT MAPS
SALVAGE? [yn] y
5197833 files, 528560982 used, 396039539 free (1740275 frags, 49287408 blocks, 0.2% fragmentation)

***** FILE SYSTEM WAS MODIFIED *****

Running a second fsck pass comes up clean. What surprises me is that my
machine has been up ~100 days... I find it hard to believe that a power
loss or similar unclean shutdown would generate filesystem corruption
that could sit silent for that long before suddenly emerging.

Cheers,
-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
Re: FW: ixg(4) performances
Terry Moore wrote:

> Since you did a dword read, the extra 0x9 is the device status register.
> This makes me suspicious, as the device status register is claiming that you
> have "unsupported request detected" [bit 3] and "correctable error
> detected" [bit 0]. Further, this register is RW1C for all these bits -- so
> when you write 94810, it should have cleared the 9 (so a subsequent read
> should have returned 4810).
>
> Please check.

You are right:

# pcictl /dev/pci5 read -d 0 -f 1 0xa8
00092810
# pcictl /dev/pci5 write -d 0 -f 1 0xa8 0x00094810
# pcictl /dev/pci5 read -d 0 -f 1 0xa8
4810

> Might be good to post a "pcictl dump" of your device, just to expose all the
> details.

It explicitly says 2.5 Gb/s x 8 lanes:

# pcictl /dev/pci5 dump -d 0 -f 1
PCI configuration registers:
  Common header:
    0x00: 0x10fb8086 0x00100107 0x0201 0x00800010

    Vendor Name: Intel (0x8086)
    Device Name: 82599 (SFI/SFP+) 10 GbE Controller (0x10fb)
    Command register: 0x0107
      I/O space accesses: on
      Memory space accesses: on
      Bus mastering: on
      Special cycles: off
      MWI transactions: off
      Palette snooping: off
      Parity error checking: off
      Address/data stepping: off
      System error (SERR): on
      Fast back-to-back transactions: off
      Interrupt disable: off
    Status register: 0x0010
      Interrupt status: inactive
      Capability List support: on
      66 MHz capable: off
      User Definable Features (UDF) support: off
      Fast back-to-back capable: off
      Data parity error detected: off
      DEVSEL timing: fast (0x0)
      Slave signaled Target Abort: off
      Master received Target Abort: off
      Master received Master Abort: off
      Asserted System Error (SERR): off
      Parity error detected: off
    Class Name: network (0x02)
    Subclass Name: ethernet (0x00)
    Interface: 0x00
    Revision ID: 0x01
    BIST: 0x00
    Header Type: 0x00+multifunction (0x80)
    Latency Timer: 0x00
    Cache Line Size: 0x10

  Type 0 ("normal" device) header:
    0x10: 0xdfe8000c 0x 0xbc01 0x
    0x20: 0xdfe7c00c 0x 0x 0x00038086
    0x30: 0x 0x0040 0x 0x0209

    Base address register at 0x10
      type: 64-bit prefetchable memory
      base: 0xdfe8, not sized
    Base address register at 0x18
      type: i/o
      base: 0xbc00, not sized
    Base address register at 0x1c
      not implemented(?)
    Base address register at 0x20
      type: 64-bit prefetchable memory
      base: 0xdfe7c000, not sized
    Cardbus CIS Pointer: 0x
    Subsystem vendor ID: 0x8086
    Subsystem ID: 0x0003
    Expansion ROM Base Address: 0x
    Capability list pointer: 0x40
    Reserved @ 0x38: 0x
    Maximum Latency: 0x00
    Minimum Grant: 0x00
    Interrupt pin: 0x02 (pin B)
    Interrupt line: 0x09

  Capability register at 0x40
    type: 0x01 (Power Management, rev. 1.0)
  Capability register at 0x50
    type: 0x05 (MSI)
  Capability register at 0x70
    type: 0x11 (MSI-X)
  Capability register at 0xa0
    type: 0x10 (PCI Express)

  PCI Message Signaled Interrupt
    Message Control register: 0x0180
      MSI Enabled: no
      Multiple Message Capable: no (1 vector)
      Multiple Message Enabled: off (1 vector)
      64 Bit Address Capable: yes
      Per-Vector Masking Capable: yes
    Message Address (lower) register: 0x
    Message Address (upper) register: 0x
    Message Data register: 0x
    Vector Mask register: 0x
    Vector Pending register: 0x

  PCI Power Management Capabilities Register
    Capabilities register: 0x4823
      Version: 1.2
      PME# clock: off
      Device specific initialization: on
      3.3V auxiliary current: self-powered
      D1 power management state support: off
      D2 power management state support: off
      PME# support: 0x09
    Control/status register: 0x2000
      Power state: D0
      PCI Express reserved: off
      No soft reset: off
      PME# assertion disabled
      PME# status: off

  PCI Express Capabilities Register
    Capability version: 2
    Device type: PCI Express Endpoint device
    Interrupt Message Number: 0
    Link Capabilities Register: 0x00027482
      Maximum Link Speed: unknown 2 value
      Maximum Link Width: x8 lanes
      Port Number: 0
    Link Status Register: 0x1081
      Negotiated Link Speed: 2.5Gb/s
      Negotiated Link Width: x8 lanes

  Device-dependent header:
    0x40: 0x48235001 0x2b002000 0x 0x
    0x50: 0x01807005 0x 0x 0x
    0x60: 0x 0x 0x 0x
    0x70: 0x003fa011 0x0004 0x2004 0x
    0x80: 0x 0x 0x 0x
    0x90: 0x 0x 0x 0x
    0xa0: 0x00020010 0x10008cc2 0x4810 0x00027482
    0xb0: 0x1081 0x 0x 0x
    0xc0: 0x 0x001f 0x000
RE: FW: ixg(4) performances
Oh, and to answer the actual first, relevant question: I can try to find out if we (day job, 82599) can do line rate at 2.5 GT/s. I think we can get a lot closer than you're getting, but we don't test with NetBSD.

-- 
Hisashi T Fujinaka - ht...@twofifty.com
BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee
RE: FW: ixg(4) performances
I may be wrong on transactions vs. transfers. However, I think you're
reading the page incorrectly. The signaling rate is the physical speed of
the link. On top of that are the 8b/10b encoding (the Ethernet controller
we're talking about is only Gen 2), the framing, etc., and the spec
discusses the data rate in GT/s. Gb/s means nothing. It's like talking
about the frequency of the Ethernet link, which we never do. We talk about
how much data can be transferred.

I'm also not sure if you've looked at an actual trace before, but a PCIe
link is incredibly chatty, and every transfer only has a payload of
64/128/256 bytes (especially regarding the actual controller again). So,
those two coupled together (GT/s & a chatty link with small packets) mean
that talking about things in Gb/s is not something used by people who talk
about PCIe every day (my day job). The signaling rate is not used when
talking about the max data transfer rate.

On Sun, 31 Aug 2014, Terry Moore wrote:

> [...]

-- 
Hisashi T Fujinaka - ht...@twofifty.com
BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee
RE: FW: ixg(4) performances
> -----Original Message-----
> From: Hisashi T Fujinaka [mailto:ht...@twofifty.com]
> Sent: Saturday, August 30, 2014 21:29
> To: Terry Moore
> Cc: tech-kern@netbsd.org
> Subject: Re: FW: ixg(4) performances
>
> Doesn't anyone read my posts or, more important, the PCIe spec?
>
> 2.5 Giga TRANSFERS per second.

I'm not sure I understand what you're saying.

From the PCIe spec, page 40:

"Signaling rate - Once initialized, each Link must only operate at one of
the supported signaling levels. For the first generation of PCI Express
technology, there is only one signaling rate defined, which provides an
effective 2.5 Gigabits/second/Lane/direction of raw bandwidth. The second
generation provides an effective 5.0 Gigabits/second/Lane/direction of raw
bandwidth. The third generation provides an effective 8.0
Gigabits/second/Lane/direction of raw bandwidth. The data rate is expected
to increase with technology advances in the future."

This is not 2.5G Transfers per second.

PCIe talks about transactions rather than transfers; one transaction
requires either 12 bytes (for 32-bit systems) or 16 bytes (for 64-bit
systems) of overhead at the transaction layer, plus 7 bytes at the link
layer.

The maximum number of transactions per second paradoxically transfers the
fewest number of bytes; a 4K write takes 16+4096+5+2 byte times, and so
only about 60,000 such transactions are possible per second (moving about
248,000,000 bytes). [Real systems don't quite see this -- Wikipedia
claims, for example, that 95% efficiency is typical for storage
controllers.] A 4-byte write takes 16+4+5+2 byte times, and so roughly 9
million transactions are possible per second, but those 9 million
transactions can only move 36 million bytes.

Multiple lanes scale things fairly linearly. But there has to be one byte
per lane; a x8 configuration means that physical transfers are padded, so
each 4-byte write (which takes 27 bytes on the bus) will have to take 32
bytes. Instead of getting 72 million transactions per second, you get 62.5
million transactions/second, so it doesn't scale as nicely.

Reads are harder to analyze, because they depend on the speed and design
of both ends of the link. The reader sends a read request packet, and the
read-responder (some time later) sends back the response.

As far as I can see, even at gen3 with lots of lanes, PCIe doesn't scale
to 2.5 G transfers per second.

Best regards,
--Terry
Re: Making bpf MPSAFE (was Re: struct ifnet and ifaddr handling ...)
Hi darrenr and rmind,

Thank you for your replies, and I'm sorry for not having replied yet.
I'm in a busy period for several weeks. I'll be back next weekend.

  ozaki-r

On Tue, Aug 26, 2014 at 4:49 AM, Mindaugas Rasiukevicius wrote:
> Ryota Ozaki wrote:
>> Hi,
>>
>> I thought I need more experience with pserialize
>> (and lock primitives) to tackle the ifnet work.
>> So I suspended that work and now I am trying
>> another, easier task: bpf.
>>
>> http://www.netbsd.org/~ozaki-r/mpsafe-bpf.diff
>>
>
> As Darren mentioned - there are various bugs in the code (also, the malloc
> change to M_NOWAIT is unhandled). You cannot drop the lock on the floor
> and expect the state to stay consistent. Something has to preserve it.
>
> The following pattern applies both to the locks and pserialize(9):
>
>     pserialize_read_enter();
>     obj = lookup();
>     pserialize_read_exit();
>     // the object is volatile here, it might already be destroyed
>
> Nothing prevents obj from being destroyed after the critical path unless
> you acquire some form of a reference during the lookup.
>
>> BTW, I worry that there is no easy way to
>> know if a function in a critical section
>> blocks/sleeps or not. So I wrote a patch to
>> detect that: http://www.netbsd.org/~ozaki-r/debug-pserialize.diff
>> Is it meaningful?
>
> Why per-LWP rather than a per-CPU diagnostic counter? On the other hand, if
> we start adding per-CPU counters, then we might as well start implementing
> RCU. :)
>
> --
> Mindaugas