Re: re(4) causes memory corruption?

2014-04-08 Thread Yonghyeon PYUN
On Tue, Apr 08, 2014 at 11:21:12AM +0300, Andriy Gapon wrote:
> 
> I have this network card (it's actually integrated into a motherboard):
> 
> re0:  port
> 0xde00-0xdeff mem 0xfdaff000-0xfdaf,0xfdae-0xfdae irq 18 at device
> 0.0 on pci2
> re0: Using 1 MSI-X message
> re0: Chip rev. 0x3c00
> re0: MAC rev. 0x0040
> miibus0:  on re0
> rgephy0:  PHY 1 on miibus0
> rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
> 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX,
> 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, 
> auto-flow
> 
> When there is little traffic through the interface I do not observe any
> problems with it.
> But within 15 seconds of applying some moderate traffic I always observe
> heavy screen corruption, often followed by a total freeze or a hardware
> self-reset.
> An example of moderate traffic is 6 MBytes/s, which results in about 10K
> interrupts per second.
> 

PCIe re(4) controllers do not seem to have an intelligent interrupt
moderation feature.  At least it's not documented at all.  To
overcome this H/W limitation, re(4) uses a one-shot timer interrupt to
mitigate interrupt processing overhead.  However, the one-shot timer
can be set to at most 65us, so you may still see lots of interrupts
under heavy load.
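
To put the 65us figure in perspective, here is a rough back-of-the-envelope
sketch. It assumes the one-shot timer allows at most one interrupt per timer
period, which is a simplification, and the function names are only
illustrative; they are not part of re(4).

```python
# Rough arithmetic only; the 65 us limit and the 6 MBytes/s / 10K
# interrupts/s figures come from this thread, everything else is assumed.

MAX_TIMER_US = 65  # longest one-shot timer delay mentioned above


def max_interrupt_rate(timer_us: float) -> float:
    """Upper bound on interrupts/s, assuming one interrupt per timer period."""
    return 1_000_000 / timer_us


def bytes_per_interrupt(bytes_per_s: float, interrupts_per_s: float) -> float:
    """Average payload handled per interrupt at a given load."""
    return bytes_per_s / interrupts_per_s


if __name__ == "__main__":
    # Ceiling imposed by a 65 us one-shot timer.
    print(round(max_interrupt_rate(MAX_TIMER_US)))  # 15385
    # Reported load: 6 MBytes/s at about 10K interrupts/s.
    print(bytes_per_interrupt(6_000_000, 10_000))   # 600.0
```

Under that assumption the timer caps the rate at roughly 15K interrupts per
second, so the reported 10K interrupts per second is within what the
moderation scheme permits.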

> I am not sure what causes the problem.  Could it be a driver using memory
> that it should not, hardware writing where it should not, or something
> completely in the hardware?
> I would appreciate any hints on possible ways to analyze this issue.

It seems your controller is an old RTL8168C, and I'm not aware of
any memory corruption issues with the RTL8168C.  There were a
couple of re(4) instability reports, but they involved relatively
recent re(4) controllers and none of them showed memory corruption.

> 
> Thanks!
> -- 
> Andriy Gapon
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: ECN marking implementation for dummynet

2014-04-08 Thread hiren panchasara
On Tue, Apr 8, 2014 at 8:46 PM, Adrian Chadd  wrote:
> Hi! Cool! can you file a FreeBSD PR with this?

I'm testing this patch right now.

I will make sure it doesn't get lost. :-)

cheers,
Hiren
>
>
> -a
>
>
> On 2 April 2014 04:48, Midori Kato  wrote:
>> Hi FreeBSD developers,
>>
>> I'm Midori Kato. I was working with Lars Eggert on DCTCP.
>> I would like to share our patch for an ECN marking mechanism on
>> dummynet, which I used for DCTCP testing.
>>
>> My implementation allows setting ECN with RED as the AQM scheme. The
>> following command is an example:
>> $ ipfw pipe  config red 1/10/10/0.0 ecn
>>
>> Our implementation includes both DCTCP and RFC 3168 ECN marking methodology.
>>
>> If you are interested in our ECN implementation, I would be very happy to
>> receive your review! (I have already submitted my patch to Luigi and hope
>> he will merge it in the near future.)
>>
>> Regards,
>> -- Midori
>>


Re: ECN marking implementation for dummynet

2014-04-08 Thread Adrian Chadd
Hi! Cool! can you file a FreeBSD PR with this?


-a


On 2 April 2014 04:48, Midori Kato  wrote:
> Hi FreeBSD developers,
>
> I'm Midori Kato. I was working with Lars Eggert on DCTCP.
> I would like to share our patch for an ECN marking mechanism on
> dummynet, which I used for DCTCP testing.
>
> My implementation allows setting ECN with RED as the AQM scheme. The
> following command is an example:
> $ ipfw pipe  config red 1/10/10/0.0 ecn
>
> Our implementation includes both DCTCP and RFC 3168 ECN marking methodology.
>
> If you are interested in our ECN implementation, I would be very happy to
> receive your review! (I have already submitted my patch to Luigi and hope
> he will merge it in the near future.)
>
> Regards,
> -- Midori
>


Re: panic in -HEAD multicast code

2014-04-08 Thread Adrian Chadd
On 8 April 2014 03:46, Julien Charbon  wrote:
>
>  Hi Adrian,
>
>
> On 08/04/14 10:25, Adrian Chadd wrote:
>>
>> Hm, how's this happening here? I'm not detaching the interface.
>
>
>  Hm, if you are positive that nothing is detaching the interface on your
> behalf (like a /etc/rc.d/netif restart somewhere), then it looks like you
> hit an unrelated case that just produces the same stack trace as the race
> condition we found.

Hm, interesting. I wonder who I have to beat over the head about this.



-a


Re: miibus0: mii_mediachg: can't handle non-zero PHY instance 31

2014-04-08 Thread Chris H
> On Mon, Apr 07, 2014 at 09:40:53AM -0700, Chris H wrote:
>> > On Sun, Apr 06, 2014 at 10:49:27PM -0700, Chris H wrote:
>> >> > On Thu, Apr 03, 2014 at 01:18:19PM -0700, Chris H wrote:
>> >> >> > On Tue, Apr 01, 2014 at 05:53:51PM -0700, Chris H wrote:
>> >> >> >> > On Tue, Apr 01, 2014 at 01:40:58PM -0700, Chris H wrote:
>> >> >> >> >> > On Tue, 2014-04-01 at 13:19 -0700, Chris H wrote:
>> >> >> >> >> >> [...]
>> >> >> >> >> >> miibus0:  on nfe0
>> >> >> >> >> >> rlphy0:  PHY 0 on miibus0
>> >> >> >> >> >> rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 
>> >> >> >> >> >> auto, auto-flow
>> >> >> >> >> >> rlphy1:  PHY 1 on miibus0
>> >> >> >> >> > [...]---big-snip--8<---
>> >> >> >> >> >> miibus0: mii_mediachg: can't handle non-zero PHY instance 1
>> >> >> >> >> >>
>> >> >> >> >> >> As you can see, it looks much the same. I have no idea what
>> >> >> >> >> >> I should do to better inform the driver/kernel how to better
>> >> >> >> >> >> handle it. Or is it the driver, itself?
>> >> >> >> >> >>
>> >> >> >> >> >> Thank you again, for your thoughtful response.
>> >> >> >> >> >>
>> >> >> >> >> >> --Chris
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> > I think the way to fix a phy that responds at all addresses is 
>> >> >> >> >> > to set a
>> >> >> >> >> > hint in loader.conf masking out the ones that aren't real, 
>> >> >> >> >> > like so:
>> >> >> >> >> >
>> >> >> >> >> >  hint.miibus.0.phymask="1"
>> >> >> >> >> >
>> >> >> >> >> > You might be able to set ="0x0001" to make it more clear 
>> >> >> >> >> > it's a
>> >> >> >> >> > bitmask, but I'm not sure of that.
>> >> >> >> >>
>> >> >> >> >> Thank you very much for the hint. I'll give it a shot.
>> >> >> >> >> Any idea why this is happening? I have 4 other MB's using the 
>> >> >> >> >> Nvidia
>> >> >> >> >> chipset, and the nfe(4) driver. But they don't respond this way.
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > If some nfe(4) variants behave badly in the probing stage, this should
>> >> >> >> > be handled by the driver.  We already have too many hints and tunables
>> >> >> >> > and I don't think most users know about them.  In addition, adding
>> >> >> >> > another NIC may change the miibus instance number.
>> >> >> >> >
>> >> >> >> > Could you show me the output of 'kenv | grep smbios'?
>> >> >> >> Yes, of course.
>> >> >> >>
>> >> >> >> Here it is:
>> >> >> >>
>> >> >> >> smbios.bios.reldate="11/22/2010"
>> >> >> >> smbios.bios.vendor="American Megatrends Inc."
>> >> >> >> smbios.bios.version="V2.7"
>> >> >> >> smbios.chassis.maker="MSI"
>> >> >> >> smbios.chassis.serial="To Be Filled By O.E.M."
>> >> >> >> smbios.chassis.tag="To Be Filled By O.E.M."
>> >> >> >> smbios.chassis.version="2.0"
>> >> >> >> smbios.memory.enabled="2097152"
>> >> >> >> smbios.planar.maker="MSI"
>> >> >> >> smbios.planar.product="K9N6PGM2-V2 (MS-7309)"
>> >> >> >> smbios.planar.serial="To be filled by O.E.M."
>> >> >> >> smbios.planar.version="2.0"
>> >> >> >> smbios.socket.enabled="1"
>> >> >> >> smbios.socket.populated="1"
>> >> >> >> smbios.system.maker="MSI"
>> >> >> >> smbios.system.product="MS-7309"
>> >> >> >> smbios.system.serial="To Be Filled By O.E.M."
>> >> >> >> smbios.system.uuid="----406186cd4497"
>> >> >> >> smbios.system.version="2.0"
>> >> >> >> smbios.version="2.6"
>> >> >> >>
>> >> >> >> Hope this helps, and thank you for all your time, and trouble.
>> >> >> >>
>> >> >> >
>> >> >> > Thanks for the info. Try attached patch and let me know how it
>> >> >> > works.  Make sure to remove the hint(hint.miibus.0.phymask="1")
>> >> >> > set in loader.conf before testing it.
>> >> >>
>> >> >> Hello, and thanks for all the attention.
>> >> >> Sorry for the delay. I chose to perform a dump(8) before attempting
>> >> >> the kernel rebuild with the patch. But the kernel threw a read error
>> >> >> message on one of the drives. So I had to sort out the problem on
>> >> >> the drive before I could complete the dump. Then, of course I had
>> >> >> to reslice, and format another drive to replace the ailing one,
>> >> >> before I could perform a restore(8), and start the nfe patch; build
>> >> >> && install kernel. Weird; the drive had only a few hours on it.
>> >> >> Well, anyway. The patch applied cleanly. So I built, and installed
>> >> >> a new kernel with it. X'd out the hint.miibus.0.phymask="0x0001"
>> >> >> in loader.conf(5), and bounced the box. Bad news:
>> >> >> miibus0: mii_mediachg: can't handle non-zero PHY instance 31
>> >> >> miibus0: mii_mediachg: can't handle non-zero PHY instance 30
>> >> >> miibus0: mii_mediachg: can't handle non-zero PHY instance 29
>> >> >> miibus0: mii_mediachg: can't handle non-zero PHY instance 28
>> >> >> miibus0: mii_mediachg: can't handle non-zero PHY instance 27
>> >> >> miibus0: mii_mediachg: can't handle non-zero PHY instance 26
>> >> >> miibus0: mii_mediachg: can't handle non-zero PHY instance 25
>> >> >> miibus0: mii_mediachg: can't handle non-zero PHY instance 24
>> >> >> miibus0: mii_mediachg: can't

Re: panic in -HEAD multicast code

2014-04-08 Thread Julien Charbon


 Hi Adrian,

On 08/04/14 10:25, Adrian Chadd wrote:

Hm, how's this happening here? I'm not detaching the interface.


 Hm, if you are positive that nothing is detaching the interface on
your behalf (like a /etc/rc.d/netif restart somewhere), then it looks
like you hit an unrelated case that just produces the same stack trace as
the race condition we found.


--
Julien



Re: panic in -HEAD multicast code

2014-04-08 Thread Adrian Chadd
Hm, how's this happening here? I'm not detaching the interface.


-a


re(4) causes memory corruption?

2014-04-08 Thread Andriy Gapon

I have this network card (it's actually integrated into a motherboard):

re0:  port
0xde00-0xdeff mem 0xfdaff000-0xfdaf,0xfdae-0xfdae irq 18 at device
0.0 on pci2
re0: Using 1 MSI-X message
re0: Chip rev. 0x3c00
re0: MAC rev. 0x0040
miibus0:  on re0
rgephy0:  PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX,
1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, 
auto-flow

When there is little traffic through the interface I do not observe any problems
with it.
But within 15 seconds of applying some moderate traffic I always observe
heavy screen corruption, often followed by a total freeze or a hardware
self-reset.
An example of moderate traffic is 6 MBytes/s, which results in about 10K
interrupts per second.

I am not sure what causes the problem.  Could it be a driver using memory
that it should not, hardware writing where it should not, or something
completely in the hardware?
I would appreciate any hints on possible ways to analyze this issue.

Thanks!
-- 
Andriy Gapon