Re: An interesting anomaly in NFS client...

2024-11-08 Thread George Neville-Neil



On 8 Nov 2024, at 7:58, Rick Macklem wrote:

> On Thu, Nov 7, 2024 at 9:41 PM George Neville-Neil  
> wrote:
>>
>>
>>
>> On 7 Nov 2024, at 13:59, Rick Macklem wrote:
>>
>>> On Thu, Nov 7, 2024 at 9:34 AM George Neville-Neil  
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 7 Nov 2024, at 4:15, Mark Saad wrote:
>>>>
>>>>>>
>>>>>> On Nov 7, 2024, at 12:29 AM, Andriy Gapon  wrote:
>>>>>>
>>>>>> On 07/11/2024 02:43, George Neville-Neil wrote:
>>>>>>> Howdy,
>>>>>>> We've been digging into an interesting possible issue in the FreeBSD 
>>>>>>> NFS client. Here is the scenario. I have a FreeBSD VM on my Mac, the 
>>>>>>> Mac is the NFS server, the VM is the client.
>>>>>
>>>>> What are you using to run the vm ? What architecture is the vm ? What 
>>>>> about the Mac ?
>>>>
>>>> qemu, aarch64, M3 Mac.
>>>>
>>>> I doubt this is the source of the issue.
>>>>
>>>> I was poking through the code and I wonder if a slight time skew might be 
>>>> an issue.  I'm going to check into that.  The VM and the Mac both use NTP 
>>>> to stay in sync with the world, but who knows...
>>> Hi George,
>>>
>>> I'll take a look at the packet trace later, but...
>>>
>>> If you can easily reproduce the issue, do a:
>>> # nfsstat -E -c -z
>>> - before reproducing it, and a
>>> # nfsstat -E -c
>>> - after. Then look at the Cache Info: at the end of the output.
>>>
>>
>> I'll give that a look, and the thing that Mark found is also interesting.  I 
>> might ask Warner about it tomorrow, we're both at the Dev Summit.
> When I looked at the packet trace, I saw a lot of GETATTRs
> for different directories. If they are different directories and not
> the same ones over and over again, caching will not be the issue.
> (Btw, the attribute caching code hasn't changed in decades, afaik.)
>

Looks like the answer is what Mark sent.  I talked to Warner, and what we do 
now is, if not great, still the right thing; it just isn't so happy on NFS.  
We use NFS in our kernel development work because we develop on VMs to 
start.  Other than this pause, world builds on a modern (M3) laptop are as 
fast as on an average server (hurray SoCs), and when the thing crashes it 
reboots in seconds, rather than the 10 minutes a modern Dell server takes to 
do its hardware checks.

The shorter answer from some folks is "use 9pfs because NFS (server) on macOS 
is slow", which I'll look into as well.

Thanks for all the help, it's been an interesting journey ;-)

> Have fun at the dev summit, rick
>

Doing our best!

Best,
George



Re: An interesting anomaly in NFS client...

2024-11-07 Thread George Neville-Neil



On 7 Nov 2024, at 13:59, Rick Macklem wrote:

> On Thu, Nov 7, 2024 at 9:34 AM George Neville-Neil  
> wrote:
>>
>>
>>
>> On 7 Nov 2024, at 4:15, Mark Saad wrote:
>>
>>>>
>>>> On Nov 7, 2024, at 12:29 AM, Andriy Gapon  wrote:
>>>>
>>>> On 07/11/2024 02:43, George Neville-Neil wrote:
>>>>> Howdy,
>>>>> We've been digging into an interesting possible issue in the FreeBSD NFS 
>>>>> client. Here is the scenario. I have a FreeBSD VM on my Mac, the Mac is 
>>>>> the NFS server, the VM is the client.
>>>
>>> What are you using to run the vm ? What architecture is the vm ? What about 
>>> the Mac ?
>>
>> qemu, aarch64, M3 Mac.
>>
>> I doubt this is the source of the issue.
>>
>> I was poking through the code and I wonder if a slight time skew might be an 
>> issue.  I'm going to check into that.  The VM and the Mac both use NTP to 
>> stay in sync with the world, but who knows...
> Hi George,
>
> I'll take a look at the packet trace later, but...
>
> If you can easily reproduce the issue, do a:
> # nfsstat -E -c -z
> - before reproducing it, and a
> # nfsstat -E -c
> - after. Then look at the Cache Info: at the end of the output.
>

I'll give that a look, and the thing that Mark found is also interesting.  I 
might ask Warner about it tomorrow, we're both at the Dev Summit.

Thanks,
George



Re: An interesting anomaly in NFS client...

2024-11-07 Thread George Neville-Neil



On 7 Nov 2024, at 4:15, Mark Saad wrote:

>>
>> On Nov 7, 2024, at 12:29 AM, Andriy Gapon  wrote:
>>
>> On 07/11/2024 02:43, George Neville-Neil wrote:
>>> Howdy,
>>> We've been digging into an interesting possible issue in the FreeBSD NFS 
>>> client. Here is the scenario. I have a FreeBSD VM on my Mac, the Mac is the 
>>> NFS server, the VM is the client.
>
> What are you using to run the vm ? What architecture is the vm ? What about 
> the Mac ?

qemu, aarch64, M3 Mac.

I doubt this is the source of the issue.

I was poking through the code and I wonder if a slight time skew might be an 
issue.  I'm going to check into that.  The VM and the Mac both use NTP to stay 
in sync with the world, but who knows...

Best,
George



Re: An interesting anomaly in NFS client...

2024-11-07 Thread George Neville-Neil



On 6 Nov 2024, at 21:28, Andriy Gapon wrote:

> On 07/11/2024 02:43, George Neville-Neil wrote:
>> Howdy,
>>
>> We've been digging into an interesting possible issue in the FreeBSD NFS 
>> client. Here is the scenario. I have a FreeBSD VM on my Mac, the Mac is the 
>> NFS server, the VM is the client. I then attempt to build an out of tree 
>> kernel module that I'm working on. The build looks like it's hanging for 1.5 
>> seconds, and when we look at the packets (pcap file attached) we see a ton 
>> of GETATTRs over the first 1.5 seconds. I've put the pcap up here: 
>> oct_8_2024.pcapng 
>>
>> I also note that an issue was raised on the forums that seems similar, way 
>> back between FreeBSD 10 and 11:
>>
>> https://forums.freebsd.org/threads/nfs-cache-misses-after-upgrading-to-11-1-from-10-3.65491/
>>
>> I'm seeing this on 15 currentish (last few months).
>
> Could it be just make checking for stale targets?
> I.e., stat-ing various files to check their timestamps.
>

15,000 of them?  Seems excessive for a kernel module of 10 files, but maybe 
it's walking /usr/obj ?
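
One rough way to sanity-check the /usr/obj guess is to count how many 
entries a timestamp check would touch if it walked a whole tree.  This is 
only a sketch: the function name count_stat_candidates and the example path 
are my own assumptions, and it does not reproduce what make(1) actually does.

```python
import os

def count_stat_candidates(root):
    """Count directories and files under root, i.e. the entries a
    naive timestamp walk would have to stat (roughly one GETATTR each
    on NFS, ignoring any attribute caching)."""
    dirs = files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return dirs, files

if __name__ == "__main__":
    # Hypothetical path; point it at the tree you suspect is being walked.
    d, f = count_stat_candidates("/usr/obj")
    print(f"{d} directories, {f} files")
```

If the total is in the same ballpark as the number of GETATTRs in the 
trace, a tree walk becomes a plausible explanation.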

Best,
George



An interesting anomaly in NFS client...

2024-11-06 Thread George Neville-Neil

Howdy,

We've been digging into an interesting possible issue in the FreeBSD NFS 
client.  Here is the scenario.  I have a FreeBSD VM on my Mac, the Mac 
is the NFS server, the VM is the client.  I then attempt to build an out 
of tree kernel module that I'm working on.  The build looks like it's 
hanging for 1.5 seconds, and when we look at the packets (pcap file 
attached) we see a ton of GETATTRs over the first 1.5 seconds.  I've put 
the pcap up here: 
[oct_8_2024.pcapng](https://people.freebsd.org/~gnn/oct_8_2024.pcapng)


I also note that an issue was raised on the forums that seems similar, 
way back between FreeBSD 10 and 11:


https://forums.freebsd.org/threads/nfs-cache-misses-after-upgrading-to-11-1-from-10-3.65491/

I'm seeing this on 15 currentish (last few months).

Best,
George




Re: compressed TIME-WAIT to be decommissioned

2022-01-12 Thread George Neville-Neil
Removed current@ given your comment below.

On 12 Jan 2022, at 13:48, Gleb Smirnoff wrote:

>   Hi!
>
> [crossposted to current@, but let's keep discussion at net@]
>
> I have already touched the topic with rrs@, jtl@, tuexen@, rscheff@ and
> Igor Sysoev (author of nginx).  Now posting for wider discussion.
>
> TLDR: struct tcptw shall be decommissioned
>
> Longer version covers three topics: why does tcptw exist? why is it no
> longer necessary? what would we get removing it?
>
> Why does struct tcptw exist?
>
> When TCP connection goes to TIME-WAIT state, it can only retransmit
> the very last ACK, thus doesn't need all of the control data in the kernel.
> However, we are required to keep it in memory for certain amount of time
> (2*MSL). So, let's save memory: free the socket, free the tcpcb and
> leave only inpcb that will point at small tcptw (much smaller than tcpcb)
> that holds enough info to retransmit the last ACK. This was done in
> early 2003, see 340c35de6a2.
>
> What was different in 2003 compared to 2022?
>
> * First of all, internet servers were running i386 with only 2 GB of KVA
>   space. Unlike today, they were memory constrained in the first place, not
>   CPU bound like they are today.
>
> * Many of HTTP connections were made by older browsers, which were not able
>   to use persistent HTTP connections.  Those browsers that could, would
>   recycle connections more often than today.  Default timeouts in Apache
>   for persistent connections were short.  So, the ratio of connections
>   in TIME-WAIT compared to live connections was much bigger than today.
>   Here is sample data from 2008 provided to me by Igor Sysoev:
>
>   ITEM     SIZE   LIMIT   USED   FREE  REQUESTS  FAILURES
>   tcpcb:    728  163840  22938  72722  13029632         0
>   tcptw:     88  163842  10253  72949   2447928         0
>
>   We see that TIME-WAITs are ~ 50% of live connections.
>
>   Today I see that TIME-WAITs are ~ 1% of connections. My data is biased
>   here, since I'm looking at servers that do mostly video streaming. I'd
>   be grateful if anybody replies to this email with some other modern data
>   on ratio between tcpcb and tcptw allocations.
>
> * The Internet bandwidth was lower and thus average size of HTTP object
>   much smaller.  That made the average send socket buffer size much smaller
>   than today.  Note that TCP socket buffers autosizing came in 2009 only.
>   This means that today most significant portion of kernel memory consumed
>   by an average TCP connection is the send socket buffer, and
>   socket+inpcb+tcpcb is just a fraction of that.  Thus, swapping tcpcb to
>   tcptw we are saving a fraction of a fraction of memory consumed by average
>   connection.
>
> * Who said that 2*MSL (60 seconds) is an adequate time to keep TIME-WAIT?
>   In 71d2d5adfe1 I added some stats on usage of tcptw and experimented a bit
>   with lowering net.inet.tcp.msl. It appeared that lowering it down three
>   times doesn't have statistically significant effect on TIME-WAIT use stats.
>   This means that the already minuscule number of TIME-WAIT connections on a
>   modern HTTP server can be lowered 3 times more.  Feel free to lower
>   net.inet.tcp.msl and do your own measurements with
>   'netstat -sp tcp | grep TIME-WAIT'.  I'd be glad to see your results.
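
For reference, the ratio in the 2008 sample above can be recomputed from the 
USED column.  This is just a small sketch of the arithmetic; the function 
name timewait_ratio is made up for illustration.

```python
# USED column of the 2008 sample quoted above.
tcpcb_used = 22938   # tcpcb zone items in use (live connections)
tcptw_used = 10253   # tcptw zone items in use (compressed TIME-WAIT)

def timewait_ratio(tcptw, tcpcb):
    """Return TIME-WAIT entries as a fraction of tcpcb allocations."""
    return tcptw / tcpcb

ratio = timewait_ratio(tcptw_used, tcpcb_used)
print(f"TIME-WAIT / tcpcb = {ratio:.1%}")
```

That comes out at roughly 45%, in line with the "~ 50%" quoted.  The same 
function can be fed the USED values for the tcpcb and tcptw zones from a 
current machine.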

The origin of 2*MSL is pretty old and comes from a different type of network, 
but my understanding is that your proposal does not change this value anyway, 
is that correct?  The removal of tcptw is a separate issue, if I understand 
you correctly.

> Ok, now what would removal give us?
>
> * One less alloc/free during socket lifetime (immediately).
> * Reduced code complexity. inp->inp_ppcb always can be dereferenced as tcpcb.
>   Lots of checking for inp->inp_flags & INP_TIMEWAIT goes away (eventually).
> * Shrink of struct inpcb. Today inpcb has some TCP-only data, e.g. HPTS.
>   Reason for that is obvious - compressed TIME-WAIT. A HPTS-driven connection
>   may transition to TIME-WAIT, so we can't use tcpcb. Now we would be able to.
>   So, for non TCP connections memory footprint shrinks (with following
>   changes).
> * Embedding inpcb into protocols cb. An inpcb becomes one piece of memory with
>   tcpcb. One more alloc/free fewer during socket lifetime. Reduced code
>   complexity, since now inpcb == tcpcb (following changes).
>
> How much memory are we going to lose?
>
> (kgdb) p tcpcb_zone->uz_keg->uk_rsize
> $5 = 1064
> (kgdb) p tcptw_zone->uz_keg->uk_rsize
> $6 = 72
> (kgdb) p tcpcbstor->ips_zone->uz_keg->uk_rsize
> $8 = 424
>
> After change a connection in TIME-WAIT would consume 424+1064 bytes instead
> of 424+72. Multiply that by expected number of connections in TIME-WAIT on
> your machine.
>
> Comments welcome.
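
To put a number on the cost, here is a minimal sketch of the arithmetic 
using the three zone item sizes from the kgdb output above.  The function 
name and the connection count are illustrative assumptions only; plug in the 
TIME-WAIT count from your own machine.

```python
INPCB = 424    # bytes, inpcb zone item size from the kgdb output above
TCPCB = 1064   # bytes, tcpcb zone item size
TCPTW = 72     # bytes, tcptw zone item size

def timewait_extra_bytes(n_timewait):
    """Extra memory used by n_timewait connections once tcptw is gone."""
    before = INPCB + TCPTW   # 496 bytes per TIME-WAIT connection today
    after = INPCB + TCPCB    # 1488 bytes per TIME-WAIT connection after
    return n_timewait * (after - before)

# 10253 is the TIME-WAIT count from the 2008 sample; purely illustrative.
print(timewait_extra_bytes(10253))
```

Each TIME-WAIT connection costs 992 bytes more, so even the 2008-sized 
TIME-WAIT population would add on the order of 10 MB.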

This all seems fine and I'm interested to see the proposed patch.  Even the 
smallest embedded machines that FreeBSD runs on without modification (i.e. just 
install/run) have plenty of memory a

Re: Client Networking Issues / NIC Lab

2021-05-06 Thread George Neville-Neil




On 23 Apr 2021, at 18:12, Kevin Bowling wrote:

On Fri, Apr 23, 2021 at 6:19 AM Rick Macklem  
wrote:


Kyle Evans wrote:
On Fri, Apr 23, 2021 at 12:22 AM Kevin Bowling 
 wrote:


Greetings,

[... snip ...]

Tehuti Networks seems to have gone out of business.  Probably not
worth worrying about.



That's unfortunate. I had a box of their 10G NICs and I got them to
put a driver up for review[0][1], but they weren't very responsive and
the existing codebase was in pretty rough shape.

Beyond that, your #3 seems to be the most appealing. #2 could probably
work in the mid-to-long term, but we'd likely be better off
bootstrapping interest with solid community-supported drivers then
reaching out to vendors once we can demonstrate that plan field of
dreams can work and drive some substantial amount of business.


I'll admit to knowing nothing about it, but is using the linuxKPI
to port Linux drivers into FreeBSD feasible?


Hi Rick,

I did consider this but do not think it makes sense for PCI Ethernet
NIC drivers.  I will explain my judgement for consideration.  In
complex systems such as an Ethernet driver, there is both intrinsic and
extrinsic complexity.  The intrinsic properties of an Ethernet driver
are small enough that one person can understand them.  So we spend a
lot of time fighting against extrinsic problems that I outlined in my
email. Put in simpler terms, an iflib driver can be written by one
person and there are a number of people that are good at this in the
community.  The intrinsic complexity of the LKPI on top of an Ethernet
driver, as well as some license and social problems people have with
LKPI lead it to be a worse fit.

If you apply this to drm+i915 etc it is illuminating why the Linux KPI
is the right approach.  The intrinsic properties of the graphics stack
are beyond time and practicality for most in the community, the
graphics drivers have become labyrinths that most kernel devs don't
have internal knowledge of, rival the size of the rest of the kernel,
and keeping up is easier if internal code changes can be kept to a
minimum.


Obviously, given the size of the Linux community, it seems
more likely that it will have a driver that handles many chip
variants, plus updates for newer chips, I think.


I would agree that Linux has a much better Realtek driver.  I am
familiar with the Linux e1000 series for instance, and although they
tend to have most of the workarounds, the quality is a lot lower than most
users realize.


I do agree that having drivers that at least work for the
basics (maybe no Netmap, TSO, or similar) for the
commodity chips would make it easier for new adopters
of FreeBSD. (I avoid the problem by finding old, used
hardware. The variants of Intel PRO1000 and re chips I
have work fine with the drivers in FreeBSD13/14.;-)


Having good inbox network drivers is a way for FreeBSD to
differentiate itself.  I like nice drivers like cxgbe(4), it is a
great piece of engineering and to me even artful.  Consider some cxgbe
so you can test high speeds :)


Oh, and if TSO support is questionable, I think it would be
better to leave it disabled and at least generate a warning
when someone enables it, if it can be enabled at all.


I would like to preserve and correct TSO and other offloads as much as
possible.  There are consequences to half assing it such as burning
more electricity than necessary and causing unnecessary HW
upgrade/replacement.  Of course, where we can't deliver, we should
limit the feature set to known good ones.  Striking this balance will
require more feedback from the community, with faster turnaround time
on PRs.


Good luck with it, rick

Thanks,

Kyle Evans

[0] https://reviews.freebsd.org/D18856
[1] https://reviews.freebsd.org/D19433
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to 
"freebsd-net-unsubscr...@freebsd.org"




On the NIC Lab question, anyone on the project (and some off) can use 
the lab we built for high performance networking at Sentex.  This lab 
has plenty of machines and excellent remote hands:


https://wiki.freebsd.org/TestClusterOnePointers

https://wiki.freebsd.org/TestClusterOneReservations

Folks are welcome to contact me off list for access.

Best,
George


Re: tcp-testsuite into src?

2021-03-23 Thread George Neville-Neil



On 22 Mar 2021, at 22:45, Alan Somers wrote:

> On Mon, Mar 22, 2021 at 7:31 PM Kevin Bowling 
> wrote:
>
>> Hi,
>>
>> I was talking with gnn and kevans on IRC about the tcp testsuite
>> (https://github.com/freebsd-net/tcp-testsuite).
>>
>> Currently we maintain this in ports, although the way the port is set
>> up doesn't make a lot of sense because the tests are stack specific
>> and we don't account for different FreeBSD versions let alone vendor
>> trees.  It seems reasonable to me to pull the tests themselves (i.e.
>> https://github.com/freebsd-net/tcp-testsuite) into src where they can
>> follow along with the tree they are running on, and provide vendors a
>> natural point of extension.
>>
>> /usr/tests has some existing examples of relying on out of tree
>> binaries to run so I am not convinced we need to import packetdrill
>> itself but I don't have a strong preference.  tuexen, do you have any
>> preference?
>>
>> Regards,
>> Kevin
>>
>
> Yeah, it's not a problem to use binaries from ports in /usr/tests.  As long
> as the tests can compile they can live in the base system.  Is there a
> strong incentive to import them?  Do they need to be adjusted for each
> release?

I found out this morning that moving the tests into base is indeed the plan:

https://wiki.freebsd.org/TransportProtocols/11March2021

I'm happy to see this happen.

The next step will be documentation of how to add new/good tests to the suite.

Best,
George


[Differential] D26757: Fix to join AllHost mcast group again when adding an existing IP address

2020-10-13 Thread gnn (George Neville-Neil)
gnn accepted this revision.

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST ACTION
  https://reviews.freebsd.org/D26757/new/

REVISION DETAIL
  https://reviews.freebsd.org/D26757

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: yannis.planus_alstomgroup.com, #network, mw, ae, gnn
Cc: ae, imp, freebsd-net-list, melifaro, rscheff


Re: Status of Vector Packet Processing (VPP) portability into FreeBSD

2018-09-26 Thread George Neville-Neil
So, a straight up port is possible but tedious (duh) due to its 
Linuxisms.  I think someone else may have taken this up after I stopped 
working on it, so let's see if anyone else replies.


Best,
George

On 26 Sep 2018, at 11:04, Jordan Caraballo wrote:


Hi George,

I am mainly interested in using it in combination with DPDK on FreeBSD
for routing purposes.

If you have suggestions on possibly hacking it to make it available for
FreeBSD, or on continuing your work, I am more than willing to take advice.
This might be worth doing in the future, so I would not mind spending
some time on it.

Regards,
- Jordan

El mié., 26 sept. 2018 a las 10:58, George Neville-Neil (<
g...@neville-neil.com>) escribió:



On 26 Sep 2018, at 8:37, Jordan Caraballo wrote:


Hi guys,

I am wondering about the status of patching VPP for FreeBSD. I saw that
George Neville-Neil started some work:
https://github.com/gvnn3/vpp-old/tree/freebsd, but it is outdated (last
commit was 2 years ago).



Indeed I've not had the time to work on this.

Is this a near future plan? Do you have ongoing projects working with
this?



Are you interested in hacking on it or on using it?  I'd be happy to
review patches and the like if you're hacking on it.

Best,
George




--
Jordan





Re: Status of Vector Packet Processing (VPP) portability into FreeBSD

2018-09-26 Thread George Neville-Neil


On 26 Sep 2018, at 8:37, Jordan Caraballo wrote:


Hi guys,

I am wondering about the status of patching VPP for FreeBSD. I saw that
George Neville-Neil started some work:
https://github.com/gvnn3/vpp-old/tree/freebsd, but it is outdated (last
commit was 2 years ago).



Indeed I've not had the time to work on this.

Is this a near future plan? Do you have ongoing projects working with 
this?




Are you interested in hacking on it or on using it?  I'd be happy to 
review patches and the like if you're hacking on it.


Best,
George


[Differential] D9847: Try to extract the RFC1048 data from PXE

2017-03-02 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a reviewer: gnn.
This revision has a positive review.

REVISION DETAIL
  https://reviews.freebsd.org/D9847

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: kczekirda, oshogbo, bapt, tsoome, glebius, freebsd-net-list, #network, gnn


[Differential] D8905: if: Defer the if_up until the ifnet.if_ioctl is called.

2017-01-05 Thread gnn (George Neville-Neil)
gnn accepted this revision.
This revision has a positive review.

REVISION DETAIL
  https://reviews.freebsd.org/D8905

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, adrian, hiren, bz, glebius, karels, 
gnn
Cc: jhb, freebsd-net-list


[Differential] D5872: tcp: Don't prematurely drop receiving-only connections

2016-04-20 Thread gnn (George Neville-Neil)
gnn added a comment.


  Note my comment "once everyone agrees" :-)

REVISION DETAIL
  https://reviews.freebsd.org/D5872

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, network, glebius, adrian, delphij, 
decui_microsoft.com, honzhan_microsoft.com, howard0su_gmail.com, 
freebsd-net-list, transport, jtl, hiren, lstewart
Cc: gnn, mike-karels.net, jtl


[Differential] D5872: tcp: Don't prematurely drop receiving-only connections

2016-04-20 Thread gnn (George Neville-Neil)
gnn added a comment.


  Let's keep this moving along.  Mike isn't (yet) a committer but if someone 
can commit this once everyone agrees that would be great.

REVISION DETAIL
  https://reviews.freebsd.org/D5872

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, network, glebius, adrian, delphij, 
decui_microsoft.com, honzhan_microsoft.com, howard0su_gmail.com, 
freebsd-net-list, transport, jtl, hiren, lstewart
Cc: gnn, mike-karels.net, jtl


Re: MPTCP for FreeBSD repository on BitBucket/v0.51 update

2015-10-22 Thread George Neville-Neil



On 18 Oct 2015, at 23:42, Nigel Williams wrote:


Hi,

The MPTCP code is now available as a mercurial repository:
- Repository: https://bitbucket.org/nw-swin/caia-mptcp-freebsd
- Wiki: https://bitbucket.org/nw-swin/caia-mptcp-freebsd/wiki/

For those interested in trying the implementation/looking at the code, 
this should hopefully make the process a little easier (and save 
having to patch in updates). It should also make it possible to 
contribute code for those wishing to do so.


Some details:
- Has been branched off 'freebsd-head' at 
'http://hg-beta.freebsd.org/base', and will be merged on a weekly 
basis.
- I will be working off this repository so it will be up-to-date with 
recent changes.

- In place of patch releases, release versions will now be tagged.
- I'll also start to populate the 'Issues' section so that there is a 
better picture of current bugs/things TBD.


The version has also been updated to v0.51. See:
- http://caia.swin.edu.au/newtcp/mptcp/tools.html
- OR https://bitbucket.org/nw-swin/caia-mptcp-freebsd/wiki/Home

Functionality-wise this hasn't changed from the previous version, but it
has been merged with a recent revision of head.




Very nice!  Just wondering how you're testing this out.  I've been 
working on a lot of networking tests and I'm sure MPTCP introduces some 
interesting complications.

Thanks,
George


Re: BACnet on FreeBSD ?

2015-08-04 Thread George Neville-Neil



On 31 Jul 2015, at 9:26, Kurt Jaeger wrote:


Hi!

Has anyone ever worked with BACnet on FreeBSD ?

The Organisation:

http://www.bacnet.org/

The ISO norm: ISO 16484-5

There's a European SIG:

http://www.big-eu.org/

There's a protocol stack (which needs more massaging to build on 
FreeBSD):


http://sourceforge.net/projects/bacnet/
https://github.com/stargieg/bacnet-stack

Thanks for any pointers!



I don't know of anyone working on this actively.  Are you going to work 
on a port?


Best,
George


Re: remove IPsec SKIPJACK support...

2015-07-29 Thread George Neville-Neil
That's fine so long as it's removed in HEAD now, and then the warning can 
go into 10 aka 10.3.


Best,
George


On 28 Jul 2015, at 13:25, Adrian Chadd wrote:


Hi,

I'd put together a deprecation plan, which starts with the kernel
warning that this stuff is being removed, MFC that to stable/10 and
stable/9 so people aren't surprised when they upgrade, and then have
it removed in 11.



-adrian


On 28 July 2015 at 04:34, Daniel Plominski  
wrote:
Instead of removing code, it would be a better idea to revise the manuals;
people depend on old recommendations like
https://www.freebsd.org/doc/handbook/ipsec.html

would be better:
https://blog.plitc.eu/2014/freebsd-10-ipv4-vpn-relay-ipsec-entryopenvpn-middleopenvpn-exit-node-mit-jails/

or the racoon example from:
https://blog.plitc.eu/2014/freebsd-10-ipv4-ipsec-net-to-net-vpn-in-der-jail/

best regards

Daniel




[Differential] [Closed] D1777: Associated fix for arp/nd6 timer usage.

2015-06-18 Thread gnn (George Neville-Neil)
gnn closed this revision.
gnn added a comment.

I believe we can close this.


REVISION DETAIL
  https://reviews.freebsd.org/D1777

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: rrs, imp, rwatson, lstewart, kib, adrian, jhb, bz, sbruno, gnn
Cc: ae, bz, freebsd-net-list, emaste, hiren, julian, hselasky


Re: Congestion Control Modification

2015-05-12 Thread George Neville-Neil

Sounds good.

Best,
George

On 1 May 2015, at 1:21, Karlis Laivins wrote:


Hello George,

Thank you for the tip! I have set up a virtual test environment with IMUNES
(interesting tool, by the way) and now I am running validation tests, to
see, if the results there are at least similar to those that can be
achieved on a physical testbed.

I will let you know if and when the implementation will be done as I will
certainly need objective feedback.

BR,
Karlis

On Fri, May 1, 2015 at 12:06 AM, George Neville-Neil wrote:

If you want to run some experiments, though, you could look at running PTPd
on 3 servers (master, and two slaves) which will get you decent
synchronization among the three.  Where decent is less than the typical RTT
of a TCP packet on a 1Gbps LAN.

Best,
George


On 30 Apr 2015, at 14:48, Karlis Laivins wrote:

Yes, you are correct, I meant to write "relative OWD". As David Hayes put
it - "Relative OWD measurements are easier, and clock drift is not usually
a problem over the time it takes to send and receive an ACK".

Thank you for the correction!

On Thu, Apr 30, 2015 at 4:19 PM, Eggert, Lars wrote:

On 2015-4-30, at 15:04, Karlis Laivins wrote:

I have yet to solve the issue of
how to get the One Way Delay for the ACK message (the time it takes ACK to
arrive from receiver of the ACK'ed data sender) correctly.

That won't work without synchronized clocks, which you can't really assume
to be present.

Lars






Re: Congestion Control Modification

2015-04-30 Thread George Neville-Neil
If you want to run some experiments, though, you could look at running PTPd
on 3 servers (master, and two slaves) which will get you decent
synchronization among the three.  Where decent is less than the typical RTT
of a TCP packet on a 1Gbps LAN.

Best,
George

On 30 Apr 2015, at 14:48, Karlis Laivins wrote:

Yes, you are correct, I meant to write "relative OWD". As David Hayes put
it - "Relative OWD measurements are easier, and clock drift is not usually
a problem over the time it takes to send and receive an ACK".
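
To illustrate why relative OWD sidesteps the clock problem, here is a small 
sketch.  It assumes the clock offset between the two hosts is constant over 
the measurement (the "clock drift is not usually a problem" condition 
above); the function name and the timestamps are made up for the example.

```python
def relative_owd(send_times, recv_times):
    """Relative one-way delay per packet.

    send_times are read from the sender's clock and recv_times from the
    receiver's clock. The raw difference includes the unknown clock
    offset, but subtracting the smallest raw difference removes it as
    long as the offset stays constant over the measurement.
    """
    raw = [r - s for s, r in zip(send_times, recv_times)]
    base = min(raw)
    return [d - base for d in raw]

# Hypothetical timestamps in milliseconds; the receiver clock is far ahead.
send = [0, 10, 20, 30]
recv = [5002, 5013, 5022, 5036]
print(relative_owd(send, recv))
```

Shifting every receive timestamp by the same amount leaves the result 
unchanged, which is the point: only the variation in delay is measured, not 
the absolute one-way delay.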

Thank you for the correction!

On Thu, Apr 30, 2015 at 4:19 PM, Eggert, Lars  wrote:

On 2015-4-30, at 15:04, Karlis Laivins wrote:

I have yet to solve the issue of
how to get the One Way Delay for the ACK message (the time it takes ACK to
arrive from receiver of the ACK'ed data sender) correctly.

That won't work without synchronized clocks, which you can't really assume
to be present.

Lars



SIFTR and DTrace

2015-04-30 Thread George Neville-Neil

Howdy,

I have added support for a DTrace SDT to the SIFTR module in HEAD.  What 
this means is that you can now get SIFTR data filtered out of the kernel 
directly.  I also added a simple script (share/dtrace/siftr) to show how 
this works.  The test script is very wordy and only an example of how to 
use this.  In order to use SIFTR with DTrace either load the modules, 
dtraceall and siftr, or compile them into the kernel.  Here is some example 
output:

sudo ./siftr
direction in state state-established local 22 remote 55907
snd_cwnd 22978 snd_wnd 131008 rcv_wnd 66608 snd_bwnd 0 snd_ssthresh 1073725440
max_seg_size 1448 smoothed_rtt 11 sack_enabled 1
snd_scale 5 rcv_scale 6 flags 0x3e4 rxt_length 230
snd_buf_hiwater 33304 snd_buf_cc 0 rcv_buf_hiwater 66608
rcv_buf_cc 0 sent_inflight_bytes 0 t_segqlen 0
flowid 0 flowtype 0

Using a DTrace predicate you can select a particular flow based on, for
instance, the local and remote ports.  I have not put in the IP address
reporting as yet nor have I added the ability to pull out the timeval recorded
by SIFTR.  Since the trace point is in the code where the trace is taken it is
possible to use DTrace timestamps natively.
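
For anyone who would rather filter after the fact, here is a small Python sketch that post-processes the script's text output in userland. This is not a DTrace predicate; it only assumes the output keeps the "key value" layout shown in the example, and the function names are made up for illustration.

```python
def parse_siftr_record(text):
    """Parse whitespace-separated 'key value' pairs into a dict of strings."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens")
    return dict(zip(tokens[0::2], tokens[1::2]))


def matches_flow(record, local_port, remote_port):
    """True when the record belongs to the given local/remote port pair."""
    return (record.get("local") == str(local_port)
            and record.get("remote") == str(remote_port))


# First three lines of the example output, treated as one record.
sample = """
direction in state state-established local 22 remote 55907
snd_cwnd 22978 snd_wnd 131008 rcv_wnd 66608 snd_bwnd 0 snd_ssthresh 1073725440
max_seg_size 1448 smoothed_rtt 11 sack_enabled 1
"""
record = parse_siftr_record(sample)
wanted = matches_flow(record, 22, 55907)
```

Filtering in the kernel with a predicate is cheaper because unwanted records are never emitted; this kind of post-filter is only useful for traces that have already been captured.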

Best,
George


Re: Congestion Control Modification

2015-04-30 Thread George Neville-Neil
Are you planning to put the source to this up on a repo (github) or send 
along patches?

It would be good to get a look at this new algorithm.

Thanks,
George


On 21 Apr 2015, at 5:30, Karlis Laivins wrote:


Hi,

Thank you very much for such a comprehensive description of how to properly
create a loadable kernel module. This does the trick: I was able to create
the module successfully, load it into the kernel and set it as the congestion
control algorithm in use via sysctl net.inet.tcp.cc.algorithm=new (new being
the name of my test module).

Best Regards,
Karlis

On Tue, Apr 21, 2015 at 3:05 AM, grenville armitage 


wrote:


Hi,


On 04/18/2015 16:59, Karlis Laivins wrote:


Hello,

I have read an interesting publication about a proposed modification of the TCP
congestion control algorithm that would improve it through dynamic bandwidth
estimation. The idea seems so interesting that I would like to try to
implement it by modifying the NewReno code.

Do I understand correctly that to do the above, I would create a copy of the
source file (in my case, cc_newreno.c) located in /usr/src/sys/, rename it
to, for example, cc_newreno_test.c and make changes to it?
How would I then compile it, and how would I create a newreno_test.ko file
that can be loaded into the kernel and tested?

Thank you in advance for your answers!



In case this helps shed some (probably incomplete) light, here are the steps
I took late last year to make a modified version of NewReno:

I start with copying the newreno module under sys/netinet/cc/cc_newreno.c
as a template. The new source file will be called cc_newrenoVarBeta.c

/usr/src/sys/netinet/cc % cp cc_newreno.c cc_newrenoVarBeta.c
/usr/src/sys/netinet/cc %

Then create a modules definition based on /usr/src/sys/modules/cc/cc_cubic
(because there isn't one for newreno per se)

/usr/src/sys/netinet/cc % cd /usr/src/sys/modules/cc
/usr/src/sys/modules/cc % mkdir cc_newrenoVarBeta
/usr/src/sys/modules/cc % cp cc_cubic/Makefile cc_newrenoVarBeta/
/usr/src/sys/modules/cc %

Tweak the cc_newrenoVarBeta/Makefile to say:

KMOD=   cc_newrenoVarBeta
SRCS=   cc_newrenoVarBeta.c

Made my changes to cc_newrenoVarBeta.c (including changing the module's
name from 'newreno' to something else)

Then built/installed the new module with:

/usr/src/sys/netinet/cc % cd /usr/src/sys/modules/cc/cc_newrenoVarBeta
/usr/src/sys/modules/cc/cc_newrenoVarBeta % make clean && make && make install
 [..build and install output..]
/usr/src/sys/modules/cc/cc_newrenoVarBeta %

All being well, cc_newrenoVarBeta.ko should now exist under /boot/kernel

Then use 'kldload cc_newrenoVarBeta.ko' to load your new CC algorithm

If all goes well, your new module will appear (with whatever name you gave
it) in the sysctl net.inet.tcp.cc.available list. Don't forget to actually
select your new module with sysctl net.inet.tcp.cc.algorithm when running
experiments.
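
The Makefile tweak in the steps above can be sketched as a small Python helper that renders the text for a new module directory. The KMOD and SRCS lines come from the steps; the .PATH line and the bsd.kmod.mk include are assumptions about what the copied cc_cubic Makefile contains, so copy the real lines from your tree rather than trusting this sketch.

```python
def kmod_makefile(module, source=None):
    """Render a minimal kernel-module Makefile for a CC algorithm."""
    source = source or module + ".c"
    lines = [
        # Assumed: where make should look for the source, relative to
        # sys/modules/cc/<module>; take the real line from cc_cubic/Makefile.
        ".PATH: ${.CURDIR}/../../../netinet/cc",
        "KMOD=\t" + module,
        "SRCS=\t" + source,
        "",
        # Assumed: the standard kernel-module include at the end.
        ".include <bsd.kmod.mk>",
    ]
    return "\n".join(lines) + "\n"


text = kmod_makefile("cc_newrenoVarBeta")
```

Writing the result to /usr/src/sys/modules/cc/cc_newrenoVarBeta/Makefile would replace the manual copy-and-edit step.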

cheers,
gja





[Differential] [Updated] D1944: PF and VIMAGE fixes

2015-04-14 Thread gnn (George Neville-Neil)
gnn added a comment.

Any update on this?

REVISION DETAIL
  https://reviews.freebsd.org/D1944

To: nvass-gmx.com, bz, zec, trociny, glebius, rodrigc, kristof, gnn
Cc: freebsd-virtualization, freebsd-pf, freebsd-net


Re: FreeBSD responding with wrong receiving interface IP

2015-03-10 Thread George Neville-Neil

On 10 Mar 2015, at 11:26, Paul S. wrote:


Hi,

I've been deploying FreeBSD as customer edge routers for customers 
with sites that do not require high throughput (>1g/s).


Each site has two ISPs (Mostly Telstra + Verizon/Optus), and take full 
routes via OpenBGPd and BIRD. I use next-hop self on all received 
routes.


The FreeBSD boxes have static routes delegating the announced IP 
blocks to a L3 switch down the road. i.e: route add -net 10.100.1.0/24 
10.0.0.1, and then that /24 is originated via BGP to both upstreams.


Things in general work fine, but I've been receiving reports of 'weird 
traceroute results' from my customers.


Examples of this would be,

1  some.random.isp (...) (...)
2  gigabitethernet3-3.exi1.melbourne.telstra.net (203.50.77.49)  0.309 ms  0.284 ms  0.227 ms
3  bundle-ether3-100.exi-core10.melbourne.telstra.net (203.50.80.1)  1.966 ms  1.675 ms  1.852 ms
4  bundle-ether12.chw-core10.sydney.telstra.net (203.50.11.124)  16.707 ms  15.917 ms  16.360 ms
5  customer-gw.syd.ALTER.net (...) (...)

This traceroute seems to claim that the packet was received over the 
Verizon gateway, which in reality it was not -- it was received 
directly over the Telstra interface, but my outbound AS-PATH towards 
some.random.isp uses Verizon.


So FreeBSD replies back with the Verizon address. Another person 
having the same issue (mostly, but on OpenBSD) can be found at 
http://openbsd.7691.n7.nabble.com/BGP-responding-with-wrong-IP-address-td90264.html


I would love to know if there's a way to fix this, or if I've missed 
something, or if there's something wrong in the way I set it up.


Thank you for taking the time to read.


I wonder if we could see some routing tables?  That might help.

Best,
George


[Differential] [Accepted] D1965: Add extended media types to if_media.h and ifconfig

2015-02-28 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.

BTW Mike Karels was in favor of this in an email thread.  He's not yet on 
phabricator but I'll ask him here as well.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1965

To: erj, adrian, jfvogel, gnn
Cc: glebius, freebsd-net


Re: Adding new media types to if_media.h

2015-02-09 Thread George Neville-Neil

On 8 Feb 2015, at 22:41, Mike Karels wrote:


Sorry to reply to a thread after such a long delay, but I think it is
unresolved, and needs more discussion.  I'd like to elaborate a bit on
my goals and proposal.  I believe Adrian has newer thoughts than have been
circulated here as well.

The last message(s) have gone to freebsd-arch and freebsd-net.  If someone
wants to pick one, we could consolidate, but this seems relevant to both.

I'm going to top-post to try to summarize and extend the discussion, but the
preceding emails follow for reference.

To recap: the existing if_media interface is running out of steam, at least
in that the "Media variant" field, with 5 bits, is going to be insufficient
to express existing 40 Gb/s variants.  The if_media media type is a 32-bit
int with a bunch of sub-fields for type (e.g. Ethernet), subtype/variant
(e.g.  10baseT, 10base5, 1000baseT, etc), flags, and some MII-related fields.


I made a proposal to extend the interface in a small way, specifically to
replace the "media word" with a 64-bit int that is mostly the same, but
has a new, larger variant/subtype field.  The main reason for this proposal
is to maintain the driver KPI (glimpse showed me 240 inclusions of if_media.h
in the kernel in 8.2).  That interface includes an initialization using a
scalar value of fields ORed with each other.  It would also be easy to
preserve a 32-bit user-level API/ABI that can express most of the current
state, with a subtype/variant field value reserved for "other" (there is
already one for "unknown", but that is not quite the same).  fwiw, I
found 45 references to this user-level API in our tree, including both
base and "ports"-type software, which includes libpcap, snmpd, dhclient,
quagga, xorp, atm, devd, and rtsold, which argues for a backward-compatible
API/ABI as well as a more-complete current interface for ifconfig at least.


More generally, I see two problems with the existing if_media interface:

1. It doesn't have enough bits for all the fields, in particular, variant/
subtype for Ethernet.  That is the immediate issue.

2. The interface is not sufficiently generic; it was designed around Ethernet
including MII, token ring, FDDI, and a few other interface types.  Some of
the fields like "instance" are primarily for MII as far as I know, and are
basically unused.  It is definitely not sufficient for 802.11, which has
rolled its own interfaces.

To solve the second problem, I think the right approach would be to reduce
this interface to a truly generic one, such as media type (e.g. Ethernet),
generic flags, and perhaps generic status.  Then there should be a separate
media-specific interface for each type, such as Ethernet and 802.11.  To a
small extent, we already have that.  Solving the second, more general problem,
requires a whole new driver KPI that will require surgery to every driver,
which is not an exercise that I would consider.

Using a separate int for each existing field, as proposed, would break the
driver KPI, but would not really make the interface generic.  Trying to
make a single interface with the union of all network interface requirements
seems like a bad idea to me (we failed last time; the "we" is BSDi, where
I was the architect when this interface was first designed).  (No, I didn't
design this interface.)

Solving the first problem only, I think it is preferable to preserve a
compatible driver KPI, which means using a scalar value encoding what is
necessary.  Although that interface is rather Ethernet-centric, that is
really what it is used for.

An additional, selfish goal is to make it easy to back-port drivers using
the new interface to older versions (which I am quite likely to do).
Preserving the KPI and general user API will be highly useful there.
I'd be likely to do an 11-style version of ifconfig personally, but it
might not be difficult to do in a more general way.

I am willing to do a prototype for -current for evaluation.

Comments, alternatives, ?


I agree with your statements above and I'd like to see the prototype.

Best,
George



Mike


From: Eric Joyner
Date: January 6, 2015 4:50:28 PM CST
To: m...@karels.net
Cc: Adrian Chadd, Rui Paulo, "freebsd-net@freebsd.org", Eric Joyner, John Baldwin, Jack Vogel, "freebsd-a...@freebsd.org"
Subject: Re: Adding new media types to if_media.h

Then going with what Mike says about leaving backwards compatibility, I
suppose we should do something like what Adrian suggested, and add a
separate struct. We can indicate that the separate struct exists by setting
the media type value in the if_media word to 0xc0, since that's the last
unused value. Then existing programs won't have to support any new features
if they don't want to.

Adrian -- where can I look for more information on what net80211 does 
for
media types? Do you mean what it does with MCS values, or something 
more;
I've looked at it a bit, but I do

Re: IEE1588/PTP support for NIC drivers ?

2015-02-09 Thread George Neville-Neil

On 5 Feb 2015, at 20:50, Olivier Cochard-Labbé wrote:


Hi,

Some network cards support IEEE 1588 hardware timestamping (like some Intel
cards), but their drivers don't support this feature.
I believe there is a kernel feature missing for this support.

Searching the mailing list archive, I found this post about some legal issue:
https://lists.freebsd.org/pipermail/freebsd-net/2007-October/015512.html

Is it still a legal problem or just a missing feature?



Missing feature.

We need an API and the like to get this going.  I've taken various stabs at it
in the past and will likely do so again but if anyone has working code I'd be
more than happy to review.


Best,
George

[Differential] [Accepted] D1691: sfxge: using 64-bit access for x86-64

2015-02-05 Thread gnn (George Neville-Neil)
gnn accepted this revision.
This revision is now accepted and ready to land.

REVISION DETAIL
  https://reviews.freebsd.org/D1691

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1694: sfxge: Move txq->next pointer to part writable on completion path

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to put Approved by: gnn (mentor) in the commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1694

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Requested Changes To] D1697: sfxge: Expect required init_state on data path and in periodic calls

2015-01-29 Thread gnn (George Neville-Neil)
gnn requested changes to this revision.
gnn added a comment.
This revision now requires changes to proceed.

__predict_false rarely, if ever, does the right thing.  Have you run any 
benchmarks to show that this improves performance?

REVISION DETAIL
  https://reviews.freebsd.org/D1697

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1708: sfxge: Separate software Tx queue limit for non-TCP traffic

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to put Approved by: gnn (mentor) in the commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1708

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1698: sfxge: Make it possible to build without EVQ statistics

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to put Approved by: gnn (mentor) in the commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1698

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1699: sfxge: Remove extra cache-line alignment and reorder sfxge_evq_t

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to put Approved by: gnn (mentor) in the commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1699

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Requested Changes To] D1707: sfxge: access statistics buffers under port lock

2015-01-29 Thread gnn (George Neville-Neil)
gnn requested changes to this revision.
gnn added a comment.
This revision now requires changes to proceed.

If you look at other drivers you'll see they have #define'd macros for the
locks, rather than direct calls.  This allows us to name the lock in the macro.
See, for instance, this example from cxgbe/adapter.h:

#define ADAPTER_LOCK(sc)                 mtx_lock(&(sc)->sc_lock)
#define ADAPTER_UNLOCK(sc)               mtx_unlock(&(sc)->sc_lock)
#define ADAPTER_LOCK_ASSERT_OWNED(sc)    mtx_assert(&(sc)->sc_lock, MA_OWNED)
#define ADAPTER_LOCK_ASSERT_NOTOWNED(sc) mtx_assert(&(sc)->sc_lock, MA_NOTOWNED)

You should move to this model and then update the patch.

REVISION DETAIL
  https://reviews.freebsd.org/D1707

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1692: sfxge: Change sfxge_ev_qpoll() proto to avoid EVQ pointers array access

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to add Approved by: gnn (mentor) to your commit message.

REVISION DETAIL
  https://reviews.freebsd.org/D1692

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1700: sfxge: fixed TSO code to cope with VLAN headers

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to add Approved by: gnn (mentor) to your commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1700

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1701: sfxge: Add evq argument to sfxge_tx_qcomplete()

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to add Approved by: gnn (mentor) to your commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1701

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1706: sfxge: implemented parameter to restrict RSS channels

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to add Approved by: gnn (mentor) to your commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1706

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1703: sfxge: Remove unused esm_size member of the efsys_mem_t structure

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to add Approved by: gnn (mentor) to your commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1703

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1702: sfxge: Do not bzero() DMA allocated memory once again

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to add Approved by: gnn (mentor) to your commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1702

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1704: sfxge: Pass correct address to free allocated memory in the case of load error

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to add Approved by: gnn (mentor) to your commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1704

To: arybchik, gnn
Cc: freebsd-net


[Differential] [Accepted] D1705: sfxge: Use SFXGE_MODERATION to initialize event moderation

2015-01-29 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a comment.
This revision is now accepted and ready to land.

Remember to add Approved by: gnn to your commit message.

BRANCH
  /head

REVISION DETAIL
  https://reviews.freebsd.org/D1705

To: arybchik, gnn
Cc: freebsd-net


Re: 10Gbit Interface Testing

2015-01-04 Thread George Neville-Neil

On 9 Dec 2014, at 16:10, Isaac (.ike) Levy wrote:


Hi All,

In our relatively small environment, I'm looking for pointers in 
testing 10Gbit network performance, for internet-facing connectivity.  
Our environment employs pairs of routers running FreeBSD, also 
utilizing PF, CARP, and PFSYNC.


We have 2 core problems testing the 10Gbit interfaces:

1) A lack of external options on the internet for testing.  We've 
found it non-trivial to adequately saturate a 10Gbit internet 
connection in 2014, (without having one or two more 10Gbit connections 
to saturate).  We simply don't have enough outside resources we 
control to saturate our lines for reasonable tests.


2) We've done our homework on testing, but would love any input from 
this audience about ways to measure any of these:

- PPS (easier)
- Maximum Socket Connections (easier)
- New Socket Connections per Second (harder!)
- Redline Throughput (easier)
- Ways to measure PF performance, (state handling, etc...)
- Ways to start measuring/testing ALTQ-based shaping, as we experiment with it.


Thanks for any input!


Take a look at some of the scripts in my netperf project:

g...@github.com:gvnn3/netperf.git

That uses Conductor:

https://github.com/gvnn3/conductor

which is still under development,
but the packet generation is just pkt-gen on FreeBSD with netmap capable
10G cards (Chelsio or Intel).
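
For the PPS and redline throughput questions it helps to know the theoretical ceiling a generator like pkt-gen can reach. Here is a small Python sketch of that arithmetic, assuming standard Ethernet framing where each frame costs an extra 20 bytes on the wire (7 bytes preamble, 1 byte start-of-frame delimiter, 12 bytes inter-frame gap); the function name is made up for illustration.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 7 preamble + 1 SFD + 12 inter-frame gap


def line_rate_pps(link_bps, frame_bytes):
    """Maximum whole frames per second for a given frame size (incl. FCS)."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps // bits_on_wire


max_small = line_rate_pps(10_000_000_000, 64)    # minimum-size frames
max_large = line_rate_pps(10_000_000_000, 1518)  # full-size untagged frames
```

Minimum-size frames give about 14.88 Mpps on a 10G link, so a PPS test is only saturating the line if the generator gets close to that figure.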


Best,
George


[Differential] [Accepted] D1309: VIMAGE PF fixes #1

2015-01-04 Thread gnn (George Neville-Neil)
gnn accepted this revision.
gnn added a reviewer: gnn.
This revision is now accepted and ready to land.

REVISION DETAIL
  https://reviews.freebsd.org/D1309

To: rodrigc, bz, glebius, trociny, zec, np, melifaro, hrs, wollman, bryanv, 
rpaulo, adrian, gnn, hiren, rwatson
Cc: freebsd-virtualization, freebsd-pf, freebsd-net


[Differential] [Commented On] D1201: Allow UMA allocated memory to be freed when VNET jails are torn down.

2014-12-02 Thread gnn (George Neville-Neil)
gnn added a comment.

No objection from me.

REVISION DETAIL
  https://reviews.freebsd.org/D1201

To: rodrigc, alfredperlstein, melifaro, glebius, hrs, wollman, bryanv, rpaulo, 
adrian, bz, gnn, hiren, rwatson
Cc: freebsd-net, emaste, gnn, rwatson


Re: Enabling VIMAGE in GENERIC

2014-12-01 Thread George Neville-Neil

On 1 Dec 2014, at 23:12, Julian Elischer wrote:


On 12/2/14, 12:07 PM, George Neville-Neil wrote:

On 30 Nov 2014, at 5:04, Julian Elischer wrote:


On 11/29/14, 5:28 PM, Craig Rodrigues wrote:
On Mon, Nov 24, 2014 at 9:03 AM, Julian Elischer wrote:

> also look at the following: (a little dated)
>
> http://p4web.freebsd.org/@md=d&cd=//depot/projects/vimage/&cdf=//depot/projects/vimage/porting_to_vimage.txt&c=tO0@//depot/projects/vimage/porting_to_vimage.txt?ac=22



This is a useful document.  I put it on the wiki: 
https://wiki.freebsd.org/VIMAGE/porting-to-vimage


Thanks.. wow, did I actually know ALL that only 5 years ago?
Scary.  Probably worth having someone who is currently active and up
to date look at it to see if it's all still correct,
especially the module load/unload stuff.



--
Craig





On a slight tangent.  I ran VIMAGE kernels vs. non VIMAGE kernels for both a
VANILLA kernel and a PF kernel (PF on but no rules) as a quick smoke test
today.  The raw forwarding performance was unchanged between kernels with and
without VIMAGE on a 10G based system in the Sentex lab (lion1).  I will be
doing a bit more work in this area and will then put up some results in my
netperf github repo.

The tests are easy enough to run if you have 3 systems, and Conductor
installed.  The source, sink and dut config files are all there to be updated
and tried.

Best,
George


the interesting benchmarks are if you have multiple sessions and 
spread them across multiple vimage jails, and compare that with the 
same number of sessions crowded onto a single machine..


lock contention goes down of course so things can actually get faster.


All in good time.

Best,
George


Re: Enabling VIMAGE in GENERIC

2014-12-01 Thread George Neville-Neil

On 30 Nov 2014, at 5:04, Julian Elischer wrote:


On 11/29/14, 5:28 PM, Craig Rodrigues wrote:
On Mon, Nov 24, 2014 at 9:03 AM, Julian Elischer wrote:

> also look at the following: (a little dated)
>
> http://p4web.freebsd.org/@md=d&cd=//depot/projects/vimage/&cdf=//depot/projects/vimage/porting_to_vimage.txt&c=tO0@//depot/projects/vimage/porting_to_vimage.txt?ac=22



This is a useful document.  I put it on the wiki: 
https://wiki.freebsd.org/VIMAGE/porting-to-vimage


Thanks.. wow, did I actually know ALL that only 5 years ago?
Scary.  Probably worth having someone who is currently active and up
to date look at it to see if it's all still correct,
especially the module load/unload stuff.



--
Craig





On a slight tangent.  I ran VIMAGE kernels vs. non VIMAGE kernels for both a
VANILLA kernel and a PF kernel (PF on but no rules) as a quick smoke test
today.  The raw forwarding performance was unchanged between kernels with and
without VIMAGE on a 10G based system in the Sentex lab (lion1).  I will be
doing a bit more work in this area and will then put up some results in my
netperf github repo.

The tests are easy enough to run if you have 3 systems, and Conductor
installed.  The source, sink and dut config files are all there to be updated
and tried.

Best,
George


Re: netmap in GENERIC, by default, on HEAD

2014-11-05 Thread George Neville-Neil
On 5 Nov 2014, at 9:00, Andrey V. Elsukov wrote:

> On 05.11.2014 19:18, Evandro Nunes wrote:
>> On Wed, Nov 5, 2014 at 1:52 PM, Andrey V. Elsukov  wrote:
>>
>>> On 05.11.2014 18:39, George Neville-Neil wrote:
>>>> Howdy,
>>>>
>>>> Last night (Pacific Time) I committed a change so that GENERIC, on HEAD
>>>> has the netmap
>>>> device enabled.  This is to increase the breadth of our testing of that
>>>> feature prior
>>>> to the release of FreeBSD 11.
>>>>
>>>> In two weeks I will enable IPSec by default, again in preparation for 11.
>>>
>>> Hi,
>>>
>>> recently we did some IP forwarding tests and the GENERIC kernel is
>>> several times faster than GENERIC+IPSEC. Even when IPSEC has no SA.
>>>
>>> I didn't do test on vanilla kernel, but our kernel is able forward
>>> IPv4/IPv6 on rate close to 8.6 Mpps. The same kernel compiled with IPSEC
>>> can forward only 180 kpps. I think this problem should be solved before
>>> enabling it in GENERIC.
>>>
>>
>> this forward rate you mention is related to netmap? or usual
>> forwarding/fastforwarding? this is a huge number, do you mind sharing your
>> dmesg output and top -PSH output so I can check for interrupt CPU usage and
>> other relevant stuff?
>
> This is patched kernel without netmap and fastforwarding. We removed all
> lock contention on the forwarding path to be sure that it doesn't affect
> IPSEC.
>
> Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz (2200.05-MHz K8-class CPU)
> FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
> FreeBSD/SMP: 2 package(s) x 8 core(s) x 2 SMT threads
> real memory  = 68736253952 (65552 MB)
> avail memory = 66370662400 (63295 MB)
> ix0: 
> port 0x7020-0x703f mem 0xde68-0xde6f,0xde704000-0xde707fff irq
> 32 at device 0.0 on pci4
> ix0: Using MSIX interrupts with 16 vectors
> ix0: Ethernet address: 90:e2:ba:0d:73:54
> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
>
> This is IPv6 forwarding test - 6 /64 prefixes each has 200 random
> addresses. They are routed between 6 vlans.
>
> # netstat -I ix0 -w 1
>             input          (ix0)           output
>    packets  errs idrops      bytes    packets  errs      bytes colls
>    8917043     0      0  571436864    8149880     0  522587200     0
>    8943391     0      0  571598336    8179318     0  525085504     0
>    8928155     0      0  571262144    8168254     0  522712192     0
>    8921342     0    937  571693504    8128132     0  521997184     0
>    8924322     0      0  571170048    8211500     0  520264320     0
>    8934564     0      0  571483584    8180040     0  524475264     0
>    8937039     0      0  571384640    8234779     0  525686080     0
>    8926528     0      0  571481728    8160380     0  524265920     0
>    8923160     0      0  571397248    8229839     0  522569408     0
>    8930070     0   1705  571594944    8216092     0  528481152     0
>    8916249     0      0  571294784    8184286     0  524399360     0
>    8937301     0      0  571391040    8221895     0  526383744     0
>    8927967     0      0  571613312    8164779     0  524997760     0
>    8936306     0      0  571251712    8167960     0  519575744     0
>    8922983     0    306  571430528    8216466     0  525893056     0
>    8916209     0      0  571434240    8202692     0  526046336     0
>    8945608     0      0  571426624    8265756     0  524815552     0
>    8925548     0   1045  571808229681                0  530935232     0
>    8932145     0      0  571747200    8149710     0  523409536     0
>    8929339     0      0  571683200    8186790     0  520719040     0
>    8917697     0      0  571585152    8212635     0  525775680     0
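
A quick Python sketch of the arithmetic behind these figures: it averages bytes per packet over the first two input samples from the netstat run and computes the gap between the two forwarding rates quoted earlier in this message (8.6 Mpps without IPSEC, 180 kpps with it). The function names are made up for illustration, and the two samples are assumed to be representative of the whole run.

```python
def mean_packet_bytes(samples):
    """samples: iterable of (packets, bytes) pairs, one per interval."""
    samples = list(samples)
    total_packets = sum(p for p, _ in samples)
    total_bytes = sum(b for _, b in samples)
    return total_bytes / total_packets


def slowdown(fast_pps, slow_pps):
    """How many times faster the first rate is than the second."""
    return fast_pps / slow_pps


# First two one-second input samples as (packets, bytes).
inputs = [(8917043, 571436864), (8943391, 571598336)]
avg = mean_packet_bytes(inputs)
ratio = slowdown(8_600_000, 180_000)
```

The average comes out at roughly 64 bytes per packet, so this is a small-packet test, and the IPSEC-enabled kernel forwards at close to one forty-eighth of the rate.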
>
> # top -PSH
> last pid:  2788;  load averages: 12.01,  4.76,  1.92  up 0+00:04:38  20:58:48
> 471 processes: 45 running, 344 sleeping, 82 waiting
> CPU 0:   0.0% user,  0.0% nice, 21.6% system, 68.2% interrupt, 10.2% idle
> CPU 1:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> CPU 2:   0.0% user,  0.0% nice,  2.7% system, 84.3% interrupt, 12.9% idle
> CPU 3:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> CPU 4:   0.0% user,  0.0% nice,  3.9% system, 86.7% interrupt,  9.4% idle
> CPU 5:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> CPU 6:   0.0% user,  0.0% nice,  5.5% system, 88.6% interrupt,  5.9% idle
> CPU 7:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> CPU 8:   0.0% user,  0.0% nice,  3.5% system, 90.2% interrupt,  6.3% idle
> CPU 9:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> CPU 10:  0.0% user,  0.0% nice,  3.1% system, 87.1% interrupt,  9.8% idle
> CPU 11:  0.

Re: IPSEC in GENERIC [was: Re: netmap in GENERIC, by default, on HEAD]

2014-11-05 Thread George Neville-Neil

On 5 Nov 2014, at 9:20, Alexander V. Chernikov wrote:


On 05.11.2014 19:39, George Neville-Neil wrote:

Howdy,

Last night (Pacific Time) I committed a change so that GENERIC, on HEAD
has the netmap device enabled.  This is to increase the breadth of our
testing of that feature prior to the release of FreeBSD 11.

In two weeks I will enable IPSec by default, again in preparation for
11.

Please don't.

While I would love to be able to use IPSEC features on an unmodified
GENERIC kernel, simply enabling IPSEC is not the best thing to do in
terms of performance.

The current IPSEC locking model is pretty complicated and is not scalable
enough.  It looks like it requires quite a lot of man-hours/testing to be
reworked to achieve good performance, and I'm not sure that enabling it
by default will help with that.

The current IPv4/IPv6 stack code with some locking modifications is able
to forward 8-10MPPS on something like 2xE2660.  I'm in the process of
merging these modifications in the "proper" way to our HEAD; progress can
be seen in projects/routing.  While the rmlocked radix/lle (and ifa_ref /
ifa_unref, and a bunch of other) changes are not there yet, you can
probably get x2-x4 forwarding/output performance for at least IPv4
traffic (e.g. 2-3mpps depending on test conditions).

In contrast, I haven't seen IPSEC being able to process more than
200kpps for any kind of workload.

What we've discussed with glebius@ and jmg@ at EuroBSDCon was to modify
PFIL to be able to monitor/enforce hook ordering and make the IPSEC code
a usual pfil consumer.  In that case, running GENERIC with IPSEC but w/o
any SA won't influence the packet processing path.  This also simplifies
the process of making IPSEC loadable.




How soon do you think the pfil patch would be ready?  That sounds like a
good first step towards making all this enabled in HEAD so that we can
make sure that IPSec gets the broad testing that it needs.

Also, if folks who know about these problems can send workloads and test
descriptions to the list that would be very helpful.

Specific drivers and hardware types would be most helpful as this may be
device related as well.

I will turn this on on some machines in the network test lab to see what
I can see.


Best,
George
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


netmap in GENERIC, by default, on HEAD

2014-11-05 Thread George Neville-Neil

Howdy,

Last night (Pacific Time) I committed a change so that GENERIC, on HEAD
has the netmap device enabled.  This is to increase the breadth of our
testing of that feature prior to the release of FreeBSD 11.

In two weeks I will enable IPSec by default, again in preparation for
11.


Best,
George


Re: [PATCH 4/4] sfxge: Support tunable to control Tx deferred packet list limits

2014-09-30 Thread George Neville-Neil

On 25 Sep 2014, at 9:15, Andrew Rybchenko wrote:


Support tunable to control Tx deferred packet list limits

Also increase default for Tx queue get-list limit.
Too small a limit results in TCP packet drops, especially when many
streams are running simultaneously.
The put list may be kept small enough since it is just a temporary
location if the transmit function can't get the Tx queue lock.


[dpl_max.txt]


Howdy,

All four of the submitted patches have been committed to HEAD as of 
today.


Best,
George


Networking Wiki Page updated...

2014-09-30 Thread George Neville-Neil

Howdy,

I've attempted to gather all the Networking TODOs from our various,
scattered, wiki pages and concentrate them all in one place.  Please
review https://wiki.freebsd.org/Networking and, if need be, update that
page.  That is now the central page for all our networking todos and the
like.


Best,
George


Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00

2014-08-28 Thread George Neville-Neil
Adrian,

Can you put this into a Phabricator for review?

Lars,

How have you been testing this?

Best,
George


On 27 Aug 2014, at 4:01, Eggert, Lars wrote:

> Yep
>
> On 2014-8-27, at 9:53, Adrian Chadd  wrote:
>
>> Ok. Is it the same patch you sent out in Feb?
>>
>>
>> -a
>>
>>
>> On 27 August 2014 00:43, Eggert, Lars  wrote:
>>> Not as far as I know.
>>>
>>> Lars
>>>
>>> On 2014-8-27, at 9:39, Adrian Chadd  wrote:
>>>
>>>> Is there a PR for it?
>>>>
>>>>
>>>> -a
>>>>
>>>>
>>>> On 27 August 2014 00:23, Eggert, Lars  wrote:
>>>>> It would be great if people could also review Aris' PRR patch - RFC6937 
>>>>> has been out for a while.
>>>>>
>>>>> Lars
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2014-8-26, at 20:09, Adrian Chadd  wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I'm going to merge Tom's work in a week unless someone gives me a
>>>>>> really good reason not to.
>>>>>>
>>>>>> I think there's been enough work and discussion about it since the
>>>>>> first post from Lars in February and enough review opportunity.
>>>>>>
>>>>>>
>>>>>> -a
>>>>>>
>>>>>>
>>>>>> On 26 August 2014 07:55, Tom Jones  wrote:
>>>>>>> On Tue, Aug 26, 2014 at 02:43:49PM +, Eggert, Lars wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> the newcwv patch is probably stale now with Tom Jones' recent patch 
>>>>>>>> based on
>>>>>>>> a more up-to-date version of the Internet-Draft, but the PRR patch 
>>>>>>>> should
>>>>>>>> still be useful?
>>>>>>>
>>>>>>> My newcwv patch is much more up to date than Aris's, but it is slightly
>>>>>>> different in implementation. I have had a few suggestions from Adrian, 
>>>>>>> but he
>>>>>>> couldn't comment on how it relates to the tcp internals.
>>>>>>>
>>>>>>> There is a PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191520
>>>>>>>
>>>>>>> The biggest difference in structure between mine and Aris's patch is 
>>>>>>> the use of
>>>>>>> tcp timers. It would be good to hear if my approach or Aris's is
>>>>>>> preferred.
>>>>>>>
>>>>>>>> On 2014-6-19, at 23:35, George Neville-Neil  
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 4 Feb 2014, at 1:38, Eggert, Lars wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> below are two patches that implement RFC6937 ("Proportional Rate 
>>>>>>>>>> Reduction for TCP") and draft-ietf-tcpm-newcwv-00 ("Updating TCP to 
>>>>>>>>>> support Rate-Limited Traffic"). They were done by Aris 
>>>>>>>>>> Angelogiannopoulos for his MS thesis, which is at 
>>>>>>>>>> https://eggert.org/students/angelogiannopoulos-thesis.pdf.
>>>>>>>>>>
>>>>>>>>>> The patches should apply to -CURRENT as of Sep 17, 2013. (Sorry for 
>>>>>>>>>> the delay in sending them, we'd been trying to get some feedback 
>>>>>>>>>> from committers first, without luck.)
>>>>>>>>>>
>>>>>>>>>> Please note that newcwv is still a work in progress in the IETF, and 
>>>>>>>>>> the patch has some limitations with regards to the "pipeACK Sampling 
>>>>>>>>>> Period" mentioned in the Internet-Draft. Aris says this in his 
>>>>>>>>>> thesis about what exactly he implemented:
>>>>>>>>>>
>>>>>>>>>> "The second implementation choice, is in regards with the 
>>>>>>>>>> measurement of pipeACK. This variable is the most important 
>>>>>>>>>> introduced by the method and is used to compute the phase that the 
>>>>>>>>>> sender currently lies in. In order to compute pipeACK the approach 
>>>>>>>>>> suggested by the Internet Draft (ID) is followed [ncwv]. During 
>>>>>>>>>> initialization, pipeACK is set to the maximum possible value. A 
>>>>>>>>>> helper variable prevHighACK is introduced that is initialized to the 
>>>>>>>>>> initial sequence number (iss). prevHighACK holds the value of the 
>>>>>>>>>> highest acknowledged byte so far. pipeACK is measured once per RTT 
>>>>>>>>>> meaning that when an ACK covering prevHighACK is received, pipeACK 
>>>>>>>>>> becomes the difference between the current ACK and prevHighACK. This 
>>>>>>>>>> is called a pipeACK sample.  A newer version of the draft suggests 
>>>>>>>>>> that multiple pipeACK samples can be used during the pipeACK 
>>>>>>>>>> sampling period."
>>>>>>>>>>
>>>>>>>>>> Lars
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [prr.patch]
>>>>>>>>>>
>>>>>>>>>> [newcwv.patch]
>>>>>>>>>
>>>>>>>>> Apologies for not looking at this as yet.  It is now closer to the 
>>>>>>>>> top of my list.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> George
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Tom
>>>>>>> @adventureloop
>>>>>>> adventurist.me
>>>>>>>
>>>>>>> :wq
>>>>>
>>>>>
>>>




Re: fastforward/routing: a 3 million packet-per-second system?

2014-07-27 Thread George Neville-Neil

On 22 Jul 2014, at 20:30, Adrian Chadd wrote:


hi!

You can use 'pmcstat -S CPU_CLK_UNHALTED_CORE -O pmc.out' (then ctrl-C
it after say 5 seconds), which will log the data to pmc.out;
then 'pmcannotate -k /boot/kernel pmc.out /boot/kernel/kernel' to find
out where the most cpu cycles are being spent.



Chiming in late, but don't you mean instruction-retired instead of 
CPU_CLK_UNHALTED_CORE?


Best,
George



It should give us the location(s) inside the top CPU users.

Hopefully it'll then be much more obvious!

I'm glad you're digging into it!

-a



On 22 July 2014 12:21, John Jasen  wrote:

Navdeep;

I was struck by spending so much time in transmit as well.

Adrian's suggestion on mining lock profiling gave me an excuse to up the
tx queues in /boot/loader.conf. Our prior conversations indicated that
up to 64 should be acceptable?





On 07/22/2014 03:10 PM, Adrian Chadd wrote:

Hi

Right. Time to figure out why you're spending so much time in
cxgbe_transmit() and t4_eth_tx(). Maybe ask Navdeep for some ideas?


-a

On 22 July 2014 12:07, John Jasen  wrote:
The first is pretty easy to turn around. Reading on dtrace now. Thanks
for the pointers and help!

PMC: [CPU_CLK_UNHALTED_CORE] Samples: 142654 (100.0%) , 124560 unresolved

%SAMP IMAGE      FUNCTION             CALLERS
 34.0 if_cxgbe.k t4_eth_tx            cxgbe_transmit:24.0 t4_tx_task:10.0
 28.8 if_cxgbe.k cxgbe_transmit
  7.6 if_cxgbe.k service_iq           t4_intr
  6.4 if_cxgbe.k get_scatter_segment  service_iq
  4.9 if_cxgbe.k reclaim_tx_descs     t4_eth_tx
  3.2 if_cxgbe.k write_sgl_to_txd     t4_eth_tx
  2.8 hwpmc.ko   pmclog_process_callc pmc_process_samples
  2.1 libc.so.7  bcopy                pmclog_read
  1.9 if_cxgbe.k t4_eth_rx            service_iq
  1.7 hwpmc.ko   pmclog_reserve       pmclog_process_callchain
  1.2 libpmc.so. pmclog_read
  1.0 if_cxgbe.k write_txpkts_wr      t4_eth_tx
  0.8 kernel     e1000_read_i2c_byte_ e1000_set_i2c_bb
  0.6 libc.so.7  memset
  0.5 if_cxgbe.k refill_fl            service_iq




On 07/22/2014 02:45 PM, Adrian Chadd wrote:

Hi,

Well, start with CPU profiling with pmc:

kldload hwpmc
pmcstat -S CPU_CLK_UNHALTED_CORE -T -w 1

.. look at the FreeBSD dtrace one-liners (google that) for lock
contention and CPU usage.

You could compile with LOCK_PROFILING (which will slow things down a
little even when not in use) then enable it for a few seconds (which
will definitely slow things down) to gather fine grained lock
contention data.

(sysctl debug.lock.prof when you have it compiled in. sysctl
debug.lock.prof.enable=1; wait 10 seconds; sysctl
debug.lock.prof.enable=0; sysctl debug.lock.prof.stats)


-a


On 22 July 2014 11:42, John Jasen  wrote:

If you have ideas on what to instrument, I'll be happy to do it.

I'm faintly familiar with dtrace, et al, so it might take me a few tries
to get it right -- or bludgeoning with the documentation.

Thanks!




On 07/22/2014 02:07 PM, Adrian Chadd wrote:

Hi!

Well, what's missing is some dtrace/pmc/lockdebugging investigations
into the system to see where it's currently maxing out at.

I wonder if you're seeing contention on the transmit paths as drivers
queue frames from one set of driver threads/queues to another
potentially completely different set of driver transmit
threads/queues.




-a


On 22 July 2014 08:18, John Jasen  wrote:

Feedback and/or tips and tricks more than welcome.

Outstanding questions:

Would increasing the number of processor cores help?

Would a system where both processor QPI ports connect to each other
mitigate QPI bottlenecks?

Are there further performance optimizations I am missing?

Server Description:

The system in question is a Dell Poweredge R820, 16GB of RAM, and two
Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz.

Onboard, in a 16x PCIe slot, I have one Chelsio T-580-CR two-port 40GbE
NIC, and in an 8x slot, another T-580-CR dual port.

I am running FreeBSD 10.0-STABLE.

BIOS tweaks:

Hyperthreading (or Logical Processors) is turned off.
Memory Node Interleaving is turned off, but did not appear to impact
performance.

/boot/loader.conf contents:
#for CARP+PF testing
carp_load="YES"
#load cxgbe drivers.
cxgbe_load="YES"
#maxthreads appears to not exceed CPU.
net.isr.maxthreads=12
#bindthreads may be indicated when using cpuset(1) on interrupts
net.isr.bindthreads=1
#random guess based on googling
net.isr.maxqlimit=60480
net.link.ifqmaxlen=9
#discussions with cxgbe maintainer and list led me to trying this.
#Allows more interrupts to be fixed to CPUs, which in some cases,
#improves interrupt balancing.
hw.cxgbe.ntxq10g=16
hw.cxgbe.nrxq10g=16

/etc/sysctl.conf contents:

#the following is also enabled by rc.conf gateway_enable.
net.inet.ip.fastforwarding=1
#recommendations from BSD router project
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
#probably should be removed, as cxgbe does not seem to affect/be
#affected by irq storm settings
hw.intr_storm_thresh

Re: A new way to test systems in multiple machine scenarios...

2014-07-27 Thread George Neville-Neil

On 6 Jul 2014, at 4:52, Garrett Cooper wrote:

On Jul 5, 2014, at 20:04, "George Neville-Neil" 
 wrote:


Hi,

I've coded up a system to allow you to control multiple other systems
for use in testing.

https://github.com/gvnn3/conductor

It's BSD licensed, of course, and is only alpha quality but I'm using it
in the test lab to control hosts in forwarding tests.

I'll be updating the system frequently over the coming months as I build
out more test scenarios, add documentation and the like.

There are two main scripts, player, and conductor.  You run N players,
one per machine, and a single conductor.  The conductor controls the
players by sending down phases which are encoded in INI style configs.
There are a few, simple, samples in the config/ directory of the
project.

Best,
George

NOTE: Conductor MUST run as root to be useful.  Do NOT run on the open
Internet.  It is meant for private test labs.


I took a quick glance at the code -- have you considered using 
multiprocessing managers instead?


https://docs.python.org/2/library/multiprocessing.html#managers


I had not.  Thanks.  I'll give it a look over.

Best,
George



Call for PF and IPFW rulesets

2014-07-27 Thread George Neville-Neil

Howdy,

I am currently doing performance comparisons and related work on PF in
FreeBSD.

While I can certainly hand craft a bunch of rulesets that should be
equivalent on both systems I'm putting out a call to those who might be
willing to share some real world rulesets with me and the rest of the
community.

I'll be putting these into the netperf project I've started on GitHub,
so, realize that these will be public.  https://github.com/gvnn3/netperf
The rulesets will also be used in papers and other material.

Please email me off list if you have things you are willing to share.

Best,
George



Re: A new way to test systems in multiple machine scenarios...

2014-07-10 Thread George Neville-Neil

On 8 Jul 2014, at 4:56, Garrett Cooper wrote:

On Jul 6, 2014, at 9:06 PM, Craig Rodrigues  
wrote:


On Sat, Jul 5, 2014 at 8:04 PM, George Neville-Neil 


wrote:


Hi,

I've coded up a system to allow you to control multiple other systems
for use in testing.

https://github.com/gvnn3/conductor



Cool!  The architecture you have is similar to that of the SPECsfs
benchmark test ( http://www.spec.org/sfs2008/ ) which involves a
"coordinator node" and multiple "client nodes" which direct NFS network
traffic towards a System Under Test (SUT).  Garrett Cooper actually set
up the original testbed that I am using now at iXsystems. :)

It would be cool to put together tools like Jenkins, Kyua, and conductor
to do more advanced testing of FreeBSD before the project puts out
releases.


Agreed. The only thing that I have some concern about is the
reinventing of the wheel in python. multiprocessing Managers are one
viable option that’s existed since python 2.6; there’s a learning
curve though, and you’ll run into problems with pickling some
objects because the pickle protocol is far from complete (example:
http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma
); you might run into these problems regardless because you’re
serializing objects using pickle instead of using dill (or using a
simpler serialization method like JSON). Fabric has a framework
that’s nice to use if you have ssh capability. There are other
frameworks that use twisted conch I think too (another library that
implements ssh access).


Yes, I learned quite a bit about pickling in writing this.  Conductor
aims to be quite simple so I am hoping to avoid any crazy corner cases
to do with pickling.



Isilon has a framework they use, but it’s very customized to their 
infrastructure and product assumptions and it’s in need of an 
overhaul :(.


This, actually, is the problem I found.  Lots of folks have partial
solutions that are either proprietary, internal, not ready for prime
time, not quite what we want, etc. etc.  I did get one private response
of another system to look at as well.

I basically did this as a stake in the ground around which to build
something we could possibly move forwards with.  It's not a 100%
solution but it's 80% of the solution to the problem I run into 80% of
the time.


Best,
George


A new way to test systems in multiple machine scenarios...

2014-07-05 Thread George Neville-Neil

Hi,

I've coded up a system to allow you to control multiple other systems
for use in testing.

https://github.com/gvnn3/conductor

It's BSD licensed, of course, and is only alpha quality but I'm using it
in the test lab to control hosts in forwarding tests.

I'll be updating the system frequently over the coming months as I build
out more test scenarios, add documentation and the like.

There are two main scripts, player, and conductor.  You run N players,
one per machine, and a single conductor.  The conductor controls the
players by sending down phases which are encoded in INI style configs.
There are a few, simple, samples in the config/ directory of the
project.

Best,
George

NOTE: Conductor MUST run as root to be useful.  Do NOT run on the open
Internet.  It is meant for private test labs.
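
To make the phase idea a little more concrete, here is a minimal sketch
of how INI-style phases could be read with Python's standard
configparser module.  The section names, key names and commands are
invented for illustration and are not conductor's actual config format;
the samples in the project's config/ directory are the real reference.

```python
import configparser

# Hypothetical phase file: one section per phase, one key per command.
SAMPLE = """
[Startup]
step1 = ifconfig lo0

[Run]
step1 = ping -c 1 127.0.0.1

[Collect]
step1 = netstat -i
"""


def load_phases(text):
    """Return [(phase_name, [command, ...]), ...] in file order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    phases = []
    for name in parser.sections():
        # Only the values matter here; the keys just order the commands.
        commands = [command for _key, command in parser.items(name)]
        phases.append((name, commands))
    return phases


if __name__ == "__main__":
    for name, commands in load_phases(SAMPLE):
        print(name, commands)
```

In this sketch each section stands for one phase, and a player would run
that phase's commands in the order they appear.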



Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00

2014-06-19 Thread George Neville-Neil
On 4 Feb 2014, at 1:38, Eggert, Lars wrote:

> Hi,
>
> below are two patches that implement RFC6937 ("Proportional Rate Reduction 
> for TCP") and draft-ietf-tcpm-newcwv-00 ("Updating TCP to support 
> Rate-Limited Traffic"). They were done by Aris Angelogiannopoulos for his MS 
> thesis, which is at https://eggert.org/students/angelogiannopoulos-thesis.pdf.
>
> The patches should apply to -CURRENT as of Sep 17, 2013. (Sorry for the delay 
> in sending them, we'd been trying to get some feedback from committers first, 
> without luck.)
>
> Please note that newcwv is still a work in progress in the IETF, and the 
> patch has some limitations with regards to the "pipeACK Sampling Period" 
> mentioned in the Internet-Draft. Aris says this in his thesis about what 
> exactly he implemented:
>
> "The second implementation choice, is in regards with the measurement of 
> pipeACK. This variable is the most important introduced by the method and is 
> used to compute the phase that the sender currently lies in. In order to 
> compute pipeACK the approach suggested by the Internet Draft (ID) is followed 
> [ncwv]. During initialization, pipeACK is set to the maximum possible value. 
> A helper variable prevHighACK is introduced that is initialized to the 
> initial sequence number (iss). prevHighACK holds the value of the highest 
> acknowledged byte so far. pipeACK is measured once per RTT meaning that when 
> an ACK covering prevHighACK is received, pipeACK becomes the difference 
> between the current ACK and prevHighACK. This is called a pipeACK sample.  A 
> newer version of the draft suggests that multiple pipeACK samples can be used 
> during the pipeACK sampling period."
>
> Lars
>
>
> [prr.patch]
>
> [newcwv.patch]

Apologies for not looking at this as yet.  It is now closer to the top of my 
list.

Best,
George
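
As a rough illustration of the pipeACK sampling described in the thesis
excerpt quoted above, the bookkeeping might look like the sketch below.
It is not taken from newcwv.patch: the class and method names are
hypothetical, sequence-number wraparound is ignored, and the caller is
assumed to feed it no more than one ACK per RTT.

```python
# "Maximum possible value" used at initialization, assuming 32-bit counters.
PIPEACK_UNDEF = 0xFFFFFFFF


class PipeAckSampler:
    """Tracks prevHighACK and produces one pipeACK sample per new ACK."""

    def __init__(self, iss):
        self.pipe_ack = PIPEACK_UNDEF  # no sample taken yet
        self.prev_high_ack = iss       # starts at the initial sequence number

    def on_ack(self, ack):
        """Return a new pipeACK sample, or None if the ACK covers nothing new."""
        if ack <= self.prev_high_ack:
            return None
        # The sample is the number of bytes newly acknowledged.
        self.pipe_ack = ack - self.prev_high_ack
        self.prev_high_ack = ack
        return self.pipe_ack


if __name__ == "__main__":
    sampler = PipeAckSampler(iss=1000)
    print(sampler.on_ack(5000))
```

The newer draft's idea of keeping several samples per sampling period
would need a small history of samples rather than the single value kept
here.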




Re: Allowing CARP to use arbitrary OUI prefix and allocating block from FreeBSD's OUI space assignment for that

2014-05-26 Thread George Neville-Neil
On 21 May 2014, at 10:58, Eygene Ryabinkin wrote:

> Mon, May 12, 2014 at 12:39:49AM +0400, Eygene Ryabinkin wrote:
>> Sun, May 11, 2014 at 04:30:32PM -0400, George Neville-Neil wrote:
>>> On May 8, 2014, at 16:04 , Gleb Smirnoff  wrote:
>>>> On Thu, May 08, 2014 at 12:10:48PM +0400, Eygene Ryabinkin wrote:
>>>> E>  - I'll do a patch for carp(4) that will allow it to use configurable
>>>> E>OUI from a sysctl knob (first 5 bytes of OUI);
>>>>
>>>> Please no sysctl knobs. This should be configurable via ifconfig(8)
>>>> per vhid.
>>>
>>> Agree, please do this via ifconfig.
>>
>> http://codelabs.ru/fbsd/carp-ouibase.diff
>
> Updated the patch, URL remains the same:
> http://codelabs.ru/fbsd/carp-ouibase.diff
>
> Changes:
>
> - full MAC is settable via ether/lladdr/link keyword, no
>  ouibase keyword now exists;
>
> - these keywords accept "carp" and "vrrp" keywords making
>  them to set new and old bases with the last octet set to
>  the VHID;
>
> - network.subr was updated not to mess with any keywords that
>  go after 'vhid' and just pass it down to ifconfig as is.
>
> I did two days of testing and hadn't yet found any problems.

Hi,

One note on CARP and VRRP moving forwards.  I realize that this patch is
partly a CARP specific issue but a lot of people on a few mailing lists
seem to have assumed that VRRP is no longer under patent threat.

According to the FreeBSD Foundation's lawyers there are two extant
patents, due to expire in 2017, 3 years hence.

CISCO: US6108300: Method and apparatus for transparently providing a
failover network device
IBM: US6148410: Fault tolerant recoverable TCP/IP connection router,
Baskey, Dillenberger, Goldszmidt, Hunt, Levy-Abegnoli, Nick, Schmidt,
November 14, 2000.

So let's not go assuming that VRRP is "free" just yet.

As to updating our CARP to behave correctly, that's still on the table.

Best,
George
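
For context, the "last octet set to the VHID" scheme mentioned in the
quoted patch notes can be sketched as follows.  The base used here is the
well-known VRRP IPv4 virtual MAC prefix 00:00:5e:00:01; the helper name
is hypothetical, and the new CARP base proposed by the patch is not shown
in this thread, so the base is left as a parameter.

```python
# First five octets of the VRRP IPv4 virtual router MAC address.
VRRP_BASE = bytes([0x00, 0x00, 0x5E, 0x00, 0x01])


def virtual_mac(base, vhid):
    """Build a 6-octet MAC string from a 5-octet base plus the VHID."""
    if len(base) != 5:
        raise ValueError("base must be exactly 5 octets")
    if not 0 <= vhid <= 255:
        raise ValueError("vhid must fit in one octet")
    return ":".join(f"{octet:02x}" for octet in base + bytes([vhid]))


if __name__ == "__main__":
    print(virtual_mac(VRRP_BASE, 7))
```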




Re: [patch] [1/6] sfxge: fix mbuf leak if it does not fit in software queue

2014-03-18 Thread George Neville-Neil

On Mar 15, 2014, at 9:13 , Andrew Rybchenko  
wrote:

> 
> <1-sfxge-leak.patch>

Committed this morning.

Best,
George



Re: socket receive buffer size & window updates

2014-03-18 Thread George Neville-Neil

On Mar 11, 2014, at 13:58 , Vijay Singh  wrote:

> The socket option handler currently doesn't prevent connecting or connected
> sockets from changing their receive buffer sizes. In particular, I ran into
> an application that sets the receive buffer size lower than what it
> originally was.
> 
> In tcp_output(), if no data is being sent, there is code that is trying to
> decide if a window update is needed.
> 
> If the socket receive buffer size was reduced after a larger window was
> already advertized, or perhaps even when there is data in the receive
> buffer, it seems to me that the computation in 592 could go -ve, and be
> interpreted as a large window update.
> 
> I was led to this issue after observing in packet traces that duplicate
> FINs were being sent on close. I tracked it down to this check. Should this
> be changed to a check like such?
> 

Interesting.  Do you have a bit of test code?

Best,
George




Re: Include port number in "Listen queue overflow" messages

2014-03-18 Thread George Neville-Neil

On Mar 7, 2014, at 1:23 , hiren panchasara  wrote:

> I am thinking of committing following change that includes port number
> in "Listen queue overflow" messages.
> 

I like it.

Best,
George

> New message would look something like:
> sonewconn: pcb 0xf8001b155760: Listen queue overflow on port
> 13120: 1 already in queue awaiting acceptance (454 occurrences)
> 
> I recently ran into a situation at $work where I could not catch
> the culprit application via "netstat -A" and had to dive into kgdb to
> find the port from the pcb on which this application was listening.
> 
> IMO, this change will make debugging easier.
> 
> cheers,
> Hiren
> 
> Index: sys/kern/uipc_socket.c
> ===
> --- sys/kern/uipc_socket.c  (revision 262861)
> +++ sys/kern/uipc_socket.c  (working copy)
> @@ -136,6 +136,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> #include 
> 
> @@ -491,8 +492,11 @@
>static int overcount;
> 
>struct socket *so;
> +   struct inpcb *inp;
>int over;
> 
> +   inp = sotoinpcb(head);
> +
>ACCEPT_LOCK();
>over = (head->so_qlen > 3 * head->so_qlimit / 2);
>ACCEPT_UNLOCK();
> @@ -504,10 +508,12 @@
>overcount++;
> 
>if (ratecheck(&lastover, &overinterval)) {
> -   log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow: "
> -   "%i already in queue awaiting acceptance "
> +   log(LOG_DEBUG, "%s: pcb %p: Listen queue overflow on "
> +   "port %d: %i already in queue awaiting acceptance "
>"(%d occurrences)\n",
> -   __func__, head->so_pcb, head->so_qlen, overcount);
> +   __func__, head->so_pcb,
> +   ntohs(inp->inp_inc.inc_lport), head->so_qlen,
> +   overcount);
> 
>overcount = 0;
>}
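
The overflow test and the proposed log message in the diff above can be
mirrored in a few lines of Python.  This only reproduces the integer
arithmetic and the format string from the patch; the function names are
hypothetical and the port is assumed to be in host byte order already.

```python
def listen_queue_over(qlen, qlimit):
    """Mirror of 'so_qlen > 3 * so_qlimit / 2' using integer division."""
    return qlen > 3 * qlimit // 2


def format_overflow(pcb, port, qlen, occurrences):
    """Build the message text with the listening port included."""
    return (
        f"sonewconn: pcb {pcb:#x}: Listen queue overflow on port {port}: "
        f"{qlen} already in queue awaiting acceptance "
        f"({occurrences} occurrences)"
    )


if __name__ == "__main__":
    if listen_queue_over(qlen=1, qlimit=0):
        print(format_overflow(0xF8001B155760, 13120, 1, 454))
```

With a queue limit of 128, for example, the condition first becomes true
at 193 queued connections, since 3 * 128 / 2 is 192.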


Re: DCTCP for FreeBSD

2014-03-18 Thread George Neville-Neil

On Feb 19, 2014, at 4:18 , Eggert, Lars  wrote:

> Hi,
> 
> Midori Kato has implemented Microsoft's/Stanford's Datacenter TCP (DCTCP) for 
> FreeBSD as part of her MS thesis with me. Find a patch attached.
> 

Thanks!  Any hints on how best to test this code?

Best,
George

> Also note that we're documenting a specification for DCTCP in an IETF draft: 
> http://tools.ietf.org/html/draft-bensley-tcpm-dctcp
> 
> Microsoft has made a licensing statement (RAND-Z) on the technology to the 
> IETF: https://datatracker.ietf.org/ipr/2319/ (I'm not sure what this means 
> for an eventual inclusion in FreeBSD.)
> 
> Roughly, Midori's patch consists of an extension of the modular congestion 
> control framework to expose ECN information to the modules, a module to 
> implement DCTCP, and a few experimental variants. See Midori's explanation:
> 
>> [1] A change for the modular congestion control framework (See Section 4.1
>> if needed)
>> DCTCP uses different ECN processing from RFC3168. We need to prepare
>> three functions to do the following ECN processing.
>> a) The kernel decides whether an ECE flag should be set in the next outgoing
>> TCP segment by snooping reserved bits in IP and TCP headers. (tcp_input.c)
>> b) The kernel controls congestion if an ECE flag is set in an arriving TCP
>> segment. (tcp_input.c)
>> c) After the outgoing TCP segment is generated, the kernel decides whether
>> an ECT bit should be set in the ECN field of the IP header in the outgoing
>> packet. (tcp_output.c)
>> The current framework has no housekeeping functions for (a) and (b).
>> Therefore, I add two functions into the modular cc framework:
>> ecnpkt_handler() and ect_handler().
>>
>> - ecnpkt_handler() allows the kernel to do the additional ECN processing by
>> snooping the ECN field in IP and TCP headers. As an option, this function
>> takes a flag, which tells whether this function is in the delayed ACK. This
>> function returns an integer value. When the return value is set, the kernel
>> is forced to disable delayed ACK.
>> - ect_handler() allows the kernel to use a different rule from RFC3168 in
>> terms of ECT marking in the outgoing segment. This function returns an
>> integer value. If the value is set, an ECT bit is set in the outgoing
>> segment.
>> 
>> 
>> [2] Five changes from the original DCTCP algorithm
>> In order to reflect the DCTCP motivation, I modified the following 
>> processing. First four modifications are for senders and the last 
>> modification is for receivers.
>> 
>> (1) no congestion recovery in the receipt of ECE flags (See section 4.2.1 if 
>> needed)
>> FreeBSD handles ECN as a congestion event but that is not true for DCTCP
>> senders. A DCTCP sender uses ECN as a means to understand the extent of
>> congestion. Therefore, I remove congestion recovery mode in any situation
>> for DCTCP senders.
>> 
>> (2) selective initial alpha value (See section 4.2.2 if needed) 
>> DCTCP defines alpha as a parameter to gauge the depth of congestion. When
>> the alpha value is large, a DCTCP sender shows saw-toothed CWND
>> behavior.
>> The problem is that the alpha value is not reliable during the first dozen
>> or so RTTs because there is no way to know the depth of congestion in a
>> network from the beginning. Given this, I think users should be able to
>> select the initial alpha for their applications. When a user
>> chooses DCTCP for latency-sensitive applications, the initial alpha is
>> preferred. Otherwise, DCTCP senders had better set the initial alpha
>> value to zero according to my experimental results (See section 7.2 of the
>> attached file).
>> The default alpha value is set to zero in my implementation.
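
For readers who do not have the DCTCP description to hand, here is a rough
sketch of how alpha is usually defined in the DCTCP paper and RFC 8257. It is
not taken from Midori's patch: the function names, the use of doubles, and
the gain of 1/16 are illustrative assumptions. Alpha is a moving average of
the fraction of ECN-marked bytes, and the window is reduced by alpha/2, so an
initial alpha of 0 gives almost no reduction on the first mark while an
initial alpha of 1 halves the window as standard TCP would.

```c
#include <assert.h>

/* Estimation gain g; 1/16 is a commonly cited value (assumption here). */
#define DCTCP_G (1.0 / 16.0)

/*
 * alpha <- (1 - g) * alpha + g * F
 * where F is the fraction of bytes that were ECN-marked in the last window.
 */
static double
dctcp_update_alpha(double alpha, double marked_bytes, double total_bytes)
{
	double f;

	assert(total_bytes > 0.0);
	assert(marked_bytes >= 0.0 && marked_bytes <= total_bytes);
	f = marked_bytes / total_bytes;
	return ((1.0 - DCTCP_G) * alpha + DCTCP_G * f);
}

/* cwnd <- cwnd * (1 - alpha / 2) */
static double
dctcp_reduce_cwnd(double cwnd, double alpha)
{
	return (cwnd * (1.0 - alpha / 2.0));
}
```

With alpha starting at 0, a fully marked window only moves alpha to 1/16, so
the first reductions are small; this is the start-up behaviour the text above
is concerned with.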
>> 
>> (3) alpha value initialization after an idle period (See section 4.2.3 if 
>> needed)
>> How long an idle period lasts is not predictable. Therefore, for a DCTCP
>> sender, using the outdated alpha after an idle period is not a good idea. A
>> DCTCP sender resets alpha to the initial value when an idle period occurs.
>> 
>> The following changes are applied to eliminate a compatibility issue with
>> standard ECN defined in RFC3168. DCTCP and standard ECN servers have no way
>> to identify which mechanism is working on the peer. Thus, we need to 
>> eliminate the worst situation in a network mixing DCTCP senders/receivers 
>> and standard ECN senders/receivers.
>> (4) using CWR flag when the ECE flag is found for a RTT (See section 5.1 if 
>> needed)
>> This change is applied for a situation when a sender uses DCTCP and a
>> receiver uses standard ECN.
>> Under the situation, I find that a DCTCP sender minimizes CWND. The detailed
>> technical reason is described in section 4.2 of my attached file.
>> Fortunately, the current tcp_input() function complements this change, thus,
>> there is no modification in my patch.
>> 
>> (5) enabling delayed ACK in the receipt of the CWR flag (See section 5.2 if 
>> needed)
>> This change is applied for a s

Re: [PATCH 1/6] sfxge: fix mbuf leak if it does not fit in software queue

2014-03-18 Thread George Neville-Neil

On Mar 18, 2014, at 7:44 , Adrian Chadd  wrote:

> Hi!
> 
> Who's the solarflare driver maintainer?
> 
> Ah, there isn't one. The closest is Ben Hutchings .
> 
> I can commit these if no-one else is willing but I don't have any
> solarflare hardware to test it on.
> 
> 

I am taking care of this.

I’ll be testing and committing these patches.  There is hardware inbound to Sentex.

Best,
George


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: Multiqueue support for bpf

2013-09-28 Thread George Neville-Neil
Bump

Has anyone else reviewed this code?  I have looked it over but not run it.  
Visually it looks fine to me.

Best,
George

On Sep 4, 2013, at 4:04 , Takuya ASADA  wrote:

> Hi,
> 
> This is the 2nd version of the multiqueue bpf patch. I think I fixed the
> things you commented on in the previous mail.
> Here's a change list of the patch:
> 
> - Drop added functions on struct
> ifnet(if_get_[rt]xqueue_len/if_get_[rt]xqueue_affinity).
> HW queue number and queue affinity information may be useful for some
> applications, but it's not really directly related to multiqueue bpf. I
> think we should discuss them separately.
> 
> - Use BITSET for queue mask.
> It seems better to use BITSET for queue mask structure, instead of boolean
> array.
> 
> - Drop tcpdump/libpcap changes.
> These should also be discussed separately.
> 
> - M_QUEUEID/IFCAP_QUEUEID
> M_QUEUEID is the flag for mbuf which contains hw queue id.
> IFCAP_QUEUEID is the flag which means the driver has the ability to set the
> queue id on an mbuf.
> 
> 
> 
> 2013/7/3 Luigi Rizzo 
> 
>> 
>> 
>> 
>> On Tue, Jul 2, 2013 at 5:56 PM, Takuya ASADA  wrote:
>> 
>>> Hi,
>>> 
>>>> Do you have an updated URL for the diffs ? The link below from your
>>>> original message
>>>> seems not working now (NXDOMAIN)
>>>>
>>>> http://www.dokukino.com/mq_bpf_20110813.diff
>>>>
>>> 
>>> Changes against recent head are in my repository:
>>> http://svnweb.freebsd.org/base/user/syuu/mq_bpf/
>>> And I attached a diff file to this mail.
>>> 
>>> 
>> thanks for the diffs (the URL to the repo is useful too,
>> but a URL to generate diffs is more convenient for reviewing changes).
>> 
>> I believe it still needs a bit of work before being merged.
>> 
>> My comments (in order of the patch):
>> 
>> === ifnet.9 (and related code in if.c, sockio.h) ===
>> - if_get_rxqueue_len()/if_get_txqueue_len() is not a good name,
>>  as to me at least it suggests that it returns the size of the
>>  individual queue, rather than the number of queues.
>> 
>> - cpu affinity (in userspace) is a bitmask, whereas in the BSD kernel
>>  we almost never use the term "affinity", and favour "cpuid" or "oncpu"
>>  (i.e. an individual CPU id).
>>  I think you should either rename if_get_txqueue_affinity(), or make
>>  the return type a cpuset (which seems more sensible as the return
>>  value is passed to userspace)
>> 
>> === bpf.4 (and related code) ===
>> 
>> - the ioctl() to attach/detach/report queues attached to a specific
>>  bpf descriptor talk about "mask bit" but neither the internal nor
>>  the external implementation deal with bits.
>>  I'd rather document those ioctl as "attaching queue to file descriptor".
>> 
>> - the BPF ioctl names are generally inconsistent (using either S or SET
>>  and G or GET for the setter and getter methods).
>>  But you should pick one of the patterns and stick with it,
>>  not introduce a third variant (GT/ST).
>>  Given we are in 2013 we might go for the long form GET and SET
>>  so i suggest the following (spaces for clarity)
>> 
>> +#define BIOC ENA QMASK _IO('B', 133)
>> +#define BIOC DIS QMASK _IO('B', 134)
>> +#define BIOC SET RXQMASK _IOWR('B', 135, uint32_t)
>> +#define BIOC CLR RXQMASK _IOWR('B', 136, uint32_t)
>> +#define BIOC GET RXQMASK _IOR('B', 137, uint32_t)
>> +#define BIOC SET TXQMASK _IOWR('B', 138, uint32_t)
>> +#define BIOC CLR TXQMASK _IOWR('B', 139, uint32_t)
>> +#define BIOC GET TXQMASK _IOR('B', 140, uint32_t)
>> +#define BIOC SET OTHERMASK _IO('B', 141)
>> +#define BIOC CLR OTHERMASK _IO('B', 142)
>> +#define BIOC GET OTHERMASK _IOR('B', 143, uint32_t)
>> 
>>  Also related: the existing ioctls() use u_int as arguments, rather
>>  than uint32_t. I personally prefer the uint32_t form, but you
>>  should at least add a comment to indicate that the choice is
>>  deliberate.
>> 
>> === if.c ===
>> 
>> 
>> - you have a KASSERT to report if ifp->if_get_*xqueue_affinity() is not
>>  set, but i'd rather run the function only if it is set, so you can
>>  have a multiqueue interface which does not bind queues to specific cores
>>  (which i am not sure is always a great idea; too many processes
>>  statically bound to the same queue mean you lose opportunity to
>>  parallelize work.)
>> 
>> === mbuf.h ===
>> 
>> as mentioned earlier, the modification to struct mbuf should
>> be avoided if possible at all. It seems that you need just one
>> direction bit (which maybe is available already from the context)
>> and one queue identifier, which in the rx path, at least in your
>> implementation is always a replica of the 'flowid' field.
>> Can you see if perhaps the flowid field can be (ab)used on the
>> tx path as well ?
>> 
>> 
>> === if.h ===
>> 
>> - in if.h, can you use individual variables instead of arrays
>>  for  ifr_queue_affinity_index and friends ?
>>  The macros to map the fields of ifr_ifru one
>>  level up are a necessary evil,
>>  but there is no point in using the arrays.
>> 
>>  - SIOCGIFTXQAFFINITY seems to use the receive function (copy&paste typo)
>>   talks about
>> 

Re: Network stack changes

2013-09-20 Thread George Neville-Neil

On Sep 19, 2013, at 16:08 , Luigi Rizzo  wrote:

> On Thu, Sep 19, 2013 at 03:54:34PM -0400, George Neville-Neil wrote:
>> 
>> On Sep 14, 2013, at 15:24 , Luigi Rizzo  wrote:
>> 
>>> 
>>> 
>>> On Saturday, September 14, 2013, Olivier Cochard-Labbé
>>> wrote:
>>>> On Sat, Sep 14, 2013 at 4:28 PM, Luigi Rizzo  wrote:
>>>>> 
>>>>> IXIA ? For the timescales we need to address we don't need an IXIA,
>>>>> a netmap sender is more than enough
>>>>> 
>>>> 
>>>> The great netmap generates only one IP flow (same src/dst IP and same
>>>> src/dst port).
>>> 
>>> True the sample app generates only one flow but it is trivial to modify it 
>>> to generate multiple flows. My point was, we have the ability to generate 
>>> high rate traffic, as long as we do tolerate a .1-1us jitter. Beyond that, 
>>> you do need some ixia-like solution.
>>> 
>> 
>> On the bandwidth side, can a modern sender with netmap really do a full 10G? 
>>  I hate the cost of an
>> IXIA but I have not been able to destroy our stack as effectively with 
>> anything else.
> 
> yes george, you can download the picobsd image
> 
> http://info.iet.unipi.it/~luigi/netmap/20120618-netmap-picobsd-head-amd64.bin
> 
> and try for yourself.
> 
> Granted this does not have all the knobs of an ixia but it can
> surely blast the full 14.88 Mpps to the link, and it only takes a
> bit of userspace programming to generate reasonably arbitrary streams
> of packets. A netmap sender/receiver is not CPU bound even with 1 core.
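
The 14.88 Mpps figure is just the 10G line rate for minimum-size Ethernet
frames. A small sketch of the arithmetic, with a made-up helper name
(eth_max_pps) and the usual assumption of 20 bytes of per-frame wire overhead
(preamble, start-of-frame delimiter and inter-frame gap):

```c
#include <stdint.h>

/*
 * Bytes on the wire per frame that are not part of the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
 */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/*
 * Maximum frames per second on a link of link_bps bits per second, for
 * frames of frame_bytes bytes (frame size includes the 4-byte FCS).
 */
static uint64_t
eth_max_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	uint64_t bits_per_frame;

	bits_per_frame = ((uint64_t)frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;
	return (link_bps / bits_per_frame);
}
```

For a 10G link and 64-byte frames this gives 10^10 / 672, a little over 14.88
million packets per second; with 1518-byte frames the same link carries only
about 0.81 Mpps, which is why small-packet tests are the hard case.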
> 

Interesting.  It's on my todo.

Best,
George




signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Network stack changes

2013-09-19 Thread George Neville-Neil

On Sep 14, 2013, at 15:24 , Luigi Rizzo  wrote:

> On Saturday, September 14, 2013, Olivier Cochard-Labbé 
> wrote:
>> On Sat, Sep 14, 2013 at 4:28 PM, Luigi Rizzo  wrote:
>>> 
>>> IXIA ? For the timescales we need to address we don't need an IXIA,
>>> a netmap sender is more than enough
>>> 
>> 
>> The great netmap generates only one IP flow (same src/dst IP and same
>> src/dst port).
> 
> True the sample app generates only one flow but it is trivial to modify it
> to generate multiple flows. My point was, we have the ability to generate
> high rate traffic, as long as we do tolerate a .1-1us jitter. Beyond that,
> you do need some ixia-like solution.
> 
On the bandwidth side, can a modern sender with netmap really do a full 10G?
I hate the cost of an IXIA but I have not been able to destroy our stack as
effectively with anything else.

Best,
George





Re: Network stack changes

2013-09-13 Thread George Neville-Neil

On Aug 29, 2013, at 7:49 , Adrian Chadd  wrote:

> Hi,
> 
> There's a lot of good stuff to review here, thanks!
> 
> Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to keep
> locking things like that on a per-packet basis. We should be able to do
> this in a cleaner way - we can defer RX into a CPU pinned taskqueue and
> convert the interrupt handler to a fast handler that just schedules that
> taskqueue. We can ignore the ithread entirely here.
> 
> What do you think?
> 
> Totally pie in the sky handwaving at this point:
> 
> * create an array of mbuf pointers for completed mbufs;
> * populate the mbuf array;
> * pass the array up to ether_demux().
> 
> For vlan handling, it may end up populating its own list of mbufs to push
> up to ether_demux(). So maybe we should extend the API to have a bitmap of
> packets to actually handle from the array, so we can pass up a larger array
> of mbufs, note which ones are for the destination and then the upcall can
> mark which frames its consumed.
> 
> I specifically wonder how much work/benefit we may see by doing:
> 
> * batching packets into lists so various steps can batch process things
> rather than run to completion;
> * batching the processing of a list of frames under a single lock instance
> - eg, if the forwarding code could do the forwarding lookup for 'n' packets
> under a single lock, then pass that list of frames up to inet_pfil_hook()
> to do the work under one lock, etc, etc.
> 
> Here, the processing would look less like "grab lock and process to
> completion" and more like "mark and sweep" - ie, we have a list of frames
> that we mark as needing processing and mark as having been processed at
> each layer, so we know where to next dispatch them.
> 

One quick note here.  Every time you increase batching you may increase
bandwidth but you will also increase per packet latency for the last packet
in a batch.  That is fine so long as we remember that and that this is a
tuning knob to balance the two.
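
A toy model of that trade-off, under loudly stated assumptions: each batch
pays one fixed cost (a lock acquisition, say) plus a constant amount of work
per packet, and packets in a batch are handled one after another. The names
and the numbers in the note below are made up for illustration and are not
measurements.

```c
#include <assert.h>
#include <stdint.h>

struct batch_cost {
	uint64_t per_packet_ns;		/* amortized cost per packet */
	uint64_t last_packet_ns;	/* time until the last packet is done */
};

/*
 * batch:    packets handled per batch (must be > 0)
 * fixed_ns: cost paid once per batch, e.g. taking a lock
 * work_ns:  cost paid for every packet
 */
static struct batch_cost
batch_model(uint32_t batch, uint64_t fixed_ns, uint64_t work_ns)
{
	struct batch_cost c;
	uint64_t total;

	assert(batch > 0);
	total = fixed_ns + (uint64_t)batch * work_ns;
	c.per_packet_ns = total / batch;	/* falls as batch grows */
	c.last_packet_ns = total;		/* rises as batch grows */
	return (c);
}
```

With a fixed cost of 200 ns and 100 ns of work per packet, going from a batch
of 1 to a batch of 32 drops the amortized cost from 300 ns to about 106 ns
per packet, but the last packet in the batch now completes after 3400 ns
instead of 300 ns.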

> I still have some tool coding to do with PMC before I even think about
> tinkering with this as I'd like to measure stuff like per-packet latency as
> well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P /
> lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.)
> 

This would be very useful in identifying the actual hot spots, and would be
helpful to anyone who can generate a decent stream of packets with, say, an
IXIA.

Best,
George






Re: DTrace network providers

2013-08-22 Thread George Neville-Neil

On Aug 21, 2013, at 1:00 , Mark Johnston  wrote:

> Hello!
> 
> I've ported the ip, tcp and udp DTrace providers to FreeBSD, following
> the Solaris documentation here:
> 
> https://wikis.oracle.com/display/DTrace/ip+Provider
> https://wikis.oracle.com/display/DTrace/tcp+Provider
> https://wikis.oracle.com/display/DTrace/udp+Provider
> 
> My implementation of these providers makes use of dynamic translators,
> for which FreeBSD support was added in r254468; this patch won't compile
> with earlier revisions. The use of dynamic translators means that
> existing DTrace scripts which use these providers will just work when run
> on FreeBSD - no modifications needed. In particular, all of the examples
> in the links above will work properly on FreeBSD with my diff.
> 
> I've collected a bunch of example scripts for these providers and placed
> them here:
> 
> http://people.freebsd.org/~markj/dtrace/network-providers/
> 
> To run one you just need to execute "dtrace -s 

Re: how to get mac address info in kernel code?

2013-03-05 Thread George Neville-Neil

On Mar 5, 2013, at 08:54 , h bagade  wrote:

> Hi all,
> 
> I need to get interface MAC address within the kernel code and I couldn't
> use "getifaddrs" because it's user-mode. How can I have the MAC address
> information within kernel code?
> 
> Any hints or comments are really appreciated.

If you have access to the struct ifnet you can look at the if_addr member,
which points to a struct ifaddr, defined in if_var.h.
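
As a sketch of what sits behind that pointer: the ifaddr's ifa_addr is a
struct sockaddr_dl, and the link-level address is stored in sdl_data right
after the interface name. In kernels of this vintage the IF_LLADDR(ifp) macro
wraps that lookup. The code below is a userland stand-in so it can be compiled
anywhere; everything prefixed mock_ or MOCK_ is hypothetical and only mimics
the layout idea, not the real kernel structures.

```c
#include <stddef.h>
#include <string.h>

#define MOCK_ETHER_ADDR_LEN 6

/*
 * Simplified stand-in for the kernel's struct sockaddr_dl (net/if_dl.h);
 * only the fields used here are kept.
 */
struct mock_sockaddr_dl {
	unsigned char	sdl_nlen;	/* interface name length */
	unsigned char	sdl_alen;	/* link-level address length */
	char		sdl_data[46];	/* name bytes, then address bytes */
};

/* Same idea as the kernel's LLADDR(): the address follows the name. */
#define MOCK_LLADDR(s) \
	((const unsigned char *)((s)->sdl_data + (s)->sdl_nlen))

/*
 * Copy a 6-byte MAC address into out.  Returns 0 on success, or -1 if the
 * arguments are NULL or the entry does not hold a 6-byte address.
 */
static int
mock_get_mac(const struct mock_sockaddr_dl *sdl, unsigned char *out)
{
	if (sdl == NULL || out == NULL)
		return (-1);
	if (sdl->sdl_alen != MOCK_ETHER_ADDR_LEN)
		return (-1);
	if ((size_t)sdl->sdl_nlen + sdl->sdl_alen > sizeof(sdl->sdl_data))
		return (-1);
	memcpy(out, MOCK_LLADDR(sdl), MOCK_ETHER_ADDR_LEN);
	return (0);
}
```

In real kernel code the ifnet and its address list also need the appropriate
locking or reference before the pointer is followed; that is left out here.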

Best,
George






Re: A question about SYN cookies...

2013-02-07 Thread George Neville-Neil

On Feb 4, 2013, at 04:09 , Andre Oppermann  wrote:

> On 04.02.2013 01:09, George Neville-Neil wrote:
>> Howdy,
>> 
>> I've been reviewing the SYN cache and SYN cookie code and I'm wondering why 
>> we do all the work
>> of generating a SYN cache entry before sending a SYN cookie.  If the point 
>> of SYN cookies is to
>> defend against a SYN flood then, to my mind, the SYN/ACK for the cookie case 
>> should be sent off before
>> doing all the work to try to create and insert a cache entry.  Has anyone, 
>> as yet, looked at a way
>> to move the sending code earlier into syncache_add() and checked to see if 
>> there is a performance
>> improvement when a system is flooded with SYN packets?
> 
> So far all syncookie implementations have an information loss because
> they can't store all state in the cookie unless timestamps are enabled.
> Apparently Windows 8 still doesn't enable timestamps but does quite a
> bit of window scaling leading to problems.  See recent bug report here
> on net@.
> 

Yes, I heard about that off list and then got time to review the mailbox.

> For generating syncookies we have three possible strategies:
> 
> 1/ Use syncache and cookies in parallel and bump the oldest syncache
>    entry replacing it with the new SYN attempt.  Syncookies are done
>    on all SYN-ACK's going out.
> 
> 2/ Fill the syncache but do not bump the oldest entry, other than normal
>    expiry.  All further SYN-ACK's are syncookies-only (w/o window scaling
>    etc).  Those in the syncache do not need to carry syncookies and are
>    real full SYN-ACK's.
> 
> 3/ Only send syncookies and do not cache anything.  No window scaling
>    and SACK-PERM can be carried though.
> 
> So far we've been doing option 1.  We can switch to option 2 which, depending
> on the situation, may be better or worse. Option 3 isn't viable currently
> due to loss of window scaling and SACK.
> 
> Based on the recent Windows 8 issue I've devised a different HMAC based
> syncookie scheme where all necessary information can be stored in the ISS
> forgoing the need for the timestamp bits.  I have sent a description of
> the scheme to Colin and Nate to have it reviewed.  It must be 
> cryptographically
> strong enough to withstand cracking attempts for about 30 seconds.  Forward
> security isn't necessary as the syncookie secrets are completely random and
> renewed every 30 seconds.

I'll wait for Colin's and Nate's evaluation of your scheme to weigh in, though
given the limited key space already in place I do wonder how you got that much
information into a 32 bit int.

Thanks,
George



Re: [PATCH] Add a new TCP_IGNOREIDLE socket option

2013-02-07 Thread George Neville-Neil

On Feb 6, 2013, at 12:28 , Alfred Perlstein  wrote:

> On 2/6/13 4:46 AM, John Baldwin wrote:
>> On Wednesday, February 06, 2013 6:27:04 am Randall Stewart wrote:
>>> John:
>>> 
>>> A burst at line rate will *often* cause drops. This is because
>>> router queues are at a finite size. Also such a burst (especially
>>> on a long delay bandwidth network) cause your RTT to increase even
>>> if there is no drop which is going to hurt you as well.
>>> 
>>> A SHOULD in an RFC says you really really really really need to do it
>>> unless there is some thing that makes you willing to override it. It is
>>> slight wiggle room.
>>> 
>>> In this I agree with Andre, we should not be *not* doing it. Otherwise
>>> folks will be turning this on and it is plain wrong. It may be fine
>>> for your network but I would not want to see it in FreeBSD.
>>> 
>>> In my testing here at home I have put back into our stack max-burst. This
>>> uses Mark Allman's version (not Kacheong Poon's) where you clamp the cwnd at
>>> no more than 4 packets larger than your flight. All of my testing
>>> high-bw-delay or lan has shown this to improve TCP performance. This
>>> is because it helps you avoid bursting out so many packets that you overflow
>>> a queue.
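
Randall does not show his code, so the following is only one reading of
"clamp the cwnd at no more than 4 packets larger than your flight". The
function name, the byte-based units and the use of the MSS as the packet size
are assumptions made for the sketch.

```c
#include <stdint.h>

/* Allowed burst beyond what is already in flight, in segments. */
#define MAXBURST_SEGS 4u

/*
 * Return the congestion window to use for sending, in bytes: the current
 * cwnd, but never more than the bytes in flight plus MAXBURST_SEGS segments.
 */
static uint32_t
maxburst_clamp(uint32_t cwnd, uint32_t flight, uint32_t mss)
{
	uint64_t limit;

	/* 64-bit arithmetic so a large flight or mss cannot wrap. */
	limit = (uint64_t)flight + (uint64_t)MAXBURST_SEGS * mss;
	return (cwnd < limit ? cwnd : (uint32_t)limit);
}
```

For example, with one 1460-byte segment in flight and a 65535-byte cwnd
(as after an idle period), the sender would be held to 7300 bytes, i.e. at
most four new segments, instead of bursting the whole window at line rate.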
>>> 
>>> In your long-delay bw link if you do burst out too many (and you never
>>> know how many that is since you can not predict how full all those
>>> MPLS queues are or how big they are) you will really hurt yourself even 
>>> worse.
>>> Note that generally in Cisco routers the default queue size is somewhere 
>>> between
>>> 100-300 packets depending on the router.
>> Due to the way our application works this never happens, but I am fine with
>> just keeping this patch private.  If there are other shops that need this 
>> they
>> can always dig the patch up from the archives.
>> 
> This is yet another time when I'm sad about how things happen in FreeBSD.
> 
> A developer comes forward with a non-default option that's very useful for
> some specific workloads, specifically one that contributes much time and $$$ 
> to the project and the community rejects the patches even though it's been 
> successful in other OSes.
> 
> It makes zero sense.
> 
> John, can you repost the patch?  Maybe there is a way to refactor this 
> somehow so it's like accept filters where we can plug in a hook for TCP?
> 
> I am very disappointed, but not surprised.
> 

I take away the complete opposite feeling.  This is how we work through these
issues.  It's clear from the discussion that this need not be a default in the
system, and is a special case.  We had a reasoned discussion of what would be
best to do and at least two experts in TCP weighed in on the effect this change
might have.

Not everything proposed by a developer need go into the tree, in particular
since these discussions are archived we can always revisit this later.

This is exactly how collaborative development should look, whether or not the
patch is integrated now, next week, next year, or ever.

Best,
George




Re: Make kernel aware of NIC queues

2013-02-06 Thread George Neville-Neil

On Feb 6, 2013, at 09:37 , Luigi Rizzo  wrote:

> On Wed, Feb 06, 2013 at 06:19:27PM +0400, Alexander V. Chernikov wrote:
>> Hello list!
>> 
>> Today more and more NICs are capable of splitting traffic to different 
>> Rx/TX rings permitting OS to dispatch this traffic on different CPU 
>> cores. However, there are some problems that arise from using multi-nic
>> (or even single multi-port NIC) configurations:
> ...
>> I propose implementing common API to permit drivers:
>> * read user-supplied number of queues/other queue options (e.g:
>> * notify kernel of each RX/TX queue being created/destroyed
>> * make binding queues to cores via given API
>> * Export data to userland (for example, via sysctl) to permit users:
>> a) quickly see current configuration
>> b) change CPU binding on-fly
>> c) change flowid numbers on-fly (with the possibility to set 1) 
>> NIC-supplied hash 2) manually supplied value 3) disable setting M_FLOWID)
>> 
>> Having common interface will help users to make network stack tuning 
>> easier and puts us one step further to make (probably userland) AI which 
>> can auto-tune system according to template ("router", "webserver") and 
>> rc.conf configuration (lagg presence, etc..)
>> 
>> 
>> What do you guys think?
> 
> this is certainly a good idea and a welcome one.
> 
> Linux has tried to come up with a common framework to implement
> this kind of controls using "ethtool", and we should probably
> have a look at their approach and reuse it (or at least the good ideas)
> to avoid reinventing the same thing.
> 
And, though Luigi didn't say it, I will, this should integrate with netmap.

Best,
George




A question about SYN cookies...

2013-02-03 Thread George Neville-Neil
Howdy,

I've been reviewing the SYN cache and SYN cookie code and I'm wondering why we
do all the work of generating a SYN cache entry before sending a SYN cookie.
If the point of SYN cookies is to defend against a SYN flood then, to my mind,
the SYN/ACK for the cookie case should be sent off before doing all the work to
try to create and insert a cache entry.  Has anyone, as yet, looked at a way to
move the sending code earlier into syncache_add() and checked to see if there
is a performance improvement when a system is flooded with SYN packets?

Best,
George





Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type

2013-01-21 Thread George Neville-Neil
The following reply was made to PR kern/172113; it has been noted by GNATS.

From: George Neville-Neil 
To: John Baldwin 
Cc: bug-follo...@freebsd.org,
 egrosb...@rdtc.ru,
 j...@freebsd.org
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in 
igb(4): m_getjcl: invalid cluster type
Date: Mon, 21 Jan 2013 14:25:00 -0500

 On Jan 19, 2013, at 23:26 , John Baldwin  wrote:
 
 > I was able to finally reproduce this panic today.  It seems to require
 > a server configured for PXE but that receives no DHCP reply (and
 > possibly with the requisite SuperMicro X8 board).  I was able to
 > prevent the panic with a subset of the referenced patch by only adding
 > the 'if_drv_flags & IFF_DRV_RUNNING' check to the start of
 > igb_msix_que().  The rest of the patch was unnecessary.  I also added
 > some debugging to print out the ICR, EICR, IMS, and EIMS registers in
 > this case.  It does look like the hardware is sending an interrupt that
 > is not enabled in the interrupt mask (specifically LSC).  In fact, the
 > 82576 datasheet specifically mentions masking LSC until initialization
 > is complete to avoid spurious interrupts during boot and AFAICT igb(4)
 > does this since e1000_reset_hw() clears the interrupt mask via writes
 > to IMC and doesn't re-enable interrupts until igb_init_locked() is
 > invoked via 'ifconfig up'.  Here is my debug output:
 >
 > SMP: AP CPU #6 Launched!
 > SMP: AP CPU #4 Launched!
 > stray irq0
 > igb0: interrupt on que 0: icr 0x104 eicr 0
 > ims 0 eims 0x8000
 >
 > Hmmm.   Nothing clears EIMS.  After some more debugging, I determined
 > that e1000_reset_hw() always turns this bit in EIMS on, even if it is
 > off before e1000_reset_hw() is called(!).  I added explicit calls to
 > igb_disable_intr() to clear EIMS after each call to e1000_reset_hw().
 > This removes the 'stray irq0', but I still get a spurious interrupt
 > during boot (albeit with eims 0).  I can use the IFF_DRV_RUNNING hack
 > for now, but I think the real fix is something else.
 >
 
 I think Jack will have to chime in on this one.  Do you think it's all
 SM X8 boards or just the one we happen to have?  I wonder if Jack or
 Jeffrey (the testing guy he works with) have access to the right board.
 
 Best,
 George
 
 


Re: Dropping TCP options from retransmitted SYNs considered harmful

2012-10-15 Thread George Neville-Neil

On Oct 15, 2012, at 09:08 , John Baldwin  wrote:

> On Friday, October 12, 2012 12:13:11 pm John Baldwin wrote:
>> Back in 2001 FreeBSD added a hack to strip TCP options from retransmitted 
>> SYNs 
>> starting with the 3rd SYN in this block in tcp_timer.c:
>> 
>>  /*
>>   * Disable rfc1323 if we haven't got any response to
>>   * our third SYN to work-around some broken terminal servers
>>   * (most of which have hopefully been retired) that have bad VJ
>>   * header compression code which trashes TCP segments containing
>>   * unknown-to-them TCP options.
>>   */
>>  if ((tp->t_state == TCPS_SYN_SENT) && (tp->t_rxtshift == 3))
>>  tp->t_flags &= ~(TF_REQ_SCALE|TF_REQ_TSTMP);
>> 
>> There is even a PR for the original bug report: kern/1689
>> 
>> However, there is an unintended consequence of this change that can be 
>> disastrous.  Specifically, suppose you have a FreeBSD client connecting to a 
>> server, and that the SYNs are arriving at the server successfully, but the 
>> first few return SYN/ACKs are dropped.  Eventually a SYN/ACK makes it 
>> through 
>> and the connection is established.
>> 
>> The server (based on the first SYN it saw) believes it has negotiated window 
>> scaling with the client.  The client, however, has broken what it promised 
>> in 
>> that first SYN and believes it is not using any window scaling at all.  This 
>> causes two forms of breakage:
>> 
>> 1) When the server advertises a scaled window (e.g. '8' for a 64k window
>>    scaled at 13), the client thinks it is an unscaled window ('8') and
>>    sends data to the server very slowly.
>> 
>> 2) When the client advertises an unscaled window (e.g. '65535' for a 64k
>>    window), the server thinks it has a huge window (65535 << 13 == 511MB)
>>    to send into.
>> 
>> I'm not sure that 2) is a problem per se, but I have definitely seen 
>> instances 
>> of 1) (and examined the 'struct tcpcb' in kgdb on both the server and client 
>> end of the connections to verify they disagreed on the scaling).
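
The arithmetic behind the two cases is small enough to sketch. The helper
name below is made up for illustration; the only real rule it encodes is that
the 16-bit window field is shifted left by the negotiated window scale, whose
shift count the window scale option limits to 14.

```c
#include <assert.h>
#include <stdint.h>

/* Largest shift count allowed by the TCP window scale option. */
#define WSCALE_MAX_SHIFT 14

/*
 * Window in bytes as seen by an endpoint that believes the peer's
 * advertised 16-bit window must be scaled by 'shift' (0 = no scaling).
 */
static uint32_t
tcp_effective_window(uint16_t advertised, uint8_t shift)
{
	assert(shift <= WSCALE_MAX_SHIFT);
	return ((uint32_t)advertised << shift);
}
```

Case 1: the server sends '8' meaning 8 << 13 = 65536 bytes, but a client that
has dropped scaling reads it as 8 bytes. Case 2: the client sends '65535'
meaning 65535 bytes, but the server reads it as 65535 << 13 = 536862720
bytes, roughly 511MB.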
>> 
>> The original motivation of this change is to work around broken terminal 
>> servers that were old when this change was added in 2001.  Over 10 years 
>> later 
>> I think we should at least have an option to turn this work-around off, and 
>> possibly disable it by default.
>> 
>> Thoughts?
> 
> How about this:
> 
> Index: tcp_timer.c
> ===
> --- tcp_timer.c   (revision 241579)
> +++ tcp_timer.c   (working copy)
> @@ -118,6 +118,11 @@ SYSCTL_INT(_net_inet_tcp, OID_AUTO, keepcnt, CTLFL
>   /* max idle probes */
> int   tcp_maxpersistidle;
> 
> +static int   tcp_rexmit_drop_options = 0;
> +SYSCTL_INT(_net_inet_tcp, OID_AUTO, rexmit_drop_options, CTLFLAG_RW,
> +&tcp_rexmit_drop_options, 0,
> +"Drop TCP options from 3rd and later retransmitted SYN");
> +
> static int   per_cpu_timers = 0;
> SYSCTL_INT(_net_inet_tcp, OID_AUTO, per_cpu_timers, CTLFLAG_RW,
> &per_cpu_timers , 0, "run tcp timers on all cpus");
> @@ -578,7 +583,8 @@ tcp_timer_rexmt(void * xtp)
>* header compression code which trashes TCP segments containing
>* unknown-to-them TCP options.
>*/
> - if ((tp->t_state == TCPS_SYN_SENT) && (tp->t_rxtshift == 3))
> + if (tcp_rexmit_drop_options && (tp->t_state == TCPS_SYN_SENT) &&
> + (tp->t_rxtshift == 3))
>   tp->t_flags &= ~(TF_REQ_SCALE|TF_REQ_TSTMP);
>   /*
>* If we backed off this far, our srtt estimate is probably bogus.
> 
> Any other suggestions on the sysctl name?

The name's fine.  Commit that sucker and turn it off.

Best,
George



Re: Dropping TCP options from retransmitted SYNs considered harmful

2012-10-12 Thread George Neville-Neil

On Oct 12, 2012, at 12:13 , John Baldwin  wrote:

> Back in 2001 FreeBSD added a hack to strip TCP options from retransmitted 
> SYNs 
> starting with the 3rd SYN in this block in tcp_timer.c:
> 
>   /*
>* Disable rfc1323 if we haven't got any response to
>* our third SYN to work-around some broken terminal servers
>* (most of which have hopefully been retired) that have bad VJ
>* header compression code which trashes TCP segments containing
>* unknown-to-them TCP options.
>*/
>   if ((tp->t_state == TCPS_SYN_SENT) && (tp->t_rxtshift == 3))
>   tp->t_flags &= ~(TF_REQ_SCALE|TF_REQ_TSTMP);
> 
> There is even a PR for the original bug report: kern/1689
> 
> However, there is an unintended consequence of this change that can be 
> disastrous.  Specifically, suppose you have a FreeBSD client connecting to a 
> server, and that the SYNs are arriving at the server successfully, but the 
> first few return SYN/ACKs are dropped.  Eventually a SYN/ACK makes it through 
> and the connection is established.
> 
> The server (based on the first SYN it saw) believes it has negotiated window 
> scaling with the client.  The client, however, has broken what it promised in 
> that first SYN and believes it is not using any window scaling at all.  This 
> causes two forms of breakage:
> 
> 1) When the server advertises a scaled window (e.g. '8' for a 64k window
>scaled at 13), the client thinks it is an unscaled window ('8') and
>sends data to the server very slowly.
> 
> 2) When the client advertises an unscaled window (e.g. '65535' for a 64k
>window), the server thinks it has a huge window (65535 << 13 == 511MB)
>to send into.
> 
> I'm not sure that 2) is a problem per se, but I have definitely seen
> instances of 1) (and examined the 'struct tcpcb' in kgdb on both the server
> and client end of the connections to verify they disagreed on the scaling).
> 
> The original motivation of this change is to work around broken terminal
> servers that were old when this change was added in 2001.  Over 10 years
> later I think we should at least have an option to turn this work-around
> off, and possibly disable it by default.
> 
> Thoughts?
> 

I'm all for taking that code out.
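
The arithmetic behind the two failure modes quoted above can be checked with a
small sketch.  The function name effective_window() is hypothetical and is not
taken from the FreeBSD sources; it only shows how a 16-bit window field is
interpreted for a given window-scale shift.

```c
#include <stdint.h>

/*
 * Illustrative only: the number of bytes a peer believes it may send,
 * given the 16-bit window field from the TCP header and the shift count
 * that peer thinks was negotiated (0 means "no window scaling").
 */
uint64_t
effective_window(uint16_t advertised, unsigned int shift)
{
	return ((uint64_t)advertised << shift);
}
```

With the numbers from the report, effective_window(8, 13) is 65536 bytes (what
the server means), while effective_window(8, 0) is 8 bytes (what the client
believes).  In the other direction, effective_window(65535, 13) is 536862720
bytes, which is the roughly 511MB window the server thinks it has.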

Best,
George




SFF 8472, aka what optic is plugged into my network card...

2012-10-10 Thread George Neville-Neil
Howdy,

Modern 10G NICs have the ability to have different types of cables plugged
into them.  There is a standard for the values contained in the cables which
can be read here:

http://www.10gtek.com/templates/wzten/pdf/SFF-8472-(Diagnostic%20Monitoring%20Interface).pdf

It's probably the case that several NICs support changeable optics on FreeBSD
at this point.  I know that Chelsio T4 cards do.  I would like to put proper
definitions from this document into our source code.  Does anyone have a
preference for where and how this is done?

I figure something like net/sff8472.h would be the way to go.
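
As a rough sketch of what such a header might contain, assuming an internally
calibrated module: the two-wire addresses and byte offsets below follow
SFF-8472, but every identifier is made up for illustration and is not a
proposal for the final names.

```c
#include <stdint.h>

/* Hypothetical names; the values follow the SFF-8472 memory map. */
#define SFF8472_ADDR_BASE	0xA0	/* serial ID page */
#define SFF8472_ADDR_DIAG	0xA2	/* diagnostic monitoring page */

#define SFF8472_DIAG_TEMP	96	/* temperature, 2 bytes, MSB first */
#define SFF8472_DIAG_VCC	98	/* supply voltage, 2 bytes, MSB first */

/* Temperature is a signed 16-bit value in units of 1/256 degree Celsius. */
double
sff8472_temp_celsius(uint8_t msb, uint8_t lsb)
{
	int16_t raw = (int16_t)(((uint16_t)msb << 8) | lsb);

	return (raw / 256.0);
}

/* Supply voltage is an unsigned 16-bit value in units of 100 microvolts. */
double
sff8472_vcc_volts(uint8_t msb, uint8_t lsb)
{
	uint16_t raw = (uint16_t)(((uint16_t)msb << 8) | lsb);

	return (raw / 10000.0);
}
```

A driver would read the two bytes at SFF8472_DIAG_TEMP from the diagnostic
page and pass them in; externally calibrated modules need the calibration
constants applied as well, which this sketch leaves out.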

Comments?

Best,
George




Re: Proposal for changes to network device drivers and network stack (RFC)

2012-09-05 Thread George Neville-Neil
One more note.  Can you break the patches down into more bite-sized pieces?
They're hard to review as is.

Best,
George



Re: Proposal for changes to network device drivers and network stack (RFC)

2012-09-05 Thread George Neville-Neil

On Aug 25, 2012, at 00:11 , Anuranjan Shukla  wrote:

> At Juniper Networks, we've been using FreeBSD to build JUNOS (Juniper's
> network operating system). So far the additions and changes to the
> functionality were made inline, making the task of upgrading to new
> versions of FreeBSD progressively difficult. We've been looking at JUNOS
> to see if we can build it off of a clean FreeBSD base rather than making
> changes to the OS inline. As part of that work, we've come up with a few
> expansive change proposals to FreeBSD kernel that will make this task
> possible for us, and hopefully also contribute something of interest to
> the community. If the community is in agreement with these, we'd like to
> contribute/commit them to FreeBSD.
> 
> This is a proposal and an RFC. The actual nomenclature is open to ideas
> (naming etc). From Juniper, Marcel (mar...@freebsd.org) will be attending
> the upcoming DevSummit at Cambridge. He's indicated that interested folks
> are welcome to chat with him about this stuff during the summit.
> 

Hello Anu,

Yes, a bunch of this was discussed at the DevSummit, and I think there is
already some agreement about these proposals, at least there was
in the room, now to get some on net@.

> The changes we propose are (the code/diffs etc are indicated
> at the end of this email):
> 
> - Network Device Drivers
> - Building FreeBSD kernel without network stack, network stack as a module
> - Changes to mbuf and socket structures (minor member additions)
> 
> Network Device Drivers:
> ---
> As we indicated during DevSummit 2012, JUNOS extended the interface
> functionality in a big way to support logical interfaces, interface
> hierarchies and scaling in general. Not surprisingly this resulted in
> changing the drivers to use our custom interface structure(s). A simple
> way to resolve this without impacting the rest of the large codebase is to
> avoid directly accessing (get/set) the ifnet structure from the drivers.
> Using get/set functions to update the functionality would make the driver
> more 'flexible' for the network stack to work with in situations where the
> stack wants to extend the interface functionality.
> 
> For example,
> 
> em_start_locked(struct ifnet *ifp, struct tx_ring *txr)
> {
> -   struct adapter  *adapter = ifp->if_softc;
> +   struct adapter  *adapter = if_getsoftc(ifp);
>     struct mbuf *m_head;
> 
>     EM_TX_LOCK_ASSERT(txr);
> 
> -   if ((ifp->if_drv_flags & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
> +   if ((if_getdrvflags(ifp) & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
>         IFF_DRV_RUNNING)
>         return;
> 
>     if (!adapter->link_active)
>         return;
> 
> -   while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
> +   while (!if_sendq_empty(ifp)) {
>         /* Call cleanup if number of TX descriptors low */
>         if (txr->tx_avail <= EM_TX_CLEANUP_THRESHOLD)
>             em_txeof(txr);
>         if (txr->tx_avail < EM_MAX_SCATTER) {
> -           ifp->if_drv_flags |= IFF_DRV_OACTIVE;
> +           if_setdrvflagbits(ifp,IFF_DRV_OACTIVE, 0);
>             break;
>         }
> -       IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
> +       m_head = if_dequeue(ifp);
>         if (m_head == NULL)
>             break;
>         /*
> @@ -1010,7 +1009,7 @@ em_start_locked(struct ifnet *ifp, struct tx_ring
>         if (em_xmit(txr, &m_head)) {
>             if (m_head == NULL)
>                 break;
> -           IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
> +           if_sendq_prepend(ifp, m_head);
>             break;
> 
> This allows Juniper to have its own interface structure(s) instead of
> ifnet, and still be able to use the driver without modification. Since the
> notion of ifnet is abstracted away, other users can also find this useful
> in plugging in functionality without having to muck around in the driver code.
> 
> The ifnet split/restructuring was discussed in DevSummit at BSDCan in May
> 2012. This change can also aid in that work.
> 
> This change can be applied to drivers in a phased way. Clearly, it won't
> have any impact on drivers that haven't been changed. At Juniper we're
> planning on converting em, fxp and tsec. Are there any strong feelings on
> whether the phase-wise change is ok or not?
> 
> 

I think these changes might be aided by something that bz@ and I talked
about after the network session, to wit, we could stand to abstract a good
deal of code away from the drivers, and up into the ethernet and other
modules.  I think trying to reduce the amount of code in the drivers,
much of which is common code, is the way for us to go.  As part
of that we can start to use the functions that you mention.
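
To make the accessor idea concrete, here is a minimal userland sketch of three
of the functions named in the diff above.  The struct ifnet below is a
stripped-down stand-in, the flag values are placeholders, and the (set, clear)
argument order of if_setdrvflagbits() is an assumption inferred from the single
call shown in the diff, not the final API.

```c
#include <stddef.h>

/* Flag values are stand-ins for this sketch, not copied from <net/if.h>. */
#define IFF_DRV_RUNNING	0x40
#define IFF_DRV_OACTIVE	0x400

/* Hypothetical, stripped-down interface structure. */
struct ifnet {
	void	*if_softc;	/* driver private data */
	int	 if_drv_flags;	/* driver-managed status flags */
};

void *
if_getsoftc(struct ifnet *ifp)
{
	return (ifp->if_softc);
}

int
if_getdrvflags(struct ifnet *ifp)
{
	return (ifp->if_drv_flags);
}

/* Set the bits in set_flags, then clear the bits in clear_flags. */
int
if_setdrvflagbits(struct ifnet *ifp, int set_flags, int clear_flags)
{
	ifp->if_drv_flags |= set_flags;
	ifp->if_drv_flags &= ~clear_flags;
	return (0);
}
```

Because a driver written this way never dereferences ifp itself, the layout
behind the pointer can change, or be replaced by a vendor's own structure,
without touching the driver.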

> Building FreeBSD without the network stack (network stack as a module)
> -

Re: Interface MTU question...

2012-07-12 Thread George Neville-Neil

On Jul 12, 2012, at 14:28 , Doug Barton wrote:

> While y'all are looking at MTU (which is an increasingly important topic
> as we move into a Gig+ world) I'm wondering what our support is for
> https://tools.ietf.org/html/rfc4821 ?? I asked this a while back and
> never got an answer.
> 
> This method of PMTUD is really important given the massive (stupid)
> brokenness of people routinely blocking all of ICMPv4.

We do not support that RFC and support in other OSs is quite limited.
It does not seem to have been taken up by the Internet community
in general.

That being said, it is an interesting mechanism.  Probably not a bad idea
for a wish list item.

Best,
George



Re: Interface MTU question...

2012-07-12 Thread George Neville-Neil

On Jul 12, 2012, at 12:55 , Jason Hellenthal wrote:

> 
> 
> On Thu, Jul 12, 2012 at 10:55:16AM -0400, George Neville-Neil wrote:
>> 
>> On Jul 11, 2012, at 17:57 , Navdeep Parhar wrote:
>> 
>>> On 07/11/12 14:30, g...@freebsd.org wrote:
>>>> Howdy,
>>>> 
>>>> Does anyone know the reason for this particular check in
>>>> ip_output.c?
>>>> 
>>>>    if (rte != NULL && (rte->rt_flags & (RTF_UP|RTF_HOST))) {
>>>>            /*
>>>>             * This case can happen if the user changed the MTU
>>>>             * of an interface after enabling IP on it.  Because
>>>>             * most netifs don't keep track of routes pointing to
>>>>             * them, there is no way for one to update all its
>>>>             * routes when the MTU is changed.
>>>>             */
>>>>            if (rte->rt_rmx.rmx_mtu > ifp->if_mtu)
>>>>                    rte->rt_rmx.rmx_mtu = ifp->if_mtu;
>>>>            mtu = rte->rt_rmx.rmx_mtu;
>>>>    } else {
>>>>            mtu = ifp->if_mtu;
>>>>    }
>>>> 
>>>> To my mind the > ought to be != so that any change, up or down, of the
>>>> interface MTU is eventually reflected in the route.  Also, this code
>>>> does not check if it is both a HOST route and UP, but only if it is
>>>> one or the other, so don't be fooled by that, this check happens
>>>> for any route we have if it's up.
>>> 
>>> I believe rmx_mtu could be low due to some intermediate node between this 
>>> host and the final destination.  An increase in the MTU of the local 
>>> interface should not increase the path MTU if the limit was due to someone 
>>> else along the route.
>> 
>> Yes, it turns out to be complex.  We have several places that store the
>> MTU.  There is the interface, which knows the MTU of the directly
>> connected link, a route, and the host cache.  All three of these are used
>> to determine the maximum segment size (MSS) of a TCP packet.  The route
>> and the interface determine the maximum MTU that the MSS can have, but, if
>> there is an entry in the host cache then it is preferred over either of
>> the first two.  See tcp_mss_update() in tcp_input.c to see what I'm
>> talking about.
>> 
>> I believe that the quoted code above has been wrong from the day it was
>> written, in that what it really says is "if the route is up" and not "if
>> the route is up and is a host route" which is how I believe people read
>> it.  If the belief is that this code is really only there for host routes,
>> then the proper fix is to make the sense of the first if match that belief
>> and, again, to change the > to != so that when the administrator of the
>> box bumps the MTU in either direction the route reflects this.  It is not
>> possible for PMTU on a single link to a host route to bump the number down
>> if the interface says it's not to be bumped.  And, even so, any host cache
>> entry will override and avoid this code.
>> 
>> 
> 
> Something else to look into ... 
> 
> # ifconfig lagg0 mtu 1492
> ifconfig: ioctl (set mtu): Invalid argument
> 
> This is on stable/8 r238264 when the interface was up/up and down/down
> 
> Also attempted on the member interfaces dc0 and dc1
> 

Can you file a bug on that one?

Best,
George



Re: Interface MTU question...

2012-07-12 Thread George Neville-Neil

On Jul 11, 2012, at 17:57 , Navdeep Parhar wrote:

> On 07/11/12 14:30, g...@freebsd.org wrote:
>> Howdy,
>> 
>> Does anyone know the reason for this particular check in
>> ip_output.c?
>> 
>>    if (rte != NULL && (rte->rt_flags & (RTF_UP|RTF_HOST))) {
>>            /*
>>             * This case can happen if the user changed the MTU
>>             * of an interface after enabling IP on it.  Because
>>             * most netifs don't keep track of routes pointing to
>>             * them, there is no way for one to update all its
>>             * routes when the MTU is changed.
>>             */
>>            if (rte->rt_rmx.rmx_mtu > ifp->if_mtu)
>>                    rte->rt_rmx.rmx_mtu = ifp->if_mtu;
>>            mtu = rte->rt_rmx.rmx_mtu;
>>    } else {
>>            mtu = ifp->if_mtu;
>>    }
>> 
>> To my mind the > ought to be != so that any change, up or down, of the
>> interface MTU is eventually reflected in the route.  Also, this code
>> does not check if it is both a HOST route and UP, but only if it is
>> one or the other, so don't be fooled by that, this check happens
>> for any route we have if it's up.
> 
> I believe rmx_mtu could be low due to some intermediate node between this 
> host and the final destination.  An increase in the MTU of the local 
> interface should not increase the path MTU if the limit was due to someone 
> else along the route.

Yes, it turns out to be complex.  We have several places that store the MTU.
There is the interface, which knows the MTU of the directly connected link,
a route, and the host cache.  All three of these are used to determine the
maximum segment size (MSS) of a TCP packet.  The route and the interface
determine the maximum MTU that the MSS can have, but, if there is an entry
in the host cache then it is preferred over either of the first two.  See
tcp_mss_update() in tcp_input.c to see what I'm talking about.
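
The precedence described above can be sketched roughly as follows.  This is a
simplification for illustration, not the code in tcp_input.c: pick_mtu() and
mss_from_mtu() are hypothetical names, a value of 0 stands for "no entry", and
the 40-byte overhead assumes IPv4 with no IP or TCP options.

```c
#include <stdint.h>

/* 20-byte IPv4 header plus 20-byte TCP header, no options. */
#define MIN_PROTO_HDR	40

/*
 * Hypothetical helper: pick the MTU to base the MSS on.  A host cache
 * entry wins; otherwise the smaller of the route and interface MTU is
 * used.
 */
uint32_t
pick_mtu(uint32_t if_mtu, uint32_t route_mtu, uint32_t hostcache_mtu)
{
	if (hostcache_mtu != 0)
		return (hostcache_mtu);
	if (route_mtu != 0 && route_mtu < if_mtu)
		return (route_mtu);
	return (if_mtu);
}

/* Convert an MTU into the MSS it allows. */
uint32_t
mss_from_mtu(uint32_t mtu)
{
	return (mtu - MIN_PROTO_HDR);
}
```

For a 1500-byte Ethernet link with no route or host cache entry this gives an
MSS of 1460; a host cache entry of 1400 on the same link would give 1360.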

I believe that the quoted code above has been wrong from the day it was
written, in that what it really says is "if the route is up" and not "if the
route is up and is a host route" which is how I believe people read it.  If
the belief is that this code is really only there for host routes, then the
proper fix is to make the sense of the first if match that belief and, again,
to change the > to != so that when the administrator of the box bumps the MTU
in either direction the route reflects this.  It is not possible for PMTU on
a single link to a host route to bump the number down if the interface says
it's not to be bumped.  And, even so, any host cache entry will override and
avoid this code.
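
The difference between the test as written and the test people seem to read
can be shown in isolation.  This is only a sketch: the helper names are
hypothetical and the flag values are stand-ins defined locally.

```c
/* Flag values are stand-ins for this sketch. */
#define RTF_UP		0x1
#define RTF_HOST	0x4

/* True if either flag is set: this is what the quoted test does. */
int
any_flag_set(int rt_flags)
{
	return ((rt_flags & (RTF_UP | RTF_HOST)) != 0);
}

/* True only if both flags are set: "up and is a host route". */
int
all_flags_set(int rt_flags)
{
	return ((rt_flags & (RTF_UP | RTF_HOST)) == (RTF_UP | RTF_HOST));
}
```

A route that is up but is not a host route passes any_flag_set() and fails
all_flags_set(), which is why the existing check fires for every usable route.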

Best,
George



Anyone working on RFC 4821, TCP Packetization Layer Path MTU Discovery?

2012-07-10 Thread George Neville-Neil
Howdy,

I just got an off list question about this RFC and haven't had time to dive
into it, but wondered if anyone had already looked at this in the context of
our TCP stack.

Best,
George



Re: TCP Regression Test Suite

2012-07-08 Thread George Neville-Neil

On Jul 7, 2012, at 23:31 , Julian Stecklina wrote:

> Hello,
> 
> do you know of a TCP regression test suite with IPv6 support?
> 

Alas, not a good one.

Best
George




Recent research results?

2012-06-24 Thread George Neville-Neil
Howdy,

One of the things I'm hoping to do for the foundation is to start periodically
publishing a list of papers that reference or relate to FreeBSD.  Basically a
set of research highlights.  I'll be asking this same question on a few of our
public mailing lists but I'm going to start here.  If you know of recent
(< 1 year) results published using FreeBSD please email me off list.  I do
track Luigi's and Lawrence Stewart's work, but I am sure there is plenty out
there that we ought to be talking about.  For the purpose of this discussion
"results" include:

1) Papers in journals or proceedings
2) Tech Reports (usually these are issued by companies and labs)
3) RFCs (sure, these are OS neutral but if someone here was involved that
   counts)

Again, please email me off list.

Thanks,
George



Re: seq# of RST in tcp_dropwithreset

2012-06-07 Thread George Neville-Neil

On Mar 27, 2012, at 18:13 , Navdeep Parhar wrote:

> When the kernel decides to respond with a RST to an incoming TCP
> segment, it uses its ack# (if valid) as the seq# of the RST.  See this
> in tcp_dropwithreset:
> 
>   if (th->th_flags & TH_ACK) {
>   tcp_respond(tp, mtod(m, void *), th, m, (tcp_seq)0,
>   th->th_ack, TH_RST);
>   } else {
>   if (th->th_flags & TH_SYN)
>   tlen++;
>   tcp_respond(tp, mtod(m, void *), th, m, th->th_seq+tlen,
>   (tcp_seq)0, TH_RST|TH_ACK);
>   }
> 
> This can have some unexpected results.  I observed this on a link with
> a very high delay (B is FreeBSD, A could be anything).
> 
> 1. There is a segment in flight from A to B.  The ack# is X (all tx
> from B to A is up to date and acknowledged).
> 2. socket is closed on B.  B sends a FIN with seq# X.
> 3. The segment from A arrives and elicits a RST from B.  The seq# of
> this RST will again be X.  A receives the FIN and then the RST with
> identical sequence numbers.  The situation resolves itself eventually,
> when A retransmits and the retransmitted segment ACKs the FIN too and
> so the next time around B sends a RST with the "correct" seq# (one
> after the FIN).
> 
> If there is a local tcpcb for the connection with state >=
> ESTABLISHED, wouldn't it be more accurate to use its snd_max as the
> seq# of the RST?
> 

Hi Navdeep,

Sorry I missed this so many months ago, but jhb@ was kind enough to point this
query out to me.  My understanding of correct operation in this case is that
we do not want to move the sequence number until we have received the ACK of
our FIN, as any other value would indicate to the TCP on A that we have
received their ACK of our FIN, which, in this case, we have not.  The fact
that there isn't a better way to indicate the error is a tad annoying, but,
and others can correct me if they think I'm wrong, this is the correct way for
the stacks to come to eventual agreement on the closing of the connection.
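
For anyone following along, the choice of seq# and ack# in the quoted branch
can be modelled outside the kernel.  This is a sketch only: struct rst_reply
and rst_for_segment() are hypothetical names, and the function just mirrors the
arguments passed to tcp_respond() in the snippet above.

```c
#include <stdint.h>

#define TH_SYN	0x02
#define TH_RST	0x04
#define TH_ACK	0x10

/* Hypothetical container for the fields of the reply segment. */
struct rst_reply {
	uint32_t seq;
	uint32_t ack;
	uint8_t	 flags;
};

/*
 * th_seq, th_ack and th_flags come from the offending segment; tlen is
 * its payload length.
 */
struct rst_reply
rst_for_segment(uint32_t th_seq, uint32_t th_ack, uint8_t th_flags,
    uint32_t tlen)
{
	struct rst_reply r;

	if (th_flags & TH_ACK) {
		/* Reuse the peer's ack# as our seq#; no ACK on the RST. */
		r.seq = th_ack;
		r.ack = 0;
		r.flags = TH_RST;
	} else {
		/* A SYN occupies one unit of sequence space. */
		if (th_flags & TH_SYN)
			tlen++;
		r.seq = 0;
		r.ack = th_seq + tlen;
		r.flags = TH_RST | TH_ACK;
	}
	return (r);
}
```

In Navdeep's scenario the in-flight segment from A carries ack# X, so the RST
goes out with seq# X, the same value already used by the FIN.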

Best,
George




Call for Papers Symposium on Architectures for Networking and Communications Systems

2012-05-05 Thread George Neville-Neil
a peer-reviewed workshop
> are allowed, if the authors clearly describe what significant new content
> has been included.
> 
> All submissions will be treated as confidential prior to publication 
> in the proceedings; rejected submissions will be permanently treated as
> confidential.
> 
> POSTER SESSION:
> 
> ANCS 2012 will include a poster session. Submission deadlines and guidelines 
> will be announced at a later date on the conference web site 
> <http://www.ancsconf.org/>.
> 
> GENERAL CHAIR:
>  Tilman Wolf, University of Massachusetts, USA
> 
> PROGRAM CHAIRS:
>  Andrew W. Moore, University of Cambridge, UK
>  Viktor Prasanna, University of Southern California, USA
> 
> TECHNICAL PROGRAM COMMITTEE:
>  Michela Becchi, University of Missouri
>  Gordon Brebner, Xilinx Labs
>  Greg Byrd, North Carolina State University
>  Qunfeng Dong, University of Science and Technology of China
>  Hans Eberle, Oracle Labs
>  Holger Fröning, University of Heidelberg
>  Euan Harris, Arista Networks
>  Hoang Le, ISC8
>  Jun Li, Tsinghua University
>  Bill Lin, University of California, San Diego
>  Derek McAuley, Nottingham University
>  Kieran Mansley, Solarflare
>  David Meyer, Cisco Systems
>  Mario Nemirovsky, Barcelona Supercomputing Center
>  George Neville-Neil, FreeBSD Foundation
>  Luigi Rizzo, Università di Pisa
>  Tom Rodeheffer, Microsoft Research
>  Dimitrios Serpanos, ISI/RC ATHENA & University of Patras
>  Frederico Silla, Universitat Politècnica de València
>  Satnam Singh, Google
>  Ripduman Sohan, University of Cambridge
>  Russ Tessier, University of Massachusetts Amherst
>  Ola Torudbakken, Oracle Systems
>  Fang Yu, Microsoft Research
> 
> STEERING COMMITTEE:
>  Laxmi Bhuyan, University of California, Riverside
>  H. Jonathan Chao, NYU-Poly
>  Patrick Crowley, Washington University
>  Mark Franklin, Washington University
>  Derek McAuley, University of Nottingham
>  Nick McKeown, Stanford University
>  Bill Lin, University of California, San Diego
>  Peter Z. Onufryk, Integrated Device Technology, Inc.
>  K. K. Ramakrishnan, AT&T Labs Research
>  Raj Yavatkar, Intel
> 
> FINANCE CHAIR:
>  Michela Becchi, University of Missouri, Columbia, USA
> 
> LOCAL ARRANGEMENTS CHAIR:
>  EJ Kim, Texas A&M University, USA
> 
> STUDENT TRAVEL GRANT CHAIR:
>  Michela Becchi, University of Missouri, Columbia, USA
> 
> WEB CHAIR:
>  Xinming Chen, University of Massachusetts, USA
> 
> CONTACT INFORMATION:
> 
> For updated details, please see <http://www.ancsconf.org>



Re: more network performance info: ether_output()

2012-05-01 Thread George Neville-Neil

On May 1, 2012, at 11:40 , Luigi Rizzo wrote:

> On Tue, May 01, 2012 at 10:27:42AM -0400, George Neville-Neil wrote:
>> 
>> On Apr 20, 2012, at 15:03 , Luigi Rizzo wrote:
>> 
>>> Continuing my profiling on network performance, another place
>>> where we waste a lot of time is if_ethersubr.c::ether_output()
>>> 
>>> In particular, from the beginning of ether_output() to the
>>> final call to ether_output_frame() the code takes slightly
>>> more than 210ns on my i7-870 CPU running at 2.93 GHz + TurboBoost.
>>> In particular:
>>> 
>>> - the route does not have a MAC address (lle) attached, which causes
>>> arpresolve() to be called all the time. This consumes about 100ns.
>>> It happens also with locally sourced TCP.
>>> Using the flowtable cuts this time down to about 30-40ns
>>> 
>>> - another 100ns is spent to copy the MAC header into the mbuf,
>>> and then check whether a local copy should be looped back.
>>> Unfortunately the code here is a bit convoluted so the
>>> header fields are copied twice, and using memcpy on the
>>> individual pieces.
>>> 
>>> Note that all the above happens not just with my udp flooding
>>> tests, but also with regular TCP traffic.
>> 
>> Hi Luigi,
>> 
>> I'm really glad you're working on this.  I may have missed this in a thread
>> but are you tracking these somewhere so we can pick them up and fix them?
>> 
>> Also, how are you doing the measurements?
> 
> The measurements are done with tools/tools/netrate/netsend and
> kernel patches to return from sendto() at various places in the
> stack (from the syscall entry point down to the device driver).
> A patch is attached. You don't really need netmap to run it,
> it was just a convenient place to put the variables.
> 
> I am not sure how much we can "fix", there are multiple expensive
> functions on the tx path, and probably also on the rx path.
> 
> My hope at least for the tx path is that we can find out a way to install a
> "fastpath" handler in the socket.
> When there is no handler installed (e.g. on the first packet or
> unsupported protocols/interfaces) everything works as usual. Then
> when the packet reaches the bottom of the stack, we try to update
> the socket with a copy of the headers generated in the process, and
> the name of the fastpath function to be called.  Next transmissions
> will then be able to shortcut the stack and go straight to the
> device output routine.
> 
> I don't have data on the receive path or good ideas on how to proceed -- the
> advantage of the tx path is that traffic is implicitly classified,
> whereas it might not be the case for incoming traffic, and classification
> might be the expensive step.
> 
> Hopefully we'll have time to discuss this next week in Ottawa.

Yes, I think we should.
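
To have something concrete to discuss, here is a toy userland model of the
"fastpath handler in the socket" idea described above.  Every name in it
(struct sock, sock_send(), slow_output(), fast_output()) is hypothetical, the
14-byte header is a placeholder for whatever the stack builds, and
invalidation on route or ARP changes is left out entirely.

```c
#include <stddef.h>
#include <string.h>

#define HDR_MAX	64

struct sock;
typedef int (*fastpath_fn)(struct sock *, const void *, size_t);

/* Hypothetical socket with room for a cached header and a handler. */
struct sock {
	fastpath_fn	fastpath;	/* NULL until the slow path installs it */
	unsigned char	hdr[HDR_MAX];	/* headers built by the slow path */
	size_t		hdr_len;
	unsigned long	slow_sends;
	unsigned long	fast_sends;
};

/* Stand-in for handing a finished frame to the device output routine. */
int
fast_output(struct sock *so, const void *payload, size_t len)
{
	(void)payload;
	(void)len;
	so->fast_sends++;
	return (0);
}

/* Stand-in for the full stack traversal; caches headers on the way down. */
int
slow_output(struct sock *so, const void *payload, size_t len)
{
	static const unsigned char built_hdr[14] = { 0 };

	(void)payload;
	(void)len;
	memcpy(so->hdr, built_hdr, sizeof(built_hdr));
	so->hdr_len = sizeof(built_hdr);
	so->fastpath = fast_output;
	so->slow_sends++;
	return (0);
}

/* Entry point: use the installed handler if there is one. */
int
sock_send(struct sock *so, const void *payload, size_t len)
{
	if (so->fastpath != NULL)
		return (so->fastpath(so, payload, len));
	return (slow_output(so, payload, len));
}
```

Only the first sock_send() on a socket takes the slow path; later calls go
straight to the installed handler with the cached header available.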

Best,
George



Re: more network performance info: ether_output()

2012-05-01 Thread George Neville-Neil

On Apr 20, 2012, at 15:03 , Luigi Rizzo wrote:

> Continuing my profiling on network performance, another place
> where we waste a lot of time is if_ethersubr.c::ether_output()
> 
> In particular, from the beginning of ether_output() to the
> final call to ether_output_frame() the code takes slightly
> more than 210ns on my i7-870 CPU running at 2.93 GHz + TurboBoost.
> In particular:
> 
> - the route does not have a MAC address (lle) attached, which causes
>  arpresolve() to be called all the time. This consumes about 100ns.
>  It happens also with locally sourced TCP.
>  Using the flowtable cuts this time down to about 30-40ns
> 
> - another 100ns is spent to copy the MAC header into the mbuf,
>  and then check whether a local copy should be looped back.
>  Unfortunately the code here is a bit convoluted so the
>  header fields are copied twice, and using memcpy on the
>  individual pieces.
> 
> Note that all the above happens not just with my udp flooding
> tests, but also with regular TCP traffic.

Hi Luigi,

I'm really glad you're working on this.  I may have missed this in a thread
but are you tracking these somewhere so we can pick them up and fix them?

Also, how are you doing the measurements?

Sorry if these have been answered before.

Best,
George



Question about fixing udp6_input...

2012-04-19 Thread George Neville-Neil
Howdy,

At the moment the prototype for udp6_input() is the following:

int
udp6_input(struct mbuf **mp, int *offp, int proto)

and udp_input() looks like this:

void
udp_input(struct mbuf *m, int off)

As far as I can tell we immediately change **mp to *m and *offp to off
in udp6_input() and we also never use proto in the rest of the function.

Is there any reason to not make udp6_input() look exactly like udp_input() ?

Best,
George



Patch to enable our tcpdump to handle CARP

2011-10-19 Thread George Neville-Neil
Howdy,

I've been trying to debug CARP problems of late.  I noticed that our tcpdump
didn't have CARP support.  I took and fixed some code from OpenBSD so that our
tcpdump can work with CARP.  Unlike OpenBSD you have to specify -T carp to read
carp packets.  In their version you specify -T VRRP, because they don't like
VRRP.  I decided that we should go with what most of the industry cares about
rather than what OpenBSD cares about.

Patch is here: http://people.freebsd.org/~gnn/tcpdump-carp.diff

Technical comments welcome.

Best,
George



Fwd: [e2e] Nationwide 100Gbps testbed available to researchers

2011-09-27 Thread George Neville-Neil
Somehow we ought to get involved in this.

Best,
George


Begin forwarded message:

> From: Brian Tierney 
> Subject: [e2e] Nationwide 100Gbps testbed available to researchers
> Date: September 26, 2011 1:05:12 EDT
> To: end2end-inter...@postel.org
> 
> 
> Hi all:
> 
> This may be of interest to those of you working on protocols for high-speed 
> networks.
> 
> ESnet will be deploying a nationwide 100Gbps testbed by the end of this year. 
> This testbed is available to anyone.
> 
> For more information see:
> 
> https://sites.google.com/a/lbl.gov/ani-testbed/proposal-process



Re: IP_MINTTL and RFC5082 (TTL security, GTSM) support

2011-09-08 Thread George Neville-Neil

On Aug 18, 2011, at 03:32 , Alexander V. Chernikov wrote:

> Hello list!
> 
> FreeBSD supports IP_MINTTL since long ago (5.x ?). This is RFC3682-compatible 
> implementation.
> 
> It is very simple: if we can associate incoming packet with any socket, 
> socket is checked for minimum TTL value existence. If such value exists and 
> received packet TTL is lower, packet is dropped.
> 
> However, it is not enough for real security. ICMP messages are not checked 
> for minimum TTL (which is now required by RFC 5082  6.1.)
> 
> ICMP messages are passed via the .pr_ctlinput upper level protocol hook.
> ICMP code, originator address (sockaddr *) and part of the problem datagram
> (received in the ICMP packet) are passed as arguments.
> 
> As a result, TTL of ICMP packet is not passed to upper layer proto and TTL 
> security cannot be enforced.
> 
> What can possibly be done:
> 
> * New hook .pr_ctlinput2 with additional argument pointing to original ICMP 
> header can be added. After that we convert all base code to use .pr_ctlinput2 
> and appropriate icmp_input() parts can be changed like this:
> 
> 
> ctlfunc2 = inetsw[ip_protox[icp->icmp_ip.ip_p]].pr_ctlinput2;
> if (ctlfunc2)
>     (*ctlfunc2)(code, (struct sockaddr *)&icmpsrc,
>         (void *)&icp->icmp_ip, (void *)icp);
> else {
>     ctlfunc = inetsw[ip_protox[icp->icmp_ip.ip_p]].pr_ctlinput;
>     if (ctlfunc)
>         (*ctlfunc)(code, (struct sockaddr *)&icmpsrc,
>             (void *)&icp->icmp_ip);
> }
> 
> * .pr_ctlinput() can be altered (if it's not too late for 9.x) and some trick 
> like supplying TTL data directly after (struct sockaddr*) can be used as 8.x 
> MFC
> 
> 
> P.S. We should implement IP_MINTTL variant for IPv6. I can submit patches but 
> this seems to be reasonable only after we got some solution for ICMP security.
> 
> Linux people added compatible opt for IPv4 in 2.6.34:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d218d11133d888f9745802146a50255a4781d37a
> 
> .. and  IPV6_MINHOPCOUNT for IPv6 in 2.6.35:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e802af9cabb011f09b9c19a82faef3dd315f27eb
> 
> so we can consider using IPV6_MINHOPCOUNT as appropriate setsockopt name

Sounds good.  Do you have a patch already?  It seems like you might.
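
For reference, this is roughly how an application uses the existing IPv4
option today.  It is a minimal sketch: gtsm_enable() is a hypothetical helper,
and it only covers the socket-level check described above, not the ICMP case
under discussion.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: send with TTL 255 and ask the kernel to drop
 * received packets whose TTL is below min_ttl.  Returns 0 on success,
 * -1 on failure or when IP_MINTTL is not available at build time.
 */
int
gtsm_enable(int fd, int min_ttl)
{
#ifdef IP_MINTTL
	int ttl = 255;

	if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) == -1)
		return (-1);
	if (setsockopt(fd, IPPROTO_IP, IP_MINTTL, &min_ttl,
	    sizeof(min_ttl)) == -1)
		return (-1);
	return (0);
#else
	(void)fd;
	(void)min_ttl;
	return (-1);
#endif
}
```

A directly connected BGP-style peer would call gtsm_enable(fd, 255) on its
socket; both ends have to send with TTL 255 for the check to pass.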

Best,
George




Re: Multiqueue support for bpf

2011-09-08 Thread George Neville-Neil

On Aug 19, 2011, at 04:21 , Takuya ASADA wrote:

> Any comments or suggestions?
> 
> 

One comment, one question.

First, I think we should try to integrate this work and then tune it up more.
The API is, I think, fine, and performance tuning takes a bit of work.

Second, what are the parameters set on buffers for the drivers?  I.e. how many
slots do they have in their queues etc.?  If the defaults are too small, and
often they are, then that's going to hurt your performance.

Best,
George




Re: Adding Flow Director sysctls to ixgbe(4)

2011-09-08 Thread George Neville-Neil

On Sep 8, 2011, at 10:48 , K. Macy wrote:

> On Thu, Sep 8, 2011 at 2:34 PM, John Baldwin  wrote:
>> On Monday, September 05, 2011 7:21:12 am Ben Hutchings wrote:
>>> On Mon, 2011-09-05 at 15:51 +0900, Takuya ASADA wrote:
>>>> Hi,
>>>> 
>>>> I implemented Ethernet Flow Director sysctls to ixgbe(4), here's a detail:
>>>> 
>>>> - Adding removing signature filter
>>>> On linux version of ixgbe driver, it has ability to set/remove perfect
>>>> filter from userland using ethtool command.
>>>> I implemented similar feature, but on sysctl, and not perfect filter
>>>> but signature filter(which means hash collision may occurs).
>>> [...]
>>> 
>>> Linux also has a generic interface to RX filtering and hashing
>>> (ethtool_rxnfc) which ixgbe supports; wouldn't it be better for FreeBSD
>>> to support something like that?
>> 
>> Some sort of shared interface might be nice.  The cxgb(4) and cxgbe(4)
>> drivers both provide their own tools to manipulate filters, though they
>> do not provide explicit steering IIRC.
>> 
>> We would need to come up with some sort of standard interface (ioctls?) for
>> adding filters however.
> 
> I know this must sound like nitpicking, but please don't add more
> ioctls if you can avoid it. If you want to add new interfaces try to
> stick with sysctl as it tends to be less prone to breakage across
> releases.
> 
> 
> The biggest problem in defining a new API is the lack of anyone with a
> global overview of the functionality provided by NIC vendors and their
> near-term roadmaps. It doesn't make sense to add an API that we only
> know works for one or two vendors.
> 

I think this is doable.  I've seen enough of these cards to know a bit
of what we'd want.  This is a subject we've covered in a few different
BSDCans but it's probably time for a straw man.  I'm not against the
sysctl approach but I'll have to give this a bit more thought.

The only real options that I can see are sockets, ioctls and sysctls.
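
To make the perfect vs. signature filter distinction quoted above concrete,
here is a toy sketch in plain C.  It is an illustration only: struct flow,
flow_signature() and both match functions are hypothetical names, and the
XOR-fold hash is a stand-in chosen for readability, not the hash that the
ixgbe hardware computes.  A perfect filter compares the whole tuple, while
a signature filter compares only a short hash of it, so two different
flows can land on the same filter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-tuple used to describe a flow in this sketch. */
struct flow {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

/* Toy hash (NOT the hardware hash): fold the tuple down to 15 bits. */
static uint16_t
flow_signature(const struct flow *f)
{
	uint32_t h;

	h = f->src_ip ^ f->dst_ip ^
	    (((uint32_t)f->src_port << 16) | f->dst_port);
	h ^= h >> 16;
	return ((uint16_t)(h & 0x7fff));
}

/* Perfect filter: every field of the tuple must match exactly. */
bool
perfect_match(const struct flow *rule, const struct flow *pkt)
{
	return (rule->src_ip == pkt->src_ip &&
	    rule->dst_ip == pkt->dst_ip &&
	    rule->src_port == pkt->src_port &&
	    rule->dst_port == pkt->dst_port);
}

/* Signature filter: only the hashes are compared, so collisions happen. */
bool
signature_match(const struct flow *rule, const struct flow *pkt)
{
	return (flow_signature(rule) == flow_signature(pkt));
}
```

With this toy hash, swapping the source and destination addresses of a flow
yields the same signature, so signature_match() accepts a packet that
perfect_match() rejects.  Whatever API we settle on probably needs a way to
say which of the two kinds of filter the caller is asking for.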

Best,
George



Re: Intel NIC stops working

2011-09-08 Thread George Neville-Neil

On Aug 17, 2011, at 04:06 , Johannes wrote:

> Hi Sami,
> 
> thanks for the reply. Unfortunately, down/up the interface does not
> resolve the problem.
> I have to reboot my server in order to use the nic again.
> 

Crazy suggestion.  Check the memory on your server using memtest or the like.
Those stats you quoted just look very wrong to me, since the numbers are huge
for broadcast and multicast.  I think something in memory is trashed.  Now,
it may be a bug in the driver or somewhere else, but check your memory first.

Best,
George



