On a somewhat related note: I've just received my NZ/AU-region
Almond+, which is an arm9 dual-core router based on the Cortina CSC SoC:

https://www.cortina-systems.com/product/digital-home-processors/16-products/996-cs7542-cs7522

More details :

On 2 September 2014 21:27, Jonathan Morton <chromati...@gmail.com> wrote:
>
> On 2 Sep, 2014, at 1:14 am, Aaron Wood wrote:
>
>>> For the purposes of shaping, the CPU shouldn't need to touch the majority 
>>> of the payload - only the headers, which are relatively small.  The bulk of 
>>> the payload should DMA from one NIC to RAM, then DMA back out of RAM to the 
>>> other NIC.  It has to do that anyway to route them, and without shaping 
>>> there'd be more of them to handle.  The difference might be in the data 
>>> structures used by the shaper itself, but I think those are also reasonably 
>>> compact.  It doesn't even have to touch userspace, since it's not acting as 
>>> the endpoint as my PowerBook was during my tests.
>>
>> In an ideal case, yes.  But is that how this gets managed?  (I have no idea, 
>> I'm certainly not a kernel developer).
>
> It would be monumentally stupid to integrate two GigE MACs onto an SoC, and 
> then to call it a "network processor", without adequate DMA support.  I don't 
> think Atheros are that stupid.
>
> Here's a more detailed datasheet:
>         
> http://pdf.datasheetarchive.com/indexerfiles/Datasheets-SW6/DSASW00118777.pdf
>
> "Another memory factor is the ability to support multiple I/O operations in 
> parallel via the WNPU's various ports. The on-chip SRAM in AR7100 WNPUs has 5 
> ports that enable simultaneous access to and from five sources: the two 
> gigabit Ethernet ports, the PCI port, the USB 2.0 port and the MIPS 
> processor."
>
> It's a reasonable question, however, whether the driver uses that support 
> properly.  Mainline Linux kernel code seems to support the SoC but not the 
> Ethernet; if it were just a minor variant of some other Atheros hardware, I'd 
> have expected to see it integrated into one of the existing drivers.  Or 
> maybe it is, and my greps just aren't showing it.
>
> At minimum, however, there are MMIO ranges reported for each MAC during 
> OpenWRT's boot sequence.  That's where the ring buffers are.  The most the 
> CPU has to do is read each packet from RAM and write it into those buffers, 
> or vice versa for receive - I think that's what my PowerBook has to do.  
> Ideally, a bog-standard DMA engine would take over that simple duty.  Either 
> way, that's something that has to happen whether it's shaped or not, so it's 
> unlikely to be our problem.
>
> The same goes for the wireless MACs, incidentally.  These are standard ath9k 
> mini-PCI cards, and the drivers *are* in mainline.  There shouldn't be any 
> surprises with them.
>
>> If the packet data is getting moved about from buffer to buffer (for 
>> instance to do the htb calculations?) could that substantially change the 
>> processing load?
>
> The qdiscs only deal with packet and socket headers, not the full packet 
> data.  Even then, they largely pass pointers around, inserting the headers 
> into linked lists rather than copying them into arrays.  I believe a lot of 
> attention has been directed at cache-friendliness in this area, and the MIPS 
> caches are of conventional type.
>
>>> Which brings me back to the timers, and other items of black magic.
>>
>> Which would point to under-utilizing the processor core, while still having 
>> high load? (I'm not seeing that, I'm curious if that would be the case).
>
> It probably wouldn't manifest as high system load.  Rather, poor timer 
> resolution or latency would show up as excessive delays between packets, 
> during which the CPU is idle.  The packet egress times may turn out to be 
> quantised - that would be a smoking gun, if detectable.
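
One way to look for that kind of quantisation is sketched below in Python. It assumes the egress timestamps (in seconds) have already been pulled out of a packet capture, and that there is a candidate timer tick to test against; the 10 ms tick in the usage note is only an illustrative guess. The function names and the 5% tolerance are hypothetical choices, not part of any existing tool.

```python
def inter_packet_gaps(timestamps):
    """Return the gaps between consecutive egress timestamps (seconds)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def fraction_on_tick(gaps, tick, tolerance=0.05):
    """Fraction of gaps that sit within tolerance*tick of a whole
    multiple (>= 1) of the candidate timer tick.

    A value near 1.0 suggests packets leave on timer boundaries;
    a low value suggests they do not, at least for this tick.
    """
    if not gaps:
        return 0.0
    hits = 0
    for gap in gaps:
        multiple = round(gap / tick)
        if multiple >= 1 and abs(gap - multiple * tick) <= tolerance * tick:
            hits += 1
    return hits / len(gaps)
```

For example, `fraction_on_tick(inter_packet_gaps(stamps), tick=0.010)` tests a capture against a 10 ms tick. Trying a few plausible ticks (1 ms, 4 ms, 10 ms) and comparing the scores would be a reasonable first pass.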
>
>>> Incidentally, transfer speed benchmarks involving wireless will certainly 
>>> be limited by the wireless link.  I assume that's not a factor here.
>>
>> That's the usual suspicion.  But these are RF-chamber, short-range lab 
>> setups where the radios are running at full speed in perfect environments...
>
> Sure.  But even turbocharged 'n' gear tops out at 450Mbps signalling, and 
> much less than that is available even theoretically for TCP/IP throughput.  
> My point is that you're probably not running *your* tests over wireless.
>
>> What this makes me realize is that I should go instrument the cpu stats with 
>> each of the various operating modes:
>>
>> * no shaping, anywhere
>> * egress shaping
>> * egress and ingress shaping at various limited levels:
>>     * 10Mbps
>>     * 20Mbps
>>     * 50Mbps
>>     * 100Mbps
>
> Smaller increments at the high end of the range may prove to be useful.  I 
> would expect the CPU usage to climb nonlinearly (busy-waiting) if there's a 
> bottleneck in a peripheral device, such as the PCI bus.  The way the kernel 
> classifies that usage may also be revealing.
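
For collecting those per-mode CPU numbers, here is a minimal Python sketch. It assumes a Linux router where the first line of /proc/stat is the aggregate "cpu" line, with its counters in the usual order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are hypothetical. Take one snapshot before a test run and one after, then look at how the elapsed time splits between the classes, softirq in particular.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Older kernels report fewer columns; zip() simply stops early.
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def read_cpu_times(path="/proc/stat"):
    """Take one snapshot of the aggregate CPU counters."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


def usage_percent(before, after):
    """Percentage of elapsed CPU time spent in each class between two snapshots."""
    deltas = {name: after.get(name, 0) - before.get(name, 0) for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

Usage would be along the lines of `before = read_cpu_times()`, run the transfer at one shaping rate, `after = read_cpu_times()`, then `usage_percent(before, after)`, repeated for each mode in the list above.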
>
>> Heck, what about running HTB simply from a 1ms timer instead of from a data 
>> driven timer?
>
> That might be what's already happening.  We have to figure that out before 
> we can work out a solution.
>
>  - Jonathan Morton
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel