On Fri, 5 Aug 2022 at 20:31, <ljwob...@gmail.com> wrote:

Hey LJ,

> Disclaimer:  I work for Cisco on a bunch of silicon.  I'm not intimately 
> familiar with any of these devices, but I'm familiar with the high level 
> tradeoffs.  There are also exceptions to almost EVERYTHING I'm about to say, 
> especially once you get into the second- and third-order implementation 
> details.  Your mileage will vary...   ;-)

I expect it may come to this; my question may be too specific to be
answered without violating some NDA.

> If you have a model where one core/block does ALL of the processing, you 
> generally benefit from lower latency, simpler programming, etc.  A major 
> downside is that to do this, all of these cores have to have access to all of 
> the different memories used to forward said packet.  Conversely, if you break 
> up the processing into stages, you can only connect the FIB lookup memory to 
> the cores that are going to be doing the FIB lookup, and only connect the 
> encap memories to the cores/blocks that are doing the encapsulation work.  
> Those interconnects take up silicon space, which equates to higher cost and 
> power.

While an interesting answer (that is, the statement that the cost of
giving cores access to memory, versus having a harder-to-program
pipeline of cores, is a balanced tradeoff), I don't think it applies to
my specific question, even if it applies to the generic one. We can
roughly think of FP as having a similar number of lines as Trio has
PPEs, so a similar number of cores need access to memory, possibly a
higher number, since more than one core in a line will need memory
access.
So the question is more: why many less performant cores, where
performance is achieved by pipelining them, rather than fewer, more
performant cores, where each core works on a packet to completion, when
the former has about as many core lines as the latter has cores?
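
To make my mental arithmetic explicit, here is a rough back-of-envelope
sketch in Python; the core, line and memory counts below are invented
for illustration only, not real Trio or FP figures:

# Count core<->memory connections under the two models
# (all numbers are hypothetical, chosen only to illustrate the argument).

MEMORIES = ["fib", "encap", "counters"]      # distinct forwarding memories

# Run-to-completion: N identical cores, each needing a path to every memory.
rtc_cores = 96
rtc_links = rtc_cores * len(MEMORIES)

# Pipelined: M lines, each line built of stages, each stage wired only to
# the single memory that stage actually touches.
pipe_lines = 96                              # lines ~ PPEs, as argued above
links_per_line = len(MEMORIES)               # one stage per memory type
pipe_links = pipe_lines * links_per_line

print("run-to-completion links:", rtc_links)   # 288
print("pipelined links:        ", pipe_links)  # 288

# With lines roughly equal to cores, the interconnect totals come out the
# same, so the interconnect-cost argument alone doesn't settle the question.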

> Packaging two cores on a single device is beneficial in that you only have 
> one physical chip to work with instead of two.  This often simplifies the 
> board designers' job, and is often lower power than two separate chips.  This 
> starts to break down as you get to exceptionally large chips as you bump into 
> the various physical/reticle limitations of how large a chip you can actually 
> build.  With newer packaging technology (2.5D chips, HBM and similar 
> memories, chiplets down the road, etc) this becomes even more complicated, 
> but the answer to "why would you put two XYZs on a package?" is that it's 
> just cheaper and lower power from a system standpoint (and often also from a 
> pure silicon standpoint...)

Thank you for this; it does confirm that the benefits aren't perhaps as
revolutionary as the presentation in this thread proposed. The
presentation divided Trio evolution into three phases, and multiple
Trios on a package was presented as one of those big evolutions;
perhaps some other division of generations would have been more
communicative.

> Lots and lots of Smart People Time has gone into different memory designs 
> that attempt to optimize this problem, and it's a major part of the 
> intellectual property of various chip designs.

I choose to read this as 'where a lot of innovation happens, a lot of
mistakes happen'. Hopefully we'll figure out a good answer here soon,
as the answers vendors are ending up with are becoming increasingly
visible compromises in the field. I suspect a large part of this is
that cloudy shops represent, if not disproportionate revenue, then
disproportionate focus, and their networks tend to be a lot more static
in config and traffic than access/SP networks. When you have that
quality, you can make increasingly broad assumptions, assumptions
which don't play as well in SP networks.

-- 
  ++ytti
