Re: Scanning the Internet for Vulnerabilities Re: 202207272146.AYC

2022-07-27 Thread Abraham Y. Chen

Hi, John:

0) Thanks for sharing your thoughts. The IoT identification (IP address) 
versus privacy question is a rather convoluted topic. The discussion can 
quickly become distracted and diluted if we look at it piecemeal. Allow 
me to go through an overview to convey my logic.


1) It is true that a dynamic IoT identification is harder to track down 
than a static one, thus providing some sense of privacy or security, 
theoretically. This aligned well with the dynamic-assignment practice 
necessitated by the limited IPv4 address pool. So the idea sank deep into 
most people's minds as inherent to the Internet.


2) It turned out that there are many ways (as you alluded to) to track 
down an IoT even with a dynamic address. A classical research paper 
outlined various techniques for doing so:


https://www.ccsl.carleton.ca/paper-archive/muir-computingsurveys-09.pdf

 To save you time, I have extracted part of its conclusions below:
 "6 Concluding Remarks ... while some commercial organizations have 
claimed that they can do it with 99% accuracy. … It’s meant for the 99 
percent of the general public who are just at home surfing. … We note 
that even if accurate IP geolocation is possible for 99% of IP 
addresses, if the remaining 1% is fixed and predictable by an adversary, 
and such that the adversary can place themselves within this subspace, 
then they can evade geolocation 100% of the time. …"


 We do not need to verify its validity quantitatively today, because 
technology has advanced a lot. However, it is probably still fairly 
accurate qualitatively, judging by how successful "targeted marketing" 
is, and by how hard it remains to identify various perpetrators, let 
alone physically locate one.


3) As long as the general public embraces the Internet technologists' 
promise of privacy through dynamic addressing, however, the LE (Law 
Enforcement) agencies have an excuse for exercising mass surveillance 
that scoops up everything possible from the Internet for offline 
analysis. Big businesses have been doing the same under the same cover. 
So most people end up without privacy anyway. (Remember the news that 
the German Chancellor's phone calls were somehow picked up by the US 
NSA? For anyone with a little imagination, that was a clear hint at the 
tip of an iceberg.)


4) A static communication terminal (IoT) identification practice would 
remove a significant number of entities (the 99%) from LE's monitoring 
operations, enabling them to focus on the 1%, while also requiring them 
to submit justification for a court order before doing so. That last 
requirement has disappeared under the Internet environment; see the URL 
below for an example. The static IP address practice would simplify the 
whole game: the LEs could do their job more easily, while the general 
public would get their legally protected privacy back.


 
https://www.usatoday.com/story/news/2021/12/08/federal-court-upholds-terrorism-conviction-mass-surveillance-case/6440325001/


Regards,



Abe (2022-07-27 23:28 EDT)







On 2022-07-24 13:57, John Curran wrote:

On 24 Jul 2022, at 10:20 AM, Abraham Y. Chen  wrote:

Hi, John:

1) "...  dynamically assigned IP address space can still be tracked back to a given 
system ... ": I fully agree with this statement. However,
A. You overlooked the critical consideration of the response time. If this 
can not be done in real time for law enforcement purposes, it is meaningless.

Abe -

That’s correct - but that does not require having static addresses to 
accomplish (as you postulated earlier); rather, it just requires having 
appropriately functioning logging apparatus.


B. Also, the goal is to spot the specific perpetrator, not the "system", which is too 
general to be meaningful. In fact, this would penalize the innocent users who happen 
to be on the same implied "system".

Yes, it is quite obvious that a degree of care is necessary.


C. In addition, for your “whack-a-mole” metaphor, the party in charge is 
the mole, not the party with the mallet. It is a losing game for the mallet 
right from the beginning.

As with all enforcement, it is a question of changing the break-even 
calculation of incentives and risks for would-be perpetrators, and 
presently there is almost no risk involved.


So, the current Internet practices put us way behind the starting line even 
before the game begins. Overall, this environment is favored by multi-national 
businesses, with perpetrators riding along in the background. When security is 
breached, there are more than enough excuses to point the finger at. No wonder 
the outcome has always been disappointing for the general public.

Indeed.


2) What we need to do is to reverse the roles in every one of the above 
situations, if we hope for any meaningful result, at all. The starting point is 
to review the root differences between the Internet and the traditional 
communication systems. With near half a century of the Internet experience, we 
should be ready to study each issue 

Re: 4 gbit capable bufferbloat flent test target close to/traversing MegaIX in sydney?

2022-07-27 Thread Dave Taht
On Wed, Jul 27, 2022 at 2:48 PM Dave Taht  wrote:
>
> I am curious if there is anyone out there willing to run a server with
> irtt and netperf on it that I could do some bufferbloat testing
> against (in off peak hours)? I've been getting some severely bloated
> (250ms!) results on the 27ms path I'm on now at rates slightly above
> 1.2gbit (can share the ugly details privately)
>
> --
> FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
> Dave Täht CEO, TekLibre, LLC

I would like to clarify that the issue I'm having is NOT with MegaIX,
but another that doesn't want to listen, on a path to linode...




-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC


4 gbit capable bufferbloat flent test target close to/traversing MegaIX in sydney?

2022-07-27 Thread Dave Taht
I am curious if there is anyone out there willing to run a server with
irtt and netperf on it that I could do some bufferbloat testing
against (in off peak hours)? I've been getting some severely bloated
(250ms!) results on the 27ms path I'm on now at rates slightly above
1.2gbit (can share the ugly details privately)

-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC


tracfone / straighttalk contact?

2022-07-27 Thread Joshua Pool via NANOG
Anyone have a point of contact at tracfone or straighttalk?

Regards,
Josh


Re: 400G forwarding - how does it work?

2022-07-27 Thread James Bensley
On Tue, 26 Jul 2022 at 21:39, Lawrence Wobker  wrote:
> So if this pipeline can do 1.25 billion PPS and I want to be able to forward 
> 10BPPS, I can build a chip that has 8 of these pipelines and get my 
> performance target that way.  I could also build a "pipeline" that processes 
> multiple packets per clock, if I have one that does 2 packets/clock then I 
> only need 4 of said pipelines... and so on and so forth.

Thanks for the response Lawrence.

The Broadcom BCM16K KBP has a clock speed of 1.2 GHz, so I expect the
J2 to have something similar (as someone already mentioned, most chips
I've seen are in the 1-1.5 GHz range), so in this case "only" 2
pipelines would be needed to maintain the headline 2 Bpps rate of the
J2, or even just 1 if they have managed to squeeze two packets per
cycle out of parallelisation within the pipeline.
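
The arithmetic here fits in a few lines; a sketch, where the
packets-per-clock values are the assumptions discussed above rather
than vendor specifications:

import math

def pipelines_needed(target_pps: float, clock_hz: float,
                     packets_per_clock: float = 1.0) -> int:
    # Each pipeline retires packets_per_clock packets every cycle.
    per_pipeline_pps = clock_hz * packets_per_clock
    return math.ceil(target_pps / per_pipeline_pps)

# J2-style numbers from this thread: 2 Bpps headline rate, ~1.2 GHz clock.
print(pipelines_needed(2e9, 1.2e9))       # -> 2 (at 1 packet/clock)
print(pipelines_needed(2e9, 1.2e9, 2.0))  # -> 1 (at 2 packets/clock)
# LJ's earlier example: 10 Bpps built from 1.25 Bpps pipelines.
print(pipelines_needed(10e9, 1.25e9))     # -> 8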

Cheers,
James.


Re: 400G forwarding - how does it work?

2022-07-27 Thread James Bensley
On Wed, 27 Jul 2022 at 15:11, Masataka Ohta
 wrote:
>
> James Bensley wrote:
>
> > The BCM16K documentation suggests that it uses TCAM for exact
> > matching (e.g.,for ACLs) in something called the "Database Array"
> > (with 2M 40b entries?), and SRAM for LPM (e.g., IP lookups) in
> > something called the "User Data Array" (with 16M 32b entries?).
>
> Which documentation?
>
> According to:
>
> https://docs.broadcom.com/docs/16000-DS1-PUB
>
> figure 1 and related explanations:
>
> Database records 40b: 2048k/1024k.
> Table width configurable as 80/160/320/480/640 bits.
> User Data Array for associated data, width configurable as
> 32/64/128/256 bits.
>
> means that the header extracted by the 88690 is analyzed by the 16K,
> finally resulting in 40b of information (a lot shorter than IPv6
> addresses, but still perhaps enough for an IPv6 backbone to identify
> sites) by "database" lookup, which is then converted -- obviously by
> CAM, because a 40b key is painful for SRAM -- to "32/64/128/256 bits data".

Hi Masataka,

Yes, I had read that data sheet. If you have 2M 40b entries in CAM, you
could also have 1M 80b entries (or a mixture); the 40b CAM blocks can
be chained together to store IPv4/IPv6/MPLS/whatever entries.
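
To illustrate the tradeoff, a quick sketch using the data-sheet count
quoted above (a mixture simply partitions the same pool of blocks):

import math

TOTAL_40B_BLOCKS = 2 * 1024 * 1024   # "2M 40b entries" from the data sheet

def max_entries(entry_width_bits: int) -> int:
    # Wider entries consume whole chained 40b blocks.
    blocks_per_entry = math.ceil(entry_width_bits / 40)
    return TOTAL_40B_BLOCKS // blocks_per_entry

print(max_entries(40))    # 2097152
print(max_entries(80))    # 1048576 -- the "1M 80b entries" case
print(max_entries(160))   #  524288 -- e.g. a wider IPv6 key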

Cheers,
James.


Re: 400G forwarding - how does it work?

2022-07-27 Thread Jeff Tantsura
FYI

https://community.juniper.net/blogs/nicolas-fevrier/2022/07/27/voq-and-dnx-pipeline

Cheers,
Jeff

> On Jul 25, 2022, at 15:59, Lincoln Dale  wrote:
> 
> 
>> On Mon, Jul 25, 2022 at 11:58 AM James Bensley  
>> wrote:
> 
>> On Mon, 25 Jul 2022 at 15:34, Lawrence Wobker  wrote:
>> > This is the parallelism part.  I can take multiple instances of these 
>> > memory/logic pipelines, and run them in parallel to increase the 
>> > throughput.
>> ...
>> > I work on/with a chip that can forwarding about 10B packets per second… so 
>> > if we go back to the order-of-magnitude number that I’m doing about “tens” 
>> > of memory lookups for every one of those packets, we’re talking about 
>> > something like a hundred BILLION total memory lookups… and since memory 
>> > does NOT give me answers in 1 picoseconds… we get back to pipelining and 
>> > parallelism.
>> 
>> What level of parallelism is required to forward 10Bpps? Or 2Bpps like
>> my J2 example :)
> 
> I suspect many folks know the exact answer for J2, but it's likely under NDA 
> to talk about said specific answer for a given thing.
> 
> Without being platform- or device-specific, the core clock rate of many 
> network devices is often in a "goldilocks" zone of (today) 1 to 1.5 GHz with a 
> goal of 1 packet forwarded 'per clock'. As LJ described the pipeline, that 
> doesn't mean a latency of 1 clock ingress-to-egress but rather that every 
> clock there is a forwarding decision from one 'pipeline', and the MPPS/BPPS 
> packet rate is achieved by having enough pipelines in parallel to achieve 
> that.
> The number here is often "1" or "0.5" so you can work the number backwards. 
> (e.g. it emits a packet every clock, or every 2nd clock).
> 
> It's possible to build an ASIC/NPU to run at a faster clock rate, but that gets 
> back to what I'm hand-wavingly describing as "goldilocks". Look up power vs 
> frequency and you'll see it's non-linear.
> Just as CPUs can scale by adding more cores (vs increasing frequency), the same 
> holds true on network silicon, and you can go wider with multiple pipelines. But 
> it's not 10K parallel slices; there are some parallel parts, but there are 
> multiple 'stages' on each doing different things.
> 
> Using your CPU comparison, there are some analogies here that do work:
>  - you have multiple cpu cores that can do things in parallel -- analogous to 
> pipelines
>  - they often share some common I/O (e.g. CPUs have PCIe, maybe sharing some 
> DRAM or LLC)  -- maybe some lookup engines, or centralized buffer/memory
>  - most modern CPUs use out-of-order execution, where under the covers a 
> cache-miss or DRAM fetch has a disproportionate hit on performance, so it's 
> hidden away from you as much as possible by speculative, out-of-order execution
> -- no direct analogy to this one - it's unlikely most forwarding 
> pipelines do speculative execution like a general purpose CPU does - but they 
> definitely do 'other work' while waiting for a lookup to happen
> 
> A common-garden x86 is unlikely to achieve such a rate for a few different 
> reasons:
>  - packets-in or packets-out go via DRAM, so you need sufficient DRAM (page 
> opens/sec, DRAM bandwidth) to sustain at least one write and one read per 
> packet. Look closer at DRAM and see its speed; pay attention to page 
> opens/sec, and what that consumes.
>  - one 'trick' is to not DMA packets to DRAM but instead have it go into SRAM 
> of some form - e.g. Intel DDIO, ARM Cache Stashing, which at least 
> potentially saves you that DRAM write+read per packet
>   - ... but then do e.g. a LPM lookup, and best case that is back to a memory 
> access/packet. Maybe it's in L1/L2/L3 cache, but likely at large table sizes 
> it isn't.
>  - ... do more things to the packet (urpf lookups, counters) and it's yet 
> more lookups.
> 
> Software can achieve high rates, but note that a typical ASIC/NPU does on the 
> order of >100 separate lookups per packet, and 100 counter updates per packet.
> Just as forwarding in a ASIC or NPU is a series of tradeoffs, forwarding in 
> software on generic CPUs is also a series of tradeoffs.
> 
> 
> cheers,
> 
> lincoln.
> 
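
A rough back-of-the-envelope reading of those last figures; a sketch
where every number is illustrative rather than taken from any specific
device:

pps              = 10e9   # 10 Bpps forwarding target
lookups_per_pkt  = 100    # ">100 separate lookups per packet"
counters_per_pkt = 100    # "100 counter updates per packet"

mem_ops_per_sec = pps * (lookups_per_pkt + counters_per_pkt)   # 2.0e12
clock_hz = 1.5e9          # top of the "goldilocks" clock zone

# Even with perfect banking, one op per memory per cycle means this
# many independent memory instances must be busy every single cycle:
print(mem_ops_per_sec / clock_hz)   # ~1333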


Do you want to get more involved with ARIN? (was: Fwd: [arin-announce] Apply Now for the ARIN 50 Fellowship Program)

2022-07-27 Thread John Curran
NANOGers -

For those folks (e.g., yourself, friends, coworkers, ...) who are looking for an 
in-depth mentored introduction
to ARIN, please consider submitting an application to participate in the ARIN 
50 Fellowship Program!

(Details attached below, and more online at https://www.arin.net/fellowships)

Thanks!
/John

John Curran
President and CEO
American Registry for Internet Numbers


Begin forwarded message:

From: ARIN <i...@arin.net>
Subject: [arin-announce] Apply Now for the ARIN 50 Fellowship Program
Date: 27 July 2022 at 11:19:56 AM EDT
To: "arin-annou...@arin.net" <arin-annou...@arin.net>

Internet governance and number resource policy are powered by community 
participation. To encourage and foster new voices and active members within the 
ARIN region community, the ARIN Fellowship Program provides a specialized, 
interactive learning opportunity to individuals interested in these aspects of 
the Internet and their professional growth in the industry.

Applications for the ARIN 50 Fellowship Program are open now, and we encourage 
you to complete our short online application at the following link: 
https://arin.smapply.net/prog/virtual_fellowship_program_arin_50/. Applications 
are accepted from 27 July to 17 August.

ARIN 50 Fellows will attend four virtual sessions before and after the ARIN 50 
Public Policy and Members Meeting (20-21 October). With the personal support of 
an assigned mentor, Fellows will progress through an engaging and approachable 
agenda of presentations, discussions, and Q&A with ARIN leadership — including 
ARIN staff, special guests, and Advisory Council members. The sessions will 
provide an overview of the ARIN Policy Development Process, Internet 
governance, Internet number resource policy and its development, ARIN services 
and operations, and the Internet Number Registry System.

Fellows will have the opportunity to ask questions, get feedback, and gain the 
knowledge and confidence to join in community discussions, propose new ideas, 
and become part of the future of Internet governance and policy in the ARIN 
region.

The ARIN Fellowship Program is open to all individuals 18 years and older who 
reside in the ARIN region, understand the importance of ARIN’s mission, and are 
familiar with ARIN services.

To learn more about ARIN’s Fellowship Program, the selection process, and the 
Terms and Conditions, please visit: https://www.arin.net/fellowships.

Applicants who require approval from their employer prior to applying may 
request a customizable letter that highlights the benefits of taking part in 
ARIN’s Fellowship Program by emailing 
fellowsh...@arin.net.

If you have any other questions or need more information, please email 
fellowsh...@arin.net. ARIN looks forward to 
receiving your application soon!

Regards,

Amanda Gauldin
Community Programs Manager
American Registry for Internet Numbers (ARIN)


___
ARIN-Announce
You are receiving this message because you are subscribed to
the ARIN Announce Mailing List 
(arin-annou...@arin.net).
Unsubscribe or manage your mailing list subscription at:
https://lists.arin.net/mailman/listinfo/arin-announce
Please contact i...@arin.net if you experience any issues.



Telkom Indonesia and Indesat Ooredoo Contacts

2022-07-27 Thread Gabe Cole
Hi All,

I am looking for network contacts at Telkom Indonesia and Indesat Ooredoo to 
discuss circuit diversity requirements for a new project.

Kind regards,

-Gabe

G. Gabriel Cole
*RTE Group, Inc.*
*Strategic Consulting for Mission Critical Infrastructure*
56 Woodridge Rd
Wellesley, MA 02482
US +1-617-303-8707
fax +1-781-209-5577
www.rtegroup.com ( http://www.rtegroup.com )
g...@rtegroup.com
skype:  ggabrielcole
Twitter:  @DataCenterGuru
Linked In: http://www.linkedin.com/in/gabecole
Blog: http://datacenterguru.blogspot.com/

The information contained herein is confidential and proprietary to RTE Group, 
Inc. It is intended for presentation to and permitted use solely by those 
person(s) to whom it has been transmitted by RTE Group, Inc. and it is 
transmitted to such person(s) solely for, conditional upon, and only to the 
extent necessary for use by such person(s) as part of their business 
relationship with RTE Group, Inc. or to further their respective evaluation(s) 
of a potential business relationship with RTE Group, Inc., and no other use, 
release, or reproduction of this information is permitted.


Re: 400G forwarding - how does it work?

2022-07-27 Thread Dave Taht
This convo is giving me some hope that the sophisticated FQ and AQM
algorithms I favor can be made to run in more hardware at high rates,
but most of the work I'm aware of has targeted Tofino and P4.

The only thing I am aware of shipping is AFD in some Cisco hardware. Anyone
using that?


RE: 400G forwarding - how does it work?

2022-07-27 Thread ljwobker
The Broadcom KBP -- often called an "external TCAM" -- is really closer to a 
completely separate NPU than just an external TCAM.  "Back in the day" we used 
external TCAMs to store forwarding state (FIB tables, ACL tables, whatever) on 
devices that were pretty much just a bunch of TCAM memory and an interface for 
the "main" NPU to ask for a lookup.  Today the modern KBP devices have WAY more 
functionality: they have lots of different databases and tables available, 
which can be sliced and diced into different widths and depths.  They can store 
lots of different kinds of state, from counters to LPM prefixes and ACLs.  At 
risk of correcting Ohta-san, note that most ACLs are implemented using TCAMs 
with wildcard/masking support, as opposed to an exact match lookup.  Exact 
match lookups are generally used for things that do not require masking or 
wildcard bits: MAC addresses and MPLS label values are the canonical examples 
here.  
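
As a software model of that distinction -- a sketch only, since a real
TCAM compares the key against all entries in parallel in hardware and
returns the highest-priority hit:

exact_table = {0x00AABBCCDDEE: "port1"}       # e.g. a MAC address -> result

ternary_table = [                             # ACL-style (value, mask, action)
    (0x0A000000, 0xFF000000, "deny"),         # 10.0.0.0/8
    (0x00000000, 0x00000000, "permit"),       # mask of 0 = match anything
]

def exact_lookup(key):
    return exact_table.get(key)               # single hash/index probe

def tcam_lookup(key):
    for value, mask, action in ternary_table: # entries in priority order
        if key & mask == value & mask:
            return action
    return None

print(tcam_lookup(0x0A010203))                # 10.1.2.3    -> deny
print(tcam_lookup(0xC0A80001))                # 192.168.0.1 -> permit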

The SRAM memories used in fast networking chips are almost always built such 
that they provide one lookup per clock, although hardware designers often use 
multiple banks of these to increase the number of *effective* lookups per 
clock.  TCAMs are also generally built such that they provide one lookup/result 
per clock, but again you can stack up multiple devices to increase this.

Many hardware designs also allow for more flexibility in how the various 
memories are utilized by the software -- almost everyone is familiar with the 
idea of "I can have a million entries of X bits, or half a million entries of 
2*X bits".  If the hardware and software complexity were free, we'd design 
memories that could be arbitrarily chopped into exactly the sizes we need, but 
that complexity is Absolutely Not Free so we end up picking a few discrete 
sizes and the software/forwarding code has to figure out how to use those bits 
efficiently.  And you can bet your life that as soon as you have a memory that 
can function using either 80b or 160b entries, you will immediately come across 
a use case that really really needs to use entries of 81b.
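
A toy version of that 81b problem; a sketch with illustrative widths:

offered_widths = (80, 160)   # the discrete entry sizes the memory supports

def waste(record_bits: int) -> float:
    # A record occupies the smallest offered width that fits it.
    slot = min(w for w in offered_widths if w >= record_bits)
    return 1 - record_bits / slot

print(f"{waste(80):.0%}")   # 0%  -- perfect fit
print(f"{waste(81):.0%}")   # 49% -- the "really really needs 81b" case
print(f"{waste(128):.0%}")  # 20%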

FYI: There's nothing particularly magical about 40b memory widths.  When 
building these chips you can (more or less) pick whatever width of SRAM you 
want to build, and the memory libraries that you use spit out the corresponding 
physical design.

Ohta-san correctly mentions that a critical part of the performance analysis is 
how fast the different parts of the pipeline can talk to each other.  Note that 
this concept applies whether we're talking about the connection between very 
small blocks within the ASIC/NPU, or the interface between the NPU and an 
external KBP/TCAM, or for that matter between multiple NPUs/fabric chips within 
a system.  At some point you'll always be constrained by whatever the slowest 
link in the pipeline is, so balancing all that stuff out is Yet One More Thing 
for the system designer to deal with.
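
As a trivial sketch of that balancing act, where the stage names and
rates below are invented for illustration:

# The end-to-end rate is bounded by the slowest link in the chain.
stage_rate_pps = {
    "on-chip parser":      2.0e9,
    "NPU<->KBP interface": 1.0e9,   # external lookup path
    "LPM lookup":          1.5e9,
    "counter updates":     2.5e9,
}
bottleneck = min(stage_rate_pps, key=stage_rate_pps.get)
print(bottleneck, stage_rate_pps[bottleneck])   # NPU<->KBP interface 1e9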



--lj

-Original Message-
From: NANOG  On Behalf Of Masataka 
Ohta
Sent: Wednesday, July 27, 2022 9:09 AM
To: nanog@nanog.org
Subject: Re: 400G forwarding - how does it work?

James Bensley wrote:

> The BCM16K documentation suggests that it uses TCAM for exact matching 
> (e.g.,for ACLs) in something called the "Database Array"
> (with 2M 40b entries?), and SRAM for LPM (e.g., IP lookups) in 
> something called the "User Data Array" (with 16M 32b entries?).

Which documentation?

According to:

https://docs.broadcom.com/docs/16000-DS1-PUB

figure 1 and related explanations:

Database records 40b: 2048k/1024k.
Table width configurable as 80/160/320/480/640 bits.
User Data Array for associated data, width configurable as
32/64/128/256 bits.

means that the header extracted by the 88690 is analyzed by the 16K, finally 
resulting in 40b of information (a lot shorter than IPv6 addresses, but still 
perhaps enough for an IPv6 backbone to identify sites) by "database" lookup, 
which is then converted -- obviously by CAM, because a 40b key is painful for 
SRAM -- to "32/64/128/256 bits data".

> 1 second / 164473684 packets = 1 packet every 6.08 nanoseconds, which 
> is within the access time of TCAM and SRAM

As high-speed TCAM and SRAM should be pipelined, the cycle time, which is what 
matters, is shorter than the access time.

Finally, it should be pointed out that most, if not all, performance figures 
such as MIPS and Flops are merely guaranteed not to be exceeded.

In this case, if deep packet inspection with lengthy headers is required for 
some complicated routing schemes or to satisfy NSA requirements, the 
communication speed between the 88690 and the 16K will be the limiting factor 
for PPS, resulting in a lot less than the maximum possible PPS.

Masataka Ohta



Re: 400G forwarding - how does it work?

2022-07-27 Thread Masataka Ohta

James Bensley wrote:


The BCM16K documentation suggests that it uses TCAM for exact
matching (e.g.,for ACLs) in something called the "Database Array"
(with 2M 40b entries?), and SRAM for LPM (e.g., IP lookups) in
something called the "User Data Array" (with 16M 32b entries?).


Which documentation?

According to:

https://docs.broadcom.com/docs/16000-DS1-PUB

figure 1 and related explanations:

Database records 40b: 2048k/1024k.
Table width configurable as 80/160/320/480/640 bits.
User Data Array for associated data, width configurable as
32/64/128/256 bits.

means that the header extracted by the 88690 is analyzed by the 16K,
finally resulting in 40b of information (a lot shorter than IPv6
addresses, but still perhaps enough for an IPv6 backbone to identify
sites) by "database" lookup, which is then converted -- obviously by
CAM, because a 40b key is painful for SRAM -- to "32/64/128/256 bits data".


1 second / 164473684 packets = 1 packet every 6.08 nanoseconds, which
is within the access time of TCAM and SRAM


As high-speed TCAM and SRAM should be pipelined, the cycle time, which
is what matters, is shorter than the access time.
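
As a sketch of that distinction, with illustrative numbers rather than
the BCM16K's actual timing:

access_time_ns = 6.0    # latency of one complete lookup
pipeline_depth = 4      # lookup overlapped across 4 internal stages
cycle_time_ns  = access_time_ns / pipeline_depth    # 1.5 ns

# A new lookup can be issued every cycle, so throughput follows the
# cycle time even though each individual lookup still takes 6 ns.
print(1e9 / cycle_time_ns, "lookups/sec")   # ~6.7e8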

Finally, it should be pointed out that most, if not all, performance
figures such as MIPS and Flops are merely guaranteed not to be exceeded.

In this case, if deep packet inspection with lengthy headers is required
for some complicated routing schemes or to satisfy NSA requirements, the
communication speed between the 88690 and the 16K will be the limiting
factor for PPS, resulting in a lot less than the maximum possible PPS.

Masataka Ohta


Re: 400G forwarding - how does it work?

2022-07-27 Thread Saku Ytti
On Tue, 26 Jul 2022 at 23:15, Jeff Tantsura  wrote:

> In general, if we look at the whole spectrum, on one side there are massively 
> parallelized "many core" RTC ASICs, such as Trio, Lightspeed, and similar (as 
> the last gasp of the Redback/Ericsson venture we built a 1400-HW-thread ASIC, 
> Spider).
> On the other side of the spectrum are fixed pipeline ASICs, from BCM Tomahawk at 
> its extreme (max speed/radix - min features) moving through BCM Trident, 
> Innovium, Barefoot (quite a different animal wrt programmability), etc - usually 
> with a shallow on-chip buffer only (100-200M).
>
> In between we have got so-called programmable pipeline silicon; BCM DNX and 
> Juniper Express are in this category, usually a combo of OCB + off-chip 
> memory (most often HBM) (2-6G), and usually with line-rate/high-scale 
> security/overlay encap/decap capabilities. They usually have highly optimized 
> RTC blocks within a pipeline (RTC within a macro). The way and speed to access 
> DBs and memories is evolving with each generation, and the number/speed of 
> non-networking cores (usually ARM) keeps growing - OAM, INT, and local 
> optimizations are primary users of it.

What do we call Nokia FP? Where you have a pipeline of identical cores
doing different things, and the packet has to hit each core in line in
order? How do we contrast this to NPU where a given packet hits
exactly one core?

I think ASIC, NPU, pipeline, and RTC are all quite ambiguous. When we say
pipeline, people usually assume purpose-built unique HW blocks the
packet travels through (like DNX, Express) and not a fully flexible
pipeline of identical cores like FP.

So I guess I would consider a 'true pipeline' to be a pipeline of unique
HW blocks, and a 'true NPU' one where a given packet hits exactly 1 core,
and anything else as more or less a hybrid.

I expect that once you get to the details of implementation, all of these
generalisations lose communicative power.

-- 
  ++ytti


Re: 400G forwarding - how does it work?

2022-07-27 Thread Saku Ytti
On Tue, 26 Jul 2022 at 21:28,  wrote:

> >No you are right, FP has much much more PPEs than Trio.
>
> Can you give any examples?

Nokia FP has >1k, while Juniper Trio is closer to 100 (earlier Trio LUs
had far fewer). I could give exact numbers for EA and YT if needed;
they are visible in the CLI, and the end user can even profile them to
see which ucode function they are spending their time on.

-- 
  ++ytti