On Fri, May 20, 2022 at 6:02 PM Elijah Stone <elro...@elronnd.net> wrote:
> Yes--that was one of the representations I was thinking of. But it makes
> random access hard. One alternative is a pair of parallel vectors which
> can be fed to i.. This has the advantage that, like with adjacency
> matrices, it is easy to flip the arrows when that is convenient.

Yeah... I saw your followup message right about when I finished writing the
response you replied to here.
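For concreteness, here is a toy sketch of that pair-of-vectors
representation (the particular graph and the lookups are just my own
illustration):

   NB. a small directed graph: edge k runs from (k { from) to (k { to)
   from =: 0 0 1 2 3
   to   =: 1 2 3 3 0

   to #~ from = 1        NB. out-neighbours of node 1
3
   from #~ to = 3        NB. flip the arrows: in-neighbours of node 3
1 2
   from i. 0 1 2 3 4     NB. first edge leaving each node (5 means none)
0 2 3 4 5

Swapping which vector plays "from" and which plays "to" is all it takes to
reverse every edge, which I take to be the flexibility you're describing.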
> > But for highly optimized micro-modules you want a compiler which
> > automatically detects and removes as many unnecessary mechanisms as
> > possible from the implementation.
> >
> > And, for us to have that in J, we would probably want a J compiler for
> > some constrained subset of the language, and a systematic and well
> > documented way of calling between J and these compiled artifacts (along
> > with callbacks for compiled adverbs and conjunctions).
>
> I have tentative plans for something like this. But I would like to point
> out two things:
>
> 1. Calling convention should be effortless, because verbs are transparent.
> It should be possible for a user to define an adverb 'compile' st [x] u
> compile y behaves as [x] u y but is faster, for a subset of verbs u.

Yes.

> 2. Restricted subset of j is not necessary in order to get good
> performance, as demonstrated eg by
> http://www.softwarepreservation.org/projects/apl/Papers/DYNAMICINCREMENTAL.
> This sort of thing does require support from the underlying
> implementation, though.

There are going to be constraints. At the very least, there will be code
which the compiler does not treat well. That said, my reading of that paper
suggests results not all that different in character from J's "special
code" (though different in implementation).

> > The original design of APL (and inherited by J) was rooted in
> > specification of machine architecture. So it shouldn't be too
> > surprising that there's a good match between machine operations and
> > language primitives.
>
> The original design of APL was a notation for _humans_ to use on
> blackboards.

Indeed.

> I speak of _chasing_ pointers, not just following them; that means deep
> dependency chains. Coming back to the issue of BVH: I can imagine an
> efficient BVH lookup for many keys implemented in pure j using
> index-following. But for an efficient search for just one key, it is
> necessary to leverage primitives, and I don't know of a good way to do
> that. (I recognise this is moving the goalposts a bit; sorry about that.
> But this is the problem I was trying to solve.)

When I think about compiling J, I envision something of a hierarchy, with
singly typed rank 0 operations out at the leaves, rank 1 operations
somewhere in the middle, and the full J language somewhere behind all that.
When dealing with large data structures, it's fine to defer decisions about
those singly typed rank 0 operations to the language primitives at
execution time. But the stuff you're talking about really wants those
decisions moved earlier in the process and not repeated every time the
operation is performed.

> To give an example, here is a compacting GC I wrote in just a couple of
> lines of j for a simple lisp implementation:
> https://github.com/moon-chilled/j/blob/master/lisp/lisp.ijs#L73.
> The actual 'compacting' part is rather small and neat. That's because it
> can think of memory as an essentially flat structure. But the _tracing_
> part does a lot of extraneous work. I'm sure somebody cleverer than I
> could do a better job. But, I would argue, this is the sort of solution
> the language directed me towards.

And, I would argue that the caching architectures of modern machines
benefit from a similar organization.
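A rough way to see that effect from inside J (a sketch only: 6!:2 timings
vary a lot with the machine and interpreter version, and the array size
here is just one I picked):

   y =: ? 1e7 # 0       NB. ten million random floats (about 80MB)
   p =: ?~ 1e7          NB. a random permutation of i.1e7
   6!:2 '+/ y'          NB. sum with sequential, cache-friendly access
   6!:2 '+/ p { y'      NB. the same sum through scattered indexing

The scattered version typically loses by a healthy constant factor, and
that factor is what the latency numbers below are about.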
Looking at recent CPU architectures, if I understood what I was reading: we
have superscalar operations (where a single instruction might take a single
clock cycle, and sometimes multiple instructions can be completed in a
single clock cycle -- in other words, in approximately 0.3 nanoseconds on a
roughly 3GHz core). This probably requires that the operations work with
data in the current cache line (typically 64 bytes, or 8 "longs"). Access
to L1 cache (32KB) introduces a 1 nanosecond stall. Access to L2 cache
(0.25MB) introduces a 4 nanosecond stall, which is starting to be
noticeable. Access to L3 cache (maybe 8MB, though it depends on the
machine) introduces a 40 nanosecond stall. And access to main memory
introduces an 80 nanosecond stall.

Meanwhile, pointer chasing on large arbitrary data structures would
typically access main memory. Compared to 1 nanosecond for L1 cache, that
factor of 80 can easily be larger than a factor of log n. For example:

   2^.1e9
29.8974

Anyways, we sort of necessarily have to work with oversimplifications --
there's a lot going on here. So ... informed testing is important. As is
careful judgement.

Thanks,

--
Raul

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm