-------- Original Message --------
Subject: Re: [Jchat] J subset compilation
Date: Thu, 27 Feb 2014 23:34:39 -0500
From: Robert Bernecky <[email protected]>
To: [email protected]

Having been down this road several times (and still looking for the exit
sign...), I think it makes sense to start out by asking what problem you
are trying to solve. I ask this because the ILs you picked are extremely
low-level, and you will have roughly zero opportunity to perform
algebraic optimizations on any but the most trivial array-level
computations. E.g., consider whether you could handle:

   M2← ⍉⍉ M    NB. Transpose transpose, if you can't read the APL...

and turn it into:

   M2← M

Let's assume you stick with a primitive language, such as those below.
Oh, and remember that C does not have arrays. Neither does D, which
seems like a good candidate for an IL at first glance. They have ONLY
vectors. If you think they have arrays, try these things:

1. Write a function that accepts two array arguments of arbitrary shape
   and returns an array of the sum of the first and the transpose of the
   second. I ask this one because, as far as I know, you can't pass an
   array to a C function unless it is a vector or has statically
   declared shape.

2. Write a transpose function:

   M2← ⍉M

3. Ensure that the function in (2) works on empty arrays, such as an
   array of shape 3 0. This is where C falls apart. Anytime someone
   tells you that they can mimic arrays with vectors of vectors, ask
   them how they handle empty arrays.

4. Ensure that all your array-level optimizations work for the function
   in (2).

Now, if you plan to ignore array-level optimizations, then you end up
with what is, effectively, an interpreter that has the syntax analysis
performed statically. This is A Good Thing for iterative functions
working on small arrays, but people like Dyalog are doing some nice
JITly things to achieve that same end, transparently. And, you still
have to write a LOT of your own run-time code, or else (ta da!)
invoke the J interpreter for the missing bits.

If you plan to ignore array-level optimizations, then you will have
great difficulty coming near the performance of hand-written
scalar-oriented code, or of array-optimized functional array language
code (e.g., SAC and APEX), because it is difficult to fuse loops in
such a way as to eliminate array-valued temps. I discussed this in my
MSc thesis, which you can find here:

   http://www.snakeisland.com/apexup.htm

If you ignore array-level optimizations, then you will be nailed to the
wall on performance, due to the high latencies and limited bandwidth of
contemporary memory subsystems.

I suggest you look carefully at SAC. I am using it as my primary back
end for APEX, and am wrapping up (I hope) some research into
array-level optimizations for SAC:

   www.sac-home.org

It has some excellent ideas for both language design (it also has some
very baroque ones, IMO) and optimizations.

An alternate approach that I have been considering is taking the
concepts of SAC and the SAC compiler, and designing a new compiler in J
or APL (SAC is written in C). This would still be a fair amount of
work, but if done right (i.e., the compiler has to be able to compile
itself, which means you really need support for structs and boxed
arrays), it could be a highly productive tool AND also serve as the
basis for serious research into optimizers. [I hate writing pages of C
in order to end up doing something like a slight variant on set
membership...]

BTW, if it's a toy you want, I recommend a yo-yo. They'll be back in
fashion soon. [Actually, we have one for mayor here in Toronto.]
Getting something trivial working is not extremely hard - we did ACORN
for the CRAY Y-MP in about two weeks, for really simple things. Getting
it to work on real applications, or even on language subsets, is a lot
more work. E.g., how do you plan to handle file I/O? Array formatting?

Bob

P.S.: Don't forget 1-bit storage for Boolean arrays.
P.P.S.: Be sure to scalarize ALL small arrays, in your copious spare
time. Otherwise, code like Mike Jenkins's APL model for domino, which
contained this nifty expression for a pivot operation to swap two
matrix rows:

   M[i,j;] ← M[j,i;]

will run about twice as long as it would if you didn't allocate, fill,
use, then deallocate, two 2-element vectors: i,j and j,i. When I
replaced the above line (by hand, at the time) with this:

   tmp←M[i;]
   M[i;]←M[j;]
   M[j;]←tmp

the whole domino function roughly doubled in speed. This is one tip of
the iceberg.

b

On 14-02-27 10:19 PM, Joe Bogner wrote:
I started to toy with the idea of working on a subset of J that can be
compiled. The idea has been brought up before. Yes, I realize it's a
path fraught with peril and a likely premature demise. Still, as a toy,
I figure it would be fun.

Options for an intermediate language:

1. GCC RTL - It looks very challenging and, as far as I can tell, would
require recompiling GCC, which is something I don't want to do.

2. LLVM IR - I cobbled together a small example that emits IR which can
then be compiled. Hypothetically, we could use the LLVM C bindings to
generate IR from J words directly from J. For my proof of concept, I
used F# bindings since that seemed to be the easiest path on Windows.
Julia, Rust, and Haskell have LLVM targets.

3. C - Either clang, gcc, or tinycc. Generating C doesn't seem as
stable and professional-grade as the earlier options. However, I have
come across decent languages that do it (Chicken Scheme and others).
ELI does it: http://fastarray.appspot.com/compile.html. APEX also does
it: http://www.snakeisland.com/apexup.htm. It could also be used to JIT
code from J for small functions.

I'm currently edging towards the C path. Maybe that makes more sense as
a little proof of concept, since it seems like it'd be quicker to get
up and running. My vision here is to implement a subset of verbs in C
({ i. # $) .. really basic .. and then have a parser that takes a
series of J expressions and translates them into the corresponding
running list of C function calls - all in a single main().

Some things I'm excited about trying out:

1. clang/llvm vectorization -
http://blog.llvm.org/2013/05/llvm-33-vectorization-improvements.html

2. parallel computation and concurrency -
http://opensource.mlba-team.de/xdispatch/docs/current/index.html

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
--
Robert Bernecky
Snake Island Research Inc
18 Fifth Street, Ward's Island
Toronto, Ontario M5J 2B9
[email protected]
tel: +1 416 203 0854
