I was curious to do some basic benchmarking, so I'm posting the results here in case others are interested. I'm not sure whether this should be moved to another thread; if someone replies, please do whatever is appropriate (same thread or a new one).
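A note on method: the Measure-Command timings below include interpreter startup (jconsole or picolisp). On the J side the work can also be timed from inside a session with the time foreign 6!:2, which excludes startup. A minimal sketch, untested, assuming parse.ijs from the gist below loads cleanly and defines goTime and go2Time:

   load 'c:/joe/parse.ijs'
   10 (6!:2) 'goTime'''''     NB. average seconds per evaluation of goTime'', over 10 runs
   10 (6!:2) 'go2Time'''''    NB. same for go2Time''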
https://gist.github.com/joebo/61e0841fbf511e1aab4d

The state machine implementation runs 1000 iterations in around 4 seconds.

Using PowerShell to benchmark on Windows:

PS C:\Users\Joe\downloads\j801\bin> Measure-Command { .\jconsole.exe c:\joe\parse.ijs -js "goTime''" }
TotalMilliseconds : 4224.26

Dan's implementation is slightly faster:

PS C:\Users\Joe\downloads\j801\bin> Measure-Command { .\jconsole.exe c:\joe\parse.ijs -js "go2Time''" }
TotalMilliseconds : 4070.153

The rosettacode PicoLisp implementation was quite a bit slower:

PS C:\Users\Joe\Downloads\picolisp3.1.4_cygwin\bin> Measure-Command { .\picolisp.exe parse.l }
TotalMilliseconds : 5069.1331

Rewriting it to eliminate the piping sped it up:

PS C:\Users\Joe\Downloads\picolisp3.1.4_cygwin\bin> Measure-Command { .\picolisp.exe parse2.l }
TotalMilliseconds : 3522.3157

Here is the final PicoLisp implementation:

(de rdConf (File)
   (in File
      (until (eof)
         (setq Line (line))
         (when (and Line (<> (car Line) "#") (<> (car Line) ";"))
            (let
               (Length (length Line)
                  Index (index " " Line)
                  Key (pack (if Index (pack (head (- Index 1) Line)) Line))
                  Value (if Index (pack (tail (- Length Index) Line)) T) )
               (set (intern Key) Value) ) ) ) ) )

(do 10000 (rdConf "c:/temp/test.conf"))
(printsp (list FULLNAME FAVOURITEFRUIT NEEDSPEELING SEEDSREMOVED OTHERFAMILY))
(bye)

which prints:

("Foo Barber" "banana" T NIL "Rhu Barber, Harry Barber")

Speed probably isn't important in a typical implementation of this task. I'm not posting the PicoLisp code or timings as a competition, just as another reference point.
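For anyone who wants to reproduce the timings: c:/temp/test.conf is the configuration file from the Rosetta Code task. Reconstructed here from the printed output above (so the comment lines and the ordering are only my guess), it looks roughly like:

   FULLNAME Foo Barber
   FAVOURITEFRUIT banana

   # this boolean is set simply by being present
   NEEDSPEELING

   ; this option is commented out, so it is never set
   ; SEEDSREMOVED

   OTHERFAMILY Rhu Barber, Harry Barber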
On Tue, Jan 14, 2014 at 12:25 AM, Raul Miller <rauldmil...@gmail.com> wrote:
> Ok, I think I understand.
>
> The basic issue, here, seems to be that PicoLisp is stream oriented
> and this is a stream oriented task. No one in the J community has
> cared enough to build a stream oriented library for J. J has enough to
> do stream-oriented design for academic purposes, but ... Consider
> xml/sax as an example of how one might approach streams in J - call
> out to some standardized implementation and instead focus on what is
> unique to the application.
>
> Meanwhile, for a J programmer, words like [: & @ [ and ] occupy a role
> not too different from parentheses for a Lisp programmer. Parentheses
> might seem simple, but in fact there are a fair number of contextual
> rules that one must learn before really understanding their
> significance. Do the parentheses delimit a lambda definition? An
> argument list? Do they denote a function call? Some other special
> form? That, I think, is the issue you were focusing on when counting
> tokens - how many special rules does a person have to understand to
> parse the code. J has 9 parsing rules, each roughly at the same
> complexity level as a lisp-like lambda. Explicit contexts add a few
> more, though that's mostly syntactic sugar.
>
> Meanwhile, J is and is not fast. It can be fast, but only if you
> design your code using big, homogeneous data structures to represent
> large data sets.
>
> I'm not sure if I am making sense, so I suppose I should write some
> code instead.
>
> Here's an implementation of a config file reader which should perform
> reasonably well:
>
> ChrCls=: '#;';(' ',TAB);LF;a.-.'#; ',TAB,LF
> NB. comment, space, line, other
>
> tokens=: (0;(0 10#:10*".;._2]0 :0);<ChrCls)&;:
> 1.0 0.0 0.0 2.1 NB. 0: skip whitespace (start here)
> 1.0 1.0 0.0 1.0 NB. 1: comment
> 3.3 4.3 5.2 2.0 NB. 2: word
> 3.0 3.0 5.1 3.0 NB. 3: comment after word
> 3.0 4.0 5.1 2.1 NB. 4: space after word
> 1.3 0.3 0.3 2.1 NB. 5: line end after word
> )
>
> readConf=: ({.,<@(;:inv)@}.);._2@tokens@fread
>
> This uses a state machine to deal with all the low level character
> munging and then forms the result into a two column table where the
> first column is the name and the remainder is whatever followed that
> (with redundant whitespace removed).
>
> Is it readable? Not if you are not familiar with the language. In
> fact, this took me about half an hour to write. And I would not bother
> doing something like this, normally, unless performance really
> mattered (which implies large file size). But, if performance does
> matter, this approach should behave reasonably well and (at least for
> J) should have much better performance than an implementation which
> loops or otherwise uses separate primitives for separate states.
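Interjecting here in my own words (this is my aside, not part of Raul's message): a quick usage sketch for readers, untested by me. It assumes the three definitions above have been entered in a session and that the file is the same c:/temp/test.conf used earlier; the lookup on the last line is only my illustration and assumes the names come through exactly as written in the file.

   cfg =: readConf 'c:/temp/test.conf'
   $ cfg                                     NB. one row per setting; 2 columns: name, value text
   cfg {~ ({."1 cfg) i. <'FAVOURITEFRUIT'    NB. select a row by name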
> That said, I should also note that the idea of using decimal fractions
> for state machine instructions was Ken Iverson's. I'll not go into why
> sequential machine was not implemented that way, because I feel guilty
> about it.
>
> Thanks,
>
> --
> Raul
>
>
> On Mon, Jan 13, 2014 at 11:18 PM, Joe Bogner <joebog...@gmail.com> wrote:
>> Yes, it is complete. I didn't write it or test it as it was already
>> posted to rosettacode. I will explain how it works, assuming there is
>> interest.
>>
>> It uses some uncommon tricks. It leverages the read function, which is
>> the same function used in the REPL to read input characters. So the
>> goal is to take the file, skip any comments, and then pass it on to
>> set the variable with the key and value.
>>
>> (pipe (in File (while (echo "#" ";") (till "^J")))
>>
>> reads the file and echoes until it encounters a comment character, and
>> then reads till EOL.
>>
>> (while (read)
>>    (skip)
>>    (set @ (or (line T) T)) ) ) )
>>
>> then reads those echoed characters. read gets the first symbol, the
>> key. skip moves the input stream ahead past the space, or nothing.
>> set assigns to the variable @, which holds the result of the last
>> operation (the key just read), the value from the rest of the line,
>> or T if it is blank (for the boolean examples in the config).
>>
>> My brain has been trained to think of parens as whitespace. It didn't
>> start that way. I can see why you may consider them tokens. I was also
>> counting unique function/operation tokens, not characters. The idea
>> being that if I only have 4 English words of 3 characters each on a
>> line, that is easier for my brain to parse than 5 operations written
>> as 2-character ASCII symbols whose meaning I don't recognize.
>>
>> However, as my J vocabulary improves it becomes less of an issue. I
>> can parse i. or e. as fast as a function called idx or el?
>>
>> Line length is still important, I think. Also, a functional style that
>> splits up the train may help reusability and comprehension, and may
>> help identify small areas to refactor. Those small topics like "filter
>> out lines starting with a comment character" can get lost to me in a
>> long line of compound operations. Again, some balance, personal
>> preference, and familiarity.
>>
>> I became interested in PicoLisp for its speed, conciseness and
>> expressiveness. Many of the same attributes as J. It is almost always
>> among the shortest solutions on rosettacode too. Happy to help resolve
>> your build issue off the list if you are interested.
>>
>> On Jan 13, 2014 10:28 PM, "Raul Miller" <rauldmil...@gmail.com> wrote:
>>
>>> On Mon, Jan 13, 2014 at 8:32 PM, Joe Bogner <joebog...@gmail.com> wrote:
>>> > PicoLisp
>>> >
>>> > (de rdConf (File)
>>> >    (pipe (in File (while (echo "#" ";") (till "^J")))
>>> >       (while (read)
>>> >          (skip)
>>> >          (set @ (or (line T) T)) ) ) )
>>>
>>> Is that complete?
>>>
>>> I learned lisp back in high school, and I've used drracket and emacs
>>> and other such lisp environments, but never learned picolisp. I tried
>>> to install picolisp but it would not build for me and I do not feel
>>> like debugging the source for picolisp just for this message.
>>>
>>> My impression, though, is that a J implementation like what you have
>>> written would look something like this:
>>>
>>> conf=: a:-.~(#~ 1 -.&e. '#;'&e.S:0)<;._2 fread file
>>>
>>> In other words, read the file as lines, removing blank lines and
>>> comment lines.
>>>
>>> If all you are doing is saving the unparsed lines then we should
>>> expect simpler code. But maybe I have missed a subtlety of picolisp?
>>>
>>> I get that @ is a wild card, but I do not understand the mechanism
>>> well enough to say whether your implementation is correct, nor do I
>>> know whether (while (read) .. is stashing the read result somewhere or
>>> what. Nor do I know if your skip is assuming a rigid file structure or
>>> is allowing free-form comments in the config file.
>>>
>>> > If I compare that to your J implementation
>>> >
>>> >> deb L:0@:(({.~ ; [: < [: ;^:(1=#) ',' cut (}.~>:)) i.&1@:e.&' =')&>@(#~
>>> >> a:&~: > ';#'e.~{.&>)@:(dlb&.>)@:(LF&cut)
>>> >
>>> > This J implementation feels more like code golf or a compressed
>>> > string. How many tokens/operations are included in it? I won't count,
>>> > but I am fairly sure it's more than the 9 in the PicoLisp example
>>> > (pipe, in, while, echo, till, read, skip, set, or, line).
>>>
>>> I count 64 tokens in the J implementation and 54 tokens in your
>>> PicoLisp example. I'm not sure why you have implied that parentheses
>>> are not tokens, but I do not think they qualify as whitespace?
>>>
>>> We could get further into parsing and punctuation issues, but I'm not
>>> sure whether that would be relevant.
>>>
>>> > When reading a long J string or an entry in the obfuscated C code
>>> > contest, I try to recognize patterns or operations. Having used J for
>>> > about 6 months, I can recognize probably about half the operations in
>>> > that string without having to look them up. That's progress. It still
>>> > feels like a "run on sentence" which is harder to read than short
>>> > sentences.
>>>
>>> I also usually prefer shorter sentences. Not always, but I'd probably
>>> try to split Dan's code into two or three lines. Posting a fair bit of
>>> code to email has been an influence there.
>>>
>>> I imagine I would also favor a shorter implementation than what Dan
>>> has done here. For example, in his code I see the phrase e.&' =' but I
>>> see no equals signs in the config file nor in the specification on the
>>> companion task (whose J entry should perhaps be simplified?).
>>>
>>> > I think there's a fine balance between tacit expressions and clarity.
>>> > It may be my level of inexperience with the language. However, I
>>> > wonder if I've put as much time on it as any intro-level APL
>>> > programmer. Are there any conventions in the language for # of tokens,
>>> > trains, etc. for a readable sentence?
>>>
>>> Personal taste?
>>>
>>> Thanks,
>>>
>>> --
>>> Raul

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm