It might be interesting to try it on a large file. Here's another state machine implementation that might perform better:
StateMachine=: 2 :0 (m;(0 10#:10*".;._2]0 :0);<n)&;: ) CleanChrs=: '#;';(' ',TAB);LF;a.-.'#; ',TAB,LF NB. comment, space, line, other clean=: 1 StateMachine CleanChrs 1.0 0.0 0.0 2.1 NB. 0: skip whitespace (start here) 1.0 1.0 0.0 1.0 NB. 1: comment 3.3 4.0 6.0 2.0 NB. 2: word 3.0 3.0 6.1 3.0 NB. 3: comment after word 3.3 5.3 6.0 2.0 NB. 4: first space after word 3.0 5.0 6.1 2.1 NB. 5: extra space after word 1.3 0.3 0.3 2.0 NB. 6: line end after word ) NB. .0 continue, .1 start, .3 end SplitChrs=: (' ',TAB);a.-.' ',TAB NB. space, other split=: 0 StateMachine SplitChrs 0.6 1.1 NB. start here 2.3 1.0 NB. other (first word) 0.6 3.1 NB. first space 3.0 3.0 NB. rest ) NB. .6 error readConf=: split;._2@clean@fread I think the performance problem you observed is because the first version started boxing too early. Here, I save boxing till the end, and create fewer boxes, both of which should reduce overhead. Thanks, -- Raul On Tue, Jan 14, 2014 at 7:02 AM, Joe Bogner <joebog...@gmail.com> wrote: > I was curious to do some basic benchmarking. Posting here if others > are interested. Not sure if this should be moved to another thread. If > someone replies please do what is appropriate (same thread or new one) > > https://gist.github.com/joebo/61e0841fbf511e1aab4d > > The state machine implementation runs 1000 iterations in around 4 seconds > > Using powershell to benchmark on windows: > > PS C:\Users\Joe\downloads\j801\bin> Measure-Command { .\jconsole.exe > c:\joe\parse.ijs -js "goTime''" } > TotalMilliseconds : 4224.26 > > Dan's implementation is slightly faster > > PS C:\Users\Joe\downloads\j801\bin> Measure-Command { .\jconsole.exe > c:\joe\parse.ijs -js "go2Time''" } > TotalMilliseconds : 4070.153 > > The rosettacode PicoLisp implementation was quite a bit slower > > PS C:\Users\Joe\Downloads\picolisp3.1.4_cygwin\bin> Measure-Command { > .\picolisp.exe parse.l } > TotalMilliseconds : 5069.1331 > > Rewriting it to eliminate the piping sped it up > > PS C:\Users\Joe\Downloads\picolisp3.1.4_cygwin\bin> Measure-Command { > .\picolisp.exe parse2.l } > TotalMilliseconds : 3522.3157 > > Here is the final picolisp implementation: > > (de rdConf (File) > (in File > (until (eof) > (setq Line (line)) > (when (and Line (<> (car Line) "#") (<> (car Line) ";")) > (let (Length (length Line) > Index (index " " Line) > Key (pack (if Index (pack (head (- Index 1) Line)) Line)) > Value (if Index (pack (tail (- Length Index) Line)) T) ) > (set (intern Key) Value) ) ) ) ) ) > > (do 10000 (rdConf "c:/temp/test.conf")) > > (printsp (list FULLNAME FAVOURITEFRUIT NEEDSPEELING SEEDSREMOVED OTHERFAMILY)) > (bye) > > ("Foo Barber" "banana" T NIL "Rhu Barber, Harry Barber") > > Speed probably isn't important in a typical implementation of this > task. I'm not posting the PicoLisp code or timings as a competition > but just as another reference point. > > > On Tue, Jan 14, 2014 at 12:25 AM, Raul Miller <rauldmil...@gmail.com> wrote: >> Ok, I think I understand. >> >> The basic issue, here, seems to be that PicoLisp is stream oriented >> and this is a stream oriented task. No one in the J community has >> cared enough to build a stream oriented library for J. J has enough to >> do stream oriented design for academic purposes, but ... Consider >> xml/sax as an example of how one might approach streams in J - call >> out to some standardized implementation and instead focus on what is >> unique to the application. >> >> Meanwhile, for a J programmer, words like [: & @ [ and ] occupy a role >> not too different from parenthesis for a Lisp programmer. Parenthesis >> might seem simple, but in fact there are a fair number of contextual >> rules that one must learn before really understanding their >> significance. Do the parenthesis delimit a lambda definition? an >> argument list? Do they denote a function call? Some other special >> form? That, I think, is the issue you were focusing on when counting >> tokens - how many special rules does a person have to understand to >> parse the code. J has 9 parsing rules, each roughly at the same >> complexity level as a lisp-like lambda. Explicit contexts add a few >> more, though that's mostly syntactic sugar. >> >> Meanwhile, J is and is not fast. It can be fast, but only if you >> design your code using big, homogeneous data structures to represent >> large data sets. >> >> I'm not sure if I am making sense, so I suppose I should write some >> code instead. >> >> Here's an implementation of a config file reader which should perform >> reasonably well: >> >> ChrCls=: '#;';(' ',TAB);LF;a.-.'#; ',TAB,LF >> NB. comment, space, line, other >> >> tokens=: (0;(0 10#:10*".;._2]0 :0);<ChrCls)&;: >> 1.0 0.0 0.0 2.1 NB. 0: skip whitespace (start here) >> 1.0 1.0 0.0 1.0 NB. 1: comment >> 3.3 4.3 5.2 2.0 NB. 2: word >> 3.0 3.0 5.1 3.0 NB. 3: comment after word >> 3.0 4.0 5.1 2.1 NB. 4: space after word >> 1.3 0.3 0.3 2.1 NB. 5: line end after word >> ) >> >> readConf=: ({.,<@(;:inv)@}.);._2@tokens@fread >> >> This uses a state machine to deal with all the low level character >> munging and then forms the result into a two column table where the >> first column is the name and the remainder is whatever followed that >> (with redundant whitespace removed). >> >> Is it readable? Not if you are not familiar with the language. In >> fact, this took me about half an hour to write. And I would not bother >> doing something like this, normally, unless performance really >> mattered (which implies large file size). But, if performance does >> matter, this approach should behave reasonably well and (at least for >> J) should have much better performance than an implementation which >> loops or otherwise uses separate primitives for separate states. >> >> That said, I should also note that the idea of using decimal fractions >> for state machine instructions was Ken Iverson's. I'll not go into why >> sequential machine was not implemented that way, because I feel guilty >> about it. >> >> Thanks, >> >> -- >> Raul >> >> >> On Mon, Jan 13, 2014 at 11:18 PM, Joe Bogner <joebog...@gmail.com> wrote: >>> Yes, it is complete. I didn't write it or test it as it was already posted >>> to rosettacode. I will explain how it works assuming there is interest >>> >>> It uses some uncommon tricks. It leverages the read function which is the >>> same function used in the repl to read input characters. So the goal is to >>> take the file and skip any comments and then pass it on to set the variable >>> with the key and value. >>> >>> (pipe (in File (while (echo "#" ";") (till "^J"))) >>> >>> Reads the file and echos until it encounters a comment character and then >>> reads til EOL >>> >>> while (read) >>>> (skip) >>>> (set @ (or (line T) T)) ) ) ) >>> >>> Then read those echoed characters. read gets the first symbol, the key. >>> Skip moves the input stream ahead by the space or nothing. Set assigns the >>> variable @ which is the result from the last operation (read - which is the >>> key) with the value from the rest of the line or T if it is blank (for >>> boolean example in the config) >>> >>> My brain has been trained to think of parens as whitespace. It didn't start >>> that way. I can see why you may consider them tokens. I was also counting >>> unique function/operation tokens, not characters. The idea being if I only >>> have 4 english words with 3 characters each on a line, that is easier for >>> my brain to parse than 5 operations using 2 ascii symbols that I don't >>> recognize the meaning. >>> >>> However as my J vocabulary improves it becomes less of an issue. I can >>> parse i. or e. As fast as a function called idx or el? >>> >>> Line length is still important I think. Also a functional style with >>> splitting up the train may help reusability, comprehension, and may help >>> identify small areas to refactor. Those small topics like "filter out >>> lines starting with a comment character" can get lost to me in a long line >>> of compound operations. Again, some balance and personal perference and >>> familiarity >>> >>> I became interested in picolisp for its speed, conciseness and >>> expressiveness. Many of the same attributes as J. It is almost always in >>> the shortest solutions on rosettacode too. Happy to help resolve your build >>> issue off the list if you are interested. >>> On Jan 13, 2014 10:28 PM, "Raul Miller" <rauldmil...@gmail.com> wrote: >>> >>>> On Mon, Jan 13, 2014 at 8:32 PM, Joe Bogner <joebog...@gmail.com> wrote: >>>> > PicoLisp >>>> > >>>> > (de rdConf (File) >>>> > (pipe (in File (while (echo "#" ";") (till "^J"))) >>>> > (while (read) >>>> > (skip) >>>> > (set @ (or (line T) T)) ) ) ) >>>> >>>> Is that complete? >>>> >>>> I learned lisp back in highschool, and I've used drracket and emacs >>>> and other such lisp environments, but never learned picolisp. I tried >>>> to install picolisp but it would not build for me and I do not feel >>>> like debugging the source for picolisp just for this message. >>>> >>>> My impression, though, is that a J implementation like what you have >>>> written would look something like this: >>>> >>>> conf=: a:-.~(#~ 1 -.&e. '#;'&e.S:0)<;._2 fread file >>>> >>>> In other words, read the file as lines, removing blank lines and comment >>>> lines. >>>> >>>> If all you are doing is saving the unparsed lines then we should >>>> expect simpler code. But maybe I have missed a subtlety of picolisp? >>>> >>>> I get that @ is a wild card, but I do not understand the mechanism >>>> well enough to say whether your implementation is correct, nor do I >>>> know whether (while (read) .. is stashing the read result somewhere or >>>> what. Nor do I know if your skip is assuming a rigid file structure or >>>> is allowing free-form comments in the config file. >>>> >>>> > If I compare that to the your J implementation >>>> > >>>> >> deb L:0@:(({.~ ; [: < [: ;^:(1=#) ',' cut (}.~>:)) i.&1@:e.&' >>>> =')&>@(#~ >>>> >> a:&~: > ';#'e.~{.&>)@:(dlb&.>)@:(LF&cut) >>>> > >>>> > This J implementation feels more like code golf or a compressed >>>> > string. How many tokens/operations are included in it? I won't count, >>>> > but I am fairly sure it's more than the (pipe, in, while, echo, till, >>>> > read, skip, set, or, line) 9 in the PicoLisp example. >>>> >>>> I count 64 tokens in the J implementation and 54 tokens in your >>>> PicoLisp example. I'm not sure why you have implied that parentheses >>>> are not tokens but I do not think they qualify as whitespace? >>>> >>>> We could get further into parsing and punctuation issues, but I'm not >>>> sure whether that would be relevant. >>>> >>>> > When reading a long J string or an entry in the obfuscated C code >>>> > contest, I try to recognize patterns or operations. Having used J for >>>> > about 6 months, I can recognize probably about half the operations in >>>> > that string without having to look them up. That's progress. It still >>>> > feels like a "run on sentence" which is harder to read than short >>>> > sentences. >>>> >>>> I also usually prefer shorter sentences. Not always, but I'd probably >>>> try to split Dan's code into two or three lines. Posting a fair bit of >>>> code to email has been an influence there. >>>> >>>> I imagine I would also favor a shorter implmentation than what Dan has >>>> done here. For example, in his code I see the phrase e.&' =' but I see >>>> no equals signs in the config file nor in the specification on the >>>> companion task (whose J entry should perhaps be simplified?). >>>> >>>> > I think there's a fine balance between tacit expressions and clarity. >>>> > It may be my level of inexperience with the language. However, I >>>> > wonder if I've put as much time on it as any intro-level APL >>>> > programmer. Are there any conventions in the language for # of tokens, >>>> > trains, etc for a readable sentence? >>>> >>>> Personal taste? >>>> >>>> Thanks, >>>> >>>> -- >>>> Raul >>>> ---------------------------------------------------------------------- >>>> For information about J forums see http://www.jsoftware.com/forums.htm >>>> >>> ---------------------------------------------------------------------- >>> For information about J forums see http://www.jsoftware.com/forums.htm >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm