It might be interesting to try it on a large file.

Here's another state machine implementation that might perform better:

StateMachine=: 2 :0
  (m;(0 10#:10*".;._2]0 :0);<n)&;:
)

CleanChrs=: '#;';(' ',TAB);LF;a.-.'#; ',TAB,LF
NB. comment, space, line, other

clean=: 1 StateMachine CleanChrs
  1.0  0.0  0.0  2.1  NB. 0: skip whitespace (start here)
  1.0  1.0  0.0  1.0  NB. 1: comment
  3.3  4.0  6.0  2.0  NB. 2: word
  3.0  3.0  6.1  3.0  NB. 3: comment after word
  3.3  5.3  6.0  2.0  NB. 4: first space after word
  3.0  5.0  6.1  2.1  NB. 5: extra space after word
  1.3  0.3  0.3  2.0  NB. 6: line end after word
)
NB. .0 continue, .1 start, .3 end

SplitChrs=: (' ',TAB);a.-.' ',TAB
NB. space, other

split=: 0 StateMachine SplitChrs
  0.6 1.1 NB. start here
  2.3 1.0 NB. other (first word)
  0.6 3.1 NB. first space
  3.0 3.0 NB. rest
)
NB. .6 error

readConf=: split;._2@clean@fread

I think the performance problem you observed is because the first
version started boxing too early. Here, I save boxing till the end,
and create fewer boxes, both of which should reduce overhead.

Thanks,

-- 
Raul


On Tue, Jan 14, 2014 at 7:02 AM, Joe Bogner <joebog...@gmail.com> wrote:
> I was curious to do some basic benchmarking. Posting here if others
> are interested. Not sure if this should be moved to another thread. If
> someone replies please do what is appropriate (same thread or new one)
>
> https://gist.github.com/joebo/61e0841fbf511e1aab4d
>
> The state machine implementation runs 1000 iterations in around 4 seconds
>
> Using powershell to benchmark on windows:
>
> PS C:\Users\Joe\downloads\j801\bin> Measure-Command { .\jconsole.exe
> c:\joe\parse.ijs -js "goTime''" }
> TotalMilliseconds : 4224.26
>
> Dan's implementation is slightly faster
>
> PS C:\Users\Joe\downloads\j801\bin> Measure-Command { .\jconsole.exe
> c:\joe\parse.ijs -js "go2Time''" }
> TotalMilliseconds : 4070.153
>
> The rosettacode PicoLisp implementation was quite a bit slower
>
> PS C:\Users\Joe\Downloads\picolisp3.1.4_cygwin\bin> Measure-Command {
> .\picolisp.exe parse.l }
> TotalMilliseconds : 5069.1331
>
> Rewriting it to eliminate the piping sped it up
>
> PS C:\Users\Joe\Downloads\picolisp3.1.4_cygwin\bin> Measure-Command {
> .\picolisp.exe parse2.l }
> TotalMilliseconds : 3522.3157
>
> Here is the final picolisp implementation:
>
> (de rdConf (File)
>     (in File
>         (until (eof)
>           (setq Line (line))
>           (when (and Line (<> (car Line) "#") (<> (car Line) ";"))
>             (let (Length (length Line)
>                   Index (index " " Line)
>                   Key (pack (if Index (pack (head (- Index 1) Line)) Line))
>                   Value (if Index (pack (tail (- Length Index)  Line)) T) )
>                (set (intern Key) Value) ) ) ) ) )
>
> (do 10000 (rdConf "c:/temp/test.conf"))
>
> (printsp (list FULLNAME FAVOURITEFRUIT NEEDSPEELING SEEDSREMOVED OTHERFAMILY))
> (bye)
>
> ("Foo Barber" "banana" T NIL "Rhu Barber, Harry Barber")
>
> Speed probably isn't important in a typical implementation of this
> task. I'm not posting the PicoLisp code or timings as a competition
> but just as another reference point.
>
>
> On Tue, Jan 14, 2014 at 12:25 AM, Raul Miller <rauldmil...@gmail.com> wrote:
>> Ok, I think I understand.
>>
>> The basic issue, here, seems to be that PicoLisp is stream oriented
>> and this is a stream oriented task. No one in the J community has
>> cared enough to build a stream oriented library for J. J has enough to
>> do stream oriented design for academic purposes, but ... Consider
>> xml/sax as an example of how one might approach streams in J - call
>> out to some standardized implementation and instead focus on what is
>> unique to the application.
>>
>> Meanwhile, for a J programmer, words like [: & @ [ and ] occupy a role
>> not too different from parenthesis for a Lisp programmer. Parenthesis
>> might seem simple, but in fact there are a fair number of contextual
>> rules that one must learn before really understanding their
>> significance. Do the parenthesis delimit a lambda definition? an
>> argument list? Do they denote a function call? Some other special
>> form? That, I think, is the issue you were focusing on when counting
>> tokens - how many special rules does a person have to understand to
>> parse the code. J has 9 parsing rules, each roughly at the same
>> complexity level as a lisp-like lambda. Explicit contexts add a few
>> more, though that's mostly syntactic sugar.
>>
>> Meanwhile, J is and is not fast. It can be fast, but only if you
>> design your code using big, homogeneous data structures to represent
>> large data sets.
>>
>> I'm not sure if I am making sense, so I suppose I should write some
>> code instead.
>>
>> Here's an implementation of a config file reader which should perform
>> reasonably well:
>>
>> ChrCls=: '#;';(' ',TAB);LF;a.-.'#; ',TAB,LF
>> NB. comment, space, line, other
>>
>> tokens=: (0;(0 10#:10*".;._2]0 :0);<ChrCls)&;:
>>   1.0  0.0  0.0  2.1  NB. 0: skip whitespace (start here)
>>   1.0  1.0  0.0  1.0  NB. 1: comment
>>   3.3  4.3  5.2  2.0  NB. 2: word
>>   3.0  3.0  5.1  3.0  NB. 3: comment after word
>>   3.0  4.0  5.1  2.1  NB. 4: space after word
>>   1.3  0.3  0.3  2.1  NB. 5: line end after word
>> )
>>
>> readConf=: ({.,<@(;:inv)@}.);._2@tokens@fread
>>
>> This uses a state machine to deal with all the low level character
>> munging and then forms the result into a two column table where the
>> first column is the name and the remainder is whatever followed that
>> (with redundant whitespace removed).
>>
>> Is it readable? Not if you are not familiar with the language. In
>> fact, this took me about half an hour to write. And I would not bother
>> doing something like this, normally, unless performance really
>> mattered (which implies large file size). But, if performance does
>> matter, this approach should behave reasonably well and (at least for
>> J) should have much better performance than an implementation which
>> loops or otherwise uses separate primitives for separate states.
>>
>> That said, I should also note that the idea of using decimal fractions
>> for state machine instructions was Ken Iverson's. I'll not go into why
>> sequential machine was not implemented that way, because I feel guilty
>> about it.
>>
>> Thanks,
>>
>> --
>> Raul
>>
>>
>> On Mon, Jan 13, 2014 at 11:18 PM, Joe Bogner <joebog...@gmail.com> wrote:
>>> Yes, it is complete. I didn't write it or test it as it was already posted
>>> to rosettacode. I will explain how it works assuming there is interest
>>>
>>> It uses some uncommon tricks. It leverages the read function which is the
>>> same function used in the repl to read input characters.  So the goal is to
>>> take the file and skip any comments and then pass it on to set the variable
>>> with the key and value.
>>>
>>> (pipe (in File (while (echo "#" ";") (till "^J")))
>>>
>>> Reads the file and echos until it encounters a comment character and then
>>> reads til EOL
>>>
>>> while (read)
>>>>          (skip)
>>>>          (set @ (or (line T) T)) ) ) )
>>>
>>> Then read those echoed characters. read gets the first symbol, the key.
>>> Skip moves the input stream ahead by the space or nothing. Set assigns the
>>> variable @ which is the result from the last operation (read - which is the
>>> key) with the value from the rest of the line or T if it is blank (for
>>> boolean example in the config)
>>>
>>> My brain has been trained to think of parens as whitespace. It didn't start
>>> that way. I can see why you may consider them tokens. I was also counting
>>> unique function/operation tokens, not characters. The idea being if I only
>>> have 4 english words with 3 characters each on a line, that is easier for
>>> my brain to parse than 5 operations using 2 ascii symbols that I don't
>>> recognize the meaning.
>>>
>>> However as my J vocabulary improves it becomes less of an issue. I can
>>> parse i.  or e. As fast as a function called idx or el?
>>>
>>> Line length is still important I think. Also a functional style with
>>> splitting up the train may help reusability,  comprehension, and may help
>>> identify small areas to refactor.  Those small topics like "filter out
>>> lines starting with a comment character" can get lost to me in a long line
>>> of compound operations. Again, some balance and personal perference and
>>> familiarity
>>>
>>> I became interested in picolisp for its speed, conciseness and
>>> expressiveness.  Many of the same attributes as J. It is almost always in
>>> the shortest solutions on rosettacode too. Happy to help resolve your build
>>> issue off the list if you are interested.
>>>  On Jan 13, 2014 10:28 PM, "Raul Miller" <rauldmil...@gmail.com> wrote:
>>>
>>>> On Mon, Jan 13, 2014 at 8:32 PM, Joe Bogner <joebog...@gmail.com> wrote:
>>>> > PicoLisp
>>>> >
>>>> > (de rdConf (File)
>>>> >    (pipe (in File (while (echo "#" ";") (till "^J")))
>>>> >       (while (read)
>>>> >          (skip)
>>>> >          (set @ (or (line T) T)) ) ) )
>>>>
>>>> Is that complete?
>>>>
>>>> I learned lisp back in highschool, and I've used drracket and emacs
>>>> and other such lisp environments, but never learned picolisp. I tried
>>>> to install picolisp but it would not build for me and I do not feel
>>>> like debugging the source for picolisp just for this message.
>>>>
>>>> My impression, though, is that a J implementation like what you have
>>>> written would look something like this:
>>>>
>>>> conf=: a:-.~(#~ 1 -.&e. '#;'&e.S:0)<;._2 fread file
>>>>
>>>> In other words, read the file as lines, removing blank lines and comment
>>>> lines.
>>>>
>>>> If all you are doing is saving the unparsed lines then we should
>>>> expect simpler code. But maybe I have missed a subtlety of picolisp?
>>>>
>>>> I get that @ is a wild card, but I do not understand the mechanism
>>>> well enough to say whether your implementation is correct, nor do I
>>>> know whether (while (read) .. is stashing the read result somewhere or
>>>> what. Nor do I know if your skip is assuming a rigid file structure or
>>>> is allowing free-form comments in the config file.
>>>>
>>>> > If I compare that to the your J implementation
>>>> >
>>>> >>     deb L:0@:(({.~ ; [: < [: ;^:(1=#) ',' cut (}.~>:)) i.&1@:e.&'
>>>> =')&>@(#~
>>>> >> a:&~: > ';#'e.~{.&>)@:(dlb&.>)@:(LF&cut)
>>>> >
>>>> > This J implementation feels more like code golf or a compressed
>>>> > string. How many tokens/operations are included in it? I won't count,
>>>> > but I am fairly sure it's more than the (pipe, in, while, echo, till,
>>>> > read, skip, set, or, line) 9 in the PicoLisp example.
>>>>
>>>> I count 64 tokens in the J implementation and 54 tokens in your
>>>> PicoLisp example. I'm not sure why you have implied that parentheses
>>>> are not tokens but I do not think they qualify as whitespace?
>>>>
>>>> We could get further into parsing and punctuation issues, but I'm not
>>>> sure whether that would be relevant.
>>>>
>>>> > When reading a long J string or an entry in the obfuscated C code
>>>> > contest, I try to recognize patterns or operations. Having used J for
>>>> > about 6 months, I can recognize probably about half the operations in
>>>> > that string without having to look them up. That's progress. It still
>>>> > feels like a "run on sentence" which is harder to read than short
>>>> > sentences.
>>>>
>>>> I also usually prefer shorter sentences. Not always, but I'd probably
>>>> try to split Dan's code into two or three lines. Posting a fair bit of
>>>> code to email has been an influence there.
>>>>
>>>> I imagine I would also favor a shorter implmentation than what Dan has
>>>> done here. For example, in his code I see the phrase e.&' =' but I see
>>>> no equals signs in the config file nor in the specification on the
>>>> companion task (whose J entry should perhaps be simplified?).
>>>>
>>>> > I think there's a fine balance between tacit expressions and clarity.
>>>> > It may be my level of inexperience with the language. However, I
>>>> > wonder if I've put as much time on it as any intro-level APL
>>>> > programmer. Are there any conventions in the language for # of tokens,
>>>> > trains, etc for a readable sentence?
>>>>
>>>> Personal taste?
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Raul
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to