Raul - in my basic benchmarking that sped it up by about 5% (200 ms).

Here's my implementation, which tries to be more explicit. It takes about 5891 ms to run 10000 iterations, compared to 4000 ms for the other J implementations (43% slower). I'm sure it could be improved; there's probably more boxing and more use of each than necessary.
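
For anyone who wants to repeat the comparison, a timing of this kind can be taken with the 6!:2 foreign. This is only a sketch: it assumes the go verb and the c:\temp\test.conf path from the definitions below, and the figures quoted above were not necessarily measured exactly this way.

```j
NB. sketch only: x (6!:2) y runs sentence y x times and returns mean seconds per run
NB. go and its config file path are assumed to be defined as shown below
1000 * 10000 * 10000 (6!:2) 'go 0'   NB. approximate total milliseconds for 10000 runs
```

Swapping 'go 0' for a sentence that calls one of the other readConf definitions gives a like-for-like number.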

I grab the lines in two passes and operate on them. Handling the disabled lines is a requirement that both of the prior J implementations seem to have missed. Rosettacode says it should also output "seedsremoved = false".

starts=: 13 : '(<x) = {. each y'

keyValue=: 4 : 0
DefaultValue=:x
Spaces=:> (' ' (i.&1@:=) each y)
Keys=:Spaces {. each y
Values=: (Spaces+1) }. each y
Values=:(3 : '> ((# y) > 0) } (DefaultValue;y)') each Values
(Keys,.Values)
)

readConf=: 3 : 0
All=:LF cut y
NB. All lines except comments, disabled and blank
Lines=:(-. (';' starts All) + ('#' starts All) + (CR starts All)) # All
NB. only the disabled lines
Disabled=: 2}. each (';' starts All) # All NB. chop off ;<space>
('T' keyValue Lines),('F' keyValue Disabled)
)

go=: 3 : 0
readConf (fread 'c:\temp\test.conf')
)

+--------------+-------------------------+
|FULLNAME      |Foo Barber               |
+--------------+-------------------------+
|FAVOURITEFRUIT|banana                   |
+--------------+-------------------------+
|NEEDSPEELING  |T                        |
+--------------+-------------------------+
|OTHERFAMILY   |Rhu Barber, Harry Barber |
+--------------+-------------------------+
|SEEDSREMOVED  |F                        |
+--------------+-------------------------+

On Tue, Jan 14, 2014 at 9:48 AM, Raul Miller <rauldmil...@gmail.com> wrote:
> It might be interesting to try it on a large file.
>
> Here's another state machine implementation that might perform better:
>
> StateMachine=: 2 :0
> (m;(0 10#:10*".;._2]0 :0);<n)&;:
> )
>
> CleanChrs=: '#;';(' ',TAB);LF;a.-.'#; ',TAB,LF
> NB. comment, space, line, other
>
> clean=: 1 StateMachine CleanChrs
> 1.0 0.0 0.0 2.1 NB. 0: skip whitespace (start here)
> 1.0 1.0 0.0 1.0 NB. 1: comment
> 3.3 4.0 6.0 2.0 NB. 2: word
> 3.0 3.0 6.1 3.0 NB. 3: comment after word
> 3.3 5.3 6.0 2.0 NB. 4: first space after word
> 3.0 5.0 6.1 2.1 NB. 5: extra space after word
> 1.3 0.3 0.3 2.0 NB. 6: line end after word
> )
> NB. .0 continue, .1 start, .3 end
>
> SplitChrs=: (' ',TAB);a.-.' ',TAB
> NB. space, other
>
> split=: 0 StateMachine SplitChrs
> 0.6 1.1 NB. start here
> 2.3 1.0 NB. other (first word)
> 0.6 3.1 NB. first space
> 3.0 3.0 NB. rest
> )
> NB. .6 error
>
> readConf=: split;._2@clean@fread
>
> I think the performance problem you observed is because the first
> version started boxing too early. Here, I save boxing till the end,
> and create fewer boxes, both of which should reduce overhead.
>
> Thanks,
>
> --
> Raul
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm