on 4/9/03 5:17 pm, MisterX wrote:

> Has anyone got a "Fast" remove duplicate lines script?
> Best I get is 12ms per line... Any line being a word.

If split worked the way I think it should (and it still could, quite
compatibly; perhaps I'll file in Bugzilla a suggestion I made long ago), then
a split followed by a combine would probably do this almost instantaneously.
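
As a rough sketch of what I mean, assuming a form of split that makes each
line a key of the array (the "as set" wording below is an assumption on my
part, so check the docs for your engine before relying on it):

    put tManyLines into tTemp
    split tTemp by return as set    -- assumed form: each distinct line becomes a key
    put the keys of tTemp into tFewerLines    -- order is not preserved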

Even without that, I've found Rev/MC's hashed arrays fantastically
efficient.  Have you tried simply:

    put empty into aTemp
    repeat for each line t in tManyLines
       put true into aTemp[t]    -- a repeated line just overwrites its own key
    end repeat
    put the keys of aTemp into tFewerLines    -- one key per distinct line

Of course that will lose the order, but I'd expect it to be very fast.  If
you want to keep the sequence (order of first appearance), then:

    put empty into tFewerLines
    put empty into aTemp
    repeat for each line t in tManyLines
       if aTemp[t] = empty then    -- first time we've seen this line
           put t & return after tFewerLines
           put true into aTemp[t]
       end if
    end repeat
    delete last char of tFewerLines    -- drop the trailing return
 
should work, albeit a bit more slowly.
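
If you need this in more than one place, the order-preserving version could
be wrapped up as a function.  This is only a sketch: the name uniqueLines and
its parameter and variable names are made up for illustration, and I haven't
timed it against your data.

    function uniqueLines pLines
       local tSeen, tResult
       put empty into tResult
       repeat for each line tLine in pLines
          if tSeen[tLine] is empty then    -- not seen before, so keep it
             put tLine & return after tResult
             put true into tSeen[tLine]
          end if
       end repeat
       delete last char of tResult    -- drop the trailing return
       return tResult
    end uniqueLines

You would then call it as:

    put uniqueLines(tManyLines) into tFewerLines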
 
  Ben Rubinstein               |  Email: [EMAIL PROTECTED]
  Cognitive Applications Ltd   |  Phone: +44 (0)1273-821600
  http://www.cogapp.com        |  Fax  : +44 (0)1273-728866
