Tokenizer Optimizations Anyone?

Watson, Christopher Mon, 04 Mar 2002 12:21:12 -0800

Hello friends.

Below you will find the properties and "new" handler for my string
tokenizer. This object takes a string, and a collection of delimiters, and
splits the string into an internally maintained linear list of tokens, based
on the specified delimiting characters.


I'm looking for ANY ways to optimize this "new" handler, without having to
resort to TextCruncher or any other Xtra.

I'm sure the handler is understandable, so I don't need to go into great
detail to explain it. Basically, any number of delimiting characters can be
sent in, so the search needs to take all delimiters into account. Obviously,
it needs to find the nearest occurrence of any one of the supplied
delimiters, and then needs to add to the linear list the chunk of text
between the current search point and the found delimiter. I'm currently
hacking the front off of the supplied text with each find, so that I can use
the offset function to locate the next nearest occurrence of a delimiter.
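
For anyone who wants to follow the idea without Director at hand, here is a minimal sketch of the same search-and-chop loop written in Python rather than Lingo. The function name tokenize is hypothetical. The sketch assumes the delimiters themselves are discarded and empty tokens are skipped, which may not match exactly what the handler below does.

```python
def tokenize(data, delims):
    """Split data on any character in delims, chopping the front off after each find."""
    tokens = []
    while data:
        # Find the nearest occurrence of any delimiter (str.find returns -1 if absent).
        nearest = -1
        for d in delims:
            pos = data.find(d)
            if pos != -1 and (nearest == -1 or pos < nearest):
                nearest = pos
            if nearest == 0:
                break  # nothing can be nearer than the first character
        if nearest == -1:
            tokens.append(data)  # no delimiter left: the remainder is the last token
            break
        if nearest > 0:
            tokens.append(data[:nearest])  # text between the search point and the delimiter
        # Assumption: the delimiter is dropped along with the consumed text.
        data = data[nearest + 1:]
    return tokens


print(tokenize("a,b;c", ",;"))
```

Each pass runs one search per delimiter over the remaining text and then copies the remainder, which is the cost I would like to reduce.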

I just want to know if there's a better way to do this.

Thanks!

---------------------------------------------------------------------------
-- [DOM-Lingo] Tokenizer
---------------------------------------------------------------------------
-- Copyright ©2002 Christopher Watson
-- All Rights Reserved Worldwide.
---------------------------------------------------------------------------

---------------------------------------------------------------------------
-- PROPERTIES
---------------------------------------------------------------------------
property mlTokens     -- [DOM-Lingo] Internal private token list
property miCurrToken  -- [DOM-Lingo] Token offset

---------------------------------------------------------------------------
-- CONSTANTS
---------------------------------------------------------------------------
property LINEFEED     -- [DOM-Lingo] ASCII 0x0A (decimal 10)

---------------------------------------------------------------------------
-- METHODS
---------------------------------------------------------------------------

---------------------------------------------------------------------------
-- [DOM-Lingo] new
---------------------------------------------------------------------------
on new me, psData, psDelims
  mlTokens = []
  miCurrToken = 0
  LINEFEED = numToChar(10)
  if psData.length = 0 then return me
  repeat while TRUE
    liNearestOffset = psData.length
    lbDelimFound = FALSE
    repeat with i = 1 to psDelims.length
      liOffset = offset(psDelims.char[i], psData)
      lbDelimFound = (lbDelimFound or (liOffset > 0))
      if (liOffset > 0) and (liOffset < liNearestOffset) then
        liNearestOffset = liOffset
      end if
      -- a delimiter at char 1 cannot be beaten, so stop searching
      if lbDelimFound and (liNearestOffset = 1) then exit repeat
    end repeat
    if lbDelimFound then
      mlTokens.add(psData.char[1..liNearestOffset - 1])
      delete char 1 to (liNearestOffset - 1) of psData
    else
      if psData.length > 0 then mlTokens.add(psData)
      exit repeat
    end if
  end repeat
  return me
end

Christopher Watson
Sr. Software Engineer
Director/Shockwave Development
Lightspan, Inc.
Tel: 858.824.8457
Fax: 858.824.8008
