On: Thu, 25 Apr 2002 Gregory Lypny <[EMAIL PROTECTED]> wrote:

> Thought I would pick your brains on the topic of comparing two big
> lists.  Both are tab delimited.  bigList has about 100,000 lines and
> 6 items (columns) per line.  smallList is about 15,000 lines and 2
> items per line.  I want to identify the lines in bigList in which
> the third item is the same as the second item in a line in
> smallList, and then pull out the intersection.  I used something
> like this, which works fine.
>
> set the itemDelimiter to tab
> repeat for each line j of smallList
>   put lineOffset(item 2 of j, bigList) into thisLine
>   if thisLine is not 0 then put j & tab & \
>       line thisLine of bigList & return after mergedList
> end repeat
> delete last character of mergedList -- Get rid of the trailing Return
>
> Using the lineOffset function seemed the obvious choice to me, but I'm
> also interested in other approaches.

LineOffset on such a big variable is going to be pretty expensive.
Another option would be to use split to build an array out of
smallList and then loop over each line in bigList and see if there is
an array index for it.  Split takes a while and will use up a good bit
of memory, but makes the lookups *much* faster.  You could save some
of that space by building up an array of just the relevant items in
one list or the other by looping over the lines and creating one array
index for each.
  Regards,
    Scott

> Regards,
>       Greg

********************************************************
Scott Raney  [EMAIL PROTECTED]  http://www.metacard.com
MetaCard: You know, there's an easier way to do that...

_______________________________________________
metacard mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/metacard
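
For what it's worth, here is a minimal sketch of the array approach
described above.  It is not from the original message: the function
name mergeLists and the variable names smallIndex and theKey are made
up for illustration.  It builds the index with a loop rather than
split, keyed on item 2 of each smallList line, and it assumes those
keys are unique (a repeated key keeps only the last smallList line).

```metatalk
function mergeLists smallList, bigList
  set the itemDelimiter to tab
  -- Build the index: one array element per smallList line,
  -- keyed on that line's second item.
  repeat for each line s in smallList
    put item 2 of s into theKey
    if theKey is not empty then put s into smallIndex[theKey]
  end repeat
  -- Single pass over bigList.  Each test is an array lookup on
  -- item 3 instead of a lineOffset scan of the whole variable.
  repeat for each line b in bigList
    put item 3 of b into theKey
    if theKey is empty then next repeat
    if smallIndex[theKey] is not empty then
      put smallIndex[theKey] & tab & b & return after mergedList
    end if
  end repeat
  delete last character of mergedList -- drop the trailing return
  return mergedList
end mergeLists
```

It would be called as: put mergeLists(smallList, bigList) into mergedList.
Note that the results can differ from the lineOffset version: this
compares only item 3 of each bigList line rather than matching the
string anywhere in a line, it returns every matching bigList line
rather than just the first, and the output comes back in bigList order.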