Re: Filtering Columnar Data
Hello Josh, Here's the final version of the function. Thanks to everyone for their suggestions. I know there are other features and better error- checking that can be added. Maybe later. It can extract roughly 75 records from 700 in 1 tick on an Intel-based iMac. Regards, Gregory function filteredByColumn theData,theColNum,theSearchString,isWithout -- This function filters tab-delimited data by column -- By default the filter extracts lines containing the search string. Make isWithout = without to extract lines without it. -- Possible mods: options to make begins with or contains searches if theColNum is an integer then set the itemDelimiter to tab if isWithout is empty then sort lines of theData by (item theColNum of each is not theSearchString) -- Brings lines with theSearchString to the top repeat for each line thisLine in theData if item theColNum of thisLine is theSearchString then put thisLine return after filteredData else exit repeat end if end repeat else sort lines of theData by (item theColNum of each is theSearchString) -- Brings lines without theSearchString to the top repeat for each line thisLine in theData if item theColNum of thisLine is not theSearchString then put thisLine return after filteredData else exit repeat end if end repeat end if delete the last character of filteredData return filteredData else return The column number must be an integer. end if end filteredByColumn On Mon, Jul 16, 2007, at 7:49 PM, Josh Mellicker responded: On Jul 16, 2007, at 2:04 PM, Gregory Lypny wrote: I tested the function below, which is much like Josh's, and it filters more than 300 lines in 1 tick; the same code as a message handler rather than a function takes 5 ticks; and the filter command on the same data takes 39 ticks. Wow, I thought that handler was fast, but never tested/compared it. I wonder if sorting the lines by columnNum, then exiting on the first non-matching line would be faster or slower? ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Nice. Once again, I suggest not hardwiring theColNum because it could change... but you could always make a separate handler that figured out the column number from a header row, lookup table or other criteria, then call this handler. On Jul 17, 2007, at 1:26 PM, Gregory Lypny wrote: Hello Josh, Here's the final version of the function. Thanks to everyone for their suggestions. I know there are other features and better error-checking that can be added. Maybe later. It can extract roughly 75 records from 700 in 1 tick on an Intel-based iMac. Regards, Gregory function filteredByColumn theData,theColNum,theSearchString,isWithout -- This function filters tab-delimited data by column -- By default the filter extracts lines containing the search string. Make isWithout = without to extract lines without it. -- Possible mods: options to make begins with or contains searches if theColNum is an integer then set the itemDelimiter to tab if isWithout is empty then sort lines of theData by (item theColNum of each is not theSearchString) -- Brings lines with theSearchString to the top repeat for each line thisLine in theData if item theColNum of thisLine is theSearchString then put thisLine return after filteredData else exit repeat end if end repeat else sort lines of theData by (item theColNum of each is theSearchString) -- Brings lines without theSearchString to the top repeat for each line thisLine in theData if item theColNum of thisLine is not theSearchString then put thisLine return after filteredData else exit repeat end if end repeat end if delete the last character of filteredData return filteredData else return The column number must be an integer. end if end filteredByColumn On Mon, Jul 16, 2007, at 7:49 PM, Josh Mellicker responded: On Jul 16, 2007, at 2:04 PM, Gregory Lypny wrote: I tested the function below, which is much like Josh's, and it filters more than 300 lines in 1 tick; the same code as a message handler rather than a function takes 5 ticks; and the filter command on the same data takes 39 ticks. Wow, I thought that handler was fast, but never tested/compared it. I wonder if sorting the lines by columnNum, then exiting on the first non-matching line would be faster or slower? ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Hi Gregory, You need to include all trailing tabs to make 56 columns (that would be 55 tabs). That should work, but I experienced crashes when doing this. You need to stress-test this feature before distributing your software, if you plan to distribute it. In this case, a sqLite or mySql database might be more reliable and easier in the long term, depending on the size of your data. As you suggested yourself, an alternative might be to filter the data to include all lines containing the number you are searching for and do a repeat loop on the remaining data. Best regards, Mark Schonewille -- Economy-x-Talk Consulting and Software Engineering http://economy-x-talk.com http;//www.salery.com Quickly extract data from your HyperCard stacks with DIFfersifier. http://differsifier.economy-x-talk.com Op 16-jul-2007, om 17:58 heeft Gregory Lypny het volgende geschreven: Hello Everyone, I'm struggling with the filter command. I have tab-delimited data in a field, where each row has at least 56 columns. I want to filter the data based on a chosen number in that fourth column, for example, to return all lines that have a 9 there. The fourth column only contains a number from 1 to 10. The following handler works most of the time. I put three wild card expressions, lone asterisks separated by tabs, to make sure nothing is being picked up in the first three columns. The tab and asterisk (and it could be just an *) following myNumber says that it doesn't matter what is in the columns after the fourth. put fld Data of this card into tmpData filter tmpData with * tab * tab * tab myNumber tab * The trouble is, the first three wild card expressions are interpreted by Revolution to include tab characters, which means that I'm not necessarily filtering the fourth column! If a 9 appears by itself in column 58, then the filter command will pick it up. Sigh. I also tried the following, but I don't think I have the regular expression syntax right. filter it with [A-Z0-9] tab [0-9] tab [A-Z] tab myNumber tab * I know that my first column always contains five words, two of which are numbers. I wasn't sure how to represent the four spaces between the words. The second column is always a seven-digit number. The third contains first and last names, which will contain at least two words. This turns up empty. Any advice would be most appreciated. If I can't get the RegEx working for me, I know I can resort to a repeat loop and pull out all lines with myNumber in the fourth item. Regards, Gregory ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Gregory, Try this and see if it works. filter tmpData with * tab myNumber tab * There shouldn't be any need to include the preceding or trailing tabs, except for the ones directly surrounding the number. Chris On Jul 16, 2007, at 9:58 AM, Gregory Lypny wrote: Hello Everyone, I'm struggling with the filter command. I have tab-delimited data in a field, where each row has at least 56 columns. I want to filter the data based on a chosen number in that fourth column, for example, to return all lines that have a 9 there. The fourth column only contains a number from 1 to 10. The following handler works most of the time. I put three wild card expressions, lone asterisks separated by tabs, to make sure nothing is being picked up in the first three columns. The tab and asterisk (and it could be just an *) following myNumber says that it doesn't matter what is in the columns after the fourth. put fld Data of this card into tmpData filter tmpData with * tab * tab * tab myNumber tab * The trouble is, the first three wild card expressions are interpreted by Revolution to include tab characters, which means that I'm not necessarily filtering the fourth column! If a 9 appears by itself in column 58, then the filter command will pick it up. Sigh. I also tried the following, but I don't think I have the regular expression syntax right. filter it with [A-Z0-9] tab [0-9] tab [A-Z] tab myNumber tab * I know that my first column always contains five words, two of which are numbers. I wasn't sure how to represent the four spaces between the words. The second column is always a seven-digit number. The third contains first and last names, which will contain at least two words. This turns up empty. Any advice would be most appreciated. If I can't get the RegEx working for me, I know I can resort to a repeat loop and pull out all lines with myNumber in the fourth item. Regards, Gregory ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution -- Chris Sheffield Read Naturally The Fluency Company http://www.readnaturally.com -- ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Chris Sheffield wrote: Try this and see if it works. filter tmpData with * tab myNumber tab * There shouldn't be any need to include the preceding or trailing tabs, except for the ones directly surrounding the number. If he only wants hits in column 4, how will the above discern between that and a hit in any other numeric column? -- Richard Gaskin Fourth World Media Corporation ___ [EMAIL PROTECTED] http://www.FourthWorld.com ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
filter tmpData with * tab myNumber tab * will work, except it will return lines that have a 9 in ANY column... The best way I have found to do this is a generic handler that will retrieve lines from any 2-dimensional field of data based on the header value and search data. That is, if you want to know who likes pizza, ask put FindSomeLines(the long id of fld favorites, favFood, pizza) into tPizzaLovers here's the field: name favColor favFood Greg blue pizza Ken green burgers Rich chrome pizza and the handler: function FindSomeLines pFldName, pHeader, pSearchData put fld pFldName into tData put offset(pHeader, line 1 of tData) into tColNum set the itemdel to tab -- (if your field is tab-delimited) repeat for each line tLine in tData if item tColNum of tLine = pSearchData put tLine cr after tFoundLines end if end repeat delete the last char of tFoundLines return tFoundLines end FindSomeLines This is not tested, but should be close I hope! On Jul 16, 2007, at 9:18 AM, Chris Sheffield wrote: Gregory, Try this and see if it works. filter tmpData with * tab myNumber tab * There shouldn't be any need to include the preceding or trailing tabs, except for the ones directly surrounding the number. Chris On Jul 16, 2007, at 9:58 AM, Gregory Lypny wrote: Hello Everyone, I'm struggling with the filter command. I have tab-delimited data in a field, where each row has at least 56 columns. I want to filter the data based on a chosen number in that fourth column, for example, to return all lines that have a 9 there. The fourth column only contains a number from 1 to 10. The following handler works most of the time. I put three wild card expressions, lone asterisks separated by tabs, to make sure nothing is being picked up in the first three columns. The tab and asterisk (and it could be just an *) following myNumber says that it doesn't matter what is in the columns after the fourth. put fld Data of this card into tmpData filter tmpData with * tab * tab * tab myNumber tab * The trouble is, the first three wild card expressions are interpreted by Revolution to include tab characters, which means that I'm not necessarily filtering the fourth column! If a 9 appears by itself in column 58, then the filter command will pick it up. Sigh. I also tried the following, but I don't think I have the regular expression syntax right. filter it with [A-Z0-9] tab [0-9] tab [A-Z] tab myNumber tab * I know that my first column always contains five words, two of which are numbers. I wasn't sure how to represent the four spaces between the words. The second column is always a seven-digit number. The third contains first and last names, which will contain at least two words. This turns up empty. Any advice would be most appreciated. If I can't get the RegEx working for me, I know I can resort to a repeat loop and pull out all lines with myNumber in the fourth item. Regards, Gregory ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution -- Chris Sheffield Read Naturally The Fluency Company http://www.readnaturally.com -- ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
woops, already found an error, this line: put offset(pHeader, line 1 of tData) into tColNum should be put itemOffset(pHeader, line 1 of tData) into tColNum (and it doesn't really need the line 1 but I like it there just for clarity) On Jul 16, 2007, at 9:43 AM, Josh Mellicker wrote: filter tmpData with * tab myNumber tab * will work, except it will return lines that have a 9 in ANY column... The best way I have found to do this is a generic handler that will retrieve lines from any 2-dimensional field of data based on the header value and search data. That is, if you want to know who likes pizza, ask put FindSomeLines(the long id of fld favorites, favFood, pizza) into tPizzaLovers here's the field: name favColor favFood Greg blue pizza Ken green burgers Rich chrome pizza and the handler: function FindSomeLines pFldName, pHeader, pSearchData put fld pFldName into tData put offset(pHeader, line 1 of tData) into tColNum set the itemdel to tab -- (if your field is tab-delimited) repeat for each line tLine in tData if item tColNum of tLine = pSearchData put tLine cr after tFoundLines end if end repeat delete the last char of tFoundLines return tFoundLines end FindSomeLines This is not tested, but should be close I hope! On Jul 16, 2007, at 9:18 AM, Chris Sheffield wrote: Gregory, Try this and see if it works. filter tmpData with * tab myNumber tab * There shouldn't be any need to include the preceding or trailing tabs, except for the ones directly surrounding the number. Chris On Jul 16, 2007, at 9:58 AM, Gregory Lypny wrote: Hello Everyone, I'm struggling with the filter command. I have tab-delimited data in a field, where each row has at least 56 columns. I want to filter the data based on a chosen number in that fourth column, for example, to return all lines that have a 9 there. The fourth column only contains a number from 1 to 10. The following handler works most of the time. I put three wild card expressions, lone asterisks separated by tabs, to make sure nothing is being picked up in the first three columns. The tab and asterisk (and it could be just an *) following myNumber says that it doesn't matter what is in the columns after the fourth. put fld Data of this card into tmpData filter tmpData with * tab * tab * tab myNumber tab * The trouble is, the first three wild card expressions are interpreted by Revolution to include tab characters, which means that I'm not necessarily filtering the fourth column! If a 9 appears by itself in column 58, then the filter command will pick it up. Sigh. I also tried the following, but I don't think I have the regular expression syntax right. filter it with [A-Z0-9] tab [0-9] tab [A-Z] tab myNumber tab * I know that my first column always contains five words, two of which are numbers. I wasn't sure how to represent the four spaces between the words. The second column is always a seven- digit number. The third contains first and last names, which will contain at least two words. This turns up empty. Any advice would be most appreciated. If I can't get the RegEx working for me, I know I can resort to a repeat loop and pull out all lines with myNumber in the fourth item. Regards, Gregory ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution -- Chris Sheffield Read Naturally The Fluency Company http://www.readnaturally.com -- ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
This should work quickly enough for you: filter tmpData with *tabmyNumbertab* -- = good hits + false hits only set the itemdel to tab repeat for each line LNN in tmpData if item 4 of LNN is myNumber then put LNN cr after newTempData end repeat -- = newTempData should be the hit list another approach that might help you in some way sort lines of tmpData numeric by item 4 of each Jim Ault Las Vegas On 7/16/07 8:58 AM, Gregory Lypny [EMAIL PROTECTED] wrote: Hello Everyone, I'm struggling with the filter command. I have tab-delimited data in a field, where each row has at least 56 columns. I want to filter the data based on a chosen number in that fourth column, for example, to return all lines that have a 9 there. The fourth column only contains a number from 1 to 10. The following handler works most of the time. I put three wild card expressions, lone asterisks separated by tabs, to make sure nothing is being picked up in the first three columns. The tab and asterisk (and it could be just an *) following myNumber says that it doesn't matter what is in the columns after the fourth. put fld Data of this card into tmpData filter tmpData with * tab * tab * tab myNumber tab * The trouble is, the first three wild card expressions are interpreted by Revolution to include tab characters, which means that I'm not necessarily filtering the fourth column! If a 9 appears by itself in column 58, then the filter command will pick it up. Sigh. I also tried the following, but I don't think I have the regular expression syntax right. filter it with [A-Z0-9] tab [0-9] tab [A-Z] tab myNumber tab * I know that my first column always contains five words, two of which are numbers. I wasn't sure how to represent the four spaces between the words. The second column is always a seven-digit number. The third contains first and last names, which will contain at least two words. This turns up empty. Any advice would be most appreciated. If I can't get the RegEx working for me, I know I can resort to a repeat loop and pull out all lines with myNumber in the fourth item. Regards, Gregory ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Here's another solution... function quickSortAndFilter theData,theItemNr,theFilter set the itemDel to tab sort theData by (item theItemNr of theData is not theFilter) repeat for each line myLine in theData if item theItemNr of myLine is theFilter then put myLine cr after myNewData end if end repeat return myNewData end quickSortAndFilter Best regards, Mark Schonewille -- Economy-x-Talk Consulting and Software Engineering http://economy-x-talk.com http;//www.salery.com Quickly extract data from your HyperCard stacks with DIFfersifier. http://differsifier.economy-x-talk.com Op 16-jul-2007, om 17:58 heeft Gregory Lypny het volgende geschreven: Hello Everyone, I'm struggling with the filter command. I have tab-delimited data in a field, where each row has at least 56 columns. I want to filter the data based on a chosen number in that fourth column, for example, to return all lines that have a 9 there. The fourth column only contains a number from 1 to 10. The following handler works most of the time. I put three wild card expressions, lone asterisks separated by tabs, to make sure nothing is being picked up in the first three columns. The tab and asterisk (and it could be just an *) following myNumber says that it doesn't matter what is in the columns after the fourth. snip ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Sorry, I forgot an essential line... function quickSortAndFilter theData,theItemNr,theFilter set the itemDel to tab sort theData by (item theItemNr of theData is not theFilter) repeat for each line myLine in theData if item theItemNr of myLine is theFilter then put myLine cr after myNewData else exit repeat end if end repeat return myNewData end quickSortAndFilter Best regards, Mark Schonewille -- Economy-x-Talk Consulting and Software Engineering http://economy-x-talk.com http;//www.salery.com Quickly extract data from your HyperCard stacks with DIFfersifier. http://differsifier.economy-x-talk.com Op 16-jul-2007, om 17:58 heeft Gregory Lypny het volgende geschreven: Hello Everyone, I'm struggling with the filter command. I have tab-delimited data in a field, where each row has at least 56 columns. I want to filter the data based on a chosen number in that fourth column, for example, to return all lines that have a 9 there. The fourth column only contains a number from 1 to 10. The following handler works most of the time. I put three wild card expressions, lone asterisks separated by tabs, to make sure nothing is being picked up in the first three columns. The tab and asterisk (and it could be just an *) following myNumber says that it doesn't matter what is in the columns after the fourth. snip ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
cough/still doesn't work/cough sorry Best regards, Mark Schonewille -- Economy-x-Talk Consulting and Software Engineering http://economy-x-talk.com http;//www.salery.com Quickly extract data from your HyperCard stacks with DIFfersifier. http://differsifier.economy-x-talk.com Op 16-jul-2007, om 22:02 heeft Mark Schonewille het volgende geschreven: Sorry, I forgot an essential line... function quickSortAndFilter theData,theItemNr,theFilter set the itemDel to tab sort theData by (item theItemNr of theData is not theFilter) repeat for each line myLine in theData if item theItemNr of myLine is theFilter then put myLine cr after myNewData else exit repeat end if end repeat return myNewData end quickSortAndFilter Best regards, Mark Schonewille ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Here's another solution. Really. function quickSortAndFilter theData,theItemNr,theFilter set the itemDel to tab sort lines of theData by (item theItemNr of each is not theFilter) repeat for each line myLine in theData if item theItemNr of myLine is theFilter then put myLine cr after myNewData else exit repeat end if end repeat return myNewData end quickSortAndFilter Let me add some explanation: - theData is tab-delimited data - theItemNr is the number of the column you want to search - theFilter is the string you want to search for Best regards, Mark Schonewille -- Economy-x-Talk Consulting and Software Engineering http://economy-x-talk.com http;//www.salery.com Quickly extract data from your HyperCard stacks with DIFfersifier. http://differsifier.economy-x-talk.com Op 16-jul-2007, om 17:58 heeft Gregory Lypny het volgende geschreven: Hello Everyone, I'm struggling with the filter command. I have tab-delimited data in a field, where each row has at least 56 columns. I want to filter the data based on a chosen number in that fourth column, for example, to return all lines that have a 9 there. The fourth column only contains a number from 1 to 10. The following handler works most of the time. I put three wild card expressions, lone asterisks separated by tabs, to make sure nothing is being picked up in the first three columns. The tab and asterisk (and it could be just an *) following myNumber says that it doesn't matter what is in the columns after the fourth. snip ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Yet another solution using SORT and complex sort criteria ---intro 1) the sort command says sort using the result of the function 2) the function returns a true/false 3) the result of the sort is that the desired value in col 4 sorts to the top 4) the repeat loop scans each line to find when the desired value is no longer in item 4 5) thus complex/compound sort rules are possible by placing them in the sortKeyTF() function The magic is supplied by Rev since it will call the function for each line of the container. This function call provides the sort key to be used, which is, in this case, is true false. Any string or number can be returned as the sort key. --substantially based on Mark Schonewille's code on toVariable --we will build the lines of data to sort/filter put 100,101,102,99,104,105,106,107,108,109,110 into tStr set cursor to watch repeat with cnt = 1 to 1000 put cnt mod 10 into item 4 of tStr put tStr cr after tOutput end repeat replace comma with tab in tOutput --now tab delim -- THIS is where we do the sort and extract put 8 into myNumber put the millisecs into tStart put quickSortAndFilter(tOutput, myNumber) into onlyMyNumber put the millisecs - tStart into elapsed put elapsed millisecs into msg end toVariable function quickSortAndFilter theData,myNumber set the itemDel to tab sort theData descending by (sortKeyTF(myNumber, item 4 of each)) -- the sort is done, now keep only the myNumber lines repeat for each line myLine in theData if item 4 of myLine is myNumber then put myLine cr after myNewData else exit repeat end if end repeat return myNewData end quickSortAndFilter function sortKeyTF myNumber, thisItem4 return (1*myNumber) = (1*thisItem4) end sortKeyTF 41 millisecs to do 1000 lines, 171 msecs to do 10,000 lines -- Mac G5 dual 1.8 mHz Not the speediest way, but more powerful/flexible than just the filter command, and probably on par with a RegEx solution. Jim Ault Las Vegas On 7/16/07 1:14 PM, Mark Schonewille [EMAIL PROTECTED] wrote: Here's another solution. Really. function quickSortAndFilter theData,theItemNr,theFilter set the itemDel to tab sort lines of theData by (item theItemNr of each is not theFilter) repeat for each line myLine in theData if item theItemNr of myLine is theFilter then put myLine cr after myNewData else exit repeat end if end repeat return myNewData end quickSortAndFilter Let me add some explanation: - theData is tab-delimited data - theItemNr is the number of the column you want to search - theFilter is the string you want to search for ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Thanks for your replies, Chris, Josh, Mark, and Richard, You've confirmed what I suspected: that it will be tricky to get the filter command to do this. This is especially true in my case because the data field has 56 columns by default but individual rows may have data appended later, and so may grow beyond 56. I think I'll do what I used to do before the filter command was around, and that is to repeat an itemOffset over the data. I tested the function below, which is much like Josh's, and it filters more than 300 lines in 1 tick; the same code as a message handler rather than a function takes 5 ticks; and the filter command on the same data takes 39 ticks. Thanks once again, Gregory function filterByColumn theData,columnNum, theSearchString -- This function filters tab-delimited data by column if columnNum is an integer then set the itemDelimiter to tab repeat for each line thisLine in theData if item columnNum of thisLine is theSearchString then put thisLine return after filteredData end if end repeat delete the last character of filteredData return filteredData else return The column number must be an integer. end if end filterByColumn ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
I'm struggling with the filter command. I have tab-delimited data in a field, where each row has at least 56 columns. I want to filter the data based on a chosen number in that fourth column, for example, to return all lines that have a 9 there. The fourth column only contains a number from 1 to 10. The following handler works most of the time. I put three wild card expressions, lone asterisks separated by tabs, to make sure nothing is being picked up in the first three columns. The tab and asterisk (and it could be just an *) following myNumber says that it doesn't matter what is in the columns after the fourth. put fld Data of this card into tmpData filter tmpData with * tab * tab * tab myNumber tab * I seem to remember an old thread where it was discovered that when you start looking at complex filters like this, a repeat for each loop ends up faster. Worth testing anyway. Cheers, Sarah ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
On Jul 16, 2007, at 2:04 PM, Gregory Lypny wrote: I tested the function below, which is much like Josh's, and it filters more than 300 lines in 1 tick; the same code as a message handler rather than a function takes 5 ticks; and the filter command on the same data takes 39 ticks. Wow, I thought that handler was fast, but never tested/compared it. I wonder if sorting the lines by columnNum, then exiting on the first non-matching line would be faster or slower? function filterByColumn theData,columnNum, theSearchString -- This function filters tab-delimited data by column if columnNum is an integer then set the itemDelimiter to tab repeat for each line thisLine in theData if item columnNum of thisLine is theSearchString then put thisLine return after filteredData end if end repeat delete the last character of filteredData return filteredData else return The column number must be an integer. end if end filterByColumn ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Also, I would suggest finding the exact ColumnNum as part of the handler- either from the header row, or a separate header variable. If you don't, as the database evolves and columns are added (or deleted), then you'll have to go back and change all your hardwired column references... or change your data around to make the columns right, both big hassles... On Jul 16, 2007, at 2:04 PM, Gregory Lypny wrote: function filterByColumn theData,columnNum, theSearchString -- This function filters tab-delimited data by column if columnNum is an integer then set the itemDelimiter to tab repeat for each line thisLine in theData if item columnNum of thisLine is theSearchString then put thisLine return after filteredData end if end repeat delete the last character of filteredData return filteredData else return The column number must be an integer. end if end filterByColumn ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Thanks Josh, Mark, Jim, and Sarah, Yep, I'll be going with a repeat loop. I posted my version just after some of yours, where I also report that a repeat loop is seven to eight times faster than the filter command. I'll try to incorporate some of the features of yours into mine. Regards, Gregory ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Thanks, Ken. Crystal clear. Greg On 10-Dec-05, at 8:52 AM, Ken Ray responded: Message: 5 Date: Fri, 09 Dec 2005 23:06:07 -0600 From: Ken Ray [EMAIL PROTECTED] Subject: Re: Filtering Columnar Data To: Use Revolution List use-revolution@lists.runrev.com Message-ID: [EMAIL PROTECTED] Content-Type: text/plain; charset=US-ASCII On 12/9/05 10:26 PM, Gregory Lypny [EMAIL PROTECTED] wrote: Not sure I understand, Ken. Your first statement puts searchString in the fifth column, while the second puts it in the fourth. If I changed mine to filter theData with * tab searchString tab * or filter theData with * tab searchString * it would still find the string in the second column, but perhaps in higher columns too because the pattern can be shifted right. That's right. That's why to match a specific column, you need to include *all* the columns in your filter command. So for a 5-column set of data, in order to specify the second column and only the second column you'd do: filter theData with * tab searchString tab * tab \ * tab * and to filter on only the fourth column of 5, it would be: filter theData with * tab * tab * tab \ searchString tab * The only issue is when you're matching the last column... you have to make sure you *don't* put a * after the last column, so it would be: filter theData with * tab * tab * tab \ * tab searchString and not filter theData with * tab * tab * tab \ * tab searchString * But perhaps repeat for each is a better choice for you... ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Gregory Lypny wrote: Hello everyone, I use the filter command on tab-delimited text files when I want to pick off a string in a particular column. For example, if the string is located in the second column of a five column file, I use filter theData with * tab searchString tab * tab * tab * . This, I assume, ensures that my hits don't include lines where the string appears in any other column. It works like lightening when I search in any of the first four columns, but beyond that I get the dreaded spinning beach ball in Mac OS X (Tiger). Is there a better way? http://lists.runrev.com/pipermail/use-revolution/2005-July/062579.html -- Richard Gaskin Managing Editor, revJournal ___ Rev tips, tutorials and more: http://www.revJournal.com ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
On 12/9/05 4:19 PM, Gregory Lypny [EMAIL PROTECTED] wrote: Hello everyone, I use the filter command on tab-delimited text files when I want to pick off a string in a particular column. For example, if the string is located in the second column of a five column file, I use filter theData with * tab searchString tab * tab * tab * . This, I assume, ensures that my hits don't include lines where the string appears in any other column. It works like lightening when I search in any of the first four columns, but beyond that I get the dreaded spinning beach ball in Mac OS X (Tiger). Is there a better way? Yes, the problem (I think) is that you probably have the * after the last column - I just worked with this today, and ran into the same problem. If you remove the last asterisk you should be fine: * tab * tab * tab * tab searchString whereas for column 4 it would be: * tab * tab * tab searchString tab * Hope this helps, Ken Ray Sons of Thunder Software Web site: http://www.sonsothunder.com/ Email: [EMAIL PROTECTED] ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
A, thanks for the insight, Richard. The repeat for each form of the loop control structure is faster than the filter command. A quick test showed that I can cut my time by about 150 milliseconds per 800 lines of data. Considering that I have some 4,000 files with thousands of lines per file, it'll make a real difference. Greg Lypny Associate Professor of Finance John Molson School of Business Concordia University Montreal, Canada Richard Gaskin responded concisely with the link below my original post: On 9-Dec-05, at 7:42 PM, [EMAIL PROTECTED] wrote: Gregory Lypny wrote: Hello everyone, I use the filter command on tab-delimited text files when I want to pick off a string in a particular column. For example, if the string is located in the second column of a five column file, I use filter theData with * tab searchString tab * tab * tab * . This, I assume, ensures that my hits don't include lines where the string appears in any other column. It works like lightening when I search in any of the first four columns, but beyond that I get the dreaded spinning beach ball in Mac OS X (Tiger). Is there a better way? http://lists.runrev.com/pipermail/use-revolution/2005-July/ 062579.html ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
Not sure I understand, Ken. Your first statement puts searchString in the fifth column, while the second puts it in the fourth. If I changed mine to filter theData with * tab searchString tab * or filter theData with * tab searchString * it would still find the string in the second column, but perhaps in higher columns too because the pattern can be shifted right. Greg On 9-Dec-05, at 7:42 PM, Ken Ray wrote: On 12/9/05 4:19 PM, Gregory Lypny [EMAIL PROTECTED] wrote: Hello everyone, I use the filter command on tab-delimited text files when I want to pick off a string in a particular column. For example, if the string is located in the second column of a five column file, I use filter theData with * tab searchString tab * tab * tab * . This, I assume, ensures that my hits don't include lines where the string appears in any other column. It works like lightening when I search in any of the first four columns, but beyond that I get the dreaded spinning beach ball in Mac OS X (Tiger). Is there a better way? Yes, the problem (I think) is that you probably have the * after the last column - I just worked with this today, and ran into the same problem. If you remove the last asterisk you should be fine: * tab * tab * tab * tab searchString whereas for column 4 it would be: * tab * tab * tab searchString tab * Hope this helps, Ken Ray Sons of Thunder Software Web site: http://www.sonsothunder.com/ Email: [EMAIL PROTECTED] ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution
Re: Filtering Columnar Data
On 12/9/05 10:26 PM, Gregory Lypny [EMAIL PROTECTED] wrote: Not sure I understand, Ken. Your first statement puts searchString in the fifth column, while the second puts it in the fourth. If I changed mine to filter theData with * tab searchString tab * or filter theData with * tab searchString * it would still find the string in the second column, but perhaps in higher columns too because the pattern can be shifted right. That's right. That's why to match a specific column, you need to include *all* the columns in your filter command. So for a 5-column set of data, in order to specify the second column and only the second column you'd do: filter theData with * tab searchString tab * tab \ * tab * and to filter on only the fourth column of 5, it would be: filter theData with * tab * tab * tab \ searchString tab * The only issue is when you're matching the last column... you have to make sure you *don't* put a * after the last column, so it would be: filter theData with * tab * tab * tab \ * tab searchString and not filter theData with * tab * tab * tab \ * tab searchString * But perhaps repeat for each is a better choice for you... Ken Ray Sons of Thunder Software Web site: http://www.sonsothunder.com/ Email: [EMAIL PROTECTED] ___ use-revolution mailing list use-revolution@lists.runrev.com Please visit this url to subscribe, unsubscribe and manage your subscription preferences: http://lists.runrev.com/mailman/listinfo/use-revolution