subject:"Re\: Filtering Columnar Data"

Re: Filtering Columnar Data

2007-07-17 Thread Gregory Lypny


Hello Josh,

Here's the final version of the function.  Thanks to everyone for  
their suggestions.  I know there are other features and better error- 
checking that can be added.  Maybe later.  It can extract roughly 75  
records from 700 in 1 tick on an Intel-based iMac.


Regards,

Gregory

function filteredByColumn theData,theColNum,theSearchString,isWithout
  -- This function filters tab-delimited data by column
  -- By default the filter extracts lines containing the search  
string.  Make isWithout = without to extract lines without it.
  -- Possible mods: options to make begins with or contains  
searches

  if theColNum is an integer
  then
set the itemDelimiter to tab
if isWithout is empty
then
  sort lines of theData by (item theColNum of each is not  
theSearchString)  -- Brings lines with theSearchString to the top

  repeat for each line thisLine in theData
if item theColNum of thisLine is theSearchString
then
  put thisLine  return after filteredData
else
  exit repeat
end if
  end repeat
else
  sort lines of theData by (item theColNum of each is  
theSearchString)  -- Brings lines without theSearchString to the top

  repeat for each line thisLine in theData
if item theColNum of thisLine is not theSearchString
then
  put thisLine  return after filteredData
else
  exit repeat
end if
  end repeat
end if
delete the last character of filteredData
return filteredData
  else
return The column number must be an integer.
  end if
end filteredByColumn






On Mon, Jul 16, 2007, at 7:49 PM, Josh Mellicker responded:


On Jul 16, 2007, at 2:04 PM, Gregory Lypny wrote:


I tested the function below, which is much like Josh's, and it
filters more than 300 lines in 1 tick; the same code as a message
handler rather than a function takes 5 ticks; and the filter
command on the same data takes 39 ticks.



Wow, I thought that handler was fast, but never tested/compared it.

I wonder if sorting the lines by columnNum, then exiting on the first
non-matching line would be faster or slower?



___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-17 Thread Josh Mellicker


Nice.

Once again, I suggest not hardwiring theColNum because it could  
change... but you could always make a separate handler that figured  
out the column number from a header row, lookup table or other  
criteria, then call this handler.



On Jul 17, 2007, at 1:26 PM, Gregory Lypny wrote:


Hello Josh,

Here's the final version of the function.  Thanks to everyone for  
their suggestions.  I know there are other features and better  
error-checking that can be added.  Maybe later.  It can extract  
roughly 75 records from 700 in 1 tick on an Intel-based iMac.


Regards,

Gregory

function filteredByColumn theData,theColNum,theSearchString,isWithout
  -- This function filters tab-delimited data by column
  -- By default the filter extracts lines containing the search  
string.  Make isWithout = without to extract lines without it.
  -- Possible mods: options to make begins with or contains  
searches

  if theColNum is an integer
  then
set the itemDelimiter to tab
if isWithout is empty
then
  sort lines of theData by (item theColNum of each is not  
theSearchString)  -- Brings lines with theSearchString to the top

  repeat for each line thisLine in theData
if item theColNum of thisLine is theSearchString
then
  put thisLine  return after filteredData
else
  exit repeat
end if
  end repeat
else
  sort lines of theData by (item theColNum of each is  
theSearchString)  -- Brings lines without theSearchString to the top

  repeat for each line thisLine in theData
if item theColNum of thisLine is not theSearchString
then
  put thisLine  return after filteredData
else
  exit repeat
end if
  end repeat
end if
delete the last character of filteredData
return filteredData
  else
return The column number must be an integer.
  end if
end filteredByColumn






On Mon, Jul 16, 2007, at 7:49 PM, Josh Mellicker responded:


On Jul 16, 2007, at 2:04 PM, Gregory Lypny wrote:


I tested the function below, which is much like Josh's, and it
filters more than 300 lines in 1 tick; the same code as a message
handler rather than a function takes 5 ticks; and the filter
command on the same data takes 39 ticks.



Wow, I thought that handler was fast, but never tested/compared it.

I wonder if sorting the lines by columnNum, then exiting on the first
non-matching line would be faster or slower?



___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your  
subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Mark Schonewille


Hi Gregory,

You need to include all trailing tabs to make 56 columns (that would  
be 55 tabs). That should work, but I experienced crashes when doing  
this. You need to stress-test this feature before distributing your  
software, if you plan to distribute it.


In this case, a sqLite or mySql database might be more reliable and  
easier in the long term, depending on the size of your data.


As you suggested yourself, an alternative might be to filter the data  
to include all lines containing the number you are searching for and  
do a repeat loop on the remaining data.


Best regards,

Mark Schonewille

--

Economy-x-Talk Consulting and Software Engineering
http://economy-x-talk.com
http;//www.salery.com

Quickly extract data from your HyperCard stacks with DIFfersifier.  
http://differsifier.economy-x-talk.com



Op 16-jul-2007, om 17:58 heeft Gregory Lypny het volgende geschreven:


Hello Everyone,

I'm struggling with the filter command.  I have tab-delimited data  
in a field, where each row has at least 56 columns.  I want to  
filter the data based on a chosen number in that fourth column, for  
example, to return all lines that have a 9 there.  The fourth  
column only contains a number from 1 to 10.  The following handler  
works most of the time.  I put three wild card expressions, lone  
asterisks separated by tabs, to make sure nothing is being picked  
up in the first three columns.  The tab and asterisk (and it could  
be just an *) following myNumber says that it doesn't matter what  
is in the columns after the fourth.


put fld Data of this card into tmpData
	filter tmpData with *  tab  *  tab  *  tab  myNumber   
tab  *


The trouble is, the first three wild card expressions are  
interpreted by Revolution to include tab characters, which means  
that I'm not necessarily filtering the fourth column!  If a 9  
appears by itself in column 58, then the filter command will pick  
it up.  Sigh.


I also tried the following, but I don't think I have the regular  
expression syntax right.


   filter it with [A-Z0-9]  tab  [0-9]  tab  [A-Z]  tab  
 myNumber  tab  *


I know that my first column always contains five words, two of  
which are numbers.  I wasn't sure how to represent the four spaces  
between the words.  The second column is always a seven-digit  
number.  The third contains first and last names, which will  
contain at least two words.  This turns up empty.


Any advice would be most appreciated.  If I can't get the RegEx  
working for me, I know I can resort to a repeat loop and pull out  
all lines with myNumber in the fourth item.


Regards,

Gregory





___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Chris Sheffield


Gregory,

Try this and see if it works.

filter tmpData with *  tab  myNumber  tab  *

There shouldn't be any need to include the preceding or trailing  
tabs, except for the ones directly surrounding the number.


Chris


On Jul 16, 2007, at 9:58 AM, Gregory Lypny wrote:


Hello Everyone,

I'm struggling with the filter command.  I have tab-delimited data  
in a field, where each row has at least 56 columns.  I want to  
filter the data based on a chosen number in that fourth column, for  
example, to return all lines that have a 9 there.  The fourth  
column only contains a number from 1 to 10.  The following handler  
works most of the time.  I put three wild card expressions, lone  
asterisks separated by tabs, to make sure nothing is being picked  
up in the first three columns.  The tab and asterisk (and it could  
be just an *) following myNumber says that it doesn't matter what  
is in the columns after the fourth.


put fld Data of this card into tmpData
	filter tmpData with *  tab  *  tab  *  tab  myNumber   
tab  *


The trouble is, the first three wild card expressions are  
interpreted by Revolution to include tab characters, which means  
that I'm not necessarily filtering the fourth column!  If a 9  
appears by itself in column 58, then the filter command will pick  
it up.  Sigh.


I also tried the following, but I don't think I have the regular  
expression syntax right.


   filter it with [A-Z0-9]  tab  [0-9]  tab  [A-Z]  tab  
 myNumber  tab  *


I know that my first column always contains five words, two of  
which are numbers.  I wasn't sure how to represent the four spaces  
between the words.  The second column is always a seven-digit  
number.  The third contains first and last names, which will  
contain at least two words.  This turns up empty.


Any advice would be most appreciated.  If I can't get the RegEx  
working for me, I know I can resort to a repeat loop and pull out  
all lines with myNumber in the fourth item.


Regards,

Gregory







___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your  
subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution


--
Chris Sheffield
Read Naturally
The Fluency Company
http://www.readnaturally.com
--


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Richard Gaskin


Chris Sheffield wrote:

Try this and see if it works.

filter tmpData with *  tab  myNumber  tab  *

There shouldn't be any need to include the preceding or trailing  
tabs, except for the ones directly surrounding the number.


If he only wants hits in column 4, how will the above discern between 
that and a hit in any other numeric column?


--
 Richard Gaskin
 Fourth World Media Corporation
 ___
 [EMAIL PROTECTED]   http://www.FourthWorld.com
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Josh Mellicker


filter tmpData with *  tab  myNumber  tab  *
 will work, except it will return lines that have a 9 in ANY  
column...



The best way I have found to do this is a generic handler that will  
retrieve lines from any 2-dimensional field of data based on the  
header value and search data.


That is, if you want to know who likes pizza, ask

put FindSomeLines(the long id of fld favorites, favFood, pizza)  
into tPizzaLovers



here's the field:

name   favColor   favFood
Greg   blue   pizza
Ken   green   burgers
Rich   chrome   pizza


and the handler:

function FindSomeLines pFldName, pHeader, pSearchData
 put fld pFldName into tData
 put offset(pHeader, line 1 of tData) into tColNum
 set the itemdel to tab -- (if your field is tab-delimited)

 repeat for each line tLine in tData
  if item tColNum of tLine = pSearchData
   put tLine  cr after tFoundLines
  end if
end repeat

delete the last char of tFoundLines

return tFoundLines
end FindSomeLines


This is not tested, but should be close I hope!





On Jul 16, 2007, at 9:18 AM, Chris Sheffield wrote:


Gregory,

Try this and see if it works.

filter tmpData with *  tab  myNumber  tab  *

There shouldn't be any need to include the preceding or trailing  
tabs, except for the ones directly surrounding the number.


Chris


On Jul 16, 2007, at 9:58 AM, Gregory Lypny wrote:


Hello Everyone,

I'm struggling with the filter command.  I have tab-delimited data  
in a field, where each row has at least 56 columns.  I want to  
filter the data based on a chosen number in that fourth column,  
for example, to return all lines that have a 9 there.  The fourth  
column only contains a number from 1 to 10.  The following handler  
works most of the time.  I put three wild card expressions, lone  
asterisks separated by tabs, to make sure nothing is being picked  
up in the first three columns.  The tab and asterisk (and it could  
be just an *) following myNumber says that it doesn't matter what  
is in the columns after the fourth.


put fld Data of this card into tmpData
	filter tmpData with *  tab  *  tab  *  tab  myNumber  
 tab  *


The trouble is, the first three wild card expressions are  
interpreted by Revolution to include tab characters, which means  
that I'm not necessarily filtering the fourth column!  If a 9  
appears by itself in column 58, then the filter command will pick  
it up.  Sigh.


I also tried the following, but I don't think I have the regular  
expression syntax right.


   filter it with [A-Z0-9]  tab  [0-9]  tab  [A-Z]  tab  
 myNumber  tab  *


I know that my first column always contains five words, two of  
which are numbers.  I wasn't sure how to represent the four spaces  
between the words.  The second column is always a seven-digit  
number.  The third contains first and last names, which will  
contain at least two words.  This turns up empty.


Any advice would be most appreciated.  If I can't get the RegEx  
working for me, I know I can resort to a repeat loop and pull out  
all lines with myNumber in the fourth item.


Regards,

Gregory







___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your  
subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution


--
Chris Sheffield
Read Naturally
The Fluency Company
http://www.readnaturally.com
--


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your  
subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Josh Mellicker


woops, already found an error, this line:

 put offset(pHeader, line 1 of tData) into tColNum

should be

 put itemOffset(pHeader, line 1 of tData) into tColNum


(and it doesn't really need the line 1 but I like it there just for  
clarity)



On Jul 16, 2007, at 9:43 AM, Josh Mellicker wrote:


filter tmpData with *  tab  myNumber  tab  *
 will work, except it will return lines that have a 9 in ANY  
column...



The best way I have found to do this is a generic handler that will  
retrieve lines from any 2-dimensional field of data based on the  
header value and search data.


That is, if you want to know who likes pizza, ask

put FindSomeLines(the long id of fld favorites, favFood,  
pizza) into tPizzaLovers



here's the field:

name   favColor   favFood
Greg   blue   pizza
Ken   green   burgers
Rich   chrome   pizza


and the handler:

function FindSomeLines pFldName, pHeader, pSearchData
 put fld pFldName into tData
 put offset(pHeader, line 1 of tData) into tColNum
 set the itemdel to tab -- (if your field is tab-delimited)

 repeat for each line tLine in tData
  if item tColNum of tLine = pSearchData
   put tLine  cr after tFoundLines
  end if
end repeat

delete the last char of tFoundLines

return tFoundLines
end FindSomeLines


This is not tested, but should be close I hope!





On Jul 16, 2007, at 9:18 AM, Chris Sheffield wrote:


Gregory,

Try this and see if it works.

filter tmpData with *  tab  myNumber  tab  *

There shouldn't be any need to include the preceding or trailing  
tabs, except for the ones directly surrounding the number.


Chris


On Jul 16, 2007, at 9:58 AM, Gregory Lypny wrote:


Hello Everyone,

I'm struggling with the filter command.  I have tab-delimited  
data in a field, where each row has at least 56 columns.  I want  
to filter the data based on a chosen number in that fourth  
column, for example, to return all lines that have a 9 there.   
The fourth column only contains a number from 1 to 10.  The  
following handler works most of the time.  I put three wild card  
expressions, lone asterisks separated by tabs, to make sure  
nothing is being picked up in the first three columns.  The tab  
and asterisk (and it could be just an *) following myNumber says  
that it doesn't matter what is in the columns after the fourth.


put fld Data of this card into tmpData
	filter tmpData with *  tab  *  tab  *  tab  myNumber  
 tab  *


The trouble is, the first three wild card expressions are  
interpreted by Revolution to include tab characters, which means  
that I'm not necessarily filtering the fourth column!  If a 9  
appears by itself in column 58, then the filter command will pick  
it up.  Sigh.


I also tried the following, but I don't think I have the regular  
expression syntax right.


   filter it with [A-Z0-9]  tab  [0-9]  tab  [A-Z]   
tab  myNumber  tab  *


I know that my first column always contains five words, two of  
which are numbers.  I wasn't sure how to represent the four  
spaces between the words.  The second column is always a seven- 
digit number.  The third contains first and last names, which  
will contain at least two words.  This turns up empty.


Any advice would be most appreciated.  If I can't get the RegEx  
working for me, I know I can resort to a repeat loop and pull out  
all lines with myNumber in the fourth item.


Regards,

Gregory







___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your  
subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution


--
Chris Sheffield
Read Naturally
The Fluency Company
http://www.readnaturally.com
--


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your  
subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your  
subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Jim Ault

This should work quickly enough for you:

filter tmpData with *tabmyNumbertab*
-- = good  hits + false hits only
set the itemdel to tab
repeat for each line LNN in tmpData
  if item 4 of LNN is myNumber then put LNN  cr after newTempData
end repeat
-- = newTempData should be the hit list

another approach that might help you in some way
sort lines of tmpData numeric by item 4 of each

Jim Ault
Las Vegas

On 7/16/07 8:58 AM, Gregory Lypny [EMAIL PROTECTED] wrote:

 Hello Everyone,
 
 I'm struggling with the filter command.  I have tab-delimited data in
 a field, where each row has at least 56 columns.  I want to filter
 the data based on a chosen number in that fourth column, for example,
 to return all lines that have a 9 there.  The fourth column only
 contains a number from 1 to 10.  The following handler works most of
 the time.  I put three wild card expressions, lone asterisks
 separated by tabs, to make sure nothing is being picked up in the
 first three columns.  The tab and asterisk (and it could be just an
 *) following myNumber says that it doesn't matter what is in the
 columns after the fourth.
 
 put fld Data of this card into tmpData
 filter tmpData with *  tab  *  tab  *  tab  myNumber 
 tab  *
 
 The trouble is, the first three wild card expressions are interpreted
 by Revolution to include tab characters, which means that I'm not
 necessarily filtering the fourth column!  If a 9 appears by itself in
 column 58, then the filter command will pick it up.  Sigh.
 
 I also tried the following, but I don't think I have the regular
 expression syntax right.
 
 filter it with [A-Z0-9]  tab  [0-9]  tab  [A-Z]  tab 
 myNumber  tab  *
 
 I know that my first column always contains five words, two of which
 are numbers.  I wasn't sure how to represent the four spaces between
 the words.  The second column is always a seven-digit number.  The
 third contains first and last names, which will contain at least two
 words.  This turns up empty.
 
 Any advice would be most appreciated.  If I can't get the RegEx
 working for me, I know I can resort to a repeat loop and pull out all
 lines with myNumber in the fourth item.
 
 Regards,
 
 Gregory
 
 
 
 
 
 
 
 ___
 use-revolution mailing list
 use-revolution@lists.runrev.com
 Please visit this url to subscribe, unsubscribe and manage your subscription
 preferences:
 http://lists.runrev.com/mailman/listinfo/use-revolution


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Mark Schonewille


Here's another solution...

function quickSortAndFilter theData,theItemNr,theFilter
  set the itemDel to tab
  sort theData by (item theItemNr of theData is not theFilter)
  repeat for each line myLine in theData
if item theItemNr of myLine is theFilter then
  put myLine  cr after myNewData
end if
  end repeat
  return myNewData
end quickSortAndFilter

Best regards,

Mark Schonewille

--

Economy-x-Talk Consulting and Software Engineering
http://economy-x-talk.com
http;//www.salery.com

Quickly extract data from your HyperCard stacks with DIFfersifier.  
http://differsifier.economy-x-talk.com



Op 16-jul-2007, om 17:58 heeft Gregory Lypny het volgende geschreven:


Hello Everyone,

I'm struggling with the filter command.  I have tab-delimited data  
in a field, where each row has at least 56 columns.  I want to  
filter the data based on a chosen number in that fourth column, for  
example, to return all lines that have a 9 there.  The fourth  
column only contains a number from 1 to 10.  The following handler  
works most of the time.  I put three wild card expressions, lone  
asterisks separated by tabs, to make sure nothing is being picked  
up in the first three columns.  The tab and asterisk (and it could  
be just an *) following myNumber says that it doesn't matter what  
is in the columns after the fourth.

snip
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Mark Schonewille


Sorry, I forgot an essential line...

function quickSortAndFilter theData,theItemNr,theFilter
  set the itemDel to tab
  sort theData by (item theItemNr of theData is not theFilter)
  repeat for each line myLine in theData
if item theItemNr of myLine is theFilter then
  put myLine  cr after myNewData
else
  exit repeat
end if
  end repeat
  return myNewData
end quickSortAndFilter

Best regards,

Mark Schonewille

--

Economy-x-Talk Consulting and Software Engineering
http://economy-x-talk.com
http;//www.salery.com

Quickly extract data from your HyperCard stacks with DIFfersifier.  
http://differsifier.economy-x-talk.com






Op 16-jul-2007, om 17:58 heeft Gregory Lypny het volgende geschreven:


Hello Everyone,

I'm struggling with the filter command.  I have tab-delimited data  
in a field, where each row has at least 56 columns.  I want to  
filter the data based on a chosen number in that fourth column,  
for example, to return all lines that have a 9 there.  The fourth  
column only contains a number from 1 to 10.  The following handler  
works most of the time.  I put three wild card expressions, lone  
asterisks separated by tabs, to make sure nothing is being picked  
up in the first three columns.  The tab and asterisk (and it could  
be just an *) following myNumber says that it doesn't matter what  
is in the columns after the fourth.

snip

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Mark Schonewille


cough/still doesn't work/cough
sorry

Best regards,

Mark Schonewille

--

Economy-x-Talk Consulting and Software Engineering
http://economy-x-talk.com
http;//www.salery.com

Quickly extract data from your HyperCard stacks with DIFfersifier.  
http://differsifier.economy-x-talk.com



Op 16-jul-2007, om 22:02 heeft Mark Schonewille het volgende geschreven:


Sorry, I forgot an essential line...

function quickSortAndFilter theData,theItemNr,theFilter
  set the itemDel to tab
  sort theData by (item theItemNr of theData is not theFilter)
  repeat for each line myLine in theData
if item theItemNr of myLine is theFilter then
  put myLine  cr after myNewData
else
  exit repeat
end if
  end repeat
  return myNewData
end quickSortAndFilter

Best regards,

Mark Schonewille


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Mark Schonewille


Here's another solution. Really.

function quickSortAndFilter theData,theItemNr,theFilter
  set the itemDel to tab
  sort lines of theData by (item theItemNr of each is not theFilter)
  repeat for each line myLine in theData
if item theItemNr of myLine is theFilter then
  put myLine  cr after myNewData
else
  exit repeat
end if
  end repeat
  return myNewData
end quickSortAndFilter

Let me add some explanation:
- theData is tab-delimited data
- theItemNr is the number of the column you want to search
- theFilter is the string you want to search for

Best regards,

Mark Schonewille

--

Economy-x-Talk Consulting and Software Engineering
http://economy-x-talk.com
http;//www.salery.com

Quickly extract data from your HyperCard stacks with DIFfersifier.  
http://differsifier.economy-x-talk.com





Op 16-jul-2007, om 17:58 heeft Gregory Lypny het volgende geschreven:


Hello Everyone,

I'm struggling with the filter command.  I have tab-delimited data  
in a field, where each row has at least 56 columns.  I want to  
filter the data based on a chosen number in that fourth column,  
for example, to return all lines that have a 9 there.  The fourth  
column only contains a number from 1 to 10.  The following handler  
works most of the time.  I put three wild card expressions, lone  
asterisks separated by tabs, to make sure nothing is being picked  
up in the first three columns.  The tab and asterisk (and it could  
be just an *) following myNumber says that it doesn't matter what  
is in the columns after the fourth.

snip

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Jim Ault

Yet another solution using SORT and complex sort criteria

---intro
1) the sort command says sort using the result of the function
2) the function returns a true/false
3) the result of the sort is that
the desired value in col 4 sorts to the top
4) the repeat loop scans each line to find
when the desired value is no longer in item 4
5) thus  complex/compound sort rules are possible
by placing them in the sortKeyTF() function

The magic is supplied by Rev since it will call the function for each line
of the container.  This function call provides the sort key to be used,
which is, in this case, is true false.  Any string or number can be returned
as the sort key.


--substantially based on Mark Schonewille's code

on toVariable
  --we will build the lines of data to sort/filter
put   100,101,102,99,104,105,106,107,108,109,110 into tStr
set cursor to watch
   repeat with cnt = 1 to 1000
 put cnt mod 10 into item 4 of tStr
  put tStr  cr after tOutput
end repeat
replace comma with tab in tOutput --now tab delim
 -- THIS is where we do the sort and extract
put 8 into myNumber
put the millisecs into tStart
put quickSortAndFilter(tOutput, myNumber) into onlyMyNumber
put the millisecs - tStart into elapsed
   put elapsed  millisecs into msg
end toVariable

function quickSortAndFilter theData,myNumber
  set the itemDel to tab
  sort theData descending by (sortKeyTF(myNumber, item 4  of each))
  --   the sort is done, now keep only the myNumber lines
  repeat for each line myLine in theData
if item 4 of myLine is myNumber then
  put myLine  cr after myNewData
else
  exit repeat
end if
  end repeat
  return myNewData
end quickSortAndFilter

function sortKeyTF myNumber, thisItem4
   return (1*myNumber) = (1*thisItem4)
end sortKeyTF

 41 millisecs to do 1000 lines, 171 msecs to do 10,000 lines
-- Mac G5 dual  1.8 mHz

Not the speediest way, but more powerful/flexible than just the filter
command, and probably on par with a RegEx solution.

Jim Ault
Las Vegas

On 7/16/07 1:14 PM, Mark Schonewille [EMAIL PROTECTED]
wrote:

 Here's another solution. Really.
 
 function quickSortAndFilter theData,theItemNr,theFilter
set the itemDel to tab
sort lines of theData by (item theItemNr of each is not theFilter)
repeat for each line myLine in theData
  if item theItemNr of myLine is theFilter then
put myLine  cr after myNewData
  else
exit repeat
  end if
end repeat
return myNewData
 end quickSortAndFilter
 
 Let me add some explanation:
 - theData is tab-delimited data
 - theItemNr is the number of the column you want to search
 - theFilter is the string you want to search for
 


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Gregory Lypny


Thanks for your replies, Chris, Josh, Mark, and Richard,

You've confirmed what I suspected: that it will be tricky to get the  
filter command to do this.  This is especially true in my case  
because the data field has 56 columns by default but individual rows  
may have data appended later, and so may grow beyond 56.


I think I'll do what I used to do before the filter command was  
around, and that is to repeat an itemOffset over the data.  I tested  
the function below, which is much like Josh's, and it filters more  
than 300 lines in 1 tick; the same code as a message handler rather  
than a function takes 5 ticks; and the filter command on the same  
data takes 39 ticks.


Thanks once again,

Gregory


function filterByColumn theData,columnNum, theSearchString
  -- This function filters tab-delimited data by column
  if columnNum is an integer
  then
set the itemDelimiter to tab
repeat for each line thisLine in theData
  if item columnNum of thisLine is theSearchString
  then
put thisLine  return after filteredData
  end if
end repeat
delete the last character of filteredData
return filteredData
  else
return The column number must be an integer.
  end if
end filterByColumn

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Sarah Reichelt


I'm struggling with the filter command.  I have tab-delimited data in
a field, where each row has at least 56 columns.  I want to filter
the data based on a chosen number in that fourth column, for example,
to return all lines that have a 9 there.  The fourth column only
contains a number from 1 to 10.  The following handler works most of
the time.  I put three wild card expressions, lone asterisks
separated by tabs, to make sure nothing is being picked up in the
first three columns.  The tab and asterisk (and it could be just an
*) following myNumber says that it doesn't matter what is in the
columns after the fourth.

put fld Data of this card into tmpData
filter tmpData with *  tab  *  tab  *  tab  myNumber 
tab  *


I seem to remember an old thread where it was discovered that when you
start looking at complex filters like this, a repeat for each loop
ends up faster. Worth testing anyway.

Cheers,
Sarah
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Josh Mellicker



On Jul 16, 2007, at 2:04 PM, Gregory Lypny wrote:

I tested the function below, which is much like Josh's, and it  
filters more than 300 lines in 1 tick; the same code as a message  
handler rather than a function takes 5 ticks; and the filter  
command on the same data takes 39 ticks.



Wow, I thought that handler was fast, but never tested/compared it.

I wonder if sorting the lines by columnNum, then exiting on the first  
non-matching line would be faster or slower?






function filterByColumn theData,columnNum, theSearchString
  -- This function filters tab-delimited data by column
  if columnNum is an integer
  then
set the itemDelimiter to tab
repeat for each line thisLine in theData
  if item columnNum of thisLine is theSearchString
  then
put thisLine  return after filteredData
  end if
end repeat
delete the last character of filteredData
return filteredData
  else
return The column number must be an integer.
  end if
end filterByColumn

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Josh Mellicker

Also, I would suggest finding the exact ColumnNum as part of the  
handler- either from the header row, or a separate header variable.


If you don't, as the database evolves and columns are added (or  
deleted), then you'll have to go back and change all your hardwired  
column references... or change your data around to make the columns  
right, both big hassles...



On Jul 16, 2007, at 2:04 PM, Gregory Lypny wrote:


function filterByColumn theData,columnNum, theSearchString
  -- This function filters tab-delimited data by column
  if columnNum is an integer
  then
set the itemDelimiter to tab
repeat for each line thisLine in theData
  if item columnNum of thisLine is theSearchString
  then
put thisLine  return after filteredData
  end if
end repeat
delete the last character of filteredData
return filteredData
  else
return The column number must be an integer.
  end if
end filterByColumn

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2007-07-16 Thread Gregory Lypny


Thanks Josh, Mark, Jim, and Sarah,

Yep, I'll be going with a repeat loop.  I posted my version just  
after some of yours, where I also report that a repeat loop is seven  
to eight times faster than the filter command.  I'll try to  
incorporate some of the features of yours into mine.


Regards,

Gregory
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2005-12-10 Thread Gregory Lypny


Thanks, Ken.  Crystal clear.

Greg

On 10-Dec-05, at 8:52 AM, Ken Ray responded:


Message: 5
Date: Fri, 09 Dec 2005 23:06:07 -0600
From: Ken Ray [EMAIL PROTECTED]
Subject: Re: Filtering Columnar Data
To: Use Revolution List use-revolution@lists.runrev.com
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain;   charset=US-ASCII

On 12/9/05 10:26 PM, Gregory Lypny [EMAIL PROTECTED]  
wrote:



Not sure I understand, Ken.  Your first statement puts searchString
in the fifth column, while the second puts it in the fourth.  If I
changed mine to

filter theData with *  tab  searchString  tab  *

or

filter theData with *  tab  searchString  *

it would still find the string in the second column, but perhaps in
higher columns too because the pattern can be shifted right.


That's right. That's why to match a specific column, you need to  
include
*all* the columns in your filter command. So for a 5-column set of  
data, in
order to specify the second column and only the second column you'd  
do:


  filter theData with *  tab  searchString  tab  *  tab  \
*  tab  *

and to filter on only the fourth column of 5, it would be:

  filter theData with *  tab  *  tab  *  tab  \
   searchString  tab  *

The only issue is when you're matching the last column... you have  
to make

sure you *don't* put a * after the last column, so it would be:

  filter theData with *  tab  *  tab  *  tab  \
   *  tab  searchString

and not

  filter theData with *  tab  *  tab  *  tab  \
   *  tab  searchString  *


But perhaps repeat for each is a better choice for you...


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2005-12-09 Thread Richard Gaskin


Gregory Lypny wrote:

Hello everyone,

I use the filter command on tab-delimited text files when I want to  
pick off a string in a particular column.  For example, if the string  
is located in the second column of a five column file, I use


filter theData with *  tab  searchString  tab  *  tab  
*   tab  *  .


This, I assume, ensures that my hits don't include lines where the  
string appears in any other column.  It works like lightening when I  
search in any of the first four columns, but beyond that I get the  
dreaded spinning beach ball in Mac OS X (Tiger).  Is there a better way?


http://lists.runrev.com/pipermail/use-revolution/2005-July/062579.html


--
 Richard Gaskin
 Managing Editor, revJournal
 ___
 Rev tips, tutorials and more: http://www.revJournal.com
___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2005-12-09 Thread Ken Ray

On 12/9/05 4:19 PM, Gregory Lypny [EMAIL PROTECTED] wrote:

 Hello everyone,
 
 I use the filter command on tab-delimited text files when I want to
 pick off a string in a particular column.  For example, if the string
 is located in the second column of a five column file, I use
 
 filter theData with *  tab  searchString  tab  *  tab  *
  tab  *  .
 
 This, I assume, ensures that my hits don't include lines where the
 string appears in any other column.  It works like lightening when I
 search in any of the first four columns, but beyond that I get the
 dreaded spinning beach ball in Mac OS X (Tiger).  Is there a better way?

Yes, the problem (I think) is that you probably have the * after the last
column - I just worked with this today, and ran into the same problem. If
you remove the last asterisk you should be fine:

*  tab  *  tab  *  tab  *  tab  searchString

whereas for column 4 it would be:

*  tab  *  tab  *  tab  searchString  tab  *

Hope this helps,

Ken Ray
Sons of Thunder Software
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2005-12-09 Thread Gregory Lypny

A, thanks for the insight, Richard.  The repeat for each form  
of the loop control structure is faster than the filter command.  A  
quick test showed that I can cut my time by about 150 milliseconds  
per 800 lines of data.  Considering that I have some 4,000 files with  
thousands of lines per file, it'll make a real difference.


Greg Lypny

Associate Professor of Finance
John Molson School of Business
Concordia University
Montreal, Canada


Richard Gaskin responded concisely with the link below my original post:

On 9-Dec-05, at 7:42 PM, [EMAIL PROTECTED] wrote:


Gregory Lypny wrote:

Hello everyone,

I use the filter command on tab-delimited text files when I  
want to

pick off a string in a particular column.  For example, if the string
is located in the second column of a five column file, I use

filter theData with *  tab  searchString  tab  *  tab 
*   tab  *  .

This, I assume, ensures that my hits don't include lines where the
string appears in any other column.  It works like lightening when I
search in any of the first four columns, but beyond that I get the
dreaded spinning beach ball in Mac OS X (Tiger).  Is there a  
better way?


http://lists.runrev.com/pipermail/use-revolution/2005-July/ 
062579.html


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2005-12-09 Thread Gregory Lypny

Not sure I understand, Ken.  Your first statement puts searchString  
in the fifth column, while the second puts it in the fourth.  If I  
changed mine to


filter theData with *  tab  searchString  tab  *

or

filter theData with *  tab  searchString  *

it would still find the string in the second column, but perhaps in  
higher columns too because the pattern can be shifted right.


Greg


On 9-Dec-05, at 7:42 PM, Ken Ray wrote:



On 12/9/05 4:19 PM, Gregory Lypny [EMAIL PROTECTED]  
wrote:



Hello everyone,

I use the filter command on tab-delimited text files when I want to
pick off a string in a particular column.  For example, if the string
is located in the second column of a five column file, I use

filter theData with *  tab  searchString  tab  *  tab  *
 tab  *  .

This, I assume, ensures that my hits don't include lines where the
string appears in any other column.  It works like lightening when I
search in any of the first four columns, but beyond that I get the
dreaded spinning beach ball in Mac OS X (Tiger).  Is there a  
better way?


Yes, the problem (I think) is that you probably have the * after  
the last
column - I just worked with this today, and ran into the same  
problem. If

you remove the last asterisk you should be fine:

*  tab  *  tab  *  tab  *  tab  searchString

whereas for column 4 it would be:

*  tab  *  tab  *  tab  searchString  tab  *

Hope this helps,

Ken Ray
Sons of Thunder Software
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

2005-12-09 Thread Ken Ray

On 12/9/05 10:26 PM, Gregory Lypny [EMAIL PROTECTED] wrote:

 Not sure I understand, Ken.  Your first statement puts searchString
 in the fifth column, while the second puts it in the fourth.  If I
 changed mine to
 
 filter theData with *  tab  searchString  tab  *
 
 or
 
 filter theData with *  tab  searchString  *
 
 it would still find the string in the second column, but perhaps in
 higher columns too because the pattern can be shifted right.

That's right. That's why to match a specific column, you need to include
*all* the columns in your filter command. So for a 5-column set of data, in
order to specify the second column and only the second column you'd do:

  filter theData with *  tab  searchString  tab  *  tab  \
*  tab  *

and to filter on only the fourth column of 5, it would be:

  filter theData with *  tab  *  tab  *  tab  \
   searchString  tab  *

The only issue is when you're matching the last column... you have to make
sure you *don't* put a * after the last column, so it would be:

  filter theData with *  tab  *  tab  *  tab  \
   *  tab  searchString

and not 
 
  filter theData with *  tab  *  tab  *  tab  \
   *  tab  searchString  *


But perhaps repeat for each is a better choice for you...


Ken Ray
Sons of Thunder Software
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

Re: Filtering Columnar Data

24 matches

Site Navigation

Mail list logo

Footer information