RFC: Optimizing stripDup()

2001-11-02 Thread diskot123

>(Note that this implementation has chosen to force every key to have a
>value.   That's because it relies on empty to indicate that a key hasn't
>been defined yet.  If there was an atomic test for 'is-a-key' (is there?)
>if someKey is among the lines of the keys of myArray
Actually, it would probably be a lot faster to force every key to have a
dummy value. If you have 10,000,000 keys and do a lookup, the lookup should
still be extremely fast and take roughly constant time for any key (about 0
milliseconds), because an array is a hash (i.e. an associative array): MC
either finds the key or it doesn't. 'is among', on the other hand, forces MC
to build the list of keys and then match it line by line until it finds the
key, and with 10,000,000 keys that could take a while.

I agree, though, that it would be useful for MetaCard to return something
other than empty when a key can't be found, to distinguish an empty value
from a nonexistent key.

Tuviah
http://members.aol.com/tuvsnyder/
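A minimal sketch of the dummy-value approach Tuviah describes, assuming, as he says, that array lookups are hash-based. The function countKnownKeys and its parameters are hypothetical. Every key gets the dummy value true, so an element that comes back empty can only mean "not a key", and each test is a single array access instead of a scan of the keys list.

```livecode
function countKnownKeys pKeyList, pProbeList
  -- hypothetical helper: counts how many lines of pProbeList are keys
  -- built from pKeyList (both lists are return-delimited)
  repeat for each line tKey in pKeyList
    -- force every key to have a dummy value, so empty means "not a key"
    put true into tKnown[tKey]
  end repeat
  put 0 into tHits
  repeat for each line tProbe in pProbeList
    -- one array access per probe; the keys list is never built or scanned
    if tKnown[tProbe] is true then add 1 to tHits
  end repeat
  return tHits
end countKnownKeys
```

For example, countKnownKeys("a" & cr & "b", "a" & cr & "c" & cr & "b") returns 2.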



Archives: http://www.mail-archive.com/metacard@lists.runrev.com/
Info: http://www.xworlds.com/metacard/mailinglist.htm
Please send bug reports to <[EMAIL PROTECTED]>, not this list.




Re: RFC: Optimizing stripDup()

2001-11-01 Thread Jeanne A. E. DeVoto

At 2:56 AM -0800 11/1/2001, Ben Rubinstein wrote:
>(Note that this implementation has chosen to force every key to have a
>value.   That's because it relies on empty to indicate that a key hasn't
>been defined yet.  If there was an atomic test for 'is-a-key' (is there?)

How about
 if someKey is among the lines of the keys of myArray
?

--
Jeanne A. E. DeVoto ~ [EMAIL PROTECTED]
http://www.runrev.com/
Runtime Revolution Limited - Power to the Developer!
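Jeanne's test can be wrapped up as follows. keyIsDefined is a hypothetical name, and the sketch assumes that putting empty into an element still creates that key (as it does in later Revolution/LiveCode releases). It stores empty values on purpose, to show that the keys test separates "defined but empty" from "never defined", which a plain "is empty" check cannot do. It builds and scans the whole keys list on each call, so it suits occasional checks rather than tight loops.

```livecode
function keyIsDefined pKeyList, pKey
  -- hypothetical helper: builds an array whose keys are the lines of
  -- pKeyList, deliberately storing empty values
  repeat for each line tKey in pKeyList
    put empty into tArray[tKey]
  end repeat
  -- the keys test tells a defined-but-empty element apart from one that
  -- was never defined, at the cost of building and scanning the keys list
  return pKey is among the lines of the keys of tArray
end keyIsDefined
```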







Re: RFC: Optimizing stripDup()

2001-11-01 Thread Ben Rubinstein

on 1/11/01 6:11 AM, LiangTyan Fui at [EMAIL PROTECTED] wrote:

> 
> The browser constructs the the following string and pass onto the server:
> http://localhost/cgi-bin/echo.mt?option=1&option=2&option=3
> 
> Suppose you are using a split command to parse the query string in the
> echo.mt script:
> 
> put $QUERY_STRING into x
> split x by "&" and "="
> put keys(x) into xKeys
> 
> you'll get only "option" in xKeys, and x["option"] gives you "3".
> 
> Hmm. should I bug report this?

It's not a bug - split is behaving correctly according to the spec - it's
just that this spec makes it tantalisingly close to a handy parser for URL
strings, but not quite right.  But the thing to do is use this as a cue to
define the spec you do want for that purpose, and implement it - eg:

function splitQueryString qs --> array
  set the itemdelimiter to "&"
  repeat for each item p in qs
    -- turn the first "=" into a space so word 1 is the name and the rest
    -- is the value (or use split in 2.4); only when p contains an "="
    if "=" is in p then put space into char offset("=", p) of p
    put urlDecode(word 1 of p) into k
    put urlDecode(word 2 to -1 of p) into v
    --
    -- may or may not want this, depending on what you're doing
    if v = empty then put true into v
    --
    -- a repeated name (option=1&option=2) collects its values, comma-separated
    get formdata[k]
    if it <> empty then put it & "," before v
    put v into formdata[k]
  end repeat
  return formdata
end splitQueryString

This will handle the multi-option case, and is also neater because it's
specialised for handling query strings - so it can do the URL-decoding.  If
you had a parallel function that parsed data 'POST'ed instead of sent in the
query string, then you could isolate the encoding issue - call one function
or the other, and either way end up with an array of parameters and values.

(Note that this implementation has chosen to force every key to have a
value.   That's because it relies on empty to indicate that a key hasn't
been defined yet.  If there was an atomic test for 'is-a-key' (is there?)
one could use that - or if you really want it, go the extra mile and keep a
second array to show whether a key has been encountered already.)

  Ben Rubinstein   |  Email: [EMAIL PROTECTED]
  Cognitive Applications Ltd   |  Phone: +44 (0)1273-821600
  http://www.cogapp.com   |  Fax  : +44 (0)1273-728866
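A sketch of the "extra mile" Ben mentions: a second array (tSeen here) records which names have been encountered, so an empty value can stay empty instead of being replaced with true. The function name splitQueryStringTracked is hypothetical; the parsing steps mirror Ben's, and like his version it assumes the function can return an array.

```livecode
function splitQueryStringTracked qs --> array
  -- hypothetical variant: tSeen marks names already encountered, so
  -- formdata no longer needs non-empty values to detect repeats
  set the itemDelimiter to "&"
  repeat for each item p in qs
    -- first "=" becomes a space: word 1 is the name, the rest the value
    if "=" is in p then put space into char offset("=", p) of p
    put urlDecode(word 1 of p) into k
    put urlDecode(word 2 to -1 of p) into v
    if tSeen[k] is true then
      -- name already seen: append, even if this value is empty
      put formdata[k] & "," & v into formdata[k]
    else
      -- first occurrence: remember it and keep the value exactly as sent
      put true into tSeen[k]
      put v into formdata[k]
    end if
  end repeat
  return formdata
end splitQueryStringTracked
```

With "option=1&option=2&flag=" this should give "1,2" for option and an empty, but still defined, flag element.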







Re: RFC: Optimizing stripDup()

2001-11-01 Thread LiangTyan Fui

On 11/1/01 3:46 PM, andu wrote:

> LiangTyan Fui wrote:
>> 
>> The browser constructs the the following string and pass onto the server:
>> http://localhost/cgi-bin/echo.mt?option=1&option=2&option=3
>> 
>> Suppose you are using a split command to parse the query string in the
>> echo.mt script:
>> 
>> put $QUERY_STRING into x
>> split x by "&" and "="
>> put keys(x) into xKeys
>> 
>> you'll get only "option" in xKeys, and x["option"] gives you "3".
>> 
>> Hmm. should I bug report this?
> 
> Would you like to have?
> x["option"]=1
> and
> x["option"]=2
> and
> x["option"]=3 ?

I would prefer:
x["option"] ="1"& cr &"2"& cr &"3"

or
x["option"] ="1"& the itemdel &"2"& the itemdel &"3"

Regards,
LiangTyan Fui








Re: RFC: Optimizing stripDup()

2001-10-31 Thread andu

LiangTyan Fui wrote:
> 
> The browser constructs the the following string and pass onto the server:
> http://localhost/cgi-bin/echo.mt?option=1&option=2&option=3
> 
> Suppose you are using a split command to parse the query string in the
> echo.mt script:
> 
> put $QUERY_STRING into x
> split x by "&" and "="
> put keys(x) into xKeys
> 
> you'll get only "option" in xKeys, and x["option"] gives you "3".
> 
> Hmm. should I bug report this?

Would you like to have?
x["option"]=1
and
x["option"]=2
and
x["option"]=3 ?

I'd say it's normal behavior to retain only the last given value.


Andu
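andu's point is easy to check. This sketch (splitValueOf is a hypothetical name; split needs MetaCard 2.4) parses a query string the same way as LiangTyan's echo.mt example. Because split makes one element per name, each repeated name overwrites the previous one; going by LiangTyan's report, asking for "option" from option=1&option=2&option=3 gives 3.

```livecode
function splitValueOf pQuery, pName
  -- split makes one element per name; a repeated name is overwritten,
  -- so only the value from its last occurrence survives
  split pQuery by "&" and "="
  return pQuery[pName]
end splitValueOf
```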





Re: RFC: Optimizing stripDup()

2001-10-31 Thread LiangTyan Fui

On 10/29/01 11:09 PM, eugen helbling wrote:

> Hi LiangTyan,
> just to demonstrate that the new "split"(I like them) can do in case of
> duplicate removing.
> 
> for example you have a field/variable  having something like
> "nameA,firstnameB" & cr & "nameA,firstnameC"
> in it, you can use "split" handler to select names witout duplicates.
> 
> get "nameA,firstnameB" & cr & "nameA,firstnameC"
> split it by return and comma
> get the keys of it   -- in it you have "nameA" only one time

This could be a good trick for eliminating duplicate records, but I realised
it could be a problem for URL parsing.
Suppose you submit a series of check boxes or radio buttons from a
browser:

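<html><head><title>MetaCard split test</title></head><body>
<form action="http://localhost/cgi-bin/echo.mt" method="get">
<input type="checkbox" name="option" value="1"> Option 1
<input type="checkbox" name="option" value="2"> Option 2
<input type="checkbox" name="option" value="3"> Option 3
<input type="submit">
</form>
</body></html>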

The browser constructs the following string and passes it on to the server:
http://localhost/cgi-bin/echo.mt?option=1&option=2&option=3

Suppose you are using a split command to parse the query string in the
echo.mt script:

put $QUERY_STRING into x
split x by "&" and "="
put keys(x) into xKeys

you'll get only "option" in xKeys, and x["option"] gives you "3".

Hmm. should I bug report this?

Regards,
LiangTyan Fui








Re: RFC: Optimizing stripDup()

2001-10-29 Thread eugen helbling

Hi LiangTyan,
just to demonstrate what the new "split" (I like it) can do when it comes to
removing duplicates.

For example, if you have a field or variable containing something like
"nameA,firstnameB" & cr & "nameA,firstnameC"
you can use the "split" command to select the names without duplicates.

get "nameA,firstnameB" & cr & "nameA,firstnameC"
split it by return and comma
get the keys of it   -- in it you have "nameA" only one time

regards
eugen
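eugen's trick wrapped in a function, as a sketch (uniqueFirstItems is a hypothetical name; split needs MetaCard 2.4). It removes duplicate names, but the keys of an array come back in no guaranteed order, so the original sequence of the records is lost. That is why it doesn't meet the stripDup requirement on its own; here the keys are simply sorted to make the result predictable.

```livecode
function uniqueFirstItems pList
  -- each line "name,firstname" becomes an element keyed by "name",
  -- so repeated names collapse into a single key
  split pList by return and comma
  put the keys of pList into tNames
  -- key order is not guaranteed, so the original sequence is lost;
  -- sort alphabetically to get a predictable result
  sort lines of tNames
  return tNames
end uniqueFirstItems
```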



-- 
Eugen Helbling
_

   GINIT Technology GmbH      [EMAIL PROTECTED]
   Eugen Helbling             www.ginit-technology.com
   Emmy-Noether-Str. 11       phone: +49-721-96681-0
   D-76131 Karlsruhe          fax:   +49-721-96681-111





Re: RFC: Optimizing stripDup()

2001-10-28 Thread Michael J. Lew

This version is about 200 times faster than the original for a list 
of 400 words:

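function stripDup theList,theitemDel
   if theItemDel is empty then put cr into theItemDel -- default to lines
   put char 1 of theItemDel into theItemDel
   set the itemDelimiter to theItemDel
   repeat for each item theItem in theList
      -- the array remembers every item already seen
      if itemArray[theItem] is empty then
         put 1 into itemArray[theItem]
         put theItemDel & theItem after doneList
      end if
   end repeat
   delete char 1 of doneList --a leading delimiter char
   return doneList
end stripDup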
-- 
Michael J. Lew

Senior Lecturer
Department of Pharmacology
The University of Melbourne
Parkville 3010
Victoria
Australia

Phone +613 8344 8304

**
New email address: [EMAIL PROTECTED]
**





Re: RFC: Optimizing stripDup()

2001-10-26 Thread LiangTyan Fui

On 10/27/01 2:34 AM, Jacqueline Landman Gay wrote:

> Geoff Canyon wrote:
>> 
>> Maybe something like:
>> 
> 
> Geoff's is faster than mine. I never thought to use arrays. Pretty impressive.

Yup, the winner is Geoff. Thanks!
Now, where should I post this script?
Do we have an organised xTalk library somewhere, or should I use
http://www.mctools.org?

Regards,
LiangTyan Fui








Re: RFC: Optimizing stripDup()

2001-10-26 Thread Jacqueline Landman Gay

LiangTyan Fui wrote:
> 
> Here is a function that I've written quite some time ago. It takes a list of
> text theList, separated by theitemDel, remove duplicate items in the list,
> and returns a new list without duplicate items.
> Unfortunately, this function running rather slowly on a large list (a few
> thousands records) - that is why I am posting here as a little "open
> source", Request For Comment: Make it faster guys!
> Remember, the sequence of the records cannot be changed, that say you are
> not likely to use sort.

On a test field of almost 8,000 lines, the original script takes 112
ticks on my Mac, compared to 8 ticks for this version:
 
function stripDup theList,theitemDel
  # verify param
  -- put the ticks into startticks -- Enable for time tests
  if theList= "" then return ""
  if theitemDel = "" then put cr into theitemDel
  put char 1 of theitemDel into theitemDel
  set the itemDelimiter to theitemDel
  put "" into theList2
  set the wholeMatches to true
  repeat for each item i in theList
    -- keep i only if it is not already an item of the result
    if itemOffset(i, theList2) = 0 then put i & theItemDel after theList2
  end repeat
  delete last char of theList2 -- remove the trailing delimiter
  -- put the ticks - startticks -- Enable for time tests
  return theList2
end stripDup

-- 
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com





Re: RFC: Optimizing stripDup()

2001-10-26 Thread Jacqueline Landman Gay

Geoff Canyon wrote:
> 
> Maybe something like:
> 

Geoff's is faster than mine. I never thought to use arrays. Pretty impressive.

-- 
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software   | http://www.hyperactivesw.com





Re: RFC: Optimizing stripDup()

2001-10-26 Thread Geoff Canyon

Maybe something like:

function stripDup theList,theitemDel
  # verify param
  if theList= "" then return ""
  if theitemDel = "" then put cr into theitemDel
  
  # use a single-character delimiter so "char 1 to -2" strips it exactly
  put char 1 of theitemDel into theitemDel
  set the itemDelimiter to theitemDel
  put empty into tResultList

  repeat for each item tItem in theList
    if tItemList[tItem] is empty then
      -- this is much faster than:
      -- if tItem is not among the items of tResultList then
      -- associative arrays rock
      put 1 into tItemList[tItem]
      put tItem & theitemDel after tResultList
    end if
  end repeat
  return (char 1 to -2 of tResultList) --remove final delimiter
end stripDup

In my testing, this finds the unique words, in order, in an 8,000-word text file in
about 0.05 seconds. Again, associative arrays rock. The repeat for each... structure
does too.

gc






Re: RFC: Optimizing stripDup()

2001-10-26 Thread Yennie

I don't know if this is the optimal solution, but there *is* a way you could
use sort to speed things up: first number the items, then sort them by value
(so duplicates end up next to each other), remove the dups, then sort again
by their number to restore the original order. Finally, strip the numbers.
It's not the simplest, but it should be faster for large data than checking
every item against every other.

HTH,
Brian
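A sketch of Brian's number / sort / dedupe / re-sort idea. stripDupBySort is a hypothetical name, and it assumes no item contains a tab or return, since both are used as internal separators. Rather than relying on sort keeping equal values in their original order, it keeps the smallest position in each run of duplicates, so the first occurrence always survives. Items are grouped by a text sort, so values that only compare equal numerically (such as 01 and 1) may not land next to each other; treat it as a starting point rather than a drop-in replacement.

```livecode
function stripDupBySort theList, theItemDel
  -- sketch of the number / sort / dedupe / re-sort idea; assumes no item
  -- contains a tab or a return, since both are used as internal separators
  if theList is empty then return empty
  if theItemDel is empty then put cr into theItemDel
  put char 1 of theItemDel into theItemDel
  set the itemDelimiter to theItemDel

  -- 1. number the items: one "value<tab>position" line per item
  put 0 into tPos
  repeat for each item tItem in theList
    add 1 to tPos
    put tItem & tab & tPos & cr after tNumbered
  end repeat
  delete last char of tNumbered

  -- 2. sort by value so that duplicates become adjacent
  set the itemDelimiter to tab
  sort lines of tNumbered by item 1 of each

  -- 3. collapse each run of equal values to one "position<tab>value" line,
  --    keeping the earliest position in the run
  put false into tInRun
  repeat for each line tLine in tNumbered
    if tInRun and item 1 of tLine is tValue then
      put min(tFirst, item 2 of tLine) into tFirst
    else
      if tInRun then put tFirst & tab & tValue & cr after tKept
      put item 1 of tLine into tValue
      put item 2 of tLine into tFirst
      put true into tInRun
    end if
  end repeat
  put tFirst & tab & tValue after tKept

  -- 4. sort again by position to restore the original order
  sort lines of tKept numeric by item 1 of each

  -- 5. strip the numbers and rejoin with the original delimiter
  repeat for each line tLine in tKept
    put item 2 to -1 of tLine & theItemDel after tResult
  end repeat
  delete last char of tResult
  return tResult
end stripDupBySort
```

For example, stripDupBySort("b,a,b,c,a", ",") should return "b,a,c".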







RFC: Optimizing stripDup()

2001-10-26 Thread LiangTyan Fui

Here is a function that I wrote quite some time ago. It takes a list of
text, theList, separated by theitemDel, removes duplicate items from the
list, and returns a new list without duplicate items.
Unfortunately, this function runs rather slowly on a large list (a few
thousand records), which is why I am posting it here as a little "open
source" Request For Comment: make it faster, guys!
Remember that the sequence of the records cannot be changed, so you are
not likely to use sort.

Regards,
LiangTyan Fui

#

function stripDup theList,theitemDel
  # verify param
  if theList= "" then return ""
  if theitemDel = "" then put cr into theitemDel
  
  # use only the first char of theitemDel as the item delimiter
  put char 1 of theitemDel into theitemDel
  set the itemDelimiter to theitemDel

  put the number of items of theList into theListItem
  put 1 into k
  repeat
    put item k of theList into key1
    # compare with every later item, deleting matches from the end
    repeat with c = theListItem down to k+1
      if key1 = item c of theList then
        delete item c of theList
        subtract 1 from theListItem
      end if
    end repeat
    if k >= theListItem then exit repeat
    add 1 to k
  end repeat
  return theList
end stripDup

