RFC: Optimizing stripDup()
>>(Note that this implementation has chosen to force every key to have a
>>value. That's because it relies on empty to indicate that a key hasn't
>>been defined yet. If there was an atomic test for 'is-a-key' (is there?)
>
>if someKey is among the lines of the keys of myArray

Actually, it would probably be a lot faster to force every key to have a dummy value. Even with 10,000,000 keys, an array lookup should be very fast and take roughly constant time for any key, because an array is a hash table (i.e. an associative array): MC either finds the key or it doesn't. 'is among', on the other hand, forces MC to build the list of keys and then compare against each line until it finds a match, so with 10,000,000 keys it could take a while.

I agree, though, that it would be useful for MetaCard to return something other than empty when a key can't be found, to distinguish an empty value from a nonexistent key.

Tuviah
http://members.aol.com/tuvsnyder/

Archives: http://www.mail-archive.com/metacard@lists.runrev.com/
Info: http://www.xworlds.com/metacard/mailinglist.htm
Please send bug reports to <[EMAIL PROTECTED]>, not this list.
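To make the lookup-cost argument concrete, here is a small sketch in Python rather than MetaTalk. Dicts stand in for MetaCard arrays, and the function names are hypothetical. `key in arr` is a hash lookup. The second function mimics building `the keys of` and scanning it line by line. The last function uses a sentinel default to separate "empty value" from "no such key", which is the distinction asked for above.

```python
# Python analogue (not MetaTalk) of the two membership tests discussed above.
values = {"alpha": "1", "beta": ""}   # "beta" exists but holds an empty value

def is_key_hashed(arr, key):
    # Hash lookup: average-case constant time, independent of len(arr).
    return key in arr

def is_key_scanned(arr, key):
    # Like "key is among the lines of the keys of arr": first build the
    # return-delimited key list, then compare line by line (linear time).
    return key in "\n".join(arr).split("\n")

def lookup(arr, key):
    # A plain array read in MetaCard yields empty for a missing key, so
    # "beta" and a missing key look the same; a None sentinel separates them.
    return arr.get(key, None)

result = (lookup(values, "beta"), lookup(values, "gamma"))  # ("", None)
```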
Re: RFC: Optimizing stripDup()
At 2:56 AM -0800 11/1/2001, Ben Rubinstein wrote:
>(Note that this implementation has chosen to force every key to have a
>value. That's because it relies on empty to indicate that a key hasn't
>been defined yet. If there was an atomic test for 'is-a-key' (is there?)

How about

  if someKey is among the lines of the keys of myArray

?

--
Jeanne A. E. DeVoto ~ [EMAIL PROTECTED]
http://www.runrev.com/
Runtime Revolution Limited - Power to the Developer!
Re: RFC: Optimizing stripDup()
on 1/11/01 6:11 AM, LiangTyan Fui at [EMAIL PROTECTED] wrote:

> The browser constructs the following string and passes it on to the server:
> http://localhost/cgi-bin/echo.mt?option=1&option=2&option=3
>
> Suppose you are using a split command to parse the query string in the
> echo.mt script:
>
> put $QUERY_STRING into x
> split x by "&" and "="
> put keys(x) into xKeys
>
> you'll get only "option" in xKeys, and x["option"] gives you "3".
>
> Hmm. should I bug report this?

It's not a bug: split is behaving correctly according to the spec. It's just that this spec makes it tantalisingly close to a handy parser for URL strings, but not quite right. The thing to do is use this as a cue to define the spec you do want for that purpose, and implement it, e.g.:

function splitQueryString qs --> array
  set the itemdelimiter to "&"
  repeat for each item p in qs
    put space into char offset("=", p) of p -- or use split in 2.4
    put urlDecode(word 1 of p) into k
    put urlDecode(word 2 to -1 of p) into v
    --
    -- may or may not want this, depending on what you're doing
    if v = empty then put true into v
    --
    get formdata[k]
    if it <> empty then put it & "," before v
    put v into formdata[k]
  end repeat
  return formdata
end splitQueryString

This will handle the multi-option case, and it is also neater because it's specialised for handling query strings, so it can do the URL-decoding. If you had a parallel function that parsed data 'POST'ed instead of sent in the query string, you could isolate the encoding issue: call one or other function, and either way end up with an array of parameters and values.

(Note that this implementation has chosen to force every key to have a value. That's because it relies on empty to indicate that a key hasn't been defined yet. If there was an atomic test for 'is-a-key' (is there?) one could use that, or, if you really want it, go the extra mile and keep a second array to show whether a key has been encountered already.)
Ben Rubinstein | Email: [EMAIL PROTECTED]
Cognitive Applications Ltd | Phone: +44 (0)1273-821600
http://www.cogapp.com | Fax : +44 (0)1273-728866
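A Python sketch of the same multi-value policy may help when comparing behaviours. It joins repeated keys with a comma and treats a bare key as "true". `split_query_string` and its `joiner` parameter are hypothetical names. Passing "\n" as the joiner gives the return-delimited form suggested later in the thread. The last line shows the standard library's `urllib.parse.parse_qs`, which keeps every value in a list instead.

```python
from urllib.parse import parse_qs, unquote_plus

def split_query_string(qs, joiner=","):
    """Hypothetical Python analogue of the splitQueryString handler:
    repeated keys keep all their values, joined by `joiner`."""
    formdata = {}
    for pair in qs.split("&"):
        if not pair:
            continue                        # skip empty chunks such as "a=1&&b=2"
        key, _, value = pair.partition("=")
        key, value = unquote_plus(key), unquote_plus(value)
        if value == "":
            value = "true"                  # a bare key counts as a flag
        if key in formdata:
            formdata[key] = formdata[key] + joiner + value
        else:
            formdata[key] = value
    return formdata

# The standard library alternative keeps each value as a list element.
all_values = parse_qs("option=1&option=2&option=3")  # {"option": ["1", "2", "3"]}
```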
Re: RFC: Optimizing stripDup()
On 11/1/01 3:46 PM, andu wrote:

> LiangTyan Fui wrote:
>> Suppose you are using a split command to parse the query string in the
>> echo.mt script:
>>
>> put $QUERY_STRING into x
>> split x by "&" and "="
>> put keys(x) into xKeys
>>
>> you'll get only "option" in xKeys, and x["option"] gives you "3".
>>
>> Hmm. should I bug report this?
>
> Would you like to have x["option"]=1 and x["option"]=2 and x["option"]=3?

I would prefer:

x["option"] = "1" & cr & "2" & cr & "3"

or

x["option"] = "1" & the itemDel & "2" & the itemDel & "3"

Regards,
LiangTyan Fui

> I'd say it's normal behavior to retain only the last given value.
Re: RFC: Optimizing stripDup()
LiangTyan Fui wrote:
> Suppose you are using a split command to parse the query string in the
> echo.mt script:
>
> put $QUERY_STRING into x
> split x by "&" and "="
> put keys(x) into xKeys
>
> you'll get only "option" in xKeys, and x["option"] gives you "3".
>
> Hmm. should I bug report this?

Would you like to have x["option"]=1 and x["option"]=2 and x["option"]=3?

I'd say it's normal behavior to retain only the last given value.

Andu
Re: RFC: Optimizing stripDup()
On 10/29/01 11:09 PM, eugen helbling wrote:

> Hi LiangTyan,
> just to demonstrate what the new "split" (I like them) can do in the case
> of duplicate removal:
>
> for example, if you have a field/variable with something like
> "nameA,firstnameB" & cr & "nameA,firstnameC"
> in it, you can use the "split" command to select names without duplicates.
>
> get "nameA,firstnameB" & cr & "nameA,firstnameC"
> split it by return and comma
> get the keys of it -- in it you have "nameA" only one time

This could be a good trick for eliminating duplicate records, but I realised it could be a problem for URL parsing. Suppose you submit a series of check boxes or radio buttons from a browser:

<html><head><title>MetaCard split test</title></head><body>
<form action="http://localhost/cgi-bin/echo.mt" method="get">
<input type="checkbox" name="option" value="1"> Option 1<br>
<input type="checkbox" name="option" value="2"> Option 2<br>
<input type="checkbox" name="option" value="3"> Option 3<br>
<input type="submit">
</form>
</body></html>

The browser constructs the following string and passes it on to the server:
http://localhost/cgi-bin/echo.mt?option=1&option=2&option=3

Suppose you are using a split command to parse the query string in the echo.mt script:

put $QUERY_STRING into x
split x by "&" and "="
put keys(x) into xKeys

you'll get only "option" in xKeys, and x["option"] gives you "3".

Hmm. should I bug report this?

Regards,
LiangTyan Fui

> regards
> eugen
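The behaviour being reported can be reproduced with a rough Python emulation of `split x by "&" and "="`. It assumes, as the observation above suggests, that the text before the secondary delimiter in each chunk becomes a key and that a later assignment to the same key overwrites an earlier one. `split_like_metacard` is a hypothetical name, not a MetaCard function.

```python
def split_like_metacard(text, primary, secondary):
    """Rough emulation (an assumption, not MetaCard's source) of
    `split x by primary and secondary`: the last value for a key wins."""
    arr = {}
    for chunk in text.split(primary):
        key, _, value = chunk.partition(secondary)
        arr[key] = value        # a repeated key overwrites the earlier value
    return arr

x = split_like_metacard("option=1&option=2&option=3", "&", "=")
# list(x) == ["option"] and x["option"] == "3", as described above
```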
Re: RFC: Optimizing stripDup()
Hi LiangTyan,

just to demonstrate what the new "split" (I like them) can do in the case of duplicate removal:

for example, if you have a field/variable with something like

"nameA,firstnameB" & cr & "nameA,firstnameC"

in it, you can use the "split" command to select names without duplicates:

get "nameA,firstnameB" & cr & "nameA,firstnameC"
split it by return and comma
get the keys of it -- in it you have "nameA" only one time

regards
eugen

--
Eugen Helbling
GINIT Technology GmbH
Emmy-Noether-Str. 11
D-76131 Karlsruhe
phone: +49-721-96681-0
fax: +49-721-96681-111
www.ginit-technology.com
[EMAIL PROTECTED]
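The split/keys trick collapses records that share a first field, but it keeps only the key, and the keys of an array are not necessarily returned in the original order. If the goal is to keep whole records in their original order, a set of already-seen keys does the job. Below is a hypothetical Python sketch. Note that it keeps the first record for each name, whereas the split trick keeps the last value.

```python
def dedupe_by_first_field(lines, field_delim=","):
    """Keep the first whole record for each distinct first field, in order.
    Hypothetical helper; not part of MetaCard."""
    seen = set()
    kept = []
    for line in lines:
        key = line.split(field_delim, 1)[0]   # e.g. "nameA" from "nameA,firstnameB"
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```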
Re: RFC: Optimizing stripDup()
This version is about 200 times faster than the original for a list of 400 words:

function stripDup theList,theitemDel
  if theItemDel is empty then put cr into theItemDel -- same default as the original
  set the itemDelimiter to theItemDel
  repeat for each item theItem in theList
    if itemArray[theItem] is empty then
      put 1 into itemArray[theItem]
      put theItemDel & theItem after doneList
    end if
  end repeat
  delete char 1 of doneList --a leading delimiter char
  return doneList
end stripDup

--
Michael J. Lew

Senior Lecturer
Department of Pharmacology
The University of Melbourne
Parkville 3010
Victoria
Australia

Phone +613 8344 8304

** New email address: [EMAIL PROTECTED] **
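The same hash-table idea is sketched here in Python for readers comparing approaches. A set plays the role of `itemArray`, and `strip_dup` is an illustrative name. Because the kept items are joined only at the end, there is no leading or trailing delimiter to trim.

```python
def strip_dup(the_list, item_del="\n"):
    """Order-preserving duplicate removal using a hash set (Python sketch)."""
    if the_list == "":
        return ""
    item_del = item_del[:1] or "\n"   # only the first character is used; default return
    seen = set()
    kept = []
    for item in the_list.split(item_del):
        if item not in seen:          # average constant-time membership test
            seen.add(item)
            kept.append(item)
    return item_del.join(kept)
```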
Re: RFC: Optimizing stripDup()
On 10/27/01 2:34 AM, Jacqueline Landman Gay wrote:

> Geoff Canyon wrote:
>>
>> Maybe something like:
>>
>
> Geoff's is faster than mine. I never thought to use arrays. Pretty impressive.

Yup, the winner is Geoff. Thanks!

Now, where should I post this script? Do we have an organised xTalk library somewhere, or should I use http://www.mctools.org?

Regards,
LiangTyan Fui
Re: RFC: Optimizing stripDup()
LiangTyan Fui wrote:
>
> Here is a function that I wrote quite some time ago. It takes a list of
> text, theList, separated by theitemDel, removes duplicate items from the
> list, and returns a new list without duplicate items.
> Unfortunately, this function runs rather slowly on a large list (a few
> thousand records), which is why I am posting it here as a little "open
> source" Request For Comment: make it faster, guys!
> Remember, the sequence of the records cannot be changed, which means you
> are not likely to use sort.

On a test field of almost 8,000 lines, the original script takes 112 ticks on my Mac, compared to 8 ticks for this version:

function stripDup theList,theitemDel
  # verify param
  -- put the ticks into startticks -- Enable for time tests
  if theList= "" then return ""
  if theitemDel = "" then put cr into theitemDel
  put char 1 of theitemDel into theitemDel
  set the itemDelimiter to theitemDel
  put "" into theList2
  set the wholeMatches to true
  repeat for each item i in theList
    if itemOffset(i, theList2) = 0 then put i & theItemDel after theList2
  end repeat
  delete last char of theList2 -- drop the trailing delimiter
  -- put the ticks - startticks -- Enable for time tests
  return theList2
end stripDup

--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com
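This version is faster because it searches only the output built so far, not the whole remaining input. It is still not the fastest, because each `itemOffset` test is a linear scan. Here is a Python analogue, with list membership standing in for `itemOffset`. It is a hypothetical sketch, not a MetaCard port.

```python
def strip_dup_scan(the_list, item_del="\n"):
    """Search the result built so far, like the itemOffset version.
    Each `in` test on a list is linear in the number of unique items."""
    kept = []
    for item in the_list.split(item_del):
        if item not in kept:      # linear scan, the analogue of itemOffset(...) = 0
            kept.append(item)
    return item_del.join(kept)    # joining avoids a trailing delimiter
```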
Re: RFC: Optimizing stripDup()
Geoff Canyon wrote:
>
> Maybe something like:
>

Geoff's is faster than mine. I never thought to use arrays. Pretty impressive.

--
Jacqueline Landman Gay | [EMAIL PROTECTED]
HyperActive Software | http://www.hyperactivesw.com
Re: RFC: Optimizing stripDup()
Maybe something like:

function stripDup theList,theitemDel
  # verify param
  if theList= "" then return ""
  if theitemDel = "" then put cr into theitemDel
  #
  set the itemDelimiter to char 1 of theitemDel
  put empty into tResultList
  repeat for each item tItem in theList
    if tItemList[tItem] is empty then
      -- this is much faster than:
      -- if tItem is not among the items of tResultList then
      -- associative arrays rock
      put 1 into tItemList[tItem]
      put tItem & theitemDel after tResultList
    end if
  end repeat
  return (char 1 to -2 of tResultList) --remove final delimiter
end stripDup

In my testing, this finds the unique words, in order, in an 8000-word text file in about .05 seconds. Again, associative arrays rock. The repeat for each... structure does too.

gc
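In Python, the same hash-based, order-preserving idea fits in one expression, because dictionary keys are unique and, since Python 3.7, keep insertion order. This is a comparison sketch only, and `strip_dup_fromkeys` is a hypothetical name.

```python
def strip_dup_fromkeys(the_list, item_del="\n"):
    # dict.fromkeys drops repeated keys and keeps first-seen order (Python 3.7+),
    # so the joined keys are the unique items in their original sequence.
    return item_del.join(dict.fromkeys(the_list.split(item_del)))
```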
Re: RFC: Optimizing stripDup()
I don't know if this is the optimal solution, but there *is* a way you could use sort to speed things up: first number the items, then sort them by their original value and remove duplicates (keeping the copy with the lowest number, so each item keeps its first position), then sort again by number. Finally, strip the numbers. It's not the simplest approach, but for large data it should be faster than checking every item against every other.

HTH,
Brian

<< Here is a function that I wrote quite some time ago. It takes a list of text, theList, separated by theitemDel, removes duplicate items from the list, and returns a new list without duplicate items. Unfortunately, this function runs rather slowly on a large list (a few thousand records), which is why I am posting it here as a little "open source" Request For Comment: make it faster, guys! Remember, the sequence of the records cannot be changed, which means you are not likely to use sort. >>
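Below is the number/sort/dedupe/re-sort recipe, sketched in Python. Sorting on (value, number) puts the first occurrence of each value at the head of its run, so keeping only run heads preserves first positions. The cost is O(n log n) rather than the original's roughly quadratic scan. The function name is hypothetical.

```python
def strip_dup_sorted(the_list, item_del="\n"):
    """Decorate with positions, sort by value, drop repeats, re-sort by position."""
    items = the_list.split(item_del)
    # Step 1-2: number the items and sort by (value, original position).
    numbered = sorted(enumerate(items), key=lambda pair: (pair[1], pair[0]))
    # Step 3: within each run of equal values, keep only the first (lowest number).
    survivors = []
    for index, value in numbered:
        if not survivors or survivors[-1][1] != value:
            survivors.append((index, value))
    # Step 4: sort back into original order by number, then strip the numbers.
    survivors.sort()
    return item_del.join(value for _, value in survivors)
```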
RFC: Optimizing stripDup()
Here is a function that I wrote quite some time ago. It takes a list of text, theList, separated by theitemDel, removes duplicate items from the list, and returns a new list without duplicate items. Unfortunately, this function runs rather slowly on a large list (a few thousand records), which is why I am posting it here as a little "open source" Request For Comment: make it faster, guys! Remember, the sequence of the records cannot be changed, which means you are not likely to use sort.

Regards,
LiangTyan Fui

#

function stripDup theList,theitemDel
  # verify param
  if theList= "" then return ""
  if theitemDel = "" then put cr into theitemDel

  #
  put char 1 of theitemDel into theitemDel
  set the itemDelimiter to theitemDel

  put number of items of theList into theListItem
  put 1 into k
  repeat
    put item k of theList into key1
    repeat with c = theListItem down to k+1
      if key1 = item c of theList then
        delete item c of theList
        subtract 1 from theListItem
      end if
    end repeat
    if k >= theListItem then exit repeat
    add 1 to k
  end repeat
  return theList
end stripDup
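For comparison with the replies, here is a Python transcription of the nested-loop algorithm above. It is a sketch, and `strip_dup_original` is a hypothetical name. Every item is compared with every later item, roughly n²/2 comparisons, which is why it slows down on lists of a few thousand records.

```python
def strip_dup_original(the_list, item_del="\n"):
    """Python transcription of the nested-loop stripDup (sketch)."""
    if the_list == "":
        return ""
    delim = item_del[:1] or "\n"      # first character only; default to return
    items = the_list.split(delim)
    k = 0
    while k < len(items):
        key1 = items[k]
        # Walk from the end down to k+1 so deletions don't shift unvisited indices.
        for c in range(len(items) - 1, k, -1):
            if items[c] == key1:
                del items[c]
        k += 1
    return delim.join(items)
```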